Gentwo Git Trees - linux/.git/log
Paolo Abeni [Wed, 1 Oct 2025 08:10:50 +0000 (10:10 +0200)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR (net-6.17-rc8).

Conflicts:

tools/testing/selftests/drivers/net/bonding/Makefile
  87951b566446 selftests: bonding: add test for passive LACP mode
  c2377f1763e9 selftests: bonding: add test for LACP actor port priority

Adjacent changes:

drivers/net/ethernet/cadence/macb.h
  fca3dc859b20 net: macb: remove illusion about TBQPH/RBQPH being per-queue
  89934dbf169e net: macb: Add TAPRIO traffic scheduling support

drivers/net/ethernet/cadence/macb_main.c
  fca3dc859b20 net: macb: remove illusion about TBQPH/RBQPH being per-queue
  89934dbf169e net: macb: Add TAPRIO traffic scheduling support

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Wed, 1 Oct 2025 08:01:37 +0000 (10:01 +0200)]
Merge branch 'net-stmmac-add-support-for-allwinner-a523-gmac200'

Chen-Yu Tsai says:

====================
net: stmmac: Add support for Allwinner A523 GMAC200

This is v8 of my Allwinner A523 GMAC200 support series. This is based on
next-20250925.

This version only contains the DT binding and driver patches. The device
tree patches are basically the same as the previous version.

This series adds support for the second Ethernet controller found on the
Allwinner A523 SoC family. This controller, dubbed GMAC200, is a DWMAC4
core with an integration layer around it. The integration layer is
similar to older Allwinner generations, but with an extra memory bus
gate and separate power domain.

Patch 1 adds a new compatible string combo to the existing Allwinner
EMAC binding.

Patch 2 adds a new driver for this core and integration combo.
====================

Link: https://patch.msgid.link/20250925191600.3306595-1-wens@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chen-Yu Tsai [Thu, 25 Sep 2025 19:15:59 +0000 (03:15 +0800)]
net: stmmac: Add support for Allwinner A523 GMAC200

The Allwinner A523 SoC family has a second Ethernet controller, called
the GMAC200 in the BSP and T527 datasheet, and referred to as GMAC1 for
numbering. This controller, according to BSP sources, is fully
compatible with a slightly newer version of the Synopsys DWMAC core.
The glue layer around the controller is the same as found around older
DWMAC cores on Allwinner SoCs. The only slight difference is that, since
this is the second controller on the SoC, the register for the clock
delay controls is at a different offset. Lastly, the integration includes
a dedicated clock gate for the memory bus, and the whole thing is put in
a separately controllable power domain.

Add a new driver for this hardware supporting the integration layer.

Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20250925191600.3306595-3-wens@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Chen-Yu Tsai [Thu, 25 Sep 2025 19:15:58 +0000 (03:15 +0800)]
dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible

The Allwinner A523 SoC family has a second Ethernet controller, called
the GMAC200 in the BSP and T527 datasheet, and referred to as GMAC1 for
numbering. This controller, according to BSP sources, is fully
compatible with a slightly newer version of the Synopsys DWMAC core.
The glue layer around the controller is the same as found around older
DWMAC cores on Allwinner SoCs. The only slight difference is that, since
this is the second controller on the SoC, the register for the clock
delay controls is at a different offset. Lastly, the integration includes
a dedicated clock gate for the memory bus, and the whole thing is put in
a separately controllable power domain.

Add a compatible string entry for it, and work in the requirements for
a second clock and a power domain.

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Link: https://patch.msgid.link/20250925191600.3306595-2-wens@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 30 Sep 2025 13:45:06 +0000 (15:45 +0200)]
Revert "Documentation: net: add flow control guide and document ethtool API"

This reverts commit 7bd80ed89d72285515db673803b021469ba71ee8.

I should not have merged it in the first place, as review was still
pending and requested changes had not yet been addressed.

Link: https://patch.msgid.link/c6f3af12df9b7998920a02027fc8893ce82afc4c.1759239721.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Wed, 1 Oct 2025 00:27:01 +0000 (17:27 -0700)]
Merge branch 'octeontx2-fix-bitmap-leaks-in-pf-and-vf'

Bo Sun says:

====================
octeontx2: fix bitmap leaks in PF and VF

Two small patches that free the AF_XDP bitmap in the PF and VF
remove paths.  Both carry the same Fixes tag and should go to
stable.
====================

Link: https://patch.msgid.link/20250930061236.31359-1-bo@mboxify.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bo Sun [Tue, 30 Sep 2025 06:12:36 +0000 (14:12 +0800)]
octeontx2-pf: fix bitmap leak

The bitmap allocated with bitmap_zalloc() in otx2_probe() was not
released in otx2_remove(). Unbinding and rebinding the driver therefore
triggers a kmemleak warning:

    unreferenced object (size 8):
      backtrace:
        bitmap_zalloc
        otx2_probe

Call bitmap_free() in the remove path to fix the leak.
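A userspace sketch of the pattern being fixed: whatever probe allocates, remove must free, so repeated bind/unbind cycles leave nothing behind. The struct, its field names and the model_* helpers are hypothetical stand-ins, with calloc()/free() in place of bitmap_zalloc()/bitmap_free(); this is not the octeontx2 driver code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of a driver private struct. */
struct otx2_priv_model {
	unsigned long *af_xdp_bitmap; /* hypothetical field name */
	int live_allocs;              /* outstanding allocations, for the demo */
};

/* Stand-in for bitmap_zalloc(): zeroed storage for nbits bits. */
static unsigned long *model_bitmap_zalloc(unsigned int nbits)
{
	size_t bits_per_long = 8 * sizeof(unsigned long);
	size_t nlongs = (nbits + bits_per_long - 1) / bits_per_long;

	return calloc(nlongs ? nlongs : 1, sizeof(unsigned long));
}

static int model_probe(struct otx2_priv_model *priv, unsigned int nqueues)
{
	priv->af_xdp_bitmap = model_bitmap_zalloc(nqueues);
	if (!priv->af_xdp_bitmap)
		return -1;
	priv->live_allocs++;
	return 0;
}

/* The fix: the remove path releases what probe allocated. */
static void model_remove(struct otx2_priv_model *priv)
{
	free(priv->af_xdp_bitmap); /* bitmap_free() in the real driver */
	priv->af_xdp_bitmap = NULL;
	priv->live_allocs--;
}
```

Running probe/remove in a loop, as an unbind/rebind test would, ends with no outstanding allocation.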

Fixes: efabce290151 ("octeontx2-pf: AF_XDP zero copy receive support")
Signed-off-by: Bo Sun <bo@mboxify.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bo Sun [Tue, 30 Sep 2025 06:12:35 +0000 (14:12 +0800)]
octeontx2-vf: fix bitmap leak

The bitmap allocated with bitmap_zalloc() in otx2vf_probe() was not
released in otx2vf_remove(). Unbinding and rebinding the driver therefore
triggers a kmemleak warning:

    unreferenced object (size 8):
      backtrace:
        bitmap_zalloc
        otx2vf_probe

Call bitmap_free() in the remove path to fix the leak.

Fixes: efabce290151 ("octeontx2-pf: AF_XDP zero copy receive support")
Signed-off-by: Bo Sun <bo@mboxify.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Wed, 1 Oct 2025 00:21:17 +0000 (17:21 -0700)]
Merge branch 'net-mlx5-misc-changes-2025-09-28'

Tariq Toukan says:

====================
net/mlx5: misc changes 2025-09-28

This series contains misc enhancements to the mlx5 driver.

v1: https://lore.kernel.org/1758531671-819655-1-git-send-email-tariqt@nvidia.com
====================

Link: https://patch.msgid.link/1759094723-843774-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Sun, 28 Sep 2025 21:25:23 +0000 (00:25 +0300)]
net/mlx5e: Use extack in set rxfh callback

The ->set/create/modify_rxfh() callbacks now pass a valid extack instead
of NULL through netlink [1]. In case of an error, reflect it through
extack instead of a dmesg print.

[1]
commit c0ae03588bbb ("ethtool: rss: initial RSS_SET (indirection table handling)")
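A userspace model of the extack idea, assuming only that the callback receives an optional error-report object: the handler records a static message in it instead of printing to the log, and the caller relays that message to userspace. struct ext_ack_model, MODEL_SET_ERR_MSG() and model_set_rxfh() are hypothetical; in the kernel the counterpart is struct netlink_ext_ack with the NL_SET_ERR_MSG*() helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct netlink_ext_ack: only the message. */
struct ext_ack_model {
	const char *msg;
};

/* Record a static message only if the caller supplied an extack. */
#define MODEL_SET_ERR_MSG(extack, text) \
	do { if (extack) (extack)->msg = (text); } while (0)

/* Hypothetical set_rxfh-like handler rejecting an unsupported hash fn. */
static int model_set_rxfh(int hfunc, int supported_mask,
			  struct ext_ack_model *extack)
{
	if (!(supported_mask & (1 << hfunc))) {
		/* Reported to the requester, not to dmesg. */
		MODEL_SET_ERR_MSG(extack, "hash function not supported");
		return -EOPNOTSUPP;
	}
	return 0;
}
```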

Signed-off-by: Gal Pressman <gal@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-8-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Carolina Jubran [Sun, 28 Sep 2025 21:25:22 +0000 (00:25 +0300)]
net/mlx5e: Introduce mlx5e_rss_params for RSS configuration

Group RSS-related parameters into a dedicated mlx5e_rss_params
struct. Pass this struct instead of individual arguments when
initializing RSS.

No functional changes.
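The parameter-object refactor can be sketched as follows. The field names are illustrative guesses, not the actual members of mlx5e_rss_params; the point is that callers fill one struct instead of passing a growing list of individual arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical grouping; the real mlx5e_rss_params fields may differ. */
struct rss_params_model {
	unsigned int num_channels;
	bool inner_ft_support;
	unsigned int hash_fn;
};

struct rss_model {
	struct rss_params_model params;
	bool initialized;
};

/*
 * Before: init(rss, num_channels, inner_ft_support, hash_fn, ...).
 * After: one struct carries the related values, so adding a field
 * does not change every caller's argument list.
 */
static int model_rss_init(struct rss_model *rss,
			  const struct rss_params_model *params)
{
	if (params->num_channels == 0)
		return -1;
	rss->params = *params;
	rss->initialized = true;
	return 0;
}
```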

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-7-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Carolina Jubran [Sun, 28 Sep 2025 21:25:21 +0000 (00:25 +0300)]
net/mlx5e: Introduce mlx5e_rss_init_params

Introduce a dedicated structure to group RSS initialization parameters
that are only used during RSS creation, and drop the "init" prefix
from pkt_merge_param.

No functional changes.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-6-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Carolina Jubran [Sun, 28 Sep 2025 21:25:20 +0000 (00:25 +0300)]
net/mlx5e: Remove unused mdev param from RSS indir init

The mdev parameter is not used in mlx5e_rss_params_indir_init, so drop
it from the function and update all callers accordingly.

No functional changes.

Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-5-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Carolina Jubran [Sun, 28 Sep 2025 21:25:19 +0000 (00:25 +0300)]
net/mlx5: Improve QoS error messages with actual depth values

Enhance error messages in MLX5 QoS scheduling depth validation by
including the actual values that caused the validation to fail.
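A hedged sketch of the kind of message the change aims for, assuming a simple depth-limit check; the function name, wording and limit are hypothetical. Reporting both the requested and the supported depth tells the user exactly by how much the configuration overshoots.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical depth check; names, wording and limits are illustrative. */
static int model_check_depth(int new_depth, int max_depth,
			     char *msg, size_t len)
{
	if (new_depth > max_depth) {
		/* Include the offending and allowed values, not just "too deep". */
		snprintf(msg, len,
			 "Node depth (%d) exceeds max supported depth (%d)",
			 new_depth, max_depth);
		return -1;
	}
	return 0;
}
```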

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-4-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jianbo Liu [Sun, 28 Sep 2025 21:25:18 +0000 (00:25 +0300)]
net/mlx5e: Prevent entering switchdev mode with inconsistent netns

When a PF enters switchdev mode, its netdevice becomes the uplink
representor but remains in its current network namespace. All other
representors (VFs, SFs) are created in the netns of the devlink
instance.

If the PF's netns has been moved and differs from the devlink's netns,
enabling switchdev mode would create a state where the OVS control
plane (ovs-vsctl) cannot manage the switch because the PF uplink
representor and the other representors are split across different
namespaces.

To prevent this inconsistent configuration, block the request to enter
switchdev mode if the PF netdevice's netns does not match the netns of
its devlink instance.

As part of this change, the PF's netns is first marked as immutable.
This prevents race conditions where the netns could be changed after
the check is performed but before the mode transition is complete, and
it aligns the PF's behavior with that of the final uplink representor.
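The ordering described above (make the netns immutable first, then compare) can be modelled in single-threaded userspace C. Namespaces are plain integer IDs, and rolling the flag back on failure is an assumption of this sketch rather than something the commit message states; the real driver also relies on kernel locking that the model does not show.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; a netns is represented by an integer id. */
struct pf_model {
	int netns_id;         /* netns of the PF netdevice */
	bool netns_immutable; /* when set, moving the netdev is refused */
};

static int model_move_netns(struct pf_model *pf, int new_ns)
{
	if (pf->netns_immutable)
		return -EINVAL;
	pf->netns_id = new_ns;
	return 0;
}

/* Lock the netns first, then compare, so no move can slip in between. */
static int model_enter_switchdev(struct pf_model *pf, int devlink_ns_id)
{
	pf->netns_immutable = true;
	if (pf->netns_id != devlink_ns_id) {
		pf->netns_immutable = false; /* assumed rollback on failure */
		return -EINVAL;
	}
	return 0; /* stays immutable, like the uplink representor */
}
```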

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-3-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Vlad Dogaru [Sun, 28 Sep 2025 21:25:17 +0000 (00:25 +0300)]
net/mlx5: HWS, Generalize complex matchers

The existing solution of complex matchers splits the match parameters
across two, and exactly two, matchers. For some rather extreme cases
(e.g. IPv6-in-IPv6 tunnels), even two matchers are not enough.

Generalize complex matchers to up to 4 submatchers, and allow easy
extension to more if needed. This resulted in rewriting a large part
of the high-level complex matchers logic, but the original concepts
were rock solid and still hold.

Key characteristics of the new implementation:

* Rework complex matchers to include multiple submatchers. All
  submatchers but the first are isolated, in keeping with the existing
  paradigm of handing off to specialized matchers that are not otherwise
  reachable by regular rules.

* Similarly, rework complex rules to allow splitting them into more than
  two simple rules. Rules continue to be refcounted to allow for
  multiple complex rules matching on identical parts of the match
  params.

* Rely on the match tag, as opposed to the entire match_param, to hash
  subrules. This results in lower memory usage.

* Prefer to split the original user-supplied match parameters rather
  than the internal field descriptors. This avoids the awkward
  transition back and forth between the two formats.

* Allow splitting multi-dword fields across matchers. The only
  restrictions that the new implementation imposes are: a) any fragment
  of an IP address must be accompanied by a match on the IP version; and
  b) a single lower dword of an IPv6 address cannot be present in a
  submatcher as it would be interpreted as an IPv4 address.

* Employ a greedy algorithm to split the match params, as opposed to
  exhaustive search. The results are not optimal, but the algorithm is
  now linear rather than exponential. Consequently, we see complex
  matcher creation time drop by two orders of magnitude in our tests.
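A minimal sketch of a linear, greedy split, assuming each match field has a fixed size in dwords and each submatcher a fixed, hypothetical capacity. It does not model the IP-version or IPv6-lower-dword restrictions above and is not the mlx5 HWS code; it only shows why a single greedy pass is O(n) yet can use more submatchers than an exhaustive search would.

```c
#include <assert.h>

#define MAX_SUBMATCHERS 4 /* "up to 4 submatchers", per the text */

/*
 * Next-fit greedy split: walk the fields once and open a new submatcher
 * whenever the current one cannot hold the next field. O(n) in the number
 * of fields, but not guaranteed to use the fewest submatchers.
 * field_dw[i]: size of field i in dwords; cap_dw: hypothetical capacity.
 * Writes each field's submatcher index to out[i]. Returns the number of
 * submatchers used, or -1 if the fields cannot be split within the limit.
 */
static int greedy_split(const int *field_dw, int nfields, int cap_dw,
			int *out)
{
	int sub = 0, used = 0;

	for (int i = 0; i < nfields; i++) {
		if (field_dw[i] > cap_dw)
			return -1; /* field too large for any submatcher */
		if (used + field_dw[i] > cap_dw) {
			if (++sub == MAX_SUBMATCHERS)
				return -1;
			used = 0;
		}
		used += field_dw[i];
		out[i] = sub;
	}
	return nfields ? sub + 1 : 0;
}
```

For field sizes {3, 5, 2, 4} and a capacity of 7, the greedy pass opens three submatchers, whereas {3, 4} and {5, 2} would fit in two.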

Signed-off-by: Vlad Dogaru <vdogaru@nvidia.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759094723-843774-2-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Patrisious Haddad [Sun, 28 Sep 2025 21:08:08 +0000 (00:08 +0300)]
net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs

Write combining is an optimization feature in CPUs that is frequently
used by modern devices to generate 32 or 64 byte TLPs at the PCIe level.
These large TLPs allow certain optimizations in the driver to HW
communication that improve performance. As WC is unpredictable and
optional, the HW designs all tolerate cases where combining doesn't
happen and simply experience a performance degradation.

Unfortunately many virtualization environments on all architectures have
done things that completely disable WC inside the VM with no generic way
to detect this. For example WC was fully blocked in ARM64 KVM until
commit 8c47ce3e1d2c ("KVM: arm64: Set io memory s2 pte as normalnc for
vfio pci device").

Trying to use WC when it is known not to work has a measurable
performance cost (~5%). Long ago, mlx5 developed a boot-time algorithm
to test if WC is available or not by using unique mlx5 HW features to
measure how many large TLPs the device is receiving. The SW generates a
large number of combining opportunities and if any succeed then WC is
declared working.
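The decision rule of such a test (many attempts, and WC counts as working if any single attempt combines) can be sketched as below. The trial callback is hypothetical: the real test measures large TLPs with mlx5-specific hardware features, which a userspace model cannot do.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical trial hook: performs one write-combining attempt and
 * reports whether the device observed a full-size TLP.
 */
typedef bool (*wc_trial_fn)(void *ctx, int attempt);

/* WC is declared working as soon as any one attempt combines. */
static bool wc_self_test(wc_trial_fn trial, void *ctx, int attempts)
{
	for (int i = 0; i < attempts; i++)
		if (trial(ctx, i))
			return true;
	return false;
}

/* Demo trial: combines only on one specific attempt index. */
static bool combines_at(void *ctx, int attempt)
{
	return attempt == *(int *)ctx;
}
```

This rule is also why an unreliable CPU is a problem: if thousands of attempts never combine, the test reports WC as unavailable.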

In mlx5 the WC optimization feature is never used by the kernel except
for the boot time test. The WC is only used by userspace in rdma-core.

Sadly modern ARM CPUs, especially NVIDIA Grace, have a combining
implementation that is very unreliable compared to pretty much
everything prior. This is being fixed architecturally in new CPUs with a
new ST64B instruction, but current shipping devices suffer this problem.

Unreliable means the SW can present thousands of combining opportunities
and the HW will not combine for any of them, which creates a performance
degradation, and critically fails the mlx5 boot test. However, the CPU
is very sensitive to the instruction sequence used, with the better
options being sufficiently good that the performance loss from the
unreliable CPU is not measurable.

Broadly there are several options, from worst to best:
1) A C loop doing a u64 memcpy.
   This was used prior to commit ef302283ddfc
   ("IB/mlx5: Use __iowrite64_copy() for write combining stores")
   and failed almost all the time on Grace CPUs.

2) ARM64 assembly with consecutive 8 byte stores. This was implemented
   as an arch-generic __iowriteXX_copy() family of functions suitable
   for performance use in drivers for WC. commit ead79118dae6
   ("arm64/io: Provide a WC friendly __iowriteXX_copy()") provided the
   ARM implementation.

3) ARM64 assembly with consecutive 16 byte stores. This was rejected
   from kernel use over fears of virtualization failures. Common ARM
   VMMs will crash if STP is used against emulated memory.

4) A single NEON store instruction. Userspace has used this option for a
   very long time, and it performs well.

5) For future silicon the new ST64B instruction is guaranteed to
   generate a 64 byte TLP 100% of the time.

The past upgrade from #1 to #2 was thought to be sufficient to solve
this problem. However, more testing on more systems shows that #2 is
still problematic at a low frequency, and the kernel test fails.

Thus, make mlx5 use the same instructions as userspace during the
boot-time WC self-test. This way the WC test matches userspace and
will properly detect the ability of the HW to support the WC workload
that userspace will generate. While #4 still has imperfect combining
performance, it is substantially better than #2, and does actually give
a performance win to applications. With #2, the self-test fails on
roughly 3 out of 10 boots on some systems, while #4 has never seen a
boot failure.

There is no real general use case for a NEON based WC flow in the
kernel. This is not suitable for any performance path work as getting
into/out of a NEON context is fairly expensive compared to the gain of
WC. Future CPUs are going to fix this issue by using a new ARM
instruction, and __iowriteXX_copy() will be updated to use that
automatically, probably using the alternatives mechanism.

Since this problem is constrained to mlx5's unique situation of needing
a non-performance code path to duplicate what mlx5 userspace is doing as
a matter of self-testing, implement it as a one line inline assembly in
the driver directly.

Lastly, this approach was concluded from a discussion with the ARM
maintainers, who confirmed that it is the best solution:
https://lore.kernel.org/r/aHqN_hpJl84T1Usi@arm.com

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/1759093688-841357-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gopi Krishna Menon [Mon, 29 Sep 2025 16:31:38 +0000 (22:01 +0530)]
selftests/net: add tcp_port_share to .gitignore

Add the tcp_port_share test binary to .gitignore to avoid
accidentally staging the build artifact.

Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250929163140.122383-1-krishnagopi487@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Mon, 29 Sep 2025 18:15:29 +0000 (11:15 -0700)]
Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"

This reverts commit ceddedc969f0532b7c62ca971ee50d519d2bc0cb.

The commit in question breaks the mapping of PGs to pools for some SKUs.
Specifically, multi-host NICs seem to be shipped with a custom buffer
configuration which maps the lossy PG to pool 4. But the bad commit
overrides this with pool 0, which does not have sufficient buffer space
reserved, resulting in ~40% packet loss. The commit also breaks the
BMC / OOB connection completely (100% packet loss).

Revert, similarly to commit 3fbfe251cc9f ("Revert "net/mlx5e: Update and
set Xon/Xoff upon port speed set""). The breakage is exactly the same;
the only difference is that the quoted commit would break the NIC
immediately on boot, while the currently reverted commit does so only
when the MTU is changed.

Note: "good" kernels do not restore the configuration, so downgrading
isn't enough to recover machines. A NIC power cycle seems to be necessary
to return to a healthy state (or overriding the relevant registers using
a custom patch).

Fixes: ceddedc969f0 ("net/mlx5e: Update and set Xon/Xoff upon MTU set")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250929181529.1848157-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 30 Sep 2025 13:45:54 +0000 (15:45 +0200)]
Merge branch 'net-lockless-skb_attempt_defer_free'

Eric Dumazet says:

====================
net: lockless skb_attempt_defer_free()

Platforms with many CPUs and a relatively slow interconnect show
significant spinlock contention in skb_attempt_defer_free().

This series refactors this infrastructure to be NUMA aware,
and lockless.

Tested on various platforms, including AMD Zen 2/3/4
and Intel Granite Rapids, showing significant cost reductions
under network stress (more than 20 Mpps).
====================

Link: https://patch.msgid.link/20250928084934.3266948-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Sun, 28 Sep 2025 08:49:34 +0000 (08:49 +0000)]
net: add NUMA awareness to skb_attempt_defer_free()

Instead of sharing sd->defer_list & sd->defer_count with
many cpus, add one pair for each NUMA node.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-4-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Sun, 28 Sep 2025 08:49:33 +0000 (08:49 +0000)]
net: use llist for sd->defer_list

Get rid of sd->defer_lock and adopt llist operations.

We optimize skb_attempt_defer_free() for the common case,
where the packet is queued. Otherwise, sd->defer_count
keeps increasing until skb_defer_free_flush() clears it.
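A userspace sketch of the llist pattern using C11 atomics: producers push with a compare-and-swap loop (as llist_add() does) and the consumer detaches the whole list with a single exchange (as llist_del_all() does), so no spinlock is needed. struct defer_model and its helpers are hypothetical stand-ins for sd->defer_list and sd->defer_count, not the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal lock-free singly linked list in the spirit of llist. */
struct lnode {
	struct lnode *next;
	int id; /* payload for the demo */
};

struct defer_model {
	_Atomic(struct lnode *) first; /* like sd->defer_list */
	atomic_int count;              /* like sd->defer_count */
};

/* Producer side: push without a lock (cf. llist_add()). */
static void defer_push(struct defer_model *d, struct lnode *n)
{
	struct lnode *old = atomic_load(&d->first);

	do {
		n->next = old; /* retried with the fresh head on CAS failure */
	} while (!atomic_compare_exchange_weak(&d->first, &old, n));
	atomic_fetch_add(&d->count, 1);
}

/*
 * Consumer side: detach the whole list in one exchange
 * (cf. llist_del_all()), then reset the counter as a flush would.
 * Returns the number of nodes taken.
 */
static int defer_flush(struct defer_model *d)
{
	struct lnode *n = atomic_exchange(&d->first, NULL);
	int freed = 0;

	atomic_store(&d->count, 0);
	while (n) {
		struct lnode *next = n->next; /* read before n is released */
		freed++;                      /* real code would free n here */
		n = next;
	}
	return freed;
}
```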

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-3-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Eric Dumazet [Sun, 28 Sep 2025 08:49:32 +0000 (08:49 +0000)]
net: make softnet_data.defer_count an atomic

This is preparation work to remove the softnet_data.defer_lock,
as it is contended on hosts with a large number of cores.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250928084934.3266948-2-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo Abeni [Tue, 30 Sep 2025 13:17:23 +0000 (15:17 +0200)]
Merge branch 'psp-add-a-kselftest-suite-and-netdevsim-implementation'

Jakub Kicinski says:

====================
psp: add a kselftest suite and netdevsim implementation

Add a basic test suite for drivers that support PSP. Also, add a PSP
implementation in the netdevsim driver.

The netdevsim implementation does encapsulation and decapsulation of
PSP packets, but no crypto.

The tests cover the basic usage of the uapi, and demonstrate key
exchange and connection setup. The tests and netdevsim support IPv4
and IPv6. Here is an example run on a system with a CX7 NIC.

    TAP version 13
    1..28
    ok 1 psp.data_basic_send_v0_ip4
    ok 2 psp.data_basic_send_v0_ip6
    ok 3 psp.data_basic_send_v1_ip4
    ok 4 psp.data_basic_send_v1_ip6
    ok 5 psp.data_basic_send_v2_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
    ok 6 psp.data_basic_send_v2_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-128')
    ok 7 psp.data_basic_send_v3_ip4 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
    ok 8 psp.data_basic_send_v3_ip6 # SKIP ('PSP version not supported', 'hdr0-aes-gmac-256')
    ok 9 psp.data_mss_adjust_ip4
    ok 10 psp.data_mss_adjust_ip6
    ok 11 psp.dev_list_devices
    ok 12 psp.dev_get_device
    ok 13 psp.dev_get_device_bad
    ok 14 psp.dev_rotate
    ok 15 psp.dev_rotate_spi
    ok 16 psp.assoc_basic
    ok 17 psp.assoc_bad_dev
    ok 18 psp.assoc_sk_only_conn
    ok 19 psp.assoc_sk_only_mismatch
    ok 20 psp.assoc_sk_only_mismatch_tx
    ok 21 psp.assoc_sk_only_unconn
    ok 22 psp.assoc_version_mismatch
    ok 23 psp.assoc_twice
    ok 24 psp.data_send_bad_key
    ok 25 psp.data_send_disconnect
    ok 26 psp.data_stale_key
    ok 27 psp.removal_device_rx # XFAIL Test only works on netdevsim
    ok 28 psp.removal_device_bi # XFAIL Test only works on netdevsim
    # Totals: pass:22 fail:0 xfail:2 xpass:0 skip:4 error:0
    #
    # Responder logs (0):
    # STDERR:
    #  Set PSP enable on device 1 to 0x3
    #  Set PSP enable on device 1 to 0x0

v2: https://lore.kernel.org/20250925211647.3450332-1-daniel.zahka@gmail.com
v1: https://lore.kernel.org/20250924194959.2845473-1-daniel.zahka@gmail.com
====================

Link: https://patch.msgid.link/20250927225420.1443468-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 27 Sep 2025 22:54:20 +0000 (15:54 -0700)]
selftests: drv-net: psp: add tests for destroying devices

Add tests making sure a device can disappear while associations
exist. These are netdevsim-only, since destroying real devices is
trickier.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-9-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 27 Sep 2025 22:54:19 +0000 (15:54 -0700)]
selftests: drv-net: psp: add test for auto-adjusting TCP MSS

Test TCP MSS getting auto-adjusted. PSP adds an encapsulation overhead
of 40B per packet, when used in transport mode without any
virtualization cookie or other optional PSP header fields. The kernel
should adjust the MSS for a connection after PSP tx state is reached.
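Worked arithmetic for the expected adjustment, assuming a 1500-byte MTU and ignoring TCP options. The 40 B overhead is consistent with an 8 B UDP header, a 16 B PSP header and a 16 B ICV trailer. The helper below is illustrative only.

```c
#include <assert.h>

#define PSP_ENCAP_OVERHEAD 40 /* per-packet overhead stated in the text */
#define TCP_HDR_LEN 20        /* TCP header without options */

/*
 * MSS for a TCP connection without options, optionally with PSP.
 * ip_hdr is 20 for IPv4 and 40 for IPv6.
 */
static int tcp_mss(int mtu, int ip_hdr, int psp)
{
	int mss = mtu - ip_hdr - TCP_HDR_LEN;

	return psp ? mss - PSP_ENCAP_OVERHEAD : mss;
}
```

With a 1500-byte MTU this gives 1460 for plain IPv4, 1420 for IPv4 with PSP and 1400 for IPv6 with PSP.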

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-8-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 27 Sep 2025 22:54:18 +0000 (15:54 -0700)]
selftests: drv-net: psp: add connection breaking tests

Add tests checking conditions which lead to connections breaking:
using a bad key, and a connection getting stuck if the device key is
rotated twice.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-7-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 27 Sep 2025 22:54:17 +0000 (15:54 -0700)]
selftests: drv-net: psp: add association tests

Add tests for exercising PSP associations for TCP sockets.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-6-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 27 Sep 2025 22:54:16 +0000 (15:54 -0700)]
selftests: drv-net: psp: add basic data transfer and key rotation tests

Add basic tests for sending data over PSP and making sure that key
rotation toggles the MSB of the SPI.

Deploy PSP responder on the remote end. We also need a healthy dose
of common helpers for setting up the connections, assertions and
interrogating socket state on the Python side.
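The property being checked can be expressed as a one-line bit test. The helper name is hypothetical; the sketch only compares the most significant bit of the 32-bit SPI before and after a rotation, which is consistent with PSP using that bit to select which of two device keys is in use.

```c
#include <assert.h>
#include <stdint.h>

#define SPI_KEY_PHASE_BIT 0x80000000u /* MSB of the 32-bit SPI */

/* Hypothetical helper: true if the SPI MSB differs across a rotation. */
static int spi_msb_toggled(uint32_t before, uint32_t after)
{
	return ((before ^ after) & SPI_KEY_PHASE_BIT) != 0;
}
```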

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-5-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 27 Sep 2025 22:54:15 +0000 (15:54 -0700)]
selftests: drv-net: add PSP responder

PSP tests need the remote system to support PSP, and some PSP capable
application to exchange data with. Create a simple PSP responder app
which we can build and deploy to the remote host. The tests themselves
can be written in Python, but for ease of deployment the responder is
written in C (using C YNL).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-4-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 27 Sep 2025 22:54:14 +0000 (15:54 -0700)]
selftests: drv-net: base device access API test

Simple PSP test for getting info about PSP devices.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-3-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Sat, 27 Sep 2025 22:54:13 +0000 (15:54 -0700)]
netdevsim: a basic test PSP implementation

Provide a PSP implementation for netdevsim.

Use psp_dev_encapsulate() and psp_dev_rcv() to do actual encapsulation
and decapsulation on skbs, but perform no encryption or decryption. In
order to make encryption with a bad key result in a drop on the peer's
rx side, we stash our psd's generation number in the first byte of each
key before handing it to the peer.
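A sketch of the generation-stashing trick. The key length, the helper names and the acceptance policy are assumptions: here a receiver accepts keys stamped with its current or immediately previous generation, which would make connections break only after two rotations, in line with the connection breaking tests in this series. The actual netdevsim check may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define KEY_LEN 16 /* illustrative key length */

/* Tx side: overwrite byte 0 of the key handed to the peer. */
static void stash_generation(uint8_t key[KEY_LEN], uint8_t generation)
{
	key[0] = generation;
}

/*
 * Rx side, hypothetical policy: accept keys from the current or the
 * previous generation (with 8-bit wraparound); anything else is dropped,
 * standing in for a failed decryption.
 */
static bool rx_accepts(const uint8_t key[KEY_LEN], uint8_t current_generation)
{
	return key[0] == current_generation ||
	       key[0] == (uint8_t)(current_generation - 1);
}
```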

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Co-developed-by: Daniel Zahka <daniel.zahka@gmail.com>
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Link: https://patch.msgid.link/20250927225420.1443468-2-kuba@kernel.org
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Heiner Kallweit [Sat, 27 Sep 2025 20:23:19 +0000 (22:23 +0200)]
net: sfp: improve poll interval handling

The poll interval is a fixed value, so we don't need a static variable
for it. The change also allows using the standard
module_platform_driver macro, avoiding some boilerplate code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/b8079f96-6865-431c-a908-a0b9e9bd5379@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Heiner Kallweit [Sat, 27 Sep 2025 20:03:35 +0000 (22:03 +0200)]
net: sfp: don't include swphy.h

Nothing from swphy.h is used here, so don't include it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/19921899-f0a8-4752-a897-1b6d62ade6eb@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agonet: phy: annotate linkmode initializers as not used after init phase
Heiner Kallweit [Sat, 27 Sep 2025 19:57:07 +0000 (21:57 +0200)]
net: phy: annotate linkmode initializers as not used after init phase

Code and data used only from phy_init() can be annotated accordingly.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/5fb9c41b-bf44-4915-a3c3-f20952fce6de@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agonet: phy: stop exporting phy_driver_unregister
Heiner Kallweit [Sat, 27 Sep 2025 19:52:30 +0000 (21:52 +0200)]
net: phy: stop exporting phy_driver_unregister

After 42e2a9e11a1d ("net: phy: dp83640: improve phydev and driver
removal handling") we can also stop exporting phy_driver_unregister().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/2bab950e-4b70-4030-b997-03f48379586f@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agopage_pool: Clamp pool size to max 16K pages
Dragos Tatulea [Fri, 26 Sep 2025 13:16:05 +0000 (16:16 +0300)]
page_pool: Clamp pool size to max 16K pages

page_pool_init() returns E2BIG when the page_pool size goes above 32K
pages. As some drivers are configuring the page_pool size according to
the MTU and ring size, there are cases where this limit is exceeded and
the queue creation fails.

The page_pool size doesn't have to cover a full queue, especially for
larger ring sizes. So clamp the size instead of returning an error. Do
this in the core to avoid having each driver do the clamping.

The current limit was deemed too high [1], so it was reduced to 16K to
avoid page waste.

[1] https://lore.kernel.org/all/1758532715-820422-3-git-send-email-tariqt@nvidia.com/
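A minimal sketch of clamping instead of failing. The 16K-page cap comes from the commit message; `pages_for_ring()` is a hypothetical stand-in for how a driver might size its pool from ring size and MTU, and `page_pool_clamp_size()` is an illustrative helper, not the page_pool API.

```c
/* Cap taken from the commit message: 16K pages. */
#define PAGE_POOL_MAX_PAGES 16384u

/* Hypothetical driver-side sizing: one or more pages per ring entry,
 * depending on how many pages a full-MTU packet needs. */
static unsigned int pages_for_ring(unsigned int ring_size, unsigned int mtu,
				   unsigned int page_size)
{
	unsigned int pages_per_pkt = (mtu + page_size - 1) / page_size;

	return ring_size * pages_per_pkt;
}

/* Core-side policy: rather than rejecting an oversized request with
 * -E2BIG, silently clamp it to the cap. */
static unsigned int page_pool_clamp_size(unsigned int requested)
{
	return requested > PAGE_POOL_MAX_PAGES ? PAGE_POOL_MAX_PAGES : requested;
}
```

For example, an 8192-entry ring with a 9000-byte MTU and 4 KiB pages asks for 24576 pages; with clamping in the core the queue is still created, with a 16384-page pool.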

Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20250926131605.2276734-2-dtatulea@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agotipc: adjust tipc_nodeid2string() to return string length
Dmitry Antipov [Fri, 26 Sep 2025 07:41:13 +0000 (10:41 +0300)]
tipc: adjust tipc_nodeid2string() to return string length

Since the value returned by 'tipc_nodeid2string()' is not used, the
function may be adjusted to return the length of the result, which
makes it possible to drop a few calls to 'strlen()' in
'tipc_link_create()' and 'tipc_link_bc_create()'. Compile tested only.
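A minimal sketch of a formatter that returns the length it produced, so callers can skip `strlen()`. The hex encoding, `NODE_ID_LEN` and `nodeid2string()` are hypothetical and do not reproduce TIPC's actual node ID formatting; only the "return the length" pattern is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NODE_ID_LEN 16
#define NODE_ID_STR_LEN (NODE_ID_LEN * 2 + 1)

/* Hypothetical formatter: renders the node ID as lowercase hex and returns
 * the number of characters written (excluding the terminating NUL). */
static size_t nodeid2string(char out[NODE_ID_STR_LEN],
			    const uint8_t id[NODE_ID_LEN])
{
	size_t len = 0;

	for (size_t i = 0; i < NODE_ID_LEN; i++)
		len += (size_t)snprintf(out + len, NODE_ID_STR_LEN - len,
					"%02x", id[i]);
	return len;
}
```

A caller that builds a name such as a link name can then use the returned length directly for its `memcpy()` or offset arithmetic instead of scanning the string again.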

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250926074113.914399-1-dmantipov@yandex.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agonet: enetc: initialize SW PIR and CIR based HW PIR and CIR values
Wei Fang [Fri, 26 Sep 2025 01:39:53 +0000 (09:39 +0800)]
net: enetc: initialize SW PIR and CIR based HW PIR and CIR values

Software can only initialize the PIR and CIR of the command BD ring after
an FLR, and these two registers can only be set to 0. But the reset values
of these two registers are already 0, so software does not need to update
them. If no FLR has occurred and PIR and CIR are not 0, resetting them to
0 or other values by software will cause the command BD ring to work
abnormally. This is because an internal context in the ring prefetch
logic retains the state from the first incarnation of the ring and
continues prefetching from the stale location when the ring is
reinitialized. The internal context can only be reset by an FLR.

In addition, there is a logic error in the implementation: next_to_clean
indicates the software CIR and next_to_use indicates the software PIR,
but the current driver uses next_to_clean to set PIR and next_to_use to
set CIR. This does not cause a problem in practice, because the command
BD ring is currently only initialized after an FLR, and the initial
values of next_to_use and next_to_clean are both 0.

Therefore, this patch removes the initialization of PIR and CIR. Instead,
next_to_use and next_to_clean are initialized by reading the values of
PIR and CIR.
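A minimal sketch of the new initialization direction: software adopts the hardware indices instead of writing them. The register block is simulated with a plain struct (a real driver would use MMIO reads), and the struct names are hypothetical. Reading PIR as the producer index and CIR as the consumer index is an assumption consistent with the next_to_use/next_to_clean mapping stated above.

```c
#include <stdint.h>

/* Simulated command BD ring registers (hypothetical layout). */
struct cbdr_regs {
	uint32_t pir; /* producer index register */
	uint32_t cir; /* consumer index register */
};

/* Software-side ring state. */
struct cbdr {
	uint32_t next_to_use;   /* software PIR */
	uint32_t next_to_clean; /* software CIR */
};

/* Never write PIR/CIR here (only an FLR can safely reset the prefetch
 * context); instead, read them and start from wherever hardware is. */
static void cbdr_init_sw_indices(struct cbdr *ring,
				 const struct cbdr_regs *regs)
{
	ring->next_to_use = regs->pir;   /* PIR -> next_to_use */
	ring->next_to_clean = regs->cir; /* CIR -> next_to_clean */
}
```

Note that the mapping goes PIR to next_to_use and CIR to next_to_clean, which is the reverse of the swapped assignment the commit message describes as a latent bug.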

Fixes: 4701073c3deb ("net: enetc: add initial netc-lib driver to support NTMP")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20250926013954.2003456-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agonet: nfc: nci: Add parameter validation for packet data
Deepak Sharma [Thu, 25 Sep 2025 13:28:46 +0000 (18:58 +0530)]
net: nfc: nci: Add parameter validation for packet data

Syzbot reported an uninitialized value bug in nci_init_req, found on
commit 5aca7966d2a7 ("Merge tag
'perf-tools-fixes-for-v6.17-2025-09-16' of
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools").

This bug arises from the very limited input validation done in
nci_valid_size(). That function only checks skb->len (which directly
reflects the size provided at the userspace interface) against the
length given in the buffer itself (interpreted as the NCI header).
The handlers then process the memory at that address assuming the
layout that the opcode requires, which leads to reads of
`skb->data` contents that were never written.

Following the same silent drop of invalid-size packets done in
`nci_valid_size()`, add validation of the data in the respective
handlers and return error values on failure. Release the skb in
`nci_ntf_packet` if a handler returns an error, effectively performing
a silent drop.

Possible TODO: because we silently drop the packets, the call to
`nci_request` will keep waiting for the request to complete and will
time out. These timeouts can be logged excessively in dmesg. Handling
them properly may require exporting `nci_request_cancel` (or
propagating errors from the ntf packet handlers).

Reported-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=740e04c2a93467a0f8c8
Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Tested-by: syzbot+740e04c2a93467a0f8c8@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Deepak Sharma <deepak.sharma.472935@gmail.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250925132846.213425-1-deepak.sharma.472935@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months ago6pack: drop redundant locking and refcounting
Qingfang Deng [Thu, 25 Sep 2025 05:10:59 +0000 (13:10 +0800)]
6pack: drop redundant locking and refcounting

The TTY layer already serializes line discipline operations with
tty->ldisc_sem, so the extra disc_data_lock and refcnt in 6pack
are unnecessary.

Removing them simplifies the code and also resolves a lockdep warning
reported by syzbot. The warning did not indicate a real deadlock, since
the write-side lock was only taken in process context with hardirqs
disabled.

Reported-by: syzbot+5fd749c74105b0e1b302@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68c858b0.050a0220.3c6139.0d1c.GAE@google.com/
Signed-off-by: Qingfang Deng <dqfext@gmail.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/20250925051059.26876-1-dqfext@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agoselftests: bonding: add ipsec offload test
Hangbin Liu [Thu, 25 Sep 2025 02:33:04 +0000 (02:33 +0000)]
selftests: bonding: add ipsec offload test

This introduces a test for IPSec offload over bonding, utilizing netdevsim
for the testing process, as veth interfaces do not support IPSec offload.
The test will ensure that the IPSec offload functionality remains operational
even after a failover event occurs in the bonding configuration.

Here is the test result:

TEST: bond_ipsec_offload (active_slave eth0)                        [ OK ]
TEST: bond_ipsec_offload (active_slave eth1)                        [ OK ]

Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250925023304.472186-2-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agobonding: fix xfrm offload feature setup on active-backup mode
Hangbin Liu [Thu, 25 Sep 2025 02:33:03 +0000 (02:33 +0000)]
bonding: fix xfrm offload feature setup on active-backup mode

The active-backup bonding mode supports XFRM ESP offload. However, when
a bond is added using a command like `ip link add bond0 type bond mode 1
miimon 100`, the `ethtool -k` command shows that XFRM ESP offload is
disabled. This occurs because bond_newlink() changes the bond link
parameters first and registers the bond device later. As a result, the
XFRM feature update in bond_option_mode_set() is not called, because the
bond device is not yet registered, and the offload feature is never set.

To resolve this issue, reorder the code in bond_newlink() so that the
bond device is registered before the bond link parameters are changed.
This allows the XFRM ESP offload feature to be enabled correctly.

Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://patch.msgid.link/20250925023304.472186-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agoDocumentation: net: add flow control guide and document ethtool API
Oleksij Rempel [Wed, 24 Sep 2025 12:02:41 +0000 (14:02 +0200)]
Documentation: net: add flow control guide and document ethtool API

Introduce a new document, flow_control.rst, to provide a comprehensive
guide on Ethernet Flow Control in Linux. The guide explains how flow
control works, how autonegotiation resolves pause capabilities, and how
to configure it using ethtool and Netlink.

In parallel, document the pause and pause-stat attributes in the
ethtool.yaml netlink spec. This enables the ynl tool to generate
kernel-doc comments for the corresponding enums in the UAPI header,
making the C interface self-documenting.

Finally, replace the legacy flow control section in phy.rst with a
reference to the new document and add pointers in the relevant C source
files.
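One step the guide covers is how autonegotiation resolves pause capabilities. As an illustration, here is a sketch of the standard symmetric/asymmetric pause resolution (IEEE 802.3 Annex 28B), following the logic of Linux's `linkmode_resolve_pause()`. The advertisement is reduced to two booleans via a hypothetical `struct pause_adv`; this is not the kernel's linkmode bitmap API.

```c
#include <stdbool.h>

/* Pause-related bits a link partner may advertise (hypothetical type). */
struct pause_adv {
	bool pause;      /* symmetric Pause */
	bool asym_pause; /* Asym_Pause */
};

/* tx_pause: we may send PAUSE frames; rx_pause: we act on received ones. */
static void resolve_pause(struct pause_adv local, struct pause_adv partner,
			  bool *tx_pause, bool *rx_pause)
{
	if (local.pause && partner.pause) {
		/* Both sides advertise symmetric pause. */
		*tx_pause = true;
		*rx_pause = true;
	} else if (local.asym_pause && partner.asym_pause) {
		/* Asymmetric: send only if the partner honours PAUSE,
		 * honour PAUSE only if we advertised it ourselves. */
		*tx_pause = partner.pause;
		*rx_pause = local.pause;
	} else {
		*tx_pause = false;
		*rx_pause = false;
	}
}
```

For instance, a local side advertising Pause+Asym_Pause against a partner advertising only Asym_Pause resolves to receive-only flow control on the local side.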

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Link: https://patch.msgid.link/20250924120241.724850-1-o.rempel@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2 months agoMerge branch 'dpll-add-phase-offset-averaging-factor'
Jakub Kicinski [Tue, 30 Sep 2025 01:57:43 +0000 (18:57 -0700)]
Merge branch 'dpll-add-phase-offset-averaging-factor'

Ivan Vecera says:

====================
dpll: add phase offset averaging factor

For some hardware, the phase shift may result from averaging previous values
and the newly measured value. In this case, the averaging is controlled by
a configurable averaging factor.

Add a new device-level attribute phase-offset-avg-factor and the
corresponding callbacks, and implement them in the zl3073x driver.
====================

Link: https://patch.msgid.link/20250927084912.2343597-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodpll: zl3073x: Allow to configure phase offset averaging factor
Ivan Vecera [Sat, 27 Sep 2025 08:49:12 +0000 (10:49 +0200)]
dpll: zl3073x: Allow to configure phase offset averaging factor

The DPLL phase measurement block uses an exponential moving average with
a configurable averaging factor. Measurements are taken at approximately
40 Hz or at the reference frequency, whichever is lower.

Currently, factor=2 is used to prioritize fast response for dynamic
phase changes. For applications needing a stable, precise average phase
offset where rapid changes are unlikely, a higher factor is recommended.

Implement the .phase_offset_avg_factor_get/set callbacks to allow a user
to adjust this factor.
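A minimal sketch of an exponential moving average whose responsiveness is set by a factor. The exact weighting is an assumption: here each new sample contributes 1/factor of the difference, which may not match how the zl3073x hardware interprets its register value. `struct phase_ema` and `phase_ema_update()` are hypothetical names.

```c
/* Hypothetical EMA state; factor must be >= 1. */
struct phase_ema {
	double avg;
	unsigned int factor;
};

/* avg += (sample - avg) / factor: a small factor tracks changes quickly,
 * a large factor smooths noise but reacts slowly. */
static double phase_ema_update(struct phase_ema *ema, double sample)
{
	ema->avg += (sample - ema->avg) / ema->factor;
	return ema->avg;
}
```

With factor 2, a step from 0 to 100 is reported as 50 and then 75; with factor 8 the first update only reaches 12.5, which illustrates the trade-off between fast response and a stable average described above.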

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250927084912.2343597-4-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodpll: add phase_offset_avg_factor_get/set callback ops
Ivan Vecera [Sat, 27 Sep 2025 08:49:11 +0000 (10:49 +0200)]
dpll: add phase_offset_avg_factor_get/set callback ops

Add new callback operations for a dpll device:
- phase_offset_avg_factor_get(...) - to obtain current phase offset
  averaging factor from dpll device,
- phase_offset_avg_factor_set(...) - to set phase offset averaging factor

Obtain the factor value using the get callback and provide it to the user
if the device driver implements this callback. Execute the set callback upon
user request, if the driver implements it.
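A minimal sketch of optional get/set callbacks in an ops table. `struct fake_dpll_ops` is a trimmed, hypothetical stand-in (the real `struct dpll_device_ops` has more members and different signatures), and returning `-EOPNOTSUPP` when `set` is missing is an assumption of this sketch. It mirrors the v2 notes: reporting needs only `get`, and `set` is always invoked on request.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down ops table. */
struct fake_dpll_ops {
	int (*phase_offset_avg_factor_get)(void *priv, uint32_t *factor);
	int (*phase_offset_avg_factor_set)(void *priv, uint32_t factor);
};

/* Report the attribute only if the driver implements get. */
static int fill_avg_factor(const struct fake_dpll_ops *ops, void *priv,
			   bool *present, uint32_t *factor)
{
	*present = false;
	if (!ops->phase_offset_avg_factor_get)
		return 0; /* attribute simply absent */
	int err = ops->phase_offset_avg_factor_get(priv, factor);
	if (err)
		return err;
	*present = true;
	return 0;
}

/* Always call set on request, without comparing to the current value. */
static int set_avg_factor(const struct fake_dpll_ops *ops, void *priv,
			  uint32_t factor)
{
	if (!ops->phase_offset_avg_factor_set)
		return -EOPNOTSUPP;
	return ops->phase_offset_avg_factor_set(priv, factor);
}

/* Example driver callbacks keeping the factor in priv (a uint32_t *). */
static int demo_get(void *priv, uint32_t *factor)
{
	*factor = *(uint32_t *)priv;
	return 0;
}

static int demo_set(void *priv, uint32_t factor)
{
	*(uint32_t *)priv = factor;
	return 0;
}
```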

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
v2:
* do not require 'set' callback to retrieve current value
* always call 'set' callback regardless of current value
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250927084912.2343597-3-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodpll: add phase-offset-avg-factor device attribute to netlink spec
Ivan Vecera [Sat, 27 Sep 2025 08:49:10 +0000 (10:49 +0200)]
dpll: add phase-offset-avg-factor device attribute to netlink spec

Add dpll device level attribute DPLL_A_PHASE_OFFSET_AVG_FACTOR to allow
control over how the reported phase offset value is calculated. The
attribute is present if the driver provides such a capability; otherwise
it shall not be present.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20250927084912.2343597-2-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'mlx5-misc-fixes-2025-09-28'
Jakub Kicinski [Tue, 30 Sep 2025 01:50:51 +0000 (18:50 -0700)]
Merge branch 'mlx5-misc-fixes-2025-09-28'

Tariq Toukan says:

====================
mlx5 misc fixes 2025-09-28

misc bug fixes from the team to the mlx5 core driver.
====================

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet/mlx5: fw reset, add reset timeout work
Moshe Shemesh [Sun, 28 Sep 2025 21:02:09 +0000 (00:02 +0300)]
net/mlx5: fw reset, add reset timeout work

Add a sync reset timeout to stop poll_sync_reset if no reset-done or
abort event arrives within the timeout. Otherwise, poll_sync_reset just
continues, and in case of a firmware fatal error no health reporting is
done.

Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet/mlx5: pagealloc: Fix reclaim race during command interface teardown
Shay Drory [Sun, 28 Sep 2025 21:02:08 +0000 (00:02 +0300)]
net/mlx5: pagealloc: Fix reclaim race during command interface teardown

The reclaim_pages_cmd() function sends a command to the firmware to
reclaim pages if the command interface is active.

A race condition can occur if the command interface goes down (e.g., due
to a PCI error) while the mlx5_cmd_do() call is in flight. In this
case, mlx5_cmd_do() will return an error. The original code would
propagate this error immediately, bypassing the software-based page
reclamation logic that is supposed to run when the command interface is
down.

Fix this by checking whether mlx5_cmd_do() returns -ENXIO, which marks
that the command interface is down. If so, fall through to the software
reclamation path. If the command failed for any other reason, or
finished successfully, return as before.
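A minimal sketch of the control flow described above. `fake_cmd_do()` and `sw_reclaim_pages()` are hypothetical stand-ins, not the mlx5 API; the firmware command just returns whatever result the caller wants to simulate. The only behaviour taken from the commit message is that -ENXIO falls through to software reclaim while every other result is returned unchanged.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical firmware command: returns the simulated result. */
static int fake_cmd_do(int simulated_result)
{
	return simulated_result;
}

/* Hypothetical software fallback that reclaims pages locally. */
static int sw_reclaim_pages(int *sw_reclaimed)
{
	*sw_reclaimed = 1;
	return 0;
}

static int reclaim_pages(bool cmd_if_up, int simulated_result,
			 int *sw_reclaimed)
{
	int err;

	*sw_reclaimed = 0;
	if (!cmd_if_up) /* interface already down: go straight to SW path */
		return sw_reclaim_pages(sw_reclaimed);

	err = fake_cmd_do(simulated_result);
	/* Interface went down while the command was in flight. */
	if (err == -ENXIO)
		return sw_reclaim_pages(sw_reclaimed);

	/* Success, or any other failure: return as before. */
	return err;
}
```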

Fixes: b898ce7bccf1 ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet/mlx5: Stop polling for command response if interface goes down
Moshe Shemesh [Sun, 28 Sep 2025 21:02:07 +0000 (00:02 +0300)]
net/mlx5: Stop polling for command response if interface goes down

Stop polling for the firmware response to a command in polling mode if
the command interface goes down. This situation can occur, for example,
if a firmware fatal error is detected during polling.

This change halts the polling process when the command interface goes
down, preventing unnecessary waits.
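A minimal sketch of a bounded polling loop that also bails out when the interface goes down. `struct fake_cmd` and `poll_cmd_response()` are hypothetical, and using -ENXIO for "interface down" is an assumption borrowed from the neighbouring reclaim fix, not a claim about this patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command-slot state. */
struct fake_cmd {
	volatile bool done;     /* set when the firmware response lands */
	volatile bool iface_up; /* cleared on e.g. a firmware fatal error */
};

/* Poll up to max_polls times; stop early if the interface goes down
 * rather than waiting out the full timeout. */
static int poll_cmd_response(struct fake_cmd *cmd, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (cmd->done)
			return 0;
		if (!cmd->iface_up)
			return -ENXIO;
		/* a real driver would sleep or yield here */
	}
	return -ETIMEDOUT;
}
```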

Fixes: b898ce7bccf1 ("net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Shay Drori <shayd@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge tag 'mlx5-next-lag' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
Jakub Kicinski [Tue, 30 Sep 2025 01:49:58 +0000 (18:49 -0700)]
Merge tag 'mlx5-next-lag' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Tariq Toukan says:

====================
mlx5-next updates 2025-09-28

* tag 'mlx5-next-lag' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: IFC add balance ID and LAG per MP group bits
  net/mlx5: Add IFC bit for TIR/SQ order capability
====================

Link: https://patch.msgid.link/1759093989-841873-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: dlink: handle copy_thresh allocation failure
Yeounsu Moon [Sun, 28 Sep 2025 19:01:24 +0000 (04:01 +0900)]
net: dlink: handle copy_thresh allocation failure

The driver did not handle failure of `netdev_alloc_skb_ip_align()`.
If the allocation failed, dereferencing `skb->protocol` could lead to
a NULL pointer dereference.

With this patch, the driver still tries to allocate `skb`, but if the
allocation fails, it falls back to the normal (non-copy) receive path.
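A minimal userspace sketch of copy-break receive with an allocation-failure fallback. `rx_pick_buffer()` and `fail_alloc()` are hypothetical; the allocator is passed in only so the failure path can be exercised. Small packets are copied into a fresh buffer; if that allocation fails, the ring buffer itself is handed up instead of dereferencing NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the buffer to pass up the stack. *copied tells the caller
 * whether the ring buffer was consumed (false) or can be reused (true). */
static unsigned char *rx_pick_buffer(unsigned char *ring_data, size_t pkt_len,
				     size_t copy_thresh,
				     void *(*alloc)(size_t), bool *copied)
{
	*copied = false;
	if (pkt_len <= copy_thresh) {
		unsigned char *copy = alloc(pkt_len);

		if (copy) { /* the missing NULL check the fix adds */
			memcpy(copy, ring_data, pkt_len);
			*copied = true;
			return copy;
		}
		/* Allocation failed: fall through to the normal path. */
	}
	return ring_data; /* normal path: hand up the ring buffer itself */
}

/* Allocator that always fails, to exercise the fallback. */
static void *fail_alloc(size_t n)
{
	(void)n;
	return NULL;
}
```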

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Tested-on: D-Link DGE-550T Rev-A3
Signed-off-by: Yeounsu Moon <yyyynoom@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250928190124.1156-1-yyyynoom@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: remove stmmac_hw_setup() excess documentation parameter
Russell King (Oracle) [Mon, 29 Sep 2025 07:43:55 +0000 (08:43 +0100)]
net: stmmac: remove stmmac_hw_setup() excess documentation parameter

The kernel build bot reports:

Warning: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:3438 Excess function parameter 'ptp_register' description in 'stmmac_hw_setup'

Fix it.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 98d8ea566b85 ("net: stmmac: move timestamping/ptp init to stmmac_hw_setup() caller")
Closes: https://lore.kernel.org/oe-kbuild-all/202509290927.svDd6xuw-lkp@intel.com/
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1v38Y7-00000008UCQ-3w27@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'selftest-packetdrill-import-tfo-server-tests'
Jakub Kicinski [Tue, 30 Sep 2025 01:41:39 +0000 (18:41 -0700)]
Merge branch 'selftest-packetdrill-import-tfo-server-tests'

Kuniyuki Iwashima says:

====================
selftest: packetdrill: Import TFO server tests.

The series imports 15 TFO server tests from google/packetdrill and
adds 2 more tests.

The repository has two versions of tests for most scenarios; one uses
the non-experimental option (34), and the other uses the experimental
option (254) with 0xF989.

Basically, we only import the non-experimental version of tests, and
for the experimental option, tcp_fastopen_server_experimental_option.pkt
is added.

The following tests are not (yet) imported:

  * icmp-baseline.pkt
  * simple1.pkt / simple2.pkt / simple3.pkt

The former is completely covered by icmp-before-accept.pkt.

The latter's delta is the src/dst IP pair used to generate a different
cookie, but supporting dual-stack requires churn in ksft_runner.sh,
so it is deferred to a future series.  Also, sockopt-fastopen-key.pkt covers
the same function.

The following tests have only the experimental version, so they are
converted to the non-experimental option:

  * client-ack-dropped-then-recovery-ms-timestamps.pkt
  * sockopt-fastopen-key.pkt

For the imported tests, these common changes are applied:

  * Add SPDX header
  * Adjust path to default.sh
  * Adjust sysctl w/ set_sysctls.py
  * Use TFO_COOKIE instead of a raw hex value
  * Use SOCK_NONBLOCK for socket() so that accept() does not block
  * Add assertions for TCP state where it was only given in comments
  * Remove unnecessary delay (e.g. +0.1 setsockopt(SO_REUSEADDR), etc)

With this series, except for simple{1,2,3}.pkt, we can remove TFO server
tests in google/packetdrill.
====================

Link: https://patch.msgid.link/20250927213022.1850048-1-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Import client-ack-dropped-then-recovery-ms-timestamps.pkt
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:51 +0000 (21:29 +0000)]
selftest: packetdrill: Import client-ack-dropped-then-recovery-ms-timestamps.pkt

This test also has no non-experimental version, so it is converted to FO.

The comment in .pkt explains the detailed scenario.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-14-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Import sockopt-fastopen-key.pkt
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:50 +0000 (21:29 +0000)]
selftest: packetdrill: Import sockopt-fastopen-key.pkt

sockopt-fastopen-key.pkt does not have the non-experimental
version, so the experimental version is converted, FOEXP -> FO.

The test sets net.ipv4.tcp_fastopen_key=0-0-0-0 and instead
sets another key via setsockopt(TCP_FASTOPEN_KEY).

The first listener generates a valid cookie in response to a TFO
option without a cookie, and the second listener creates a TFO socket
using the valid cookie.

TCP_FASTOPEN_KEY is adjusted to use the common key in default.sh
so that we can use TFO_COOKIE and support dualstack.  Similarly,
TFO_COOKIE_ZERO for the 0-0-0-0 key is defined.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-13-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Refine tcp_fastopen_server_reset-after-disconnect.pkt.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:49 +0000 (21:29 +0000)]
selftest: packetdrill: Refine tcp_fastopen_server_reset-after-disconnect.pkt.

The following changes are applied to match the imported packetdrill tests:

  * Call setsockopt(TCP_FASTOPEN)
  * Remove unnecessary accept() delay
  * Add assertion for TCP states
  * Rename to tcp_fastopen_server_trigger-rst-reconnect.pkt.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-12-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Import opt34/*-trigger-rst.pkt.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:48 +0000 (21:29 +0000)]
selftest: packetdrill: Import opt34/*-trigger-rst.pkt.

This imports the non-experimental version of opt34/*-trigger-rst.pkt.

                                     | accept() | SYN data |
  -----------------------------------+----------+----------+
  listener-closed-trigger-rst.pkt    |    no    |  unread  |
  unread-data-closed-trigger-rst.pkt |   yes    |  unread  |

Both files test that close()ing a SYN_RECV socket with unread SYN data
triggers RST.

The files are renamed to have the common prefix, trigger-rst.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-11-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Import opt34/reset-* tests.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:47 +0000 (21:29 +0000)]
selftest: packetdrill: Import opt34/reset-* tests.

This imports the non-experimental version of opt34/reset-*.pkt.

                                   |  Child  |              RST              | sk_err  |
  ---------------------------------+---------+-------------------------------+---------+
  reset-after-accept.pkt           |   TFO   |   after accept(), SYN_RECV    |  read() |
  reset-close-with-unread-data.pkt |   TFO   |   after accept(), SYN_RECV    | write() |
  reset-before-accept.pkt          |   TFO   |  before accept(), SYN_RECV    |  read() |
  reset-non-tfo-socket.pkt         | non-TFO |  before accept(), ESTABLISHED | write() |

The first 3 files test scenarios where a SYN_RECV socket receives RST
before/after accept() and the data in the SYN must be read() without
error, but the following read() or first write() will return ECONNRESET.

The last test is similar but with non-TFO socket.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-10-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Import opt34/icmp-before-accept.pkt.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:46 +0000 (21:29 +0000)]
selftest: packetdrill: Import opt34/icmp-before-accept.pkt.

This imports the non-experimental version of icmp-before-accept.pkt.

This file tests the scenario where an ICMP unreachable packet for a
not-yet-accept()ed socket changes its state to TCP_CLOSE, but the
SYN data must be read without error, and the following read() returns
EHOSTUNREACH.

Note that this test supports only IPv4, as it uses ICMP.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-9-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Import opt34/fin-close-socket.pkt.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:45 +0000 (21:29 +0000)]
selftest: packetdrill: Import opt34/fin-close-socket.pkt.

This imports the non-experimental version of fin-close-socket.pkt.

This file tests the scenario where a TFO child socket's state
transitions from SYN_RECV to CLOSE_WAIT before it is accept()ed.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-8-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Add test for experimental option.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:44 +0000 (21:29 +0000)]
selftest: packetdrill: Add test for experimental option.

The only difference between non-experimental and experimental TFO
option handling is SYN+ACK generation.

When tcp_parse_fastopen_option() parses a TFO option, it sets
tcp_fastopen_cookie.exp to false if the option number is 34,
and true if 254.

The value is carried to tcp_options_write() to generate a TFO option
with the same option number.

Other than that, all the TFO handling is the same and the kernel must
generate the same cookie regardless of the option number.

Let's add a test for the handling so that we can consolidate
fastopen/server/ tests and fastopen/server/opt34 tests.
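A minimal sketch of "parse either encoding, reply in the same one". The option kinds and magic (34 from RFC 7413; 254 and 0xF989 as `TCPOPT_EXP` and `TCPOPT_FASTOPEN_MAGIC` in Linux's include/net/tcp.h) are real; `struct tfo_req`, `tfo_parse()` and `tfo_write()` are hypothetical and are not the kernel's tcp_parse_fastopen_option() or tcp_options_write(). Cookie generation is not modeled: the server cookie is passed in, and it is the same regardless of the encoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCPOPT_FASTOPEN       34     /* RFC 7413 option kind */
#define TCPOPT_EXP            254    /* experimental option kind */
#define TCPOPT_FASTOPEN_MAGIC 0xF989 /* ExID marking experimental TFO */
#define TFO_COOKIE_MAX        16

struct tfo_req {
	uint8_t cookie_len;
	uint8_t cookie[TFO_COOKIE_MAX];
	bool exp; /* peer used the experimental encoding */
};

/* Parse one option starting at its kind byte. */
static bool tfo_parse(const uint8_t *opt, struct tfo_req *req)
{
	uint8_t kind = opt[0], olen = opt[1];
	size_t hdr;

	if (kind == TCPOPT_EXP) {
		if (olen < 4 || ((opt[2] << 8) | opt[3]) != TCPOPT_FASTOPEN_MAGIC)
			return false;
		hdr = 4;
		req->exp = true;
	} else if (kind == TCPOPT_FASTOPEN) {
		if (olen < 2)
			return false;
		hdr = 2;
		req->exp = false;
	} else {
		return false;
	}
	if (olen - hdr > TFO_COOKIE_MAX)
		return false;
	req->cookie_len = (uint8_t)(olen - hdr);
	memcpy(req->cookie, opt + hdr, req->cookie_len);
	return true;
}

/* Write the SYN+ACK TFO option with our cookie, mirroring the client's
 * encoding. Returns the number of bytes written. */
static size_t tfo_write(uint8_t *out, const uint8_t *cookie, uint8_t len,
			bool exp)
{
	size_t i = 0;

	if (exp) {
		out[i++] = TCPOPT_EXP;
		out[i++] = (uint8_t)(4 + len);
		out[i++] = TCPOPT_FASTOPEN_MAGIC >> 8;
		out[i++] = TCPOPT_FASTOPEN_MAGIC & 0xff;
	} else {
		out[i++] = TCPOPT_FASTOPEN;
		out[i++] = (uint8_t)(2 + len);
	}
	memcpy(out + i, cookie, len);
	return i + len;
}
```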

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-7-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Add test for TFO_SERVER_WO_SOCKOPT1.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:43 +0000 (21:29 +0000)]
selftest: packetdrill: Add test for TFO_SERVER_WO_SOCKOPT1.

TFO_SERVER_WO_SOCKOPT1 is no longer enabled by default, and
each server test requires setsockopt(TCP_FASTOPEN).

Let's add a basic test for TFO_SERVER_WO_SOCKOPT1.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-6-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Import TFO server basic tests.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:42 +0000 (21:29 +0000)]
selftest: packetdrill: Import TFO server basic tests.

This imports basic TFO server tests from google/packetdrill.

The repository has two versions of tests for most scenarios; one uses
the non-experimental option (34), and the other uses the experimental
option (254) with 0xF989.

This only imports the following tests of the non-experimental version
placed in [0].  I will add a specific test for the experimental option
handling later.

                             | TFO | Cookie | Payload |
  ---------------------------+-----+--------+---------+
  basic-rw.pkt               | yes |  yes   |   yes   |
  basic-zero-payload.pkt     | yes |  yes   |    no   |
  basic-cookie-not-reqd.pkt  | yes |   no   |   yes   |
  basic-non-tfo-listener.pkt |  no |  yes   |   yes   |
  pure-syn-data.pkt          | yes |   no   |   yes   |

The original pure-syn-data.pkt was missing setsockopt(TCP_FASTOPEN) and so
unintentionally did not test the TFO server in some scenarios; setsockopt()
is added where needed.  In addition, the non-TFO scenario is stripped as
it is covered by basic-non-tfo-listener.pkt.  Also, I added the basic- prefix.

Link: https://github.com/google/packetdrill/tree/bfc96251310f/gtests/net/tcp/fastopen/server/opt34
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-5-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Define common TCP Fast Open cookie.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:41 +0000 (21:29 +0000)]
selftest: packetdrill: Define common TCP Fast Open cookie.

TCP Fast Open cookie is generated in __tcp_fastopen_cookie_gen_cipher().

The cookie value is generated from src/dst IPs and a key configured by
setsockopt(TCP_FASTOPEN_KEY) or net.ipv4.tcp_fastopen_key.

The default.sh sets net.ipv4.tcp_fastopen_key, and the original packetdrill
defines the corresponding cookie as TFO_COOKIE in run_all.py. [0]

This way, tests do not need to care about the value, and we can easily
update TFO_COOKIE if __tcp_fastopen_cookie_gen_cipher() changes the
algorithm.

However, some tests use the bare hex value for specific IPv4 addresses
and do not support IPv6.

Let's define the same TFO_COOKIE in ksft_runner.sh.

We will replace such bare hex values with TFO_COOKIE except for a single
test for setsockopt(TCP_FASTOPEN_KEY).

Link: https://github.com/google/packetdrill/blob/7230b3990f94/gtests/net/packetdrill/run_all.py#L65
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-4-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Require explicit setsockopt(TCP_FASTOPEN).
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:40 +0000 (21:29 +0000)]
selftest: packetdrill: Require explicit setsockopt(TCP_FASTOPEN).

To enable TCP Fast Open on a server, net.ipv4.tcp_fastopen must
have 0x2 (TFO_SERVER_ENABLE), and we need to do either

  1. Call setsockopt(TCP_FASTOPEN) for the socket
  2. Set 0x400 (TFO_SERVER_WO_SOCKOPT1) additionally to net.ipv4.tcp_fastopen

The default.sh sets 0x70403 so that each test does not need setsockopt().
(0x1 is TFO_CLIENT_ENABLE, and 0x70000 is ...???)

However, some tests overwrite net.ipv4.tcp_fastopen without
TFO_SERVER_WO_SOCKOPT1 and forget to call setsockopt(TCP_FASTOPEN).

For example, pure-syn-data.pkt [0] tests non-TFO servers unintentionally,
except in the first scenario.

To prevent such an accident, let's require explicit setsockopt().

TFO_CLIENT_ENABLE is necessary for
tcp_syscall_bad_arg_fastopen-invalid-buf-ptr.pkt.

Link: https://github.com/google/packetdrill/blob/bfc96251310f/gtests/net/tcp/fastopen/server/opt34/pure-syn-data.pkt
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-3-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftest: packetdrill: Set ktap_set_plan properly for single protocol test.
Kuniyuki Iwashima [Sat, 27 Sep 2025 21:29:39 +0000 (21:29 +0000)]
selftest: packetdrill: Set ktap_set_plan properly for single protocol test.

The cited commit forgot to update the ktap_set_plan call.

ktap_set_plan sets the number of tests (KSFT_NUM_TESTS), which must
match the number of executed tests (KTAP_CNT_PASS + KTAP_CNT_SKIP +
KTAP_CNT_XFAIL) in ktap_finished.

Otherwise, the selftest exit()s with 1.

Let's adjust KSFT_NUM_TESTS based on supported protocols.

While at it, fix up the misalignment.

Fixes: a5c10aa3d1ba ("selftests/net: packetdrill: Support single protocol test.")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20250927213022.1850048-2-kuniyu@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ena: return 0 in ena_get_rxfh_key_size() when RSS hash key is not configurable
Kohei Enju [Mon, 29 Sep 2025 05:02:22 +0000 (14:02 +0900)]
net: ena: return 0 in ena_get_rxfh_key_size() when RSS hash key is not configurable

In EC2 instances where the RSS hash key is not configurable, ethtool
shows a bogus RSS hash key since ena_get_rxfh_key_size() unconditionally
returns ENA_HASH_KEY_SIZE.

Commit 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not
supported") added proper handling for devices that don't support RSS
hash key configuration, but ena_get_rxfh_key_size() was left unchanged.

When the RSS hash key is not configurable, return 0 instead of
ENA_HASH_KEY_SIZE to indicate that retrieving the key is not supported.

Tested on m5 instance families.

Without patch:
 # ethtool -x ens5 | grep -A 1 "RSS hash key"
 RSS hash key:
 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00

With patch:
 # ethtool -x ens5 | grep -A 1 "RSS hash key"
 RSS hash key:
 Operation not supported

Fixes: 6a4f7dc82d1e ("net: ena: rss: do not allocate key when not supported")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Link: https://patch.msgid.link/20250929050247.51680-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonfp: fix RSS hash key size when RSS is not supported
Kohei Enju [Mon, 29 Sep 2025 05:42:15 +0000 (14:42 +0900)]
nfp: fix RSS hash key size when RSS is not supported

The nfp_net_get_rxfh_key_size() function returns -EOPNOTSUPP when
devices don't support RSS, and callers treat the negative value as a
large positive value since the return type is u32.

Return 0 when devices don't support RSS, aligning with the ethtool
interface .get_rxfh_key_size() that requires returning 0 in such cases.

Fixes: 9ff304bfaf58 ("nfp: add support for reporting CRC32 hash function")
Signed-off-by: Kohei Enju <enjuk@amazon.com>
Link: https://patch.msgid.link/20250929054230.68120-1-enjuk@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: rtnetlink: fix typo in rtnl_unregister_all() comment
Alok Tiwari [Mon, 29 Sep 2025 08:54:12 +0000 (01:54 -0700)]
net: rtnetlink: fix typo in rtnl_unregister_all() comment

Corrected "rtnl_unregster()" -> "rtnl_unregister()" in the
documentation comment of "rtnl_unregister_all()".

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250929085418.49200-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoRevert "net: group sk_backlog and sk_receive_queue"
Eric Dumazet [Mon, 29 Sep 2025 18:21:12 +0000 (18:21 +0000)]
Revert "net: group sk_backlog and sk_receive_queue"

This reverts commit 4effb335b5dab08cb6e2c38d038910f8b527cfc9.

This was a benefit for the UDP flood case, which was later greatly improved
with commits 6471658dc66c ("udp: use skb_attempt_defer_free()")
and b650bf0977d3 ("udp: remove busylock and add per NUMA queues").

Apparently the blamed commit added a regression for RAW sockets, possibly
because they do not use the dual RX queue strategy that UDP has.

sock_queue_rcv_skb_reason() and RAW recvmsg() compete for sk_receive_queue
and sk_rmem_alloc changes, and having them in the same
cache line reduces performance.

Fixes: 4effb335b5da ("net: group sk_backlog and sk_receive_queue")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509281326.f605b4eb-lkp@intel.com
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: David Ahern <dsahern@kernel.org>
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20250929182112.824154-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: Convert open-coded register polling to helper macro
Furong Xu [Sat, 27 Sep 2025 08:10:36 +0000 (16:10 +0800)]
net: stmmac: Convert open-coded register polling to helper macro

Drop the open-coded register polling routines.
Use readl_poll_timeout_atomic() in atomic context.

Also adjust the delay time to 10us, which seems more reasonable.

Tested on NXP i.MX8MP and ROCKCHIP RK3588 boards:
the break condition was met right after the first poll,
with no delay involved at all.
So the 10us delay should be long enough for most cases.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Furong Xu <0x1207@gmail.com>
Link: https://patch.msgid.link/20250927081036.10611-1-0x1207@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'mptcp-receive-path-improvement'
Jakub Kicinski [Tue, 30 Sep 2025 01:23:37 +0000 (18:23 -0700)]
Merge branch 'mptcp-receive-path-improvement'

Matthieu Baerts says:

====================
mptcp: receive path improvement

This series includes several changes to the MPTCP RX path. The main
goals are improving RX performance and increasing long-term
maintainability.

Some changes reflect recent(ish) improvements introduced in the TCP
stack: patches 1, 2 and 3 are the MPTCP counterpart of SKB deferral free
and auto-tuning improvements. Note that patch 3 could possibly fix
additional issues, and overall such a patch should prevent similar
issues from arising in the future.

Patches 4-7 prepare for socket backlog usage, which a later series will
introduce to process the packets received by the different subflows
while the msk socket is owned.

Patch 8 is not related to the RX path, but it contains additional tests
for new features recently introduced in net-next.
====================

Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-0-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoselftests: mptcp: join: validate new laminar endp
Matthieu Baerts (NGI0) [Sat, 27 Sep 2025 09:40:44 +0000 (11:40 +0200)]
selftests: mptcp: join: validate new laminar endp

Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar'
endpoint type.

In a setup where subflows created using the routing rules would be
rejected by the listener, and where the latter announces one IP address,
some cases are verified:

- Without any 'laminar' endpoints: no new subflows are created.

- With one 'laminar' endpoint: a second subflow is created.

- With multiple 'laminar' endpoints: 2 IPv4 subflows are created.

- With one 'laminar' endpoint, but the server announcing a second IP
  address, only one subflow is created.

- With one 'laminar' + 'subflow' endpoint, the same endpoint is only
  used once.

Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-8-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agomptcp: minor move_skbs_to_msk() cleanup
Paolo Abeni [Sat, 27 Sep 2025 09:40:43 +0000 (11:40 +0200)]
mptcp: minor move_skbs_to_msk() cleanup

This function is called only by __mptcp_data_ready(), which in turn
is always invoked when the msk is not owned by the user: we can drop the
redundant, related check.

Additionally, mptcp needs to propagate the socket error only for the
current subflow.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-7-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agomptcp: factor out a basic skb coalesce helper
Paolo Abeni [Sat, 27 Sep 2025 09:40:42 +0000 (11:40 +0200)]
mptcp: factor out a basic skb coalesce helper

The upcoming patch will introduce backlog processing for the MPTCP
socket, and we want to leverage coalescing in that data path.

Factor out the relevant bits, which do not touch memory accounting, to
deal with such a use case.

Co-developed-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-6-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agomptcp: remove unneeded mptcp_move_skb()
Paolo Abeni [Sat, 27 Sep 2025 09:40:41 +0000 (11:40 +0200)]
mptcp: remove unneeded mptcp_move_skb()

Since commit b7535cfed223 ("mptcp: drop legacy code around RX EOF"),
sk_shutdown can't change during the main recvmsg loop, so we can drop
the related race breaker.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-5-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agomptcp: introduce the mptcp_init_skb helper
Paolo Abeni [Sat, 27 Sep 2025 09:40:40 +0000 (11:40 +0200)]
mptcp: introduce the mptcp_init_skb helper

Factor out all the skb initialization steps into a new helper and
use it. Note that this change moves the MPTCP CB initialization
earlier: we can do this step as soon as the skb leaves the
subflow socket receive queues.

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-4-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agomptcp: rcvbuf auto-tuning improvement
Paolo Abeni [Sat, 27 Sep 2025 09:40:39 +0000 (11:40 +0200)]
mptcp: rcvbuf auto-tuning improvement

Apply to the MPTCP auto-tuning the same improvements introduced for the
TCP protocol by the merge commit 2da35e4b4df9 ("Merge branch
'tcp-receive-side-improvements'").

The main difference is that the TCP subflow and the main MPTCP socket need
to account for OoO separately: MPTCP does not care about TCP-level OoO
and vice versa; as a consequence, do not reflect the MPTCP-level rcvbuf
increase due to OoO packets at the subflow level.

This refactor additionally allows dropping the msk receive buffer update
at receive time, as the latter was only intended to cope with subflow
receive buffer increases due to OoO packets.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/487
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/559
Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-3-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotcp: make tcp_rcvbuf_grow() accessible to mptcp code
Paolo Abeni [Sat, 27 Sep 2025 09:40:38 +0000 (11:40 +0200)]
tcp: make tcp_rcvbuf_grow() accessible to mptcp code

To leverage the auto-tuning improvements brought by commit 2da35e4b4df9
("Merge branch 'tcp-receive-side-improvements'"), the MPTCP stack needs
to access the mentioned helper.

Acked-by: Geliang Tang <geliang@kernel.org>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-2-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agomptcp: leverage skb deferral free
Paolo Abeni [Sat, 27 Sep 2025 09:40:37 +0000 (11:40 +0200)]
mptcp: leverage skb deferral free

Usage of the skb deferral API is straightforward; with multiple
active subflows, this allows moving part of the received application
load onto multiple CPUs.

Also fix a typo in the related comment.

Reviewed-by: Geliang Tang <geliang@kernel.org>
Tested-by: Geliang Tang <geliang@kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-1-5da266aa9c1a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agotcp: use skb->len instead of skb->truesize in tcp_can_ingest()
Eric Dumazet [Sat, 27 Sep 2025 09:28:27 +0000 (09:28 +0000)]
tcp: use skb->len instead of skb->truesize in tcp_can_ingest()

Some applications are stuck in the 20th century and still use
small SO_RCVBUF values.

After the blamed commit, we can drop packets especially
when using LRO/hw-gro enabled NICs and small MSS (1500) values.

LRO/hw-gro NICs pack multiple segments into pages, allowing
tp->scaling_ratio to be set to a high value.

Whenever the receive queue gets full, we can receive a small packet
filling RWIN, but with a high skb->truesize, because most NICs use a 4K
page plus sk_buff metadata even when receiving less than 1500 bytes of
payload.

Even if we refine how tp->scaling_ratio is estimated,
we could have an issue at the start of the flow, because
the first round of packets (IW10) will be sent based on
the initial tp->scaling_ratio (1/2).

Relax tcp_can_ingest() to use skb->len instead of skb->truesize,
allowing the peer to use final RWIN, assuming a 'perfect'
scaling_ratio of 1.

Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250927092827.2707901-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel...
Jakub Kicinski [Tue, 30 Sep 2025 01:13:51 +0000 (18:13 -0700)]
Merge tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Luiz Augusto von Dentz says:

====================
bluetooth-next pull request for net-next:

core:

 - MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver
 - Avoid a couple dozen -Wflex-array-member-not-at-end warnings
 - bcsp: receive data only if registered
 - HCI: Fix using LE/ACL buffers for ISO packets
 - hci_core: Detect if an ISO link has stalled
 - ISO: Don't initiate CIS connections if there are no buffers
 - ISO: Use sk_sndtimeo as conn_timeout

drivers:

 - btusb: Check for unexpected bytes when defragmenting HCI frames
 - btusb: Add new VID/PID 13d3/3627 for MT7925
 - btusb: Add new VID/PID 13d3/3633 for MT7922
 - btusb: Add USB ID 2001:332a for D-Link AX9U rev. A1
 - btintel: Add support for BlazarIW core
 - btintel_pcie: Add support for _suspend() / _resume()
 - btintel_pcie: Define hdev->wakeup() callback
 - btintel_pcie: Add Bluetooth core/platform as comments
 - btintel_pcie: Add id of Scorpious, Panther Lake-H484
 - btintel_pcie: Refactor Device Coredump

* tag 'for-net-next-2025-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (30 commits)
  Bluetooth: Avoid a couple dozen -Wflex-array-member-not-at-end warnings
  Bluetooth: hci_sync: Fix using random address for BIG/PA advertisements
  Bluetooth: ISO: don't leak skb in ISO_CONT RX
  Bluetooth: ISO: free rx_skb if not consumed
  Bluetooth: ISO: Fix possible UAF on iso_conn_free
  Bluetooth: SCO: Fix UAF on sco_conn_free
  Bluetooth: bcsp: receive data only if registered
  Bluetooth: btusb: Add new VID/PID 13d3/3633 for MT7922
  Bluetooth: btusb: Add new VID/PID 13d3/3627 for MT7925
  Bluetooth: remove duplicate h4_recv_buf() in header
  Bluetooth: btusb: Check for unexpected bytes when defragmenting HCI frames
  Bluetooth: hci_core: Print information of hcon on hci_low_sent
  Bluetooth: hci_core: Print number of packets in conn->data_q
  Bluetooth: Add function and line information to bt_dbg
  Bluetooth: MGMT: Fix not exposing debug UUID on MGMT_OP_READ_EXP_FEATURES_INFO
  Bluetooth: hci_core: Detect if an ISO link has stalled
  Bluetooth: ISO: Use sk_sndtimeo as conn_timeout
  Bluetooth: HCI: Fix using LE/ACL buffers for ISO packets
  Bluetooth: ISO: Don't initiate CIS connections if there are no buffers
  MAINTAINERS: add a sub-entry for the Qualcomm bluetooth driver
  ...
====================

Link: https://patch.msgid.link/20250927154616.1032839-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoptr_ring: __ptr_ring_zero_tail micro optimization
Michael S. Tsirkin [Sat, 27 Sep 2025 12:29:35 +0000 (08:29 -0400)]
ptr_ring: __ptr_ring_zero_tail micro optimization

__ptr_ring_zero_tail currently does the - 1 operation twice:
- during initialization of head
- at each loop iteration

Let's just do it in one place; all we need to do
is adjust the loop condition. This is better:
- slightly clearer logic with less duplication
- uses prefix --, so we don't need to save the old value
- one less - 1 operation: for example, when the ring is empty
  we now don't do - 1 at all, while the existing code does it once

Text size shrinks from 15081 to 15050 bytes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/bcd630c7edc628e20d4f8e037341f26c90ab4365.1758976026.git.mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-wangxun-support-to-configure-rss'
Jakub Kicinski [Tue, 30 Sep 2025 01:11:18 +0000 (18:11 -0700)]
Merge branch 'net-wangxun-support-to-configure-rss'

Jiawen Wu says:

====================
net: wangxun: support to configure RSS

Implement ethtool ops for RSS configuration, and support multiple RSS
for multiple pools.
====================

Link: https://patch.msgid.link/20250926023843.34340-1-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: libwx: restrict change user-set RSS configuration
Jiawen Wu [Fri, 26 Sep 2025 02:38:43 +0000 (10:38 +0800)]
net: libwx: restrict change user-set RSS configuration

Enabling/disabling SR-IOV changes the number of rings, thereby changing
the RSS configuration that the user has set.

So reject these attempts if netif_is_rxfh_configured() returns true, and
remind the user to reset the RSS configuration.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250926023843.34340-5-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: wangxun: add RSS reta and rxfh fields support
Jiawen Wu [Fri, 26 Sep 2025 02:38:42 +0000 (10:38 +0800)]
net: wangxun: add RSS reta and rxfh fields support

Add ethtool ops for Rx flow hashing, and to query and set the RSS
indirection table and hash key. Disable UDP RSS by default, and support
configuring the L4 header fields used for TCP/UDP/SCTP flow hashing.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250926023843.34340-4-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: libwx: move rss_field to struct wx
Jiawen Wu [Fri, 26 Sep 2025 02:38:41 +0000 (10:38 +0800)]
net: libwx: move rss_field to struct wx

For both the global RSS and the multiple RSS scheme, the RSS type fields
are defined identically in the registers, so they can be defined as the
macros WX_RSS_FIELD_* to clean up the code. And to prepare for the RXFH
support in the next patch, move rss_field to struct wx.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250926023843.34340-3-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: libwx: support separate RSS configuration for every pool
Jiawen Wu [Fri, 26 Sep 2025 02:38:40 +0000 (10:38 +0800)]
net: libwx: support separate RSS configuration for every pool

Devices that support 64 pools also allow the PF and VFs
(i.e. different pools) to configure different RSS keys and hash tables.
Enable multiple RSS: use up to 64 RSS configurations, with a specific
configuration for each pool.

Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
Link: https://patch.msgid.link/20250926023843.34340-2-jiawenwu@trustnetic.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoidpf: fix mismatched free function for dma_alloc_coherent
Alok Tiwari [Thu, 25 Sep 2025 18:02:10 +0000 (11:02 -0700)]
idpf: fix mismatched free function for dma_alloc_coherent

The mailbox receive path allocates coherent DMA memory with
dma_alloc_coherent(), but frees it with dmam_free_coherent().
This is incorrect since dmam_free_coherent() is only valid for
buffers allocated with dmam_alloc_coherent().

Fix the mismatch by using dma_free_coherent() instead of
dmam_free_coherent().

Fixes: e54232da1238 ("idpf: refactor idpf_recv_mb_msg")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Link: https://patch.msgid.link/20250925180212.415093-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: remove one stac/clac pair from move_addr_to_user()
Eric Dumazet [Thu, 25 Sep 2025 23:09:29 +0000 (23:09 +0000)]
net: remove one stac/clac pair from move_addr_to_user()

Convert the get_user() and __put_user() code to the
fast masked_user_access_begin()/unsafe_{get|put}_user()
variant.

This patch increases the performance of a UDP recvfrom()
receiver (netserver) with 120-byte messages by 7%
on an AMD EPYC 7B12 64-Core Processor platform.

The presence of audit_sockaddr() makes it difficult
to avoid the stac/clac pair in the copy_to_user() call;
this is left for a future patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250925230929.3727873-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoscm: use masked_user_access_begin() in put_cmsg()
Eric Dumazet [Thu, 25 Sep 2025 22:49:14 +0000 (22:49 +0000)]
scm: use masked_user_access_begin() in put_cmsg()

Use the greatest and latest uaccess construct to get optimal code.

Before:

lea    (%r9,%rcx,1),%r10
movabs $<USER_PTR_MAX>,%r11
mov    $0xfffffff2,%eax
cmp    %rcx,%r10
jb     ffffffff81cdc312 <put_cmsg+0x152>
cmp    %r11,%r10
ja     ffffffff81cdc312 <put_cmsg+0x152>
stac
lfence
mov    %r9,(%rcx)

After:

movabs $<USER_PTR_MAX>,%r9
cmp    %r9,%rax
cmova  %r9,%rax
stac
mov    %rcx,(%rax)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250925224914.3590290-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoMerge branch 'net-stmmac-drop-frames-causing-hlbs-error'
Jakub Kicinski [Tue, 30 Sep 2025 00:49:35 +0000 (17:49 -0700)]
Merge branch 'net-stmmac-drop-frames-causing-hlbs-error'

Rohan G Thomas says:

====================
net: stmmac: Drop frames causing HLBS error

This patchset consists of the following patches to avoid netdev watchdog
resets caused by Head-of-Line Blocking due to Scheduling (HLBS) errors in
EST:
 2. Add HLBS frame drops to taprio stats

v2: https://lore.kernel.org/r/20250915-hlbs_2-v2-1-27266b2afdd9@altera.com
====================

Link: https://patch.msgid.link/20250925-hlbs_2-v3-0-3b39472776c2@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: tc: Add HLBS drop count to taprio stats
Rohan G Thomas [Thu, 25 Sep 2025 14:06:14 +0000 (22:06 +0800)]
net: stmmac: tc: Add HLBS drop count to taprio stats

Add the count of frames dropped due to Head-of-Line Blocking due to
Scheduling (HLBS) errors to the taprio window drop count stats.

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Reviewed-by: Furong Xu <0x1207@gmail.com>
Link: https://patch.msgid.link/20250925-hlbs_2-v3-2-3b39472776c2@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: stmmac: est: Drop frames causing HLBS error
Rohan G Thomas [Thu, 25 Sep 2025 14:06:13 +0000 (22:06 +0800)]
net: stmmac: est: Drop frames causing HLBS error

Drop those frames causing Head-of-Line Blocking due to Scheduling
(HLBS) error to avoid HLBS interrupt flooding and netdev watchdog
timeouts due to blocked packets. Tx queues can be configured to drop
those blocked packets by setting Drop Frames causing Scheduling Error
(DFBS) bit of EST_CONTROL register.

Also, add a per-queue HLBS drop count.

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Reviewed-by: Furong Xu <0x1207@gmail.com>
Link: https://patch.msgid.link/20250925-hlbs_2-v3-1-3b39472776c2@altera.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agoixgbe: fix typos and docstring inconsistencies
Alok Tiwari [Mon, 29 Sep 2025 12:44:01 +0000 (05:44 -0700)]
ixgbe: fix typos and docstring inconsistencies

Corrected function and variable name typos in comments and docstrings:
 ixgbe_write_ee_hostif_X550 -> ixgbe_write_ee_hostif_data_X550
 ixgbe_get_lcd_x550em -> ixgbe_get_lcd_t_x550em
 "Determime" -> "Determine"
 "point to hardware structure" -> "pointer to hardware structure"
 "To turn on the LED" -> "To turn off the LED"

These changes improve readability and consistency.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250929124427.79219-1-alok.a.tiwari@oracle.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agodocs: networking: phy: clarify abbreviation "PAL"
Markus Heidelberg [Fri, 26 Sep 2025 13:15:20 +0000 (15:15 +0200)]
docs: networking: phy: clarify abbreviation "PAL"

The abbreviation is used in the text without being introduced, so its
meaning might have been unclear to readers.

Signed-off-by: Markus Heidelberg <m.heidelberg@cab.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://patch.msgid.link/20250926131520.222346-1-m.heidelberg@cab.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2 months agonet: ethtool: remove duplicated mm.o from Makefile
Markus Heidelberg [Fri, 26 Sep 2025 13:13:23 +0000 (15:13 +0200)]
net: ethtool: remove duplicated mm.o from Makefile

Fixes: 2b30f8291a30 ("net: ethtool: add support for MAC Merge layer")
Signed-off-by: Markus Heidelberg <m.heidelberg@cab.de>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20250926131323.222192-1-m.heidelberg@cab.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>