Gentwo Git Trees - linux/.git/log

wireguard: netlink: generate netlink code

This patch adopts netlink policies and command definitions
generated by ynl-gen, thus completing the conversion to YNL.

Given that the old and new policies are functionally identical
and have just been moved to a new file, it serves to verify
that the policies generated from the spec are identical to the
previous policy code.

The following functions are renamed:
wg_get_device_dump() -> wg_get_device_dumpit()
wg_set_device() -> wg_set_device_doit()

The new files are covered by the existing drivers/net/wireguard/
pattern in MAINTAINERS.

No behavioural changes intended.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: uapi: generate header with ynl-gen

Use ynl-gen to generate the UAPI header for WireGuard.

The cosmetic changes in this patch confirms that the spec is aligned
with the implementation. By using the generated version, it ensures
that they stay in sync.

Changes in the generated header:
* Trivial header guard rename.
* Trivial white space changes.
* Trivial comment changes.
* Precompute bitflags in ynl-gen (see [1]).
* Drop __*_F_ALL constants (see [1]).

[1] https://lore.kernel.org/r/20251014123201.6ecfd146@kernel.org/

No behavioural changes intended.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: uapi: move flag enums

Move the wg*_flag enums, so they are defined above the attribute set
enums, where ynl-gen would place them.

This is an incremental step towards adopting an UAPI header generated
by ynl-gen. This is split out to keep the patches readable.

This is a trivial patch with no behavioural changes intended.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: uapi: move enum wg_cmd

This patch moves enum wg_cmd to the end of the file, where ynl-gen
would generate it.

This is an incremental step towards adopting an UAPI header generated
by ynl-gen. This is split out to keep the patches readable.

This is a trivial patch with no behavioural changes intended.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: netlink: add YNL specification

This patch adds a near[1] complete YNL specification for WireGuard,
documenting the protocol in a machine-readable format, rather than
comments in wireguard.h, and eases usage from C and non-C programming
languages alike.

The generated C library will be featured in a later patch, so in
this patch I will use the in-kernel python client for examples.

This makes the documentation in the UAPI header redundant, it is
therefore removed. The in-line documentation in the spec is based
on the existing comment in wireguard.h, and once released it will
be available in the kernel documentation at:
  https://docs.kernel.org/netlink/specs/wireguard.html
  (until then run: make htmldocs)

Generate wireguard.rst from this spec:
$ make -C tools/net/ynl/generated/ wireguard.rst

Query wireguard interface through pyynl:
$ sudo ./tools/net/ynl/pyynl/cli.py --family wireguard \
                                    --dump get-device \
                                    --json '{"ifindex":3}'
[{'fwmark': 0,
  'ifindex': 3,
  'ifname': 'wg-test',
  'listen-port': 54318,
  'peers': [{0: {'allowedips': [{0: {'cidr-mask': 0,
                                     'family': 2,
                                     'ipaddr': '0.0.0.0'}},
                                {0: {'cidr-mask': 0,
                                     'family': 10,
                                     'ipaddr': '::'}}],
                 'endpoint': b'[...]',
                 'last-handshake-time': {'nsec': 42, 'sec': 42},
                 'persistent-keepalive-interval': 42,
                 'preshared-key': '[...]',
                 'protocol-version': 1,
                 'public-key': '[...]',
                 'rx-bytes': 42,
                 'tx-bytes': 42}}],
  'private-key': '[...]',
  'public-key': '[...]'}]

Add another allowed IP prefix:
$ sudo ./tools/net/ynl/pyynl/cli.py --family wireguard \
  --do set-device --json '{"ifindex":3,"peers":[
    {"public-key":"6a df b1 83 a4 ..","allowedips":[
      {"cidr-mask":0,"family":10,"ipaddr":"::"}]}]}'

[1] As can be seen above, the "endpoint" is only dumped as binary data,
    as it can't be fully described in YNL. It's either a struct
    sockaddr_in or struct sockaddr_in6 depending on the attribute length.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: netlink: lower .maxattr for WG_CMD_GET_DEVICE

Previously .maxattr was shared for both WG_CMD_GET_DEVICE and
WG_CMD_SET_DEVICE. Now that it is split, then we can lower it
for WG_CMD_GET_DEVICE to follow the documentation which defines
.maxattr as WGDEVICE_A_IFNAME for WG_CMD_GET_DEVICE.

$ grep -hC5 'one but not both of:' include/uapi/linux/wireguard.h
* WG_CMD_GET_DEVICE
* -----------------
*
* May only be called via NLM_F_REQUEST | NLM_F_DUMP. The command
* should contain one but not both of:
*
* WGDEVICE_A_IFINDEX: NLA_U32
* WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMSIZ - 1
*
* The kernel will then return several messages [...]

While other attributes weren't rejected previously, the consensus
is that nobody sends those attributes, so nothing should break.

Link: https://lore.kernel.org/r/aRyLoy2iqbkUipZW@zx2c4.com/
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: netlink: convert to split ops

This patch converts WireGuard from using the legacy struct genl_ops
to struct genl_split_ops, by applying the same transformation as
genl_cmd_full_to_split() would otherwise do at runtime.

WGDEVICE_A_MAX is swapped for WGDEVICE_A_PEERS, while they are
currently equivalent, then .maxattr should be the maximum attribute
that a given command supports, and not change along with WGDEVICE_A_MAX.

This is an incremental step towards adopting netlink policy code
generated by ynl-gen, ensuring that the code and spec is aligned.

This is a trivial patch with no behavioural changes intended.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: netlink: use WG_KEY_LEN in policies

When converting the netlink policies to YNL, the constants used
in the policy have to be visible to userspace.

As NOISE_*_KEY_LEN isn't visible to userspace, change the policy
to use WG_KEY_LEN, as also documented in the UAPI header:

$ grep WG_KEY_LEN include/uapi/linux/wireguard.h
*    WGDEVICE_A_PRIVATE_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
*    WGDEVICE_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
*            WGPEER_A_PUBLIC_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
*            WGPEER_A_PRESHARED_KEY: NLA_EXACT_LEN, len WG_KEY_LEN
[...]

Add a couple of BUILD_BUG_ON() to ensure that they stay in sync.

No behavioural changes intended.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: netlink: validate nested arrays in policy

Use NLA_POLICY_NESTED_ARRAY() to perform nested array validation
in the policy validation step.

The nested policy was already enforced through nla_parse_nested(),
however extack wasn't passed previously, so no fancy error messages.

With the nested attributes being validated directly in the policy, the
policy argument can be set to NULL in the calls to nla_parse_nested().

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

wireguard: netlink: enable strict genetlink validation

WireGuard is a modern enough genetlink family, that it doesn't need
resv_start_op. It already had policies in place when it was first
merged, it has also never used the reserved field, or other things
toggled by resv_start_op.

wireguard-tools have always used zero initialized memory, and have never
touched the reserved field, neither have any other clients I have
checked. Closed-source clients are much more likely to use the
embeddedable library from wireguard-tools, than a DIY implementation
using uninitialized memory.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Merge tag 'linux-can-next-for-6.19-20251129' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-11-29

The patch is by Oliver Hartkopp and fixes the compilation of the
CAN_RAW protocol if the CAN driver infrastructure is not enabled.

This problem was introduced in the current development cycle of
net-next.

* tag 'linux-can-next-for-6.19-20251129' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: Kconfig: select CAN driver infrastructure by default
====================

Link: https://patch.msgid.link/20251129125036.467177-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

can: Kconfig: select CAN driver infrastructure by default

The CAN bus support enabled with CONFIG_CAN provides a socket-based
access to CAN interfaces. With the introduction of the latest CAN protocol
CAN XL additional configuration status information needs to be exposed to
the network layer than formerly provided by standard Linux network drivers.

This requires the CAN driver infrastructure to be selected by default.
As the CAN network layer can only operate on CAN interfaces anyway all
distributions and common default configs enable at least one CAN driver.

So selecting CONFIG_CAN_DEV when CONFIG_CAN is selected by the user has
no effect on established configurations but solves potential build issues
when CONFIG_CAN[_XXX]=y is set together with CANFIG_CAN_DEV=m

Fixes: 1a620a723853 ("can: raw: instantly reject unsupported CAN frames")
Reported-by: Vincent Mailhol <mailhol@kernel.org>
Closes: https://lore.kernel.org/all/CAMZ6RqL_nGszwoLPXn1Li8op-ox4k3Hs6p=Hw6+w0W=DTtobPw@mail.gmail.com/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511280531.YnWW2Rxc-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511280842.djCQ0N0O-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511282325.uVQFRTkA-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202511291520.guIE1QHj-lkp@intel.com/
Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://patch.msgid.link/20251129090500.17484-1-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

net: ipconfig: Replace strncpy with strscpy in ic_proto_name

strncpy() is deprecated [1] for NUL-terminated destination buffers
because it does not guarantee NUL termination. Replace it with strscpy()
to ensure the destination buffer is always NUL-terminated and to avoid
any additional NUL padding.

Although the identifier buffer has 252 usable bytes, strncpy() copied
only up to 251 bytes to the zero-initialized buffer, relying on the last
byte to act as an implicit NUL terminator. Switching to strscpy() avoids
this implicit behavior and does not use magic numbers.

The source string is also NUL-terminated and satisfies the
__must_be_cstr() requirement of strscpy().

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://patch.msgid.link/20251126220804.102160-2-thorsten.blum@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'nf-next-25-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next:

0) Add sanity check for maximum encapsulations in bridge vlan,
   reported by the new AI robot.

1) Move the flowtable path discovery code to its own file, the
   nft_flow_offload.c mixes the nf_tables evaluation with the path
   discovery logic, just split this in two for clarity.

2) Consolidate flowtable xmit path by using dev_queue_xmit() and the
   real device behind the layer 2 vlan/pppoe device. This allows to
   inline encapsulation. After this update, hw_ifidx can be removed
   since both ifidx and hw_ifidx now point to the same device.

3) Support for IPIP encapsulation in the flowtable, extend selftest
   to cover for this new layer 3 offload, from Lorenzo Bianconi.

4) Push down the skb into the conncount API to fix duplicates in the
   conncount list for packets with non-confirmed conntrack entries,
   this is due to an optimization introduced in d265929930e2
   ("netfilter: nf_conncount: reduce unnecessary GC").
   From Fernando Fernandez Mancera.

5) In conncount, disable BH when performing garbage collection
   to consolidate existing behaviour in the conncount API, also
   from Fernando.

6) A matching packet with a confirmed conntrack invokes GC if
   conncount reaches the limit in an attempt to release slots.
   This allows the existing extensions to be used for real conntrack
   counting, not just limiting new connections, from Fernando.

7) Support for updating ct count objects in nf_tables, from Fernando.

8) Extend nft_flowtables.sh selftest to send IPv6 TCP traffic,
   from Lorenzo Bianconi.

9) Fixes for UAPI kernel-doc documentation, from Randy Dunlap.

* tag 'nf-next-25-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: nf_tables: improve UAPI kernel-doc comments
  netfilter: ip6t_srh: fix UAPI kernel-doc comments format
  selftests: netfilter: nft_flowtable.sh: Add the capability to send IPv6 TCP traffic
  netfilter: nft_connlimit: add support to object update operation
  netfilter: nft_connlimit: update the count if add was skipped
  netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH
  netfilter: nf_conncount: rework API to use sk_buff directly
  selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest
  netfilter: flowtable: Add IPIP tx sw acceleration
  netfilter: flowtable: Add IPIP rx sw acceleration
  netfilter: flowtable: use tuple address to calculate next hop
  netfilter: flowtable: remove hw_ifidx
  netfilter: flowtable: inline pppoe encapsulation in xmit path
  netfilter: flowtable: inline vlan encapsulation in xmit path
  netfilter: flowtable: consolidate xmit path
  netfilter: flowtable: move path discovery infrastructure to its own file
  netfilter: flowtable: check for maximum number of encapsulations in bridge vlan
====================

Link: https://patch.msgid.link/20251128002345.29378-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'introduce-the-dsa_xmit_port_mask-tagging-protocol-helper'

Vladimir Oltean says:

====================
Introduce the dsa_xmit_port_mask() tagging protocol helper

What
----

Some DSA tags have just the port number in the TX header format, others
have a bit field where in theory, multiple bits can be set, even though
DSA only sets one.

The latter kind is now making use of a dsa_xmit_port_mask() helper,
which will decide when to set more than 1 bit in that mask.

Why
---

David Yang has pointed out in a recently posted patch that HSR packet
duplication on transmission can be offloaded even on HSR-unaware
switches. This should be made generally available to all DSA switches.

How to test
-----------

These patches just lay the groundwork, and there should be no functional
change - so for this set, regression testing is all that's necessary.

For testing the HSR packet duplication idea, I've put together a branch:
https://github.com/vladimiroltean/linux/commits/dsa-simple-hsr-offload/
where most drivers are patched to call dsa_port_simple_hsr_join() and
dsa_port_simple_hsr_leave().

Assuming there are volunteers to also test the latter, one can enable
CONFIG_HSR and create a HSR device using:
$ ip link add name hsr0 type hsr slave1 swp0 slave2 swp1 supervision 45 version 1

This needs to be connected using 2 cables to another system where the
same command was run. Then, one should be able to ping the other board
through the hsr0 interface.

Without the Github branch, a ping over HSR should increase the DSA
conduit interface's TX counters by 2 packets. With the Github branch,
the TX counters should increase by only 1 packet.

Why so many patches
-------------------

To avoid the situation where a patch has to be backported, conflicts
with the work done here, pulls this in as a dependency, and that pulls
in 13 other unrelated drivers. These don't have any dependencies between
each other and can be cherry-picked at will (except they all depend on
patch 1/15).
====================

Link: https://patch.msgid.link/20251127120902.292555-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_yt921x: use the dsa_xmit_port_mask() helper

The "yt921x" tagging protocol populates a bit mask for the TX ports,
so we can use dsa_xmit_port_mask() to centralize the decision of how to
set that field.

Cc: David Yang <mmyangfl@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-16-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_xrs700x: use the dsa_xmit_port_mask() helper

The "xrs700x" is the original DSA tagging protocol with HSR TX
replication support, we now essentially move that logic to the
dsa_xmit_port_mask() helper. The end result is something akin to
hellcreek_xmit() (but reminds me I should also take care of
skb_checksum_help() for tail taggers in the core).

The implementation differences to dsa_xmit_port_mask() are immaterial.

Cc: George McCollister <george.mccollister@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-15-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_trailer: use the dsa_xmit_port_mask() helper

The "trailer" tagging protocol populates a bit mask for the TX ports, so
we can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-14-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_rzn1_a5psw: use the dsa_xmit_port_mask() helper

The "a5psw" tagging protocol populates a bit mask for the TX ports,
so we can use dsa_xmit_port_mask() to centralize the decision of how to
set that field.

Cc: "Clément Léger" <clement.leger@bootlin.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-13-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_rtl8_4: use the dsa_xmit_port_mask() helper

The "rtl8_4" and "rtl8_4t" tagging protocols populate a bit mask for the
TX ports, so we can use dsa_xmit_port_mask() to centralize the decision
of how to set that field.

Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-12-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_rtl4_a: use the dsa_xmit_port_mask() helper

The "rtl4a" tagging protocol populates a bit mask for the TX ports,
so we can use dsa_xmit_port_mask() to centralize the decision of how to
set that field.

Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-11-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_qca: use the dsa_xmit_port_mask() helper

The "qca" tagging protocol populates a bit mask for the TX ports, so we
can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-10-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_ocelot: use the dsa_xmit_port_mask() helper

The "ocelot" and "seville" tagging protocols populate a bit mask for the
TX ports, so we can use dsa_xmit_port_mask() to centralize the decision
of how to set that field.

This protocol used BIT_ULL() rather than simple BIT() to silence Smatch,
as explained in commit 1f778d500df3 ("net: mscc: ocelot: avoid type
promotion when calling ocelot_ifh_set_dest"). I would expect that this
tool no longer complains now, when the BIT(dp->index) is hidden inside
the dsa_xmit_port_mask() function, the return value of which is promoted
to u64.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-9-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_mxl_gsw1xx: use the dsa_xmit_port_mask() helper

The "gsw1xx" tagging protocol populates a bit mask for the TX ports, so
we can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-8-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_mtk: use the dsa_xmit_port_mask() helper

The "mtk" tagging protocol populates a bit mask for the TX ports, so we
can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Cc: Chester A. Unal" <chester.a.unal@arinc9.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-7-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_ksz: use the dsa_xmit_port_mask() helper

The "ksz8795", "ksz9893", "ksz9477" and "lan937x" tagging protocols
populate a bit mask for the TX ports.

Unlike the others, "ksz9477" also accelerates HSR packet duplication.

Make the HSR duplication logic available generically to all 4 taggers by
using the dsa_xmit_port_mask() function to set the TX port mask.

Cc: Woojung Huh <woojung.huh@microchip.com>
Cc: UNGLinuxDriver@microchip.com
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-6-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_hellcreek: use the dsa_xmit_port_mask() helper

The "hellcreek" tagging protocol populates a bit mask for the TX ports,
so we can use dsa_xmit_port_mask() to centralize the decision of how to
set that field.

Cc: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-5-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_gswip: use the dsa_xmit_port_mask() helper

The "gswip" tagging protocol populates a bit mask for the TX ports, so
we can use dsa_xmit_port_mask() to centralize the decision of how to set
that field.

Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Daniel Golle <daniel@makrotopia.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: tag_brcm: use the dsa_xmit_port_mask() helper

The "brcm" and "brcm-prepend" tagging protocols populate a bit mask for
the TX ports, so we can use dsa_xmit_port_mask() to centralize the
decision of how to set that field. The port mask is written u8 by u8,
first the high octet and then the low octet.

Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-3-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: introduce the dsa_xmit_port_mask() tagging protocol helper

Many tagging protocols deal with the transmit port mask being a bit
mask, and set it to BIT(dp->index). Not a big deal.

Also, some tagging protocols are written for switches which support HSR
offload (including packet duplication offload), there we see a walk
using dsa_hsr_foreach_port() to find the other port in the same switch
that's member of the HSR, and set that bit in the port mask too.

That isn't sufficiently interesting either, until you come to realize
that there isn't anything special in the second case that switches just
in the first one can't do too.

It just becomes a matter of "is it wise to do it? are sufficient people
using HSR/PRP with generic off-the-shelf switches to justify add an
extra test in the data path?" - the answer to which is probably "it
depends". It isn't _much_ worse to not have HSR offload at all, so as to
make it impractical, esp. with a rich OS like Linux. But the HSR users
are rather specialized in industrial networking.

Anyway, the change acts on the premise that we're going to have support
for this, it should be uniformly implemented for everyone, and that if
we find some sort of balance, we can keep everyone relatively happy.

So I've disabled that logic if CONFIG_HSR isn't enabled, and I've tilted
the branch predictor to say it's unlikely we're transmitting through a
port with this capability currently active. On branch miss, we're still
going to save the transmission of one packet, so there's some remaining
benefit there too. I don't _think_ we need to jump to static keys yet.

The helper returns a 32-bit zero-based unsigned number, that callers
have to transpose using FIELD_PREP(). It is not the first time we assume
DSA switches won't be larger than 32 ports - dsa_user_ports() has that
assumption baked into it too.

One last development note about why pass the "skb" argument when this
isn't used. Looking at the compiled code on arm64, which is identical
both with and without it, the answer is "why not?" - who knows what
other features dependent on the skb may be handled in the future.

Link: https://lore.kernel.org/netdev/20251126093240.2853294-4-mmyangfl@gmail.com/
Cc: "Alvin Šipraga" <alsi@bang-olufsen.dk>
Cc: Chester A. Unal" <chester.a.unal@arinc9.com>
Cc: "Clément Léger" <clement.leger@bootlin.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: David Yang <mmyangfl@gmail.com>
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Florian Fainelli <florian.fainelli@broadcom.com>
Cc: George McCollister <george.mccollister@gmail.com>
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Cc: Jonas Gorski <jonas.gorski@gmail.com>
Cc: Kurt Kanzenbach <kurt@linutronix.de>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: UNGLinuxDriver@microchip.com
Cc: Woojung Huh <woojung.huh@microchip.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://patch.msgid.link/20251127120902.292555-2-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-broadcom-migrate-to-get_rx_ring_count-ethtool-callback'

Breno Leitao says:

====================
net: broadcom: migrate to .get_rx_ring_count() ethtool callback

This series migrates Broadcom ethernet drivers to use the new
.get_rx_ring_count() ethtool callback introduced in commit 84eaf4359c36
("net: ethtool: add get_rx_ring_count callback to optimize RX ring
queries").

This change simplifies the .get_rxnfc() implementation by
extracting the ETHTOOL_GRXRINGS case handling into a dedicated callback,
making the code cleaner and aligning these drivers with the updated
ethtool API.

The series covers two Broadcom drivers: bnxt and bcmgenet. Each patch
removes the ETHTOOL_GRXRINGS case from the driver's .get_rxnfc() switch
statement and implements the new .get_rx_ring_count() callback that
returns the number of RX rings.
====================

Link: https://patch.msgid.link/20251127-grxrings_broadcom-v1-0-b0b182864950@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bcmgenet: extract GRXRINGS from .get_rxnfc

Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.

Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().

This simplifies the RX ring count retrieval and aligns bcmgenet with the
new ethtool API for querying RX ring parameters.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251127-grxrings_broadcom-v1-2-b0b182864950@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: bnxt: extract GRXRINGS from .get_rxnfc

Commit 84eaf4359c36 ("net: ethtool: add get_rx_ring_count callback to
optimize RX ring queries") added specific support for GRXRINGS callback,
simplifying .get_rxnfc.

Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new
.get_rx_ring_count().

This simplifies the RX ring count retrieval and aligns bnxt with the new
ethtool API for querying RX ring parameters.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251127-grxrings_broadcom-v1-1-b0b182864950@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'tools-ynl-add-schema-checking'

Donald Hunter says:

====================
tools: ynl: add schema checking

Add schema checking and yaml linting for the YNL specs.

Patch 1 adds a schema_check make target using a pyynl --validate option
Patch 2 adds a lint make target using yamllint
Patches 3,4 fix issues reported by make -C tools/net/ynl lint schema_check
====================

Link: https://patch.msgid.link/20251127123502.89142-1-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ynl: fix schema check errors

Fix two schema check errors that have lurked since the attribute name
validation was made more strict:

not ok 2 conntrack.yaml schema validation
'labels mask' does not match '^[0-9a-z-]+$'

not ok 13 nftables.yaml schema validation
'set id' does not match '^[0-9a-z-]+$'

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251127123502.89142-5-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ynl: fix a yamllint warning in ethtool spec

Fix warning reported by yamllint:

../../../Documentation/netlink/specs/ethtool.yaml
1272:21 warning truthy value should be one of [false, true] (truthy)

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251127123502.89142-4-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: add a lint makefile target

Add a lint target to run yamllint on the YNL specs.

make -C tools/net/ynl lint
make: Entering directory '/home/donaldh/net-next/tools/net/ynl'
yamllint ../../../Documentation/netlink/specs/*.yaml
../../../Documentation/netlink/specs/ethtool.yaml
1272:21 warning truthy value should be one of [false, true] (truthy)

make: Leaving directory '/home/donaldh/net-next/tools/net/ynl'

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251127123502.89142-3-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

tools: ynl: add schema checking

Add a --validate flag to pyynl for explicit schema check with error
reporting and add a schema_check make target to check all YNL specs.

make -C tools/net/ynl schema_check
make: Entering directory '/home/donaldh/net-next/tools/net/ynl'
ok 1 binder.yaml schema validation
not ok 2 conntrack.yaml schema validation
'labels mask' does not match '^[0-9a-z-]+$'

Failed validating 'pattern' in schema['properties']['attribute-sets']['items']['properties']['attributes']['items']['properties']['name']:
{'type': 'string', 'pattern': '^[0-9a-z-]+$'}

On instance['attribute-sets'][14]['attributes'][22]['name']:
'labels mask'

ok 3 devlink.yaml schema validation
[...]

Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20251127123502.89142-2-donald.hunter@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-next-2025-11-27' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Apart from the usual small things just driver updates:
- mt76:
   - WED support for >32-bit DMA
   - airoha NPU support
   - regdomain improvements
   - continued WiFi7/MLO work
- rtw89
   - support USB devices RTL8852AU and RTL8852CU
   - initial work for RTL8922DE
   - improved injection support
- rtl8xxxu: 40 MHz connection fixes/support
- brcmfmac: Acer A1 840 tablet quirk

* tag 'wireless-next-2025-11-27' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (152 commits)
  wifi: mac80211: allow sharing identical chanctx for S1G interfaces
  wifi: nl80211: vendor-cmd: intel: fix a blank kernel-doc line warning
  wifi: cfg80211: include s1g_primary_2mhz when comparing chandefs
  wifi: cfg80211: include s1g_primary_2mhz when sending chandef
  wifi: ieee80211: correct FILS status codes
  mt76: mt7615: Fix memory leak in mt7615_mcu_wtbl_sta_add()
  wifi: mt76: mt792x: fix wifi init fail by setting MCU_RUNNING after CLC load
  wifi: mt76: Strip whitespace from build ddate
  wifi: mt76: mt7996: Add missing locking in mt7996_mac_sta_rc_work()
  wifi: mt76: mt7996: skip ieee80211_iter_keys() on scanning link remove
  wifi: mt76: mt7996: skip deflink accounting for offchannel links
  wifi: mt76: Move mt76_abort_scan out of mt76_reset_device()
  wifi: mt76: mt7996: move mt7996_update_beacons under mt76 mutex
  wifi: mt76: mt7996: grab mt76 mutex in mt7996_mac_sta_event()
  wifi: mt76: mt7925: ensure the 6GHz A-MPDU density cap from the hardware.
  wifi: mt76: mt7996: fix EMI rings for RRO
  wifi: mt76: mt7996: fix using wrong phy to start in mt7996_mac_restart()
  wifi: mt76: mt7996: fix MLO set key and group key issues
  wifi: mt76: mt7996: fix MLD group index assignment
  wifi: mt76: mt7996: use correct link_id when filling TXD and TXP
  ...
====================

Link: https://patch.msgid.link/20251127103806.17776-3-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: Remove KMSG_COMPONENT macro

The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message
catalog" from 2008 [1] which never made it upstream.

The macro was added to s390 code to allow for an out-of-tree patch which
used this to generate unique message ids. Also this out-of-tree patch
doesn't exist anymore.

The pattern of how the KMSG_COMPONENT macro is used can also be found at
some non s390 specific code, for whatever reasons. Besides adding an
indirection it is unused.

Remove the macro in order to get rid of a pointless indirection. Replace
all users with the string it defines. In all cases this leads to a simple
replacement like this:

- #define KMSG_COMPONENT "af_iucv"
- #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
+ #define pr_fmt(fmt) "af_iucv: " fmt

[1] https://lwn.net/Articles/292650/

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Sidraya Jayagond <sidraya@linux.ibm.com>
Link: https://patch.msgid.link/20251126140705.1944278-1-hca@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'bnxt_en-updates-for-net-next'

Michael Chan says:

====================
bnxt_en: Updates for net-next (part)

This series includes an enhnacement to the priority TX counters,
an enhancement to a PHY module error extack message, cleanup of
unneeded MSIX logic in bnxt_ulp.c, adding CQ dump during TX timeout,
LRO/HW_GRO performance improvement by enabling Relaxed Ordering,
and improved SRIOV admin link state support.
====================

Link: https://patch.msgid.link/20251126215648.1885936-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Add Virtual Admin Link State Support for VFs

The firmware can now cache the virtual link admin state (auto/on/off) of
all VFs and as such, the PF driver no longer has to intercept the VF
driver's port_phy_qcfg() call and then provide the link admin state.

If the FW does not have this capability, fall back to the existing
interception method.

The initial default link admin state (auto) is also set initially when
the VFs are created.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Mohammad Shuab Siddique <mohammad-shuab.siddique@broadcom.com>
Signed-off-by: Rob Miller <rmiller@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251126215648.1885936-7-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Do not set EOP on RX AGG BDs on 5760X chips

With End-of-Packet padding (EOP) set, the chip will disable Relaxed
Ordering (RO) of TPA data packets. A TPA segment with EOP set will be
padded to the next cache boundary and can potentially overwrite the
beginning bytes of the next TPA segment when RO is enabled on 5760X.
To prevent that, the chip disables RO for TPA when EOP is set.

To take advantge of RO and higher performance, do not set EOP on
5760X chips when TPA is enabled. Define a proper RX_BD_FLAGS_AGG_EOP
constant to make it clear that we are setting EOP.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251126215648.1885936-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Add CQ ring dump to bnxt_dump_cp_sw_state()

On newer chips that use NQs and CQs, add the CQ ring dump to
bnxt_dump_cp_sw_state() to make it more complete.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251126215648.1885936-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Remove the redundant BNXT_EN_FLAG_MSIX_REQUESTED flag

MSIX is always requested when the RoCE driver calls bnxt_register_dev().
We already check bnxt_ulp_registered(), so checking the flag is
redundant. It was a left-over flag after converting to auxbus, so
remove it.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251126215648.1885936-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Enhance log message in bnxt_get_module_status()

Rturn early with -EOPNOTSUPP and an extack message if the PHY type is
BaseT since module status is not available for BaseT.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Gautam R A <gautam-r.a@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251126215648.1885936-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnxt_en: Enhance TX pri counters

The priority packet and byte counters in ethtool -S are returned by
the driver based on the pri2cos mapping.  The assumption is that each
priority is mapped to one and only one hardware CoS queue.  In a
special RoCE configuration, the FW uses combined CoS queue 0 and CoS
queue 1 for the priority mapped to CoS queue 0.  In this special
case, we need to add the CoS queue 0 and CoS queue 1 counters for
the priority packet and byte counters.

Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20251126215648.1885936-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'intel-wired-lan-driver-updates-2025-11-25-ice-idpf-iavf-ixgbe-ixgbevf-e1000e'

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2025-11-25 (ice, idpf, iavf, ixgbe, ixgbevf, e1000e)

Natalia cleans up ixgbevf_q_vector struct removing an unused field.

Emil converts vport state tracking from enum to bitmap and removes
unneeded states for idpf.

Tony removes an unneeded check from e1000e.

Alok Tiwari removes an unnecessary second call to
ixgbe_non_sfp_link_config() and adjusts the checked member, in idpf, to
reflect the member that is later used. He also fixes various typos and
messages for better clarity misc Intel drivers.
====================

Link: https://patch.msgid.link/20251125223632.1857532-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

iavf: clarify VLAN add/delete log messages and lower log level

The current dev_warn messages for too many VLAN changes are confusing
and one place incorrectly references "add" instead of "delete" VLANs
due to copy-paste errors.

- Use dev_info instead of dev_warn to lower the log level.
- Rephrase the message to: "virtchnl: Too many VLAN [add|delete]
([v1|v2]) requests; splitting into multiple messages to PF\n".

Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-12-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ice: fix comment typo and correct module format string

- Fix a typo in the ice_fdir_has_frag() kernel-doc comment ("is" -> "if")

- Correct the NVM erase error message format string from "0x02%x" to
"0x%02x" so the module value is printed correctly.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-11-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

idpf: correct queue index in Rx allocation error messages

The error messages in idpf_rx_desc_alloc_all() used the group index i
when reporting memory allocation failures for individual Rx and Rx buffer
queues. This is incorrect.

Update the messages to use the correct queue index j and include the
queue group index i for clearer identification of the affected Rx and Rx
buffer queues.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-10-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

idpf: use desc_ring when checking completion queue DMA allocation

idpf_compl_queue uses a union for comp, comp_4b, and desc_ring. The
release path should check complq->desc_ring to determine whether the DMA
descriptor ring is allocated. The current check against comp works but is
leftover from a previous commit and is misleading in this context.

Switching the check to desc_ring improves readability and more directly
reflects the intended meaning, since desc_ring is the field representing
the allocated DMA-backed descriptor ring.

No functional change.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-9-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbe: avoid redundant call to ixgbe_non_sfp_link_config()

ixgbe_non_sfp_link_config() is called twice in ixgbe_open()
once to assign its return value to err and again in the
conditional check. This patch uses the stored err value
instead of calling the function a second time. This avoids
redundant work and ensures consistent error reporting.

Also fix a small typo in the ixgbe_remove() comment:
"The could be caused" -> "This could be caused".

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-8-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

e1000e: Remove unneeded checks

The caller, ethtool_set_eeprom(), already performs the same checks so
these are unnecessary in the driver. This reverts commit
90fb7db49c6d ("e1000e: fix heap overflow in e1000_set_eeprom"), however,
corrections for RCT have been kept.

Link: https://lore.kernel.org/all/db92fcc8-114d-4e85-9d15-7860545bc65e@suse.de/
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-7-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

idpf: convert vport state to bitmap

Convert vport state to a bitmap and remove the DOWN state which is
redundant in the existing logic. There are no functional changes aside
from the use of bitwise operations when setting and checking the states.
Removed the double underscore to be consistent with the naming of other
bitmaps in the header and renamed current_state to vport_is_up to match
the meaning of the new variable.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Chittim Madhu <madhu.chittim@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-6-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ixgbevf: ixgbevf_q_vector clean up

Flex array should be at the end of the structure and use [] syntax

Remove unused fields of ixgbevf_q_vector.
They aren't used since busy poll was moved to core code in commit
508aac6dee02 ("ixgbevf: get rid of custom busy polling code").

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Natalia Wochtman <natalia.wochtman@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://patch.msgid.link/20251125223632.1857532-5-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dibs: Remove KMSG_COMPONENT macro

The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message
catalog" from 2008 [1] which never made it upstream.

The macro was added to s390 code to allow for an out-of-tree patch which
used this to generate unique message ids. Also this out-of-tree doesn't
exist anymore.

The pattern of how the KMSG_COMPONENT is used was partially also used for
non s390 specific code, for whatever reasons.

Remove the macro in order to get rid of a pointless indirection.

[1] https://lwn.net/Articles/292650/

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Alexandra Winter <wintera@linux.ibm.com>
Link: https://patch.msgid.link/20251126142242.2124317-1-hca@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: thunder: convert to use .get_rx_ring_count

Convert the Cavium Thunder NIC VF driver to use the new .get_rx_ring_count
ethtool operation instead of implementing .get_rxnfc solely for handling
ETHTOOL_GRXRINGS command. This simplifies the code by removing the
switch statement and replacing it with a direct return of the queue
count.

The new callback provides the same functionality in a more direct way,
following the ongoing ethtool API modernization.

Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20251126-gxring_cavium-v1-1-a066c0c9e0c6@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: stmmac: fix rx limit check in stmmac_rx_zc()

The extra "count >= limit" check in stmmac_rx_zc() is redundant and
has no effect because the value of "count" doesn't change after the
while condition at this point.

However, it can change after "read_again:" label:

        while (count < limit) {
            ...

            if (count >= limit)
                break;
    read_again:
            ...
            /* XSK pool expects RX frame 1:1 mapped to XSK buffer */
            if (likely(status & rx_not_ls)) {
                xsk_buff_free(buf->xdp);
                buf->xdp = NULL;
                dirty++;
                count++;
                goto read_again;
            }
            ...

This patch addresses the same issue previously resolved in stmmac_rx()
by commit fa02de9e7588 ("net: stmmac: fix rx budget limit check").
The fix is the same: move the check after the label to ensure that it
bounds the goto loop.

Fixes: bba2556efad6 ("net: stmmac: Enable RX via AF_XDP zero-copy")
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20251126104327.175590-1-aleksei.kodanev@bell-sw.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'net-dsa-yt921x-fix-parsing-mib-attributes'

David Yang says:

====================
net: dsa: yt921x: Fix parsing MIB attributes
====================

Link: https://patch.msgid.link/20251126084024.2843851-1-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: yt921x: Use macros for MIB locations

Extract MIB constants into the header file to improve code style. This
patch will not change the behavior of the function.

Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20251126084024.2843851-3-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dsa: yt921x: Fix parsing MIB attributes

There are hard-to-find unused fields in the MIB table I didn't notice in
the example driver code, causing wrong interpretation of the MIB data.

For some 64-bit attributes, the current (wrong) implementation took the
correct lower 32 bits, but messed up the upper 32 bits, so it would work
accidentally until 32-bit overflows happen. Fix that too.

Fixes: 186623f4aa72 ("net: dsa: yt921x: Add support for Motorcomm YT921x")
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20251126084024.2843851-2-mmyangfl@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

r8169: add DASH support for RTL8127AP

This adds DASH support for chip RTL8127AP. Its mac version is
RTL_GIGA_MAC_VER_80. DASH is a standard for remote management of network
device, allowing out-of-band control.

Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
Link: https://patch.msgid.link/20251126055950.2050-1-javen_xu@realsil.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

if_ether.h: Clarify ethertype validity for gsw1xx dsa

This 0x88C3 is registered to Infineon Technologies Corporate Research ST
and are used by MaxLinear.
Infineon made a spin off called Lantiq.
Lantiq was acquired by Intel
MaxLinear acquired Intels Connected Home division.

The product FAQ from MaxLinear describes it's history from the F24S.
The driver for the gsw1xx is based on Lantiq showing it's similarities.

Ref https://standards-oui.ieee.org/ethertype/eth.txt

Signed-off-by: Peter Enderborg <Peter.Enderborg@axis.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: wwan: mhi_wwan_mbim: Avoid -Wflex-array-member-not-at-end warning

Use DEFINE_RAW_FLEX() to avoid a -Wflex-array-member-not-at-end warning.

Remove fixed-size array struct usb_cdc_ncm_dpe16 dpe16[2]; from struct
mbim_tx_hdr, so that flex-array member struct mbim_tx_hdr::ndp16.dpe16[]
ends last in this structure.

Compensate for this by using the DEFINE_RAW_FLEX() helper to declare the
on-stack struct instance that contains struct usb_cdc_ncm_ndp16 as a
member. Adjust the rest of the code, accordingly.

So, with these changes fix the following warning:

drivers/net/wwan/mhi_wwan_mbim.c:81:34: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end]

Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

gve: Fix race condition on tx->dropped_pkt update

The tx->dropped_pkt counter is a 64-bit integer that is incremented
directly. On 32-bit architectures, this operation is not atomic and
can lead to read/write tearing if a reader accesses the counter during
the update. This can result in incorrect values being reported for
dropped packets.

To prevent this potential data corruption, wrap the increment
operation with u64_stats_update_begin() and u64_stats_update_end().
This ensures that updates to the 64-bit counter are atomic, even on
32-bit systems, by using a sequence lock.

The u64_stats_sync API requires the writer to have exclusive access,
which is already provided in this context by the network stack's
serialization of the transmit path (net_device_ops::ndo_start_xmit
[1]) for a given queue.

[1]: https://www.kernel.org/doc/Documentation/networking/netdevices.txt

Signed-off-by: Max Yuan <maxyuan@google.com>
Reviewed-by: Jordan Rhee <jordanrhee@google.com>
Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: restore napi_consume_skb()'s NULL-handling

Commit e20dfbad8aab ("net: fix napi_consume_skb() with alien skbs")
added a skb->cpu check to napi_consume_skb(), before the point where
napi_consume_skb() validated skb is not NULL.

Add an explicit check to the early exit condition.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

eth: bnxt: make use of napi_consume_skb()

As those following recent changes from Eric know very well
using NAPI skb cache is crucial to achieve good perf, at
least on recent AMD platforms. Make sure bnxt feeds the skb
cache with Tx skbs.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netmem, devmem, tcp: access pp fields through @desc in net_iov

Convert all the legacy code directly accessing the pp fields in net_iov
to access them through @desc in net_iov.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netfilter: nf_tables: improve UAPI kernel-doc comments

In include/uapi/linux/netfilter/nf_tables.h,
correct the kernel-doc comments for mistyped enum names and enum values to
avoid these kernel-doc warnings and improve the documentation:

nf_tables.h:896: warning: Enum value 'NFT_EXTHDR_OP_TCPOPT' not described
in enum 'nft_exthdr_op'
nf_tables.h:896: warning: Excess enum value 'NFT_EXTHDR_OP_TCP' description
in 'nft_exthdr_op'

nf_tables.h:1210: warning: expecting prototype for enum
nft_flow_attributes. Prototype was for enum nft_offload_attributes instead

nf_tables.h:1428: warning: expecting prototype for enum nft_reject_code.
Prototype was for enum nft_reject_inet_code instead

(add beginning '@' to each enum value description:)
nf_tables.h:1493: warning: Enum value 'NFTA_TPROXY_FAMILY' not described
in enum 'nft_tproxy_attributes'
nf_tables.h:1493: warning: Enum value 'NFTA_TPROXY_REG_ADDR' not described
in enum 'nft_tproxy_attributes'
nf_tables.h:1493: warning: Enum value 'NFTA_TPROXY_REG_PORT' not described
in enum 'nft_tproxy_attributes'

nf_tables.h:1796: warning: expecting prototype for enum
nft_device_attributes. Prototype was for enum
nft_devices_attributes instead

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ip6t_srh: fix UAPI kernel-doc comments format

Fix the kernel-doc format for struct members to be "@member" instead of
"@ member" to avoid kernel-doc warnings.

Warning: ip6t_srh.h:60 struct member 'next_hdr' not described in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'hdr_len' not described in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'segs_left' not described
in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'last_entry' not described
in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'tag' not described in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'mt_flags' not described in 'ip6t_srh'
Warning: ip6t_srh.h:60 struct member 'mt_invflags' not described
in 'ip6t_srh'
Warning: ip6t_srh.h:93 struct member 'next_hdr' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'hdr_len' not described in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'segs_left' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'last_entry' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'tag' not described in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'psid_addr' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'nsid_addr' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'lsid_addr' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'psid_msk' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'nsid_msk' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'lsid_msk' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'mt_flags' not described
in 'ip6t_srh1'
Warning: ip6t_srh.h:93 struct member 'mt_invflags' not described
in 'ip6t_srh1'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

selftests: netfilter: nft_flowtable.sh: Add the capability to send IPv6 TCP traffic

Introduce the capability to send TCP traffic over IPv6 to
nft_flowtable netfilter selftest.

Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_connlimit: add support to object update operation

This is useful to update the limit or flags without clearing the
connections tracked. Use READ_ONCE() on packetpath as it can be modified
on controlplane.

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nft_connlimit: update the count if add was skipped

Connlimit expression can be used for all kind of packets and not only
for packets with connection state new. See this ruleset as example:

table ip filter {
        chain input {
                type filter hook input priority filter; policy accept;
                tcp dport 22 ct count over 4 counter
        }
}

Currently, if the connection count goes over the limit the counter will
count the packets. When a connection is closed, the connection count
won't decrement as it should because it is only updated for new
connections due to an optimization on __nf_conncount_add() that prevents
updating the list if the connection is duplicated.

To solve this problem, check whether the connection was skipped and if
so, update the list. Adjust count_tree() too so the same fix is applied
for xt_connlimit.

Fixes: 976afca1ceba ("netfilter: nf_conncount: Early exit in nf_conncount_lookup() and cleanup")
Closes: https://lore.kernel.org/netfilter/trinity-85c72a88-d762-46c3-be97-36f10e5d9796-1761173693813@3c-app-mailcom-bs12/
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: make nf_conncount_gc_list() to disable BH

For convenience when performing GC over the connection list, make
nf_conncount_gc_list() to disable BH. This unifies the behavior with
nf_conncount_add() and nf_conncount_count().

Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conncount: rework API to use sk_buff directly

When using nf_conncount infrastructure for non-confirmed connections a
duplicated track is possible due to an optimization introduced since
commit d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC").

In order to fix this introduce a new conncount API that receives
directly an sk_buff struct. It fetches the tuple and zone and the
corresponding ct from it. It comes with both existing conncount variants
nf_conncount_count_skb() and nf_conncount_add_skb(). In addition remove
the old API and adjust all the users to use the new one.

This way, for each sk_buff struct it is possible to check if there is a
ct present and already confirmed. If so, skip the add operation.

Fixes: d265929930e2 ("netfilter: nf_conncount: reduce unnecessary GC")
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

selftests: netfilter: nft_flowtable.sh: Add IPIP flowtable selftest

Introduce specific selftest for IPIP flowtable SW acceleration in
nft_flowtable.sh

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: Add IPIP tx sw acceleration

Introduce sw acceleration for tx path of IPIP tunnels relying on the
netfilter flowtable infrastructure.
This patch introduces basic infrastructure to accelerate other tunnel
types (e.g. IP6IP6).
IPIP sw tx acceleration can be tested running the following scenario where
the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
tunnel is used to access a remote site (using eth1 as the underlay device):

ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)

$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.2/24 scope global eth0
       valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 scope global eth1
       valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 192.168.1.1 peer 192.168.1.2
    inet 192.168.100.1/24 scope global tun0
       valid_lft forever preferred_lft forever

$ip route show
default via 192.168.100.2 dev tun0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1

$nft list ruleset
table inet filter {
        flowtable ft {
                hook ingress priority filter
                devices = { eth0, eth1 }
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                meta l4proto { tcp, udp } flow add @ft
        }
}

Reproducing the scenario described above using veths I got the following
results:
- TCP stream trasmitted into the IPIP tunnel:
  - net-next: (baseline)                ~ 85Gbps
  - net-next + IPIP flowtable support:  ~102Gbps

Co-developed-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: Add IPIP rx sw acceleration

Introduce sw acceleration for rx path of IPIP tunnels relying on the
netfilter flowtable infrastructure. Subsequent patches will add sw
acceleration for IPIP tunnels tx path.
This series introduces basic infrastructure to accelerate other tunnel
types (e.g. IP6IP6).
IPIP rx sw acceleration can be tested running the following scenario where
the traffic is forwarded between two NICs (eth0 and eth1) and an IPIP
tunnel is used to access a remote site (using eth1 as the underlay device):

ETH0 -- TUN0 <==> ETH1 -- [IP network] -- TUN1 (192.168.100.2)

$ip addr show
6: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:00:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.2/24 scope global eth0
       valid_lft forever preferred_lft forever
7: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:11:22:33:11:55 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 scope global eth1
       valid_lft forever preferred_lft forever
8: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 192.168.1.1 peer 192.168.1.2
    inet 192.168.100.1/24 scope global tun0
       valid_lft forever preferred_lft forever

$ip route show
default via 192.168.100.2 dev tun0
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.2
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.1
192.168.100.0/24 dev tun0 proto kernel scope link src 192.168.100.1

$nft list ruleset
table inet filter {
        flowtable ft {
                hook ingress priority filter
                devices = { eth0, eth1 }
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                meta l4proto { tcp, udp } flow add @ft
        }
}

Reproducing the scenario described above using veths I got the following
results:
- TCP stream received from the IPIP tunnel:
  - net-next: (baseline) ~ 71Gbps
  - net-next + IPIP flowtbale support: ~101Gbps

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: use tuple address to calculate next hop

This simplifies IPIP tunnel support coming in follow up patches.

No function changes are intended.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: remove hw_ifidx

hw_ifidx was originally introduced to store the real netdevice as a
requirement for the hardware offload support in:

73f97025a972 ("netfilter: nft_flow_offload: use direct xmit if hardware offload is enabled")

Since ("netfilter: flowtable: consolidate xmit path"), ifidx and
hw_ifidx points to the real device in the xmit path, remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: inline pppoe encapsulation in xmit path

Push the pppoe header from the flowtable xmit path, inlining is faster
than the original xmit path because it can avoid some locking.

This is based on a patch originally written by wenxu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: inline vlan encapsulation in xmit path

Push the vlan header from the flowtable xmit path, instead of passing
the packet to the vlan device.

This is based on a patch originally written by wenxu.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: consolidate xmit path

Use dev_queue_xmit() for the XMIT_NEIGH case. Store the interface index
of the real device behind the vlan/pppoe device, this introduces an
extra lookup for the real device in the xmit path because rt->dst.dev
provides the vlan/pppoe device.

XMIT_NEIGH now looks more similar to XMIT_DIRECT but the check for stale
dst and the neighbour lookup still remain in place which is convenient
to deal with network topology changes.

Note that nft_flow_route() needs to relax the check for _XMIT_NEIGH so
the existing basic xfrm offload (which only works in one direction) does
not break.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: move path discovery infrastructure to its own file

This file contains the path discovery that is run from the forward chain
for the packet offloading the flow into the flowtable. This consists
of a series of calls to dev_fill_forward_path() for each device stack.

More topologies may be supported in the future, so move this code to its
own file to separate it from the nftables flow_offload expression.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: flowtable: check for maximum number of encapsulations in bridge vlan

Add a sanity check to skip path discovery if the maximum number of
encapsulation is reached. While at it, check for underflow too.

Fixes: 26267bf9bb57 ("netfilter: flowtable: bridge vlan hardware offload and switchdev")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Conflicts:

net/xdp/xsk.c
  0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number")
  8da7bea7db69 ("xsk: add indirect call for xsk_destruct_skb")
  30ed05adca4a ("xsk: use a smaller new lock for shared pool case")
https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au
https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth and CAN. No known outstanding
  regressions.

  Current release - regressions:

   - mptcp: initialize rcv_mss before calling tcp_send_active_reset()

   - eth: mlx5e: fix validation logic in rate limiting

  Previous releases - regressions:

   - xsk: avoid data corruption on cq descriptor number

   - bluetooth:
       - prevent race in socket write iter and sock bind
       - fix not generating mackey and ltk when repairing

   - can:
       - kvaser_usb: fix potential infinite loop in command parsers
       - rcar_canfd: fix CAN-FD mode as default

   - eth:
       - veth: reduce XDP no_direct return section to fix race
       - virtio-net: avoid unnecessary checksum calculation on guest RX

  Previous releases - always broken:

   - sched: fix TCF_LAYER_TRANSPORT handling in tcf_get_base_ptr()

   - bluetooth: mediatek: fix kernel crash when releasing iso interface

   - vhost: rewind next_avail_head while discarding descriptors

   - eth:
       - r8169: fix RTL8127 hang on suspend/shutdown
       - aquantia: add missing descriptor cache invalidation on ATL2

   - dsa: microchip: fix resource releases in error path"

* tag 'net-6.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
  mptcp: Initialise rcv_mss before calling tcp_send_active_reset() in mptcp_do_fastclose().
  net: fec: do not register PPS event for PEROUT
  net: fec: do not allow enabling PPS and PEROUT simultaneously
  net: fec: do not update PEROUT if it is enabled
  net: fec: cancel perout_timer when PEROUT is disabled
  net: mctp: unconditionally set skb->dev on dst output
  net: atlantic: fix fragment overflow handling in RX path
  MAINTAINERS: separate VIRTIO NET DRIVER and add netdev
  virtio-net: avoid unnecessary checksum calculation on guest RX
  eth: fbnic: Fix counter roll-over issue
  mptcp: clear scheduled subflows on retransmit
  net: dsa: sja1105: fix SGMII linking at 10M or 100M but not passing traffic
  s390/net: list Aswin Karuvally as maintainer
  net: wwan: mhi: Keep modem name match with Foxconn T99W640
  vhost: rewind next_avail_head while discarding descriptors
  net/sched: em_canid: fix uninit-value in em_canid_match
  can: rcar_canfd: Fix CAN-FD mode as default
  xsk: avoid data corruption on cq descriptor number
  r8169: fix RTL8127 hang on suspend/shutdown
  net: sxgbe: fix potential NULL dereference in sxgbe_rx()
  ...

Merge tag 'platform-drivers-x86-v6.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull platform driver fixes from Ilpo Järvinen:

- arm64/thinkpad-t14s-ec:
    - Fix IRQ race condition
    - Sleep after EC access

- intel/punit_ipc: Fix memory corruption

* tag 'platform-drivers-x86-v6.18-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
  platform/x86: intel: punit_ipc: fix memory corruption
  platform: arm64: thinkpad-t14s-ec: sleep after EC access
  platform: arm64: thinkpad-t14s-ec: fix IRQ race condition

Merge tag 'linux-can-next-for-6.19-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2025-11-26

this is a pull request of 27 patches for net-next/main.

The first 17 patches are by Vincent Mailhol and Oliver Hartkopp and
add CAN XL support to the CAN netlink interface.

Geert Uytterhoeven and Biju Das provide 7 patches for the rcar_canfd
driver to add suspend/resume support.

The next 2 patches are by Markus Schneider-Pargmann and add them as
the m_can maintainer.

Conor Dooley's patch updates the mpfs-can DT bindungs.

linux-can-next-for-6.19-20251126

* tag 'linux-can-next-for-6.19-20251126' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (27 commits)
  dt-bindings: can: mpfs: document resets
  MAINTAINERS: Simplify m_can section
  MAINTAINERS: Add myself as m_can maintainer
  can: rcar_canfd: Add suspend/resume support
  can: rcar_canfd: Convert to DEFINE_SIMPLE_DEV_PM_OPS()
  can: rcar_canfd: Invert CAN clock and close_candev() order
  can: rcar_canfd: Extract rcar_canfd_global_{,de}init()
  can: rcar_canfd: Use devm_clk_get_optional() for RAM clk
  can: rcar_canfd: Invert global vs. channel teardown
  can: rcar_canfd: Invert reset assert order
  can: dev: print bitrate error with two decimal digits
  can: raw: instantly reject unsupported CAN frames
  can: add dummy_can driver
  can: calc_bittiming: add can_calc_sample_point_pwm()
  can: calc_bittiming: add can_calc_sample_point_nrz()
  can: calc_bittiming: replace misleading "nominal" by "reference"
  can: netlink: add PWM netlink interface
  can: calc_bittiming: add PWM calculation
  can: bittiming: add PWM validation
  can: bittiming: add PWM parameters
  ...
====================

Link: https://patch.msgid.link/20251126120106.154635-1-mkl@pengutronix.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

mptcp: Initialise rcv_mss before calling tcp_send_active_reset() in mptcp_do_fastclose().

syzbot reported divide-by-zero in __tcp_select_window() by
MPTCP socket. [0]

We had a similar issue for the bare TCP and fixed in commit
499350a5a6e7 ("tcp: initialize rcv_mss to TCP_MIN_MSS instead
of 0").

Let's apply the same fix to mptcp_do_fastclose().

[0]:
Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 6068 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
RIP: 0010:__tcp_select_window+0x824/0x1320 net/ipv4/tcp_output.c:3336
Code: ff ff ff 44 89 f1 d3 e0 89 c1 f7 d1 41 01 cc 41 21 c4 e9 a9 00 00 00 e8 ca 49 01 f8 e9 9c 00 00 00 e8 c0 49 01 f8 44 89 e0 99 <f7> 7c 24 1c 41 29 d4 48 bb 00 00 00 00 00 fc ff df e9 80 00 00 00
RSP: 0018:ffffc90003017640 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88807b469e40
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003017730 R08: ffff888033268143 R09: 1ffff1100664d028
R10: dffffc0000000000 R11: ffffed100664d029 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 000055557faa0500(0000) GS:ffff888126135000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f64a1912ff8 CR3: 0000000072122000 CR4: 00000000003526f0
Call Trace:
<TASK>
tcp_select_window net/ipv4/tcp_output.c:281 [inline]
__tcp_transmit_skb+0xbc7/0x3aa0 net/ipv4/tcp_output.c:1568
tcp_transmit_skb net/ipv4/tcp_output.c:1649 [inline]
tcp_send_active_reset+0x2d1/0x5b0 net/ipv4/tcp_output.c:3836
mptcp_do_fastclose+0x27e/0x380 net/mptcp/protocol.c:2793
mptcp_disconnect+0x238/0x710 net/mptcp/protocol.c:3253
mptcp_sendmsg_fastopen+0x2f8/0x580 net/mptcp/protocol.c:1776
mptcp_sendmsg+0x1774/0x1980 net/mptcp/protocol.c:1855
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg+0xe5/0x270 net/socket.c:742
__sys_sendto+0x3bd/0x520 net/socket.c:2244
__do_sys_sendto net/socket.c:2251 [inline]
__se_sys_sendto net/socket.c:2247 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2247
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xfa0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f66e998f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffff9acedb8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f66e9be5fa0 RCX: 00007f66e998f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007ffff9acee10 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007f66e9be5fa0 R14: 00007f66e9be5fa0 R15: 0000000000000006
</TASK>

Fixes: ae155060247b ("mptcp: fix duplicate reset on fastclose")
Reported-by: syzbot+3a92d359bc2ec6255a33@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/69260882.a70a0220.d98e3.00b4.GAE@google.com/
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251125195331.309558-1-kuniyu@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: stmmac: dwmac: Disable flushing frames on Rx Buffer Unavailable

In Store and Forward mode, flushing frames when the receive buffer is
unavailable, can cause the MTL Rx FIFO to go out of sync. This results
in buffering of a few frames in the FIFO without triggering Rx DMA
from transferring the data to the system memory until another packet
is received. Once the issue happens, for a ping request, the packet is
forwarded to the system memory only after we receive another packet
and hece we observe a latency equivalent to the ping interval.

64 bytes from 192.168.2.100: seq=1 ttl=64 time=1000.344 ms

Also, we can observe constant gmacgrp_debug register value of
0x00000120, which indicates "Reading frame data".

The issue is not reproducible after disabling frame flushing when Rx
buffer is unavailable. But in that case, the Rx DMA enters a suspend
state due to buffer unavailability. To resume operation, software
must write to the receive_poll_demand register after adding new
descriptors, which reactivates the Rx DMA.

This issue is observed in the socfpga platforms which has dwmac1000 IP
like Arria 10, Cyclone V and Agilex 7. Issue is reproducible after
running iperf3 server at the DUT for UDP lower packet sizes.

Signed-off-by: Rohan G Thomas <rohan.g.thomas@altera.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@altera.com>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/20251126-a10_ext_fix-v1-1-d163507f646f@altera.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

selftests: af_unix: remove unused stdlib.h include

The unix_connreset.c test included <stdlib.h>, but no symbol from that
header is used. This causes a fatal build error under certain
linux-next configurations where stdlib.h is not available.

Remove the unused include to fix the build.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202511221800.hcgCKvVa-lkp@intel.com/
Signed-off-by: Sunday Adelodun <adelodunolaoluwa@yahoo.com>
Link: https://patch.msgid.link/20251125113648.25903-1-adelodunolaoluwa@yahoo.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-fec-fix-some-ptp-related-issues'

Wei Fang says:

====================
net: fec: fix some PTP related issues

There are some issues which were introduced by the commit 350749b909bf
("net: fec: Add support for periodic output signal of PPS"). See each
patch for more details.
====================

Link: https://patch.msgid.link/20251125085210.1094306-1-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: fec: do not register PPS event for PEROUT

There are currently two situations that can trigger the PTP interrupt,
one is the PPS event, the other is the PEROUT event. However, the irq
handler fec_pps_interrupt() does not check the irq event type and
directly registers a PPS event into the system, but the event may be
a PEROUT event. This is incorrect because PEROUT is an output signal,
while PPS is the input of the kernel PPS system. Therefore, add a check
for the event type, if pps_enable is true, it means that the current
event is a PPS event, and then the PPS event is registered.

Fixes: 350749b909bf ("net: fec: Add support for periodic output signal of PPS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251125085210.1094306-5-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: fec: do not allow enabling PPS and PEROUT simultaneously

In the current driver, PPS and PEROUT use the same channel to generate
the events, so they cannot be enabled at the same time. Otherwise, the
later configuration will overwrite the earlier configuration. Therefore,
when configuring PPS, the driver will check whether PEROUT is enabled.
Similarly, when configuring PEROUT, the driver will check whether PPS
is enabled.

Fixes: 350749b909bf ("net: fec: Add support for periodic output signal of PPS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251125085210.1094306-4-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: fec: do not update PEROUT if it is enabled

If the previously set PEROUT is already active, updating it will cause
the new PEROUT to start immediately instead of at the specified time.
This is because fep->reload_period is updated whithout check whether
the PEROUT is enabled, and the old PEROUT is not disabled. Therefore,
the pulse period will be updated immediately in the pulse interrupt
handler fec_pps_interrupt().

Currently, the driver does not support directly updating PEROUT and it
will make the logic be more complicated. To fix the current issue, add
a check before enabling the PEROUT, the driver will return an error if
PEROUT is enabled. If users wants to update a new PEROUT, they should
disable the old PEROUT first.

Fixes: 350749b909bf ("net: fec: Add support for periodic output signal of PPS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251125085210.1094306-3-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: fec: cancel perout_timer when PEROUT is disabled

The PEROUT allows the user to set a specified future time to output the
periodic signal. If the future time is far from the current time, the FEC
driver will use hrtimer to configure PEROUT one second before the future
time. However, the hrtimer will not be canceled if the PEROUT is disabled
before the hrtimer expires. So the PEROUT will be configured when the
hrtimer expires, which is not as expected. Therefore, cancel the hrtimer
in fec_ptp_pps_disable() to fix this issue.

Fixes: 350749b909bf ("net: fec: Add support for periodic output signal of PPS")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20251125085210.1094306-2-wei.fang@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: mctp: unconditionally set skb->dev on dst output

On transmit, we are currently relying on skb->dev being set by
mctp_local_output() when we first set up the skb destination fields.
However, forwarded skbs do not use the local_output path, so will retain
their incoming netdev as their ->dev on tx. This does not work when
we're forwarding between interfaces.

Set skb->dev unconditionally in the transmit path, to allow for proper
forwarding.

We keep the skb->dev initialisation in mctp_local_output(), as we use it
for fragmentation.

Fixes: 269936db5eb3 ("net: mctp: separate routing database from routing operations")
Suggested-by: Vince Chang <vince_chang@aspeedtech.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/20251125-dev-forward-v1-1-54ecffcd0616@codeconstruct.com.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'net-phy-add-support-for-fbnic-phy-w-25g-50g-and-100g-support'

Alexander Duyck says:

====================
net: phy: Add support for fbnic PHY w/ 25G, 50G, and 100G support

To transition the fbnic driver to using the XPCS driver we need to address
the fact that we need a representation for the FW managed PMD that is
actually a SerDes PHY to handle link bouncing during link training.

This patch set introduces the necessary bits to the XPCS driver code to
enable it to read 25G, 50G, and 100G speeds from the PCS ctrl1 register,
and adds support for the approriate interfaces.

The rest of this patch set enables the changes to fbnic to make use of
these interfaces and expose a PMD that can provide a necessary link delay
to avoid link flapping in the event that a cable is disconnected and
reconnected, and to correctly expose the count for the link down events.

With this we have the basic groundwork laid as with this all the bits and
pieces are in place in terms of reading the configuration. The general plan
for follow-on patch sets is to start looking at enabling changing the
configuration in environments where that is supported.
====================

Link: https://patch.msgid.link/176374310349.959489.838154632023183753.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Paolo Abeni <pabeni@redhat.com>