Jakub Kicinski [Fri, 21 Feb 2025 02:51:41 +0000 (18:51 -0800)]
selftests: drv-net: test XDP, HDS auto and the ioctl path
Test XDP and HDS interaction. While at it add a test for using the IOCTL,
as that turned out to be the real culprit.
Testing bnxt:
# NETIF=eth0 ./ksft-net-drv/drivers/net/hds.py
KTAP version 1
1..12
ok 1 hds.get_hds
ok 2 hds.get_hds_thresh
ok 3 hds.set_hds_disable # SKIP disabling of HDS not supported by the device
ok 4 hds.set_hds_enable
ok 5 hds.set_hds_thresh_zero
ok 6 hds.set_hds_thresh_max
ok 7 hds.set_hds_thresh_gt
ok 8 hds.set_xdp
ok 9 hds.enabled_set_xdp
ok 10 hds.ioctl
ok 11 hds.ioctl_set_xdp
ok 12 hds.ioctl_enabled_set_xdp
# Totals: pass:11 fail:0 xfail:0 xpass:0 skip:1 error:0
and netdevsim:
# ./ksft-net-drv/drivers/net/hds.py
KTAP version 1
1..12
ok 1 hds.get_hds
ok 2 hds.get_hds_thresh
ok 3 hds.set_hds_disable
ok 4 hds.set_hds_enable
ok 5 hds.set_hds_thresh_zero
ok 6 hds.set_hds_thresh_max
ok 7 hds.set_hds_thresh_gt
ok 8 hds.set_xdp
ok 9 hds.enabled_set_xdp
ok 10 hds.ioctl
ok 11 hds.ioctl_set_xdp
ok 12 hds.ioctl_enabled_set_xdp
# Totals: pass:12 fail:0 xfail:0 xpass:0 skip:0 error:0
Netdevsim needs a sane default for tx/rx ring size.
ethtool 6.11 is needed for the --disable-netlink option.
Jakub Kicinski [Fri, 21 Feb 2025 02:51:40 +0000 (18:51 -0800)]
net: ethtool: fix ioctl confusing drivers about desired HDS user config
The legacy ioctl path does not have support for extended attributes.
So we issue a GET to fetch the current settings from the driver,
in an attempt to keep them unchanged. HDS is a bit "special" as
the GET only returns on/off while the SET takes a "ternary" argument
(on/off/default). If the driver was in the "default" setting -
executing the ioctl path binds it to on or off, even tho the user
did not intend to change HDS config.
Factor the relevant logic out of the netlink code and reuse it.
Fixes: 87c8f8496a05 ("bnxt_en: add support for tcp-data-split ethtool command") Acked-by: Stanislav Fomichev <sdf@fomichev.me> Tested-by: Daniel Xu <dxu@dxuuu.xyz> Tested-by: Taehee Yoo <ap420073@gmail.com> Link: https://patch.msgid.link/20250221025141.1132944-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Ingo Molnar [Mon, 24 Feb 2025 19:40:17 +0000 (20:40 +0100)]
Merge branch into tip/master: 'perf/urgent'
# New commits in perf/urgent: bddf10d26e6e ("uprobes: Reject the shared zeropage in uprobe_write_opcode()") 2016066c6619 ("perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list") 0fe8813baf4b ("perf/core: Add RCU read lock protection to perf_iterate_ctx()")
The cause is that zero pfn is set to the PTE without increasing the RSS
count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
not increase accordingly. Then, the operation on the same pfn is performed
in uprobe_write_opcode()->__replace_page() to unconditional decrease the
RSS count and old_folio's refcount.
Therefore, two bugs are introduced:
1. The RSS count is incorrect, when process exit, the check_mm() report
error "Bad rss-count".
2. The reserved folio (zero folio) is freed when folio->refcount is zero,
then free_pages_prepare->free_page_is_bad() report error
"Bad page state".
There is more, the following warning could also theoretically be triggered:
Luo Gengkun [Wed, 22 Jan 2025 07:33:56 +0000 (07:33 +0000)]
perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list
Syskaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
perf_event_swap_task_ctx_data(). vmcore shows that two lists have the same
perf_event_pmu_context, but not in the same order.
The problem is that the order of pmu_ctx_list for the parent is impacted by
the time when an event/PMU is added. While the order for a child is
impacted by the event order in the pinned_groups and flexible_groups. So
the order of pmu_ctx_list in the parent and child may be different.
To fix this problem, insert the perf_event_pmu_context to its proper place
after iteration of the pmu_ctx_list.
The follow testcase can trigger above warning:
# perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
# perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out
pid = fork();
if (pid == -1) {
printf("fork error\n");
return;
}
if (pid == 0) {
while (1) {
count++;
}
} else {
while (1) {
count++;
}
}
}
The testcase first opens an LBR event, so it will allocate task_ctx_data,
and then open tracepoint and software events, so the parent context will
have 3 different perf_event_pmu_contexts. On inheritance, child ctx will
insert the perf_event_pmu_context in another order and the warning will
trigger.
Paolo Bonzini [Mon, 24 Feb 2025 18:20:30 +0000 (13:20 -0500)]
Merge tag 'kvm-riscv-fixes-6.14-1' of https://github.com/kvm-riscv/linux into HEAD
KVM/riscv fixes for 6.14, take #1
- Fix hart status check in SBI HSM extension
- Fix hart suspend_type usage in SBI HSM extension
- Fix error returned by SBI IPI and TIME extensions for
unsupported function IDs
- Fix suspend_type usage in SBI SUSP extension
- Remove unnecessary vcpu kick after injecting interrupt
via IMSIC guest file
Paolo Bonzini [Mon, 24 Feb 2025 18:20:00 +0000 (13:20 -0500)]
Merge tag 'kvmarm-fixes-6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.14, take #3
- Fix TCR_EL2 configuration to not use the ASID in TTBR1_EL2
and not mess-up T1SZ/PS by using the HCR_EL2.E2H==0 layout.
- Bring back the VMID allocation to the vcpu_load phase, ensuring
that we only setup VTTBR_EL2 once on VHE. This cures an ugly
race that would lead to running with an unallocated VMID.
Breno Leitao [Fri, 17 Jan 2025 14:41:07 +0000 (06:41 -0800)]
perf/core: Add RCU read lock protection to perf_iterate_ctx()
The perf_iterate_ctx() function performs RCU list traversal but
currently lacks RCU read lock protection. This causes lockdep warnings
when running perf probe with unshare(1) under CONFIG_PROVE_RCU_LIST=y:
WARNING: suspicious RCU usage
kernel/events/core.c:8168 RCU-list traversed in non-reader section!!
This protection was previously present but was removed in commit bd2756811766 ("perf: Rewrite core context handling"). Add back the
necessary rcu_read_lock()/rcu_read_unlock() pair around
perf_iterate_ctx() call in perf_event_exec().
[ mingo: Use scoped_guard() as suggested by Peter ]
The ES8328 codec driver, which is also used for the ES8388 chip that
appears to have an identical register map, claims that the output can
either take the route from DAC->Mixer->Output or through DAC->Output
directly. To the best of what I could find, this is not true, and
creates problems.
Without DACCONTROL17 bit index 7 set for the left channel, as well as
DACCONTROL20 bit index 7 set for the right channel, I cannot get any
analog audio out on Left Out 2 and Right Out 2 respectively, despite the
DAPM routes claiming that this should be possible. Furthermore, the same
is the case for Left Out 1 and Right Out 1, showing that those two don't
have a direct route from DAC to output bypassing the mixer either.
Those control bits toggle whether the DACs are fed (stale bread?) into
their respective mixers. If one "unmutes" the mixer controls in
alsamixer, then sure, the audio output works, but if it doesn't work
without the mixer being fed the DAC input then evidently it's not a
direct output from the DAC.
ES8328/ES8388 are seemingly not alone in this. ES8323, which uses a
separate driver for what appears to be a very similar register map,
simply flips those two bits on in its probe function, and then pretends
there is no power management whatsoever for the individual controls.
Fair enough.
My theory as to why nobody has noticed this up to this point is that
everyone just assumes it's their fault when they had to unmute an
additional control in ALSA.
Fix this in the es8328 driver by removing the erroneous direct route,
then get rid of the playback switch controls and have those bits tied to
the mixer's widget instead, which until now had no register to play
with.
Linus Walleij [Thu, 20 Feb 2025 18:48:15 +0000 (19:48 +0100)]
net: dsa: rtl8366rb: Fix compilation problem
When the kernel is compiled without LED framework support the
rtl8366rb fails to build like this:
rtl8366rb.o: in function `rtl8366rb_setup_led':
rtl8366rb.c:953:(.text.unlikely.rtl8366rb_setup_led+0xe8):
undefined reference to `led_init_default_state_get'
rtl8366rb.c:980:(.text.unlikely.rtl8366rb_setup_led+0x240):
undefined reference to `devm_led_classdev_register_ext'
As this is constantly coming up in different randconfig builds,
bite the bullet and create a separate file for the offending
code, split out a header with all stuff needed both in the
core driver and the leds code.
Add a new bool Kconfig option for the LED compile target, such
that it depends on LEDS_CLASS=y || LEDS_CLASS=RTL8366RB
which make LED support always available when LEDS_CLASS is
compiled into the kernel and enforce that if the LEDS_CLASS
is a module, then the RTL8366RB driver needs to be a module
as well so that modprobe can resolve the dependencies.
Fixes: 32d617005475 ("net: dsa: realtek: add LED drivers for rtl8366rb") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502070525.xMUImayb-lkp@intel.com/ Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ard Biesheuvel [Sun, 23 Feb 2025 15:48:54 +0000 (16:48 +0100)]
efivarfs: Defer PM notifier registration until .fill_super
syzbot reports an issue that turns out to be caused by the fact that the
efivarfs PM notifier may be invoked before the efivarfs_fs_info::sb
field is populated, resulting in a NULL deference.
So defer the registration until efivarfs_fill_super() is invoked.
Patrick Rudolph [Fri, 21 Feb 2025 11:15:16 +0000 (12:15 +0100)]
efi/cper: Fix cper_arm_ctx_info alignment
According to the UEFI Common Platform Error Record appendix, the
processor context information structure is a variable length structure,
but "is padded with zeros if the size is not a multiple of 16 bytes".
Currently this isn't honoured, causing all but the first structure to
be garbage when printed. Thus align the size to be a multiple of 16.
Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Patrick Rudolph [Fri, 21 Feb 2025 08:12:42 +0000 (09:12 +0100)]
efi/cper: Fix cper_ia_proc_ctx alignment
According to the UEFI Common Platform Error Record appendix, the
IA32/X64 Processor Context Information Structure is a variable length
structure, but "is padded with zeros if the size is not a multiple
of 16 bytes".
Currently this isn't honoured, causing all but the first structure to
be garbage when printed. Thus align the size to be a multiple of 16.
Signed-off-by: Patrick Rudolph <patrick.rudolph@9elements.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
When there is a failure during bind QP, the cleanup flow destroys the
counter regardless if it is the one that created it or not, which is
problematic since if it isn't the one that created it, that counter could
still be in use.
Fix that by destroying the counter only if it was created during this call.
Teresa Remmet [Tue, 18 Feb 2025 07:41:52 +0000 (08:41 +0100)]
arm64: dts: imx8mm-phyboard-polis: Add support for PEB-AV-10
PEB-AV-10 is an Audio/Video extension module which extends
phyBOARD-Polis i.MX8MM.
With MIPI DSI to LVDS bridge already populated on SoM the PEB-AV-10 adds
support for:
- connecting 10" display,
- audio with TLV320AIC and
- EEPROM.
arm64: dts: imx8mm-phyboard-polis: Assign missing regulator for bluetooth
Assign the missing regulator to the bluetooth node. Absence of
this regulator triggers the warning message from kernel as driver
uses a fallback dummy regulator when there is no regulator assigned.
arm64: dts: imx8mm-phycore-som: Assign regulator for dsi to lvds bridge
Add a missing voltage regulator of 1.8v to the sn65dsi83
(dsi_lvds bridge) node. Due to the absence of this regulator, a fallback
dummy regulator is used and that triggers a warning message from the
kernel. Assigning the appropriate regulator avoids the warning.
Remove device tree property "fsl,magic-packet" as WoL is not working
on the SoM and so not required. This also saves a significant amount of
power during suspend as the ethernet phy is not powered down otherwise.
Andrej Picej [Tue, 18 Feb 2025 07:41:43 +0000 (08:41 +0100)]
arm64: dts: imx8mm-phycore-som: Fix bluetooth wakeup source
Not using pull-up on the host wake-up line triggers the wake up
immediately after device enters suspend. Fix this by enabling internal
pull-up and setting interrupt triggering on the falling edge.
arm64: dts: freescale: imx8mm-verdin-dahlia: add Microphone Jack to sound card
The simple-audio-card's microphone widget currently connects to the
headphone jack. Routing the microphone input to the microphone jack
allows for independent operation of the microphone and headphones.
This resolves the following boot-time kernel log message, which
indicated a conflict when the microphone and headphone functions were
not separated:
debugfs: File 'Headphone Jack' in directory 'dapm' already present!
Fixes: 6a57f224f734 ("arm64: dts: freescale: add initial support for verdin imx8m mini") Signed-off-by: Stefan Eichenberger <stefan.eichenberger@toradex.com> Reviewed-by: Francesco Dolcini <francesco.dolcini@toradex.com> Cc: <stable@vger.kernel.org> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
arm64: dts: freescale: imx8mp-verdin-dahlia: add Microphone Jack to sound card
The simple-audio-card's microphone widget currently connects to the
headphone jack. Routing the microphone input to the microphone jack
allows for independent operation of the microphone and headphones.
This resolves the following boot-time kernel log message, which
indicated a conflict when the microphone and headphone functions were
not separated:
debugfs: File 'Headphone Jack' in directory 'dapm' already present!
arm64: dts: freescale: imx8mm-verdin: Remove LVDS panel and backlight
Remove LVDS panel and backlight nodes from the Verdin iMX8M Mini SoM
dtsi file, those two hardware components are not part of the SoM,
therefore they should not be present in this file.
This is solving a dtb checker warning about panel-lvds compatible.
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Peng Fan [Fri, 14 Feb 2025 08:47:51 +0000 (16:47 +0800)]
soc: imx8m: Unregister cpufreq and soc dev in cleanup path
Unregister the cpufreq device and soc device when resource unwinding,
otherwise there will be warning when do removing test:
sysfs: cannot create duplicate filename '/devices/platform/imx-cpufreq-dt'
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.13.0-rc1-next-20241204
Hardware name: NXP i.MX8MPlus EVK board (DT)
Fixes: 9cc832d37799 ("soc: imx8m: Probe the SoC driver as platform driver") Cc: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Marco Felsch <m.felsch@pengutronix.de> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
Fabio Estevam [Mon, 3 Feb 2025 17:30:20 +0000 (14:30 -0300)]
ARM: dts: imx28-sps1: Fix GPIO LEDs description
According to leds-gpio.yaml, the LED nodes should not contain
unit addresses. Remove them.
Also, 'default-trigger' is not a valid property. Change it to
'linux,default-trigger'.
These changes fix the following dt-schema warnings:
led@1: Unevaluated properties are not allowed ('reg' was unexpected)
led@2: Unevaluated properties are not allowed ('reg' was unexpected)
led@3: Unevaluated properties are not allowed ('default-trigger', 'reg' were unexpected)
leds: '#address-cells', '#size-cells' do not match any of the regexes: '(^led-[0-9a-f]$|led)', 'pinctrl-[0-9]+'
Fabio Estevam [Mon, 3 Feb 2025 12:05:11 +0000 (09:05 -0300)]
ARM: dts: vf610-bk4: Use the more specific "lwn,bk4-spi"
Currently, the compatible string used for the spidev device is "lwn,bk4",
which is the same as the board compatible string documented at fsl.yaml.
This causes several dt-schema warnings:
make dtbs_check DT_SCHEMA_FILES=fsl.yaml
...
['lwn,bk4'] is too short
'lwn,bk4' is not one of ['tq,imx8dxp-tqma8xdp-mba8xx']
'lwn,bk4' is not one of ['tq,imx8qxp-tqma8xqp-mba8xx']
'lwn,bk4' is not one of ['armadeus,imx1-apf9328', 'fsl,imx1ads']
Use a more specific "lwn,bk4-spi" compatible string to fix the problem.
Brendan Jackman [Thu, 20 Feb 2025 12:23:40 +0000 (12:23 +0000)]
scripts/gdb: add $lx_per_cpu_ptr()
We currently have $lx_per_cpu() which works fine for stuff that kernel
code would access via per_cpu(). But this doesn't work for stuff that
kernel code accesses via per_cpu_ptr():
(gdb) p $lx_per_cpu(node_data[1].node_zones[2]->per_cpu_pageset)
Cannot access memory at address 0xffff11105fbd6c28
This is because we take the address of the pointer and use that as the
offset, instead of using the stored value.
Add a GDB version that mirrors the kernel API, which uses the pointer
value.
To be consistent with per_cpu_ptr(), we need to return the pointer value
instead of dereferencing it for the user. Therefore, move the existing
dereference out of the per_cpu() Python helper and do that only in the
$lx_per_cpu() implementation.
Michal Koutný [Fri, 21 Feb 2025 17:02:49 +0000 (18:02 +0100)]
pid: optional first-fit pid allocation
Noone would need to use this allocation strategy (it's slower, pid
numbers collide sooner). Its primary purpose are pid namespaces in
conjunction with pids.max cgroup limit which keeps (virtual) pid numbers
below the given limit. This is for 32-bit userspace programs that may
not work well with pid numbers above 65536.
Michal Koutný [Fri, 21 Feb 2025 17:02:48 +0000 (18:02 +0100)]
Revert "pid: allow pid_max to be set per pid namespace"
Patch series "Alternative "pid_max" for 32-bit userspace".
pid_max is sort of a legacy limit (its value and partially the concept
too, given the existence of pids cgroup controller). It is tempting to
make the pid_max value part of a pid namespace to provide compat
environment for 32-bit applications [1]. On the other hand, it provides
yet another mechanism for limitation of task count. Even without
namespacing of pid_max value, the configuration of conscious limit is
confusing for users [2].
This series builds upon the idea of restricting the number (amount) of
tasks by pids controller and ensuring that number (pid) never exceeds the
amount of tasks. This would not currently work out of the box because
next-fit pid allocation would continue to assign numbers (pids) higher
than the actual amount (there would be gaps in the lower range of the
interval). The patch 2/2 implements this idea by extending semantics of
ns_last_pid knob to allow first-fit numbering. (The implementation has
clumsy ifdefery, which can might be dropped since it's too x86-centric.)
The patch 1/2 is a mere revert to simplify pid_max to one global limit
only.
It is already difficult for users to troubleshoot which of multiple pid
limits restricts their workload. I'm afraid making pid_max
per-(hierarchical-)NS will contribute to confusion. Also, the
implementation copies the limit upon creation from parent, this pattern
showed cumbersome with some attributes in legacy cgroup controllers --
it's subject to race condition between parent's limit modification and
children creation and once copied it must be changed in the descendant.
This is very similar to what pids.max of a cgroup (already) does that can
be used as an alternative.