Ingo Molnar [Mon, 24 Feb 2025 19:40:21 +0000 (20:40 +0100)]
Merge branch into tip/master: 'x86/cpu'
# New commits in x86/cpu: 43bb700cff6b ("x86/cpu: Update Intel Family comments") 64aad4749d79 ("ACPI/processor_idle: Export acpi_processor_ffh_play_dead()") 96040f7273e2 ("x86/smp: Eliminate mwait_play_dead_cpuid_hint()") fc4ca9537bc4 ("intel_idle: Provide the default enter_dead() handler") 541ddf31e300 ("ACPI/processor_idle: Add FFH state handling") a7dd183f0b38 ("x86/smp: Allow calling mwait_play_dead with an arbitrary hint") 1e66d6cf888f ("x86/cpu: Fix #define name for Intel CPU model 0x5A")
Ingo Molnar [Mon, 24 Feb 2025 19:40:21 +0000 (20:40 +0100)]
Merge branch into tip/master: 'x86/core'
# New commits in x86/core: 684b12916a10 ("x86/arch_prctl/64: Clean up ARCH_MAP_VDSO_32") 2df1ad0d2580 ("x86/arch_prctl: Simplify sys_arch_prctl()") 06dd759b68ee ("x86/module: Remove unnecessary check in module_finalize()") c305a4e98378 ("x86: Move sysctls into arch/x86") 882b86fd4e0d ("x86/ibt: Handle FineIBT in handle_cfi_failure()") 306859de59e5 ("x86/early_printk: Harden early_serial") c4239a72a29d ("x86/ibt: Clean up poison_endbr()") c20ad96c9a8f ("x86/traps: Cleanup and robustify decode_bug()") ab9fea59487d ("x86/alternative: Simplify callthunk patching") 93f16a1ab78c ("x86/boot: Mark start_secondary() with __noendbr") 582077c94052 ("x86/cfi: Clean up linkage") 2981557cb040 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI") 72e213a7ccf9 ("x86/ibt: Clean up is_endbr()") 675204778c69 ("module: don't annotate ROX memory as kmemleak_not_leak()") 63887c9f0203 ("x86: Compare physical instead of virtual PGD addresses") 3ef938c35035 ("x86/mm: Fix flush_tlb_range() when used for zapping normal PMDs") 64f6a4e10c05 ("x86: re-enable EXECMEM_ROX support") 602df3712979 ("module: drop unused module_writable_address()") 1d7e707af446 ("Revert "x86/module: prepare module loading for ROX allocations of text"") c287c0723329 ("module: switch to execmem API for remapping as RW and restoring ROX") 05e555b81726 ("execmem: add API for temporal remapping as RW and restoring ROX afterwards") 925f42645118 ("execmem: don't remove ROX cache from the direct map") 41d88484c71c ("x86/mm/pat: restore large ROX pages after fragmentation") 4ee788eb0781 ("x86/mm/pat: drop duplicate variable in cpa_flush()") 33ea120582a6 ("x86/mm/pat: cpa-test: fix length for CPA_ARRAY test")
Ingo Molnar [Mon, 24 Feb 2025 19:40:21 +0000 (20:40 +0100)]
Merge branch into tip/master: 'x86/cleanups'
# New commits in x86/cleanups: 51184c3c96a1 ("x86/usercopy: Fix kernel-doc func param name in clean_cache_range()'s description") 0156338a18eb ("x86/apic: Use str_disabled_enabled() helper in print_ipi_mode()")
Ingo Molnar [Mon, 24 Feb 2025 19:40:20 +0000 (20:40 +0100)]
Merge branch into tip/master: 'x86/boot'
# New commits in x86/boot: a2498e5c453b ("x86/kexec: Export e820_table_kexec[] to sysfs") 5bebe2e4fe27 ("x86/boot: Change some static bootflag functions to bool") efe659ac0146 ("x86/e820: Drop obsolete E820_TYPE_RESERVED_KERN and related code") d45dd0a9b27e ("x86/boot: Split parsing of boot_params into the parse_boot_params() helper function") 297fb82ebaad ("x86/boot: Split kernel resources setup into the setup_kernel_resources() helper function") 2d6bff31399b ("x86/boot: Move setting of memblock parameters to e820__memblock_setup()")
Ingo Molnar [Mon, 24 Feb 2025 19:40:19 +0000 (20:40 +0100)]
Merge branch into tip/master: 'timers/core'
# New commits in timers/core: f99c5bb396b8 ("posix-timers: Invoke cond_resched() during exit_itimers()") 4441b976dfef ("hrtimers: Replace hrtimer_clock_to_base_table with switch-case") 2ea97b76d671 ("hrtimers: Make hrtimer_update_function() less expensive")
Ingo Molnar [Mon, 24 Feb 2025 19:40:19 +0000 (20:40 +0100)]
Merge branch into tip/master: 'timers/cleanups'
# New commits in timers/cleanups: 86a578e780a9 ("wifi: rt2x00: Switch to use hrtimer_update_function()") 3f8d93d1371f ("io_uring: Use helper function hrtimer_update_function()") eee00df8e1f1 ("serial: xilinx_uartps: Use helper function hrtimer_update_function()") ce68de08a2cc ("ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup()") bbdafde7c220 ("RDMA: Switch to use hrtimer_setup()") 7b5edfd278b0 ("virtio: mem: Switch to use hrtimer_setup()") ff533f73d5c0 ("drm/vmwgfx: Switch to use hrtimer_setup()") 397c07a3c90b ("drm/xe/oa: Switch to use hrtimer_setup()") c38e753abee2 ("drm/vkms: Switch to use hrtimer_setup()") 58ac3c93306e ("drm/msm: Switch to use hrtimer_setup()") 1a2ff5c3058d ("drm/i915/request: Switch to use hrtimer_setup()") f97e1d787f9f ("drm/i915/uncore: Switch to use hrtimer_setup()") 82ad584eed8b ("drm/i915/pmu: Switch to use hrtimer_setup()") 7358f053c4d6 ("drm/i915/perf: Switch to use hrtimer_setup()") 9892287897ca ("drm/i915/gvt: Switch to use hrtimer_setup()") 0592bb39e3a3 ("drm/i915/huc: Switch to use hrtimer_setup()") 690d59fee83c ("drm/amdgpu: Switch to use hrtimer_setup()") c6be6eafd620 ("stm class: heartbeat: Switch to use hrtimer_setup()") f1061c1442c1 ("i2c: Switch to use hrtimer_setup()") c69da1735f19 ("iio: Switch to use hrtimer_setup()") a9d0ac739658 ("leds: trigger: pattern: Switch to use hrtimer_setup()") c158a29c5c5b ("mailbox: Switch to use hrtimer_setup()") 0ebb5e74db09 ("media: Switch to use hrtimer_setup()") 7f657ad09482 ("misc: vcpu_stall_detector: Switch to use hrtimer_setup()") 3a1ed018e995 ("mmc: dw_mmc: Switch to use hrtimer_setup()") abeebe8889b7 ("ntb: ntb_pingpong: Switch to use hrtimer_setup()") 5f8401cf7b3a ("drivers: perf: Switch to use hrtimer_setup()") 563608c20403 ("power: reset: ltc2952-poweroff: Switch to use hrtimer_setup()") 1b73fd14cfb4 ("power: supply: ab8500_chargalg: Switch to use hrtimer_setup()") d9a67240729d ("powercap: Switch to use hrtimer_setup()") 5e55888e340a ("pps: generators: pps_gen_parport: Switch to use hrtimer_setup()") c92697913fdc ("rtc: class: Switch to use hrtimer_setup()") b7011929380d ("scsi: Switch to use hrtimer_setup()") 0852ca41ce1c ("serial: xilinx_uartps: Switch to use hrtimer_setup()") 4e1214969603 ("serial: sh-sci: Switch to use hrtimer_setup()") 721c5bf65a1d ("serial: imx: Switch to use hrtimer_setup()") c5f0fa1622f6 ("serial: amba-pl011: Switch to use hrtimer_setup()") 6bf9bb76b3af ("serial: 8250: Switch to use hrtimer_setup()") 9fdf17c5aa2c ("usb: typec: tcpm: Switch to use hrtimer_setup()") 8073d9dfe2ef ("usb: musb: cppi41: Switch to use hrtimer_setup()") da4f28741b90 ("usb: ehci: Switch to use hrtimer_setup()") 060baec57cfe ("usb: gadget: Switch to use hrtimer_setup()") e0e59e95eb38 ("usb: fotg210-hcd: Switch to use hrtimer_setup()") 4cf533bbdfab ("usb: dwc2: Switch to use hrtimer_setup()") a63cb05bd553 ("USB: chipidea: Switch to use hrtimer_setup()") 1417c85d1625 ("xfrm: Switch to use hrtimer_setup()") 7b449279f56a ("octeontx2-pf: Switch to use hrtimer_setup()") e26ad10db84b ("igc: Switch to use hrtimer_setup()") 1528fd734e7b ("wifi: rt2x00: Switch to use hrtimer_setup()") cbe2691bee4e ("wifi: Switch to use hrtimer_setup()") d1ba57528f44 ("net/cdc_ncm: Switch to use hrtimer_setup()") d4bcc73352e4 ("net: wwan: iosm: Switch to use hrtimer_setup()") e193660f5e7f ("net: fec: Switch to use hrtimer_setup()") 78afb7fa96ed ("net: stmmac: Switch to use hrtimer_setup()") 3c85516612f8 ("net: qualcomm: rmnet: Switch to use hrtimer_setup()") 4781599491bd ("net: mvpp2: Switch to use hrtimer_setup()") dbf13c4278a5 ("net: ieee802154: at86rf230: Switch to use hrtimer_setup()") 7b63b1dc473e ("net: sparx5: Switch to use hrtimer_setup()") 964177da435c ("net: ethernet: hisilicon: Switch to use hrtimer_setup()") 66a3898a203d ("net: ethernet: ec_bhf: Switch to use hrtimer_setup()") f12185af60cb ("net: ethernet: cortina: Switch to use hrtimer_setup()") e9cc3a8936ee ("net: ethernet: ti: Switch to use hrtimer_setup()") 806e32248e22 ("can: Switch to use hrtimer_setup()") 881ec0c6db17 ("can: mcp251xfd: Switch to use hrtimer_setup()") e0eaefcd7e44 ("can: m_can: Switch to use hrtimer_setup()") 553f9a8be728 ("tcp: Switch to use hrtimer_setup()") 96b2fb3e6d14 ("mac802154: Switch to use hrtimer_setup()") efcb2d32a8f5 ("net/sched: Switch to use hrtimer_setup()") fe0b776543e9 ("netdev: Switch to use hrtimer_setup()") 8030d4673e99 ("hwrng: timeriomem: Switch to use hrtimer_setup()") 68d3de7fc49c ("null_blk: Switch to use hrtimer_setup()") 4279d7054c87 ("PM / devfreq: rockchip-dfi: Switch to use hrtimer_setup()") efad91a9836e ("PM: runtime: Switch to use hrtimer_setup()") cab0e0a05627 ("blk_iocost: Switch to use hrtimer_setup()") 32539b780c4f ("ata: pata_octeon_cf: Switch to use hrtimer_setup()") 2414f15910c5 ("block, bfq: Switch to use hrtimer_setup()") 19fec9c4434f ("tracing/osnoise: Switch to use hrtimer_setup()") d2254b064322 ("watchdog: Switch to use hrtimer_setup()") 1654eba8f74d ("ubifs: Switch to use hrtimer_setup()") deacdc871b48 ("bpf: Switch to use hrtimer_setup()") f66b0acf394b ("time: Switch to hrtimer_setup()") 9eeb54b47541 ("timerfd: Switch to use hrtimer_setup()") 022a223546e4 ("perf: Switch to use hrtimer_setup()") 91b7be704dd4 ("fork: Switch to use hrtimer_setup()") 4248fd6f37c1 ("io_uring/timeout: Switch to use hrtimer_setup()") b09dffdeb369 ("lib: test_objpool: Switch to use hrtimer_setup()") 53867760f50c ("mm/slab: Switch to use hrtimer_setup()") ee13da875b8a ("sched: Switch to use hrtimer_setup()") 99fb79f6d6de ("s390/ap_bus: Switch to use hrtimer_setup()") c56c98e5af6d ("perf/x86: Switch to use hrtimer_setup()") d1f0d81b3604 ("powerpc/watchdog: Switch to use hrtimer_setup()") 878a388866a6 ("ARM: 8611/1: l2x0: Switch to use hrtimer_setup()") 2f33de836402 ("ARM: imx: Switch to use hrtimer_setup()") 92051cb9d3e1 ("riscv: kvm: Switch to use hrtimer_setup()") 7d6f12520bd4 ("LoongArch: KVM: Switch to use hrtimer_setup()") 7e5fd922c146 ("KVM: arm64: Switch to use hrtimer_setup()") 7764b9dd174c ("KVM: x86: Switch to use hrtimer_setup()") 7ff22753d894 ("KVM: s390: Switch to use hrtimer_setup()") a0241210a3f3 ("KVM: PPC: Switch to use hrtimer_setup()") c97f85ddd60a ("KVM: MIPS: Switch to use hrtimer_setup()")
Ingo Molnar [Mon, 24 Feb 2025 19:40:18 +0000 (20:40 +0100)]
Merge branch into tip/master: 'sched/core'
# New commits in sched/core: 3c27b40830ca ("selftests/rseq: Add rseq syscall errors test") 1a5d3492f8e1 ("sched: Add unlikey branch hints to several system calls") b796ea848991 ("sched/core: Remove duplicate included header file stats.h") d90c9de9de2f ("x86/tsc: Always save/restore TSC sched_clock() on suspend/resume") d34e798094ca ("sched/fair: Refactor can_migrate_task() to elimate looping") 563bc2161b94 ("sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks") b9f2b29b9494 ("sched: Don't define sched_clock_irqtime as static key") 2ae891b82695 ("sched: Reduce the default slice to avoid tasks getting an extra tick") f553741ac8c0 ("sched: Cancel the slice protection of the idle entity")
Ingo Molnar [Mon, 24 Feb 2025 19:40:18 +0000 (20:40 +0100)]
Merge branch into tip/master: 'locking/core'
# New commits in locking/core: 337369f8ce9e ("locking/mutex: Add MUTEX_WARN_ON() into fast path") 2d352ec9fcb5 ("x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations") 4087e16b0331 ("x86/locking: Use ALT_OUTPUT_SP() for percpu_{,try_}cmpxchg{64,128}_op()")
Ingo Molnar [Mon, 24 Feb 2025 19:40:17 +0000 (20:40 +0100)]
Merge branch into tip/master: 'irq/drivers'
# New commits in irq/drivers: b956c9de9175 ("arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI") f15be3d4a0a5 ("arm64: dts: rockchip: rk356x: Add MSI controller node") 2d81e1bb6252 ("irqchip/gic-v3: Add Rockchip 3568002 erratum workaround") 896f8e436f99 ("irqchip/riscv-imsic: Special handling for non-atomic device MSI update") 0bd55080ba9e ("irqchip/riscv-imsic: Avoid interrupt translation in interrupt handler") 51611130d57d ("irqchip/riscv-imsic: Implement irq_force_complete_move() for IMSIC") 0f67911e821c ("irqchip/riscv-imsic: Separate next and previous pointers in IMSIC vector") 58d868b67a9a ("RISC-V: Select CONFIG_GENERIC_PENDING_IRQ") e54b1b5e89ae ("genirq: Introduce irq_can_move_in_process_context()") 751dc837dabd ("genirq: Introduce common irq_force_complete_move() implementation") fe35ecee8ec8 ("irqchip/riscv-imsic: Move to common MSI library") 1c000dcaad2b ("irqchip/irq-msi-lib: Optionally set default irq_eoi()/irq_ack()") 999f458c1771 ("irqchip/riscv-imsic: Set irq_set_affinity() for IMSIC base") 0699e578e279 ("irqchip/renesas-rzg2l: Simplify checks in rzg2l_irqc_common_init()") 4bd0317ce63c ("irqchip/renesas-rzg2l: Switch to using dev_err_probe()") bec8a3712943 ("irqchip/renesas-rzg2l: Remove pm_put label") 7de11369ef30 ("irqchip/renesas-rzg2l: Use devm_pm_runtime_enable()") 78f384dad082 ("irqchip/renesas-rzg2l: Use devm_reset_control_get_exclusive_deasserted()") dd4e17c30944 ("irqchip/renesas-rzg2l: Use local dev pointer in rzg2l_irqc_common_init()") b93afe8a3ac5 ("irqchip/riscv-aplic: Add support for hart indexes") c057b6e42135 ("dt-bindings: interrupt-controller: Add risc-v,aplic hart indexes")
Ingo Molnar [Mon, 24 Feb 2025 19:40:17 +0000 (20:40 +0100)]
Merge branch into tip/master: 'perf/urgent'
# New commits in perf/urgent: bddf10d26e6e ("uprobes: Reject the shared zeropage in uprobe_write_opcode()") 2016066c6619 ("perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list") 0fe8813baf4b ("perf/core: Add RCU read lock protection to perf_iterate_ctx()")
The cause is that zero pfn is set to the PTE without increasing the RSS
count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
not increase accordingly. Then, the operation on the same pfn is performed
in uprobe_write_opcode()->__replace_page() to unconditional decrease the
RSS count and old_folio's refcount.
Therefore, two bugs are introduced:
1. The RSS count is incorrect, when process exit, the check_mm() report
error "Bad rss-count".
2. The reserved folio (zero folio) is freed when folio->refcount is zero,
then free_pages_prepare->free_page_is_bad() report error
"Bad page state".
There is more, the following warning could also theoretically be triggered:
Luo Gengkun [Wed, 22 Jan 2025 07:33:56 +0000 (07:33 +0000)]
perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list
Syskaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
perf_event_swap_task_ctx_data(). vmcore shows that two lists have the same
perf_event_pmu_context, but not in the same order.
The problem is that the order of pmu_ctx_list for the parent is impacted by
the time when an event/PMU is added. While the order for a child is
impacted by the event order in the pinned_groups and flexible_groups. So
the order of pmu_ctx_list in the parent and child may be different.
To fix this problem, insert the perf_event_pmu_context to its proper place
after iteration of the pmu_ctx_list.
The follow testcase can trigger above warning:
# perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
# perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out
pid = fork();
if (pid == -1) {
printf("fork error\n");
return;
}
if (pid == 0) {
while (1) {
count++;
}
} else {
while (1) {
count++;
}
}
}
The testcase first opens an LBR event, so it will allocate task_ctx_data,
and then open tracepoint and software events, so the parent context will
have 3 different perf_event_pmu_contexts. On inheritance, child ctx will
insert the perf_event_pmu_context in another order and the warning will
trigger.
Breno Leitao [Fri, 17 Jan 2025 14:41:07 +0000 (06:41 -0800)]
perf/core: Add RCU read lock protection to perf_iterate_ctx()
The perf_iterate_ctx() function performs RCU list traversal but
currently lacks RCU read lock protection. This causes lockdep warnings
when running perf probe with unshare(1) under CONFIG_PROVE_RCU_LIST=y:
WARNING: suspicious RCU usage
kernel/events/core.c:8168 RCU-list traversed in non-reader section!!
This protection was previously present but was removed in commit bd2756811766 ("perf: Rewrite core context handling"). Add back the
necessary rcu_read_lock()/rcu_read_unlock() pair around
perf_iterate_ctx() call in perf_event_exec().
[ mingo: Use scoped_guard() as suggested by Peter ]
Uros Bizjak [Sun, 23 Feb 2025 16:13:38 +0000 (17:13 +0100)]
x86/ioperm: Use atomic64_inc_return() in ksys_ioperm()
Use atomic64_inc_return(&ref) instead of atomic64_add_return(1, &ref)
to use optimized implementation on targets that define
atomic_inc_return() and to remove now unneeded initialization of the
%eax/%edx register pair before the call to atomic64_inc_return().
Randy Dunlap [Sat, 11 Jan 2025 06:33:33 +0000 (22:33 -0800)]
x86/usercopy: Fix kernel-doc func param name in clean_cache_range()'s description
Use @addr instead of @vaddr in the kernel-doc comment for
clean_cache_range() to eliminate warnings:
arch/x86/lib/usercopy_64.c:29: warning: Function parameter or struct member 'addr' not described in 'clean_cache_range'
arch/x86/lib/usercopy_64.c:29: warning: Excess function parameter 'vaddr' description in 'clean_cache_range'
Linus Torvalds [Sun, 23 Feb 2025 01:32:00 +0000 (17:32 -0800)]
Merge tag 'v6.14-rc3-smb3-client-fix-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fix from Steve French:
- Fix potential null pointer dereference
* tag 'v6.14-rc3-smb3-client-fix-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: Add check for next_buffer in receive_encrypted_standard()
Linus Torvalds [Sat, 22 Feb 2025 18:45:02 +0000 (10:45 -0800)]
Merge tag 'x86-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix AVX-VNNI CPU feature dependency bug triggered via the 'noxsave'
boot option
- Fix typos in the SVA documentation
- Add Tony Luck as RDT co-maintainer and remove Fenghua Yu
* tag 'x86-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
docs: arch/x86/sva: Fix two grammar errors under Background and FAQ
x86/cpufeatures: Make AVX-VNNI depend on AVX
MAINTAINERS: Change maintainer for RDT
Linus Torvalds [Sat, 22 Feb 2025 17:30:04 +0000 (09:30 -0800)]
Merge tag 'sched-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rseq fixes from Ingo Molnar:
- Fix overly spread-out RSEQ concurrency ID allocation pattern that
regressed certain workloads
- Fix RSEQ registration syscall behavior on -EFAULT errors when
CONFIG_DEBUG_RSEQ=y (This debug option is disabled on most
distributions)
* tag 'sched-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: Fix rseq registration with CONFIG_DEBUG_RSEQ
sched: Compact RSEQ concurrency IDs with reduced threads and affinity
Linus Torvalds [Sat, 22 Feb 2025 17:26:12 +0000 (09:26 -0800)]
Merge tag 'perf-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf event fixes from Ingo Molnar:
"Fix x86 Intel Lion Cove CPU event constraints, and fix uprobes
debug/error printk output pointer-value verbosity"
* tag 'perf-urgent-2025-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix event constraints for LNC
uprobes: Don't use %pK through printk
Linus Torvalds [Sat, 22 Feb 2025 17:09:33 +0000 (09:09 -0800)]
Merge tag 's390-6.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Fix inline asm constraint in cmma_test_essa() to avoid potential ESSA
detection miscompilation
- Fix build failure with CONFIG_GENDWARFKSYMS by disabling purgatory
symbol exports with -D__DISABLE_EXPORTS
- Update defconfigs
* tag 's390-6.14-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/boot: Fix ESSA detection
s390/purgatory: Use -D__DISABLE_EXPORTS
s390: Update defconfigs
Linus Torvalds [Sat, 22 Feb 2025 17:03:54 +0000 (09:03 -0800)]
Merge tag 'ftrace-v6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"Function graph accounting fixes:
- Fix the manage ops hashes
The function graph registers a "manager ops" and "sub-ops" to
ftrace. The manager ops does not have any callback but calls the
sub-ops callbacks. The manage ops hashes (what is used to tell
ftrace what functions to attach to) is built on the sub-ops it
manages.
There was an error in the way it built the hash. An empty hash
means to attach to all functions. When the manager ops had one
sub-ops it properly copied its hash. But when the manager ops had
more than one sub-ops, it went into a loop to make a set of all
functions it needed to add to the hash. If any of the subops hashes
was empty, that would mean to attach to all functions. The error
was that the first iteration of the loop passed in an empty hash to
start with in order to add the other hashes. That starting hash was
mistaken as to attach to all functions. This made the manage ops
attach to all functions whenever it had two or more sub-ops, even
if each sub-op was attached to only a single function.
- Do not add duplicate entries to the manager ops hash
If two or more subops hashes trace the same function, an entry for
that function will be added to the manager ops for each subops.
This causes waste and extra overhead.
Fprobe accounting fixes:
- Remove last function from fprobe hash
Fprobes has a ftrace hash to manage which functions an fprobe is
attached to. It also has a counter of how many fprobes are
attached. When the last fprobe is removed, it unregisters the
fprobe from ftrace but does not remove the functions the last
fprobe was attached to from the hash. This leaves the old functions
attached. When a new fprobe is added, the fprobe infrastructure
attaches to not only the functions of the new fprobe, but also to
the functions of the last fprobe.
- Fix accounting of the fprobe counter
When a fprobe is added, it updates a counter. If the counter goes
from zero to one, it attaches its ops to ftrace. When an fprobe is
removed, the counter is decremented. If the counter goes from 1 to
zero, it removes the fprobes ops from ftrace.
There was an issue where if two fprobes trace the same function,
the addition of each fprobe would increment the counter. But when
removing the first of the fprobes, it would notice that another
fprobe is still attached to one of its functions no it does not
remove the functions from the ftrace ops.
But it also did not decrement the counter, so when the last fprobe
is removed, the counter is still one. This leaves the fprobes
callback still registered with ftrace and it being called by the
functions defined by the fprobes ops hash. Worse yet, because all
the functions from the fprobe ops hash have been removed, that
tells ftrace that it wants to trace all functions.
Thus, this puts the state of the system where every function is
calling the fprobe callback handler (which does nothing as there
are no registered fprobes), but this causes a good 13% slow down of
the entire system.
Other updates:
- Add a selftest to test the above issues to prevent regressions.
- Fix preempt count accounting in function tracing
Better recursion protection was added to function tracing which
added another layer of preempt disable. As the preempt_count gets
traced in the event, it needs to subtract the amount of preempt
disabling the tracer does to record what the preempt_count was when
the trace was triggered.
- Fix memory leak in output of set_event
A variable is passed by the seq_file functions in the location that
is set by the return of the next() function. The start() function
allocates it and the stop() function frees it. But when the last
item is found, the next() returns NULL which leaks the data that
was allocated in start(). The m->private is used for something
else, so have next() free the data when it returns NULL, as stop()
will then just receive NULL in that case"
* tag 'ftrace-v6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix memory leak when reading set_event file
ftrace: Correct preemption accounting for function tracing.
selftests/ftrace: Update fprobe test to check enabled_functions file
fprobe: Fix accounting of when to unregister from function graph
fprobe: Always unregister fgraph function from ops
ftrace: Do not add duplicate entries in subops manager ops
ftrace: Fix accounting of adding subops to a manager ops
Michael Jeanson [Tue, 21 Jan 2025 21:33:52 +0000 (16:33 -0500)]
selftests/rseq: Add rseq syscall errors test
This test adds coverage of expected errors during rseq registration and
unregistration, it disables glibc integration and will thus always
exercise the rseq syscall explictly.
Dave Young [Tue, 28 Jan 2025 13:32:31 +0000 (21:32 +0800)]
x86/kexec: Export e820_table_kexec[] to sysfs
Previously the e820_table_kexec[] was exported to sysfs since kexec-tools uses
the memmap entries to prepare the e820 table for the new kernel.
The following commit, ~8 years ago, introduced e820_table_firmware[] and changed
the behavior to export the firmware table instead:
12df216c61c8 ("x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table")
Originally the kexec_file_load and kexec_load syscalls both used e820_table_kexec[].
Since the sysfs exported entries are from e820_table_firmware[] people
now need to tune both tables for kexec.
Restore the old behavior so the kexec_load and kexec_file_load syscalls work with
only one table update. The e820_table_firmware[] is used by hibernation kernel
code and it works without the sysfs exporting. Also remove the SEV
e820_table_firmware[] updating code.
Also update the code comments here and drop the comments about setup_data
reservation since it is not needed any more after this change was made
a year ago:
fc7f27cda843 ("x86/kexec: Do not update E820 kexec table for setup_data")
[ mingo: Tidy up the changelog and comments. ]
Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Link: https://lore.kernel.org/r/Z5jcb1GKhLvH8kDc@darkstar.users.ipa.redhat.com
drivers/i2c/i2c-core-base.c: In function ‘i2c_detect.isra’:
drivers/i2c/i2c-core-base.c:2544:1: warning: the frame size of 1312 bytes is larger than 1024 bytes [-Wframe-larger-than=]
2544 | }
| ^
Fix this by allocating the temporary client structure dynamically, as it
is a rather large structure (1216 bytes, depending on kernel config).
This is basically a revert of the to-be-fixed commit with some
checkpatch improvements.
Fixes: 735668f8e5c9 ("i2c: core: Allocate temp client on the stack in i2c_detect") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Su Hui <suhui@nfschina.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net>
[wsa: updated commit message, merged tags from similar patch] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Linus Torvalds [Fri, 21 Feb 2025 21:16:01 +0000 (13:16 -0800)]
Merge tag 'soc-fixes-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"Two people stepped up as platform co-maintainers: Andrew Jeffery for
ASpeed and Janne Grunau for Apple.
The rockchip platform gets 9 small fixes for devicetree files,
addressing both compile-time warnings and board specific bugs.
One bugfix for the optee firmware driver addresses a reboot-time hang.
Two drivers need improved Kconfig dependencies to allow wider compile-
testing while hiding the drivers on platforms that can't use them.
ARM SCMI and loongson-guts drivers get minor bugfixes"
* tag 'soc-fixes-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
soc: loongson: loongson2_guts: Add check for devm_kstrdup()
tee: optee: Fix supplicant wait loop
platform: cznic: CZNIC_PLATFORMS should depend on ARCH_MVEBU
firmware: imx: IMX_SCMI_MISC_DRV should depend on ARCH_MXC
MAINTAINERS: arm: apple: Add Janne as maintainer
MAINTAINERS: Mark Andrew as M: for ASPEED MACHINE SUPPORT
firmware: arm_scmi: imx: Correct tx size of scmi_imx_misc_ctrl_set
arm64: dts: rockchip: adjust SMMU interrupt type on rk3588
arm64: dts: rockchip: disable IOMMU when running rk3588 in PCIe endpoint mode
dt-bindings: rockchip: pmu: Ensure all properties are defined
arm64: defconfig: Enable TISCI Interrupt Router and Aggregator
arm64: dts: rockchip: Fix lcdpwr_en pin for Cool Pi GenBook
arm64: dts: rockchip: fix fixed-regulator renames on rk3399-gru devices
arm64: dts: rockchip: Disable DMA for uart5 on px30-ringneck
arm64: dts: rockchip: Move uart5 pin configuration to px30 ringneck SoM
arm64: dts: rockchip: change eth phy mode to rgmii-id for orangepi r1 plus lts
arm64: dts: rockchip: Fix broken tsadc pinctrl names for rk3588
Linus Torvalds [Fri, 21 Feb 2025 21:10:22 +0000 (13:10 -0800)]
Merge tag 'drm-fixes-2025-02-22' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly drm fixes pull request, lots of small things all over, msm has
a bunch of things but all very small, xe, i915, a fix for the cgroup
dmem controller.
core:
- remove MAINTAINERS entry
cgroup/dmem:
- use correct function for pool descendants
panel:
- fix signal polarity issue jd9365da-h3
nouveau:
- folio handling fix
- config fix
amdxdna:
- fix missing header
xe:
- Fix error handling in xe_irq_install
- Fix devcoredump format
i915:
- Use spin_lock_irqsave() in interruptible context on guc submission
- Fixes on DDI and TRANS programming
- Make sure all planes in use by the joiner have their crtc included
- Fix 128b/132b modeset issues
msm:
- More catalog fixes:
- to skip watchdog programming through top block if its not
present
- fix the setting of WB mask to ensure the WB input control is
programmed correctly through ping-pong
- drop lm_pair for sm6150 as that chipset does not have any
3dmerge block
- Fix the mode validation logic for DP/eDP to account for widebus
(2ppc) to allow high clock resolutions
- Fix to disable dither during encoder disable as otherwise this
was causing kms_writeback failure due to resource sharing
between WB and DSI paths as DSI uses dither but WB does not
- Fixes for virtual planes, namely to drop extraneous return and
fix uninitialized variables
- Fix to avoid spill-over of DSC encoder block bits when
programming the bits-per-component
- Fixes in the DSI PHY to protect against concurrent access of
PHY_CMN_CLK_CFG regs between clock and display drivers
- Core/GPU:
- Fix non-blocking fence wait incorrectly rounding up to 1 jiffy
timeout
- Only print GMU fw version once, instead of each time the GPU
resumes"
* tag 'drm-fixes-2025-02-22' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
drm/i915/dp: Fix disabling the transcoder function in 128b/132b mode
drm/i915/dp: Fix error handling during 128b/132b link training
accel/amdxdna: Add missing include linux/slab.h
MAINTAINERS: Remove myself
drm/nouveau/pmu: Fix gp10b firmware guard
cgroup/dmem: Don't open-code css_for_each_descendant_pre
drm/xe/guc: Fix size_t print format
drm/xe: Make GUC binaries dump consistent with other binaries in devcoredump
drm/i915: Make sure all planes in use by the joiner have their crtc included
drm/i915/ddi: Fix HDMI port width programming in DDI_BUF_CTL
drm/i915/dsi: Use TRANS_DDI_FUNC_CTL's own port width macro
drm/xe: Fix error handling in xe_irq_install()
drm/i915/gt: Use spin_lock_irqsave() in interruptible context
drm/msm/dsi/phy: Do not overwite PHY_CMN_CLK_CFG1 when choosing bitclk source
drm/msm/dsi/phy: Protect PHY_CMN_CLK_CFG1 against clock driver
drm/msm/dsi/phy: Protect PHY_CMN_CLK_CFG0 updated from driver side
drm/msm/dpu: Drop extraneous return in dpu_crtc_reassign_planes()
drm/msm/dpu: Don't leak bits_per_component into random DSC_ENC fields
drm/msm/dpu: Disable dither in phys encoder cleanup
drm/msm/dpu: Fix uninitialized variable
...
Results are based on running 20 tests with turbo disabled (to reduce
clock freq turbo changes), with 10 second run per test based on the
number of system calls per second. The % standard deviation of the
measurements for the 20 tests was 0.05% to 0.40%, so the results are
reliable.
Yunhui Cui [Sun, 26 Jan 2025 03:32:43 +0000 (11:32 +0800)]
locking/mutex: Add MUTEX_WARN_ON() into fast path
Scenario: In platform_device_register, the driver misuses struct
device as platform_data, making kmemdup duplicate a device. Accessing
the duplicate may cause list corruption due to its mutex magic or list
holding old content.
It recurs randomly as the first mutex - getting process skips the slow
path and mutex check. Adding MUTEX_WARN_ON(lock->magic!= lock) in
__mutex_trylock_fast() makes it always happen.
Linus Torvalds [Fri, 21 Feb 2025 17:36:28 +0000 (09:36 -0800)]
Merge tag 'block-6.14-20250221' of git://git.kernel.dk/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- FC controller state check fixes (Daniel)
- PCI Endpoint fixes (Damien)
- TCP connection failure fixe (Caleb)
- TCP handling C2HTermReq PDU (Maurizio)
- RDMA queue state check (Ruozhu)
- Apple controller fixes (Hector)
- Target crash on disbaled namespace (Hannes)
- MD pull request via Yu:
- Fix queue limits error handling for raid0, raid1 and raid10
- Fix for a NULL pointer deref in request data mapping
- Code cleanup for request merging
* tag 'block-6.14-20250221' of git://git.kernel.dk/linux:
nvme: only allow entering LIVE from CONNECTING state
nvme-fc: rely on state transitions to handle connectivity loss
apple-nvme: Support coprocessors left idle
apple-nvme: Release power domains when probe fails
nvmet: Use enum definitions instead of hardcoded values
nvme: Cleanup the definition of the controller config register fields
nvme/ioctl: add missing space in err message
nvme-tcp: fix connect failure on receiving partial ICResp PDU
nvme: tcp: Fix compilation warning with W=1
nvmet: pci-epf: Avoid RCU stalls under heavy workload
nvmet: pci-epf: Do not uselessly write the CSTS register
nvmet: pci-epf: Correctly initialize CSTS when enabling the controller
nvmet-rdma: recheck queue state is LIVE in state lock in recv done
nvmet: Fix crash when a namespace is disabled
nvme-tcp: add basic support for the C2HTermReq PDU
nvme-pci: quirk Acer FA100 for non-uniqueue identifiers
block: fix NULL pointer dereferenced within __blk_rq_map_sg
block/merge: remove unnecessary min() with UINT_MAX
md/raid*: Fix the set_queue_limits implementations
Linus Torvalds [Fri, 21 Feb 2025 17:17:56 +0000 (09:17 -0800)]
Merge tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Series fixing an issue with multishot read on pollable files that may
return -EIOCBQUEUED from ->read_iter(). Four small patches for that,
the first one deliberately done in such a way that it'd be easy to
backport
- Remove some dead constant definitions
- Use array_index_nospec() for opcode indexing
- Work-around for worker creation retries in the presence of signals
* tag 'io_uring-6.14-20250221' of git://git.kernel.dk/linux:
io_uring/rw: clean up mshot forced sync mode
io_uring/rw: move ki_complete init into prep
io_uring/rw: don't directly use ki_complete
io_uring/rw: forbid multishot async reads
io_uring/rsrc: remove unused constants
io_uring: fix spelling error in uapi io_uring.h
io_uring: prevent opcode speculation
io-wq: backoff when retrying worker creation
Linus Torvalds [Fri, 21 Feb 2025 17:11:25 +0000 (09:11 -0800)]
Merge tag 'acpi-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix a memory leak in the ACPI platform_profile driver (Kurt Borja)"
* tag 'acpi-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: platform_profile: Fix memory leak in profile_class_is_visible()
Linus Torvalds [Fri, 21 Feb 2025 17:07:04 +0000 (09:07 -0800)]
Merge tag 'mtd/fixes-for-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux
Pull mtd fixes from Miquel Raynal:
"The two most important fixes in this list are probably the SST write
failure and the Qcom raw NAND controller probe failure which are due
to some refactoring, otherwise there has been a series of misc fixes
on the Cadence raw NAND controller driver and especially on the DMA
side"
* tag 'mtd/fixes-for-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
mtd: rawnand: cadence: fix unchecked dereference
mtd: spi-nor: sst: Fix SST write failure
dt-bindings: mtd: cadence: document required clock-names
mtd: rawnand: qcom: fix broken config in qcom_param_page_type_exec
mtd: rawnand: cadence: fix incorrect device in dma_unmap_single
mtd: rawnand: cadence: use dma_map_resource for sdma address
mtd: rawnand: cadence: fix error code in cadence_nand_init()
Linus Torvalds [Fri, 21 Feb 2025 16:59:27 +0000 (08:59 -0800)]
Merge tag 'gpio-fixes-for-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
"There are two fixes for GPIO core: one adds missing retval checks to
older code, while the second adds SRCU synchronization to legs in code
that were missed during the big rework a few cycles back. There's also
one small driver fix:
- check the return value of the get_direction() callback in struct
gpio_chip
- protect the multi-line get/set legs in GPIO core with SRCU
- fix a race condition in gpio-vf610"
* tag 'gpio-fixes-for-v6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: don't bail out if get_direction() fails in gpiochip_add_data()
gpiolib: protect gpio_chip with SRCU in array_info paths in multi get/set
gpio: vf610: add locking to gpio direction functions
gpiolib: check the return value of gpio_chip::get_direction()
x86/e820: Drop obsolete E820_TYPE_RESERVED_KERN and related code
E820_TYPE_RESERVED_KERN is a relict from the ancient history that was used
to early reserve setup_data, see:
28bb22379513 ("x86: move reserve_setup_data to setup.c")
Nowadays setup_data is anyway reserved in memblock and there is no point in
carrying E820_TYPE_RESERVED_KERN that behaves exactly like E820_TYPE_RAM
but only complicates the code.
A bonus for removing E820_TYPE_RESERVED_KERN is a small but measurable
speedup of 20 microseconds in init_mem_mappings() on a VM with 32GB or RAM.
x86/boot: Move setting of memblock parameters to e820__memblock_setup()
Changing memblock parameters, namely bottom_up and allocation upper
limit does not have any effect before memblock initialization in
e820__memblock_setup().
Move the calls to memblock_set_bottom_up() and memblock_set_current_limit()
to e820__memblock_setup() to group all the memblock initial setup and make
setup_arch() more readable.
the usage of asm pseudo directives in the asm template can confuse
the compiler to wrongly estimate the size of the generated
code.
The ALTERNATIVE macro expands to several asm pseudo directives,
so its usage in {,try_}cmpxchg{64,128} causes instruction length estimate
to fail by an order of magnitude (the specially instrumented compiler
reports the estimated length of these asm templates to be more than 20
instructions long).
This incorrect estimate further causes unoptimal inlining
decisions, unoptimal instruction scheduling and unoptimal code block
alignments for functions that use these locking primitives.
which is a feature that makes GCC pretend some inline assembler code
is tiny (while it would think it is huge), instead of just asm.
For code size estimation, the size of the asm is then taken as
the minimum size of one instruction, ignoring how many instructions
compiler thinks it is.
The effect of this patch on x86_64 target is minor, since 128-bit
functions are rarely used on this target. The code size of the resulting
defconfig object file stays the same:
Uros Bizjak [Fri, 14 Feb 2025 15:07:46 +0000 (16:07 +0100)]
x86/locking: Use ALT_OUTPUT_SP() for percpu_{,try_}cmpxchg{64,128}_op()
percpu_{,try_}cmpxchg{64,128}() macros use CALL instruction inside
asm statement in one of their alternatives. Use ALT_OUTPUT_SP()
macro to add required dependence on %esp register.
ALT_OUTPUT_SP() implements the above dependence by adding
ASM_CALL_CONSTRAINT to its arguments. This constraint should be used
for any inline asm which has a CALL instruction, otherwise the
compiler may schedule the asm before the frame pointer gets set up
by the containing function, causing objtool to print a "call without
frame pointer save/setup" warning.
The root cause is that s_next() returns NULL when nothing is found.
This results in s_stop() attempting to free a NULL pointer because its
parameter is NULL.
Fix the issue by freeing the memory appropriately when s_next() fails
to find anything.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250220031528.7373-1-ahuang12@lenovo.com Fixes: b355247df104 ("tracing: Cache ":mod:" events for modules not loaded yet") Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
ftrace: Correct preemption accounting for function tracing.
The function tracer should record the preemption level at the point when
the function is invoked. If the tracing subsystem decrement the
preemption counter it needs to correct this before feeding the data into
the trace buffer. This was broken in the commit cited below while
shifting the preempt-disabled section.
Use tracing_gen_ctx_dec() which properly subtracts one from the
preemption counter on a preemptible kernel.
Cc: stable@vger.kernel.org Cc: Wander Lairson Costa <wander@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/20250220140749.pfw8qoNZ@linutronix.de Fixes: ce5e48036c9e7 ("ftrace: disable preemption when recursion locked") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Wander Lairson Costa <wander@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 20 Feb 2025 20:20:14 +0000 (15:20 -0500)]
selftests/ftrace: Update fprobe test to check enabled_functions file
A few bugs were found in the fprobe accounting logic along with it using
the function graph infrastructure. Update the fprobe selftest to catch
those bugs in case they or something similar shows up in the future.
The test now checks the enabled_functions file which shows all the
functions attached to ftrace or fgraph. When enabling a fprobe, make sure
that its corresponding function is also added to that file. Also add two
more fprobes to enable to make sure that the fprobe logic works properly
with multiple probes.
Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/20250220202055.733001756@goodmis.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Tested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 20 Feb 2025 20:20:13 +0000 (15:20 -0500)]
fprobe: Fix accounting of when to unregister from function graph
When adding a new fprobe, it will update the function hash to the
functions the fprobe is attached to and register with function graph to
have it call the registered functions. The fprobe_graph_active variable
keeps track of the number of fprobes that are using function graph.
If two fprobes attach to the same function, it increments the
fprobe_graph_active for each of them. But when they are removed, the first
fprobe to be removed will see that the function it is attached to is also
used by another fprobe and it will not remove that function from
function_graph. The logic will skip decrementing the fprobe_graph_active
variable.
This causes the fprobe_graph_active variable to not go to zero when all
fprobes are removed, and in doing so it does not unregister from
function graph. As the fgraph ops hash will now be empty, and an empty
filter hash means all functions are enabled, this triggers function graph
to add a callback to the fprobe infrastructure for every function!
Steven Rostedt [Thu, 20 Feb 2025 20:20:12 +0000 (15:20 -0500)]
fprobe: Always unregister fgraph function from ops
When the last fprobe is removed, it calls unregister_ftrace_graph() to
remove the graph_ops from function graph. The issue is when it does so, it
calls return before removing the function from its graph ops via
ftrace_set_filter_ips(). This leaves the last function lingering in the
fprobe's fgraph ops and if a probe is added it also enables that last
function (even though the callback will just drop it, it does add unneeded
overhead to make that call).
The above enabled a fprobe on kernel_clone, and then on schedule_timeout.
The content of the enabled_functions shows the functions that have a
callback attached to them. The fprobe attached to those functions
properly. Then the fprobes were cleared, and enabled_functions was empty
after that. But after adding a fprobe on kmem_cache_free, the
enabled_functions shows that the schedule_timeout was attached again. This
is because it was still left in the fprobe ops that is used to tell
function graph what functions it wants callbacks from.
Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/20250220202055.393254452@goodmis.org Fixes: 4346ba1604093 ("fprobe: Rewrite fprobe on function-graph tracer") Tested-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 20 Feb 2025 20:20:11 +0000 (15:20 -0500)]
ftrace: Do not add duplicate entries in subops manager ops
Check if a function is already in the manager ops of a subops. A manager
ops contains multiple subops, and if two or more subops are tracing the
same function, the manager ops only needs a single entry in its hash.
Cc: stable@vger.kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/20250220202055.226762894@goodmis.org Fixes: 4f554e955614f ("ftrace: Add ftrace_set_filter_ips function") Tested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Steven Rostedt [Thu, 20 Feb 2025 20:20:10 +0000 (15:20 -0500)]
ftrace: Fix accounting of adding subops to a manager ops
Function graph uses a subops and manager ops mechanism to attach to
ftrace. The manager ops connects to ftrace and the functions it connects
to is defined by a list of subops that it manages.
The function hash that defines what the above ops attaches to limits the
functions to attach if the hash has any content. If the hash is empty, it
means to trace all functions.
The creation of the manager ops hash is done by iterating over all the
subops hashes. If any of the subops hashes is empty, it means that the
manager ops hash must trace all functions as well.
The issue is in the creation of the manager ops. When a second subops is
attached, a new hash is created by starting it as NULL and adding the
subops one at a time. But the NULL ops is mistaken as an empty hash, and
once an empty hash is found, it stops the loop of subops and just enables
all functions.
Fix this by initializing the new hash to NULL and if the hash is NULL do
not treat it as an empty hash but instead allocate by copying the content
of the first sub ops. Then on subsequent iterations, the new hash will not
be NULL, but the content of the previous subops. If that first subops
attached to all functions, then new hash may assume that the manager ops
also needs to attach to all functions.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/20250220202055.060300046@goodmis.org Fixes: 5fccc7552ccbc ("ftrace: Add subops logic to allow one ops to manage many") Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
x86/tsc: Always save/restore TSC sched_clock() on suspend/resume
TSC could be reset in deep ACPI sleep states, even with invariant TSC.
That's the reason we have sched_clock() save/restore functions, to deal
with this situation. But what happens is that such functions are guarded
with a check for the stability of sched_clock - if not considered stable,
the save/restore routines aren't executed.
On top of that, we have a clear comment in native_sched_clock() saying
that *even* with TSC unstable, we continue using TSC for sched_clock due
to its speed.
In other words, if we have a situation of TSC getting detected as unstable,
it marks the sched_clock as unstable as well, so subsequent S3 sleep cycles
could bring bogus sched_clock values due to the lack of the save/restore
mechanism, causing warnings like this:
[22.954918] ------------[ cut here ]------------
[22.954923] Delta way too big! 18446743750843854390 ts=18446744072977390405 before=322133536015 after=322133536015 write stamp=18446744072977390405
[22.954923] If you just came from a suspend/resume,
[22.954923] please switch to the trace global clock:
[22.954923] echo global > /sys/kernel/tracing/trace_clock
[22.954923] or add trace_clock=global to the kernel command line
[22.954937] WARNING: CPU: 2 PID: 5728 at kernel/trace/ring_buffer.c:2890 rb_add_timestamp+0x193/0x1c0
Notice that the above was reproduced even with "trace_clock=global".
The fix for that is to _always_ save/restore the sched_clock on suspend
cycle _if TSC is used_ as sched_clock - only if we fallback to jiffies
the sched_clock_stable() check becomes relevant to save/restore the
sched_clock.
Debugged-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20250215210314.351480-1-gpiccoli@igalia.com
Joel Granados [Tue, 18 Feb 2025 09:56:21 +0000 (10:56 +0100)]
perf/core: Move perf_event sysctls into kernel/events
Move ctl tables to two files:
- perf_event_{paranoid,mlock_kb,max_sample_rate} and
perf_cpu_time_max_percent into kernel/events/core.c
- perf_event_max_{stack,context_per_stack} into
kernel/events/callchain.c
Make static variables and functions that are fully contained in core.c
and callchain.cand remove them from include/linux/perf_event.h.
Additionally six_hundred_forty_kb is moved to callchain.c.
Two new sysctl tables are added ({callchain,events_core}_sysctl_table)
with their respective sysctl registration functions.
This is part of a greater effort to move ctl tables into their
respective subsystems which will reduce the merge conflicts in
kerenel/sysctl.c.
Michael Jeanson [Wed, 19 Feb 2025 20:53:26 +0000 (15:53 -0500)]
rseq: Fix rseq registration with CONFIG_DEBUG_RSEQ
With CONFIG_DEBUG_RSEQ=y, at rseq registration the read-only fields are
copied from user-space, if this copy fails the syscall returns -EFAULT
and the registration should not be activated - but it erroneously is.
Move the activation of the registration after the copy of the fields to
fix this bug.
x86/build: Raise the minimum LLVM version to 15.0.0
In a similar vein as to this pending commit in the x86/asm tree:
a3e8fe814ad1 ("x86/build: Raise the minimum GCC version to 8.1")
... bump the minimum supported version of LLVM for building x86 kernels
to 15.0.0, as that is the first version that has support for
'-mstack-protector-guard-symbol', which is used unconditionally after:
80d47defddc0 ("x86/stackprotector/64: Convert to normal per-CPU variable"):
Dmitry Osipenko [Sun, 16 Feb 2025 22:16:34 +0000 (01:16 +0300)]
arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI
Rockchip 356x device-tree now supports GIC ITS. Move PCIe controller's
MSI to use ITS instead of MBI. This removes extra CPU overhead of handling
PCIe MBIs by letting GIC's ITS to serve the PCIe MSIs.
Rockchip 356x SoC's GIC has two hardware integration issues that
affect MSI functionality of the GIC. Previously, both these GIC
issues were worked around by using MBI for MSI instead of ITS
because kernel GIC driver didn't have necessary quirks.
First issue is about RK356x GIC not supporting programmable
shareability, while reporting it as supported in a GIC's feature
register. Rockchip assigned Erratum ID #3568001 for this issue. This
patch adds dma-noncoherent property to the GIC node, denoting that a SW
workaround is required for mitigating the issue.
Second issue is about GIC AXI master interface addressing limited to
the first 4GB of physical address space. Rockchip assigned Erratum
ID #3568002 for this issue.
Now that kernel supports quirks for both of the erratums, add
MSI controller node to RK356x device-tree.
Rockchip RK3566/RK3568 GIC600 integration has DDR addressing
limited to the first 32bit of physical address space. Rockchip
assigned Erratum ID #3568002 for this issue. Add driver quirk for
this Rockchip GIC Erratum.
Note, that the 0x0201743b GIC600 ID is not Rockchip-specific and is
common for many ARM GICv3 implementations. Hence, there is an extra
of_machine_is_compatible() check.
vdso: Remove remnants of architecture-specific time storage
All users of the time releated parts of the vDSO are now using the generic
storage implementation. Remove the therefore unnecessary compatibility
accessor functions and symbols.
The values are not used anymore.
Also the sanity checks performed by vdso2c can never trigger as they
only validate invariants already enforced by the linker script.
x86/vdso: Switch to generic storage implementation
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
This switch also moves the random state data out of the time data page.
The currently used hardcoded __VDSO_RND_DATA_OFFSET does not take into
account changes to the time data page layout.
powerpc/vdso: Switch to generic storage implementation
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-14-13a4669dfc8c@linutronix.de
MIPS: vdso: Switch to generic storage implementation
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
s390/vdso: Switch to generic storage implementation
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
Co-developed-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/all/20250204-vdso-store-rng-v3-12-13a4669dfc8c@linutronix.de
arm: vdso: Switch to generic storage implementation
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
LoongArch: vDSO: Switch to generic storage implementation
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
riscv: vdso: Switch to generic storage implementation
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
arm64: vdso: Switch to generic storage implementation
The generic storage implementation provides the same features as the
custom one. However it can be shared between architectures, making
maintenance easier.
This switch also moves the random state data out of the time data page.
The currently used hardcoded __VDSO_RND_DATA_OFFSET does not take into
account changes to the time data page layout.
vdso: Add generic architecture-specific data storage
Some architectures need to expose architecture-specific data to the vDSO.
Enable the generic vDSO storage mechanism to both store and map this
data. Some architectures require more than a single page, like LoongArch,
so prepare for that usecase, too.
Extend the generic vDSO data storage with a page for the random state data.
The random state data is stored in a dedicated page, as the existing
storage page is only meant for time-related, time-namespace-aware data.
This simplifies to access logic to not need to handle time namespaces
anymore and also frees up more space in the time-related page.
In case further generic vDSO data store is required it can be added to
the random state page.
Historically each architecture defined their own way to store the vDSO
data page. Add a generic mechanism to provide storage for that page.
Furthermore this generic storage will be extended to also provide
uniform storage for *non*-time-related data, like the random state or
architecture-specific data. These will have their own pages and data
structures, so rename 'vdso_data' into 'vdso_time_data' to make that
split clear from the name.
Also introduce a new consistent naming scheme for the symbols related to
the vDSO, which makes it clear if the symbol is accessible from
userspace or kernel space and the type of data behind the symbol.
The generic fault handler contains an optimization to prefault the vvar
page when the timens page is accessed. This was lifted from s390 and x86.
As the Makefile is included into other Makefiles it can not be used to
define objects to be built from the current source directory.
However the generic datastore will introduce such a local source file.
Rename the included Makefile so it is clear how it is to be used and to
make room for a regular Makefile in lib/vdso/.
The vDSO implementation can only include headers from the vdso/
namespace. To enable the usage of the ALIGN() macro from the vDSO, move
linux/align.h to vdso/align.h wholly.
As the only dependency linux/const.h is only a wrapper around
vdso/const.h anyways adapt that dependency.
Also provide a compatibility wrapper linux/align.h.
x86/vdso: Fix latent bug in vclock_pages calculation
The vclock pages are *after* the non-vclock pages. Currently there are both
two vclock and two non-vclock pages so the existing logic works by
accident. As soon as the number of pages changes it will break however.
This will be the case with the introduction of the generic vDSO data
storage.
Use a macro to keep the calculation understandable and in sync between
the linker script and mapping code.
Stephan Gerhold [Tue, 18 Feb 2025 15:59:18 +0000 (16:59 +0100)]
irqchip/qcom-pdc: Workaround hardware register bug on X1E80100
On X1E80100, there is a hardware bug in the register logic of the
IRQ_ENABLE_BANK register: While read accesses work on the normal address,
all write accesses must be made to a shifted address. Without a workaround
for this, the wrong interrupt gets enabled in the PDC and it is impossible
to wakeup from deep suspend (CX collapse). This has not caused problems so
far, because the deep suspend state was not enabled. A workaround is
required now since work is ongoing to fix this.
The PDC has multiple "DRV" regions, each one has a size of 0x10000 and
provides the same set of registers for a particular client in the system.
Linux is one the clients and uses DRV region 2 on X1E. Each "bank" inside
the DRV region consists of 32 interrupt pins that can be enabled using the
IRQ_ENABLE_BANK register:
IRQ_ENABLE_BANK[bank] = base + IRQ_ENABLE_BANK + bank * sizeof(u32)
On X1E, this works as intended for read access. However, write access to
most banks is shifted by 2:
Introduce a workaround for the bug by matching the qcom,x1e80100-pdc
compatible and apply the offsets as shown above:
- Bank 0...1: previous DRV region, bank += 3
- Bank 1...4: our DRV region, bank -= 2
- Bank 5: our DRV region, no fixup required
The PDC node in the device tree only describes the DRV region for the Linux
client, but the workaround also requires to map parts of the previous DRV
region to issue writes there. To maintain compatibility with old device
trees, obtain the base address of the preceeding region by applying the
-0x10000 offset. Note that this is also more correct from a conceptual
point of view:
It does not really make use of the other region; it just issues shifted
writes that end up in the registers of the Linux associated DRV region 2.
Dave Airlie [Fri, 21 Feb 2025 00:50:28 +0000 (10:50 +1000)]
Merge tag 'drm-msm-fixes-2025-02-20' of https://gitlab.freedesktop.org/drm/msm into drm-fixes
Fixes for v6.14-rc4
Display:
* More catalog fixes:
- to skip watchdog programming through top block if its not present
- fix the setting of WB mask to ensure the WB input control is programmed
correctly through ping-pong
- drop lm_pair for sm6150 as that chipset does not have any 3dmerge block
* Fix the mode validation logic for DP/eDP to account for widebus (2ppc)
to allow high clock resolutions
* Fix to disable dither during encoder disable as otherwise this was
causing kms_writeback failure due to resource sharing between
* WB and DSI paths as DSI uses dither but WB does not
* Fixes for virtual planes, namely to drop extraneous return and fix
uninitialized variables
* Fix to avoid spill-over of DSC encoder block bits when programming
the bits-per-component
* Fixes in the DSI PHY to protect against concurrent access of
PHY_CMN_CLK_CFG regs between clock and display drivers
Core/GPU:
* Fix non-blocking fence wait incorrectly rounding up to 1 jiffy timeout
* Only print GMU fw version once, instead of each time the GPU resumes
Dave Airlie [Fri, 21 Feb 2025 00:44:53 +0000 (10:44 +1000)]
Merge tag 'drm-intel-fixes-2025-02-20' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Use spin_lock_irqsave() in interruptible context on guc submission (Krzysztof)
- Fixes on DDI and TRANS programming (Imre)
- Make sure all planes in use by the joiner have their crtc included (Ville)
- Fix 128b/132b modeset issues (Imre)
Jens Axboe [Fri, 21 Feb 2025 00:43:59 +0000 (17:43 -0700)]
Merge tag 'nvme-6.14-2025-02-20' of git://git.infradead.org/nvme into block-6.14
Pull NVMe fixes from Keith:
"nvme fixes for Linux 6.14
- FC controller state check fixes (Daniel)
- PCI Endpoint fixes (Damien)
- TCP connection failure fixe (Caleb)
- TCP handling C2HTermReq PDU (Maurizio)
- RDMA queue state check (Ruozhu)
- Apple controller fixes (Hector)
- Target crash on disbaled namespace (Hannes)"
* tag 'nvme-6.14-2025-02-20' of git://git.infradead.org/nvme:
nvme: only allow entering LIVE from CONNECTING state
nvme-fc: rely on state transitions to handle connectivity loss
apple-nvme: Support coprocessors left idle
apple-nvme: Release power domains when probe fails
nvmet: Use enum definitions instead of hardcoded values
nvme: Cleanup the definition of the controller config register fields
nvme/ioctl: add missing space in err message
nvme-tcp: fix connect failure on receiving partial ICResp PDU
nvme: tcp: Fix compilation warning with W=1
nvmet: pci-epf: Avoid RCU stalls under heavy workload
nvmet: pci-epf: Do not uselessly write the CSTS register
nvmet: pci-epf: Correctly initialize CSTS when enabling the controller
nvmet-rdma: recheck queue state is LIVE in state lock in recv done
nvmet: Fix crash when a namespace is disabled
nvme-tcp: add basic support for the C2HTermReq PDU
nvme-pci: quirk Acer FA100 for non-uniqueue identifiers
Linus Torvalds [Thu, 20 Feb 2025 23:37:17 +0000 (15:37 -0800)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull BPF fixes from Daniel Borkmann:
- Fix a soft-lockup in BPF arena_map_free on 64k page size kernels
(Alan Maguire)
- Fix a missing allocation failure check in BPF verifier's
acquire_lock_state (Kumar Kartikeya Dwivedi)
- Fix a NULL-pointer dereference in trace_kfree_skb by adding kfree_skb
to the raw_tp_null_args set (Kuniyuki Iwashima)
- Fix a deadlock when freeing BPF cgroup storage (Abel Wu)
- Fix a syzbot-reported deadlock when holding BPF map's freeze_mutex
(Andrii Nakryiko)
- Fix a use-after-free issue in bpf_test_init when eth_skb_pkt_type is
accessing skb data not containing an Ethernet header (Shigeru
Yoshida)
- Fix skipping non-existing keys in generic_map_lookup_batch (Yan Zhai)
- Several BPF sockmap fixes to address incorrect TCP copied_seq
calculations, which prevented correct data reads from recv(2) in user
space (Jiayuan Chen)
- Two fixes for BPF map lookup nullness elision (Daniel Xu)
- Fix a NULL-pointer dereference from vmlinux BTF lookup in
bpf_sk_storage_tracing_allowed (Jared Kangas)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests: bpf: test batch lookup on array of maps with holes
bpf: skip non exist keys in generic_map_lookup_batch
bpf: Handle allocation failure in acquire_lock_state
bpf: verifier: Disambiguate get_constant_map_key() errors
bpf: selftests: Test constant key extraction on irrelevant maps
bpf: verifier: Do not extract constant map keys for irrelevant maps
bpf: Fix softlockup in arena_map_free on 64k page kernel
net: Add rx_skb of kfree_skb to raw_tp_null_args[].
bpf: Fix deadlock when freeing cgroup storage
selftests/bpf: Add strparser test for bpf
selftests/bpf: Fix invalid flag of recv()
bpf: Disable non stream socket for strparser
bpf: Fix wrong copied_seq calculation
strparser: Add read_sock callback
bpf: avoid holding freeze_mutex during mmap operation
bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
selftests/bpf: Adjust data size to have ETH_HLEN
bpf, test_run: Fix use-after-free issue in eth_skb_pkt_type()
bpf: Remove unnecessary BTF lookups in bpf_sk_storage_tracing_allowed