]> Gentwo Git Trees - linux/.git/log
linux/.git
5 days agodrm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:51 +0000 (14:29 +0100)]
drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ring

On old GPUs, it may be an issue that handling the interrupts from
VM faults is too slow and the interrupt handler (IH) ring may
overflow, which can cause an eventual hang.

Delegate the processing of all VM faults to the soft
IRQ handler ring.

As a result, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs/reboots.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:50 +0000 (14:29 +0100)]
drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ring

On old GPUs, it may be an issue that handling the interrupts from
VM faults is too slow and the interrupt handler (IH) ring may
overflow, which can cause an eventual hang.

Delegate the processing of all VM faults to the soft
IRQ handler ring.

As a result, we spend much less time in the IRQ handler that
interacts with the HW IH ring, which significantly reduces the
chance of hangs/reboots.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu/gmc6: Cache VM fault info
Timur Kristóf [Wed, 26 Nov 2025 13:29:49 +0000 (14:29 +0100)]
drm/amdgpu/gmc6: Cache VM fault info

Call amdgpu_vm_update_fault_cache on GMC v6 similarly to how we
do in GMC v7-v8 so that VM fault info can be used later by
userspace for debugging.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu/gmc6: Don't print MC client as it's unknown
Timur Kristóf [Wed, 26 Nov 2025 13:29:48 +0000 (14:29 +0100)]
drm/amdgpu/gmc6: Don't print MC client as it's unknown

The VM_CONTEXT1_PROTECTION_FAULT_MCCLIENT register
doesn't exist on GMC v6 so we can't print the MC client as a
string like we do on GMC v7-v8. However, we still print the
mc_id from VM_CONTEXT1_PROTECTION_FAULT_STATUS.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu/cz_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:47 +0000 (14:29 +0100)]
drm/amdgpu/cz_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v8
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu/tonga_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:46 +0000 (14:29 +0100)]
drm/amdgpu/tonga_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v8
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu/iceland_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:45 +0000 (14:29 +0100)]
drm/amdgpu/iceland_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v8
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu/cik_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:44 +0000 (14:29 +0100)]
drm/amdgpu/cik_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v7 (CIK)
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu/si_ih: Enable soft IRQ handler ring
Timur Kristóf [Wed, 26 Nov 2025 13:29:43 +0000 (14:29 +0100)]
drm/amdgpu/si_ih: Enable soft IRQ handler ring

We are going to use the soft IRQ handler ring on GMC v6 (SI)
to process interrupts from VM faults.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amd/display: fix typo in display_mode_core_structs.h
Aditya Gollamudi [Sun, 12 Oct 2025 19:13:19 +0000 (12:13 -0700)]
drm/amd/display: fix typo in display_mode_core_structs.h

Fix a typo in a comment, change "enviroment" to "environment" in
drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h

Fixes: e6a8a000cfe6 ("drm/amd/display: Rename dml2 to dml2_0 folder")
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Aditya Gollamudi <adigollamudi@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amd/display: fix Smart Power OLED not working after S4
Ian Chen [Thu, 13 Nov 2025 05:07:58 +0000 (13:07 +0800)]
drm/amd/display: fix Smart Power OLED not working after S4

[HOW]
Before enable smart power OLED, we need to call set pipe to let
DMUB get correct ABM config.

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Ian Chen <ian.chen@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amd/display: Move RGB-type check for audio sync to DCE HW sequence
Ivan Lipski [Fri, 21 Nov 2025 20:03:57 +0000 (15:03 -0500)]
drm/amd/display: Move RGB-type check for audio sync to DCE HW sequence

[Why&How]
DVI-A & VGA connectors are applicable to DCE ASICs, so move them to
dce110_hwseq.c to block audio sync on SIGNAL_TYPE_RGB for DCE ASICs.

Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
5 days agodrm/amdgpu: add missing lock to amdgpu_ttm_access_memory_sdma
Pierre-Eric Pelloux-Prayer [Tue, 25 Nov 2025 09:48:39 +0000 (10:48 +0100)]
drm/amdgpu: add missing lock to amdgpu_ttm_access_memory_sdma

Users of ttm entities need to hold the gtt_window_lock before using them
to guarantee proper ordering of jobs.

Cc: stable@vger.kernel.org
Fixes: cb5cc4f573e1 ("drm/amdgpu: improve debug VRAM access performance using sdma")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amdgpu: fix cyan_skillfish2 gpu info fw handling
Alex Deucher [Wed, 26 Nov 2025 14:40:31 +0000 (09:40 -0500)]
drm/amdgpu: fix cyan_skillfish2 gpu info fw handling

If the board supports IP discovery, we don't need to
parse the gpu info firmware.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4721
Fixes: fa819e3a7c1e ("drm/amdgpu: add support for cyan skillfish gpu_info")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amdgpu: Fix CPER ring debugfs read buffer overflow risk
Srinivasan Shanmugam [Fri, 21 Nov 2025 10:58:20 +0000 (16:28 +0530)]
drm/amdgpu: Fix CPER ring debugfs read buffer overflow risk

The CPER ring debugfs read code always writes a 12-byte header when the
file is read for the first time (*offset == 0):

    copy_to_user(buf, ring_header, 12);

But the code never checks whether the user buffer (@size) is at least
12 bytes long. After writing the 12-byte header, the code then gives the
   full original @size to the CPER payload handler:

    record_req->buf_size = size;

This means the function can write:

    12 bytes (header) + payload bytes (up to @size)

into a buffer that is only @size bytes big. In other words, the kernel
may write more data than the user asked for. This can overflow the user
buffer.

The fix is:

  - If the user buffer is smaller than 12 bytes on the first read,
    return -EINVAL instead of copying the header.
  - After writing the 12-byte header, subtract 12 from @size and pass
    the reduced size to record_req->buf_size. This ensures the CPER
payload only uses the remaining free space in the buffer.

Reads after the first one (*offset != 0) do not write the header, so
their behavior stays exactly the same. The only user-visible change is
that tiny buffers now fail safely instead of risking an overflow.

Fixes:
    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:523
    amdgpu_ras_cper_debugfs_read()
        warn: userbuf overflow? is 'ring_header_size' <= 'size'

Fixes: 527e3d40339b ("drm/amd/ras: Add CPER ring read for uniras")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Xiang Liu <xiang.liu@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amdgpu: attach tlb fence to the PTs update
Prike Liang [Fri, 31 Oct 2025 09:02:51 +0000 (17:02 +0800)]
drm/amdgpu: attach tlb fence to the PTs update

Ensure the userq TLB flush is emitted only after
the VM update finishes and the PT BOs have been
annotated with bookkeeping fences.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amdkfd: assign AID to uuid in topology for SPX mode
Eric Huang [Wed, 19 Nov 2025 20:07:10 +0000 (15:07 -0500)]
drm/amdkfd: assign AID to uuid in topology for SPX mode

XCD id is assigned to uuid, which causes some performance
drop in SPX mode, assigning AID back will resolve the
issue.

Fixes: 3a75edf93aae ("drm/amdkfd: set uuid for each partition in topology")
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amd/display: Check ATOM_DEVICE_CRT2_SUPPORT in dc_load_detection
Ivan Lipski [Thu, 13 Nov 2025 21:51:32 +0000 (16:51 -0500)]
drm/amd/display: Check ATOM_DEVICE_CRT2_SUPPORT in dc_load_detection

[WHY & HOW]
Fix the typo of the else-if condition from ATOM_DEVICE_CRT1_SUPPORT to
ATOM_DEVICE_CRT2_SUPPORT.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amd/display: Add cursor offload abort to the new HWSS path
Nicholas Kazlauskas [Tue, 11 Nov 2025 18:39:52 +0000 (13:39 -0500)]
drm/amd/display: Add cursor offload abort to the new HWSS path

[HOW]
If cursor attributes or position are passed into DC via a stream update
and we take the newer HWSS paths then it's possible that the update
races with cursor offloading if it's enabled.

This can cause the cursor to remain on the screen if no further updates
come in if it results in HW cursor support being disabled.

[HOW]
Add the abort into the HWSS path so that cursor offloading doesn't
attempt to reprogram the cursor with outdated params.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amd/display: Increase EDID read retries
Mario Limonciello (AMD) [Thu, 6 Nov 2025 05:04:54 +0000 (23:04 -0600)]
drm/amd/display: Increase EDID read retries

[WHY]
When monitor is still booting EDID read can fail while DPCD read
is successful.  In this case no EDID data will be returned, and this
could happen for a while.

[HOW]
Increase number of attempts to read EDID in dm_helpers_read_local_edid()
to 25.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amd/display: Fix dereference-before-check for dc_link
Srinivasan Shanmugam [Fri, 21 Nov 2025 14:53:29 +0000 (20:23 +0530)]
drm/amd/display: Fix dereference-before-check for dc_link

The function dereferences amdgpu_dm_connector->dc_link early to
initialize verified_link_cap and dc, but later still checks
amdgpu_dm_connector->dc_link for NULL in the analog path.

This late NULL check is redundant, introduce a local dc_link pointer,
use it consistently, and drop the superfluous NULL check while using
dc_link->link_id.id instead.

The function uses dc_link at the very beginning without checking if it
is NULL.  But later in the code, it suddenly checks if dc_link is NULL.

This check is too late to be useful, because the code has already used
dc_link earlier.  So this NULL check does nothing.

We simplify the code by storing amdgpu_dm_connector->dc_link in a local
dc_link variable and using it throughout the function.  Since dc_link is
already dereferenced early, the later NULL check is unnecessary and is
removed.

Fixes the below:
  amdgpu_dm_connector_get_modes():
  variable dereferenced before check 'amdgpu_dm_connector->dc_link'

drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c
  8845 &amdgpu_dm_connector->dc_link->verified_link_cap;
  8846 const struct dc *dc = amdgpu_dm_connector->dc_link->dc;
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                      Dereference
  ...

  8856
  8857 if (amdgpu_dm_connector->dc_sink &&
  8858     amdgpu_dm_connector->dc_link &&
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                            Checked too late.
                            Presumably this NULL check could be removed?
  ...

Fixes: d46e422f65ae ("drm/amd/display: Cleanup uses of the analog flag")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Timur Kristóf <timur.kristof@gmail.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amd/display: Don't change brightness for disabled connectors
Mario Limonciello (AMD) [Mon, 3 Nov 2025 22:02:11 +0000 (16:02 -0600)]
drm/amd/display: Don't change brightness for disabled connectors

[WHY]
When a laptop lid is closed the connector is disabled but userspace
can still try to change brightness.  This doesn't work because the
panel is turned off. It will eventually time out, but there is a lot
of stutter along the way.

[How]
Iterate all connectors to check whether the matching one for the backlight
index is enabled.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4675
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amd/display: Fix logical vs bitwise bug in get_embedded_panel_info_v2_1()
Dan Carpenter [Fri, 31 Oct 2025 13:02:25 +0000 (16:02 +0300)]
drm/amd/display: Fix logical vs bitwise bug in get_embedded_panel_info_v2_1()

The .H_SYNC_POLARITY and .V_SYNC_POLARITY variables are 1 bit bitfields
of a u32.  The ATOM_HSYNC_POLARITY define is 0x2 and the
ATOM_VSYNC_POLARITY is 0x4.  When we do a bitwise negate of 0, 2, or 4
then the last bit is always 1 so this code always sets .H_SYNC_POLARITY
and .V_SYNC_POLARITY to true.

This code is instead intended to check if the ATOM_HSYNC_POLARITY or
ATOM_VSYNC_POLARITY flags are set and reverse the result.  In other
words, it's supposed to be a logical negate instead of a bitwise negate.

Fixes: ae79c310b1a6 ("drm/amd/display: Add DCE12 bios parser support")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amd/display: Check NULL before accessing
Alex Hung [Fri, 7 Nov 2025 22:35:58 +0000 (15:35 -0700)]
drm/amd/display: Check NULL before accessing

[WHAT]
IGT kms_cursor_legacy's long-nonblocking-modeset-vs-cursor-atomic
fails with NULL pointer dereference. This can be reproduced with
both an eDP panel and a DP monitors connected.

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 13 UID: 0 PID: 2960 Comm: kms_cursor_lega Not tainted
6.16.0-99-custom #8 PREEMPT(voluntary)
 Hardware name: AMD ........
 RIP: 0010:dc_stream_get_scanoutpos+0x34/0x130 [amdgpu]
 Code: 57 4d 89 c7 41 56 49 89 ce 41 55 49 89 d5 41 54 49
 89 fc 53 48 83 ec 18 48 8b 87 a0 64 00 00 48 89 75 d0 48 c7 c6 e0 41 30
 c2 <48> 8b 38 48 8b 9f 68 06 00 00 e8 8d d7 fd ff 31 c0 48 81 c3 e0 02
 RSP: 0018:ffffd0f3c2bd7608 EFLAGS: 00010292
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffd0f3c2bd7668
 RDX: ffffd0f3c2bd7664 RSI: ffffffffc23041e0 RDI: ffff8b32494b8000
 RBP: ffffd0f3c2bd7648 R08: ffffd0f3c2bd766c R09: ffffd0f3c2bd7760
 R10: ffffd0f3c2bd7820 R11: 0000000000000000 R12: ffff8b32494b8000
 R13: ffffd0f3c2bd7664 R14: ffffd0f3c2bd7668 R15: ffffd0f3c2bd766c
 FS:  000071f631b68700(0000) GS:ffff8b399f114000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001b8105000 CR4: 0000000000f50ef0
 PKRU: 55555554
 Call Trace:
 <TASK>
 dm_crtc_get_scanoutpos+0xd7/0x180 [amdgpu]
 amdgpu_display_get_crtc_scanoutpos+0x86/0x1c0 [amdgpu]
 ? __pfx_amdgpu_crtc_get_scanout_position+0x10/0x10[amdgpu]
 amdgpu_crtc_get_scanout_position+0x27/0x50 [amdgpu]
 drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xf7/0x400
 drm_crtc_vblank_helper_get_vblank_timestamp+0x1c/0x30
 drm_crtc_get_last_vbltimestamp+0x55/0x90
 drm_crtc_next_vblank_start+0x45/0xa0
 drm_atomic_helper_wait_for_fences+0x81/0x1f0
 ...

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agoRevert "drm/amd/display: Move setup_stream_attribute"
Alex Deucher [Tue, 25 Nov 2025 14:08:45 +0000 (09:08 -0500)]
Revert "drm/amd/display: Move setup_stream_attribute"

This reverts commit 2681bf4ae8d24df950138b8c9ea9c271cd62e414.

This results in a blank screen on the HDMI port on some systems.
Revert for now so as not to regress 6.18, can be addressed
in 6.19 once the issue is root caused.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4652
Cc: Sunpeng.Li@amd.com
Cc: ivan.lipski@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amdgpu: free job fences on failure in amdgpu_job_alloc_with_ib
Pierre-Eric Pelloux-Prayer [Mon, 24 Nov 2025 14:42:50 +0000 (15:42 +0100)]
drm/amdgpu: free job fences on failure in amdgpu_job_alloc_with_ib

Otherwise we're leaking memory.

Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
11 days agodrm/amdgpu: clear job on failure in amdgpu_job_alloc(_with_ib)
Pierre-Eric Pelloux-Prayer [Mon, 24 Nov 2025 14:33:41 +0000 (15:33 +0100)]
drm/amdgpu: clear job on failure in amdgpu_job_alloc(_with_ib)

If memory is freed we need to nullify the pointer or the caller
might call kfree again (eg: amdgpu_cs_parser_fini calls kfree on
all non-null job pointers).

Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
13 days agodrm/amd/pm: adjust the visibility of pp_table sysfs node
Yang Wang [Thu, 30 Oct 2025 01:06:22 +0000 (09:06 +0800)]
drm/amd/pm: adjust the visibility of pp_table sysfs node

v1:
- make pp_table invisible on VF mode (only valid on BM)
- make pp_table invisible on Mi* chips (Not supported)
- make pp_table invisible if scpm feature is enabled.

v2:
move pp_table invisible code logic into amdgpu_dpm_get_pp_table() function.

v3:
add table buffer pointer check both on powerplay & swsmu.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
13 days agoRevert "drm/amd: fix gfx hang on renoir in IGT reload test"
Rodrigo Siqueira [Fri, 21 Nov 2025 13:55:29 +0000 (06:55 -0700)]
Revert "drm/amd: fix gfx hang on renoir in IGT reload test"

The original patch introduced additional latency during boot time
because it triggers a driver reload to avoid a CP hang when the driver
is reloaded multiple times. This has been addressed with a more generic
solution that triggers the GPU reset only during the unload phase,
avoiding extra latency during boot time. For this reason, this commit
reverts the original change.

This reverts commit 72a98763b473890e6605604bfcaf71fc212b4720.

This patch should only be applied if commit:
4355e61835e7 ("drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloaded")
is present.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
13 days agodrm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloaded
Rodrigo Siqueira [Fri, 21 Nov 2025 13:55:28 +0000 (06:55 -0700)]
drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloaded

When trying to unload amdgpu in the SteamDeck (TTY mode), the following
set of errors happens and the system gets unstable:

[..]
 [drm] Initialized amdgpu 3.64.0 for 0000:04:00.0 on minor 0
 amdgpu 0000:04:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx_0.0.0 (-110).
 amdgpu 0000:04:00.0: amdgpu: ib ring test failed (-110).
[..]
 amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
 amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
 amdgpu 0000:04:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
 amdgpu 0000:04:00.0: amdgpu: Failed to disable gfxoff!
[..]

When the driver initializes the GPU, the PSP validates all the firmware
loaded, and after that, it is not possible to load any other firmware
unless the device is reset. What is happening in the load/unload
situation is that PSP halts the GC engine because it suspects that
something is amiss. To address this issue, this commit ensures that the
GPU is reset (mode 2 reset) in the unload sequence.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
13 days agodrm/amd/pm: fix amdgpu_irq enabled counter unbalanced on smu v11.0
Yang Wang [Wed, 19 Nov 2025 02:46:23 +0000 (10:46 +0800)]
drm/amd/pm: fix amdgpu_irq enabled counter unbalanced on smu v11.0

v1:
- fix amdgpu_irq enabled counter unbalanced issue on smu_v11_0_disable_thermal_alert.

v2:
- re-enable smu thermal alert to make amdgpu irq counter balance for smu v11.0 if in runpm state

[75582.361561] ------------[ cut here ]------------
[75582.361565] WARNING: CPU: 42 PID: 533 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xd8/0xf0 [amdgpu]
...
[75582.362211] Tainted: [E]=UNSIGNED_MODULE
[75582.362214] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F14a 08/14/2020
[75582.362218] Workqueue: pm pm_runtime_work
[75582.362225] RIP: 0010:amdgpu_irq_put+0xd8/0xf0 [amdgpu]
[75582.362556] Code: 31 f6 31 ff e9 c9 bf cf c2 44 89 f2 4c 89 e6 4c 89 ef e8 db fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 a8 bf cf c2 <0f> 0b eb c3 b8 fe ff ff ff eb 97 e9 84 e8 8b 00 0f 1f 84 00 00 00
[75582.362560] RSP: 0018:ffffd50d51297b80 EFLAGS: 00010246
[75582.362564] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[75582.362568] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[75582.362570] RBP: ffffd50d51297ba0 R08: 0000000000000000 R09: 0000000000000000
[75582.362573] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e72091d2008
[75582.362576] R13: ffff8e720af80000 R14: 0000000000000000 R15: ffff8e720af80000
[75582.362579] FS:  0000000000000000(0000) GS:ffff8e9158262000(0000) knlGS:0000000000000000
[75582.362582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75582.362585] CR2: 000074869d040c14 CR3: 0000001e37a3e000 CR4: 00000000003506f0
[75582.362588] Call Trace:
[75582.362591]  <TASK>
[75582.362597]  smu_v11_0_disable_thermal_alert+0x17/0x30 [amdgpu]
[75582.362983]  smu_smc_hw_cleanup+0x79/0x4f0 [amdgpu]
[75582.363375]  smu_suspend+0x92/0x110 [amdgpu]
[75582.363762]  ? gfx_v10_0_hw_fini+0xd5/0x150 [amdgpu]
[75582.364098]  amdgpu_ip_block_suspend+0x27/0x80 [amdgpu]
[75582.364377]  ? timer_delete_sync+0x10/0x20
[75582.364384]  amdgpu_device_ip_suspend_phase2+0x190/0x450 [amdgpu]
[75582.364665]  amdgpu_device_suspend+0x1ae/0x2f0 [amdgpu]
[75582.364948]  amdgpu_pmops_runtime_suspend+0xf3/0x1f0 [amdgpu]
[75582.365230]  pci_pm_runtime_suspend+0x6d/0x1f0
[75582.365237]  ? __pfx_pci_pm_runtime_suspend+0x10/0x10
[75582.365242]  __rpm_callback+0x4c/0x190
[75582.365246]  ? srso_return_thunk+0x5/0x5f
[75582.365252]  ? srso_return_thunk+0x5/0x5f
[75582.365256]  ? ktime_get_mono_fast_ns+0x43/0xe0
[75582.365263]  rpm_callback+0x6e/0x80
[75582.365267]  rpm_suspend+0x124/0x5f0
[75582.365271]  ? srso_return_thunk+0x5/0x5f
[75582.365275]  ? __schedule+0x439/0x15e0
[75582.365281]  ? srso_return_thunk+0x5/0x5f
[75582.365285]  ? __queue_delayed_work+0xb8/0x180
[75582.365293]  pm_runtime_work+0xc6/0xe0
[75582.365297]  process_one_work+0x1a1/0x3f0
[75582.365303]  worker_thread+0x2ba/0x3d0
[75582.365309]  kthread+0x107/0x220
[75582.365313]  ? __pfx_worker_thread+0x10/0x10
[75582.365318]  ? __pfx_kthread+0x10/0x10
[75582.365323]  ret_from_fork+0xa2/0x120
[75582.365328]  ? __pfx_kthread+0x10/0x10
[75582.365332]  ret_from_fork_asm+0x1a/0x30
[75582.365343]  </TASK>
[75582.365345] ---[ end trace 0000000000000000 ]---
[75582.365350] amdgpu 0000:05:00.0: amdgpu: Fail to disable thermal alert!
[75582.365379] amdgpu 0000:05:00.0: amdgpu: suspend of IP block <smu> failed -22

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
13 days agodrm/amd/amdgpu: reserve vm invalidation engine for uni_mes
Michael Chen [Thu, 13 Nov 2025 17:56:43 +0000 (12:56 -0500)]
drm/amd/amdgpu: reserve vm invalidation engine for uni_mes

Reserve vm invalidation engine 6 when uni_mes enabled. It
is used in processing tlb flush request from host.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Shaoyun liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amdgpu: Add sriov vf check for VCN per queue reset support.
Shikang Fan [Wed, 19 Nov 2025 10:05:10 +0000 (18:05 +0800)]
drm/amdgpu: Add sriov vf check for VCN per queue reset support.

Add SRIOV check when setting VCN ring's supported reset mask.

Signed-off-by: Shikang Fan <shikang.fan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags
Srinivasan Shanmugam [Tue, 18 Nov 2025 08:58:33 +0000 (14:28 +0530)]
drm/amdgpu/ttm: Fix crash when handling MMIO_REMAP in PDE flags

The MMIO_REMAP BO is a special 4K IO page that does not have a ttm_tt
behind it. However, amdgpu_ttm_tt_pde_flags() was treating it like
normal TT/doorbell/preempt memory and unconditionally accessed
ttm->caching. For the MMIO_REMAP BO, ttm is NULL, so this leads to a
NULL pointer dereference when computing PDE flags.

Fix this by checking that ttm is non-NULL before reading ttm->caching.
This prevents the crash for MMIO_REMAP and also makes the code more
defensive if other BOs ever come through without a ttm_tt.

Fixes: fb5a52dbe9fe ("drm/amdgpu: Implement TTM handling for MMIO_REMAP placement")
Suggested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amdgpu/vm: Check PRT uAPI flag instead of PTE flag
Timur Kristóf [Wed, 19 Nov 2025 09:25:42 +0000 (10:25 +0100)]
drm/amdgpu/vm: Check PRT uAPI flag instead of PTE flag

This fixes sparse mappings (aka. partially resident textures).

Check the correct flags.
Since a recent refactor, the code works with uAPI flags (for
mapping buffer objects), and not PTE (page table entry) flags.

Fixes: 6716a823d18d ("drm/amdgpu: rework how PTE flags are generated v3")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabled
Yifan Zha [Fri, 14 Nov 2025 09:48:58 +0000 (17:48 +0800)]
drm/amdgpu: Skip emit de meta data on gfx11 with rs64 enabled

[Why]
Accoreding to CP updated to RS64 on gfx11,
WRITE_DATA with PREEMPTION_META_MEMORY(dst_sel=8) is illegal for CP FW.
That packet is used for MCBP on F32 based system.
So it would lead to incorrect GRBM write and FW is not handling that
extra case correctly.

[How]
With gfx11 rs64 enabled, skip emit de meta data.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd: Skip power ungate during suspend for VPE
Mario Limonciello [Tue, 18 Nov 2025 13:18:10 +0000 (07:18 -0600)]
drm/amd: Skip power ungate during suspend for VPE

During the suspend sequence VPE is already going to be power gated
as part of vpe_suspend().  It's unnecessary to call during calls to
amdgpu_device_set_pg_state().

It actually can expose a race condition with the firmware if s0i3
sequence starts as well.  Drop these calls.

Cc: Peyton.Lee@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Move analog check to dce110_hwseq
Timur Kristóf [Thu, 13 Nov 2025 16:33:48 +0000 (17:33 +0100)]
drm/amd/display: Move analog check to dce110_hwseq

Instead of checking that the signal is analog before calling the
HWSS disable_audio_stream() function to disable audio, move
the check inside the HWSS function.

Suggested-by: Ray Wu <Ray.Wu@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20251113163348.137315-5-timur.kristof@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Cleanup early return in construct_phy
Timur Kristóf [Thu, 13 Nov 2025 16:33:47 +0000 (17:33 +0100)]
drm/amd/display: Cleanup early return in construct_phy

Match pre-existing patterns in the DC code base.
Instead of returning early from the construct_phy() function,
add a label at the end and use goto to jump there.
Additionally, respect the DC logger and let it log the function
even when it returns early.

Suggested-by: Ray Wu <Ray.Wu@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20251113163348.137315-4-timur.kristof@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Cleanup uses of the analog flag
Timur Kristóf [Thu, 13 Nov 2025 16:33:46 +0000 (17:33 +0100)]
drm/amd/display: Cleanup uses of the analog flag

In the detect_link_and_local_sink() function, do not modify the
EDID capabilities of the display based on the connector. Instead,
respect the analog flag better and when the analog flag is set,
check that the connector indeed supports analog displays.

Suggested-by: Ray Wu <Ray.Wu@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://lore.kernel.org/r/20251113163348.137315-3-timur.kristof@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Fix warning for analog stream encoders
Timur Kristóf [Thu, 13 Nov 2025 16:33:45 +0000 (17:33 +0100)]
drm/amd/display: Fix warning for analog stream encoders

Fixes the following warning that some users are reporting
with some kernel configurations:

"positional initialization of field in 'struct' declared
with 'designated_init' attribute"

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20251113163348.137315-2-timur.kristof@gmail.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/radeon: delete radeon_fence_process in is_signaled, no deadlock
Robert McClinton [Sun, 16 Nov 2025 17:33:21 +0000 (12:33 -0500)]
drm/radeon: delete radeon_fence_process in is_signaled, no deadlock

Delete the attempt to progress the queue when checking if fence is
signaled. This avoids deadlock.

dma-fence_ops::signaled can be called with the fence lock in unknown
state. For radeon, the fence lock is also the wait queue lock. This can
cause a self deadlock when signaled() tries to make forward progress on
the wait queue. But advancing the queue is unneeded because incorrectly
returning false from signaled() is perfectly acceptable.

Link: https://github.com/brave/brave-browser/issues/49182
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4641
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Robert McClinton <rbmccav@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: dc_hw_sequencer.c: remove kernel-doc comments
Randy Dunlap [Sat, 8 Nov 2025 01:35:04 +0000 (17:35 -0800)]
drm/amd/display: dc_hw_sequencer.c: remove kernel-doc comments

Change comments from kernel-doc style "/**" to normal C comments
"/*" since the comments are not in kernel-doc format.
This fixes around 39 kernel-doc warnings like this one:

drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c:1322: warning:
 This comment starts with '/**', but isn't a kernel-doc comment.
 Refer Documentation/doc-guide/kernel-doc.rst

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511062036.Ry8Z2APc-lkp@intel.com/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amdgpu: Unregister mce notifier
Lijo Lazar [Thu, 13 Nov 2025 11:37:42 +0000 (17:07 +0530)]
drm/amdgpu: Unregister mce notifier

Unregister mce notifier on unload.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Promote DC to 3.2.359
Taimur Hassan [Sat, 8 Nov 2025 01:15:20 +0000 (20:15 -0500)]
drm/amd/display: Promote DC to 3.2.359

This version brings along the following updates:

- Add interface to capture expected HW state from SW state
- Add panel Replay capability detection, DPCD reading, and enablement logic
- Re-check seamless boot enablement on subsequent dc_commit_streams
- Improve DPCD link capability retrieval with increased retries and per-retry delays
- Add HPD filter for HDMI
- Add pipe topology history tracking to DC
- Fix MST initialization on resume when switching from SST to MST during suspend
- Fix double cursor on DCN20 & DCN30 in non-native scaling
- Check DCCG_AUDIO_DTO2 register mask before access
- Fix pbn to kbps conversion

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Ignore Coverity false positive
Taimur Hassan [Fri, 7 Nov 2025 21:14:42 +0000 (16:14 -0500)]
drm/amd/display: Ignore Coverity false positive

[Why&How]
Ignore Coverity false positive analysis in the dmub_cmd.h

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Fix pbn to kbps Conversion
Fangzhi Zuo [Fri, 7 Nov 2025 20:01:30 +0000 (15:01 -0500)]
drm/amd/display: Fix pbn to kbps Conversion

[Why]
Existing routine has two conversion sequence,
pbn_to_kbps and kbps_to_pbn with margin.
Non of those has without-margin calculation.

kbps_to_pbn with margin conversion includes
fec overhead which has already been included in
pbn_div calculation with 0.994 factor considered.
It is a double counted fec overhead factor that causes
potential bw loss.

[How]
Add without-margin calculation.
Fix fec overhead double counted issue.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3735
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Check DCCG_AUDIO_DTO2 register mask exist
Charlene Liu [Tue, 28 Oct 2025 01:40:21 +0000 (21:40 -0400)]
drm/amd/display: Check DCCG_AUDIO_DTO2 register mask exist

[Why&How]
Check DCCG_AUDIO_DTO2 register mask exist before access.
Also,  add a existing DIO_CLOCK_control register for later use.

Reviewed-by: Roman Li <roman.li@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Add null pointer check in link_dpms
Charlene Liu [Thu, 6 Nov 2025 23:10:30 +0000 (18:10 -0500)]
drm/amd/display: Add null pointer check in link_dpms

[why]
Check that the stream exists to add link->local_sink null pointer access
protection.

Reviewed-by: Harold Sun <harold.sun@amd.com>
Reviewed-by: Ethan Cheung <ethan.cheung@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Clear the CUR_ENABLE register on DCN20 on DPP5
Ivan Lipski [Wed, 5 Nov 2025 20:27:42 +0000 (15:27 -0500)]
drm/amd/display: Clear the CUR_ENABLE register on DCN20 on DPP5

[Why]
On DCN20 & DCN30, the 6th DPP's & HUBP's are powered on permanently and
cannot be power gated. Thus, when dpp_reset() is invoked for the DPP5,
while it's still powered on, the cached cursor_state
(dpp_base->pos.cur0_ctl.bits.cur0_enable)
and the actual state (CUR0_ENABLE) bit are unsycned. This can cause a
double cursor in full screen with non-native scaling.

[How]
Force disable cursor on DPP5 on plane powerdown for ASICs w/ 6 DPPs/HUBPs.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4673
Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Add pipe topology history to dc
Nicholas Carbones [Fri, 31 Oct 2025 20:36:09 +0000 (16:36 -0400)]
drm/amd/display: Add pipe topology history to dc

[Why]
There is no way to check pipe topology update history through a
dump.

[How]
Add a topology history structure to dc with snapshots of the most recent
pipe topology updates.

Reviewed-by: George Shen <george.shen@amd.com>
Signed-off-by: Nicholas Carbones <ncarbone@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Add an HPD filter for HDMI
Ivan Lipski [Thu, 30 Oct 2025 15:25:33 +0000 (11:25 -0400)]
drm/amd/display: Add an HPD filter for HDMI

[Why]
Some monitors perform rapid “autoscan” HPD re‑assertions right after a
disconnect or powersaving mode enablement. These appear as a quick
disconnect→reconnect with an identical EDID. Since Linux has no HDMI
hotplug detection (HPD) filter, these quick reconnects are seen as hotplug
events, which can unintentionally wake a system with DPMS off.

An example: https://gitlab.freedesktop.org/drm/amd/-/issues/2876

Such 'fake reconnects' are considered when the interval between a
disconnect and a connect is within 1500ms (experimentally chosen using
several monitors), and the two connections have the same EDID.

[How]
Implement a time-based debounce mechanism:

1. On HDMI disconnect detection, instead of immediately processing the
HPD event, save the current sink and schedule delayed work (default 1500ms)

2. If another HDMI disconnect HPD event arrives during the debounce period,
it reschedules the pending work, ensuring only the final state is processed.

3. When the debounce timer expires, re-detect the display and compare the
new sink with the cached one using EDID comparison.

4. If sinks match (same EDID), this was a spontaneous HPD toggle:
   - Update connector state internally
   - Skip hotplug event to prevent desktop rearrangement

   If sinks differ, this was a real display change:
   - Process normally with the hotplug event

The debounce delay is configurable via module parameter
'hdmi_hpd_debounce_delay_ms'.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2876
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Increase DPCD read retries
Mario Limonciello (AMD) [Mon, 3 Nov 2025 18:11:31 +0000 (12:11 -0600)]
drm/amd/display: Increase DPCD read retries

[Why]
Empirical measurement of some monitors that fail to read EDID while
booting shows that the number of retries with a 30ms delay between
tries is as high as 16.

[How]
Increase number of retries to 20.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Move sleep into each retry for retrieve_link_cap()
Mario Limonciello (AMD) [Mon, 3 Nov 2025 17:17:44 +0000 (11:17 -0600)]
drm/amd/display: Move sleep into each retry for retrieve_link_cap()

[Why]
When a monitor is booting it's possible that it isn't ready to retrieve
link caps and this can lead to an EDID read failure:

```
[drm:retrieve_link_cap [amdgpu]] *ERROR* retrieve_link_cap: Read receiver caps dpcd data failed.
amdgpu 0000:c5:00.0: [drm] *ERROR* No EDID read.
```

[How]
Rather than msleep once and try a few times, msleep each time.  Should
be no changes for existing working monitors, but should correct reading
caps on a monitor that is slow to boot.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4672
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Re-check seamless boot can be enabled or not
Paul Hsieh [Thu, 30 Oct 2025 04:17:53 +0000 (12:17 +0800)]
drm/amd/display: Re-check seamless boot can be enabled or not

[Why]
If the seamless boot feature has already been enabled, and
dc_commit_streams is called again before receiving a flip, the
driver will adjust the engine clock without turning off the screen,
which will cause garbage to occur. However, in reality, the Pixel
Clock from the first dc_commit_streams and the second dc_commit_streams
are different.

[How]
If the apply seamless boot flag in the previous stream has not been
cleared, and dc_commit_streams is received again, we need to recheck
whether seamless boot should be disabled

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Paul Hsieh <Paul.Hsieh@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Get panel replay capability from DPCD
Jack Chang [Wed, 20 Aug 2025 08:59:08 +0000 (16:59 +0800)]
drm/amd/display: Get panel replay capability from DPCD

[Why&How]
Read Panel replay caps from DPCD when retrieving link capability

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Jack Chang <jack.chang@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Add panel replay enablement option and logic
Jack Chang [Thu, 21 Aug 2025 05:19:23 +0000 (13:19 +0800)]
drm/amd/display: Add panel replay enablement option and logic

[Why&How]
1.Add flow to enable and configure panel replay enablement and
configuration
2.Add registry key for enable option
3.Add replay version check to be compatible with freesync replay
4.Add AC/DC switch function to notify ac/dc change.
5.Add flow in set event function to check and decide Replay
enable/disable

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Jack Chang <jack.chang@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Add panel replay capability detection
Jack Chang [Fri, 8 Aug 2025 03:20:56 +0000 (11:20 +0800)]
drm/amd/display: Add panel replay capability detection

[Why&How]
For supporting VESA PR, add flow to determine the support capability

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Jack Chang <jack.chang@amd.com>
Signed-off-by: Leon Huang <Leon.Huang1@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agodrm/amd/display: Add interface to capture expected HW state from SW state
George Shen [Wed, 29 Oct 2025 15:27:32 +0000 (11:27 -0400)]
drm/amd/display: Add interface to capture expected HW state from SW state

[Why]
To debug certain issues, such as underflow, it is common practice to
dump the HW state of all registers for analysis. The first thing to
check with the dump is to ensure all values are programmed as expected
according to SW state.

[How]
Add interface to DC to capture expected HW register values based on SW
state.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2 weeks agoMerge tag 'amd-drm-next-6.19-2025-11-14' of https://gitlab.freedesktop.org/agd5f...
Dave Airlie [Mon, 17 Nov 2025 20:58:01 +0000 (06:58 +1000)]
Merge tag 'amd-drm-next-6.19-2025-11-14' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.19-2025-11-14:

amdgpu:
- RAS updates
- GC12 DCC P2P fix
- Documentation fixes
- Power limit code cleanup
- Userq updates
- VRR fix
- SMART OLED support
- DSC refactor for DCN 3.5
- Replay updates
- DC clockgating updates
- HDCP refactor
- ISP fix
- SMU 13.0.12 updates
- JPEG 5.0.1 fix
- VCE1 support
- Enable DC by default on SI
- Refactor CIK and SI enablement
- Enable amdgpu by default for CI dGPUs
- XGMI fixes
- SR-IOV fixes
- Memory allocation critical path fixes
- Enable amdgpu by default on SI dGPUs

amdkfd:
- Relax checks on save area overallocations
- Fix GPU mappings after prefetch

radeon:
- Refactor CIK and SI enablement

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251114192553.442621-1-alexander.deucher@amd.com
2 weeks agoMerge tag 'drm-intel-gt-next-2025-11-14' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Mon, 17 Nov 2025 20:52:08 +0000 (06:52 +1000)]
Merge tag 'drm-intel-gt-next-2025-11-14' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

Driver Changes:

Fixes/improvements/new stuff:

- Avoid lock inversion when pinning to GGTT on CHV/BXT+VTD (Janusz Krzysztofik)
- Use standard API for seqcount read in TLB invalidation [gt] (Andi Shyti)

Miscellaneous:

- Wait longer for threads in migrate selftest on CHV/BXT+VTD (Janusz Krzysztofik)
- Wait for page_sizes_gtt in gtt selftest on CHV/BXT+VTD (Janusz Krzysztofik)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patch.msgid.link/aRdXOAKlTVX_b0en@linux
2 weeks agoMerge tag 'drm-intel-next-2025-11-14' of https://gitlab.freedesktop.org/drm/i915...
Dave Airlie [Mon, 17 Nov 2025 19:55:51 +0000 (05:55 +1000)]
Merge tag 'drm-intel-next-2025-11-14' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

drm/i915 feature pull #2 for v6.19:

Features and functionality:
- Add initial display support for Xe3p_LPD, display version 35 (Sai Teja, Matt
  R, Gustavo, Matt A, Ankit, Juha-pekka, Luca, Ravi Kumar)
- Compute LT PHY HDMI params when port clock not in predefined tables (Suraj)

Refactoring and cleanups:
- Refactor intel_frontbuffer split between i915, xe, and display (Ville)
- Clean up intel_de_wait_custom() usage (Ville)
- Unify display register polling interfaces (Ville)
- Finish removal of the expensive format info lookups (Ville)
- Cursor code cleanups (Ville)
- Convert intel_rom interfaces to struct drm_device (Jani)

Fixes:
- Fix uninitialized variable in DSI exec packet (Jonathan)
- Fix PIPEDMC logging (Alok Tiwari)
- Fix PSR pipe to vblank conversion (Jani)
- Fix intel_frontbuffer lifetime handling (Ville)
- Disable Panel Replay on DP MST for the time being (Imre)

Merges:
- Backmerge drm-next to get the drm_print.h changes (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/b131309bb7310ab749f1770aa6e36fa8d6a82fa5@intel.com
2 weeks agoMerge tag 'drm-misc-next-2025-11-14-1' of https://gitlab.freedesktop.org/drm/misc...
Dave Airlie [Mon, 17 Nov 2025 04:21:48 +0000 (14:21 +1000)]
Merge tag 'drm-misc-next-2025-11-14-1' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.19:

UAPI Changes:
- Add sysfs entries, coredump support and uevents to QAIC.
- Add fdinfo memory statistics to ivpu.

Cross-subsystem Changes:
- Handle stub fence initialization during module init.
- Stop using system_wq in scheduler and drivers.

Core Changes:
- Documentation updates to ttm, vblank.
- Add EDID quirk for sharp panel.
- Use drm_crtc_vblank_(crtc,waitqueue) more in core and drivers.

Driver Changes:
- Small updates and fixes to panfrost, amdxdna, vmwgfx, ast, ivpu.
- Handle preemption in amdxdna.
- Add PM support to qaic.
- Huge refactor of sun4i's layer code to decouple plane code from output
  and improve support for DE33.
- Add larger page and compression support to nouveau.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/1ad3ea69-d029-4a21-8b3d-6b264b1b2a30@linux.intel.com
2 weeks agoMerge tag 'drm-xe-next-2025-11-14' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Mon, 17 Nov 2025 03:39:45 +0000 (13:39 +1000)]
Merge tag 'drm-xe-next-2025-11-14' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Driver Changes:

Avoid TOCTOU when montoring throttle reasons (Lucas)
Add/extend workaround (Nitin)
SRIOV migration work / plumbing (Michal Wajdeczko, Michal Winiarski, Lukasz)
Drop debug flag requirement for VF resource fixup
Fix MTL vm_max_level (Rodrigo)
Changes around TILE_ADDR_RANGE for platform compatibility
(Fei, Lucas)
Add runtime registers for GFX ver >= 35 (Piotr)
Kerneldoc fix (Kriish)
Rework pcode error mapping (Lucas)
Allow lockdown the PF (Michal)
Eliminate GUC code caching of some frequency values (Sk)
Improvements around forcewake referencing (Matt Roper)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aRcJOrisG2qPbucE@fedora
3 weeks agoMerge tag 'drm-xe-next-2025-11-05' of https://gitlab.freedesktop.org/drm/xe/kernel...
Dave Airlie [Sun, 16 Nov 2025 22:21:58 +0000 (08:21 +1000)]
Merge tag 'drm-xe-next-2025-11-05' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:

Limit number of jobs per exec queue (Shuicheng)
Add sriov_admin sysfs tree (Michal)

Driver Changes:

Fix an uninitialized value (Thomas)
Expose a residency counter through debugfs (Mohammed Thasleem)
Workaround enabling and improvement (Tapani, Tangudu)
More Crescent Island-specific support (Sk Anirban, Lucas)
PAT entry dump imprement (Xin)
Inline gt_reset in the worker (Lucas)
Synchronize GT reset with device unbind (Balasubramani)
Do clean shutdown also when using flr (Jouni)
Fix serialization on burst of unbinds (Matt Brost)
Pagefault Refactor (Matt Brost)
Remove some unused code (Gwan-gyeong)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aQuBECxNOhudc0Bz@fedora
3 weeks agodrm/amdgpu: Use amdgpu by default on SI dedicated GPUs (v2)
Timur Kristóf [Fri, 14 Nov 2025 12:07:36 +0000 (13:07 +0100)]
drm/amdgpu: Use amdgpu by default on SI dedicated GPUs (v2)

Now that the DC analog connector support and VCE1 support landed,
amdgpu is at feature parity with the old radeon driver
on SI dGPUs.

Enabling the amdgpu driver by default for SI dGPUs has the
following benefits:

- More stable OpenGL support through RadeonSI
- Vulkan support through RADV
- Improved performance
- Better display features through DC

Users who want to keep using the old driver can do so using:
amdgpu.si_support=0 radeon.si_support=1

v2:
- Update documentation in Kconfig file

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Use amdgpu by default on CIK dedicated GPUs
Timur Kristóf [Fri, 14 Nov 2025 14:26:08 +0000 (09:26 -0500)]
drm/amdgpu: Use amdgpu by default on CIK dedicated GPUs

The amdgpu driver has been working well on CIK dGPUs for years.
Now that the DC analog connector support landed,
amdgpu is at feature parity with the old radeon driver
on CIK dGPUs.

Enabling the amdgpu driver by default for CIK dGPUs has the
following benefits:

- More stable OpenGL support through RadeonSI
- Vulkan support through RADV
- Improved performance
- Better display features through DC

Users who want to keep using the old driver can do so using:
amdgpu.cik_support=0 radeon.cik_support=1

v2:
- Update documentation in Kconfig file
v3:
- Rebase documentation updates (Alex)

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Fix the issue of missing ras message on sriov host
YiPeng Chai [Thu, 23 Oct 2025 06:47:07 +0000 (14:47 +0800)]
drm/amdgpu: Fix the issue of missing ras message on sriov host

This code only applies to amdgpu processing
poison consumption after uniras is enabled,
but not to sriov.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Add lock to serialize sriov command execution
YiPeng Chai [Mon, 21 Jul 2025 07:22:27 +0000 (15:22 +0800)]
drm/amdgpu: Add lock to serialize sriov command execution

Add lock to serialize sriov command execution.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Synchronize sriov host to add block_mmsch bit field
YiPeng Chai [Tue, 11 Nov 2025 08:56:35 +0000 (16:56 +0800)]
drm/amdgpu: Synchronize sriov host to add block_mmsch bit field

Synchronize sriov host to add block_mmsch bit field.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: use GFP_ATOMIC instead of NOWAIT in the critical path
Christian König [Tue, 28 Oct 2025 10:16:12 +0000 (11:16 +0100)]
drm/amdgpu: use GFP_ATOMIC instead of NOWAIT in the critical path

Otherwise job submissions can fail with ENOMEM.

We probably need to re-design the per VMID tracking at some point.

Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4258
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: avoid memory allocation in the critical code path v3
Christian König [Wed, 29 Oct 2025 14:36:32 +0000 (15:36 +0100)]
drm/amdgpu: avoid memory allocation in the critical code path v3

When we run out of VMIDs we need to wait for some to become available.
Previously we were using a dma_fence_array for that, but this means that
we have to allocate memory.

Instead just wait for the first not signaled fence from the least recently
used VMID to signal. That is not as efficient since we end up in this
function multiple times again, but allocating memory can easily fail or
deadlock if we have to wait for memory to become available.

v2: remove now unused VM manager fields
v3: fix dma_fence reference

Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4258
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Enable xgmi extended peer links for sriov guest
Will Aitken [Tue, 7 Oct 2025 14:49:15 +0000 (14:49 +0000)]
drm/amdgpu: Enable xgmi extended peer links for sriov guest

The amd-smi tool relies on extended peer link information to report xgmi
link metrics. The necessary xgmi ta command, GET_EXTEND_PEER_LINKS, has
been enabled in the host driver and this change is necessary for the
guest to make use of it. To handle the case where the host driver does
not have the latest xgmi ta, the guest driver checks for guest support
through a pf2vf feature flag before invoking psp.

Signed-off-by: Will Aitken <wiaitken@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Update headers for sriov xgmi ext peer link support feature flag
Will Aitken [Tue, 7 Oct 2025 14:19:45 +0000 (14:19 +0000)]
drm/amdgpu: Update headers for sriov xgmi ext peer link support feature flag

Adds new sriov msg flag to match host, feature flag in the amdgim
enum, and a wrapper macro to check it.

Signed-off-by: Will Aitken <wiaitken@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Refactor sriov xgmi topology filling to common code
Will Aitken [Tue, 30 Sep 2025 16:24:07 +0000 (16:24 +0000)]
drm/amdgpu: Refactor sriov xgmi topology filling to common code

amdgpu_xgmi_fill_topology_info and psp_xgmi_reflect_topology_info
perform the same logic of copying topology info of one node to every
other node in the hive. Instead of having two functions that purport to
do the same thing, this refactoring moves the logic of the fill function
to the reflect function and adds reflecting port number info as well for
complete functionality.

Signed-off-by: Will Aitken <wiaitken@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Use amdgpu by default on CIK dedicated GPUs
Timur Kristóf [Sun, 9 Nov 2025 15:41:06 +0000 (16:41 +0100)]
drm/amdgpu: Use amdgpu by default on CIK dedicated GPUs

The amdgpu driver has been working well on CIK dGPUs for years.
Now that the DC analog connector support landed, these GPUs
are at feature parity with the old radeon driver.

Additionally, amdgpu yields extra performance, supports Vulkan
and provides more display features through DC as well as more
robust power management.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Refactor how SI and CIK support is determined
Timur Kristóf [Sun, 9 Nov 2025 15:41:05 +0000 (16:41 +0100)]
drm/amdgpu: Refactor how SI and CIK support is determined

Move the determination into a separate function.
Change amdgpu.si_support and amdgpu.cik_support so that their
default value is -1 (default).

This prepares the code for changing the default driver based
on the chip.

Also adjust the module param documentation.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/radeon: Refactor how SI and CIK support is determined
Timur Kristóf [Sun, 9 Nov 2025 15:41:04 +0000 (16:41 +0100)]
drm/radeon: Refactor how SI and CIK support is determined

Move the determination into a separate function.
Change radeon.si_support and radeon.cik_support so that their
default value is -1 (default).

This prepares the code for changing the default driver based
on the chip.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/amdgpu: Avoid xgmi register access
Lijo Lazar [Thu, 6 Nov 2025 08:19:59 +0000 (13:49 +0530)]
drm/amdgpu: Avoid xgmi register access

On single GPU systems, avoid accesses to XGMI link registers.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
3 weeks agodrm/xe/oa: Store forcewake reference in stream structure
Matt Roper [Mon, 10 Nov 2025 23:20:21 +0000 (15:20 -0800)]
drm/xe/oa: Store forcewake reference in stream structure

Calls to xe_force_wake_put() should generally pass the exact reference
returned by xe_force_wake_get().  Since OA grabs and releases forcewake
in different functions, xe_oa_stream_destroy() is currently calling put
with a hardcoded ALL mask.  Although this works for now, it's somewhat
fragile in case OA moves to more precise power domain management in the
future.

Stash the original reference obtained during stream initialization
inside the stream structure so that we can use it directly when the
stream is destroyed.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-35-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 weeks agodrm/xe/eustall: Store forcewake reference in stream structure
Matt Roper [Mon, 10 Nov 2025 23:20:20 +0000 (15:20 -0800)]
drm/xe/eustall: Store forcewake reference in stream structure

Calls to xe_force_wake_put() should generally pass the exact reference
returned by xe_force_wake_get().  Since EU stall grabs and releases
forcewake in different functions, xe_eu_stall_disable_locked() is
currently calling put with a hardcoded RENDER domain.  Although this
works for now, it's somewhat fragile in case the power domain(s)
required by stall sampling change in the future, or if workarounds show
up that require us to obtain additional domains.

Stash the original reference obtained during stream enable inside the
stream structure so that we can use it directly when the stream is
disabled.

Cc: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-34-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 weeks agodrm/xe/forcewake: Improve kerneldoc
Matt Roper [Mon, 10 Nov 2025 23:20:19 +0000 (15:20 -0800)]
drm/xe/forcewake: Improve kerneldoc

Improve the kerneldoc for forcewake a bit to give more detail about what
the structures represent.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patch.msgid.link/20251110232017.1475869-33-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
3 weeks agoaccel/amdxdna: Fix deadlock between context destroy and job timeout
Lizhi Hou [Fri, 7 Nov 2025 18:10:50 +0000 (10:10 -0800)]
accel/amdxdna: Fix deadlock between context destroy and job timeout

Hardware context destroy function holds dev_lock while waiting for all jobs
to complete. The timeout job also needs to acquire dev_lock, this leads to
a deadlock.

Fix the issue by temporarily releasing dev_lock before waiting for all
jobs to finish, and reacquiring it afterward.

Fixes: 4fd6ca90fc7f ("accel/amdxdna: Refactor hardware context destroy routine")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251107181050.1293125-1-lizhi.hou@amd.com
3 weeks agoaccel/amdxdna: Clear mailbox interrupt register during channel creation
Lizhi Hou [Fri, 7 Nov 2025 18:11:15 +0000 (10:11 -0800)]
accel/amdxdna: Clear mailbox interrupt register during channel creation

The mailbox interrupt register is not always cleared when a mailbox channel
is created. This can leave stale interrupt states from previous operations.

Fix this by explicitly clearing the interrupt register in the mailbox
channel creation function.

Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251107181115.1293158-1-lizhi.hou@amd.com
3 weeks agodrm/imx/ipuv3: Fix dumb-buffer allocation for non-RGB formats
Thomas Zimmermann [Tue, 4 Nov 2025 15:38:05 +0000 (16:38 +0100)]
drm/imx/ipuv3: Fix dumb-buffer allocation for non-RGB formats

Align pitch to multiples of 8 pixels for bpp values that do not map
to RGB formats. The call to drm_driver_color_mode_format() fails with
DRM_INVALID_FORMAT in these cases. Fall back to manually computing
the pitch alignment from which drm_mode_size_dumb() can compute the
correct pitch.

Fixes userspace that allocates dumb buffers for YUV formats, where
bpp equals 12. A common example is the IGT kms_getfb test.

v2:
- ignore width in calculation

Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: b1d0e470f881 ("drm/imx/ipuv3: Compute dumb-buffer sizes with drm_mode_size_dumb()")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
Cc: Fabio Estevam <festevam@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: imx@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patch.msgid.link/20251104153832.189666-1-tzimmermann@suse.de
3 weeks agodrm/xe/pf: Use migration-friendly GGTT auto-provisioning
Michal Wajdeczko [Wed, 12 Nov 2025 12:44:08 +0000 (13:44 +0100)]
drm/xe/pf: Use migration-friendly GGTT auto-provisioning

Instead of trying very hard to find the largest fair GGTT size that
could be allocated for VFs on the current tile, pick some smaller
rounded down to power-of-two value that is more likely to be
provisioned in the same manner by the other PF instance:

  num VFs | GGTT space (MiB)
  --------+-----------------
   63..57 | 56
   56..29 | 64
   28..15 | 128
   14..8  | 256
    7..4  | 512
    3..2  | 1024
       1  | 2048 (regular PF)
       1  | 3584 (admin only PF)

Note that due to FW/HW limitations we can't share all 4GiB GGTT
address space with VFs, so for the larger (>7) number of the VFs
the change in the outcome is happening at different points than
we have in case of GuC contexts/doorbells IDs.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://patch.msgid.link/20251112124408.8094-1-michal.wajdeczko@intel.com
3 weeks agodrm/intel/bmg: Allow device ID usage with single-argument macros
Michał Winiarski [Wed, 12 Nov 2025 13:22:20 +0000 (14:22 +0100)]
drm/intel/bmg: Allow device ID usage with single-argument macros

When INTEL_BMG_G21_IDS were added as a subplatform, token concatenation
operator usage was omitted, making INTEL_BMG_IDS not usable with
single-argument macros.
Fix that by adding the missing operator.

Fixes: 78de8f876683 ("drm/xe: Handle Wa_22010954014 and Wa_14022085890 as device workarounds")
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-25-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Add wait helper for VF FLR
Michał Winiarski [Wed, 12 Nov 2025 13:22:19 +0000 (14:22 +0100)]
drm/xe/pf: Add wait helper for VF FLR

VF FLR requires additional processing done by PF driver.
The processing is done after FLR is already finished from PCIe
perspective.
In order to avoid a scenario where migration state transitions while
PF processing is still in progress, additional synchronization
point is needed.
Add a helper that will be used as part of VF driver struct
pci_error_handlers .reset_done() callback.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-24-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Handle VRAM migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:18 +0000 (14:22 +0100)]
drm/xe/pf: Handle VRAM migration data as part of PF control

Connect the helpers to allow save and restore of VRAM migration data in
stop_copy / resume device state.

Co-developed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-23-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/migrate: Add function to copy of VRAM data in chunks
Lukasz Laguna [Wed, 12 Nov 2025 13:22:17 +0000 (14:22 +0100)]
drm/xe/migrate: Add function to copy of VRAM data in chunks

Introduce a new function to copy data between VRAM and sysmem objects.
The existing xe_migrate_copy() is tailored for eviction and restore
operations, which involves additional logic and operates on entire
objects.
The xe_migrate_vram_copy_chunk() allows copying chunks of data to or
from a dedicated buffer object, which is essential in case of VF
migration.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-22-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Add helper to retrieve VF's LMEM object
Lukasz Laguna [Wed, 12 Nov 2025 13:22:16 +0000 (14:22 +0100)]
drm/xe/pf: Add helper to retrieve VF's LMEM object

Instead of accessing VF's lmem_obj directly, introduce a helper function
to make the access more convenient.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-21-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Handle MMIO migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:15 +0000 (14:22 +0100)]
drm/xe/pf: Handle MMIO migration data as part of PF control

Implement the helpers and use them for save and restore of MMIO
migration data in stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-20-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Handle GGTT migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:14 +0000 (14:22 +0100)]
drm/xe/pf: Handle GGTT migration data as part of PF control

Connect the helpers to allow save and restore of GGTT migration data in
stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-19-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Add helpers for VF GGTT migration data handling
Michał Winiarski [Wed, 12 Nov 2025 13:22:13 +0000 (14:22 +0100)]
drm/xe/pf: Add helpers for VF GGTT migration data handling

In an upcoming change, the VF GGTT migration data will be handled as
part of VF control state machine. Add the necessary helpers to allow the
migration data transfer to/from the HW GGTT resource.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-18-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Handle GuC migration data as part of PF control
Michał Winiarski [Wed, 12 Nov 2025 13:22:12 +0000 (14:22 +0100)]
drm/xe/pf: Handle GuC migration data as part of PF control

Connect the helpers to allow save and restore of GuC migration data in
stop_copy / resume device state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-17-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Switch VF migration GuC save/restore to struct migration data
Michał Winiarski [Wed, 12 Nov 2025 13:22:11 +0000 (14:22 +0100)]
drm/xe/pf: Switch VF migration GuC save/restore to struct migration data

In upcoming changes, the GuC VF migration data will be handled as part
of separate SAVE/RESTORE states in VF control state machine.
Now that the data is decoupled from both guc_state debugfs and PAUSE
state, we can safely remove the struct xe_gt_sriov_state_snapshot and
modify the GuC save/restore functions to operate on struct
xe_sriov_migration_data.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-16-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Don't save GuC VF migration data on pause
Michał Winiarski [Wed, 12 Nov 2025 13:22:10 +0000 (14:22 +0100)]
drm/xe/pf: Don't save GuC VF migration data on pause

In upcoming changes, the GuC VF migration data will be handled as part
of separate SAVE/RESTORE states in VF control state machine.
Remove it from PAUSE state.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-15-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Remove GuC migration data save/restore from GT debugfs
Michał Winiarski [Wed, 12 Nov 2025 13:22:09 +0000 (14:22 +0100)]
drm/xe/pf: Remove GuC migration data save/restore from GT debugfs

In upcoming changes, SR-IOV VF migration data will be extended beyond
GuC data and exported to userspace using VFIO interface (with a
vendor-specific variant driver) and a device-level debugfs interface.
Remove the GT-level debugfs.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-14-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe/pf: Increase PF GuC Buffer Cache size and use it for VF migration
Michał Winiarski [Wed, 12 Nov 2025 13:22:08 +0000 (14:22 +0100)]
drm/xe/pf: Increase PF GuC Buffer Cache size and use it for VF migration

Contiguous PF GGTT VMAs can be scarce after creating VFs.
Increase the GuC buffer cache size to 8M for PF so that we can fit GuC
migration data (which currently maxes out at just over 4M) and use the
cache instead of allocating fresh BOs.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-13-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
3 weeks agodrm/xe: Allow the caller to pass guc_buf_cache size
Michał Winiarski [Wed, 12 Nov 2025 13:22:07 +0000 (14:22 +0100)]
drm/xe: Allow the caller to pass guc_buf_cache size

An upcoming change will use GuC buffer cache as a place where GuC
migration data will be stored, and the memory requirement for that is
larger than indirect data.
Allow the caller to pass the size based on the intended usecase.

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-12-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>