drm/xe: Add support to handle hardware errors
author Riana Tauro <riana.tauro@intel.com>
Tue, 26 Aug 2025 06:34:15 +0000 (12:04 +0530)
committer Rodrigo Vivi <rodrigo.vivi@intel.com>
Tue, 26 Aug 2025 14:11:34 +0000 (10:11 -0400)
commit 0a2a873d615a39e8a87d3f15285ed888341ddce8
tree 66cddf5ebdb2d67323b44bf0614033695b745f9f
parent f646c9f9371b28b8f93e619fe003415f6aaeb416

The Gfx device reports two classes of errors: uncorrectable and
correctable. Depending on severity, uncorrectable errors are further
classified as Non-Fatal and Fatal.

Correctable and Non-Fatal errors: These errors are reported as MSI. Bits in
the Master Interrupt Register indicate the class of the error.
The source of the error is then read from the Device Error Source
Register.
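
The flow above can be sketched in plain userspace C. This is only an
illustration, not the code added by this patch: the bit positions, the
struct fake_regs stand-in for MMIO, the write-to-clear acknowledge and
every identifier below are hypothetical.

```c
#include <stdint.h>

/* Hypothetical error classes that arrive as MSI (names are illustrative). */
enum hw_err_class {
	HW_ERR_CORRECTABLE = 0,
	HW_ERR_NONFATAL = 1,
	HW_ERR_CLASS_COUNT
};

/* Assumed bit positions in the master interrupt register; not from Bspec. */
#define MASTER_ERR_BIT(cls) (UINT32_C(1) << (26 + (cls)))

/* Stand-in for MMIO: one device error source register per class. */
struct fake_regs {
	uint32_t master_irq;
	uint32_t dev_err_src[HW_ERR_CLASS_COUNT];
};

/*
 * For each class flagged in the master interrupt register, read the
 * matching error source register into out[] and acknowledge it.
 * Returns the number of classes that were flagged.
 */
static int handle_hw_errors(struct fake_regs *regs,
			    uint32_t out[HW_ERR_CLASS_COUNT])
{
	int handled = 0;

	for (int cls = 0; cls < HW_ERR_CLASS_COUNT; cls++) {
		out[cls] = 0;
		if (!(regs->master_irq & MASTER_ERR_BIT(cls)))
			continue;

		out[cls] = regs->dev_err_src[cls];
		regs->dev_err_src[cls] = 0; /* models an assumed write-to-clear */
		handled++;
	}

	return handled;
}
```

Keeping the class decode (master register) separate from the source
decode (per-class register) mirrors the two-step description above.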

Fatal errors: These are reported as PCIe errors.
When a PCIe error is asserted, the OS will perform an SBR (Secondary
Bus Reset), which causes the driver to reload. The error registers are
sticky and their values are maintained through the SBR.

Add basic support to handle these errors.

Bspec: 50875, 53073, 53074, 53075, 53076

v2: Format commit message (Umesh)
v3: fix documentation (Stuart)

Cc: Stuart Summers <stuart.summers@intel.com>
Co-developed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-9-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
drivers/gpu/drm/xe/Makefile
drivers/gpu/drm/xe/regs/xe_hw_error_regs.h [new file with mode: 0644]
drivers/gpu/drm/xe/regs/xe_irq_regs.h
drivers/gpu/drm/xe/xe_hw_error.c [new file with mode: 0644]
drivers/gpu/drm/xe/xe_hw_error.h [new file with mode: 0644]
drivers/gpu/drm/xe/xe_irq.c