CVE-2026-23213 Overview
A race condition vulnerability has been identified in the Linux kernel's AMD GPU driver (drm/amd/pm) that can lead to system instability during SMU (System Management Unit) Mode 1 reset operations. During the reset cycle, the ASIC becomes temporarily inaccessible via PCIe, but other driver components may still attempt to access MMIO (Memory-Mapped I/O) registers, resulting in incomplete PCIe transactions that can cause NMI (Non-Maskable Interrupt) panics or complete system hangs.
Critical Impact
Systems with AMD GPUs may experience kernel panics or complete system hangs when the GPU driver triggers a Mode 1 reset while other driver threads or interrupt handlers attempt concurrent MMIO register access.
Affected Products
- Linux kernel with AMD GPU driver (drm/amd/pm module)
- Systems with AMD graphics hardware utilizing SMU Mode 1 reset functionality
- Linux distributions shipping affected kernel versions
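As a rough first check of whether a host falls into the categories above, the sketch below tests whether the amdgpu module is loaded. The function name `amdgpu_loaded` and its optional path argument are illustrative assumptions, not part of any vendor tooling. A kernel with the driver built in rather than loaded as a module would not appear in this listing.

```shell
# Return success if the amdgpu module appears in a /proc/modules-style listing.
# The optional path argument (hypothetical helper) allows checking a saved copy.
amdgpu_loaded() {
    modules_file="${1:-/proc/modules}"
    # Each /proc/modules line starts with the module name followed by a space.
    grep -q '^amdgpu ' "$modules_file"
}
```

Example use: `amdgpu_loaded && echo "amdgpu is loaded on kernel $(uname -r)"`. A positive result only means the driver is present; whether the running kernel already contains the fix still has to be confirmed with the distribution vendor.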
Discovery Timeline
- 2026-02-18 - CVE-2026-23213 published to NVD
- 2026-02-18 - Last updated in NVD database
Technical Details for CVE-2026-23213
Vulnerability Analysis
This vulnerability stems from a race condition in the AMD GPU power management subsystem within the Linux kernel. When the driver triggers a Mode 1 reset through the SMU, the ASIC (Application-Specific Integrated Circuit) undergoes a reset cycle during which it is temporarily unavailable on the PCIe bus.
The core issue is that during this reset window, other driver components, including interrupt handlers and concurrent driver threads, may still attempt to access MMIO registers. Because the hardware is offline, these PCIe transactions cannot complete. The uncompleted transactions can in turn lead to NMI panics or complete system hangs.
The fix introduces a no_hw_access flag that is set immediately after triggering the reset, signaling to other driver components that they should skip any register access attempts. A memory barrier (smp_mb()) ensures this flag update is visible across all CPU cores before the driver enters its sleep/wait state for the reset to complete.
Root Cause
The root cause is a missing synchronization mechanism between the SMU Mode 1 reset path and other driver components that perform hardware register access. Without proper coordination, there is a timing window where:
- The reset is initiated and the ASIC goes offline
- Other threads or interrupt handlers attempt MMIO access
- PCIe transactions fail because the target device is unreachable
- The system responds to these failures with NMI panics or hangs
The absence of a globally visible flag and appropriate memory barriers allowed this race condition to manifest, particularly on multi-core systems where different CPU cores might be executing driver code simultaneously.
Attack Vector
This vulnerability is triggered during normal GPU driver operations rather than through external attack vectors. The race condition occurs internally within the driver when Mode 1 reset is invoked. While not directly exploitable for remote code execution, it can cause denial of service through system instability.
The vulnerability manifests when another thread or interrupt handler accesses MMIO registers inside the reset window. The fix closes that window with the no_hw_access flag and a memory barrier that makes the flag visible on all processor cores. For implementation details, see the kernel commit listed under Immediate Actions Required.
Detection Methods for CVE-2026-23213
Indicators of Compromise
- Kernel panic messages referencing NMI watchdog timeouts during GPU operations
- System logs showing PCIe transaction errors related to AMD GPU device
- Crash dumps indicating faults in drm/amd/pm or SMU-related kernel functions
- Unexpected system hangs occurring during GPU workload transitions or power management events
Detection Strategies
- Monitor kernel logs (dmesg, /var/log/kern.log) for NMI panic messages or PCIe errors associated with AMD GPU devices
- Implement crash dump collection to capture kernel oops/panic events for post-mortem analysis
- Review system stability logs for patterns of hangs correlating with GPU activity or power state changes
- Use SentinelOne Singularity platform to detect anomalous kernel behavior and crash patterns
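The log review described above can be sketched as a small filter. The function name `filter_gpu_reset_faults` and the match patterns are assumptions chosen for illustration; they are generic markers (amdgpu, NMI, AER, PCIe bus errors), not the exact messages this bug emits, so expect both false positives and misses.

```shell
# Print only kernel log lines that mention amdgpu, NMI, or PCIe/AER errors.
# Reads from stdin so it works with dmesg, journalctl, or a saved log file.
filter_gpu_reset_faults() {
    # The last alternative matches "nmi" only as a standalone token.
    grep -iE 'amdgpu|pcie bus error|aer:|(^|[^[:alnum:]_])nmi([^[:alnum:]_]|$)'
}
```

Example use: `dmesg | filter_gpu_reset_faults`, or `journalctl -k -b -1 | filter_gpu_reset_faults` to inspect the previous boot after a hang (this requires a persistent journal).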
Monitoring Recommendations
- Enable kernel crash dump collection (kdump) to capture diagnostic information during system failures
- Configure kernel logging to capture PCIe error events and driver debug messages
- Implement system health monitoring to detect unexpected reboots or hangs in production environments
- Deploy SentinelOne agents for real-time kernel integrity monitoring and threat detection
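To confirm that crash dump collection is actually armed, a minimal check might look like the following. The helper names and their optional path arguments are hypothetical. The sketch relies on two standard kernel interfaces: `/sys/kernel/kexec_crash_loaded`, which reads 1 when a crash kernel is loaded, and the `crashkernel=` boot parameter in `/proc/cmdline`.

```shell
# Succeed if a crash (kdump) kernel is currently loaded.
crash_kernel_loaded() {
    state_file="${1:-/sys/kernel/kexec_crash_loaded}"
    [ "$(cat "$state_file" 2>/dev/null)" = "1" ]
}

# Succeed if memory was reserved for the crash kernel at boot.
crashkernel_reserved() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -qE '(^| )crashkernel=' "$cmdline_file"
}
```

Example use: `crashkernel_reserved && crash_kernel_loaded || echo "kdump is not ready"`. Distribution-specific kdump service configuration is outside the scope of this sketch.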
How to Mitigate CVE-2026-23213
Immediate Actions Required
- Update to a patched Linux kernel version containing the fix (commit 7edb503fe4b6d67f47d8bb0dfafb8e699bb0f8a4)
- Apply vendor-provided kernel updates from your Linux distribution
- If updates cannot be applied immediately, consider reducing GPU workloads that may trigger Mode 1 resets
- Review system stability and ensure crash dump mechanisms are in place to capture any occurrences
Patch Information
The vulnerability has been addressed through kernel patches that introduce the no_hw_access flag and the accompanying memory barrier. The fix has been applied to multiple stable kernel branches.
Organizations should apply kernel updates from their Linux distribution vendor that incorporate these fixes.
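For teams that build kernels from a git tree, the sketch below is one way to check whether a given ref contains the fix. The function name `fix_in_tree` is hypothetical. It assumes the commit ID cited in this advisory is either reachable from the ref or, as stable backports conventionally do, mentioned in the message of a backported commit. Neither assumption is guaranteed for vendor trees, so a negative result is not proof that a kernel is unpatched.

```shell
# Succeed if the fix commit is an ancestor of the ref, or if some commit
# reachable from the ref mentions the fix commit ID in its message.
fix_in_tree() {
    repo="$1"
    ref="${2:-HEAD}"
    fix=7edb503fe4b6d67f47d8bb0dfafb8e699bb0f8a4
    # Direct containment: the commit itself is in the ref's history.
    git -C "$repo" merge-base --is-ancestor "$fix" "$ref" 2>/dev/null && return 0
    # Backport heuristic: a commit message references the commit ID.
    [ -n "$(git -C "$repo" log -1 --format=%H --grep="$fix" "$ref" 2>/dev/null)" ]
}
```

Example use: `fix_in_tree /path/to/linux HEAD && echo "fix appears to be present"`.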
Workarounds
- Apply vendor-provided kernel patches as the primary mitigation
- If patching is not immediately possible, avoid operations that may trigger GPU Mode 1 resets
- Implement system monitoring to detect and respond to crash events quickly
- Consider temporarily using alternative graphics drivers if available and compatible with your workload
# Check the currently running kernel version
uname -r
# For Debian/Ubuntu systems
sudo apt update && sudo apt upgrade linux-image-generic
# For RHEL/CentOS/Fedora systems
sudo dnf update kernel
# Reboot to apply the updated kernel
sudo reboot
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


