CVE-2026-43404: Linux Kernel Race Condition Vulnerability

CVE-2026-43404 Overview

CVE-2026-43404 is a livelock and starvation vulnerability in the Linux kernel memory management subsystem. The flaw resides in the hmm_range_fault() path, which can spin indefinitely in do_swap_page() while attempting to acquire a device-private folio lock. When the process holding the folio lock depends on a work item scheduled on the same CPU as the spinning hmm_range_fault() caller, the work item is starved and the livelock never resolves. The condition was reproduced by the xe_exec_system_allocator IGT test and has been resolved upstream.

Critical Impact
A workload using Heterogeneous Memory Management (HMM) with device-private memory can trigger a livelock that halts forward progress on the affected CPU until the system is reset.

Affected Products

Linux kernel — mm subsystem (hmm_range_fault() path)
Linux kernel — migration entry handling (migration_entry_wait_on_locked())
Workloads using device-private memory with HMM (for example, Xe DRM driver paths exercised by xe_exec_system_allocator)

Discovery Timeline

2026-05-08 - CVE-2026-43404 published to NVD
2026-05-12 - Last updated in NVD database

Technical Details for CVE-2026-43404

Vulnerability Analysis

The defect lives in the Linux kernel HMM page-fault path. When hmm_range_fault() encounters a device-private folio in do_swap_page(), it calls folio_trylock() to acquire the folio lock so the page can be migrated back to system RAM. On folio_trylock() failure, the original implementation retries in a tight loop until the lock is obtained.

The spinning thread never voluntarily yields the CPU. If the lock holder is blocked inside migrate_device_unmap() waiting on lru_add_drain_all(), which schedules a short work item on every online CPU, the work item assigned to the spinning CPU cannot run. The result is a circular dependency: the spinning thread waits for the folio lock, the lock holder waits for the work item, and the work item waits for the CPU that the spinning thread occupies.

The upstream fix replaces the busy retry with a wait on the folio. migration_entry_wait_on_locked() was renamed to softleaf_entry_wait_on_locked() and is now invoked from do_swap_page() so the faulting thread sleeps until the folio lock is released, allowing the deferred work item to execute.

Root Cause

The root cause is an unbounded busy-retry loop combined with a CPU-bound work dependency. Three preconditions must align: (a) device-private and system memory folios are both processed in migrate_device_unmap(), so lru_add_drain_all() can run while a device-private folio lock is held; (b) the device-private folio has a mapcount > 1, so migration PTE insertion is deferred to try_to_migrate(); and (c) the kernel is built with no preemption or voluntary preemption only. This is a Race Condition class defect.

Attack Vector

Exploitation requires a local workload that drives HMM with device-private memory and triggers concurrent migration. No remote vector is documented. The realistic impact is a Denial of Service through kernel livelock rather than privilege escalation or data disclosure. No public proof of concept exists beyond the xe_exec_system_allocator IGT regression test that reproduced the issue.

No verified exploit code is available. Technical details are documented in upstream kernel commits b570f37a, 7e6e2fc9, and 94b6d0ba.

Detection Methods for CVE-2026-43404

Indicators of Compromise

A kernel thread stuck in hmm_range_fault() consuming 100% of a single CPU with no forward progress.
Another task simultaneously blocked inside migrate_device_unmap() calling lru_add_drain_all().
Soft lockup or RCU stall warnings referencing do_swap_page, hmm_range_fault, or folio_trylock in dmesg.
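
The indicators above can be checked against a saved kernel log with a small filter. This is a minimal sketch, not a vetted detection rule: the function name has_hmm_livelock_signature is hypothetical, and the match strings are the ones named in this section. It only reports that a lockup or stall warning and an HMM fault-path frame both appear somewhere in the input, so a match still needs manual review.

```bash
# Exit 0 when kernel log text on stdin contains BOTH a lockup/stall warning
# and a frame from the HMM fault path; exit 1 otherwise.
# The function name is illustrative and not part of any existing tool.
has_hmm_livelock_signature() {
    awk '
        /soft lockup|self-detected stall/ { lockup = 1 }
        /hmm_range_fault|do_swap_page/    { hmm = 1 }
        END { if (lockup && hmm) exit 0; exit 1 }
    '
}

# Example use (reading the kernel log may require elevated privileges):
#   dmesg -T | has_hmm_livelock_signature && echo "possible CVE-2026-43404 livelock"
```

Requiring both patterns keeps unrelated soft lockups from being flagged as this issue.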

Detection Strategies

Collect kernel stack traces with echo l > /proc/sysrq-trigger or perf record when a CPU appears pinned; look for the hmm_range_fault → folio_trylock retry pattern.
Enable CONFIG_SOFTLOCKUP_DETECTOR and CONFIG_DETECT_HUNG_TASK so the kernel logs hangs that match this signature.
Inventory hosts running GPU compute or HMM-enabled drivers, including the Xe DRM driver, and compare running kernel versions against the patched commits.
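
Whether the two hang-detection options are built into a given kernel can be confirmed from its config file. The sketch below assumes the config is available as plain text; the function name missing_hang_detectors is hypothetical, and the config path in the usage comment varies by distribution.

```bash
# Print each hang-detection option that is not enabled (=y) in the kernel
# config supplied on stdin; print nothing when both are enabled.
# The function name is illustrative.
missing_hang_detectors() {
    config=$(cat)
    for opt in CONFIG_SOFTLOCKUP_DETECTOR CONFIG_DETECT_HUNG_TASK; do
        printf '%s\n' "$config" | grep -qx "${opt}=y" || printf '%s\n' "$opt"
    done
}

# Example use (path is an assumption; some systems expose /proc/config.gz instead):
#   missing_hang_detectors < "/boot/config-$(uname -r)"
```

An empty result means both detectors are compiled in; any option printed should be enabled before relying on the log signatures described above.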

Monitoring Recommendations

Forward kernel logs to a centralized log platform and alert on "soft lockup", "rcu: INFO: rcu_sched self-detected stall", and stack frames containing hmm_range_fault.
Track per-CPU utilization on GPU compute nodes and flag sustained single-core saturation that does not correlate with user workload.
Monitor uptime regressions and unplanned reboots on hosts using device-private memory drivers.
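
Sustained single-core saturation can be spotted by comparing two snapshots of /proc/stat. The following is a rough sketch under stated assumptions: the function name saturated_cpus and the 95 percent default threshold are hypothetical, and the check is generic rather than specific to HMM, so a hit only indicates a CPU worth inspecting. A thread spinning in the kernel is accounted as system time, which counts as busy here.

```bash
# Usage: saturated_cpus BEFORE AFTER [THRESHOLD_PERCENT]
# BEFORE and AFTER are copies of /proc/stat taken some seconds apart.
# Prints "cpuN <busy%>" for every CPU at or above the threshold (default 95).
# The function name and default threshold are illustrative.
saturated_cpus() {
    awk -v limit="${3:-95}" '
        /^cpu[0-9]+ / {
            total = 0
            for (i = 2; i <= 9; i++) total += $i   # user through steal
            idle = $5 + $6                         # idle plus iowait
            if (FNR == NR) { t0[$1] = total; i0[$1] = idle; next }
            dt = total - t0[$1]
            if (dt > 0) {
                busy = 100 * (dt - (idle - i0[$1])) / dt
                if (busy >= limit) printf "%s %.0f\n", $1, busy
            }
        }
    ' "$1" "$2"
}

# Example use (temporary file names are placeholders):
#   cat /proc/stat > /tmp/stat.a; sleep 10; cat /proc/stat > /tmp/stat.b
#   saturated_cpus /tmp/stat.a /tmp/stat.b
```

Repeating the comparison over several intervals distinguishes a stuck CPU from a short burst of legitimate load.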

How to Mitigate CVE-2026-43404

Immediate Actions Required

Apply the upstream stable kernel updates that include commit a69d1ab971a624c6f112cea61536569d579c3215 and the cherry-picks referenced in the NVD entry.
Prioritize patching on hosts running GPU compute, AI training, or other workloads that use HMM with device-private memory.
Reboot patched systems to ensure the updated do_swap_page() and softleaf_entry_wait_on_locked() paths are loaded.

Patch Information

The fix replaces the folio_trylock() spin with a wait on the folio in do_swap_page(). migration_entry_wait_on_locked() is renamed to softleaf_entry_wait_on_locked(), and a stub with a WARN_ON_ONCE() guard is added for !CONFIG_MIGRATION builds. Fix commits: b570f37a, 7e6e2fc9, and 94b6d0ba.
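
Because the fix renames a kernel function, the running kernel's symbol table offers a quick heuristic for whether the change is present. This sketch assumes both names appear as ordinary entries in /proc/kallsyms (that is, neither is inlined away), and the function name classify_by_symbols is hypothetical. A vendor backport could carry the fix under the old name, so treat the result as a hint and confirm against the vendor's advisory.

```bash
# Classify a kernel from symbol-table text on stdin (same layout as
# /proc/kallsyms: address, type, name). Prints "patched", "unpatched",
# or "unknown". Heuristic only: it keys on the rename described above.
classify_by_symbols() {
    awk '
        $3 == "softleaf_entry_wait_on_locked"  { new = 1 }
        $3 == "migration_entry_wait_on_locked" { old = 1 }
        END {
            if (new)      print "patched"
            else if (old) print "unpatched"
            else          print "unknown"
        }
    '
}

# Example use:
#   classify_by_symbols < /proc/kallsyms
```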

Workarounds

Build the kernel with CONFIG_PREEMPT=y (full preemption) to break the third precondition (no or voluntary-only preemption) and prevent the livelock from manifesting.
Avoid running workloads that drive concurrent device-private folio migration on affected kernels until patches are deployed.
Where feasible, disable or unload drivers that exercise the HMM device-private path on unpatched hosts.

bash

# Show the running kernel version (compare it with the fixed release from your vendor) and check the preemption model
uname -r
zgrep -E 'CONFIG_PREEMPT|CONFIG_MIGRATION' "/boot/config-$(uname -r)"

# Inspect for the livelock signature in kernel logs
dmesg -T | grep -E 'soft lockup|hmm_range_fault|migrate_device_unmap'
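
The preemption check above prints raw config lines; the sketch below reduces them to a single answer. It is an illustration under stated assumptions: the function name preempt_model is hypothetical, CONFIG_PREEMPT_RT is treated as full preemption, and other models such as lazy preemption are reported as "unknown". When CONFIG_PREEMPT_DYNAMIC=y is set, the preempt= boot parameter can override the build default, so the output is marked "dynamic" and the runtime setting should be checked separately.

```bash
# Classify the build-time preemption model from kernel config text on stdin.
# Prints full, voluntary, none, or unknown, followed by " dynamic" when
# CONFIG_PREEMPT_DYNAMIC=y. The function name is illustrative.
preempt_model() {
    awk '
        /^CONFIG_PREEMPT=y$/ || /^CONFIG_PREEMPT_RT=y$/ { model = "full" }
        /^CONFIG_PREEMPT_VOLUNTARY=y$/                  { model = "voluntary" }
        /^CONFIG_PREEMPT_NONE=y$/                       { model = "none" }
        /^CONFIG_PREEMPT_DYNAMIC=y$/                    { dyn = " dynamic" }
        END {
            if (model == "") model = "unknown"
            print model dyn
        }
    '
}

# Example use (config path is an assumption and varies by distribution):
#   preempt_model < "/boot/config-$(uname -r)"
```

A result of "none" or "voluntary" means the preemption precondition for the livelock is met on that build.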