CVE-2025-68793: Linux Kernel Use-After-Free Vulnerability

CVE-2025-68793 Overview

CVE-2025-68793 is a Use-After-Free (UAF) vulnerability in the Linux kernel's AMD GPU driver (amdgpu) affecting the GPU recovery mechanism. The flaw exists due to a race condition between the scheduler timeout callback and the TDR (Timeout Detection and Recovery) work queue during GPU recovery operations. When the GPU recovery function calls drm_sched_stop() followed by drm_sched_start(), the TDR queue may free a job structure before the timeout callback completes, resulting in a UAF when accessing the pasid field.

Critical Impact
Local attackers may exploit this race condition to trigger memory corruption, potentially leading to privilege escalation, denial of service, or information disclosure on systems with AMD GPUs.

Affected Products

Linux Kernel with AMD GPU driver (amdgpu)
Systems utilizing AMD GPU hardware with affected kernel versions
Workloads involving GPU recovery operations

Discovery Timeline

2026-01-13 - CVE-2025-68793 published to NVD
2026-01-13 - Last updated in NVD database

Technical Details for CVE-2025-68793

Vulnerability Analysis

This vulnerability is a Use-After-Free caused by a race condition between two asynchronous execution contexts. The issue occurs within the amdgpu_device_gpu_recover() function in the AMD GPU kernel driver. During GPU recovery, the driver stops the scheduler via drm_sched_stop() and later restarts it with drm_sched_start().

The restart operation re-arms the TDR work queue, which may free job structures asynchronously. If the timeout callback is still executing and accesses the job->pasid field after the TDR queue has already freed the job, the driver reads from deallocated memory. This is a classic UAF condition, here a read of freed slab memory, that an attacker may be able to leverage for memory corruption.

The KASAN (Kernel Address Sanitizer) trace captured in the vulnerability report shows a slab-use-after-free at amdgpu_device_gpu_recover+0x968/0x990, with a 4-byte read from freed memory at address ffff88b0ce3f794c.
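
When triaging reports like this one, it can help to pull the faulting symbol and offset out of the KASAN header line programmatically. The following is a minimal sketch that assumes the usual header layout `BUG: KASAN: <kind> in <symbol>+0x<offset>/0x<size>`; the source does not quote the raw log line, so the sample line in the usage note is reconstructed from the details above, and the `KasanHeader` and `parse_kasan_header` names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class KasanHeader(NamedTuple):
    kind: str     # e.g. "slab-use-after-free"
    symbol: str   # function in which the bad access happened
    offset: int   # byte offset of the access within that function
    size: int     # total size of the function in bytes


# Assumed header layout; adjust if the local kernel formats reports differently.
_HEADER = re.compile(
    r"BUG: KASAN: (?P<kind>[\w-]+) in "
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_kasan_header(line: str) -> Optional[KasanHeader]:
    """Return the parsed header, or None if the line is not a KASAN header."""
    match = _HEADER.search(line)
    if match is None:
        return None
    return KasanHeader(
        kind=match["kind"],
        symbol=match["symbol"],
        offset=int(match["offset"], 16),
        size=int(match["size"], 16),
    )
```

For a line such as `BUG: KASAN: slab-use-after-free in amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]`, the parser yields the kind, the symbol name, and the two hexadecimal values as integers, which makes it straightforward to group reports by faulting function.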

Root Cause

The root cause is improper synchronization between the scheduler timeout callback and the TDR work queue in the GPU recovery path. The pasid (Process Address Space Identifier) field is accessed after the job structure may have been freed by the TDR queue. The fix involves caching the pasid value early in the recovery process before any operations that could result in the job being freed, thereby eliminating the race condition.
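
The difference between the vulnerable ordering and the fixed ordering can be illustrated with a small user-space model. This is not kernel code and not the actual patch: `Job`, `free_job`, `recover_late_read`, and `recover_cached` are hypothetical names, a freed object is modelled as a flag that raises an exception on access, and the scheduler restart is modelled as freeing the job immediately, which is the worst-case interleaving of the race.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    """Stand-in for the kernel job structure; only pasid is modelled."""
    pasid: Optional[int]
    freed: bool = False

    def read_pasid(self) -> int:
        # A real UAF is undefined behaviour; here it is made visible as an error.
        if self.freed:
            raise RuntimeError("use-after-free: job already released")
        assert self.pasid is not None
        return self.pasid


def free_job(job: Job) -> None:
    """Model the TDR work queue releasing the job after the scheduler restarts."""
    job.freed = True
    job.pasid = None


def recover_late_read(job: Job, restart_scheduler: Callable[[Job], None]) -> int:
    """Vulnerable ordering: the field is read after the restart."""
    restart_scheduler(job)      # the job may be freed from this point on
    return job.read_pasid()     # late read of a possibly freed object


def recover_cached(job: Job, restart_scheduler: Callable[[Job], None]) -> int:
    """Fixed ordering: the field is copied while the job is still valid."""
    pasid = job.read_pasid()    # cache early, before anything can free the job
    restart_scheduler(job)
    return pasid                # only the local copy is used afterwards
```

Calling `recover_cached(Job(pasid=42), free_job)` returns the cached value even though the job is released during the restart, whereas `recover_late_read` with the same arguments raises the modelled use-after-free error. The design point is that copying a small scalar out of the structure removes the need to keep the structure alive, so no additional locking or reference counting is required for that one field.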

Attack Vector

Exploitation of this vulnerability requires local access to a system with an AMD GPU. An attacker could potentially trigger this race condition by:

Initiating workloads that cause GPU hangs requiring recovery
Timing operations to hit the window between drm_sched_start() and completion of the timeout callback
Manipulating GPU workload scheduling to increase the likelihood of triggering the race condition

The vulnerability is triggered through the drm_sched_job_timedout workqueue handler in the gpu_sched module, which is invoked during GPU timeout recovery scenarios. The race window opens when drm_sched_start() restarts the TDR queue, which can then free the job, and closes once the recovery function has finished reading job->pasid.

Detection Methods for CVE-2025-68793

Indicators of Compromise

KASAN slab-use-after-free reports in kernel logs referencing amdgpu_device_gpu_recover
Kernel oops or panic events originating from AMD GPU driver functions
Unexpected system crashes during GPU-intensive workloads or recovery operations
Workqueue errors related to amdgpu-reset-dev or drm_sched_job_timedout

Detection Strategies

Enable KASAN (Kernel Address Sanitizer) in development or testing environments to detect UAF conditions
Monitor kernel logs for errors containing amdgpu_device_gpu_recover, amdgpu_job_timedout, or drm_sched_job_timedout
If kernel livepatching is used, track which livepatches are applied so that hosts still running the vulnerable recovery path can be identified
Implement system audit rules to log GPU recovery events and associated process context
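
The log-monitoring strategy above can be sketched as a simple scanner that counts how often each of the named functions appears in kernel log lines. This is a minimal illustration rather than a complete detection rule: the `WATCHED_SYMBOLS` tuple and the `count_symbol_hits` name are hypothetical, and a plain substring match is assumed to be sufficient.

```python
from collections import Counter
from typing import Iterable

# Function names taken from the detection guidance above.
WATCHED_SYMBOLS = (
    "amdgpu_device_gpu_recover",
    "amdgpu_job_timedout",
    "drm_sched_job_timedout",
)


def count_symbol_hits(lines: Iterable[str]) -> Counter:
    """Count kernel log lines that mention each watched symbol."""
    hits: Counter = Counter()
    for line in lines:
        for symbol in WATCHED_SYMBOLS:
            if symbol in line:
                hits[symbol] += 1
    return hits
```

The function accepts any iterable of strings, so it can be fed from saved `dmesg` or `journalctl -k` output read line by line. A nonzero count alone does not indicate exploitation, since these functions also appear in logs during ordinary GPU hang recovery; it only marks hosts that deserve a closer look.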

Monitoring Recommendations

Configure centralized logging to capture kernel messages with amdgpu or gpu_sched module references
Set up alerts for KASAN reports or memory corruption warnings in production systems
Monitor for unusual patterns of GPU recovery events that could indicate exploitation attempts
Track kernel module loading events for amdgpu and gpu_sched to ensure patched versions are in use
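
One way to turn "unusual patterns of GPU recovery events" into a concrete check is to flag bursts of recovery-related lines inside a short time window. The sketch below assumes log lines carry the default `dmesg` timestamp prefix (seconds since boot in square brackets) and that timestamps are non-decreasing; the marker string, window length, and threshold are caller-supplied assumptions to be tuned per environment, and `has_recovery_burst` is a hypothetical name.

```python
import re
from collections import deque
from typing import Deque, Iterable

# Matches the default dmesg prefix, e.g. "[   12.345678] ".
_TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def has_recovery_burst(
    lines: Iterable[str],
    marker: str,
    window_s: float = 60.0,
    threshold: int = 3,
) -> bool:
    """Return True if `threshold` marker lines fall within `window_s` seconds."""
    recent: Deque[float] = deque()
    for line in lines:
        if marker not in line:
            continue
        match = _TIMESTAMP.match(line)
        if match is None:
            continue  # skip lines without a parseable timestamp
        now = float(match.group(1))
        recent.append(now)
        # Drop events that are older than the sliding window.
        while now - recent[0] > window_s:
            recent.popleft()
        if len(recent) >= threshold:
            return True
    return False
```

Using `drm_sched_job_timedout` as the marker, three matching lines within one minute would trip the default settings. The thresholds here are placeholders, not recommended values; a baseline of normal recovery frequency on the monitored hardware is needed before alerting on this signal.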

How to Mitigate CVE-2025-68793

Immediate Actions Required

Update the Linux kernel to a version containing the fix commit
Apply vendor-provided kernel patches for distributions using affected kernel versions
Monitor systems for signs of exploitation while patches are being deployed
Consider temporarily reducing GPU workloads that may trigger recovery scenarios on critical systems

Patch Information

The vulnerability has been resolved in the Linux kernel by commits that cache the pasid value early in the recovery process, so the recovery path no longer dereferences a job structure that may already have been freed. The fix is available in the stable kernel tree; the stable patch was cherry-picked from commit 20880a3fd5dd7bca1a079534cf6596bda92e107d. System administrators should update to a kernel version containing the fix or apply the backported patches from their Linux distribution.

Workarounds

No complete workaround is available; applying the kernel patch is the recommended remediation
Reducing GPU-intensive workloads may decrease the likelihood of triggering GPU recovery scenarios
Monitoring and logging GPU recovery events can help detect potential exploitation attempts
Consider implementing additional access controls on systems with AMD GPUs to limit local attack surface