CVE-2025-68790: Linux Kernel Use-After-Free Vulnerability

CVE-2025-68790 Overview

A use-after-free vulnerability has been discovered in the Linux kernel's mlx5 network driver, specifically in the HCA_PORTS component handling during LAG (Link Aggregation Group) teardown. The vulnerability occurs when the hca_devcom_comp pointer is not cleared in the device's private data after unregistration, allowing a second pass through mlx5_unload_one() to attempt unregistration of an already freed component.

Critical Impact
This vulnerability can lead to kernel panics and system crashes, particularly on s390 architecture where PCI error recovery events routinely trigger multiple passes through the unload function.

Affected Products

Linux kernel with mlx5_core network driver
Systems using Mellanox/NVIDIA ConnectX network adapters
s390 architecture systems with PCI error recovery enabled

Discovery Timeline

2026-01-13 - CVE-2025-68790 published to NVD
2026-01-13 - Last updated in NVD database

Technical Details for CVE-2025-68790

Vulnerability Analysis

This use-after-free vulnerability stems from improper state management in the mlx5 network driver during device unload operations. The Linux kernel's mlx5_core driver manages HCA (Host Channel Adapter) ports through a device communication component (hca_devcom_comp). During normal LAG teardown, this component is unregistered, but the pointer in the device's private data structure is not cleared.

On s390, PCI-level recovery events typically trigger two passes through mlx5_unload_one(): one via the poll_health() method and another via mlx5_pci_err_detected(), which is called back from the generic PCI error recovery mechanism. Because the first pass leaves the stale pointer in place, the second pass attempts to unregister the already-freed component, resulting in a use-after-free condition.

The crash manifests with a failing address of 6b6b6b6b6b6b6000. The repeated 0x6b byte is the kernel's slab poison value for freed objects (POISON_FREE), which is written when slab debugging is enabled. It is not a KASAN marker, but it is a clear indication that a pointer was read from freed memory.
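The poison check described above can be done mechanically when triaging crash reports. The following is a minimal sketch, not a kernel tool: the helper name is_free_poison_addr is hypothetical, and the threshold of four leading 0x6b bytes is an assumed heuristic rather than a kernel-defined rule.

```shell
# Return 0 if a hex address looks like it was built from freed-memory
# poison (a leading run of the byte 0x6b), else return 1.
# "is_free_poison_addr" is a hypothetical helper name.
is_free_poison_addr() {
    # Lowercase the hex digits and strip an optional 0x prefix.
    addr=$(printf '%s' "$1" | tr 'A-FX' 'a-fx')
    addr=${addr#0x}
    case "$addr" in
        6b6b6b6b*) return 0 ;;   # assumed heuristic: four leading poison bytes
        *)         return 1 ;;
    esac
}

# Example:
#   is_free_poison_addr 6b6b6b6b6b6b6000 && echo "likely use-after-free"
```

A page-aligned value such as 6b6b6b6b6b6b6000 still matches because only the low bits are masked off when the faulting address is reported.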

Root Cause

The root cause is a missing state cleanup operation in the LAG teardown code path. After calling the unregister function for the HCA_PORTS component, the hca_devcom_comp pointer remains set to the now-invalid address. The fix requires clearing this pointer to NULL immediately after unregistration to prevent subsequent accesses.

The problem is most visible on s390, where the PCI error recovery design legitimately invokes the unload path through more than one code path for a single event. With the stale pointer left behind, the driver cannot distinguish between an initialized component requiring cleanup and an already-freed component.

Attack Vector

The vulnerability is triggered through PCI error recovery events on affected systems. While the attack vector requires local system access or the ability to induce PCI errors, the consequences include:

Kernel panic due to invalid memory access
System crash and potential denial of service
Possible memory corruption if the freed memory is reallocated before the second unregister attempt

The call trace shows the crash occurring in __lock_acquire() when the driver attempts to acquire a lock on the already-freed device structure during mlx5_detach_device().

Detection Methods for CVE-2025-68790

Indicators of Compromise

Kernel panic messages containing mlx5_unload_one or mlx5_pci_err_detected in the call trace
Kernel fault reports with failing addresses matching the pattern 6b6b6b6b6b6b6xxx (freed-memory slab poison, visible when slab debugging is enabled)
System crashes during PCI error recovery events on systems with Mellanox/NVIDIA network adapters
Oops messages referencing mlx5_detach_device or mlx5_core module
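The indicators above can be searched for in saved kernel logs. This is a rough sketch under stated assumptions: scan_mlx5_indicators is a hypothetical helper name, the symbol list is taken from this advisory, and a run of four or more 0x6b bytes is treated as freed-memory poison by assumption.

```shell
# Read kernel log text on stdin and print one line per indicator found.
# Returns 0 if at least one indicator matched, 1 otherwise.
# "scan_mlx5_indicators" is a hypothetical helper name.
scan_mlx5_indicators() {
    log=$(cat)
    found=1
    for sym in mlx5_unload_one mlx5_pci_err_detected mlx5_detach_device; do
        if printf '%s\n' "$log" | grep "$sym" >/dev/null; then
            echo "symbol: $sym"
            found=0
        fi
    done
    # Assumed heuristic: four or more consecutive 0x6b bytes.
    if printf '%s\n' "$log" | grep -iE '(6b){4,}' >/dev/null; then
        echo "poison: 6b6b6b6b"
        found=0
    fi
    return "$found"
}

# Example (reading the kernel log may require root):
#   dmesg | scan_mlx5_indicators
#   journalctl -k -b -1 | scan_mlx5_indicators   # previous boot, if persisted
```

A match is only a hint for further analysis; the same symbols can appear in unrelated mlx5 traces.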

Detection Strategies

Enable KASAN (Kernel Address Sanitizer) to detect use-after-free conditions during development and testing
Monitor kernel logs for mlx5_core module errors, particularly during PCI error recovery events
Implement system monitoring for unexpected kernel panics on servers with ConnectX network adapters
Use kernel debugging tools like lockdep to detect lock acquisition on invalid memory
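Whether a test kernel was built with these debugging features can be checked from its config file. The sketch below assumes a plain-text config such as /boot/config-$(uname -r) is available; check_debug_options is a hypothetical helper name, and CONFIG_SLUB_DEBUG is included as an assumption because the 0x6b pattern comes from slab poisoning.

```shell
# Print the state of debug options relevant to catching this bug.
# $1: path to a plain-text kernel config file (assumption: for example
#     /boot/config-$(uname -r); a compressed /proc/config.gz would need
#     to be decompressed first).
# "check_debug_options" is a hypothetical helper name.
check_debug_options() {
    cfg=$1
    for opt in CONFIG_KASAN CONFIG_SLUB_DEBUG CONFIG_PROVE_LOCKING; do
        if grep -q "^${opt}=y" "$cfg"; then
            echo "$opt enabled"
        else
            echo "$opt not enabled"
        fi
    done
}

# Example:
#   check_debug_options "/boot/config-$(uname -r)"
```

Note that CONFIG_SLUB_DEBUG only compiles in the support; poisoning is typically switched on at boot with the slub_debug parameter or by building with CONFIG_SLUB_DEBUG_ON.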

Monitoring Recommendations

Configure kernel crash dump collection (kdump) to capture detailed crash analysis for mlx5-related panics
Set up alerting on kernel oops and panic events, filtering for mlx5_core module involvement
Monitor PCI error recovery events on s390 systems with mlx5 network devices
Review system stability metrics on servers using LAG configurations with Mellanox adapters
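As a starting point for the kdump recommendation, the following sketch checks whether a crash kernel is currently loaded by reading /sys/kernel/kexec_crash_loaded. The helper name kdump_ready and its optional path argument are hypothetical; the argument exists only so the function can be exercised without a real sysfs.

```shell
# Report whether a crash (kdump) kernel is loaded.
# $1: optional path override; defaults to /sys/kernel/kexec_crash_loaded.
# "kdump_ready" is a hypothetical helper name.
kdump_ready() {
    f=${1:-/sys/kernel/kexec_crash_loaded}
    if [ -r "$f" ] && [ "$(cat "$f")" = "1" ]; then
        echo "kdump: crash kernel loaded"
        return 0
    fi
    echo "kdump: crash kernel NOT loaded"
    return 1
}

# Example:
#   kdump_ready || echo "configure kdump before relying on crash dumps"
```

A loaded crash kernel does not guarantee that a dump target is configured correctly, so a test crash on a non-production system is still advisable.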

How to Mitigate CVE-2025-68790

Immediate Actions Required

Update the Linux kernel to a version containing the fix commits
If immediate patching is not possible, consider temporarily disabling LAG configurations on affected systems
Prioritize patching on s390 architecture systems where the vulnerability is most easily triggered
Test PCI error recovery on the patched kernel before rolling it out to production environments

Patch Information

The vulnerability has been addressed in the Linux kernel. The fix clears hca_devcom_comp in the device's private data immediately after unregistering it during LAG teardown, so that subsequent passes through the unload function no longer attempt to access freed memory.

Workarounds

Avoid triggering PCI error recovery events on systems that cannot be immediately patched
On s390 systems, coordinate maintenance windows to reduce exposure during PCI-related operations
Consider disabling Link Aggregation features temporarily on critical systems until patches can be applied
Enable kernel crash dump analysis to collect diagnostic information if crashes occur before patching

bash

# Verify current kernel version and check for mlx5 module
uname -r
lsmod | grep mlx5

# List bonding devices (bonded mlx5 ports may be using LAG)
ip link show type bond
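The commands above list every bond on the system. To narrow this to mlx5 ports, the sketch below walks /sys/class/net and prints each interface bound to the mlx5_core driver together with its bond master, if any. The helper name list_mlx5_bond_members and its optional directory argument are hypothetical; the sketch assumes the usual sysfs layout where device/driver and master are symbolic links.

```shell
# Print "<interface> <master>" for every interface bound to mlx5_core.
# A master of "-" means the interface is not enslaved.
# $1: optional sysfs net directory; defaults to /sys/class/net.
# "list_mlx5_bond_members" is a hypothetical helper name.
list_mlx5_bond_members() {
    net=${1:-/sys/class/net}
    for dev in "$net"/*; do
        # Virtual interfaces such as lo have no device/driver link.
        [ -L "$dev/device/driver" ] || continue
        drv=$(basename "$(readlink "$dev/device/driver")")
        [ "$drv" = "mlx5_core" ] || continue
        if [ -L "$dev/master" ]; then
            master=$(basename "$(readlink "$dev/master")")
        else
            master="-"
        fi
        echo "$(basename "$dev") $master"
    done
}

# Example:
#   list_mlx5_bond_members
```

An mlx5 port that appears with a bond master is a candidate for review, though bond membership alone does not prove that the driver's LAG offload is active.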