CVE-2026-23136 Overview
A vulnerability has been identified in the Linux kernel's libceph component where the sparse-read state machine fails to reset properly when a connection fault occurs. When an OSD (Object Storage Daemon) connection experiences a fault, the connection is abandoned and reestablished with pending operations being retried. However, the sparse-read state tracking mechanism operates largely independently of the messenger's connection state. This design flaw means that if a connection is lost mid-payload or the sparse-read state machine encounters an error, the state is not properly reset, causing the OSD client to misinterpret new reply data as a continuation of previous incomplete operations.
Critical Impact
Systems using Ceph storage with the Linux kernel may experience persistent denial of service conditions due to the sparse-read state machine entering an unrecoverable failure loop after connection faults.
Affected Products
- Linux Kernel (versions with libceph sparse-read functionality)
- Systems using Ceph distributed storage
- CephFS clients utilizing the affected kernel versions
Discovery Timeline
- 2026-02-14 - CVE-2026-23136 published to NVD
- 2026-02-18 - Last updated in NVD database
Technical Details for CVE-2026-23136
Vulnerability Analysis
The vulnerability resides in the libceph subsystem of the Linux kernel, specifically in how the OSD client tracks the progress of sparse-read reply operations. The OSD client maintains a separate state machine for sparse-read operations that functions independently from the underlying messenger connection state. This architectural separation creates a synchronization gap that becomes problematic during fault recovery scenarios.
When a connection fault occurs, the messenger properly handles reconnection and operation retry logic. However, the sparse-read state machine retains its previous state, including any partial data tracking or error conditions from the interrupted operation. When the retried operation receives a fresh reply from the OSD, the state machine incorrectly interprets this new data as a continuation of the previous incomplete transmission.
This state corruption manifests in observable error conditions where the reported data length does not match the expected extent length, leading to socket errors. The kernel logs will show repeating messages indicating a mismatch between data length and extent values, followed by socket read errors, creating an infinite loop that prevents recovery.
Root Cause
The root cause is that osd_fault() contains no reset logic for the sparse-read state machine. When the fault handler runs after a connection failure, it reinitializes connection-level state but leaves the operation-specific sparse-read tracking state untouched. Stale state therefore persists across connection recovery cycles and corrupts the interpretation of subsequent legitimate responses.
Attack Vector
This vulnerability is triggered by network conditions that cause OSD connection faults during sparse-read operations. Although it is primarily a reliability issue, an attacker with network-level access could potentially induce connection faults at strategic moments to reach the vulnerable code path. Such an attack does not require authentication to the Ceph cluster, only the ability to disrupt network connectivity between the kernel client and the OSD servers.
The vulnerability manifests when:
- A sparse-read operation is in progress against a Ceph OSD
- The connection experiences a fault mid-payload or encounters a state machine error
- The connection is reestablished and operations are retried
- The unreset state machine misinterprets the new reply data
The fix involves adding sparse-read state reset logic to the osd_fault() function, ensuring that all retried operations start from a clean state. The kernel commits referenced in the security advisories implement this reset mechanism across multiple stable kernel branches.
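The failure mode and the effect of the reset can be illustrated with a toy parser in shell. This is a minimal sketch and not kernel code: the token format (HDR:<n> followed by n payload tokens D) and the names feed, reset_parser, and run_scenario are assumptions invented for illustration, and the real sparse-read state machine is considerably more involved.

```shell
#!/bin/sh
# Toy model only; none of these names exist in the kernel.
# A reply is a header token "HDR:<n>" followed by <n> payload tokens "D".

state=HDR      # HDR = expecting a header, DATA = expecting payload
remaining=0    # payload tokens still expected for the current reply

reset_parser() { state=HDR; remaining=0; }

# Feed one token; returns 1 when the token does not fit the current state.
feed() {
  if [ "$state" = HDR ]; then
    case $1 in
      HDR:*) remaining=${1#HDR:}; state=DATA ;;
      *) return 1 ;;
    esac
  else
    [ "$1" = D ] || return 1
    remaining=$((remaining - 1))
    if [ "$remaining" -eq 0 ]; then state=HDR; fi
  fi
}

# $1 = "stale" (state kept across the fault) or "reset" (state cleared).
run_scenario() {
  reset_parser
  feed HDR:3; feed D               # connection drops mid-payload here
  if [ "$1" = reset ]; then reset_parser; fi
  for tok in HDR:3 D D D; do       # fresh reply to the retried request
    feed "$tok" || { echo "parse error on $tok"; return 1; }
  done
  echo "reply parsed"
}
```

Calling run_scenario stale prints "parse error on HDR:3", because the parser still expects payload from the abandoned reply and treats the new header as a continuation. Calling run_scenario reset prints "reply parsed". In the stale case nothing in the model ever clears the state, so every further retry fails the same way, which mirrors the error loop described above.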
Detection Methods for CVE-2026-23136
Indicators of Compromise
- Kernel log messages repeating the pattern "libceph: data len X != extent len Y"
- Socket errors on OSD connections, such as "libceph: osd0 ... socket error on read"
- Ceph client operations experiencing persistent timeouts or hangs
- System instability during Ceph storage access following network disruptions
Detection Strategies
- Monitor kernel logs (dmesg or /var/log/kern.log) for the characteristic error loop pattern involving data length mismatches
- Implement alerting on repeated libceph socket errors occurring in rapid succession
- Track Ceph client health metrics for anomalous connection fault rates
- Use system monitoring tools to detect processes blocked on Ceph filesystem operations
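As one possible way to alert on errors occurring in rapid succession, the sketch below scans dmesg-style output for bursts of the two messages named in this advisory. It assumes the default dmesg timestamp format, [seconds.fraction] since boot, rather than the human-readable output of dmesg -T. The function name detect_error_burst and the example thresholds are illustrative choices, not recommended values.

```shell
#!/bin/sh
# Reads dmesg-style lines ("[  123.456789] message") on stdin.
# $1 = number of matching lines that counts as a burst (default 5)
# $2 = time window in seconds (default 10)
# Prints "burst detected" and returns 0 if a burst is found, else returns 1.
detect_error_burst() {
  awk -v n="${1:-5}" -v w="${2:-10}" '
    /data len .* != extent len|libceph.*socket error/ {
      # Pull the seconds-since-boot timestamp out of the leading brackets.
      if (match($0, /^\[ *[0-9]+\.[0-9]+\]/)) {
        ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
        times[++count] = ts
        # Compare against the matching line n-1 entries earlier.
        if (count >= n && ts - times[count - n + 1] <= w) found = 1
      }
    }
    END {
      if (found) print "burst detected"
      exit (found ? 0 : 1)
    }
  '
}
```

A typical invocation would be dmesg | detect_error_burst 5 10, run from a cron job or monitoring agent. A single match is deliberately not treated as a burst, since isolated socket errors can occur during ordinary network disruptions.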
Monitoring Recommendations
- Configure log monitoring to alert on the specific error pattern: data len .* != extent len
- Establish baseline metrics for OSD connection fault rates and alert on deviations
- Monitor Ceph client I/O latency for sudden degradation following network events
- Implement periodic health checks on CephFS mount points to detect hung operations
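A periodic mount-point health check could look like the following sketch. It probes the path with stat in the background and polls for completion, so the check itself returns even if the probe hangs. The function name check_mount, the output strings, and the default limit of 5 seconds are assumptions made for illustration.

```shell
#!/bin/sh
# check_mount PATH [LIMIT_SECONDS]
# Prints "OK", "ERROR", or "HUNG" followed by the path.
# Returns 0 if stat succeeds, 1 if it fails, 2 if it does not finish in time.
check_mount() {
  cm_path=$1
  cm_limit=${2:-5}
  stat "$cm_path" >/dev/null 2>&1 &   # probe in the background
  cm_pid=$!
  cm_waited=0
  # Poll once per second until the probe exits or the limit is reached.
  while kill -0 "$cm_pid" 2>/dev/null; do
    if [ "$cm_waited" -ge "$cm_limit" ]; then
      echo "HUNG $cm_path"
      return 2
    fi
    sleep 1
    cm_waited=$((cm_waited + 1))
  done
  if wait "$cm_pid"; then
    echo "OK $cm_path"
  else
    echo "ERROR $cm_path"
    return 1
  fi
}
```

For example, check_mount /mnt/cephfs 10 could be scheduled every few minutes. A HUNG result only shows that the probe did not return in time; it does not by itself prove that this vulnerability is the cause, so it is best correlated with the kernel log pattern. A probe that hangs is left running in the background and may remain blocked until the mount recovers.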
How to Mitigate CVE-2026-23136
Immediate Actions Required
- Update the Linux kernel to a patched version that includes the sparse-read state reset fix
- Monitor systems for signs of the error loop pattern and schedule maintenance reboots if detected
- Consider temporarily increasing network reliability measures for Ceph client connections
- Review and apply the kernel commits listed in the external references
Patch Information
The vulnerability has been resolved in multiple Linux kernel stable branches. The fix adds proper state reset logic to the osd_fault() function to ensure sparse-read operations start from a clean state after connection recovery.
The specific kernel commits that address this vulnerability are listed in the external references.
Workarounds
- If immediate patching is not possible, consider remounting CephFS filesystems after detecting the error condition
- Implement network redundancy to reduce the likelihood of connection faults during sparse-read operations
- Use client-side timeout configurations to force operation retries that may clear the corrupted state
- Consider temporarily switching to non-sparse-read operation modes if supported by your Ceph configuration
# Check current kernel version for vulnerable libceph
uname -r
# Monitor for the characteristic error pattern in kernel logs
dmesg | grep -E "data len .* != extent len|libceph.*socket error"
# Force unmount and remount of CephFS if error loop is detected
umount -f /mnt/cephfs
mount -t ceph <mon_addr>:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret


