CVE-2026-35554 Overview
A race condition vulnerability exists in the Apache Kafka Java producer client's buffer pool management that can cause messages to be silently delivered to incorrect topics. When a produce batch expires due to delivery.timeout.ms while a network request containing that batch is still in flight, the batch's ByteBuffer is prematurely deallocated and returned to the buffer pool. If a subsequent producer batch—potentially destined for a different topic—reuses this freed buffer before the original network request completes, the buffer contents may become corrupted. This can result in messages being delivered to unintended topics without any error being reported to the producer.
Critical Impact
This vulnerability enables silent data misdirection where messages intended for one topic may be delivered to a different topic, potentially exposing sensitive data to unauthorized consumers and causing data integrity issues across downstream systems.
Affected Products
- Apache Kafka 3.9.1 and earlier
- Apache Kafka 4.0.0 through 4.0.1
- Apache Kafka 4.1.0 through 4.1.1
Discovery Timeline
- April 7, 2026 - CVE-2026-35554 published to NVD
- April 8, 2026 - Last updated in NVD database
Technical Details for CVE-2026-35554
Vulnerability Analysis
This vulnerability represents a classic Time-of-Check Time-of-Use (TOCTOU) race condition within the Kafka producer client's memory management subsystem. The core issue stems from improper synchronization between the batch timeout mechanism and the network I/O layer.
When the Kafka producer client batches messages for efficient network transmission, it allocates ByteBuffers from a shared buffer pool. These buffers are reused to minimize garbage collection overhead and improve throughput. The race condition occurs in the critical window between when a batch times out (based on delivery.timeout.ms) and when the associated network request actually completes.
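The window described above can be illustrated with a deliberately simplified simulation. This is not Kafka's actual buffer pool code; the pool, topic names, and latches are hypothetical stand-ins that make the interleaving deterministic. The "timeout handler" frees the buffer while a "network" thread still holds it, and the next batch's write clobbers the bytes the in-flight request eventually reads:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CountDownLatch;

public class BufferRaceDemo {
    // Minimal buffer pool: freed buffers are handed straight back out for reuse.
    static final Deque<ByteBuffer> pool = new ArrayDeque<>();

    static ByteBuffer allocate() {
        ByteBuffer b = pool.poll();
        return (b != null) ? b : ByteBuffer.allocate(64);
    }

    static void deallocate(ByteBuffer b) {
        pool.push(b); // no check that the buffer is still used by an in-flight request
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer batchA = allocate();
        batchA.put("topic-A:payload-A".getBytes(StandardCharsets.UTF_8));

        CountDownLatch reused = new CountDownLatch(1);
        final String[] wire = new String[1];

        // "Network" thread: still holds batchA while its request is in flight.
        Thread network = new Thread(() -> {
            try {
                reused.await(); // models latency exceeding delivery.timeout.ms
            } catch (InterruptedException e) {
                return;
            }
            ByteBuffer view = batchA.duplicate();
            view.flip();
            byte[] bytes = new byte[view.remaining()];
            view.get(bytes);
            wire[0] = new String(bytes, StandardCharsets.UTF_8);
        });
        network.start();

        // "Timeout handler": the batch expired, so the buffer is freed prematurely.
        deallocate(batchA);

        // A new batch destined for a different topic reuses the very same buffer.
        ByteBuffer batchB = allocate();
        batchB.clear();
        batchB.put("topic-B:payload-B".getBytes(StandardCharsets.UTF_8));
        reused.countDown();

        network.join();
        System.out.println("same buffer reused: " + (batchA == batchB));
        System.out.println("bytes sent for batch A: " + wire[0]);
    }
}
```

The request that was supposed to carry topic-A's payload ends up transmitting topic-B's bytes, and no error is surfaced to either batch.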
Data Confidentiality Impact:
Messages intended for one topic may be delivered to a different topic, potentially exposing sensitive data to consumers who have access to the destination topic but not the intended source topic. This is particularly dangerous in multi-tenant environments where different topics may have different access control policies.
Data Integrity Impact:
Consumers on the receiving topic may encounter unexpected or incompatible messages, leading to deserialization failures, processing errors, and corrupted downstream data. This can cascade through data pipelines causing widespread data quality issues.
Root Cause
The root cause is a race condition (CWE-362) in the buffer pool management logic of the Apache Kafka Java producer client. The vulnerability exists because the batch timeout handler and the network completion handler operate asynchronously without proper synchronization on shared buffer resources. When a batch expires, the code assumes the network request has failed and immediately returns the ByteBuffer to the pool for reuse, but the network layer may still be using that buffer for an in-flight request.
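A standard remedy for this class of bug is reference counting: the buffer returns to the pool only after every party holding it — the batch owner, the timeout path, and the network layer — has released its reference. The sketch below shows the idea in miniature; it is an assumption about the general technique, not the code of the actual Kafka patch:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedBuffer {
    static final Deque<ByteBuffer> pool = new ArrayDeque<>();

    final ByteBuffer buffer;
    final AtomicInteger refs = new AtomicInteger(1); // the batch owner's reference

    RefCountedBuffer(int size) { this.buffer = ByteBuffer.allocate(size); }

    void retain() { refs.incrementAndGet(); }   // network layer takes a reference
    void release() {                            // each party releases independently
        if (refs.decrementAndGet() == 0) {
            pool.push(buffer);                  // pooled only once nobody uses it
        }
    }

    public static void main(String[] args) {
        RefCountedBuffer batch = new RefCountedBuffer(64);

        batch.retain();   // request goes in flight: network layer holds a reference
        batch.release();  // delivery.timeout.ms fires: timeout handler releases its ref
        System.out.println("pooled after timeout alone: " + pool.contains(batch.buffer));

        batch.release();  // network completion handler drops the last reference
        System.out.println("pooled after network done: " + pool.contains(batch.buffer));
    }
}
```

With this discipline, an expiring batch can report a timeout to the application without ever making its buffer eligible for reuse while the request is still on the wire.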
Attack Vector
This vulnerability has a network attack vector and requires no user interaction or privileges to trigger. The condition can be triggered under normal production load scenarios where network latency causes requests to remain in flight longer than the configured delivery.timeout.ms. The vulnerability is particularly likely to manifest during:
- High network latency conditions between producer and broker
- Network congestion or packet loss scenarios
- High throughput workloads with aggressive timeout configurations
- Broker-side delays due to replication or disk I/O
The attack does not require direct attacker intervention—environmental conditions alone can trigger the race condition, making this a reliability and security concern in any Kafka deployment.
Detection Methods for CVE-2026-35554
Indicators of Compromise
- Unexpected messages appearing in topics where they don't belong based on application logic
- Consumer deserialization exceptions or schema validation failures on topics with strict schemas
- Data inconsistencies between source systems and Kafka-backed data stores
- Application logs showing messages received for unexpected message types or formats
Detection Strategies
- Implement schema validation on consumers to detect messages that don't conform to expected topic schemas
- Monitor Kafka consumer error rates for deserialization failures which may indicate misdirected messages
- Enable producer-side message tracing with correlation IDs to track message routing anomalies
- Compare message counts between producer instrumentation and consumer acknowledgments per topic
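The last strategy — reconciling producer-side counts against consumer acknowledgments — can be sketched as a simple per-topic drift check. The topic names and counters below are illustrative; they are not Kafka metrics, and the counts would come from your own instrumentation:

```java
import java.util.Map;
import java.util.TreeMap;

public class TopicCountReconciler {
    // drift = produced minus consumed per topic; any non-zero entry flags
    // a topic that gained or lost messages relative to producer bookkeeping.
    static Map<String, Long> drift(Map<String, Long> produced, Map<String, Long> consumed) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<String, Long> e : produced.entrySet()) {
            long delta = e.getValue() - consumed.getOrDefault(e.getKey(), 0L);
            if (delta != 0) out.put(e.getKey(), delta);
        }
        for (Map.Entry<String, Long> e : consumed.entrySet()) {
            if (!produced.containsKey(e.getKey())) out.put(e.getKey(), -e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Counts from producer instrumentation vs. consumer acknowledgments.
        Map<String, Long> produced = Map.of("orders", 100L, "payments", 50L);
        // Scenario: one "orders" message was misdirected into "audit".
        Map<String, Long> consumed = Map.of("orders", 99L, "payments", 50L, "audit", 1L);
        System.out.println(drift(produced, consumed));
    }
}
```

A positive entry means a topic received fewer messages than the producer believes it sent; a matching negative entry on another topic is exactly the signature of misdirection.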
Monitoring Recommendations
- Set up alerts for sudden spikes in consumer deserialization errors across topics
- Monitor the record-error-rate and batch-expired-total producer metrics for correlation patterns
- Implement end-to-end message validation in critical data pipelines to detect content mismatches
- Review Kafka producer logs for batch expiration warnings coinciding with high network latency
How to Mitigate CVE-2026-35554
Immediate Actions Required
- Upgrade Apache Kafka client libraries to a patched release: 3.9.2, 4.0.2, 4.1.2, or 4.2.0 (or any later version)
- Review and increase delivery.timeout.ms configuration to reduce likelihood of batch expiration during in-flight requests
- Audit topics that may have received misdirected messages if the vulnerability window has been active in production
- Implement consumer-side validation to reject messages that don't match expected schemas or formats
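Consumer-side validation can be as simple as rejecting records whose payload does not match the topic they arrived on. The sketch below assumes a hypothetical application convention in which producers prefix each payload with its intended topic; the check itself is defense-in-depth at the application layer, not part of Kafka:

```java
import java.nio.charset.StandardCharsets;

public class TopicGuard {
    // Hypothetical convention: producers prefix every payload with "intendedTopic:".
    // A consumer rejects any record whose embedded topic differs from the topic it
    // was actually read from.
    static boolean accept(String consumedFromTopic, byte[] payload) {
        String text = new String(payload, StandardCharsets.UTF_8);
        int sep = text.indexOf(':');
        if (sep < 0) return false; // malformed payload: fail closed
        return text.substring(0, sep).equals(consumedFromTopic);
    }

    public static void main(String[] args) {
        byte[] ok = "payments:charge#42".getBytes(StandardCharsets.UTF_8);
        byte[] misdirected = "payments:charge#43".getBytes(StandardCharsets.UTF_8);
        System.out.println(accept("payments", ok));       // read from the right topic
        System.out.println(accept("audit", misdirected)); // landed on the wrong topic
    }
}
```

Rejected records should be routed to a dead-letter topic and alerted on, since each one is direct evidence of the misdirection this CVE describes.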
Patch Information
Apache Kafka has released patched versions to address this vulnerability. Users should upgrade to the following versions:
- Apache Kafka 3.9.2 - Fixes the race condition for the 3.9.x release line
- Apache Kafka 4.0.2 - Fixes the race condition for the 4.0.x release line
- Apache Kafka 4.1.2 - Fixes the race condition for the 4.1.x release line
- Apache Kafka 4.2.0 - Includes the fix in the latest major release
For detailed technical information about the fix, refer to Apache Kafka issue KAFKA-19012 and the associated Apache mailing list thread.
Workarounds
- Increase delivery.timeout.ms to a value significantly higher than observed network round-trip times to reduce batch expiration likelihood
- Reduce batch.size and linger.ms to minimize the window of exposure by sending smaller, more frequent batches
- Implement application-level message validation and routing verification in consumers as a defense-in-depth measure
- Consider temporarily reducing producer throughput in high-latency network conditions until patches can be applied
# Configuration example - increase delivery timeout to reduce race condition likelihood
# Add to the producer configuration (producer.properties)
# Note: Kafka requires delivery.timeout.ms >= linger.ms + request.timeout.ms
delivery.timeout.ms=300000
request.timeout.ms=60000
linger.ms=5
batch.size=16384

