CVE-2026-42440 Overview
CVE-2026-42440 is a denial-of-service vulnerability in Apache OpenNLP's AbstractModelReader class. The flaw stems from unbounded array allocation [CWE-789] when parsing binary model files. The methods getOutcomes(), getOutcomePatterns(), and getPredicates() read attacker-controlled 32-bit integer counts from a .bin model stream and pass them directly to array allocations without validation. A crafted model file can set these counts to Integer.MAX_VALUE, exhausting the JVM heap and triggering an OutOfMemoryError. The vulnerability affects Apache OpenNLP 2.x versions before 2.5.9 and the 3.0.0-M1 and 3.0.0-M2 milestones.
Critical Impact
A single small .bin model file from an untrusted source can crash any JVM that loads it through GenericModelReader or higher-level components delegating to it.
Affected Products
- Apache OpenNLP versions before 2.5.9 (2.x branch)
- Apache OpenNLP 3.0.0-M1
- Apache OpenNLP 3.0.0-M2
Discovery Timeline
- 2026-05-04 - CVE-2026-42440 published to NVD
- 2026-05-06 - Last updated in NVD database
Technical Details for CVE-2026-42440
Vulnerability Analysis
The vulnerability resides in Apache OpenNLP's binary model deserialization logic. When AbstractModelReader parses a .bin model stream, three methods read count fields that determine array sizes. getOutcomes() allocates new String[numOutcomes], getOutcomePatterns() allocates new int[numOCTypes][], and getPredicates() allocates new String[NUM_PREDS]. None of these methods validate that the count is non-negative or within a reasonable bound before allocation.
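The pattern described above can be sketched as follows. This is a hypothetical simplification, not the actual OpenNLP source: the class and method names mirror the shape of getOutcomes(), and the point is that the count read from the stream flows straight into the array allocation with no bounds check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical simplification of the vulnerable read: a 32-bit count is
// taken from the stream and used to size an array with no validation.
public class VulnerableReadSketch {

    static String[] readOutcomes(DataInputStream din) throws IOException {
        int numOutcomes = din.readInt();              // attacker-controlled
        String[] outcomes = new String[numOutcomes];  // unbounded allocation
        for (int i = 0; i < numOutcomes; i++) {
            outcomes[i] = din.readUTF();
        }
        return outcomes;
    }

    // A hostile count field is just four bytes: Integer.MAX_VALUE.
    static byte[] craftHostileCount() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeInt(Integer.MAX_VALUE);
        } catch (IOException e) {
            throw new AssertionError(e); // cannot happen for in-memory streams
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = craftHostileCount();
        try {
            readOutcomes(new DataInputStream(new ByteArrayInputStream(payload)));
        } catch (OutOfMemoryError e) {
            // HotSpot rejects arrays of Integer.MAX_VALUE elements outright;
            // somewhat smaller hostile counts exhaust the heap instead.
            System.out.println("allocation failed: " + e.getMessage());
        }
    }
}
```

Note that the allocation fails before a single array element is populated: the four-byte count is the entire hostile payload.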
The failure occurs early in deserialization. For a GIS model, getOutcomes() is reached after only the model-type string, the correction constant, and the correction parameter have been read. An attacker incurs no meaningful payload size cost to weaponize the file, making the exploit asymmetric and reliable across deployments that ingest external models.
Root Cause
The root cause is missing input validation on length-prefixed arrays during deserialization. The int count is treated as trusted metadata even when the model file originates from an untrusted source. Any value up to Integer.MAX_VALUE is accepted and passed directly to the JVM's array allocator, which attempts to reserve heap memory for billions of references before any data is consumed.
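The missing check is small. A defensive version of the length read might look like the sketch below; the helper is hypothetical (it is not the patched OpenNLP code), but the 10,000,000-entry bound matches the default described in the patch section of this advisory.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical defensive read: reject the count before any allocation.
public class BoundedCountRead {
    static final int MAX_ENTRIES = 10_000_000; // default bound from the patch

    static int readValidatedCount(DataInputStream din) throws IOException {
        int count = din.readInt();
        if (count < 0 || count > MAX_ENTRIES) {
            throw new IllegalArgumentException(
                "entry count " + count + " outside [0, " + MAX_ENTRIES + "]");
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Integer.MAX_VALUE in big-endian bytes, as it would appear on the wire.
        byte[] hostile = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        try {
            readValidatedCount(new DataInputStream(new ByteArrayInputStream(hostile)));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```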
Attack Vector
Exploitation requires the target process to load a malicious .bin model file. Attack scenarios include applications that fetch models from third-party repositories, accept user-uploaded models, or process models retrieved from network locations without integrity verification. Any code path delegating to GenericModelReader is affected, including higher-level NLP pipelines that load models during initialization or at runtime. No authentication or user interaction beyond model loading is required. The vulnerability does not yield code execution or data disclosure; the impact is limited to availability through process termination via OutOfMemoryError.
Detection Methods for CVE-2026-42440
Indicators of Compromise
- JVM process termination with java.lang.OutOfMemoryError originating from AbstractModelReader.getOutcomes(), getOutcomePatterns(), or getPredicates() in stack traces.
- Repeated crashes of services that load .bin model files shortly after model ingestion or upload events.
- Inbound .bin model files from untrusted sources whose 32-bit count fields in the header region are abnormally large relative to legitimate model sizes.
Detection Strategies
- Inspect application logs for OutOfMemoryError exceptions with opennlp.tools.ml.model.AbstractModelReader frames in the stack trace.
- Monitor file uploads and external fetch operations for .bin files exceeding expected count metadata using a parser that validates header integers before deserialization.
- Track Apache OpenNLP dependency versions across the software bill of materials and flag instances of versions before 2.5.9 or 3.0.0-M3.
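One way to implement the header-validation strategy above is to parse only the leading fields and bound-check the first count before the bytes ever reach a model reader. The field order below (model-type string, integer correction constant, double correction parameter, integer outcome count) follows the GIS layout described earlier in this advisory but should be verified against the actual OpenNLP binary format; the class name and the SANE_MAX threshold are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative pre-ingestion check for GIS-style .bin headers. Assumed
// layout: UTF model type, int correction constant, double correction
// parameter, int outcome count. Verify against the real format before use.
public class ModelHeaderCheck {
    static final int SANE_MAX = 10_000_000; // aligned with the patched bound

    // Returns true only when the leading fields parse cleanly and the
    // first count field is within a plausible range.
    static boolean looksSane(byte[] head) {
        try (DataInputStream din =
                 new DataInputStream(new ByteArrayInputStream(head))) {
            String modelType = din.readUTF(); // e.g. "GIS"
            din.readInt();                    // correction constant
            din.readDouble();                 // correction parameter
            int numOutcomes = din.readInt();  // first attacker-controlled count
            return "GIS".equals(modelType)
                && numOutcomes >= 0 && numOutcomes <= SANE_MAX;
        } catch (IOException e) {
            return false; // truncated or malformed header
        }
    }

    // Helper for exercising the check: build a header with the assumed layout.
    static byte[] header(String type, int cc, double cp, int count) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bos)) {
            dos.writeUTF(type);
            dos.writeInt(cc);
            dos.writeDouble(cp);
            dos.writeInt(count);
        } catch (IOException e) {
            throw new AssertionError(e); // cannot happen for in-memory streams
        }
        return bos.toByteArray();
    }
}
```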
Monitoring Recommendations
- Alert on JVM heap exhaustion events correlated with model load operations in services using OpenNLP.
- Log the source URI, hash, and size of every model file loaded by production services to support post-incident triage.
- Monitor process restart counts on NLP workers to detect repeated DoS attempts targeting model parsing.
How to Mitigate CVE-2026-42440
Immediate Actions Required
- Upgrade Apache OpenNLP 2.x deployments to version 2.5.9 immediately.
- Upgrade Apache OpenNLP 3.x deployments to version 3.0.0-M3 immediately.
- Inventory all services and pipelines that load .bin model files and verify the OpenNLP version in use.
- Treat all .bin model files from end users, third-party repositories, or unverified sources as untrusted input.
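For Maven-managed 2.x deployments, pinning the patched release is a one-line change; the coordinates below assume the standard opennlp-tools artifact (adjust for other build tools or artifacts).

```xml
<dependency>
  <groupId>org.apache.opennlp</groupId>
  <artifactId>opennlp-tools</artifactId>
  <version>2.5.9</version>
</dependency>
```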
Patch Information
The fix introduces an upper bound on each of the three count fields, checked before array allocation. Counts that are negative or exceed the bound cause an IllegalArgumentException, so the read fails fast without any large allocation. The default bound is 10,000,000 entries, which exceeds legitimate model sizes but stays well below values that would threaten heap exhaustion. Deployments that legitimately require larger models can raise the limit at JVM startup using -DOPENNLP_MAX_ENTRIES=50000000 or another positive integer. Invalid or non-positive values fall back to the default. See the Apache Mailing List Thread and the Openwall OSS-Security Notice for details.
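The fallback behavior described above can be mirrored in a small resolver. The class and method names here are hypothetical; the property name OPENNLP_MAX_ENTRIES and the fall-back-on-invalid semantics come from the advisory text, while the parsing logic is a sketch.

```java
// Hypothetical resolver mirroring the described patch behavior: read the
// OPENNLP_MAX_ENTRIES system property and fall back to the default when the
// value is missing, non-numeric, or non-positive.
public class MaxEntriesConfig {
    static final int DEFAULT_MAX_ENTRIES = 10_000_000;

    static int resolveMaxEntries(String raw) {
        if (raw == null) {
            return DEFAULT_MAX_ENTRIES;
        }
        try {
            int v = Integer.parseInt(raw.trim());
            return v > 0 ? v : DEFAULT_MAX_ENTRIES; // non-positive -> default
        } catch (NumberFormatException e) {
            return DEFAULT_MAX_ENTRIES;             // non-numeric -> default
        }
    }

    public static void main(String[] args) {
        int max = resolveMaxEntries(System.getProperty("OPENNLP_MAX_ENTRIES"));
        System.out.println("effective max entries: " + max);
    }
}
```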
Workarounds
- Verify the provenance and integrity of every .bin model file before loading, using cryptographic hashes from a trusted source.
- Restrict model loading to files originating from internal, controlled repositories and reject end-user supplied models.
- Run NLP processing in isolated worker processes with bounded heap sizes so that a crash does not affect the parent service.
- Apply network egress controls to prevent unauthorized fetching of model files from untrusted third-party repositories.
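The first workaround above can be sketched as a small integrity gate: compare a model file's SHA-256 digest against a value obtained from a trusted channel before handing the file to the model reader. The class name is hypothetical, and the path and expected digest are placeholders the caller supplies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical integrity gate: refuse to load a model whose SHA-256 digest
// does not match a value published through a trusted channel.
public class ModelIntegrityGate {

    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-256 is a mandatory JCA algorithm", e);
        }
    }

    static void requireTrusted(Path model, String expectedHex) throws IOException {
        String actual = sha256Hex(Files.readAllBytes(model));
        if (!actual.equalsIgnoreCase(expectedHex)) {
            throw new SecurityException(
                "Model digest mismatch for " + model + "; refusing to load");
        }
        // Only after this point should the bytes reach GenericModelReader.
    }
}
```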
# Configuration example: raise the entry bound only when required
java -DOPENNLP_MAX_ENTRIES=50000000 -jar your-application.jar


