CVE-2026-31247 Overview
CVE-2026-31247 is a denial-of-service vulnerability in the Docling document parsing library through version 2.61.0. The flaw resides in the JATS (Journal Article Tag Suite) XML backend, which calls etree.parse() without disabling entity resolution. An attacker can submit a malicious XML document containing nested entity declarations, commonly known as an XML bomb or Billion Laughs attack. When Docling processes the file, recursive entity expansion consumes excessive memory and CPU and crashes the parser process. The issue is categorized under CWE-400: Uncontrolled Resource Consumption and is related to the broader class of XML External Entity (XXE) parsing weaknesses, although a Billion Laughs payload needs only internal entities.
Critical Impact
A single crafted XML file can exhaust system memory and CPU, taking offline any service that ingests untrusted documents through Docling's JATS backend.
Affected Products
- Docling document parsing library through version 2.61.0
- Applications embedding Docling's JATS XML backend for document ingestion
- Pipelines processing untrusted XML through Docling-based services
Discovery Timeline
- 2026-05-11 - CVE-2026-31247 published to NVD
- 2026-05-13 - Last updated in NVD database
Technical Details for CVE-2026-31247
Vulnerability Analysis
Docling is an open-source library for parsing and converting documents across multiple formats, including JATS XML used widely in scientific publishing. The JATS backend relies on the lxml.etree.parse() function to load XML input. By default, lxml resolves internal entity declarations during parsing. Docling does not configure a hardened parser that disables entity expansion, leaving the backend exposed to algorithmic complexity attacks against the XML processor.
The vulnerability does not enable code execution or data disclosure. It targets availability by forcing the parser to allocate exponentially growing amounts of memory while resolving nested entities. Any service that accepts XML uploads and feeds them into Docling can be crashed remotely, without authentication or user interaction.
Root Cause
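The root cause is the absence of a secure XML parser configuration in the JATS backend. The call to etree.parse() does not pass a hardened XMLParser instance configured with resolve_entities=False, no_network=True, and huge_tree=False. Of these, resolve_entities is the option that matters for this flaw; no_network=True and huge_tree=False already match lxml's defaults, so setting them explicitly mainly documents intent. With entity resolution left on, the parser dereferences internal entity definitions and expands them recursively, which is the mechanism abused by Billion Laughs payloads.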
Attack Vector
An attacker delivers a crafted JATS XML file to any endpoint that hands input to the Docling parser. The payload defines a chain of entities where each entity references the previous one multiple times. Resolving the top-level entity produces an exponential blow-up of in-memory strings, exhausting resources before parsing completes. The attack requires only network reachability to a service that calls Docling's JATS loader. For technical details of the vulnerable parsing path, see the Docling project repository.
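The exponential growth described above can be checked with simple arithmetic. The sketch below is illustrative only: the function name and parameters are hypothetical, and the figures (a 3-byte base string, 10 references per level, 9 levels) are those of the textbook Billion Laughs example, not values taken from any payload observed against Docling.

```python
def expanded_size(base_len: int, fanout: int, depth: int) -> int:
    """Bytes produced when the top-level entity of a nested chain is fully expanded.

    base_len: length in bytes of the innermost entity's literal text
    fanout:   how many times each entity references the one below it
    depth:    number of referencing levels stacked on the innermost entity
    """
    # Each level multiplies the size of the level below it by `fanout`.
    return base_len * fanout ** depth


if __name__ == "__main__":
    for depth in (3, 6, 9):
        size = expanded_size(base_len=3, fanout=10, depth=depth)
        print(f"depth={depth}: {size:,} bytes")
```

A document of under a kilobyte can therefore request roughly 3 GB of expanded text, which is why limiting the size of the upload alone does not neutralize the attack.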
Detection Methods for CVE-2026-31247
Indicators of Compromise
- Docling worker processes terminating with out-of-memory (OOM) errors or MemoryError Python tracebacks referencing lxml.etree
- Sudden spikes in CPU and resident memory on services that ingest XML through Docling
- Inbound XML payloads containing repeated <!ENTITY> declarations that reference each other recursively
- Application logs showing failed parses of JATS documents shortly before service restarts
Detection Strategies
- Inspect XML uploads at the application gateway and flag documents containing nested ENTITY definitions or oversized internal DTDs
- Monitor memory and CPU usage of Docling workers and alert when growth exceeds normal parsing baselines
- Correlate web server access logs of XML uploads with downstream worker crashes or restarts to identify abuse attempts
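The first strategy above, flagging documents with nested ENTITY definitions, could be approximated with a byte-level heuristic like the one below. This is a rough sketch, not a parser: the function names and thresholds are hypothetical, it assumes an ASCII-compatible encoding such as UTF-8, and a determined attacker can evade regex matching (for example with UTF-16 input), so it is suited to flagging and triage rather than enforcement.

```python
import re
from typing import Dict

# Any entity declaration, internal or external.
_ENTITY_TOKEN = re.compile(rb"<!ENTITY\b")
# Internal declarations of the form <!ENTITY name "value"> (group 2 is the value).
_INTERNAL_DECL = re.compile(rb"<!ENTITY\s+(?:%\s+)?\S+\s+([\"'])(.*?)\1", re.DOTALL)
# Named general-entity references such as &lol1; (character references are ignored).
_ENTITY_REF = re.compile(rb"&[A-Za-z_:][A-Za-z0-9_.:\-]*;")


def entity_stats(xml_bytes: bytes) -> Dict[str, int]:
    """Count entity declarations and how heavily their values reference other entities."""
    nested = 0
    max_refs = 0
    for match in _INTERNAL_DECL.finditer(xml_bytes):
        refs = len(_ENTITY_REF.findall(match.group(2)))
        if refs:
            nested += 1
            max_refs = max(max_refs, refs)
    return {
        "declarations": len(_ENTITY_TOKEN.findall(xml_bytes)),
        "nested_declarations": nested,
        "max_refs_in_one_value": max_refs,
    }


def looks_like_entity_bomb(xml_bytes: bytes, max_nested: int = 2, max_refs: int = 4) -> bool:
    """Flag documents whose entity values reference other entities unusually often.

    The thresholds are assumptions to be tuned against real traffic.
    """
    stats = entity_stats(xml_bytes)
    return (
        stats["nested_declarations"] > max_nested
        or stats["max_refs_in_one_value"] > max_refs
    )
```

Legitimate JATS files rarely define entities whose values reference other entities, so even conservative thresholds should produce few false positives; that expectation is an assumption worth validating on your own corpus.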
Monitoring Recommendations
- Enable structured logging in the Docling pipeline to capture parser exceptions and document hashes for forensic review
- Track per-tenant XML upload sizes and rejection rates to surface anomalous submitters
- Forward host-level OOM-killer events from container orchestration logs into centralized monitoring for rapid triage
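The structured-logging recommendation above might look like the following wrapper. It is a minimal sketch using only the standard library: the logger name, event names, and the `parse_fn` callable are hypothetical placeholders for whatever function invokes Docling in your pipeline. The document hash is logged before parsing starts because a worker killed by the OOM killer has no chance to log afterwards.

```python
import hashlib
import json
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("docling_ingest")  # hypothetical logger name


def document_fingerprint(data: bytes) -> Dict[str, Any]:
    """Hash and size of the submitted document, for later forensic correlation."""
    return {"sha256": hashlib.sha256(data).hexdigest(), "size_bytes": len(data)}


def parse_with_audit(data: bytes, parse_fn: Callable[[bytes], Any]) -> Any:
    """Run parse_fn(data), emitting one JSON log line per lifecycle event."""
    record = document_fingerprint(data)
    # Logged first so the hash survives even if the process is killed mid-parse.
    logger.info(json.dumps({"event": "parse_start", **record}))
    try:
        result = parse_fn(data)
    except Exception as exc:
        logger.error(
            json.dumps({"event": "parse_error", "error_type": type(exc).__name__, **record})
        )
        raise
    logger.info(json.dumps({"event": "parse_ok", **record}))
    return result
```

A `parse_start` line with no matching `parse_ok` or `parse_error` line for the same hash is then a useful signal that the worker died during parsing.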
How to Mitigate CVE-2026-31247
Immediate Actions Required
- Once a fix is available, upgrade Docling to a release later than 2.61.0 that disables entity resolution in the JATS backend
- Restrict the size of XML files accepted by any service that forwards documents to Docling
- Run Docling parsing in isolated worker processes with strict memory and CPU limits to contain resource exhaustion
- Reject XML documents that declare internal DTD subsets when they are not required by your workflow
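The size restriction and the isolated, resource-limited worker from the list above could be combined roughly as follows. This sketch assumes a POSIX host (the `resource` module and `preexec_fn` are not available on Windows); the limits, the function names, and the idea of a worker command that reads XML on standard input are all hypothetical and would need adapting to how your service actually invokes Docling.

```python
import resource
import subprocess
import sys
from typing import Sequence

MAX_XML_BYTES = 10 * 1024 * 1024   # assumed upload cap (10 MiB)
MAX_ADDRESS_SPACE = 1024 ** 3      # assumed per-worker memory cap (1 GiB)
MAX_CPU_SECONDS = 30               # assumed per-worker CPU time cap


def _lower_limit(which: int, wanted: int) -> None:
    """Lower a resource limit, never asking for more than the current hard limit."""
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    resource.setrlimit(which, (wanted, wanted))


def _apply_limits() -> None:
    # Runs in the child process after fork() and before exec().
    _lower_limit(resource.RLIMIT_AS, MAX_ADDRESS_SPACE)
    _lower_limit(resource.RLIMIT_CPU, MAX_CPU_SECONDS)


def run_in_limited_worker(
    worker_argv: Sequence[str], xml_bytes: bytes, timeout: float = 60.0
) -> subprocess.CompletedProcess:
    """Reject oversized input, then feed the XML to a worker with hard resource caps."""
    if len(xml_bytes) > MAX_XML_BYTES:
        raise ValueError("XML document exceeds the configured size cap")
    return subprocess.run(
        list(worker_argv),
        input=xml_bytes,
        capture_output=True,
        timeout=timeout,
        preexec_fn=_apply_limits,
        check=False,
    )


if __name__ == "__main__":
    # Stand-in worker: a child Python process that only measures its input.
    argv = [sys.executable, "-c", "import sys; print(len(sys.stdin.buffer.read()))"]
    print(run_in_limited_worker(argv, b"<article/>").stdout)
```

With these limits a bomb makes the child fail with a MemoryError or a CPU-limit signal, and the parent sees a non-zero return code instead of being taken down with it.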
Patch Information
Monitor the Docling project repository for releases addressing CVE-2026-31247. The expected fix is to construct an lxml.etree.XMLParser with resolve_entities=False, no_network=True, and huge_tree=False, then pass it to etree.parse() in the JATS backend. Additional context is available in the CVE-2026-31247 reference document.
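Until an official release is available, the parser configuration described above can be sketched as follows. This is an illustration of the lxml options named in this advisory, not the Docling maintainers' patch; the helper names `make_hardened_parser` and `parse_jats_safely` are hypothetical.

```python
from io import BytesIO

from lxml import etree


def make_hardened_parser() -> etree.XMLParser:
    """Build an lxml parser that leaves entity references unexpanded."""
    return etree.XMLParser(
        resolve_entities=False,  # do not substitute entity references with their values
        no_network=True,         # no network access for external lookups (lxml default)
        huge_tree=False,         # keep libxml2's built-in size limits (lxml default)
    )


def parse_jats_safely(source):
    """Parse a file path or file-like object with the hardened parser."""
    return etree.parse(source, parser=make_hardened_parser())


if __name__ == "__main__":
    tree = parse_jats_safely(BytesIO(b"<article><front/></article>"))
    print(tree.getroot().tag)
```

One trade-off to be aware of: with resolve_entities=False, entity references stay in the tree as unexpanded nodes, so a legitimate document that relies on internal entities for its text would need separate handling.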
Workarounds
- Wrap calls to the JATS backend with a pre-parser that strips or rejects XML documents containing ENTITY declarations
- Deploy a Web Application Firewall (WAF) rule that blocks XML payloads matching Billion Laughs signatures
- Execute Docling inside a cgroup or container with hard memory caps so a malicious file crashes only the worker without affecting the host
- Disable the JATS backend in deployments that do not require JATS XML ingestion
# Example container resource limits to contain XML-bomb DoS
docker run --rm \
  --memory=512m \
  --memory-swap=512m \
  --cpus="1.0" \
  --pids-limit=128 \
  docling-worker:latest
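The first workaround, a pre-parser that rejects documents containing ENTITY declarations, could be sketched with the standard library's expat bindings. The function and exception names are hypothetical. The handler fires when a declaration is read in the DTD, before any reference in the document body is expanded, so the scan aborts on the first declaration it meets.

```python
from xml.parsers import expat


class ForbiddenXML(ValueError):
    """Raised when a document declares an entity."""


def reject_entity_declarations(xml_bytes: bytes) -> None:
    """Scan xml_bytes and raise ForbiddenXML on the first <!ENTITY ...> declaration.

    Malformed XML raises xml.parsers.expat.ExpatError instead.
    """
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # An exception raised inside an expat handler aborts parsing
        # and propagates to the caller of Parse().
        raise ForbiddenXML(f"entity declaration not allowed: {name}")

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)


if __name__ == "__main__":
    reject_entity_declarations(b"<article><p>ok</p></article>")  # passes silently
    try:
        reject_entity_declarations(b'<!DOCTYPE a [<!ENTITY x "y">]><a>&x;</a>')
    except ForbiddenXML as exc:
        print("rejected:", exc)
```

Calling a check like this before handing the bytes to the JATS backend costs one extra pass over each document, which is usually acceptable for an ingestion endpoint.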


