CVE-2026-31247: Docling JATS XML Backend DoS Vulnerability

CVE-2026-31247 Overview

CVE-2026-31247 is a denial-of-service vulnerability in the Docling document parsing library through version 2.61.0. The flaw resides in the JATS (Journal Article Tag Suite) XML backend, which calls etree.parse() without disabling entity expansion. Attackers can submit a malicious XML document containing nested entity declarations, commonly known as an XML bomb or Billion Laughs attack. When Docling processes the file, recursive entity expansion consumes excessive memory and CPU, crashing the parser process. The issue is categorized under CWE-400: Uncontrolled Resource Consumption. It is related to the broader class of XML External Entity (XXE) parsing weaknesses, although this attack relies on internal rather than external entities.

Critical Impact
A single crafted XML file can exhaust system memory and CPU, taking offline any service that ingests untrusted documents through Docling's JATS backend.

Affected Products

Docling document parsing library through version 2.61.0
Applications embedding Docling's JATS XML backend for document ingestion
Pipelines processing untrusted XML through Docling-based services

Discovery Timeline

2026-05-11 - CVE-2026-31247 published to NVD
2026-05-13 - Last updated in NVD database

Technical Details for CVE-2026-31247

Vulnerability Analysis

Docling is an open-source library for parsing and converting documents across multiple formats, including JATS XML used widely in scientific publishing. The JATS backend relies on the lxml.etree.parse() function to load XML input. By default, lxml resolves internal entity declarations during parsing. Docling does not configure a hardened parser that disables entity expansion, leaving the backend exposed to algorithmic complexity attacks against the XML processor.

The vulnerability does not enable code execution or data disclosure. It targets availability by forcing the parser to allocate memory that grows exponentially with the depth of entity nesting. Any service that accepts XML uploads and feeds them into Docling can be crashed remotely without authentication or user interaction.

Root Cause

The root cause is the absence of a secure XML parser configuration in the JATS backend. The call to etree.parse() does not pass a hardened XMLParser instance configured with resolve_entities=False, no_network=True, and huge_tree=False. With entity resolution left enabled, the parser dereferences internal entity definitions and expands them recursively, which is the mechanism abused by Billion Laughs payloads.

Attack Vector

An attacker delivers a crafted JATS XML file to any endpoint that hands input to the Docling parser. The payload defines a chain of entities where each entity references the previous one multiple times. Resolving the top-level entity produces an exponential blow-up of in-memory strings, exhausting resources before parsing completes. The attack requires only network reachability to a service that calls Docling's JATS loader. For technical details of the vulnerable parsing path, see the Docling project repository.
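
To make the blow-up concrete, the sketch below computes the size of a fully expanded payload without ever building it. The function name and the example parameters (a three-byte base string and nine further entities with ten references each, the shape of the classic "lol" payload) are illustrative assumptions, not values taken from a specific exploit against Docling.

```python
def expanded_size(levels: int, fanout: int, base_len: int) -> int:
    """Bytes produced by expanding the top entity of a nested chain.

    Level 0 is a literal string of base_len bytes. Each higher level
    references the level below it `fanout` times.
    """
    if levels < 0 or fanout < 1 or base_len < 0:
        raise ValueError("require levels >= 0, fanout >= 1, base_len >= 0")
    return base_len * fanout ** levels


if __name__ == "__main__":
    # "lol" (3 bytes) plus 9 nested entities, each with 10 references.
    size = expanded_size(levels=9, fanout=10, base_len=3)
    print(f"{size:,} bytes")  # 3,000,000,000 bytes
```

A payload of this shape is well under a kilobyte on disk, which is why limiting upload size alone does not stop the attack.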

Detection Methods for CVE-2026-31247

Indicators of Compromise

Docling worker processes terminating with out-of-memory (OOM) errors or MemoryError Python tracebacks referencing lxml.etree
Sudden spikes in CPU and resident memory on services that ingest XML through Docling
Inbound XML payloads containing repeated <!ENTITY> declarations that reference each other recursively
Application logs showing failed parses of JATS documents shortly before service restarts

Detection Strategies

Inspect XML uploads at the application gateway and flag documents containing nested ENTITY definitions or oversized internal DTDs
Monitor the resource usage of Docling worker processes and alert when memory growth exceeds normal parsing baselines
Correlate web server access logs of XML uploads with downstream worker crashes or restarts to identify abuse attempts
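
The first strategy above could be approximated with a lightweight heuristic such as the following. This is a rough sketch, not a validated signature: the function names, the scan limit, and the threshold are assumptions, and a byte-level regex only sees ASCII-compatible encodings, so a UTF-16 document would evade it.

```python
import re

# General entity declarations such as <!ENTITY name "value">.
ENTITY_DECL = re.compile(rb'<!ENTITY\s+([^\s%]+)\s+(["\'])(.*?)\2', re.DOTALL)
# Entity references such as &name; inside a declared value.
ENTITY_REF = re.compile(rb"&([A-Za-z_][\w.\-]*);")
# The five entities predefined by XML are harmless and are ignored.
PREDEFINED = {b"lt", b"gt", b"amp", b"apos", b"quot"}


def entity_nesting_report(data: bytes, max_scan: int = 1_000_000) -> dict:
    """Count entity declarations and references nested inside their values."""
    head = data[:max_scan]  # the DTD precedes the root element
    declarations = ENTITY_DECL.findall(head)
    nested = 0
    for _name, _quote, value in declarations:
        nested += sum(1 for ref in ENTITY_REF.findall(value) if ref not in PREDEFINED)
    return {"declarations": len(declarations), "nested_references": nested}


def looks_like_entity_bomb(data: bytes, max_nested_references: int = 0) -> bool:
    """Flag documents whose entity values reference other entities."""
    report = entity_nesting_report(data)
    return report["nested_references"] > max_nested_references
```

A flag from this check is best treated as a signal for logging or quarantine rather than as proof of an attack.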

Monitoring Recommendations

Enable structured logging in the Docling pipeline to capture parser exceptions and document hashes for forensic review
Track per-tenant XML upload sizes and rejection rates to surface anomalous submitters
Forward host-level OOM-killer events from container orchestration logs into centralized monitoring for rapid triage
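
As one way to capture parser exceptions together with document hashes, a wrapper along these lines could sit around the parse call. The logger name, field names, and the `parse` callable are hypothetical. The sketch assumes the pipeline exposes a single function that takes the raw bytes.

```python
import hashlib
import json
import logging

logger = logging.getLogger("docling_ingest")  # hypothetical logger name


def parse_with_audit(data: bytes, tenant: str, parse):
    """Call parse(data), emitting JSON log lines before parsing and on failure."""
    record = {
        "tenant": tenant,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
    }
    # Log before parsing: a worker killed by the OOM killer never reaches
    # the except block, so this line is the only trace it leaves.
    logger.info(json.dumps({"event": "xml_parse_start", **record}))
    try:
        return parse(data)
    except Exception as exc:
        failure = {
            "event": "xml_parse_failure",
            **record,
            "exception": type(exc).__name__,
            "message": str(exc)[:200],
        }
        logger.error(json.dumps(failure))
        raise
```

Pairing the start record's hash with a later OOM-killer event makes it possible to identify the document that was being parsed when the worker died.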

How to Mitigate CVE-2026-31247

Immediate Actions Required

Once a fix is available, upgrade Docling to a release later than 2.61.0 that disables entity resolution in the JATS backend
Restrict the size of XML files accepted by any service that forwards documents to Docling
Run Docling parsing in isolated worker processes with strict memory and CPU limits to contain resource exhaustion
Reject XML documents that declare internal DTD subsets when they are not required by your workflow
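
For the isolated-worker recommendation, the sketch below shows one possible approach on Linux: launching the parsing step as a child process with hard address-space and CPU limits. The function name and limit values are assumptions, and the command passed in would be whatever script wraps the Docling call in a given deployment. `RLIMIT_AS` is not enforced consistently on every POSIX platform, so this is a sketch rather than a portable solution.

```python
import resource
import subprocess
import sys


def run_limited(argv, max_memory_bytes, cpu_seconds, timeout_s):
    """Run argv in a child process with hard memory and CPU limits (POSIX)."""

    def apply_limits():
        # Executed in the child between fork and exec.
        resource.setrlimit(resource.RLIMIT_AS, (max_memory_bytes, max_memory_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        argv,
        preexec_fn=apply_limits,
        capture_output=True,
        timeout=timeout_s,  # wall-clock guard in addition to the CPU limit
    )


if __name__ == "__main__":
    # Stand-in for a parsing job: an allocation larger than the 1 GiB cap.
    job = [sys.executable, "-c", "x = bytearray(4 * 1024**3)"]
    result = run_limited(job, max_memory_bytes=1024**3, cpu_seconds=10, timeout_s=30)
    print("exit code:", result.returncode)
```

The Python documentation warns that `preexec_fn` is not safe in multithreaded parents. A service that uses threads may prefer a small launcher script that sets the limits on itself before starting the parser.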

Patch Information

Monitor the Docling project repository for releases addressing CVE-2026-31247. The expected fix is to construct an lxml.etree.XMLParser with resolve_entities=False, no_network=True, and huge_tree=False, then pass it to etree.parse() in the JATS backend. Additional context is available in the CVE-2026-31247 reference document.
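
The parser configuration described above might look roughly like this. It is a minimal sketch of the hardening pattern, not the actual Docling patch. The function name is hypothetical, and lxml is imported lazily only so the options can be inspected without the dependency installed.

```python
# Options named in this advisory. resolve_entities=False is the setting
# that stops entity references from being expanded.
HARDENED_PARSER_OPTIONS = {
    "resolve_entities": False,
    "no_network": True,
    "huge_tree": False,
}


def parse_xml_hardened(source):
    """Parse a path or file-like object with a hardened lxml parser."""
    from lxml import etree  # third-party dependency

    parser = etree.XMLParser(**HARDENED_PARSER_OPTIONS)
    return etree.parse(source, parser)
```

With entity resolution disabled, lxml leaves entity references in the tree as unexpanded entity nodes, so downstream code that reads element text may need to tolerate them.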

Workarounds

Wrap calls to the JATS backend with a pre-parser that strips or rejects XML documents containing ENTITY declarations
Deploy a Web Application Firewall (WAF) rule that blocks XML payloads matching Billion Laughs signatures
Execute Docling inside a cgroup or container with hard memory caps so that a malicious file crashes only the worker and does not affect the host
Disable the JATS backend in deployments that do not require JATS XML ingestion
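
The pre-parser workaround can be sketched with the standard library's expat bindings, which report each entity declaration through a callback before any reference to it is expanded. The function and exception names are hypothetical. The check rejects any document that declares an entity, which may be too strict for workflows that rely on custom entities.

```python
import xml.parsers.expat


class EntityDeclarationRejected(ValueError):
    """Raised when a document declares an XML entity."""


def reject_entity_declarations(data: bytes) -> None:
    """Raise EntityDeclarationRejected if data declares any entity.

    Malformed XML raises xml.parsers.expat.ExpatError instead.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Raising here aborts parsing at the first declaration, before
        # the document body and its entity references are reached.
        raise EntityDeclarationRejected(f"entity declaration {name!r} is not allowed")

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(data, True)
```

Calling `reject_entity_declarations(data)` before handing `data` to the JATS backend lets documents with a plain DOCTYPE through while stopping those that define entities.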

```bash
# Example container resource limits to contain XML-bomb DoS
docker run --rm \
  --memory=512m \
  --memory-swap=512m \
  --cpus="1.0" \
  --pids-limit=128 \
  docling-worker:latest
```