CVE-2026-24009 Overview
CVE-2026-24009 is an Insecure Deserialization vulnerability affecting Docling Core, a Python library that defines core data types and transformations for the Docling document processing application. The vulnerability exposes applications to remote code execution through PyYAML's unsafe deserialization mechanism when processing untrusted YAML data via the DoclingDocument.load_from_yaml() method.
This vulnerability is a downstream exposure of CVE-2020-14343, a known PyYAML deserialization flaw. Applications using docling-core versions 2.21.0 through 2.48.3 are vulnerable if they also use PyYAML versions prior to 5.4 and process untrusted YAML input.
Critical Impact
Successful exploitation allows remote attackers to execute arbitrary code on systems processing malicious YAML documents, potentially leading to complete system compromise, data exfiltration, or lateral movement within affected environments.
Affected Products
- Docling Core versions 2.21.0 through 2.48.3
- Environments running PyYAML versions prior to 5.4
- Applications invoking docling_core.types.doc.DoclingDocument.load_from_yaml() with untrusted input
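To check whether a given environment matches the version conditions above, a small version comparison is enough. The sketch below is an illustration, not an official tool. It assumes the inclusive range 2.21.0 through 2.48.3 and the PyYAML 5.4 cutoff exactly as stated in this advisory. The helper names (`parse_release`, `is_exposed`, `check_installed`) are hypothetical. A True result only means the installed versions match; actual exploitability still requires untrusted YAML to reach `load_from_yaml()`.

```python
# Minimal sketch: decide whether an environment matches the exposure
# conditions stated in this advisory (docling-core 2.21.0 through 2.48.3
# together with PyYAML older than 5.4). The parser below is deliberately
# simple: it keeps numeric release components and ignores suffixes such as
# "rc1" or "+local", so it is not a full PEP 440 implementation.
from __future__ import annotations

import re
from importlib import metadata

VULNERABLE_CORE_RANGE = ("2.21.0", "2.48.3")  # inclusive, per the advisory
FIXED_PYYAML = "5.4"


def parse_release(version: str, width: int = 3) -> tuple[int, ...]:
    """Turn '2.48.3' into (2, 48, 3), padding short versions with zeros."""
    parts: list[int] = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # e.g. '4rc1': keep 4, stop parsing
            break
    parts += [0] * (width - len(parts))
    return tuple(parts)


def is_exposed(docling_core_version: str, pyyaml_version: str) -> bool:
    """True if both installed versions fall inside the affected ranges."""
    low, high = (parse_release(v) for v in VULNERABLE_CORE_RANGE)
    core_in_range = low <= parse_release(docling_core_version) <= high
    return core_in_range and parse_release(pyyaml_version) < parse_release(FIXED_PYYAML)


def check_installed() -> bool | None:
    """Check this interpreter's packages; None if either is not installed."""
    try:
        core = metadata.version("docling-core")
        pyyaml = metadata.version("PyYAML")
    except metadata.PackageNotFoundError:
        return None
    return is_exposed(core, pyyaml)


if __name__ == "__main__":
    print(check_installed())
```

The sketch pads versions to three components so that "5.4" and "5.4.0" compare as equal. Plain tuple comparison would otherwise treat the shorter tuple as smaller.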
Discovery Timeline
- 2026-01-22 - CVE-2026-24009 published to NVD
- 2026-01-22 - Last updated in NVD database
Technical Details for CVE-2026-24009
Vulnerability Analysis
The vulnerability stems from the use of PyYAML's yaml.FullLoader class when deserializing YAML documents in the load_from_yaml() method. FullLoader is more restrictive than yaml.Loader, which is unsafe on untrusted input by design. However, in PyYAML versions prior to 5.4, FullLoader still permits the instantiation of arbitrary Python objects through YAML tags (CVE-2020-14343). Attackers can weaponize this behavior by crafting YAML payloads with Python object constructors that execute system commands or load malicious modules.
The attack requires the target application to process attacker-controlled YAML content, making document upload functionality, API endpoints accepting YAML, or file processing pipelines potential attack surfaces. The network-based attack vector combined with no authentication requirements significantly increases the risk exposure for public-facing applications.
Root Cause
The root cause is the selection of an unsafe YAML loader (yaml.FullLoader) that permits arbitrary object deserialization. PyYAML supports Python-specific YAML tags that instantiate Python objects, and some of those constructors can execute code during construction. The FullLoader class was intended to provide some safety by blocking certain dangerous operations. In PyYAML versions before 5.4, however, carefully crafted payloads can still reach code execution through the object types it permits. yaml.SafeLoader, by contrast, constructs only standard YAML types such as mappings, sequences, strings, and numbers.
The CWE-502 (Deserialization of Untrusted Data) classification accurately captures this vulnerability class, where untrusted input is deserialized without adequate restrictions on what objects can be instantiated.
Attack Vector
Attackers can exploit this vulnerability by submitting malicious YAML content to any application endpoint that processes YAML using the vulnerable DoclingDocument.load_from_yaml() method. The attack payload typically embeds Python object constructors using YAML's !!python/object tag syntax or related constructs that trigger code execution during the deserialization phase.
The network-accessible nature of this vulnerability means that any application exposing YAML processing functionality—whether through document upload forms, REST APIs, or file import features—may be susceptible to remote exploitation without authentication.
The following code shows the security patch that addresses this vulnerability by switching from yaml.FullLoader to yaml.SafeLoader:
```diff
         if isinstance(filename, str):
             filename = Path(filename)
         with open(filename, encoding="utf-8") as f:
-            data = yaml.load(f, Loader=yaml.FullLoader)
+            data = yaml.load(f, Loader=yaml.SafeLoader)
         return DoclingDocument.model_validate(data)
 
     def export_to_dict(
```
Source: GitHub Commit
Detection Methods for CVE-2026-24009
Indicators of Compromise
- YAML files containing !!python/object, !!python/object/apply, !!python/object/new, !!python/module, or similar Python-specific YAML tags
- Unexpected process spawning or command execution originating from Python processes handling document operations
- Network connections to external hosts initiated by Docling-based applications during YAML processing
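- Error logs showing failed deserialization of Python-specific tags, such as PyYAML ConstructorError messages that reference tag:yaml.org,2002:python/ tags (SafeLoader raises this error when it rejects such tags)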
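Stored or incoming YAML can be swept for the tag indicators listed above with a simple text scan. This is a heuristic sketch, not a parser. The function names are hypothetical. It also flags `%TAG` directives and verbatim `!<...>` tags, because either one can spell a Python tag without the literal `!!python/` prefix. Expect false positives, for example when the same text appears inside a quoted string. Treat every hit as a lead to investigate rather than as proof of an attack.

```python
# Heuristic sketch for finding Python-specific YAML tags in raw text.
# It pattern-matches text instead of parsing YAML, so it can both miss
# obfuscated payloads and flag harmless strings.
from __future__ import annotations

import re
from pathlib import Path

SUSPICIOUS_PATTERNS = [
    re.compile(r"!!python/\S*"),                  # shorthand: !!python/object/apply:...
    re.compile(r"tag:yaml\.org,2002:python/\S*"),  # the expanded tag URI
    re.compile(r"^%TAG\s.*$", re.MULTILINE),       # custom handles can hide the prefix
    re.compile(r"!<[^>\s]*>"),                     # verbatim tags, which may be %-escaped
]


def find_suspicious_tags(text: str) -> list[str]:
    """Return every substring that matches one of the suspicious patterns."""
    hits: list[str] = []
    for pattern in SUSPICIOUS_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits


def scan_directory(root: str | Path) -> dict[str, list[str]]:
    """Map each .yaml/.yml file under root to its suspicious matches."""
    findings: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.suffix.lower() in {".yaml", ".yml"} and path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = find_suspicious_tags(text)
            if hits:
                findings[str(path)] = hits
    return findings
```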
Detection Strategies
- Implement input validation rules to reject YAML content containing Python object constructor tags before processing
- Monitor application logs for deserialization errors or unexpected object instantiation attempts
- Deploy web application firewall (WAF) rules to detect and block YAML payloads with embedded Python object syntax
- Use static code analysis to identify usage of yaml.FullLoader, yaml.Loader, or yaml.UnsafeLoader (including the yaml.full_load() and yaml.unsafe_load() shortcuts) in codebases
Monitoring Recommendations
- Enable verbose logging for document processing components to capture YAML parsing events
- Implement runtime application self-protection (RASP) to detect and block object instantiation during deserialization
- Monitor system call patterns for anomalous behavior following YAML file processing operations
- Set up alerts for any process execution originating from document processing workflows
How to Mitigate CVE-2026-24009
Immediate Actions Required
- Upgrade docling-core to version 2.48.4 or later immediately
- If immediate upgrade is not possible, ensure PyYAML is updated to version 5.4 or greater as an interim mitigation
- Audit application code to identify all instances where DoclingDocument.load_from_yaml() processes external input
- Implement input validation to sanitize or reject YAML content from untrusted sources
Patch Information
The vulnerability has been patched in docling-core version 2.48.4. The fix replaces the use of yaml.FullLoader with yaml.SafeLoader in the YAML deserialization logic, ensuring that arbitrary Python objects cannot be instantiated during document loading. For detailed information, refer to the GitHub Security Advisory and the release notes for v2.48.4.
Workarounds
- Upgrade the PyYAML dependency to version 5.4 or later if docling-core cannot be immediately upgraded
- Restrict YAML document processing to trusted sources only until patches can be applied
- Implement network segmentation to limit the blast radius of potential compromise
- Deploy application-layer controls to reject YAML files containing Python object tags before they reach vulnerable code paths
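The "trusted sources only" workaround can be enforced in code by accepting YAML paths only when they resolve under an allow-listed directory. This is a minimal sketch. `TRUSTED_ROOTS`, `is_trusted_path`, `ensure_trusted_yaml`, and the example directory are hypothetical. The check and the later file open are separate steps, so it does not defend against a file being swapped in between. It is also useless if untrusted users can write into the trusted directory.

```python
# Sketch of a "trusted sources only" gate for YAML files.
from __future__ import annotations

from pathlib import Path
from typing import Iterable

TRUSTED_ROOTS = [Path("/srv/docling/trusted")]  # hypothetical example location


class UntrustedYAMLError(ValueError):
    """Raised when a YAML file does not come from a trusted location."""


def is_trusted_path(path: str | Path, roots: Iterable[str | Path] = TRUSTED_ROOTS) -> bool:
    # resolve() collapses '..' segments and follows symlinks, so a path such as
    # trusted/../uploads/evil.yaml is judged by where it really points.
    resolved = Path(path).resolve()
    return any(Path(root).resolve() in resolved.parents for root in roots)


def ensure_trusted_yaml(path: str | Path, roots: Iterable[str | Path] = TRUSTED_ROOTS) -> Path:
    """Return the resolved path, or raise before any YAML parsing happens."""
    if not is_trusted_path(path, roots):
        raise UntrustedYAMLError(f"refusing to load YAML from untrusted path: {path}")
    return Path(path).resolve()
```

A caller would pass the checked path on, for example `DoclingDocument.load_from_yaml(ensure_trusted_yaml(user_path))`. Because the patched method accepts either a string or a Path, the resolved Path can be used directly.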
```shell
# Upgrade docling-core to the patched version (quote the specifier so the
# shell does not treat ">" as output redirection)
pip install --upgrade "docling-core>=2.48.4"

# Alternative: upgrade PyYAML as an interim mitigation
pip install --upgrade "pyyaml>=5.4"

# Verify installed versions
pip show docling-core pyyaml | grep -E "^(Name|Version):"
```
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


