CVE-2025-6985 Overview
The HTMLSectionSplitter class in langchain-text-splitters version 0.3.8 contains a critical XML External Entity (XXE) vulnerability resulting from unsafe XSLT parsing. This security flaw allows remote attackers to read arbitrary local files or perform outbound HTTP(S) requests without requiring authentication, special privileges, or user interaction.
The vulnerability arises because the HTMLSectionSplitter class permits the use of arbitrary XSLT stylesheets, which are parsed using lxml.etree.parse() and lxml.etree.XSLT() without any security hardening measures. In lxml versions up to 4.9.x, external entities are resolved by default, enabling attackers to exfiltrate sensitive data. In lxml versions 5.0 and above, entity expansion is disabled by default, but the XSLT document() function can still read any URI unless XSLTAccessControl restrictions are applied.
Critical Impact
Remote attackers can read any file the LangChain process can reach, including SSH keys, environment files, and source code, and can query cloud metadata endpoints on the process's behalf, with no authentication required.
Affected Products
- langchain-text-splitters version 0.3.8
- Applications using HTMLSectionSplitter with custom XSLT enabled
- Deployments using lxml versions up to 4.9.x (full XXE) or 5.0+ (document() function abuse)
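To check which of the two exposure modes listed above applies to a given environment, the installed package versions can be inspected. The following is a minimal sketch; the helper names `installed_version` and `lxml_exposure` and the returned labels are illustrative and not part of any library, and the version boundary is taken directly from this advisory.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def lxml_exposure(version: str) -> str:
    """Map an lxml version string to the exposure mode named in this advisory."""
    major = int(version.split(".")[0])
    # Per the advisory: up to 4.9.x resolves external entities by default;
    # 5.0+ does not, but document() stays open without XSLTAccessControl.
    return "xxe-and-document" if major < 5 else "document-only"


if __name__ == "__main__":
    for name in ("langchain-text-splitters", "lxml"):
        print(name, installed_version(name) or "not installed")
```

Either label still indicates exposure; the distinction only tells you which of the two techniques described here is available to an attacker.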
Discovery Timeline
- 2025-10-06 - CVE-2025-6985 published to NVD
- 2025-10-08 - Last updated in NVD database
Technical Details for CVE-2025-6985
Vulnerability Analysis
This XXE vulnerability in the HTMLSectionSplitter class stems from the unsafe handling of XSLT stylesheets during HTML document processing. The class is designed to split HTML documents into sections, but its implementation accepts arbitrary XSLT input without proper validation or security controls.
When processing XSLT stylesheets, the code utilizes lxml.etree.parse() for document parsing and lxml.etree.XSLT() for stylesheet transformation. Neither function is configured with security hardening, leaving the parser vulnerable to external entity injection attacks. This allows an attacker to craft malicious XSLT content that references external resources, leading to information disclosure.
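The attack is exploitable in any deployment where an untrusted party can supply or influence the custom XSLT stylesheet passed to HTMLSectionSplitter, with no further misconfiguration required. This represents a significant risk for organizations using LangChain for document processing workflows.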
Root Cause
The root cause is the absence of security hardening when parsing XSLT stylesheets in the HTMLSectionSplitter class. The lxml library's default configuration resolves external entities in versions up to 4.9.x, and the document() XSLT function remains unrestricted even in newer versions unless XSLTAccessControl is explicitly configured.
The vulnerable code path processes user-supplied or externally-sourced XSLT without:
- Disabling external entity resolution
- Applying XSLTAccessControl restrictions
- Validating or sanitizing XSLT input
- Limiting accessible URIs or file paths
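The missing controls listed above correspond to options that lxml already exposes. The following is a minimal sketch of a hardened stylesheet loader, assuming the stylesheet arrives as bytes; the function name `build_hardened_transform` is hypothetical, and this is not presented as the fix adopted by the langchain-text-splitters maintainers.

```python
from lxml import etree


def build_hardened_transform(xslt_bytes: bytes) -> etree.XSLT:
    """Compile a stylesheet with entity resolution and XSLT I/O disabled."""
    # Do not expand entities, load DTDs, or fetch anything over the network
    # while parsing the stylesheet itself.
    parser = etree.XMLParser(
        resolve_entities=False,
        no_network=True,
        load_dtd=False,
    )
    stylesheet = etree.fromstring(xslt_bytes, parser)

    # Deny file and network access during the transformation, which is
    # what restricts the XSLT document() function.
    access = etree.XSLTAccessControl(
        read_file=False,
        write_file=False,
        create_dir=False,
        read_network=False,
        write_network=False,
    )
    return etree.XSLT(stylesheet, access_control=access)
```

A transform built this way is called like any other `etree.XSLT` object, for example `build_hardened_transform(data)(html_tree)`. How lxml reports a denied `document()` call (an exception or an entry in the error log) should be verified against the lxml version in use.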
Attack Vector
The attack vector is network-based with low complexity. An attacker can exploit this vulnerability by supplying a malicious XSLT stylesheet to the HTMLSectionSplitter class. The stylesheet can contain external entity declarations or document() function calls that reference:
- Local file paths (e.g., /etc/passwd, ~/.ssh/id_rsa, .env files)
- Cloud metadata endpoints (e.g., http://169.254.169.254/latest/meta-data/)
- Internal network resources via HTTP(S) requests
The vulnerability mechanism involves crafting an XSLT stylesheet that either defines external entities pointing to sensitive files or uses the document() function to fetch arbitrary URIs. When the HTMLSectionSplitter processes this stylesheet, lxml resolves these references and includes the fetched content in the transformation output. If that output is returned to, or otherwise reachable by, the attacker, the data is exfiltrated.
For technical details on the exploitation mechanism, see the Huntr Bounty Report.
Detection Methods for CVE-2025-6985
Indicators of Compromise
- Unexpected file access attempts by the LangChain process to sensitive files such as /etc/passwd, SSH keys, or environment files
- Outbound HTTP(S) connections to cloud metadata endpoints (e.g., 169.254.169.254) from the application server
- XSLT processing logs showing external entity declarations or document() function calls with unusual URIs
- Application logs indicating access to files outside normal operational scope
Detection Strategies
- Monitor file access patterns for the LangChain process and alert on access to sensitive system files
- Implement network monitoring to detect connections to cloud metadata services from application containers
- Deploy application-level logging to capture XSLT content being processed by HTMLSectionSplitter
- Use SIEM rules to correlate file access anomalies with XSLT processing events
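The application-level logging suggested above could take the form of one structured audit record per stylesheet. The sketch below is illustrative only: the logger name `xslt_audit`, the function `audit_record`, and the record fields are assumptions, and the regular expression only catches `document()` calls whose argument is a plain string literal.

```python
import hashlib
import json
import logging
import re

logger = logging.getLogger("xslt_audit")

# Matches string-literal arguments such as document('file:///etc/passwd').
_DOCUMENT_LITERAL = re.compile(r"""document\s*\(\s*(['"])(.*?)\1""")


def audit_record(xslt_text: str, source: str) -> dict:
    """Build and log a summary of a stylesheet before it is processed."""
    record = {
        "source": source,
        # A content hash lets a SIEM correlate repeated use of one stylesheet.
        "sha256": hashlib.sha256(xslt_text.encode("utf-8")).hexdigest(),
        "has_doctype": "<!DOCTYPE" in xslt_text.upper(),
        "document_uris": [
            match.group(2) for match in _DOCUMENT_LITERAL.finditer(xslt_text)
        ],
    }
    logger.info("xslt_audit %s", json.dumps(record, sort_keys=True))
    return record
```

Emitting the record as a single JSON line keeps it easy to parse in a log pipeline and to join against file-access or egress events by timestamp.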
Monitoring Recommendations
- Enable detailed audit logging for file system access on servers running LangChain applications
- Configure network egress monitoring to identify and alert on suspicious outbound requests
- Implement runtime application security monitoring to detect XXE attack patterns
- Review application logs for error messages related to external entity resolution or document loading failures
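For the egress monitoring recommended above, outbound destinations can be bucketed so that metadata-service and internal addresses stand out. This is a minimal sketch using Python's standard `ipaddress` module; the function name `classify_destination` and the labels it returns are illustrative.

```python
import ipaddress


def classify_destination(host: str) -> str:
    """Label an outbound destination for egress alerting."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; resolve the hostname before classifying it.
        return "hostname"
    # Check the narrower ranges first, since loopback and link-local
    # addresses are also reported as private by the ipaddress module.
    if addr.is_loopback:
        return "loopback"
    if addr.is_link_local:
        return "link-local"  # covers 169.254.169.254
    if addr.is_private:
        return "private"
    return "public"
```

In this scheme, a `link-local` destination from a document-processing service would usually deserve an alert, since such a service rarely has a legitimate reason to contact the metadata endpoint.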
How to Mitigate CVE-2025-6985
Immediate Actions Required
- Disable custom XSLT functionality in HTMLSectionSplitter if not required for business operations
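- Upgrade lxml to version 5.0 or higher to disable default external entity expansion; this alone does not block document() function abuse, so combine it with XSLTAccessControl restrictions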
- Implement input validation to reject XSLT stylesheets containing external entity declarations or document() function calls
- Apply network segmentation to restrict outbound connections from LangChain application servers
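The input validation step above can be approximated with a textual pre-screen. The sketch below is deliberately coarse and should be treated as defense in depth rather than a complete control, since pattern matching on raw text can be evaded (for example through alternative encodings or dynamically evaluated expressions). The names `screen_stylesheet` and `_CHECKS` are hypothetical.

```python
import re
from typing import List

_CHECKS = {
    # Any DTD or entity declaration is rejected outright.
    "dtd-or-entity-declaration": re.compile(r"<!\s*(DOCTYPE|ENTITY)", re.IGNORECASE),
    # Any textual occurrence of a document( call, even inside a comment.
    "document-function-call": re.compile(r"\bdocument\s*\("),
}


def screen_stylesheet(xslt_text: str) -> List[str]:
    """Return the names of the checks the stylesheet fails (empty if none)."""
    return [name for name, pattern in _CHECKS.items() if pattern.search(xslt_text)]
```

A caller would refuse to process any stylesheet for which the returned list is non-empty, and ideally log the failing check names alongside the rejection.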
Patch Information
Monitor the langchain-text-splitters project for security updates addressing this vulnerability. The Huntr Bounty Report provides additional details on the vulnerability disclosure and remediation status.
When a patch becomes available, update langchain-text-splitters to the patched version immediately. Review your deployment configuration to ensure custom XSLT processing is only enabled where necessary.
Workarounds
- Configure XSLTAccessControl in lxml to restrict access to local files and network resources when XSLT processing is required
- Implement a wrapper around HTMLSectionSplitter that sanitizes XSLT input before processing
- Use application-level firewalls to block outbound requests to cloud metadata endpoints
- Run the LangChain application with minimal file system permissions using a dedicated service account
# Configuration example: Restrict file access for LangChain service
# Create dedicated user with minimal permissions
useradd -r -s /bin/false langchain-service
# Set restrictive file permissions
chmod 700 /opt/langchain/app
chown -R langchain-service:langchain-service /opt/langchain/app
# Run application with restricted user
sudo -u langchain-service python app.py
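One way to approach the wrapper workaround is to stop accepting arbitrary stylesheet paths and resolve caller-supplied names only against a directory of reviewed stylesheets. The sketch below assumes such a directory exists; `resolve_trusted_stylesheet` is a hypothetical helper, and passing its result to HTMLSectionSplitter through an `xslt_path` argument is an assumption to verify against the installed version.

```python
from pathlib import Path


def resolve_trusted_stylesheet(name: str, trusted_dir: str) -> Path:
    """Resolve `name` inside `trusted_dir`, rejecting anything outside it."""
    base = Path(trusted_dir).resolve()
    candidate = (base / name).resolve()
    try:
        # Raises ValueError when candidate is not located under base,
        # which covers "../" traversal and absolute paths.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"stylesheet outside trusted directory: {name!r}") from None
    if not candidate.is_file():
        raise ValueError(f"unknown stylesheet: {name!r}")
    return candidate
```

This restricts which stylesheet is loaded, not what a stylesheet may do, so it complements rather than replaces the XSLTAccessControl restrictions described in the workarounds.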