CVE-2026-41481 Overview
CVE-2026-41481 is a Server-Side Request Forgery (SSRF) vulnerability in LangChain's langchain-text-splitters package. The vulnerability exists in the HTMLHeaderTextSplitter.split_text_from_url() function, which validates the initial URL using validate_safe_url() but then performs the HTTP fetch with requests.get() with redirects enabled by default. Because redirect targets are not revalidated, an attacker-controlled server could redirect requests to internal, localhost, or cloud metadata endpoints, effectively bypassing SSRF protections.
Critical Impact
Applications using LangChain's HTMLHeaderTextSplitter that expose Document contents back to requesters could leak sensitive data from internal endpoints, including cloud metadata services.
Affected Products
- langchain-text-splitters versions prior to 1.1.2
Discovery Timeline
- 2026-04-24 - CVE-2026-41481 published to NVD
- 2026-04-28 - Last updated in NVD database
Technical Details for CVE-2026-41481
Vulnerability Analysis
This SSRF vulnerability (CWE-918) stems from an incomplete URL validation implementation in LangChain's text splitting functionality. The HTMLHeaderTextSplitter.split_text_from_url() method is designed to fetch and parse HTML content from user-provided URLs. While the implementation includes URL validation via validate_safe_url() for the initial request, it fails to account for HTTP redirects.
When requests.get() is called without explicitly disabling redirects (allow_redirects=False), the library will automatically follow any 3xx redirect responses. An attacker can exploit this by providing a URL to a server they control, which then issues a redirect to an internal resource such as http://169.254.169.254/latest/meta-data/ (AWS metadata service), http://localhost/admin, or other internal endpoints.
The response body from these internal requests is parsed and returned as Document objects to the calling application. The severity of data exposure depends on how the application handles these Document objects—applications that return content to users could inadvertently leak sensitive internal data.
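The check-once, follow-freely pattern described above can be reproduced on loopback with only the standard library. This is an illustrative sketch, not LangChain's code: urllib.request stands in for requests (both follow redirects by default), and the handler class, the paths, and demo_unchecked_redirect() are hypothetical names chosen for the example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: /redirect bounces to a path the caller never checked."""

    def do_GET(self):
        if self.path == "/redirect":
            self.send_response(302)
            self.send_header("Location", "/internal-only")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"<h1>internal data</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo_unchecked_redirect():
    """Return (checked_url, fetched_url, body) for one fetch that follows a redirect."""
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Ignore any proxy settings so the request stays on loopback.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        checked_url = f"http://127.0.0.1:{server.server_port}/redirect"
        # A validator would only ever see checked_url ...
        with opener.open(checked_url, timeout=5) as response:
            # ... but the body comes from wherever the redirect chain ended.
            return checked_url, response.geturl(), response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo_unchecked_redirect() returns a fetched URL that differs from the checked one. In the real attack, the redirect target would be an internal or metadata address rather than another path on the same host.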
Root Cause
The root cause is the lack of redirect target validation in the URL fetching logic. The security control (validate_safe_url()) is applied only to the user-supplied URL, not to any subsequent redirect destinations. This creates a Time-of-Check Time-of-Use (TOCTOU) gap where the validated URL differs from the actually fetched resource.
Attack Vector
The attack vector is network-based and requires user interaction: the attacker must get a victim application to process a malicious URL. The attack flow is:
- Attacker provides a URL pointing to their controlled server (e.g., https://attacker.com/redirect)
- The application validates this URL—it passes validation as it points to an external host
- The application fetches the URL using requests.get() with default settings
- The attacker's server responds with a 302 redirect to an internal endpoint (e.g., http://169.254.169.254/latest/meta-data/)
- The HTTP library automatically follows the redirect without revalidation
- Internal/cloud metadata content is fetched and returned as Document objects
- If the application exposes Document contents, sensitive data is leaked
The vulnerability relies on the Python requests library's default behavior: when requests.get(url) is called, a 3xx redirect response with a Location header is followed automatically. The fix in version 1.1.2 addresses this by either disabling redirects or validating each redirect target before following it. For implementation details, see GitHub Security Advisory GHSA-fv5p-p927-qmxr.
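Validating each redirect target can be sketched as follows. This is a minimal illustration under stated assumptions, not the code shipped in 1.1.2: is_public_host(), check_url(), fetch_validated(), and the send callable are hypothetical names. With requests, send could wrap requests.get(url, allow_redirects=False, timeout=10) and return the status code, the Location header, and the body.

```python
import ipaddress
import socket
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlparse

# Hypothetical transport signature: url -> (status_code, location_header, body).
Send = Callable[[str], Tuple[int, Optional[str], str]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def is_public_host(hostname: str) -> bool:
    """True only if every address the hostname resolves to is globally routable."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except (socket.gaierror, UnicodeError):
        return False
    addresses = {info[4][0].split("%")[0] for info in infos}  # drop IPv6 zone ids
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_global for addr in addresses
    )


def check_url(url: str) -> None:
    """Raise ValueError unless the URL is http(s) and points at a public host."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    if not is_public_host(parts.hostname):
        raise ValueError(f"blocked non-public host: {parts.hostname!r}")


def fetch_validated(url: str, send: Send, max_hops: int = 5) -> str:
    """Fetch url, re-running check_url on every redirect target before following it."""
    for _ in range(max_hops + 1):
        check_url(url)
        status, location, body = send(url)
        if status not in REDIRECT_CODES or not location:
            return body
        url = urljoin(url, location)  # Location may be relative
    raise ValueError("too many redirects")
```

The key design choice is that the HTTP client never follows a redirect on its own, so no hop escapes the check. The sketch still resolves the hostname separately from the connection that follows, so it does not address DNS rebinding.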
Detection Methods for CVE-2026-41481
Indicators of Compromise
- Outbound HTTP requests from application servers to cloud metadata endpoints (e.g., 169.254.169.254, metadata.google.internal)
- Unusual redirect chains in application logs where external URLs redirect to internal addresses
- Unexpected access to localhost or internal network addresses from web-facing applications
Detection Strategies
- Monitor network traffic for requests to cloud metadata service IP addresses originating from application servers
- Implement application-level logging for all URLs processed by HTMLHeaderTextSplitter.split_text_from_url()
- Review application logs for redirect chains that terminate at internal endpoints
- Deploy Web Application Firewall (WAF) rules to detect SSRF attack patterns
Monitoring Recommendations
- Enable detailed logging for HTTP client libraries to capture redirect chains
- Set up alerts for network connections to RFC 1918 addresses or link-local addresses from application containers
- Monitor for attempts to access cloud provider metadata URLs from application workloads
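As a rough illustration of the log review described above, the sketch below flags redirect events in which an external URL led to an internal one. The (source, target) event format, the METADATA_HOSTS set, and both function names are assumptions made for this example; real logs will need their own parsing.

```python
import ipaddress
from urllib.parse import urlparse

# Hostnames used by cloud metadata services; extend for your platform.
METADATA_HOSTS = {"metadata.google.internal"}


def is_internal_target(url: str) -> bool:
    """True for localhost, known metadata hostnames, and non-public IP literals."""
    host = (urlparse(url).hostname or "").lower()
    if host == "localhost" or host in METADATA_HOSTS:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a name that cannot be classified without DNS resolution
    return addr.is_private or addr.is_loopback or addr.is_link_local


def suspicious_redirects(events):
    """Yield (source, target) pairs where an external URL redirected to an internal one."""
    for source, target in events:
        if not is_internal_target(source) and is_internal_target(target):
            yield source, target
```

Because hostnames are not resolved here, a redirect to a DNS name that points at an internal address would not be flagged; network-level monitoring remains necessary.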
How to Mitigate CVE-2026-41481
Immediate Actions Required
- Upgrade langchain-text-splitters to version 1.1.2 or later immediately
- Audit applications using HTMLHeaderTextSplitter.split_text_from_url() for potential exposure of Document contents to untrusted users
- Implement network-level controls to block outbound requests to metadata service endpoints
- Review application logic to ensure Document objects are not directly exposed to requesters
Patch Information
The vulnerability is fixed in langchain-text-splitters version 1.1.2. Organizations should upgrade using their Python package manager. For detailed information about the fix, refer to the LangChain Security Advisory GHSA-fv5p-p927-qmxr.
Workarounds
- Avoid automatic redirect following: wrap URL fetching in custom code that either disables redirects or validates each redirect target before following it
- Implement network segmentation to prevent application servers from accessing internal endpoints
- Use a proxy or egress filter that blocks requests to internal IP ranges and metadata endpoints
- Restrict URL inputs to a known allowlist of trusted domains where possible
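The allowlist workaround could look like the following minimal sketch. ALLOWED_DOMAINS and is_allowed_url() are hypothetical names, and the listed domains are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the domains your application trusts.
ALLOWED_DOMAINS = frozenset({"docs.example.com", "example.org"})


def is_allowed_url(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Allow only https URLs whose host is an allowlisted domain or a subdomain of one."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower().rstrip(".")
    if parts.scheme != "https" or not host:
        return False
    return any(host == domain or host.endswith("." + domain) for domain in allowed)
```

An allowlist only helps if it is also applied to every redirect target, or if redirects are disabled; otherwise an allowlisted host with an open redirect reintroduces the same problem.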
# Upgrade langchain-text-splitters to patched version
pip install --upgrade "langchain-text-splitters>=1.1.2"
# Verify installed version
pip show langchain-text-splitters | grep Version


