CVE-2026-41481: LangChain Text-Splitters SSRF Vulnerability

CVE-2026-41481 Overview

CVE-2026-41481 is a Server-Side Request Forgery (SSRF) vulnerability in LangChain's langchain-text-splitters package. The vulnerability exists in the HTMLHeaderTextSplitter.split_text_from_url() function, which validates the initial URL using validate_safe_url() but then performs the HTTP fetch with requests.get() with redirects enabled by default. Because redirect targets are not revalidated, an attacker-controlled server could redirect requests to internal, localhost, or cloud metadata endpoints, effectively bypassing SSRF protections.

Critical Impact
Applications using LangChain's HTMLHeaderTextSplitter that expose Document contents back to requesters could leak sensitive data from internal endpoints, including cloud metadata services.

Affected Products

langchain-text-splitters versions prior to 1.1.2

Discovery Timeline

2026-04-24 - CVE-2026-41481 published to NVD
2026-04-28 - Last updated in NVD database

Technical Details for CVE-2026-41481

Vulnerability Analysis

This SSRF vulnerability (CWE-918) stems from an incomplete URL validation implementation in LangChain's text splitting functionality. The HTMLHeaderTextSplitter.split_text_from_url() method is designed to fetch and parse HTML content from user-provided URLs. While the implementation includes URL validation via validate_safe_url() for the initial request, it fails to account for HTTP redirects.

Unless allow_redirects=False is passed, requests.get() automatically follows any 3xx redirect response. An attacker can exploit this by supplying a URL on a server they control, which then redirects to an internal resource such as http://169.254.169.254/latest/meta-data/ (AWS metadata service), http://localhost/admin, or another internal endpoint.

The response body from these internal requests is parsed and returned to the calling application as Document objects. The severity of the exposure depends on how the application handles those objects: an application that returns their content to users could inadvertently leak sensitive internal data.

Root Cause

The root cause is the lack of redirect target validation in the URL fetching logic. The security control (validate_safe_url()) is applied only to the user-supplied URL, not to any subsequent redirect destination. This creates a Time-of-Check Time-of-Use (TOCTOU) gap: the URL that is checked is not necessarily the URL whose content is ultimately fetched.
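
To make the gap concrete, the sketch below shows the kind of destination check that would have to run against every URL in a redirect chain, not only the first. It uses only Python's standard library. The helper names is_internal_address and url_targets_internal_host are hypothetical and are not the library's validate_safe_url() implementation.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal_address(ip_text: str) -> bool:
    """Return True for addresses an SSRF filter would normally refuse."""
    ip = ipaddress.ip_address(ip_text)
    return (
        ip.is_private          # RFC 1918 and similar non-public ranges
        or ip.is_loopback      # 127.0.0.0/8, ::1
        or ip.is_link_local    # 169.254.0.0/16 (cloud metadata), fe80::/10
        or ip.is_reserved
        or ip.is_unspecified
    )


def url_targets_internal_host(url: str) -> bool:
    """Hypothetical helper: True if the host is, or resolves to, an internal address."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # treat unparseable URLs as unsafe
    try:
        # Literal IP address: no DNS lookup needed.
        return is_internal_address(host)
    except ValueError:
        pass  # not a literal IP, fall through to DNS resolution
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # fail closed when the name cannot be resolved
    # Strip any IPv6 scope id ("fe80::1%eth0") before parsing.
    return any(is_internal_address(info[4][0].split("%")[0]) for info in infos)
```

A check like this is only effective if it is repeated for each redirect hop. Because DNS answers can change between the check and the actual connection, it is best combined with network-level egress controls.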

Attack Vector

The attack vector is network-based. It requires user interaction in the sense that a victim application must be induced to process an attacker-supplied URL. The attack flow is as follows:

1. The attacker provides a URL pointing to a server they control (e.g., https://attacker.com/redirect)
2. The application validates this URL, which passes because it points to an external host
3. The application fetches the URL using requests.get() with default settings
4. The attacker's server responds with a 302 redirect to an internal endpoint (e.g., http://169.254.169.254/latest/meta-data/)
5. The HTTP library automatically follows the redirect without revalidation
6. Internal or cloud metadata content is fetched and returned as Document objects
7. If the application exposes Document contents, sensitive data is leaked

The vulnerability mechanism relies on the Python requests library's default behavior of following redirects. When requests.get(url) is called, any 3xx response triggers automatic redirect following. The fix in version 1.1.2 addresses this by either disabling redirects or validating each redirect target before following. For detailed technical implementation, see the GitHub Security Advisory.
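
The second option, validating each redirect target before following it, can be sketched as below. This is an illustration under stated assumptions, not the code shipped in 1.1.2. The names fetch_with_validated_redirects, fetch_once, and is_safe_url are hypothetical, and the demonstration uses an in-memory stand-in for the network so that it runs offline.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# (status code, Location header or None, body)
Response = Tuple[int, Optional[str], str]
REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_with_validated_redirects(
    url: str,
    fetch_once: Callable[[str], Response],
    is_safe_url: Callable[[str], bool],
    max_redirects: int = 5,
) -> str:
    """Follow redirects manually, validating every hop before requesting it."""
    for _ in range(max_redirects + 1):
        if not is_safe_url(url):
            raise ValueError(f"blocked unsafe URL: {url}")
        status, location, body = fetch_once(url)
        if status not in REDIRECT_CODES or not location:
            return body
        # Location may be relative; resolve it against the current URL.
        url = urljoin(url, location)
    raise ValueError("too many redirects")


def requests_fetch_once(url: str) -> Response:
    """Adapter for the requests library with automatic redirects disabled."""
    import requests  # imported lazily so the offline demo runs without it

    resp = requests.get(url, allow_redirects=False, timeout=10)
    return resp.status_code, resp.headers.get("Location"), resp.text


# Offline demonstration: a fake network and a deliberately simple validator.
METADATA_URL = "http://169.254.169.254/latest/meta-data/"
FAKE_NETWORK = {
    "https://attacker.example/redirect": (302, METADATA_URL, ""),
    "https://docs.example/old": (301, "/new", ""),
    "https://docs.example/new": (200, None, "<h1>Title</h1>"),
    METADATA_URL: (200, None, "secret"),
}
BLOCKED_HOSTS = {"169.254.169.254", "localhost", "127.0.0.1"}


def demo_is_safe(url: str) -> bool:
    """Assumed stand-in for a real SSRF validator."""
    return urlsplit(url).hostname not in BLOCKED_HOSTS


def demo_fetch(url: str) -> Response:
    return FAKE_NETWORK[url]
```

With these stand-ins, fetching https://docs.example/old follows the benign redirect and returns the page body, while https://attacker.example/redirect raises ValueError before the metadata address is ever requested. In real use, requests_fetch_once would be passed in place of demo_fetch, and a fuller validator in place of demo_is_safe.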

Detection Methods for CVE-2026-41481

Indicators of Compromise

Outbound HTTP requests from application servers to cloud metadata endpoints (e.g., 169.254.169.254, metadata.google.internal)
Unusual redirect chains in application logs where external URLs redirect to internal addresses
Unexpected access to localhost or internal network addresses from web-facing applications

Detection Strategies

Monitor network traffic for requests to cloud metadata service IP addresses originating from application servers
Implement application-level logging for all URLs processed by HTMLHeaderTextSplitter.split_text_from_url()
Review application logs for redirect chains that terminate at internal endpoints
Deploy Web Application Firewall (WAF) rules to detect SSRF attack patterns
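
The log review strategy can be partly automated. The sketch below assumes the application records each fetch as an ordered list of URLs: the requested URL followed by every redirect hop. That record format and the function names are assumptions made for illustration, and the metadata hostname list is not exhaustive.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit

# Hostnames used by cloud metadata services (not exhaustive).
METADATA_HOSTS = {"metadata.google.internal"}


def host_is_internal(url: str) -> bool:
    """True for known metadata names, localhost, or non-public literal IPs."""
    host = (urlsplit(url).hostname or "").lower()
    if host in METADATA_HOSTS or host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # other hostnames are not resolved in this offline check
    return ip.is_private or ip.is_loopback or ip.is_link_local


def is_suspicious_chain(chain: List[str]) -> bool:
    """Flag chains that start at an external host and later reach an internal one."""
    if len(chain) < 2:
        return False
    return not host_is_internal(chain[0]) and any(
        host_is_internal(hop) for hop in chain[1:]
    )
```

For example, a chain of https://attacker.example/r followed by http://169.254.169.254/latest/meta-data/ is flagged, while a redirect between two pages on the same public host is not.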

Monitoring Recommendations

Enable detailed logging for HTTP client libraries to capture redirect chains
Set up alerts for network connections to RFC 1918 addresses or link-local addresses from application containers
Monitor for attempts to access cloud provider metadata URLs from application workloads

How to Mitigate CVE-2026-41481

Immediate Actions Required

Upgrade langchain-text-splitters to version 1.1.2 or later immediately
Audit applications using HTMLHeaderTextSplitter.split_text_from_url() for potential exposure of Document contents to untrusted users
Implement network-level controls to block outbound requests to metadata service endpoints
Review application logic to ensure Document objects are not directly exposed to requesters

Patch Information

The vulnerability is fixed in langchain-text-splitters version 1.1.2. Organizations should upgrade using their Python package manager. For detailed information about the fix, refer to the LangChain Security Advisory GHSA-fv5p-p927-qmxr.

Workarounds

Disable automatic redirect following and wrap URL fetching in custom code that validates each redirect target before requesting it
Implement network segmentation to prevent application servers from accessing internal endpoints
Use a proxy or egress filter that blocks requests to internal IP ranges and metadata endpoints
Restrict URL inputs to a known allowlist of trusted domains where possible
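
The allowlist workaround can be sketched as a check applied before any URL reaches the splitter. The host names and the is_allowlisted function below are hypothetical placeholders, to be replaced with the domains the application actually trusts.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the domains the application trusts.
ALLOWED_HOSTS = {"docs.example.com", "blog.example.com"}


def is_allowlisted(url: str) -> bool:
    """Accept only HTTPS URLs whose exact host is in the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in ALLOWED_HOSTS
```

Matching the exact host name, rather than a prefix or substring, rejects lookalikes such as docs.example.com.attacker.example. On unpatched versions this check constrains only the initial URL, so an allowlisted host that issues redirects could still forward the request elsewhere.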

bash

# Upgrade langchain-text-splitters to patched version
# Quote the requirement so the shell does not treat ">" as output redirection
pip install --upgrade "langchain-text-splitters>=1.1.2"

# Verify installed version
pip show langchain-text-splitters | grep Version
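
The same verification can be done from Python, for instance in a startup check or CI job. This is a minimal sketch: the version parsing is deliberately simplified (it reads only the leading numeric X.Y.Z part, so a pre-release such as 1.1.2rc1 is treated as 1.1.2), and parse_release and is_patched are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version

PATCHED = (1, 1, 2)  # first fixed release according to the advisory


def parse_release(version_text: str) -> tuple:
    """Parse the leading numeric 'X.Y.Z' part of a version string (simplified)."""
    parts = []
    for piece in version_text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_text: str) -> bool:
    return parse_release(version_text) >= PATCHED


if __name__ == "__main__":
    try:
        installed = version("langchain-text-splitters")
    except PackageNotFoundError:
        print("langchain-text-splitters is not installed")
    else:
        status = "patched" if is_patched(installed) else "VULNERABLE"
        print(f"langchain-text-splitters {installed}: {status}")
```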