CVE-2026-26019 Overview
CVE-2026-26019 is a Server-Side Request Forgery (SSRF) vulnerability in LangChain's @langchain/community package. The RecursiveUrlLoader class, a web crawler designed to recursively follow links from a starting URL, contains a flawed URL validation mechanism that can be exploited to access internal infrastructure and sensitive cloud metadata services.
The vulnerability stems from two distinct weaknesses: first, the preventOutside option relies on String.startsWith() for URL comparison, which fails to perform proper semantic URL validation. Second, the crawler performs no validation against private or reserved IP addresses, allowing requests to cloud metadata services, localhost, and RFC 1918 addresses.
Critical Impact
Attackers controlling content on a crawled page can redirect the crawler to access internal infrastructure, cloud metadata endpoints (such as AWS IMDSv1), or other sensitive network resources, potentially leading to credential theft or further network compromise.
Affected Products
- @langchain/community versions prior to 1.1.14
- LangChain JavaScript/TypeScript applications using RecursiveUrlLoader
Discovery Timeline
- February 11, 2026 - CVE-2026-26019 published to NVD
- February 12, 2026 - Last updated in NVD database
Technical Details for CVE-2026-26019
Vulnerability Analysis
The RecursiveUrlLoader class provides a convenient mechanism for LLM-powered applications to ingest web content recursively. The preventOutside option, enabled by default, is intended to prevent the crawler from leaving the original domain. However, the implementation used JavaScript's String.startsWith() method to compare URLs, which only performs a naive string prefix check rather than proper URL parsing and domain validation.
This means an attacker who controls content on a page being crawled can craft malicious links that share a string prefix with the legitimate target URL but actually point to attacker-controlled infrastructure. For example, if crawling https://example.com, a link to https://example.com.evil.com would pass the startsWith() check despite pointing to a completely different domain.
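The flawed pattern can be illustrated in a few lines of TypeScript (a simplified sketch of the vulnerable logic, not the library's actual source):

```typescript
// Simplified sketch of the vulnerable pattern: a naive string-prefix
// comparison stands in for a real origin check.
const base = "https://example.com";

function naiveScopeCheck(link: string): boolean {
  // Flawed: compares raw strings, not parsed URL origins
  return link.startsWith(base);
}

naiveScopeCheck("https://example.com/docs");       // true (intended)
naiveScopeCheck("https://example.com.evil.com/x"); // true (bypass!)
```

Because the check only requires a shared string prefix, any attacker-registered host beginning with the target's hostname slips through.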
The second vulnerability compounds this issue: the crawler had no allowlist/blocklist mechanism for IP addresses. Crawled pages could include links to http://169.254.169.254 (AWS metadata service), http://localhost, or any RFC 1918 private IP ranges, and the crawler would fetch them without restriction.
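A minimal pre-fetch filter for literal private/reserved IPv4 hosts might look like the following (an illustrative sketch, not the patched library code; note it does not resolve hostnames, so DNS-based bypasses, where an attacker's domain resolves to a private IP, would still need separate handling):

```typescript
// Reject URLs whose hostname is a literal private/reserved IPv4 address.
const RESERVED_V4: RegExp[] = [
  /^127\./,                      // loopback
  /^10\./,                       // RFC 1918 10.0.0.0/8
  /^172\.(1[6-9]|2\d|3[01])\./,  // RFC 1918 172.16.0.0/12
  /^192\.168\./,                 // RFC 1918 192.168.0.0/16
  /^169\.254\./,                 // link-local, incl. 169.254.169.254
];

function isReservedIpHost(urlString: string): boolean {
  const host = new URL(urlString).hostname;
  return host === "localhost" || RESERVED_V4.some((re) => re.test(host));
}

isReservedIpHost("http://169.254.169.254/latest/meta-data/"); // true
isReservedIpHost("https://example.com/");                     // false
```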
Root Cause
The root cause is improper URL validation leading to Server-Side Request Forgery (CWE-918) in the URL comparison logic. Using String.startsWith() for security-sensitive URL origin checking is fundamentally flawed because:
- It treats URLs as simple strings rather than structured data with distinct origin components
- It fails to normalize URLs before comparison (e.g., trailing slashes, port numbers)
- It cannot distinguish between domain boundaries (example.com vs example.com.attacker.com)
Additionally, the absence of IP address validation against private/reserved ranges allowed direct SSRF attacks targeting internal networks and cloud infrastructure.
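By contrast, parsing both URLs and comparing their origins closes the prefix bypass. A sketch of the correct approach (the function name here is illustrative, not necessarily the signature of the patched isSameOrigin()):

```typescript
// Compare parsed origins (scheme + host + port) instead of raw strings.
function isSameOriginSafe(link: string, base: string): boolean {
  try {
    return new URL(link).origin === new URL(base).origin;
  } catch {
    return false; // unparseable URLs are rejected outright
  }
}

isSameOriginSafe("https://example.com/docs", "https://example.com");       // true
isSameOriginSafe("https://example.com.evil.com/x", "https://example.com"); // false
```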
Attack Vector
An attacker can exploit this vulnerability by placing malicious links on any page that will be crawled by a LangChain application using RecursiveUrlLoader. The attack flow is:
1. Attacker identifies a LangChain application crawling a target domain (e.g., https://target.com)
2. Attacker places content on a page within the crawl scope containing links such as:
   - https://target.com.attacker-domain.com/steal-data (prefix bypass)
   - http://169.254.169.254/latest/meta-data/ (cloud metadata access)
   - http://192.168.1.1/admin (internal network access)
3. The crawler follows these links, believing they are within the allowed scope
4. Responses from internal services are processed and potentially exposed
The security patch introduces proper SSRF hardening through new utility functions:
import { JSDOM, VirtualConsole } from "jsdom";
import { Document } from "@langchain/core/documents";
import { AsyncCaller } from "@langchain/core/utils/async_caller";
+import { isSameOrigin, validateSafeUrl } from "@langchain/core/utils/ssrf";
import {
BaseDocumentLoader,
DocumentLoader,
Source: GitHub Commit Changes
The fix adds a new SSRF utility module to the core library:
export * as utils__json_patch from "../utils/json_patch.js";
export * as utils__json_schema from "../utils/json_schema.js";
export * as utils__math from "../utils/math.js";
+export * as utils__ssrf from "../utils/ssrf.js";
export * as utils__stream from "../utils/stream.js";
export * as utils__testing from "../utils/testing/index.js";
export * as utils__tiktoken from "../utils/tiktoken.js";
Source: GitHub Commit Changes
Detection Methods for CVE-2026-26019
Indicators of Compromise
- Unexpected outbound requests from LangChain applications to cloud metadata endpoints (e.g., 169.254.169.254)
- Network connections from web crawling processes to internal RFC 1918 IP ranges
- Requests to domains with suspicious prefix patterns matching legitimate crawl targets
- Unusual data exfiltration patterns from crawler processes to external domains
Detection Strategies
- Monitor DNS queries and HTTP requests from LangChain application servers for connections to metadata service IPs
- Implement network segmentation rules that alert on crawlers accessing internal network ranges
- Review application logs for RecursiveUrlLoader activity targeting unexpected URL patterns
- Route outbound crawler traffic through an egress proxy or filtering layer that can detect and block SSRF attack patterns (traditional WAFs inspect inbound traffic and will not catch outbound SSRF requests)
Monitoring Recommendations
- Enable detailed logging for all RecursiveUrlLoader instances including full URL paths
- Set up alerts for any outbound connections to RFC 1918 addresses or cloud metadata endpoints from application servers
- Monitor for dependencies on @langchain/community versions below 1.1.14 in package manifests
- Implement egress filtering and log all blocked outbound requests for forensic analysis
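As a quick triage aid, version strings pulled from package manifests can be checked against the fixed release (a helper sketch; the 1.1.14 threshold is taken from this advisory):

```typescript
// Flag @langchain/community versions below the fixed 1.1.14 release.
function isVulnerableVersion(version: string): boolean {
  const [maj = 0, min = 0, patch = 0] = version.split(".").map(Number);
  if (maj !== 1) return maj < 1;
  if (min !== 1) return min < 1;
  return patch < 14;
}

isVulnerableVersion("1.1.13"); // true
isVulnerableVersion("1.1.14"); // false
```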
How to Mitigate CVE-2026-26019
Immediate Actions Required
- Upgrade @langchain/community to version 1.1.14 or later immediately
- Audit all LangChain applications using RecursiveUrlLoader to identify vulnerable deployments
- Implement network-level controls to block outbound requests to cloud metadata services and private IP ranges
- Review crawler configurations to ensure minimum necessary scope for URL following
Patch Information
The vulnerability is fixed in @langchain/community version 1.1.14. The patch introduces the isSameOrigin() and validateSafeUrl() utility functions in @langchain/core/utils/ssrf that perform proper URL origin validation and block requests to private/reserved IP addresses.
Patch resources:
Workarounds
- If immediate patching is not possible, implement network egress controls to block requests to 169.254.169.254, localhost, and RFC 1918 ranges
- Consider using a web proxy with URL filtering for all outbound crawler requests
- Restrict the URLs that can be crawled to a strict allowlist of known-safe domains
- Disable RecursiveUrlLoader functionality entirely until patching is complete
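The allowlist workaround can be enforced before URLs ever reach the loader (a sketch; the origin set is an example, not a recommendation):

```typescript
// Only permit crawling of explicitly allowlisted origins.
const ALLOWED_ORIGINS = new Set<string>([
  "https://example.com",
  "https://docs.example.com",
]);

function isAllowedTarget(urlString: string): boolean {
  try {
    return ALLOWED_ORIGINS.has(new URL(urlString).origin);
  } catch {
    return false;
  }
}

isAllowedTarget("https://example.com/page"); // true
isAllowedTarget("http://169.254.169.254/");  // false
```

Filtering candidate links through such a gate before invoking the loader limits the blast radius even if the loader's own scope check is bypassed.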
# Network-level mitigation example: block metadata service and private-range
# access with iptables. Scope these rules to the crawler's service account
# where possible (e.g. with -m owner --uid-owner), since blanket RFC 1918
# blocks can break legitimate internal traffic.
iptables -A OUTPUT -d 169.254.169.254 -j DROP
iptables -A OUTPUT -d 10.0.0.0/8 -j DROP
iptables -A OUTPUT -d 172.16.0.0/12 -j DROP
iptables -A OUTPUT -d 192.168.0.0/16 -j DROP