CVE-2025-3225: LlamaIndex XML Entity Expansion DoS Flaw

CVE-2025-3225 Overview

CVE-2025-3225 is an XML Entity Expansion vulnerability, commonly known as a "billion laughs" attack, in the sitemap parser of the run-llama/llama_index repository. The flaw affects version v0.12.21 of the LlamaIndex framework, a widely used data framework for building LLM applications. An attacker can supply a crafted Sitemap XML document containing nested entity references that expand exponentially during parsing. Processing the malicious sitemap exhausts system memory and triggers a Denial of Service (DoS) condition, potentially crashing the host process. The issue is tracked under CWE-776 (Improper Restriction of Recursive Entity References in DTDs) and is resolved in version v0.12.29.

Critical Impact
A remote, unauthenticated attacker can crash LlamaIndex-powered services by submitting a malicious sitemap URL, disrupting LLM ingestion pipelines and dependent applications.

Affected Products

LlamaIndex llama_index version v0.12.21
LlamaIndex sitemap reader integrations relying on the vulnerable XML parser
Downstream applications and pipelines that ingest external sitemaps via LlamaIndex

Discovery Timeline

2025-07-07 - CVE-2025-3225 published to the National Vulnerability Database
2025-07-30 - Last updated in NVD database

Technical Details for CVE-2025-3225

Vulnerability Analysis

The sitemap parser in llama_index uses Python's standard library XML parser to process remote sitemap documents. The default parser resolves Document Type Definition (DTD) entity references without restricting nesting depth or expansion size. An attacker who controls the sitemap URL or its content can submit an XML document that defines a small entity and then builds a chain of further entities, each referencing the previous one many times. Every additional layer multiplies the expanded size by the number of references, so the output grows exponentially with nesting depth and can consume gigabytes of memory within seconds. The classic payload defines lol as a short string, then lol2 as ten lol references, then lol3 as ten lol2 references, and so on. When the parser dereferences the top-level entity, expansion balloons to billions of characters, exhausting available RAM and terminating the process.
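The growth can be checked with plain arithmetic, without handing a hostile document to any parser. The following is a minimal sketch: the function names, the lol0..lolN entity naming, and the fanout parameter are illustrative assumptions and are not taken from LlamaIndex.

```python
def build_nested_entity_doc(levels: int, fanout: int = 10, base: str = "lol") -> str:
    """Return XML text with `levels` layers of nested entity definitions.

    Hypothetical helper for illustration. Keep `levels` small if the result
    is ever passed to a real parser.
    """
    lines = [
        '<?xml version="1.0"?>',
        "<!DOCTYPE lolz [",
        f'  <!ENTITY lol0 "{base}">',
    ]
    for level in range(1, levels + 1):
        # Each layer references the previous layer `fanout` times.
        refs = f"&lol{level - 1};" * fanout
        lines.append(f'  <!ENTITY lol{level} "{refs}">')
    lines.append("]>")
    lines.append(f"<lolz>&lol{levels};</lolz>")
    return "\n".join(lines)


def expanded_length(levels: int, fanout: int = 10, base: str = "lol") -> int:
    """Size in characters of the fully expanded top-level entity."""
    return len(base) * fanout ** levels
```

With nine layers and a fanout of ten, expanded_length(9) is 3,000,000,000 characters, while the document produced by build_nested_entity_doc(9) is well under a kilobyte. That asymmetry between input size and parser workload is what makes the attack cheap to launch.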

Root Cause

The root cause is the use of xml.etree.ElementTree without protections against malicious external entities or recursive entity expansion. The standard library parser does not enforce limits on entity expansion by default, making it unsafe for untrusted input. The fix replaces the unsafe parser with defusedxml, a hardened XML library that blocks entity expansion, external entity resolution, and other XML-based attacks.
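To illustrate what "blocking entity expansion" means in practice, the sketch below uses only the standard library's expat bindings to abort parsing as soon as a document declares any entity. It is an illustration of the general technique, not the code used by defusedxml or by the LlamaIndex patch; the exception class and function name are hypothetical.

```python
import xml.parsers.expat


class EntityDeclarationForbidden(ValueError):
    """Raised when a document declares an entity in its DTD (hypothetical name)."""


def assert_no_entity_declarations(xml_bytes: bytes) -> None:
    """Parse `xml_bytes` and raise if any <!ENTITY ...> declaration is seen."""
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        # Declarations appear in the DTD, ahead of the document element,
        # so raising here stops parsing before any content is processed.
        raise EntityDeclarationForbidden(f"entity declaration not allowed: {name}")

    parser.EntityDeclHandler = on_entity_decl
    # The second argument marks this chunk as the final one.
    parser.Parse(xml_bytes, True)
```

A sitemap has no legitimate need for custom entities, so rejecting every declaration is a reasonable policy for this input type. Malformed XML still surfaces as xml.parsers.expat.ExpatError, which callers would need to handle separately.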

Attack Vector

The attack is network-based and requires no authentication or user interaction. An attacker submits a malicious sitemap URL to any application that uses the affected LlamaIndex sitemap reader. The parser fetches and processes the XML, triggering memory exhaustion in the worker process. Public LLM ingestion endpoints that accept user-supplied URLs are particularly exposed.

python

# Security patch: replacing xml.etree with defusedxml
# Source: https://github.com/run-llama/llama_index/commit/4f6ee062b19212106a2632af9c9521fc7f0a3584

from typing import List, Optional

from defusedxml import ElementTree as safe_xml
from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class PubmedReader(BaseReader):
    """
    Pubmed Reader.

    Gets a search query, return a list of Documents of the top corresponding
    scientific papers on Pubmed.
    """

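The patch swaps the unsafe xml.etree import for defusedxml.ElementTree, which rejects entity declarations and external entity references by default (rejecting DTDs outright is opt-in through the forbid_dtd argument). The excerpt shows the PubmedReader module from the llama-index-readers-papers package. The accompanying pyproject.toml update adds defusedxml = "^0.7.1" as a required dependency and bumps the reader package from 0.3.1 to 0.3.2.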

Detection Methods for CVE-2025-3225

Indicators of Compromise

Sudden, sustained memory growth in Python worker processes hosting LlamaIndex sitemap readers
Out-of-memory (OOM) kills or container restarts correlating with sitemap ingestion requests
Inbound HTTP requests referencing sitemap URLs that return XML containing nested <!ENTITY> declarations
Process crashes in services importing llama_index modules at version 0.12.21 or earlier
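When triaging a suspicious sitemap response, a byte-level scan can flag nested entity declarations without parsing the XML at all. The sketch below is a heuristic under stated assumptions: the function name is hypothetical, it only handles ASCII-compatible encodings, and it will also count declarations whose values contain harmless predefined references such as &amp;.

```python
import re

# <!ENTITY name "value"> or <!ENTITY % name 'value'>; group 2 is the value.
_ENTITY_DECL = re.compile(rb"""<!ENTITY\s+%?\s*[^\s>]+\s+(["'])(.*?)\1""", re.DOTALL)
# A general entity reference such as &lol; (character references are ignored).
_ENTITY_REF = re.compile(rb"&[A-Za-z_:][\w.:-]*;")


def count_nested_entity_declarations(body: bytes) -> int:
    """Count entity declarations whose value references another entity."""
    nested = 0
    for match in _ENTITY_DECL.finditer(body):
        if _ENTITY_REF.search(match.group(2)):
            nested += 1
    return nested
```

Any non-zero count on a document served as a sitemap is worth investigating, because ordinary sitemaps do not define entities.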

Detection Strategies

Inspect application logs for exceptions originating from xml.etree.ElementTree during sitemap parsing operations
Apply network-layer inspection to flag XML payloads containing recursive entity definitions such as repeated &lol; patterns
Audit installed Python dependencies for llama-index-core versions below 0.12.29 and the vulnerable reader package version 0.3.1
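The dependency audit can be scripted against the active Python environment. The sketch below uses importlib.metadata from the standard library; the minimum versions are the patched releases named in this advisory, while the helper names and the simplified version parsing (which ignores pre-release and local version suffixes) are assumptions made for illustration.

```python
from importlib import metadata

# Patched releases as stated in this advisory.
PATCHED = {
    "llama-index-core": (0, 12, 29),
    "llama-index-readers-papers": (0, 3, 2),
}


def parse_release(version: str) -> tuple:
    """Turn '0.12.21' or 'v0.12.21' into (0, 12, 21); suffixes are ignored."""
    parts = []
    for piece in version.lstrip("vV").split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def find_unpatched(patched=PATCHED) -> dict:
    """Return {package: installed_version} for packages below the patched release."""
    findings = {}
    for name, minimum in patched.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # Package is not installed in this environment.
        if parse_release(installed) < minimum:
            findings[name] = installed
    return findings
```

Running find_unpatched() inside each service's virtual environment or container image yields an empty dictionary when nothing below the patched releases is installed. For strict version semantics, a dedicated version-parsing library would be a better choice than the simplified parser shown here.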

Monitoring Recommendations

Enable resource metrics (RSS memory, CPU) on services that ingest user-supplied URLs and alert on rapid memory spikes
Track outbound HTTP fetches initiated by sitemap readers and correlate response sizes against process memory growth
Forward application and runtime logs to a centralized analytics platform for retrospective hunting on entity-expansion patterns

How to Mitigate CVE-2025-3225

Immediate Actions Required

Upgrade llama_index to version v0.12.29 or later, which replaces unsafe XML parsing with defusedxml
Inventory all services that accept user-supplied sitemap URLs and restrict them to trusted sources until patched
Apply per-process memory limits and request timeouts to LlamaIndex worker containers to contain DoS impact

Patch Information

The fix is delivered in commit 4f6ee062b1, which migrates affected readers from xml.etree.ElementTree to defusedxml.ElementTree. The patch is included in llama_index version v0.12.29 and llama-index-readers-papers version 0.3.2. See the Huntr bug bounty report for additional disclosure context.

Workarounds

Where an upgrade is not yet possible, install defusedxml and route XML parsing through it, either by changing the affected imports or by monkey-patching the standard library XML modules (defusedxml provides defuse_stdlib() for this purpose)
Validate sitemap inputs against an allowlist of trusted domains before passing them to LlamaIndex readers
Enforce request size limits and parse timeouts on any HTTP layer that fronts the LlamaIndex ingestion service
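The allowlist check can run before any URL reaches a LlamaIndex reader. The following is a minimal sketch using only the standard library; the function name and the example hosts are hypothetical, and it matches exact host names only, so subdomains must be listed explicitly.

```python
from urllib.parse import urlsplit


def is_allowed_sitemap_url(url: str, allowed_hosts: set) -> bool:
    """Return True only for http(s) URLs whose host is in `allowed_hosts`."""
    try:
        parts = urlsplit(url)
        host = parts.hostname or ""
    except ValueError:
        return False  # Malformed URL, for example unbalanced IPv6 brackets.
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit lowercases the scheme and hostname, so compare against
    # lowercase entries.
    return host in allowed_hosts
```

For example, with allowed_hosts = {"example.com"}, the URL https://example.com/sitemap.xml passes, while https://example.com.evil.test/sitemap.xml and file:///etc/passwd are rejected. Matching on the parsed hostname rather than on a substring of the URL avoids the common bypass of embedding a trusted name inside an attacker-controlled domain.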

bash

# Upgrade LlamaIndex to a patched release
pip install --upgrade "llama-index-core>=0.12.29"
pip install --upgrade "llama-index-readers-papers>=0.3.2"

# Ensure defusedxml is installed at a compatible version
pip install "defusedxml>=0.7.1"

# Confirm the installed llama-index-core version
pip show llama-index-core | grep -i version