CVE-2025-3225 Overview
CVE-2025-3225 is an XML Entity Expansion vulnerability, commonly known as a "billion laughs" attack, in the sitemap parser of the run-llama/llama_index repository. The flaw affects version v0.12.21 and is resolved in v0.12.29. An attacker can supply a crafted Sitemap XML document containing nested entity references that expand exponentially during parsing. Processing this input exhausts system memory and can crash the host application. The vulnerability is classified under CWE-776 (Improper Restriction of Recursive Entity References in DTDs).
Critical Impact
Remote, unauthenticated attackers can trigger memory exhaustion and denial of service in any application embedding the vulnerable LlamaIndex sitemap reader.
Affected Products
- LlamaIndex llama_index version v0.12.21
- LlamaIndex readers package llama-index-readers-papers version 0.3.1
- Applications using the vulnerable sitemap parser or PubMed reader components
Discovery Timeline
- 2025-07-07 - CVE-2025-3225 published to the National Vulnerability Database
- 2025-07-30 - Last updated in NVD database
Technical Details for CVE-2025-3225
Vulnerability Analysis
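The vulnerability lives in XML parsing logic that relies on Python's standard xml.etree module. That parser resolves internal entities declared in the document's DTD, and the affected LlamaIndex code placed no limit on recursion depth or total expansion size. Whether Expat, the C library underneath xml.etree, applies its own amplification guard depends on its version: the Python documentation treats only Expat 2.4.1 and newer as protected, so the safeguard cannot be assumed on every deployment. An attacker submits a Sitemap XML file containing a small set of entities, each referencing the previous entity multiple times. Parsing the document forces the interpreter to materialize an exponentially growing string in memory.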
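The expansion arithmetic is easiest to see in a scaled-down payload. The sketch below is illustrative only: `build_entity_bomb` and `expanded_length` are hypothetical helpers, the `urlset` root merely mimics a sitemap, and the parameters are kept tiny so the expanded text is 3,000 characters. The classic payload uses nine levels of ten references each, which turns a document of under a kilobyte into 10^9 copies of the three-character seed, roughly 3 GB.

```python
import xml.etree.ElementTree as ET


def build_entity_bomb(depth: int, fanout: int, seed: str = "lol") -> str:
    """Return a sitemap-shaped document whose entities nest `depth` levels deep.

    e0 holds `seed`; each e{i} references e{i-1} `fanout` times, and the
    <urlset> root references the outermost entity once.
    """
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, depth + 1):
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return f"<!DOCTYPE urlset [\n  {dtd}\n]>\n<urlset>&e{depth};</urlset>"


def expanded_length(depth: int, fanout: int, seed: str = "lol") -> int:
    """Characters produced once every reference is expanded."""
    return len(seed) * fanout ** depth


if __name__ == "__main__":
    # Safe demo: 3 levels x 10 references -> 3 * 10**3 = 3,000 characters.
    # WARNING: do not try depth=9 outside a disposable sandbox.
    doc = build_entity_bomb(depth=3, fanout=10)
    root = ET.fromstring(doc)
    print(f"input: {len(doc)} chars, expanded text: {len(root.text)} chars")
```

Each extra level multiplies the output by the fan-out while adding only a few dozen bytes of input, which is why growth is exponential in depth. On Python builds linked against a recent Expat (check `xml.parsers.expat.EXPAT_VERSION`), a large variant may fail with `ParseError` once Expat's amplification guard trips; on older builds it exhausts memory instead.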
The LlamaIndex framework is widely embedded in retrieval-augmented generation (RAG) pipelines, where sitemap and document readers ingest external content. A pipeline that accepts a URL or file path from an untrusted source becomes a remote denial-of-service target. The impact ranges from process termination and container restarts to full host instability on systems without memory limits.
Root Cause
The root cause is unsafe XML parsing with xml.etree.ElementTree and no entity expansion protections: the affected LlamaIndex code did not limit how many entity expansions a single document could trigger. Maintainers fixed the flaw by switching to defusedxml, a hardened XML parsing library that blocks billion laughs, quadratic blowup, and external entity attacks by default.
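To illustrate the blocking strategy without a third-party dependency, here is a stdlib-only sketch that refuses any document declaring an entity. defusedxml follows a similar idea internally by hooking Expat's entity-declaration callback, but this is not defusedxml's code: `reject_entity_declarations`, `parse_untrusted`, and `EntityDeclarationError` are hypothetical names, and upgrading remains the real fix.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat


class EntityDeclarationError(ValueError):
    """Hypothetical error type raised when a document declares any entity."""


def reject_entity_declarations(xml_text: str) -> None:
    """Pre-scan xml_text with Expat and raise on the first entity declaration.

    Expat reports <!ENTITY ...> declarations through EntityDeclHandler while it
    reads the DTD, which happens before any reference in the body is expanded.
    """
    parser = xml.parsers.expat.ParserCreate()

    def on_entity_decl(name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name):
        raise EntityDeclarationError(f"entity declaration not allowed: {name!r}")

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)  # exceptions raised in handlers propagate here


def parse_untrusted(xml_text: str) -> ET.Element:
    """Build an ElementTree only after the pre-scan finds no entity declarations."""
    reject_entity_declarations(xml_text)
    return ET.fromstring(xml_text)
```

The pre-scan parses the document twice, trading some CPU for the guarantee that ElementTree never sees an entity declaration. A single-pass design would install the handler on the parser that builds the tree, which is roughly how defusedxml's parser subclass works.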
Attack Vector
The attack vector is network-based and requires no authentication or user interaction. An attacker hosts a malicious Sitemap XML at a URL that a LlamaIndex-powered application later fetches, or submits the file directly through an exposed ingestion endpoint. Once parsing begins, memory usage grows until the process is killed by the operating system or container orchestrator.
# Patch: switch from xml.etree to defusedxml in PubmedReader
# File: llama-index-integrations/readers/llama-index-readers-papers/
#       llama_index/readers/papers/pubmed/base.py
from typing import List, Optional

from defusedxml import ElementTree as safe_xml

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


class PubmedReader(BaseReader):
    """
    Pubmed Reader.

    Gets a search query, return a list of Documents of the top
    corresponding scientific papers on Pubmed.
    """
Source: run-llama/llama_index commit 4f6ee06
The patch replaces the standard xml.etree import with defusedxml.ElementTree. With its default settings, defusedxml rejects any document that declares entities, raising EntitiesForbidden before a single reference is expanded.
Detection Methods for CVE-2025-3225
Indicators of Compromise
- Sudden, sustained memory growth in Python processes hosting LlamaIndex workloads, followed by out-of-memory kills
- Application or container restarts correlated with sitemap ingestion or PubMed reader activity
- Inbound XML payloads containing nested <!ENTITY> declarations referencing other entities multiple times
- Logs showing xml.etree.ElementTree parse calls immediately before process termination
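A rough pre-parse screen for the nested-entity indicator can be written with regular expressions. This is a heuristic sketch, not a WAF rule set: `scan_xml_entities` is a hypothetical helper, its thresholds (more than two declarations, or any entity that references another declared entity more than once) are assumptions, and regexes will miss obfuscated or parameter-entity variants, so use it to raise alerts rather than as the only defense.

```python
import re
from dataclasses import dataclass

# Any entity declaration: internal or external, general or parameter.
ANY_DECL = re.compile(r"<!ENTITY\b")
# Internal declarations with a quoted literal value: <!ENTITY [%] name "value">.
INTERNAL_DECL = re.compile(r"""<!ENTITY\s+(?:%\s*)?([^\s>]+)\s+("[^"]*"|'[^']*')""")
# General entity references such as &lol3; (parameter references like %x; are not covered).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


@dataclass
class EntityScan:
    declarations: int
    max_fanout: int  # most references to other declared entities inside one value
    suspicious: bool


def scan_xml_entities(xml_text: str, max_declarations: int = 2,
                      max_fanout: int = 1) -> EntityScan:
    internal = INTERNAL_DECL.findall(xml_text)
    declared_names = {name for name, _ in internal}
    fanout = 0
    for _, value in internal:
        # Count only references that point at other entities declared in this document.
        refs = [ref for ref in ENTITY_REF.findall(value) if ref in declared_names]
        fanout = max(fanout, len(refs))
    count = len(ANY_DECL.findall(xml_text))
    return EntityScan(count, fanout, count > max_declarations or fanout > max_fanout)
```

Ordinary sitemaps rarely declare entities at all, which is why the default thresholds can be strict; loosen them only if legitimate feeds trip the check.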
Detection Strategies
- Inspect application logs for stack traces originating in llama_index reader modules during XML parsing
- Use a web application firewall or proxy rule to flag XML payloads that declare more than a small threshold of internal entities
- Inventory installed Python packages and alert when llama-index is older than v0.12.29 or llama-index-readers-papers is older than 0.3.2
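The inventory check can be scripted with importlib.metadata. This is a sketch under assumptions: the fixed versions come from this advisory, the distribution names are the PyPI names listed above, and `_release_tuple` is a deliberately simple hypothetical parser that ignores pre-release suffixes; the `packaging` library's `Version` class is the more robust choice in production.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# Fixed releases as stated in this advisory.
FIXED_VERSIONS = {
    "llama-index": "0.12.29",
    "llama-index-readers-papers": "0.3.2",
}

_RELEASE = re.compile(r"v?(\d+(?:\.\d+)*)")


def _release_tuple(version: str) -> Tuple[int, ...]:
    """Map '0.12.21' or 'v0.12.21' to (0, 12, 21); suffixes like 'rc1' are ignored."""
    match = _RELEASE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_unpatched(installed: str, fixed: str) -> bool:
    return _release_tuple(installed) < _release_tuple(fixed)


def audit_installed() -> Dict[str, str]:
    """Return {distribution: installed_version} for every unpatched package found."""
    findings = {}
    for dist, fixed in FIXED_VERSIONS.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
        if is_unpatched(installed, fixed):
            findings[dist] = installed
    return findings


if __name__ == "__main__":
    for dist, version in audit_installed().items():
        print(f"UNPATCHED: {dist} {version} (fixed in {FIXED_VERSIONS[dist]})")
```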
Monitoring Recommendations
- Track per-process memory usage on hosts running LlamaIndex ingestion jobs and alert on rapid growth
- Monitor container exit codes for OOMKilled events on RAG pipeline workloads
- Capture network telemetry for outbound fetches of remote Sitemap XML URLs and correlate with parser crashes
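Growth rate is often more telling than absolute usage, because memory consumed by an entity bomb typically climbs steeply. A minimal sketch of a rate-based alarm follows; `MemoryGrowthAlarm`, `read_rss_bytes`, the five-sample window, and the 256 MiB/s threshold are all assumptions to tune against your own workload, and `read_rss_bytes` only works on Linux because it reads /proc.

```python
from collections import deque
from typing import Deque, Tuple


class MemoryGrowthAlarm:
    """Flags resident-memory growth faster than a threshold across a sliding window."""

    def __init__(self, window: int = 5, max_bytes_per_sec: float = 256 * 1024 * 1024):
        self.max_bytes_per_sec = max_bytes_per_sec
        self.samples: Deque[Tuple[float, int]] = deque(maxlen=window)

    def observe(self, timestamp: float, rss_bytes: int) -> bool:
        """Record one sample; return True if growth over the window exceeds the limit."""
        self.samples.append((timestamp, rss_bytes))
        if len(self.samples) < 2:
            return False
        (t0, m0), (t1, m1) = self.samples[0], self.samples[-1]
        if t1 <= t0:
            return False
        return (m1 - m0) / (t1 - t0) > self.max_bytes_per_sec


def read_rss_bytes(pid: int) -> int:
    """Linux-only: read VmRSS for `pid` from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status", encoding="utf-8", errors="replace") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) * 1024  # value is reported in kB
    raise RuntimeError(f"VmRSS not found for pid {pid}")
```

In a sidecar or watchdog, one would call `observe(time.monotonic(), read_rss_bytes(worker_pid))` about once per second and page an operator, or stop the ingestion job, when it returns True.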
How to Mitigate CVE-2025-3225
Immediate Actions Required
- Upgrade llama-index to v0.12.29 or later across all environments running RAG pipelines
- Upgrade llama-index-readers-papers to 0.3.2 or later, which adds the defusedxml dependency
- Audit any custom readers in your codebase that parse XML and replace xml.etree with defusedxml
- Treat all sitemap URLs and XML inputs as untrusted, even when supplied by internal users
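For the audit step, a short AST-based scanner can list every import of a standard-library XML parser in a codebase. The names `find_unsafe_xml_imports` and `audit_directory` are hypothetical, and the module list is an assumption based on the parsers defusedxml provides hardened counterparts for; it will also flag harmless helpers such as xml.sax.saxutils and will miss third-party parsers, so treat hits as a review queue rather than a verdict.

```python
import ast
from pathlib import Path
from typing import Dict, List, Tuple

# Standard-library XML modules that defusedxml ships hardened counterparts for.
UNSAFE_XML_MODULES = (
    "xml.etree.ElementTree",
    "xml.etree.cElementTree",
    "xml.dom.minidom",
    "xml.dom.pulldom",
    "xml.sax",
)


def _is_unsafe(dotted_name: str) -> bool:
    return any(dotted_name == mod or dotted_name.startswith(mod + ".")
               for mod in UNSAFE_XML_MODULES)


def find_unsafe_xml_imports(source: str, filename: str = "<string>") -> List[Tuple[int, str]]:
    """Return sorted (line number, imported name) pairs for stdlib XML parser imports."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from xml.etree import ElementTree" -> "xml.etree.ElementTree"
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        hits.extend((node.lineno, name) for name in names if _is_unsafe(name))
    return sorted(hits)


def audit_directory(root: str) -> Dict[str, List[Tuple[int, str]]]:
    """Scan every .py file under `root`; files that fail to decode or parse are skipped."""
    report = {}
    for path in Path(root).rglob("*.py"):
        try:
            hits = find_unsafe_xml_imports(path.read_text(encoding="utf-8"), str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue
        if hits:
            report[str(path)] = hits
    return report
```

Running `audit_directory("src")` over a custom-readers package returns a mapping from file path to flagged import lines, which can feed a CI check that fails while any entries remain.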
Patch Information
The fix is published in commit 4f6ee06 and shipped in llama_index version v0.12.29. The patch introduces defusedxml as a required dependency and routes XML parsing through defusedxml.ElementTree, which blocks entity expansion attacks. Additional context is available in the Huntr bounty report.
Workarounds
- Disable or remove the sitemap reader and PubMed reader components until upgrades are deployed
- Place strict memory limits and restart policies on containers running LlamaIndex workers to contain impact
- Use an upstream proxy to reject XML documents that contain entity declarations (<!ENTITY ...>)
# Configuration example: upgrade to the patched versions and confirm defusedxml is installed
pip install --upgrade "llama-index>=0.12.29" "llama-index-readers-papers>=0.3.2"
pip show defusedxml | grep -E "Name|Version"
# Cap container memory to limit the blast radius during ingestion
# (my-rag-app is a placeholder image name)
docker run --memory=2g --memory-swap=2g --restart=on-failure my-rag-app
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


