
CVE-2025-54988: Apache Tika XXE Vulnerability

CVE-2025-54988 is an XXE vulnerability in Apache Tika that allows attackers to inject XML External Entities via crafted XFA files in PDFs, potentially exposing sensitive data. This article covers technical details, affected versions, impact, and mitigation strategies.


CVE-2025-54988 Overview

CVE-2025-54988 is an XML External Entity (XXE) injection vulnerability affecting Apache Tika versions 1.13 through 3.2.1. The vulnerability exists within the tika-parser-pdf-module component, which is responsible for parsing PDF documents. An attacker can exploit this flaw by crafting a malicious XFA (XML Forms Architecture) file embedded within a PDF document. When Apache Tika processes this crafted PDF, the underlying XML parser resolves the external entity references it contains. This can allow the attacker to read sensitive data from the server or to trigger malicious requests to internal resources or third-party servers.

The tika-parser-pdf-module is used as a dependency across several Apache Tika packages, significantly expanding the attack surface. Affected packages include tika-parsers-standard-modules, tika-parsers-standard-package, tika-app, tika-grpc, and tika-server-standard.

Critical Impact

This XXE vulnerability allows attackers to exfiltrate sensitive data, perform server-side request forgery (SSRF) attacks against internal infrastructure, and potentially cause denial of service through resource exhaustion.

Affected Products

  • Apache Tika versions 1.13 through 3.2.1
  • tika-parser-pdf-module (all affected versions)
  • tika-parsers-standard-modules, tika-parsers-standard-package, tika-app, tika-grpc, tika-server-standard (dependent packages)

Discovery Timeline

  • 2025-08-20 - CVE-2025-54988 published to NVD
  • 2025-11-04 - Last updated in NVD database

Technical Details for CVE-2025-54988

Vulnerability Analysis

This vulnerability is classified under CWE-611 (Improper Restriction of XML External Entity Reference). The root issue lies in the PDF parsing functionality within Apache Tika, specifically when handling XFA forms embedded in PDF documents. XFA (XML Forms Architecture) is an XML-based specification that allows forms to be embedded within PDF files. When Apache Tika encounters a PDF containing XFA content, the XML parser processes the embedded XML data without properly restricting external entity resolution.

Exploitation requires only that a crafted PDF file reach the Apache Tika parser. When the PDF is parsed, the XML parser resolves the external entities declared in the XFA section instead of rejecting them. This can result in the disclosure of local file contents, internal network reconnaissance through SSRF, or resource exhaustion attacks.

Root Cause

The vulnerability stems from insecure XML parser configuration within the tika-parser-pdf-module. When parsing XFA content from PDF documents, the XML processor does not properly disable external entity processing. This allows attackers to define malicious external entities within the XFA XML that reference local files, internal URLs, or external servers. The parser resolves these entity references during document processing, leading to information disclosure or SSRF conditions.
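The Java sketch below makes the root cause concrete. It is illustrative only and is not Tika's code. It parses a simplified, hypothetical XFA-style payload with a JAXP `DocumentBuilderFactory` left at its default settings.

The sketch does not let the parser open `file:///etc/passwd`. Instead, it installs a recording `EntityResolver` that returns placeholder text. This shows that the parser dereferences the external entity and inlines the result, without the demo touching the file system. The class name `XxeResolutionDemo` and the payload are assumptions; real XFA packets are namespaced `xdp:xdp` documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative only: a default-configured JAXP DOM parser resolving an external entity. */
public class XxeResolutionDemo {

    // Simplified, hypothetical XFA-style payload: the DOCTYPE declares an external
    // entity that a form field then references.
    static final String PAYLOAD =
            "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE xdp [<!ENTITY leak SYSTEM \"file:///etc/passwd\">]>"
            + "<xdp><field>&leak;</field></xdp>";

    /**
     * Parses xml with an unhardened factory. Every system ID the parser asks to load is
     * appended to `requested`, and a placeholder string stands in for the real file.
     */
    static String parseWithDefaults(String xml, List<String> requested) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance(); // no hardening
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                requested.add(systemId);
                // Without this resolver, the parser would open the system ID itself.
                return new InputSource(new StringReader("root:x:0:0:SIMULATED"));
            });
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // The entity's replacement text is now part of the document content.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> requested = new ArrayList<>();
        String text = parseWithDefaults(PAYLOAD, requested);
        System.out.println("parser requested: " + requested);
        System.out.println("extracted text:   " + text);
    }
}
```

Running it should show that the parser asked for a system ID ending in `/etc/passwd` and that the placeholder text ended up inside `<field>`. With no resolver installed, a default JDK parser would typically read the real file instead.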

Attack Vector

The attack requires an attacker to craft a malicious PDF document containing an XFA form with embedded XXE payloads. The attacker then needs to deliver this PDF to a system running a vulnerable version of Apache Tika for processing. This could occur through various scenarios:

  1. Uploading a malicious PDF to a web application that uses Apache Tika for document indexing or content extraction
  2. Sending a malicious PDF via email to systems with automated document processing
  3. Placing a malicious PDF in a directory monitored by Apache Tika for batch processing

The XXE payload within the XFA section can be configured to read sensitive files such as /etc/passwd, configuration files containing credentials, or to probe internal network services through SSRF requests. The attacker receives the extracted data through out-of-band channels or by observing error messages if the application exposes parsing errors.

Detection Methods for CVE-2025-54988

Indicators of Compromise

  • Unusual outbound network connections from systems running Apache Tika to external servers
  • Log entries showing attempts to access sensitive local files during PDF processing
  • PDF files containing suspicious XFA sections with external entity declarations
  • Error messages indicating XML parsing failures related to external entity resolution

Detection Strategies

  • Monitor Apache Tika processing logs for XML parsing errors or external entity warnings
  • Implement network monitoring to detect unexpected outbound connections from document processing services
  • Deploy file integrity monitoring on sensitive configuration files that could be targeted by XXE attacks
  • Scan uploaded PDF files for suspicious XFA content before processing with Apache Tika
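The last strategy above, scanning uploads for suspicious XFA content, can be approximated with a coarse pre-screen before files reach Tika. The Java sketch below is a hypothetical helper and is not part of Tika. It flags a PDF when both of these are true:

- The file mentions the `/XFA` key.
- A `<!DOCTYPE` or `<!ENTITY` declaration appears, either in the raw bytes or inside a stream that inflates as zlib (the FlateDecode filter).

It assumes the JDK's `java.util.zip.Inflater` is enough to decode the streams of interest. PDFs that use other filters, encryption, or hex-escaped names will slip past, so treat the result as a triage signal rather than a guarantee.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/**
 * Hypothetical pre-screen: flags a PDF that mentions /XFA and contains a DOCTYPE or
 * ENTITY declaration, either in raw bytes or inside a FlateDecode (zlib) stream.
 * Heuristic only: other filters, encryption and hex-escaped names are not handled.
 */
public class XfaEntityScanner {

    private static final int MAX_INFLATED = 16 * 1024 * 1024; // cap output (zip bomb guard)

    // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets.
    static String latin1(byte[] b) {
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    static boolean declaresEntity(String s) {
        return s.contains("<!DOCTYPE") || s.contains("<!ENTITY");
    }

    /** Best-effort zlib inflate; returns null if the bytes are not a valid zlib stream. */
    static byte[] inflate(byte[] data) {
        Inflater inf = new Inflater();
        inf.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inf.finished() && out.size() < MAX_INFLATED) {
                int n = inf.inflate(buf);
                if (n == 0) break; // needs more input or a preset dictionary: stop here
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            return null;
        } finally {
            inf.end();
        }
    }

    static boolean isSuspicious(byte[] pdf) {
        String raw = latin1(pdf);
        boolean mentionsXfa = raw.contains("/XFA");
        boolean hasEntity = declaresEntity(raw);
        int pos = 0;
        while (true) {
            int kw = raw.indexOf("stream", pos);
            if (kw < 0) break;
            pos = kw + "stream".length();
            if (kw >= 3 && raw.startsWith("end", kw - 3)) continue; // tail of "endstream"
            int start = pos;
            if (raw.startsWith("\r\n", start)) start += 2;
            else if (raw.startsWith("\n", start)) start += 1;
            int end = raw.indexOf("endstream", start);
            if (end < 0) break;
            pos = end + "endstream".length();
            byte[] decoded = inflate(Arrays.copyOfRange(pdf, start, end));
            if (decoded != null) {
                String text = latin1(decoded);
                mentionsXfa |= text.contains("/XFA"); // e.g. AcroForm inside an object stream
                hasEntity |= declaresEntity(text);
            }
        }
        return mentionsXfa && hasEntity;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            byte[] pdf = Files.readAllBytes(Path.of(arg));
            System.out.println((isSuspicious(pdf) ? "SUSPICIOUS " : "ok         ") + arg);
        }
    }
}
```

With Java 11 or later it can be run directly as `java XfaEntityScanner.java upload1.pdf upload2.pdf`. Flagged files can then be quarantined for review instead of being passed to Tika.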

Monitoring Recommendations

  • Configure SIEM rules to alert on suspicious file access patterns from Apache Tika processes
  • Implement egress filtering to restrict outbound connections from document processing servers
  • Enable verbose logging on Apache Tika instances to capture detailed parsing events
  • Monitor for DNS queries to unusual domains originating from document processing infrastructure

How to Mitigate CVE-2025-54988

Immediate Actions Required

  • Upgrade Apache Tika to version 3.2.2 or later immediately, keeping tika-core and every Tika parser module on the same fixed release
  • Review all applications and services that depend on Apache Tika for PDF processing
  • Audit systems for signs of exploitation by reviewing access logs and network traffic
  • Restrict network access for systems running vulnerable Apache Tika versions until patched
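Reviewing dependent applications largely comes down to comparing each Tika artifact version against the affected range, 1.13 through 3.2.1. The Java sketch below is a hypothetical helper built on a few assumptions:

- Dependency coordinates arrive in `groupId:artifactId:version` form, for example collected from a build tool's dependency report.
- Version qualifiers such as `-BETA2` can be ignored.
- The best-effort runtime probe can rely on the `pom.properties` file that Maven normally packages into `tika-core`'s jar.

The sample coordinates in `main` are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper for checking Tika versions against the range 1.13 through 3.2.1. */
public class TikaVersionCheck {

    static final String FIRST_AFFECTED = "1.13";
    static final String LAST_AFFECTED = "3.2.1";

    // "3.0.0-BETA2" -> [3, 0, 0]; any "-qualifier" suffix is ignored.
    static int[] parse(String version) {
        String[] parts = version.split("-", 2)[0].split("\\.");
        int[] nums = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    // Numeric, component-wise comparison; missing components count as 0 ("1.13" == "1.13.0").
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    static boolean isAffected(String version) {
        return compare(version, FIRST_AFFECTED) >= 0 && compare(version, LAST_AFFECTED) <= 0;
    }

    /** Filters "groupId:artifactId:version" strings down to affected org.apache.tika artifacts. */
    static List<String> findAffected(List<String> coordinates) {
        List<String> hits = new ArrayList<>();
        for (String c : coordinates) {
            String[] f = c.split(":");
            if (f.length == 3 && f[0].equals("org.apache.tika") && isAffected(f[2])) {
                hits.add(c);
            }
        }
        return hits;
    }

    /** Best-effort: version of tika-core on the classpath, or null if it cannot be determined. */
    static String tikaCoreVersionOnClasspath() {
        String res = "/META-INF/maven/org.apache.tika/tika-core/pom.properties";
        try (InputStream in = TikaVersionCheck.class.getResourceAsStream(res)) {
            if (in == null) {
                return null;
            }
            Properties p = new Properties();
            p.load(in);
            return p.getProperty("version");
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> deps = List.of(
                "org.apache.tika:tika-core:3.2.1",
                "org.apache.tika:tika-parser-pdf-module:3.2.2",
                "org.apache.pdfbox:pdfbox:3.0.5");
        System.out.println("affected: " + findAffected(deps)); // affected: [org.apache.tika:tika-core:3.2.1]
        System.out.println("tika-core at runtime: " + tikaCoreVersionOnClasspath());
    }
}
```

Numeric comparison matters here: a plain string comparison would wrongly place "1.2" above "1.13".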

Patch Information

The Apache Software Foundation has released Apache Tika 3.2.2 to address this vulnerability. Users should upgrade to this version or later to remediate the XXE flaw in the tika-parser-pdf-module. For details, refer to the Apache mailing list discussion. Additional advisories are available from the Openwall oss-security list, and Debian has published guidance in a Debian LTS announcement.

Workarounds

  • Where the application controls XML parser configuration (for example, when it parses extracted XFA content itself), disable DOCTYPE declarations and external entity resolution
  • Implement input validation to reject PDF files containing XFA forms from untrusted sources
  • Deploy network segmentation to isolate document processing systems from sensitive internal resources
  • Use a web application firewall to inspect and block suspicious PDF uploads
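```xml
<!-- Example Maven dependency update for Apache Tika (pom.xml). -->
<!-- Keep tika-core on the same fixed release as the parser module. -->
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>3.2.2</version>
</dependency>
<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parser-pdf-module</artifactId>
    <version>3.2.2</version>
</dependency>
```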
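The first workaround above, disabling external entity processing, applies to XML parsers that application code controls itself. One example is an application that extracts and parses XFA packets on its own. The Java sketch below shows widely used JAXP hardening settings for a DOM parser.

This is generic hardening rather than a change to Tika's internals. It assumes the JDK's built-in parser. Other JAXP implementations may not recognize every feature or attribute: `setFeature` would then throw a `ParserConfigurationException`, and `setAttribute` an `IllegalArgumentException`.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Sketch of a hardened DOM parser; assumes the JDK's built-in JAXP implementation. */
public class HardenedXmlParser {

    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Primary control: any DOCTYPE is a fatal error, so no entity can be declared.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case the DOCTYPE ban is ever relaxed.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        dbf.setAttribute(XMLConstants.ACCESS_EXTERNAL_SCHEMA, "");
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);

        DocumentBuilder builder = dbf.newDocumentBuilder();
        // Rethrow errors instead of letting the default handler print them to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return builder;
    }

    /** True if xml parses under the hardened settings; false if the parser rejects it. */
    static boolean accepts(String xml) {
        try {
            newHardenedBuilder().parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("<xdp><field>ok</field></xdp>")); // true
        System.out.println(accepts(
                "<!DOCTYPE x [<!ENTITY e SYSTEM \"file:///etc/passwd\">]><x>&e;</x>")); // false
    }
}
```

Rejecting every DOCTYPE is the simplest rule to reason about. If a workload genuinely needs DTDs, the usual fallback is to drop only that one feature and keep the external-entity settings.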

Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.
