CVE-2025-66516: Apache Tika XXE Vulnerability

CVE-2025-66516 is an XXE vulnerability in Apache Tika affecting tika-core, tika-pdf-module, and tika-parsers through crafted XFA files in PDFs. This article covers technical details, affected versions, and patches.

CVE-2025-66516 Overview

CVE-2025-66516 is a critical XML External Entity (XXE) injection vulnerability in Apache Tika that affects three modules: tika-core (versions 1.13 through 3.2.1), tika-parser-pdf-module (versions 2.0.0 through 3.2.1; listed in the advisory as tika-pdf-module), and tika-parsers (versions 1.13 through 1.28.5). It allows remote attackers to perform XXE injection through specially crafted XFA (XML Forms Architecture) files embedded within PDF documents.

This CVE covers the same vulnerability as the previously reported CVE-2025-54988 but expands the scope of affected packages. The PDF module was only the entry point: the vulnerability and its fix reside in tika-core, so organizations that upgraded tika-parser-pdf-module without also upgrading tika-core to version 3.2.2 or later remain vulnerable. In the 1.x releases the PDF parser shipped in the tika-parsers module, which is why that module is listed as well.

Critical Impact

This XXE vulnerability enables attackers to exfiltrate sensitive data, perform server-side request forgery (SSRF), or cause denial of service on systems that process malicious PDF files with Apache Tika. Any escalation beyond that, such as remote code execution, would depend on the specific deployment.

Affected Products

  • Apache Tika tika-core versions 1.13 through 3.2.1
  • Apache Tika tika-parser-pdf-module versions 2.0.0 through 3.2.1
  • Apache Tika tika-parsers versions 1.13 through 1.28.5
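
The ranges above can be turned into a quick programmatic check. The following is a minimal sketch rather than an official tool: `AFFECTED_RANGES`, `parse_version`, and `is_affected` are hypothetical names, the ranges are copied from the list above, and the parser assumes plain numeric versions (any pre-release suffix is ignored).

```python
import re

# Inclusive affected ranges, copied from the list above.
# The table and function names are illustrative assumptions.
AFFECTED_RANGES = {
    "tika-core": ((1, 13, 0), (3, 2, 1)),
    # Listed as "tika-pdf-module" in the advisory text.
    "tika-parser-pdf-module": ((2, 0, 0), (3, 2, 1)),
    "tika-parsers": ((1, 13, 0), (1, 28, 5)),
}


def parse_version(text):
    """Parse 'MAJOR.MINOR[.PATCH]' into a 3-tuple; suffixes are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_affected(artifact, version):
    """Return True if the version lies inside the artifact's affected range."""
    low, high = AFFECTED_RANGES[artifact]
    return low <= parse_version(version) <= high
```

With this sketch, `is_affected("tika-core", "3.2.1")` is true and `is_affected("tika-core", "3.2.2")` is false. A result of false only means the version is outside the listed range for that one artifact; every Tika artifact on the classpath still needs to be checked.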

Discovery Timeline

  • 2025-12-04 - CVE-2025-66516 published to NVD
  • 2025-12-30 - Last updated in NVD

Technical Details for CVE-2025-66516

Vulnerability Analysis

The vulnerability exists in how Apache Tika processes PDF files containing XFA (XML Forms Architecture) content. When parsing these PDF files, Tika's XML processing components fail to properly restrict the resolution of external entities within the XFA data. This allows attackers to craft malicious PDF documents that, when processed by Tika, can trigger XXE attacks.

The issue is particularly concerning because it affects the core parsing functionality in tika-core, not just the PDF-specific modules. This architectural detail means that even applications that updated their PDF parsing components but retained older versions of tika-core remain vulnerable to exploitation.

Root Cause

The root cause is CWE-611: Improper Restriction of XML External Entity Reference. The XML parser used within tika-core for processing XFA content within PDF files does not properly disable external entity resolution. When processing XFA data from a PDF, the parser will resolve external entities, allowing attackers to reference arbitrary external resources including local files and remote URLs.

Attack Vector

The vulnerability is exploitable over the network and requires no authentication or user interaction. A typical attack proceeds as follows:

  1. The attacker creates a malicious PDF file containing crafted XFA content with external entity declarations
  2. The attacker submits this PDF to any application or service that uses a vulnerable version of Apache Tika for document processing
  3. When Tika parses the PDF, the XML parser resolves the external entities, potentially allowing data exfiltration, SSRF attacks, or denial of service

The vulnerability is particularly dangerous in document processing pipelines, search indexing systems, content management platforms, and any web application that accepts user-uploaded PDF files and processes them with Apache Tika.

Detection Methods for CVE-2025-66516

Indicators of Compromise

  • Unusual outbound network connections from Tika processing servers to external hosts
  • Error logs showing XML parsing failures with references to external DTDs or entities
  • Unexpected file access attempts on systems running Tika-based document processing
  • Anomalous DNS queries originating from document processing infrastructure

Detection Strategies

  • Monitor for PDF files containing suspicious XFA content with external entity declarations
  • Implement network-level detection for outbound connections from document processing systems to unexpected destinations
  • Review application logs for XML parsing errors related to external entity resolution
  • Deploy file inspection rules to identify PDFs with embedded XFA payloads containing DOCTYPE declarations
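
A file inspection rule of this kind can be sketched as a byte-level heuristic. The example below is an assumption-laden illustration, not a complete PDF parser: `scan_pdf` and the other names are hypothetical, only Flate-compressed streams are inflated, and encrypted or differently filtered content will be missed, so a clean result is not proof that a file is safe.

```python
import re
import zlib

# Rough match for PDF stream bodies; real PDFs allow more variation.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
XXE_MARKERS = (b"<!DOCTYPE", b"<!ENTITY")


def _candidate_payloads(pdf_bytes):
    """Yield the raw file plus every stream body that inflates with zlib."""
    yield pdf_bytes
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates a stream whose tail was trimmed.
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data, or uses a filter we do not handle


def scan_pdf(pdf_bytes):
    """Report whether a PDF references XFA and contains DTD/entity markers."""
    payloads = list(_candidate_payloads(pdf_bytes))
    has_xfa = any(b"/XFA" in payload for payload in payloads)
    markers = sorted(
        {
            marker.decode("ascii")
            for payload in payloads
            for marker in XXE_MARKERS
            if marker in payload
        }
    )
    return {
        "has_xfa": has_xfa,
        "markers": markers,
        "suspicious": has_xfa and bool(markers),
    }
```

A pipeline could quarantine files where `suspicious` is true. A stricter policy, at the cost of rejecting legitimate forms, would act on `has_xfa` alone.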

Monitoring Recommendations

  • Enable verbose logging on Apache Tika instances to capture XML parsing events
  • Monitor egress traffic from systems running document processing workloads
  • Implement file integrity monitoring on sensitive directories accessible by Tika processes
  • Set up alerting for unusual resource access patterns during document parsing operations

How to Mitigate CVE-2025-66516

Immediate Actions Required

  • Upgrade tika-core to version 3.2.2 or later immediately
  • Verify that all Tika modules (tika-core, tika-parser-pdf-module, tika-parsers) are updated consistently
  • Audit any applications or services that process user-uploaded PDF files using Apache Tika
  • Implement network segmentation to limit outbound connectivity from document processing systems

Patch Information

The vulnerability is fixed in tika-core version 3.2.2 and later. Review the Apache mailing list discussion for detailed patch information and upgrade guidance. Upgrading tika-core itself is essential: upgrading only the PDF module leaves the vulnerable code in place. For additional context, the CVE-2025-54988 record documents the initial disclosure of this vulnerability.
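
One way to confirm which tika-core version a build actually resolves is to inspect the output of `mvn dependency:tree`. The sketch below assumes the default text output, where each dependency appears as `groupId:artifactId:type:version:scope`, with jar packaging and no classifier. The function names and the sample tree are made up for illustration.

```python
import re

# Matches e.g. "org.apache.tika:tika-core:jar:3.2.1:compile".
# Assumes jar packaging and no classifier segment.
DEP_RE = re.compile(r"org\.apache\.tika:(tika-[\w-]+):jar:(\d+(?:\.\d+)*)")
FIXED_CORE = (3, 2, 2)  # first fixed tika-core release per the text above


def tika_versions(tree_text):
    """Map artifactId -> version tuple for Tika jars found in the tree.

    If an artifact appears more than once, the last occurrence wins.
    """
    found = {}
    for artifact, version in DEP_RE.findall(tree_text):
        found[artifact] = tuple(int(part) for part in version.split("."))
    return found


def core_needs_upgrade(tree_text):
    """True if tika-core is present and older than the fixed release.

    Does not special-case releases older than 1.13, which the advisory
    does not list as affected.
    """
    core = tika_versions(tree_text).get("tika-core")
    return core is not None and core < FIXED_CORE


# Hypothetical tree: the PDF module was upgraded, tika-core was not.
SAMPLE_TREE = r"""
[INFO] com.example:indexer:jar:1.0.0
[INFO] +- org.apache.tika:tika-parser-pdf-module:jar:3.2.2:compile
[INFO] |  \- org.apache.tika:tika-core:jar:3.2.1:compile
"""

if core_needs_upgrade(SAMPLE_TREE):
    print("tika-core is below 3.2.2; upgrade it explicitly")
```

The sample reproduces the situation this CVE was issued to highlight: the PDF module is at 3.2.2 while the resolved tika-core is still 3.2.1, so the check reports that an upgrade is needed.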

Workarounds

  • Configure XML parsers to disable external entity resolution at the application level if immediate patching is not possible
  • Implement input validation to reject PDF files containing XFA content until patches can be applied
  • Deploy web application firewalls (WAF) with rules to detect XXE payloads in uploaded files
  • Isolate document processing services in sandboxed environments with restricted network access

Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.
