CVE-2026-33929: Apache PDFBox Path Traversal Vulnerability

CVE-2026-33929 Overview

CVE-2026-33929 is a path traversal vulnerability (CWE-22) affecting the ExtractEmbeddedFiles example in Apache PDFBox. It results from an incomplete fix for a previously disclosed path traversal issue (CVE-2026-23907). The remediation shipped in versions 2.0.36 and 3.0.7 does not account for the file path separator when checking the destination path, so a crafted PDF file can write to paths outside the intended directory that merely share its path as a prefix.

When a user processes a maliciously crafted PDF file using the vulnerable ExtractEmbeddedFiles example code, an attacker can potentially write files to unintended locations on the filesystem. For example, a user with write permissions on /home/ABC could fall victim to a malicious PDF that attempts to write to /home/ABCDEF, bypassing the intended directory restriction.

Critical Impact
Attackers can exploit this path traversal vulnerability to write malicious files outside the intended extraction directory, to locations whose path shares that directory's prefix, potentially enabling code execution or data compromise in affected environments.

Affected Products

Apache PDFBox ExtractEmbeddedFiles example versions 2.0.24 through 2.0.36
Apache PDFBox ExtractEmbeddedFiles example versions 3.0.0 through 3.0.7
Applications that have copied the vulnerable ExtractEmbeddedFiles example code into production

Discovery Timeline

April 14, 2026 - CVE-2026-33929 published to NVD
April 14, 2026 - Last updated in NVD database

Technical Details for CVE-2026-33929

Vulnerability Analysis

This path traversal vulnerability exists within the ExtractEmbeddedFiles example code of Apache PDFBox. The vulnerability is a bypass of an earlier security fix implemented for CVE-2026-23907. The original patch attempted to restrict file extraction to a designated directory, but the implementation contains a critical flaw in how it validates file paths.

The issue stems from the code not properly considering the file path separator character when validating the target extraction path. This incomplete validation allows an attacker to construct a filename that appears to be within the allowed directory but actually resolves to a different path on the filesystem.

For instance, if the extraction code permits writing to /home/ABC, an attacker can craft a PDF with embedded files that target /home/ABCDEF or other paths that share the same prefix but are not within the intended directory. The path validation check incorrectly passes because it only verifies that the path starts with the allowed prefix, without ensuring the path actually descends into the allowed directory.
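The difference between a string-prefix comparison and a component-wise comparison can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the PDFBox source: the class PrefixCheckDemo and both method names are hypothetical, and the paths reuse the /home/ABC example from above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: hypothetical names, not code from Apache PDFBox.
final class PrefixCheckDemo {

    // Flawed check: compares the paths as plain strings, so any path whose
    // text begins with the base directory's text is accepted.
    static boolean naiveIsInside(String baseDir, String target) {
        String base = Paths.get(baseDir).normalize().toString();
        String resolved = Paths.get(target).normalize().toString();
        return resolved.startsWith(base);
    }

    // Component-wise check: Path.startsWith compares whole name elements,
    // so "ABCDEF" is not treated as a match for "ABC".
    static boolean componentIsInside(String baseDir, String target) {
        Path base = Paths.get(baseDir).normalize();
        Path resolved = Paths.get(target).normalize();
        return resolved.startsWith(base);
    }

    public static void main(String[] args) {
        String base = "/home/ABC";
        String crafted = "/home/ABC/../ABCDEF/payload.txt";
        System.out.println(naiveIsInside(base, crafted));     // true: check bypassed
        System.out.println(componentIsInside(base, crafted)); // false: rejected
    }
}
```

Both methods normalize the path lexically only; neither resolves symbolic links, which a production check would also need to consider.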

Root Cause

The root cause is improper input validation in the file path sanitization logic. When extracting embedded files from a PDF document, the code validates that the destination path begins with the allowed directory's path, but it does not require a directory separator immediately after that prefix. A sibling path such as /home/ABCDEF therefore passes the check for /home/ABC. This is a classic incomplete containment check: a prefix match is treated as proof that the destination lies inside the allowed directory.

Attack Vector

The malicious PDF can be delivered over the network, but exploitation requires a user with write permissions on the target directory to process it. The attacker creates a PDF document containing embedded files with specially crafted filenames designed to exploit the flawed path validation. When the victim uses the ExtractEmbeddedFiles example to process this PDF, the embedded files are written to unintended locations on the filesystem.

The exploitation scenario involves:

An attacker crafting a PDF with embedded files containing path traversal sequences
The victim downloading and processing the malicious PDF using the vulnerable example code
Files being extracted to directories outside the intended extraction location
Potential for overwriting critical files or placing malicious executables in sensitive locations

The attack requires low privileges (the victim must have write access to the target path) and no user interaction beyond processing the PDF file.

Detection Methods for CVE-2026-33929

Indicators of Compromise

Unexpected file writes to directories outside designated extraction paths
PDF files with embedded filenames containing parent-directory sequences or names that extend the extraction directory's own name
File creation events in directories sharing common prefixes with legitimate extraction directories
Presence of unexpected files in user home directories or application directories

Detection Strategies

Monitor file system activity during PDF processing operations for writes to unexpected locations
Implement file integrity monitoring on sensitive directories that share prefixes with PDF extraction targets
Review application logs for extraction operations where the resolved destination path does not begin with the extraction directory followed by a directory separator
Deploy endpoint detection rules to alert on suspicious file writes following PDF document processing
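To decide which locations to place under file integrity monitoring, it can help to enumerate the existing entries that share a name prefix with the extraction directory. The following Java sketch does that for one directory. The class PrefixSiblingFinder and its method are hypothetical helpers, not part of PDFBox or any monitoring product, and the sketch only sees entries that already exist: a crafted PDF could also create a new sibling.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for triage; names are assumptions made for illustration.
final class PrefixSiblingFinder {

    // Lists entries next to extractionDir whose names start with its name,
    // e.g. "ABCDEF" next to "ABC". The extraction directory itself is excluded.
    static List<Path> siblingsSharingPrefix(Path extractionDir) {
        Path dir = extractionDir.toAbsolutePath().normalize();
        Path parent = dir.getParent();
        if (parent == null || !Files.isDirectory(parent)) {
            return Collections.emptyList();
        }
        String name = dir.getFileName().toString();
        try (Stream<Path> entries = Files.list(parent)) {
            return entries
                    .filter(p -> {
                        String n = p.getFileName().toString();
                        return n.startsWith(name) && !n.equals(name);
                    })
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned paths are candidates for audit logging or integrity monitoring; an empty list does not mean the extraction directory is safe to use with the unpatched code.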

Monitoring Recommendations

Enable audit logging for file creation events in directories adjacent to PDF extraction targets
Monitor for PDF files being processed that contain unusually long embedded filenames
Set up alerts for applications using Apache PDFBox versions in the affected range
Review code repositories for copies of the vulnerable ExtractEmbeddedFiles example code

How to Mitigate CVE-2026-33929

Immediate Actions Required

Update Apache PDFBox to version 2.0.37 or 3.0.8 when available
Apply the fix provided in GitHub PR 427 immediately
Audit production code for any copies of the ExtractEmbeddedFiles example and apply the patch
Restrict permissions on directories that could be targeted through prefix manipulation

Patch Information

Apache has acknowledged this vulnerability and recommends updating to version 2.0.37 or 3.0.8 once available. In the interim, users should apply the fix provided in the GitHub PDFBox Pull Request. Organizations that have incorporated the ExtractEmbeddedFiles example code into their own applications must manually apply the security fix to their codebase.

Additional details are available in the Apache Mailing List Discussion and the Apache Mailing List Thread.

Workarounds

Implement additional path validation that ensures the canonical path of extracted files is strictly within the intended directory
Use getCanonicalPath() to resolve the actual file path and verify it starts with the allowed directory including the trailing separator
Process PDF files in an isolated environment or sandbox with restricted filesystem permissions
Avoid processing untrusted PDF documents until the patch is applied
Configure file system permissions to prevent writes to directories that share prefixes with extraction targets
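The getCanonicalPath() workaround listed above could look roughly like the following Java sketch. It is not the upstream fix from the PDFBox pull request: the class SafeExtractPath and its methods are hypothetical, and the sketch assumes the caller passes the raw embedded file name together with the intended extraction directory.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper; an assumption-laden sketch, not the upstream PDFBox patch.
final class SafeExtractPath {

    // True only if the canonical destination lies strictly inside targetDir.
    static boolean isInside(File targetDir, String embeddedName) {
        try {
            String base = targetDir.getCanonicalPath();
            // Append the separator so "/home/ABC" cannot match "/home/ABCDEF".
            String basePrefix = base.endsWith(File.separator) ? base : base + File.separator;
            String candidate = new File(targetDir, embeddedName).getCanonicalPath();
            return candidate.startsWith(basePrefix);
        } catch (IOException e) {
            return false; // fail closed if the path cannot be resolved
        }
    }

    // Returns the canonical destination file, or throws if it would escape targetDir.
    static File resolveInside(File targetDir, String embeddedName) throws IOException {
        if (!isInside(targetDir, embeddedName)) {
            throw new IOException("Refusing to write outside "
                    + targetDir.getPath() + ": " + embeddedName);
        }
        return new File(targetDir, embeddedName).getCanonicalFile();
    }
}
```

A caller would pass each embedded file name through resolveInside before opening an output stream, and skip or report any name that throws. Failing closed on resolution errors is a design choice here: an unresolvable path is treated as unsafe.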
