CVE-2026-33929 Overview
CVE-2026-33929 is a Path Traversal vulnerability (CWE-22) affecting the ExtractEmbeddedFiles example in Apache PDFBox. It exists because the fix for a previously disclosed path traversal issue (CVE-2026-23907) was incomplete. The remediation shipped in versions 2.0.36 and 3.0.7 does not account for the file path separator when checking the destination, so a malicious PDF can cause files to be written outside the intended extraction directory, in locations whose path begins with the same characters.
When a user processes a maliciously crafted PDF file using the vulnerable ExtractEmbeddedFiles example code, an attacker can potentially write files to unintended locations on the filesystem. For example, a user with write permissions to /home/ABC could be victimized by a malicious PDF that attempts to write to paths like /home/ABCDEF, effectively bypassing the intended directory restriction.
Critical Impact
Attackers can exploit this path traversal vulnerability to write malicious files outside the intended extraction directory, potentially enabling code execution or data compromise in affected environments.
Affected Products
- Apache PDFBox ExtractEmbeddedFiles example versions 2.0.24 through 2.0.36
- Apache PDFBox ExtractEmbeddedFiles example versions 3.0.0 through 3.0.7
- Applications that have copied the vulnerable ExtractEmbeddedFiles example code into production
Discovery Timeline
- April 14, 2026 - CVE-2026-33929 published to NVD
- April 14, 2026 - Last updated in NVD database
Technical Details for CVE-2026-33929
Vulnerability Analysis
This path traversal vulnerability exists within the ExtractEmbeddedFiles example code of Apache PDFBox. The vulnerability is a bypass of an earlier security fix implemented for CVE-2026-23907. The original patch attempted to restrict file extraction to a designated directory, but the implementation contains a critical flaw in how it validates file paths.
The issue stems from the code not properly considering the file path separator character when validating the target extraction path. This incomplete validation allows an attacker to construct a filename that appears to be within the allowed directory but actually resolves to a different path on the filesystem.
For instance, if the extraction code permits writing to /home/ABC, an attacker can craft a PDF with embedded files that target /home/ABCDEF or other paths that share the same prefix but are not within the intended directory. The path validation check incorrectly passes because it only verifies that the path starts with the allowed prefix, without ensuring the path actually descends into the allowed directory.
Root Cause
The root cause is an incomplete containment check in the file path validation logic. When extracting embedded files from a PDF document, the code validates that the destination path begins with the allowed directory path, but it compares the two as plain string prefixes. Because the check does not require a directory separator immediately after the prefix, a sibling directory whose name starts with the same characters (such as /home/ABCDEF for an allowed directory of /home/ABC) passes validation even though it is not inside the allowed directory.
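The prefix problem can be illustrated with a short Java sketch. This is not the PDFBox source: the class name PrefixCheckDemo, the method naivePrefixCheck, and the embedded filename are hypothetical, chosen only to show how a string-prefix comparison without a separator accepts a sibling directory.

```java
import java.io.File;
import java.io.IOException;

// Illustrative only: reproduces the class of bug described in this advisory,
// not the actual PDFBox example code.
class PrefixCheckDemo {

    // Flawed containment check: compares canonical paths as raw strings,
    // so any path that merely begins with the same characters is accepted.
    static boolean naivePrefixCheck(File allowedDir, File target) throws IOException {
        String dirPath = allowedDir.getCanonicalPath();
        String targetPath = target.getCanonicalPath();
        return targetPath.startsWith(dirPath);
    }

    public static void main(String[] args) throws IOException {
        File allowedDir = new File("/home/ABC");
        // Hypothetical embedded filename supplied by a malicious PDF.
        String embeddedName = "../ABCDEF/payload.txt";
        File target = new File(allowedDir, embeddedName);

        System.out.println("resolved target:    " + target.getCanonicalPath());
        System.out.println("naive check passes: " + naivePrefixCheck(allowedDir, target));
    }
}
```

The resolved target lies under a sibling directory, yet the comparison is expected to accept it, because the sibling's path begins with the same characters as the allowed directory's path.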
Attack Vector
The malicious PDF can be delivered over the network, but exploitation requires a user with write permissions on the target directory to process it. The attacker creates a PDF document containing embedded files with specially crafted filenames designed to exploit the flawed path validation. When the victim uses the ExtractEmbeddedFiles utility to process this PDF, the embedded files are written to unintended locations on the filesystem.
The exploitation scenario involves:
- An attacker crafting a PDF with embedded files containing path traversal sequences
- The victim downloading and processing the malicious PDF using the vulnerable example code
- Files being extracted to directories outside the intended extraction location
- Potential for overwriting critical files or placing malicious executables in sensitive locations
The attack relies only on the victim's existing privileges (write access to the target path), and the only user interaction needed is processing the PDF file.
Detection Methods for CVE-2026-33929
Indicators of Compromise
- Unexpected file writes to directories outside designated extraction paths
- PDF files with embedded filenames containing path separators, parent-directory references (..), or other unusual path patterns
- File creation events in directories sharing common prefixes with legitimate extraction directories
- Presence of unexpected files in user home directories or application directories
Detection Strategies
- Monitor file system activity during PDF processing operations for writes to unexpected locations
- Implement file integrity monitoring on sensitive directories that share prefixes with PDF extraction targets
- Review application logs for extraction operations where destination paths do not include proper directory separators
- Deploy endpoint detection rules to alert on suspicious file writes following PDF document processing
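As one possible way to review logged extraction destinations, the sketch below flags paths that are not inside the extraction directory. It assumes the application logs absolute destination paths; the class name ExtractionLogAudit and the method findEscapes are hypothetical. It relies on java.nio.file.Path.startsWith, which compares whole path elements rather than characters. Note that normalize() is purely lexical and does not resolve symbolic links.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical audit helper; not part of PDFBox.
class ExtractionLogAudit {

    // Returns the logged destinations that are not inside extractionDir.
    static List<Path> findEscapes(Path extractionDir, List<String> loggedDestinations) {
        Path root = extractionDir.toAbsolutePath().normalize();
        List<Path> escapes = new ArrayList<>();
        for (String dest : loggedDestinations) {
            Path p = Paths.get(dest).toAbsolutePath().normalize();
            // Path.startsWith matches complete name elements, so
            // /home/ABCDEF/x does not "start with" /home/ABC.
            if (!p.startsWith(root)) {
                escapes.add(p);
            }
        }
        return escapes;
    }
}
```

For an extraction directory of /home/ABC, a logged destination under /home/ABCDEF would be reported, while one under /home/ABC would not.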
Monitoring Recommendations
- Enable audit logging for file creation events in directories adjacent to PDF extraction targets
- Monitor for PDF files being processed that contain unusually long embedded filenames
- Set up alerts for applications using Apache PDFBox versions in the affected range
- Review code repositories for copies of the vulnerable ExtractEmbeddedFiles example code
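For alerting on affected versions, a simple range check along the following lines could be used. It is a sketch that encodes only the ranges listed in this advisory (2.0.24 through 2.0.36 and 3.0.0 through 3.0.7) and assumes plain major.minor.patch version strings; the class and method names are hypothetical. Because the flaw lives in example code, a match on the library version does not by itself show that the vulnerable code is in use, and copied code can be vulnerable regardless of the library version.

```java
// Hypothetical helper; encodes the affected ranges stated in this advisory.
class PdfBoxVersionCheck {

    // Expects a plain "major.minor.patch" string such as "2.0.36".
    // Throws IllegalArgumentException for any other format (for example
    // pre-release suffixes), so those cases can be reviewed manually.
    static boolean isInAffectedRange(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unsupported version format: " + version);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        int patch = Integer.parseInt(parts[2]);

        if (major == 2 && minor == 0) {
            return patch >= 24 && patch <= 36;
        }
        if (major == 3 && minor == 0) {
            return patch >= 0 && patch <= 7;
        }
        return false;
    }
}
```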
How to Mitigate CVE-2026-33929
Immediate Actions Required
- Update Apache PDFBox to version 2.0.37 or 3.0.8 once those releases are available
- Until then, apply the fix provided in GitHub PR 427
- Audit production code for any copies of the ExtractEmbeddedFiles example and apply the patch
- Restrict permissions on directories that could be targeted through prefix manipulation
Patch Information
Apache has acknowledged this vulnerability and recommends updating to version 2.0.37 or 3.0.8 once available. In the interim, users should apply the fix provided in the GitHub PDFBox Pull Request. Organizations that have incorporated the ExtractEmbeddedFiles example code into their own applications must manually apply the security fix to their codebase.
Additional details are available in the related Apache mailing list threads.
Workarounds
- Implement additional path validation that ensures the canonical path of extracted files is strictly within the intended directory
- Use getCanonicalPath() to resolve the actual destination path and verify that it starts with the canonical path of the allowed directory followed by a trailing separator (File.separator)
- Process PDF files in an isolated environment or sandbox with restricted filesystem permissions
- Avoid processing untrusted PDF documents until the patch is applied
- Configure file system permissions to prevent writes to directories that share prefixes with extraction targets
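The canonical-path workaround above can be sketched as follows. This is a minimal illustration, not the official patch from the pull request: the class name SafeExtractionPath and the method resolveInside are hypothetical. Only java.io.File.getCanonicalPath() and File.separator from the standard library are assumed.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper illustrating the workaround; not the official PDFBox fix.
class SafeExtractionPath {

    // Resolves embeddedName against extractionDir and returns the target file
    // only if its canonical path is strictly inside extractionDir.
    static File resolveInside(File extractionDir, String embeddedName) throws IOException {
        String dirPath = extractionDir.getCanonicalPath();
        // Append the separator so "/home/ABC" becomes "/home/ABC/" and
        // no longer matches "/home/ABCDEF/...".
        String dirPrefix = dirPath.endsWith(File.separator) ? dirPath : dirPath + File.separator;

        File target = new File(extractionDir, embeddedName);
        String targetPath = target.getCanonicalPath();

        if (!targetPath.startsWith(dirPrefix)) {
            throw new IOException("Refusing to extract outside " + dirPath + ": " + embeddedName);
        }
        return target.getCanonicalFile();
    }
}
```

A caller would pass each embedded filename through resolveInside before opening an output stream and skip or report any entry that throws. Comparing against the directory path plus separator also rejects a name that resolves to the extraction directory itself.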

