CVE-2026-22691 Overview
CVE-2026-22691 is a Denial of Service vulnerability affecting pypdf, a free and open-source pure-Python PDF library. Prior to version 6.6.0, pypdf is susceptible to resource exhaustion when processing PDF files containing malformed startxref entries. An attacker can craft a malicious PDF document that causes extended runtimes when the library attempts to rebuild the cross-reference table, particularly when the file contains excessive whitespace characters. This vulnerability only affects the non-strict reading mode of pypdf.
Critical Impact
Maliciously crafted PDF files can cause prolonged processing times, leading to resource exhaustion and potential denial of service in applications using pypdf in non-strict mode.
Affected Products
- pypdf versions prior to 6.6.0
- Applications using pypdf in non-strict reading mode
- Python-based PDF processing pipelines leveraging pypdf
Discovery Timeline
- 2026-01-10 - CVE-2026-22691 published to NVD
- 2026-01-13 - Last updated in NVD database
Technical Details for CVE-2026-22691
Vulnerability Analysis
This vulnerability is classified under CWE-400 (Uncontrolled Resource Consumption). The issue occurs during the cross-reference table rebuilding process in pypdf when operating in non-strict reading mode. When the library encounters a PDF file with an invalid or malformed startxref entry, it attempts to parse and reconstruct the cross-reference table. If the PDF contains large amounts of whitespace characters, this parsing operation can become computationally expensive, consuming excessive CPU time and potentially causing the application to become unresponsive.
The startxref keyword in a PDF file indicates the byte offset where the cross-reference table begins. When this value is malformed or points to an invalid location, pypdf's non-strict mode attempts to recover by searching through the document to locate valid cross-reference information. This recovery mechanism becomes problematic when the document is crafted with excessive whitespace, causing the parser to spend significant time iterating through meaningless content.
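To make the role of startxref concrete, the following is a minimal sketch of how a reader might locate the declared offset and check whether it points at plausible cross-reference data. It is an illustration only and not pypdf's implementation; the function names, the 1024-byte tail window, and the 32-byte probe are assumptions chosen for the example.

```python
import re

# Matches a trailing "startxref <offset> %%EOF" sequence near the end of a PDF.
_STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def find_startxref(data: bytes, tail_size: int = 1024):
    """Return the offset declared by the last startxref entry, or None."""
    # Only the tail is searched, so the regex work is bounded by tail_size.
    tail = data[-tail_size:]
    matches = _STARTXREF_RE.findall(tail)
    if not matches:
        return None
    return int(matches[-1])


def startxref_looks_valid(data: bytes) -> bool:
    """Check that the declared offset lands on an xref table or an object header."""
    offset = find_startxref(data)
    if offset is None or offset >= len(data):
        return False
    target = data[offset:offset + 32]
    # A classic table starts with "xref"; a cross-reference stream starts
    # with an indirect object header such as "12 0 obj".
    return target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target) is not None
```

When startxref_looks_valid returns False, a lenient reader has to fall back to scanning the document, which is the recovery path this advisory describes.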
Root Cause
The root cause is insufficient bounds checking and a lack of optimization in the cross-reference table rebuilding process. In non-strict mode, pypdf parses leniently so that it can handle slightly malformed PDF documents. This leniency creates an algorithmic complexity vulnerability: a document that combines large whitespace regions with an invalid startxref entry triggers worst-case parsing behavior. Affected versions lack adequate safeguards to limit processing time or to detect patterns indicative of malicious documents.
Attack Vector
The attack is network-accessible and requires no authentication. The only interaction needed is for the victim application to process a malicious PDF file. A typical attack proceeds as follows:
- The attacker creates a PDF file with a deliberately malformed startxref entry pointing to an invalid location
- The attacker inserts large quantities of whitespace characters throughout the document
- The attacker delivers the malicious PDF to an application using pypdf in non-strict reading mode
- The application attempts to parse the PDF, triggering the inefficient cross-reference rebuilding process
- The application experiences extended processing times, potentially leading to service degradation or denial of service
Applications that accept PDF uploads from untrusted sources, automated PDF processing pipelines, and web services that parse user-submitted PDFs are particularly at risk.
Detection Methods for CVE-2026-22691
Indicators of Compromise
- Unusually high CPU utilization during PDF processing operations
- Extended processing times for PDF file parsing tasks
- Application timeouts or unresponsiveness when handling specific PDF files
- PDF files with abnormally large file sizes relative to their content
- Log entries indicating failures or timeouts in PDF parsing functions
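One way to turn the "large file size relative to content" indicator into a measurable signal is to compute how much of a file consists of whitespace bytes. The sketch below is a heuristic under stated assumptions: the function names are hypothetical, any alerting threshold has to be tuned against your own traffic, and binary stream data can legitimately contain these byte values, so a high score is a hint rather than proof.

```python
import re

# The six whitespace bytes defined by PDF syntax: NUL, HT, LF, FF, CR, SP.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "
_WS_RUN = re.compile(rb"[\x00\t\n\x0c\r ]+")


def whitespace_ratio(data: bytes) -> float:
    """Fraction of bytes in `data` that are PDF whitespace characters."""
    if not data:
        return 0.0
    # translate(None, ...) deletes the listed bytes, leaving the non-whitespace ones.
    non_whitespace = len(data.translate(None, PDF_WHITESPACE))
    return (len(data) - non_whitespace) / len(data)


def longest_whitespace_run(data: bytes) -> int:
    """Length in bytes of the longest uninterrupted whitespace run."""
    return max((m.end() - m.start() for m in _WS_RUN.finditer(data)), default=0)
```

Both functions run in time linear in the file size, so they can be applied to uploads before the file reaches the parser.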
Detection Strategies
- Monitor pypdf version in application dependencies and flag any version below 6.6.0
- Implement processing time thresholds for PDF parsing operations and alert on exceeding limits
- Deploy file size and complexity analysis on incoming PDF files before processing
- Use static analysis tools to identify pypdf usage patterns, particularly non-strict mode configurations
- Review application logs for recurring PDF parsing failures or performance anomalies
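The version check in the first strategy can be automated at application start-up or in CI. The following is a minimal sketch that uses only the standard library; the helper names are hypothetical, and the parser deliberately handles only plain numeric release segments, so a pre-release such as 6.6.0rc1 is treated like 6.6.0. A dedicated version-parsing library would be more robust.

```python
from importlib import metadata

PATCHED = (6, 6, 0)  # first fixed release according to this advisory


def parse_release(version_text: str) -> tuple:
    """Parse leading numeric components, e.g. '6.5.0' -> (6, 5, 0)."""
    parts = []
    for piece in version_text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_vulnerable(version_text: str) -> bool:
    """True if the version is lower than the patched release."""
    release = parse_release(version_text)
    # Pad short versions such as '6.5' with zeros before comparing.
    padded = release + (0,) * (len(PATCHED) - len(release))
    return padded[:len(PATCHED)] < PATCHED


def installed_pypdf_version():
    """Return the installed pypdf version string, or None if not installed."""
    try:
        return metadata.version("pypdf")
    except metadata.PackageNotFoundError:
        return None
```

A start-up check could call installed_pypdf_version() and log a warning when is_vulnerable() returns True for the result.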
Monitoring Recommendations
- Establish baseline metrics for PDF processing times and monitor for significant deviations
- Implement resource usage monitoring (CPU, memory) on systems running pypdf-based applications
- Configure alerting for PDF processing tasks that exceed expected duration thresholds
- Track the ratio of PDF file size to processing time as an anomaly detection metric
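The size-to-time metric from the last recommendation can be sketched as a small timing wrapper. This example expresses the ratio as seconds per MiB, the inverse of size over time, so that larger values mean slower parsing. The function names, the baseline, and the factor of 10 are assumptions for illustration. Because the measurement is only available after parsing finishes, it complements a hard timeout rather than replacing one.

```python
import time


def timed_call(parse, data: bytes):
    """Run parse(data) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = parse(data)
    return result, time.perf_counter() - start


def seconds_per_mib(size_bytes: int, elapsed_s: float) -> float:
    """Normalize processing time by input size (1 MiB = 2**20 bytes)."""
    # The floor avoids division by zero for empty inputs.
    return elapsed_s / max(size_bytes / 2**20, 1e-9)


def is_anomalous(size_bytes: int, elapsed_s: float,
                 baseline_s_per_mib: float, factor: float = 10.0) -> bool:
    """Flag a parse that was `factor` times slower per MiB than the baseline."""
    return seconds_per_mib(size_bytes, elapsed_s) > factor * baseline_s_per_mib
```

In practice, `parse` would be whatever callable wraps the application's PDF handling, and the baseline would come from historical measurements of legitimate documents.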
How to Mitigate CVE-2026-22691
Immediate Actions Required
- Upgrade pypdf to version 6.6.0 or later immediately
- Review applications for non-strict mode usage and consider switching to strict mode where feasible
- Implement processing timeouts for all PDF parsing operations as a defense-in-depth measure
- Validate and sanitize PDF files from untrusted sources before processing
- Consider implementing file size limits for uploaded PDF documents
Patch Information
The vulnerability has been addressed in pypdf version 6.6.0. The fix was implemented in commit 294165726b646bb7799be1cc787f593f2fdbcf45 and is tracked in GitHub Pull Request #3594. Users should update to the patched version by running pip install --upgrade "pypdf>=6.6.0" (the quotes prevent the shell from interpreting > as an output redirect). Additional details are available in the GitHub Security Advisory GHSA-4f6g-68pf-7vhv and the release notes for version 6.6.0.
Workarounds
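- Enable strict mode when parsing PDF files by passing strict=True when constructing the pypdf reader; note that strict mode rejects malformed files that non-strict mode would otherwise tolerate
- Implement processing timeouts; Python's signal-based alarms work only on Unix and only in the main thread, and a thread-based timeout cannot forcibly stop a CPU-bound parse, so a separate process that can be killed is the more reliable option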
- Pre-validate PDF structure using alternative tools before passing to pypdf
- Deploy input file size restrictions to limit exposure to large malicious documents
- Isolate PDF processing in separate worker processes with resource limits
# Upgrade pypdf to a patched version (quote the specifier so the shell does not treat ">" as a redirect)
pip install --upgrade "pypdf>=6.6.0"
# Verify installed version
pip show pypdf | grep Version
# Alternative: require a patched version in requirements.txt
echo "pypdf>=6.6.0" >> requirements.txt
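The strict-mode, timeout, and process-isolation workarounds listed above can be combined. The following is a minimal sketch, assuming pypdf is installed in the interpreter that runs the child process. The helper names run_isolated and count_pages_safely and the 10-second default are hypothetical choices for this example. The child opens the file with PdfReader(..., strict=True), and the parent kills it if it exceeds the time limit.

```python
import subprocess
import sys

# Child-side snippet: open the PDF in strict mode and print the page count.
PARSE_SNIPPET = """
import sys
from pypdf import PdfReader
reader = PdfReader(sys.argv[1], strict=True)
print(len(reader.pages))
"""


def run_isolated(snippet: str, argument: str, timeout_s: float = 10.0):
    """Run `snippet` in a fresh interpreter.

    Returns the child's stdout, or None on timeout or non-zero exit.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", snippet, argument],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if done.returncode != 0:
        return None
    return done.stdout.strip()


def count_pages_safely(path: str, timeout_s: float = 10.0):
    """Return the page count, or None if parsing failed or timed out."""
    output = run_isolated(PARSE_SNIPPET, path, timeout_s)
    return int(output) if output is not None else None
```

Running the parser in a separate process means a stuck parse cannot block the main application, and operating-system resource limits can be applied to the child for additional containment.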

