CVE-2026-33123 Overview
pypdf is a free and open-source pure-Python PDF library widely used for parsing, manipulating, and generating PDF documents. Versions prior to 6.9.1 contain a resource exhaustion vulnerability: an attacker can craft a malicious PDF file that causes long runtimes and/or large memory usage when the library processes it. The vulnerability manifests when accessing an array-based stream containing many entries, enabling denial of service conditions against applications using pypdf.
Critical Impact
Applications processing untrusted PDF files using vulnerable pypdf versions may experience denial of service through CPU and memory exhaustion, potentially affecting service availability and system stability.
Affected Products
- pypdf versions prior to 6.9.1
- Applications and services using pypdf for PDF processing
- Python environments with vulnerable pypdf installations
Discovery Timeline
- 2026-03-20 - CVE-2026-33123 published to NVD
- 2026-03-23 - Last updated in NVD database
Technical Details for CVE-2026-33123
Vulnerability Analysis
This vulnerability is classified as CWE-400 (Uncontrolled Resource Consumption), a type of denial of service vulnerability that occurs when an application fails to properly limit resource allocation. In the context of pypdf, the library does not adequately constrain processing when handling PDF array-based streams with excessive entries.
When pypdf parses a maliciously crafted PDF file containing an array-based stream with a large number of entries, the library attempts to process all elements without implementing appropriate resource boundaries. This results in algorithmic complexity issues where processing time and memory consumption scale disproportionately with input size, enabling an attacker to trigger denial of service conditions with relatively small malicious files.
The attack vector is classified as local: the attacker does not interact with pypdf over the network but must get a malicious PDF file in front of an application that uses the vulnerable library. In practice this can happen through file upload functionality, email attachment processing, document management systems, or any workflow that processes user-supplied PDF documents.
Root Cause
The root cause of this vulnerability lies in the insufficient bounds checking and resource management when processing array-based streams within PDF documents. The pypdf library's stream parsing logic did not implement adequate safeguards to prevent excessive resource consumption when encountering arrays with abnormally large numbers of entries. Without proper limits on iteration depth or memory allocation thresholds, the library becomes susceptible to algorithmic complexity attacks.
Attack Vector
To exploit the issue, an attacker supplies a specially crafted PDF file to an application running a vulnerable pypdf version. The malicious PDF contains an array-based stream with an exceptionally large number of entries designed to trigger resource exhaustion.
When the target application attempts to parse or process the malicious PDF, pypdf enters a processing loop that consumes excessive CPU cycles and/or memory. Depending on the application's architecture and deployment environment, this can result in:
- Application hangs or crashes
- Service unavailability for legitimate users
- System-wide resource exhaustion affecting co-hosted services
- Potential cascading failures in distributed systems
The vulnerability mechanism involves crafting PDF structures that exploit the array stream parsing functionality. For detailed technical information about the vulnerable code paths and fix implementation, refer to the GitHub Pull Request #3686 and Security Advisory GHSA-qpxp-75px-xjcp.
Detection Methods for CVE-2026-33123
Indicators of Compromise
- Unusual CPU spikes during PDF processing operations
- Excessive memory consumption by Python processes handling PDF files
- Application timeouts or hangs when processing specific PDF documents
- Error logs indicating memory exhaustion or process termination during PDF parsing
Detection Strategies
- Monitor resource utilization patterns for PDF processing workflows to identify anomalous consumption
- Implement file analysis scanning for PDFs with unusually large array structures before processing
- Deploy application-level timeout mechanisms to detect and terminate long-running PDF operations
- Audit dependency versions to identify applications using pypdf versions prior to 6.9.1
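The pre-processing scan mentioned above can be approximated with a byte-level heuristic. The sketch below assumes that the relevant array is a page's /Contents array of stream references, which the advisory text does not state explicitly; the function names and the threshold of 1000 entries are hypothetical. It only inspects uncompressed PDF syntax, so arrays stored inside compressed object streams will be missed, and it should be treated as a coarse filter rather than a reliable detector.

```python
import re

# Heuristic patterns for uncompressed PDF syntax only (assumption: the
# oversized array is a page's /Contents array of indirect references).
CONTENTS_ARRAY = re.compile(rb"/Contents\s*\[([^\]]*)\]")
INDIRECT_REF = re.compile(rb"\d+\s+\d+\s+R")


def max_contents_array_len(pdf_bytes: bytes) -> int:
    """Return the largest number of indirect references in any /Contents array."""
    longest = 0
    for match in CONTENTS_ARRAY.finditer(pdf_bytes):
        longest = max(longest, len(INDIRECT_REF.findall(match.group(1))))
    return longest


def looks_suspicious(pdf_bytes: bytes, max_entries: int = 1000) -> bool:
    """Flag files whose largest /Contents array exceeds a hypothetical threshold."""
    return max_contents_array_len(pdf_bytes) > max_entries
```

A file flagged by `looks_suspicious` could be rejected or routed to an isolated worker with stricter limits instead of being parsed inline.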
Monitoring Recommendations
- Configure alerting on Python process memory and CPU thresholds during PDF processing
- Implement request timeout monitoring for services that accept PDF uploads
- Review application logs for patterns indicating repeated resource exhaustion events
- Deploy SentinelOne agents to monitor for process behavior anomalies associated with resource exhaustion attacks
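As one way to feed such alerts, an application can measure each PDF operation itself. The following is a minimal sketch using only the Python standard library; `monitored_call`, the logger name, and the default thresholds are illustrative assumptions, not part of pypdf. It records wall-clock time and peak Python heap usage for a single call and logs a warning when either exceeds its threshold. It observes after the fact and does not interrupt a runaway call.

```python
import logging
import time
import tracemalloc

logger = logging.getLogger("pdf_monitor")  # hypothetical logger name


def monitored_call(func, *args, max_seconds=10.0,
                   max_peak_bytes=256 * 1024 * 1024, **kwargs):
    """Call func and report elapsed wall time and peak traced Python memory."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    finally:
        # Always collect measurements, even if func raised.
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    exceeded = elapsed > max_seconds or peak > max_peak_bytes
    if exceeded:
        logger.warning("PDF operation exceeded limits: %.2fs, %d bytes peak",
                       elapsed, peak)
    return result, {"seconds": elapsed, "peak_bytes": peak, "exceeded": exceeded}
```

Wrapping the function that opens and reads a PDF with `monitored_call` yields per-document measurements that can be exported to an existing metrics or alerting pipeline. Note that `tracemalloc` adds overhead and only tracks memory allocated through Python's allocators.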
How to Mitigate CVE-2026-33123
Immediate Actions Required
- Upgrade pypdf to version 6.9.1 or later immediately across all affected systems
- Audit all applications and services that process PDF files to identify pypdf dependencies
- Implement resource limits (memory caps, CPU time limits) for PDF processing operations as a defense-in-depth measure
- Consider isolating PDF processing in sandboxed environments or containers with strict resource quotas
Patch Information
The vulnerability has been addressed in pypdf version 6.9.1. Organizations should upgrade to this version or later to remediate the vulnerability. The fix implements proper resource management and bounds checking for array-based stream processing.
Upgrade using pip:
pip install --upgrade "pypdf>=6.9.1"
For detailed information about the fix, see the GitHub Release 6.9.1.
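To confirm which version is actually installed in a given environment, a short check along the following lines may help. This is a sketch that assumes plain `X.Y.Z` version strings; the helper names are hypothetical, and pre-release suffixes such as `rc1` are ignored, so a pre-release of 6.9.1 would be reported as patched.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_RELEASE = (6, 9, 1)  # first fixed version according to the advisory


def parse_release(ver: str) -> tuple:
    """Parse the leading numeric parts of 'X.Y.Z' into a 3-tuple of ints."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_vulnerable(ver: str) -> bool:
    """Return True if the given version is older than the fixed release."""
    return parse_release(ver) < FIXED_RELEASE


def installed_pypdf_status() -> str:
    """Describe the pypdf installation in the current environment."""
    try:
        ver = version("pypdf")
    except PackageNotFoundError:
        return "pypdf is not installed"
    state = "VULNERABLE" if is_vulnerable(ver) else "patched"
    return f"pypdf {ver}: {state}"


if __name__ == "__main__":
    print(installed_pypdf_status())
```

Running the script inside each virtual environment or container image gives a quick per-environment answer; for fleet-wide audits a dependency scanner is likely a better fit.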
Workarounds
- Implement processing timeouts at the application level to terminate long-running PDF operations
- Apply memory limits to processes handling PDF files using OS-level controls or container resource constraints
- Pre-screen uploaded PDF files using file size limits and structural analysis before processing with pypdf
- Deploy PDF processing in isolated environments to prevent resource exhaustion from affecting critical services
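# Example: running pypdf processing with resource limits using timeout and ulimit
# ulimit -v takes KiB, so 524288 caps virtual memory at 512 MiB; timeout stops the run after 30 seconds
# The subshell (parentheses) keeps the ulimit from applying to the rest of the current shell session
(ulimit -v 524288 && timeout 30 python process_pdf.py input.pdf)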
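The same timeout and memory-cap workaround can be applied from Python when the PDF work is launched as a child process. The sketch below is POSIX-only and uses the standard `resource` and `subprocess` modules; `run_limited` and its defaults are hypothetical names chosen for illustration, and `process_pdf.py` is the script from the shell example above.

```python
import resource
import subprocess
import sys


def run_limited(argv, timeout_s=30, max_mem_bytes=512 * 1024 * 1024):
    """Run argv in a child process with a wall-clock timeout and a memory cap."""

    def apply_limits():
        # Runs in the child just before exec: lower the soft address-space
        # limit, never exceeding the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = max_mem_bytes
        if hard != resource.RLIM_INFINITY:
            limit = min(max_mem_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s, preexec_fn=apply_limits)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return ("timeout", None)
    return ("ok" if proc.returncode == 0 else "failed", proc.returncode)


if __name__ == "__main__":
    print(run_limited([sys.executable, "process_pdf.py", "input.pdf"]))
```

Isolating the parse in a child process means a runaway document takes down only that process, and the caller can treat a `"timeout"` or `"failed"` status as a rejected upload.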