CVE-2023-36464 Overview
CVE-2023-36464 is an infinite loop vulnerability in pypdf, an open-source, pure-Python PDF library. In affected versions, an attacker can craft a malicious PDF that triggers an infinite loop when the __parse_content_stream function is executed, for example when a user extracts text from the file. The result is a denial of service: the parsing process consumes CPU indefinitely and never returns.
Critical Impact
An attacker can craft a malicious PDF file that makes applications using pypdf hang indefinitely during text extraction, resulting in a denial of service.
Affected Products
- pypdf_project pypdf
- pypdf2_project pypdf2
Discovery Timeline
- 2023-06-27 - CVE-2023-36464 published to NVD
- 2024-11-21 - Last updated in NVD database
Technical Details for CVE-2023-36464
Vulnerability Analysis
This vulnerability is classified under CWE-835 (Loop with Unreachable Exit Condition), commonly referred to as an infinite loop. The flaw exists in pypdf's PDF content stream parsing. When parsing a content stream, the library reads bytes until it finds a carriage return (\r) or newline (\n). However, the parsing logic does not handle the case where the stream reaches end-of-file (EOF) before either character appears.
The attack vector is local and requires user interaction: a victim must open or process a maliciously crafted PDF file. The vulnerability does not affect confidentiality or integrity, but it can halt availability entirely, because the affected process consumes CPU indefinitely until it is terminated manually.
Root Cause
The root cause lies in a parsing loop within pypdf/generic/_data_structures.py. The loop runs while peek not in (b"\r", b"\n"), reading until it finds a carriage return or newline. It does not account for the empty byte string (b""), which a read returns once the end of the stream is reached. Because every further read at EOF also returns b"", the condition never becomes false, so a malformed PDF that lacks the expected line terminator makes the parser loop indefinitely.
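The loop described above can be modeled in isolation. The following is a minimal sketch, not pypdf's actual code: the function names skip_to_eol_vulnerable and skip_to_eol_fixed are hypothetical, io.BytesIO stands in for the PDF content stream, and the max_iterations cap exists only so the demonstration terminates (the real loop has no cap).

```python
import io


def skip_to_eol_vulnerable(stream: io.BytesIO, max_iterations: int = 10_000) -> int:
    """Hypothetical model of the vulnerable loop; returns the number of reads.

    max_iterations is NOT part of the original code; it only lets this demo
    stop instead of spinning forever once the stream is exhausted.
    """
    peek = stream.read(1)
    reads = 1
    while peek not in (b"\r", b"\n"):  # b"" at EOF never matches
        if reads >= max_iterations:
            raise RuntimeError(f"no line terminator after {reads} reads")
        peek = stream.read(1)  # keeps returning b"" once EOF is reached
        reads += 1
    return reads


def skip_to_eol_fixed(stream: io.BytesIO) -> int:
    """Hypothetical model of the patched loop: b"" (EOF) also ends it."""
    peek = stream.read(1)
    reads = 1
    while peek not in (b"\r", b"\n", b""):
        peek = stream.read(1)
        reads += 1
    return reads


data = b"no line terminator here"  # bytes with no \r or \n before EOF
print("fixed loop stopped after", skip_to_eol_fixed(io.BytesIO(data)), "reads")
try:
    skip_to_eol_vulnerable(io.BytesIO(data), max_iterations=1_000)
except RuntimeError as exc:
    print("vulnerable loop:", exc)
```

The only difference between the two functions is the extra b"" in the membership test, which mirrors the one-line fix described in the advisory.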
Attack Vector
The attack vector is local and requires user interaction. An attacker must distribute a specially crafted PDF file and convince a user to process it with an application that uses the vulnerable pypdf library. Common scenarios include:
- Email attachments containing malicious PDFs
- PDF files uploaded to web applications that perform server-side text extraction
- Document processing pipelines that automatically parse PDF content
- Any application using pypdf's text extraction functionality on untrusted PDF files
The vulnerability is triggered when the __parse_content_stream function processes the malicious content, such as during text extraction operations.
Detection Methods for CVE-2023-36464
Indicators of Compromise
- Processes using the pypdf or PyPDF2 libraries consuming excessive CPU resources for extended periods
- Application hangs or freezes when opening or processing specific PDF files
- Abnormally long processing times for PDF text extraction operations
- System resource exhaustion symptoms coinciding with PDF processing activities
Detection Strategies
- Monitor application processes for unusually long-running PDF parsing operations
- Implement timeouts for PDF processing operations to detect potential infinite loop conditions
- Audit Python dependencies to identify vulnerable pypdf versions in your environment
- Use software composition analysis (SCA) tools to scan for affected library versions
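For the dependency-audit item, a minimal sketch using only the standard library is shown below. It is an illustration under stated assumptions: the helper names are hypothetical, the first fixed version must be supplied by the caller from GHSA-4vvm-4w3v-6mr8 (it is deliberately not hard-coded here), and the version parser compares only the leading numeric release segment, ignoring pre-release and local tags.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple


def parse_release(v: str) -> Tuple[int, ...]:
    """Leading numeric release segment: '3.8.1' -> (3, 8, 1), '3.9.0rc1' -> (3, 9, 0).

    Simplification: pre-release, post-release and local tags are ignored.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ("0" <= ch <= "9"):
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # a suffix such as "rc1" ends the release segment
            break
    return tuple(parts)


def is_older(installed: str, fixed: str) -> bool:
    """True if `installed` sorts before `fixed` (zero-padded, so 3.9 == 3.9.0)."""
    a, b = parse_release(installed), parse_release(fixed)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def audit(fixed_versions: Dict[str, str]) -> Dict[str, str]:
    """Map each installed distribution older than its fix to its installed version.

    fixed_versions: distribution name -> first fixed version, taken from the advisory.
    Distributions that are not installed are skipped.
    """
    findings = {}
    for name, fixed in fixed_versions.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            continue
        if is_older(installed, fixed):
            findings[name] = installed
    return findings
```

A call would look like audit({"pypdf": fixed}), where fixed is the patched release listed in the advisory. In practice, packaging.version.Version implements the full version grammar and is a safer comparator, and tools such as pip-audit perform the same check across a whole environment.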
Monitoring Recommendations
- Configure application-level timeouts for all PDF processing operations
- Monitor CPU utilization patterns for processes that handle PDF files
- Set up alerts for processes that exceed expected execution times during document processing
- Implement health checks for services that depend on pypdf for PDF parsing
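For the alerting recommendations, one approach is a watchdog timer that fires if a single PDF operation outlives its expected duration. This is a minimal sketch; run_with_watchdog, on_overrun, and the logger name are hypothetical. It only reports the overrun: a Python thread stuck in a pure-Python loop cannot be stopped from another thread, so enforcement needs process isolation.

```python
import logging
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("pdf-watchdog")  # hypothetical logger name


def run_with_watchdog(
    func: Callable[[], T],
    expected_seconds: float,
    on_overrun: Optional[Callable[[], None]] = None,
) -> T:
    """Call func() and raise an alert if it is still running after expected_seconds.

    The alert does not interrupt func; it only logs (and optionally calls
    on_overrun) so that monitoring can flag a possible infinite loop.
    """

    def alert() -> None:
        log.warning(
            "PDF processing still running after %.1f s; possible infinite loop",
            expected_seconds,
        )
        if on_overrun is not None:
            on_overrun()

    timer = threading.Timer(expected_seconds, alert)
    timer.daemon = True  # never keep the interpreter alive just for the alert
    timer.start()
    try:
        return func()
    finally:
        timer.cancel()  # no-op if the alert already fired
```

Wrapped around pypdf, a call might look like run_with_watchdog(lambda: page.extract_text(), expected_seconds=30), where page is a pypdf page object and the 30-second budget is an arbitrary placeholder to tune against normal workloads.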
How to Mitigate CVE-2023-36464
Immediate Actions Required
- Upgrade pypdf to a patched version that includes the fix from pull request #1828
- Audit all applications and services using the pypdf or PyPDF2 libraries
- Implement input validation and processing timeouts for PDF file handling
- Restrict PDF processing to trusted sources where possible
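A processing timeout only helps against an infinite loop if the stuck work can actually be killed. The sketch below, which is one possible approach rather than anything pypdf provides, runs extraction in a separate Python interpreter via subprocess.run, which terminates the child when the timeout expires. The function names, the EXTRACT_CODE snippet, and the choice to return None on timeout are assumptions for illustration.

```python
import subprocess
import sys
from typing import List, Optional

# Hypothetical child script: extracts text from the PDF path given as argv[1].
EXTRACT_CODE = """
import sys
from pypdf import PdfReader

reader = PdfReader(sys.argv[1])
print("\\n".join(page.extract_text() for page in reader.pages))
"""


def run_python_with_timeout(code: str, args: List[str], timeout: float) -> Optional[str]:
    """Run `code` in a fresh interpreter and return its stdout.

    Returns None if the child is still running after `timeout` seconds;
    subprocess.run kills the child before raising TimeoutExpired.
    Raises CalledProcessError if the child exits with a non-zero status.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code, *args],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.stdout


def extract_text_isolated(pdf_path: str, timeout: float = 30.0) -> Optional[str]:
    """Extract text from an untrusted PDF without risking a hang in this process."""
    return run_python_with_timeout(EXTRACT_CODE, [pdf_path], timeout)
```

A None result can be treated as a rejected document. Starting one interpreter per file adds overhead; a long-lived worker pool that kills and replaces stuck workers would amortize it at the cost of more code.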
Patch Information
The vulnerability was introduced in pull request #969 and fixed in pull request #1828. Users should upgrade to a pypdf release that includes this fix. GitHub Security Advisory GHSA-4vvm-4w3v-6mr8 lists the affected versions and remediation details.
Workarounds
- For users unable to upgrade immediately, apply a manual patch by changing the line while peek not in (b"\r", b"\n") in pypdf/generic/_data_structures.py to while peek not in (b"\r", b"\n", b""), so that the loop also terminates at end of file
- Implement processing timeouts around pypdf text extraction calls to prevent indefinite hangs
- Validate PDF files using alternative tools before processing with vulnerable pypdf versions
- Consider using sandboxed environments for processing untrusted PDF files
# Manual workaround: edit pypdf/generic/_data_structures.py
# Change the line:
#     while peek not in (b"\r", b"\n"):
# to the following, so that b"" (returned by a read at EOF) also ends the loop:
#     while peek not in (b"\r", b"\n", b""):
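Where many environments need the same stopgap, the manual edit can be scripted. This is a hedged sketch, not an official tool: it assumes the vulnerable line appears in the installed file exactly as quoted in the advisory (with the trailing colon a Python while statement requires), and the helper names are hypothetical. A replacement count of 0 means either the file is already patched or its code differs from that assumption. Any edit is lost the next time pypdf is reinstalled, so upgrading remains the real fix.

```python
import importlib.util
from pathlib import Path
from typing import Tuple

# Source text as quoted in the advisory (assumption: spacing matches exactly).
VULNERABLE = r'while peek not in (b"\r", b"\n"):'
PATCHED = r'while peek not in (b"\r", b"\n", b""):'


def patch_source(source: str) -> Tuple[str, int]:
    """Return (new_source, replacements); idempotent because PATCHED does not contain VULNERABLE."""
    count = source.count(VULNERABLE)
    return source.replace(VULNERABLE, PATCHED), count


def locate_data_structures() -> Path:
    """Locate pypdf/generic/_data_structures.py in the active environment."""
    spec = importlib.util.find_spec("pypdf")
    if spec is None or spec.origin is None:
        raise RuntimeError("pypdf is not installed in this environment")
    # spec.origin points at pypdf/__init__.py; the target lives in pypdf/generic/.
    return Path(spec.origin).parent / "generic" / "_data_structures.py"


def apply_workaround(path: Path) -> int:
    """Patch the file in place and return the number of lines changed."""
    source = path.read_text(encoding="utf-8")
    patched, count = patch_source(source)
    if count:
        path.write_text(patched, encoding="utf-8")
    return count
```

Running apply_workaround(locate_data_structures()) once per environment applies the edit; it needs write access to site-packages.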
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.

