CVE-2022-30973 Overview
CVE-2022-30973 is a Denial of Service vulnerability in Apache Tika affecting the 1.x branch. The vulnerability exists because the fix for a related issue (CVE-2022-30126) was not properly applied to the 1.28.2 release. A regular expression in the StandardsText class, utilized by the StandardsExtractingContentHandler, can cause catastrophic backtracking when processing a specially crafted file, leading to denial of service conditions.
Critical Impact
Attackers can exploit this regex vulnerability to cause application hangs or resource exhaustion through specially crafted files, disrupting document processing services that rely on Apache Tika's StandardsExtractingContentHandler.
Affected Products
- Apache Tika 1.x versions prior to 1.28.3
- Applications using StandardsExtractingContentHandler for document processing
- Systems running vulnerable Apache Tika versions with standards extraction enabled
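When inventorying dependencies, the affected range above can be checked programmatically. The following is a minimal sketch, not an official tool: the class and method names are hypothetical, and it encodes only the range stated in this advisory (1.x releases earlier than 1.28.3). Releases outside the 1.x branch are treated as out of scope for this CVE, which is an assumption of the sketch rather than a statement about those releases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decides whether a Tika version string falls in the
// range this advisory lists as affected (1.x prior to 1.28.3).
final class TikaVersionCheck {
    // major.minor[.patch] followed by any qualifier such as "-SNAPSHOT"
    private static final Pattern VERSION =
            Pattern.compile("(\\d+)\\.(\\d+)(?:\\.(\\d+))?.*");

    static boolean isAffectedByCve202230973(String version) {
        Matcher m = VERSION.matcher(version.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized version: " + version);
        }
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        int patch = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));

        if (major != 1) {
            return false;          // only the 1.x branch is in scope here
        }
        if (minor != 28) {
            return minor < 28;     // 1.0 through 1.27.x precede 1.28.3
        }
        return patch < 3;          // 1.28.0 through 1.28.2
    }
}
```

Components are compared numerically, so a string such as `1.28.10` is correctly treated as later than `1.28.3`.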
Discovery Timeline
- May 31, 2022 - CVE-2022-30973 published to NVD
- November 21, 2024 - Last updated in NVD database
Technical Details for CVE-2022-30973
Vulnerability Analysis
This vulnerability is a classic example of Regular Expression Denial of Service (ReDoS). The StandardsText class in Apache Tika contains a regular expression pattern that exhibits exponential time complexity when processing maliciously crafted input. When the regex engine encounters such input, it backtracks through a rapidly growing number of candidate matches, consuming excessive CPU and effectively hanging the thread that is parsing the document.
The vulnerability specifically affects users who have enabled the non-default StandardsExtractingContentHandler, which is used to extract references to standards (such as ISO, ANSI, IEEE standards) from documents. While this limits the attack surface, organizations using this feature for document compliance analysis or metadata extraction remain vulnerable.
Root Cause
The root cause stems from an incomplete fix for CVE-2022-30126. The patch that addressed regex backtracking issues in the main branch was not properly backported to the 1.x release branch. This resulted in version 1.28.2 shipping without the necessary regex optimizations, leaving the vulnerable pattern in place within the StandardsText class.
Attack Vector
The attack vector is local and requires user interaction: an attacker must either convince a user to process a specially crafted file or be able to submit files to a document processing system that does so. No special privileges are required.
The exploit itself is an algorithmic complexity attack. The crafted text partially matches the vulnerable pattern in ways that force the regex engine to try a very large number of alternative matches before it finally fails, which triggers worst-case backtracking behavior.
Detection Methods for CVE-2022-30973
Indicators of Compromise
- Abnormally high CPU utilization during document processing operations
- Extended processing times for specific files that should complete quickly
- Application hangs or unresponsiveness when parsing certain document types
- Thread dumps showing regex-related processing in StandardsText class methods
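The last indicator can also be checked from inside a running JVM. The sketch below scans stack traces for frames that belong to a class with a given simple name. The class and method names are hypothetical, and matching on the simple name `StandardsText` (rather than a fully qualified name) is an assumption made so that the sketch does not depend on a particular package layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists threads whose stack contains a frame from a
// class with the given simple name, for example "StandardsText".
final class StackScanner {
    static List<String> threadsInClass(Map<Thread, StackTraceElement[]> traces,
                                       String simpleClassName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                String className = frame.getClassName();
                if (className.equals(simpleClassName)
                        || className.endsWith("." + simpleClassName)) {
                    hits.add(entry.getKey().getName());
                    break; // one matching frame per thread is enough
                }
            }
        }
        return hits;
    }
}
```

Calling `StackScanner.threadsInClass(Thread.getAllStackTraces(), "StandardsText")` at intervals and seeing the same thread reported repeatedly would be consistent with a match that is stuck backtracking, although a single hit on its own only shows that extraction is in progress.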
Detection Strategies
- Monitor Apache Tika process CPU usage for sustained spikes during file processing
- Implement processing timeouts to detect potential ReDoS attacks on document handlers
- Review application logs for timeout errors related to StandardsExtractingContentHandler
- Inventory systems to identify Apache Tika 1.x installations prior to version 1.28.3
Monitoring Recommendations
- Set up alerting for document processing operations that exceed expected duration thresholds
- Monitor thread pool exhaustion in applications using Apache Tika for document parsing
- Track resource utilization patterns to identify potential exploitation attempts
- Implement file upload monitoring to detect patterns of malicious file submissions
How to Mitigate CVE-2022-30973
Immediate Actions Required
- Upgrade Apache Tika to version 1.28.3 or later immediately
- Disable StandardsExtractingContentHandler if not required for business operations
- Implement processing timeouts for document parsing operations
- Review and restrict file upload capabilities where Apache Tika processes user-submitted content
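A processing timeout can be sketched with the standard `java.util.concurrent` API. The example below is an illustration under stated assumptions, not a Tika feature: `TimedParse` and `callWithTimeout` are hypothetical names, and the `Callable` stands in for whatever parsing call the application makes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical wrapper: runs a parsing task on a worker thread and stops
// waiting for it once the timeout elapses.
final class TimedParse {
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit)
            throws Exception {
        // Daemon thread so a stuck worker cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "parse-worker");
            worker.setDaemon(true);
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // requests interruption; see the note below
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

This frees the caller, but it does not necessarily stop the work: `java.util.regex` matching does not check the thread's interrupt flag, so a worker stuck in catastrophic backtracking can keep consuming CPU after the timeout fires. For that reason a timeout like this is best combined with the upgrade or with process-level isolation.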
Patch Information
Apache has addressed this vulnerability in version 1.28.3 of Apache Tika. Users should upgrade to this version or later to receive the corrected regex pattern that prevents catastrophic backtracking. For detailed information about the fix, refer to the Apache Thread Discussion.
Additional security advisories are available from NetApp Security Advisory and Openwall OSS Security.
Workarounds
- Remove or disable StandardsExtractingContentHandler from document processing pipelines if standards extraction is not required
- Implement strict timeout controls around Apache Tika parsing operations to prevent indefinite hangs
- Add file size limits and content validation before passing documents to Tika for processing
- Consider using a process sandbox or container isolation for document parsing workloads
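For code paths where the application itself runs a regular expression over untrusted text, one commonly used way to enforce a strict time limit is to wrap the input in a `CharSequence` that checks a deadline on every character read. The sketch below shows that general technique. It is not how Tika fixed the issue, all names in it are hypothetical, and it cannot be applied to matching that happens inside a library without changing that library.

```java
import java.util.regex.Pattern;

// Hypothetical utility: aborts a regex match once a time budget is spent.
final class BoundedRegex {
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    // Delegates to the wrapped text but throws once the deadline has passed.
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // The regex engine reads characters through charAt, so this
            // check runs repeatedly while it backtracks.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex time budget exceeded");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    static boolean findWithin(Pattern pattern, CharSequence input, long budgetMillis) {
        long deadline = System.nanoTime() + budgetMillis * 1_000_000L;
        return pattern.matcher(new DeadlineCharSequence(input, deadline)).find();
    }
}
```

Unlike a thread-level timeout, this stops the matching work itself, at the cost of a clock read per character access. Callers would catch `RegexTimeoutException` and treat the document as failed or suspicious.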
# Configuration example
# Verify Apache Tika version
java -jar tika-app.jar --version

# Upgrade to patched version 1.28.3 or later
# Update Maven dependency in pom.xml:
# <dependency>
#   <groupId>org.apache.tika</groupId>
#   <artifactId>tika-core</artifactId>
#   <version>1.28.3</version>
# </dependency>

# Or for Gradle in build.gradle:
# implementation 'org.apache.tika:tika-core:1.28.3'


