CVE-2024-12720: Huggingface Transformers DoS Vulnerability

CVE-2024-12720 Overview

CVE-2024-12720 is a Regular Expression Denial of Service (ReDoS) vulnerability in the huggingface/transformers library. The flaw resides in tokenization_nougat_fast.py, specifically inside the post_process_single() function. A regular expression in this function exhibits exponential time complexity when processing specially crafted input, triggering catastrophic backtracking. Attackers can submit malicious tokenizer input to force excessive CPU consumption and cause application downtime. The issue affects version v4.46.3 and earlier releases of the library. The vulnerability is tracked under [CWE-1333] (Inefficient Regular Expression Complexity).

Critical Impact
Remote, unauthenticated attackers can exhaust CPU resources and cause denial of service in any application that feeds untrusted text through the Nougat fast tokenizer.

Affected Products

Hugging Face Transformers v4.46.3 (latest at time of disclosure)
Prior versions of huggingface/transformers containing tokenization_nougat_fast.py
Downstream applications and services that invoke the Nougat tokenizer on user-supplied input

Discovery Timeline

2025-03-20 - CVE-2024-12720 published to the National Vulnerability Database
2025-08-01 - Last updated in the NVD database

Technical Details for CVE-2024-12720

Vulnerability Analysis

The Nougat tokenizer ships with a post-processing routine that normalizes model output before returning text to the caller. Inside post_process_single(), a regular expression scans the input to clean up formatting artifacts. The pattern contains ambiguous quantifiers that overlap on the same input characters. When the engine encounters input designed to maximize alternative match paths, it explores them exhaustively before failing. This catastrophic backtracking causes processing time to grow exponentially with input length.

A single request containing a few kilobytes of crafted text can pin a CPU core for minutes. Repeated requests starve worker threads and degrade or stop service for legitimate users. The attack does not require authentication and works over the network whenever the affected tokenizer is exposed through an inference API or web service.

Root Cause

The root cause is an inefficient regular expression in tokenization_nougat_fast.py that lacks atomic grouping or possessive quantifiers. Overlapping repetition operators allow the regex engine to evaluate many redundant match permutations. Python's re module uses a backtracking NFA engine, which amplifies the cost of these ambiguous patterns. Input validation prior to the regex call does not bound the length or structure of the text being processed.

Attack Vector

The attack vector is network-based and requires no privileges or user interaction. An attacker submits crafted text to any endpoint that routes data through the Nougat fast tokenizer's post_process_single() function. This includes hosted inference APIs, document conversion pipelines, and notebooks that process untrusted user content. The vulnerability does not expose data confidentiality or integrity but produces high availability impact through CPU exhaustion. See the Huntr bounty report for additional technical context.

Detection Methods for CVE-2024-12720

Indicators of Compromise

Sustained high CPU usage on Python worker processes running transformers without a corresponding rise in throughput
Request timeouts or stalled responses from inference endpoints that invoke the Nougat tokenizer
Stack traces or profiler samples showing extended time inside re module functions called from tokenization_nougat_fast.py
Inbound requests containing repetitive or pathological character sequences targeting tokenizer endpoints

Detection Strategies

Profile tokenizer call durations and alert when post_process_single() exceeds a baseline threshold
Apply request payload size limits and reject inputs that exceed expected document lengths
Use a regex execution timeout wrapper to abort matches that run beyond a fixed time budget
Log and correlate spikes in CPU utilization with HTTP request payloads to identify abusive clients

Monitoring Recommendations

Track per-endpoint p95 and p99 latency for any service that loads huggingface/transformers
Monitor process-level CPU saturation on inference workers and trigger alerts on sustained 100% utilization
Capture the transformers package version across the fleet and flag hosts pinned to versions at or below v4.46.3
Inspect web application firewall logs for repeated submissions of long or unusual character sequences to tokenizer-backed endpoints

How to Mitigate CVE-2024-12720

Immediate Actions Required

Upgrade huggingface/transformers to a release that includes the upstream fix commit deac971c469bcbb182c2e52da0b82fb3bf54cccf
Enforce strict input length limits on any endpoint that calls the Nougat fast tokenizer
Place rate limiting and request timeouts in front of tokenizer-backed inference services
Audit internal pipelines for use of tokenization_nougat_fast and isolate them from untrusted input until patched

Patch Information

The maintainers fixed the regex in commit deac971c469bcbb182c2e52da0b82fb3bf54cccf. Review the GitHub commit for the exact regex replacement applied to post_process_single(). Upgrade to a transformers release that incorporates this commit and redeploy any container images or virtual environments that bundle the library.

Workarounds

Cap request body size at the reverse proxy or API gateway to prevent large inputs from reaching the tokenizer
Wrap calls to post_process_single() in a subprocess or thread with a hard CPU time limit
Disable the Nougat fast tokenizer path and route traffic through an alternative tokenizer if feasible
Pre-filter input to reject character patterns known to trigger backtracking before invoking the tokenizer

bash

# Upgrade transformers to a patched release
pip install --upgrade "transformers>4.46.3"

# Verify the installed version
python -c "import transformers; print(transformers.__version__)"