CVE-2026-3276: Python unicodedata.normalize() DoS Flaw

CVE-2026-3276 Overview

CVE-2026-3276 is an algorithmic complexity vulnerability [CWE-407] in the Python unicodedata module. The unicodedata.normalize() function consumes excessive CPU time when processing specially crafted Unicode input. Strings containing long runs of combining characters with alternating Canonical Combining Class (CCC) values trigger the issue. All normalization forms (NFC, NFD, NFKC, NFKD) are affected.

Applications that normalize untrusted Unicode input expose themselves to denial-of-service conditions. Web services, identifier validators, and text processing pipelines built on CPython are common attack surfaces.

Critical Impact
Remote attackers can submit small crafted Unicode payloads to exhaust CPU resources and degrade availability of any application that calls unicodedata.normalize() on untrusted input.

Affected Products

CPython unicodedata module
Applications relying on unicodedata.normalize() for processing untrusted input
Downstream Python distributions and libraries that perform Unicode normalization

Discovery Timeline

2026-06-03 - CVE-2026-3276 published to NVD
2026-06-03 - Last updated in NVD database

Technical Details for CVE-2026-3276

Vulnerability Analysis

The vulnerability resides in the Unicode normalization routine of CPython's unicodedata module. Normalization reorders combining marks based on their Canonical Combining Class. When a sequence contains many combining characters with alternating CCC values, the canonical ordering algorithm performs significantly more work than for typical input.
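To make the reordering step concrete, the short sketch below reads CCC values with the standard-library unicodedata.combining() function and shows NFD swapping two marks that arrive out of order. The two code points are chosen only for illustration; they are not taken from the advisory.

```python
import unicodedata

ACUTE = "\u0301"      # COMBINING ACUTE ACCENT, CCC 230
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, CCC 220

# combining() returns the Canonical Combining Class of a single character
ccc_values = [unicodedata.combining(ch) for ch in ("a", ACUTE, DOT_BELOW)]
print(ccc_values)  # [0, 230, 220]

# The marks arrive as 230 then 220, so canonical ordering swaps them
original = "a" + ACUTE + DOT_BELOW
reordered = unicodedata.normalize("NFD", original)
print([f"U+{ord(ch):04X}" for ch in reordered])  # ['U+0061', 'U+0323', 'U+0301']
```

The base letter has CCC 0 and never moves; only the marks that follow it are sorted.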

The processing cost grows faster than linearly with the size of the crafted input, so a relatively small payload can stall a worker process for seconds or longer. The flaw affects every normalization form because all four forms (NFC, NFD, NFKC, NFKD) share the canonical ordering step.

The attack requires neither authentication nor user interaction. Any network-exposed code path that calls unicodedata.normalize() on attacker-controlled strings can be triggered remotely.

Root Cause

The canonical ordering algorithm sorts combining marks by CCC value. Sequences of alternating CCC values keep many marks out of order relative to their neighbors, which forces far more reordering work than typical text and produces worst-case behavior. The implementation has neither input length limits nor complexity guards against adversarial CCC patterns. CWE-407 (Inefficient Algorithmic Complexity) captures the underlying class of weakness.
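The effect of alternating CCC values can be illustrated with a simplified model. The sketch below is not CPython's implementation; it is a hypothetical pure-Python reordering that exchanges adjacent marks whenever the left one has a higher CCC than the right one (the "reorderable pair" rule from the Unicode Standard) and counts the exchanges. The function name canonical_order_swaps is an assumption made for this example.

```python
import unicodedata
from typing import Tuple

def canonical_order_swaps(text: str) -> Tuple[str, int]:
    """Reorder combining marks with adjacent swaps and count the swaps."""
    chars = list(text)
    swaps = 0
    for i in range(1, len(chars)):
        j = i
        while j > 0:
            prev_ccc = unicodedata.combining(chars[j - 1])
            cur_ccc = unicodedata.combining(chars[j])
            # Swap only when ccc(previous) > ccc(current) > 0
            if cur_ccc == 0 or prev_ccc <= cur_ccc:
                break
            chars[j - 1], chars[j] = chars[j], chars[j - 1]
            swaps += 1
            j -= 1
    return "".join(chars), swaps

for pairs in (10, 20, 40):
    # U+0301 (CCC 230) followed by U+0323 (CCC 220), repeated
    _, swaps = canonical_order_swaps("a" + "\u0301\u0323" * pairs)
    print(pairs, swaps)  # 10 55, then 20 210, then 40 820
```

In this model, doubling the number of alternating pairs roughly quadruples the number of swaps (n pairs need n(n+1)/2 exchanges), which is the kind of super-linear growth the advisory describes. How the real implementation behaves depends on the CPython version in use.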

Attack Vector

Attackers send crafted Unicode strings to any endpoint that normalizes input. Common targets include username and email normalization, URL canonicalization, search indexing, content sanitization, and IDNA processing. Each request ties up a worker thread or process, and concurrent requests amplify the impact into a service-wide denial of service. See the GitHub Issue Report and Python Security Announcement for technical details on the triggering input patterns.

Detection Methods for CVE-2026-3276

Indicators of Compromise

Sustained high CPU usage in Python worker processes handling text input
Request latency spikes correlated with submission of long Unicode strings
HTTP request bodies or parameters containing dense runs of combining marks with alternating CCC values
Worker process timeouts or thread pool starvation in services that perform Unicode normalization

Detection Strategies

Profile Python applications to identify time spent in unicodedata.normalize() calls on user-supplied input
Inspect web application logs for unusually long input strings submitted to normalization endpoints
Deploy WAF rules that flag payloads with excessive combining character density
Correlate CPU saturation events with inbound request payload characteristics
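A combining-character density check of the kind a WAF rule or log scanner might apply can be sketched as follows. The function names and both thresholds are assumptions chosen for illustration; legitimate text in some scripts carries many combining marks, so the values need tuning against real traffic before they are used for blocking.

```python
import unicodedata

def combining_density(text: str) -> float:
    """Fraction of code points in `text` that have a non-zero CCC."""
    if not text:
        return 0.0
    marks = sum(1 for ch in text if unicodedata.combining(ch) != 0)
    return marks / len(text)

def is_suspicious(text: str, min_length: int = 64, max_density: float = 0.5) -> bool:
    """Flag long inputs dominated by combining marks (thresholds are assumptions)."""
    return len(text) >= min_length and combining_density(text) > max_density

print(is_suspicious("a" + "\u0301\u0323" * 50))  # True
print(is_suspicious("plain ASCII text " * 10))   # False
```

The check is a single linear pass that only looks up CCC values, so it does not call the normalization routine it is meant to protect.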

Monitoring Recommendations

Track per-request CPU time for endpoints that process Unicode text
Alert on Python worker processes exceeding CPU thresholds while handling text normalization workloads
Monitor request payload size distributions and flag outliers containing many non-ASCII combining marks
Capture stack traces during CPU spikes to confirm unicodedata frames
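One way to collect the per-call signal described above is a thin wrapper that measures CPU time around each normalization call and logs slow ones. This is a minimal sketch: the logger name, the SLOW_CALL_SECONDS threshold, and the timed_normalize helper are all hypothetical and not part of any library.

```python
import logging
import time
import unicodedata

logger = logging.getLogger("normalize_monitor")
SLOW_CALL_SECONDS = 0.05  # assumed alert threshold; tune per service

def timed_normalize(form: str, value: str) -> str:
    """Normalize `value` and log a warning when the call burns too much CPU."""
    # process_time() counts CPU time for the whole process, so other
    # threads running at the same time add noise to the measurement
    start = time.process_time()
    try:
        return unicodedata.normalize(form, value)
    finally:
        elapsed = time.process_time() - start
        if elapsed > SLOW_CALL_SECONDS:
            logger.warning(
                "slow unicodedata.normalize: form=%s length=%d cpu_seconds=%.3f",
                form, len(value), elapsed,
            )
```

Logging the input length alongside the CPU time makes it easier to correlate spikes with payload characteristics. The wrapper only reports after the call returns; it does not interrupt a call that is already running.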

How to Mitigate CVE-2026-3276

Immediate Actions Required

Inventory all code paths invoking unicodedata.normalize() on untrusted input
Enforce strict input length limits before normalization (for example, reject inputs over a few hundred characters where business logic permits)
Apply request-level CPU and execution time budgets to handlers processing user text
Track the Python Security Announcement for patched CPython releases
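The inventory step can be partly automated. The sketch below uses the standard-library ast module to find direct calls to unicodedata.normalize() in a source string, including the common import aliases. The find_normalize_calls helper is hypothetical, and the scan is deliberately shallow: it will not see calls made inside third-party libraries or through dynamic lookups such as getattr().

```python
import ast

def find_normalize_calls(source: str, filename: str = "<source>"):
    """Return sorted (line, column) pairs for calls that look like unicodedata.normalize()."""
    tree = ast.parse(source, filename=filename)
    module_names = set()  # names bound by "import unicodedata [as alias]"
    direct_names = set()  # names bound by "from unicodedata import normalize [as alias]"
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "unicodedata":
                    module_names.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == "unicodedata":
            for alias in node.names:
                if alias.name == "normalize":
                    direct_names.add(alias.asname or alias.name)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "normalize"
                and isinstance(func.value, ast.Name)
                and func.value.id in module_names):
            hits.append((node.lineno, node.col_offset))
        elif isinstance(func, ast.Name) and func.id in direct_names:
            hits.append((node.lineno, node.col_offset))
    return sorted(hits)
```

Running the function over each file in a repository gives a starting list of call sites; deciding which of them receive untrusted input still needs manual review.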

Patch Information

The upstream fix is tracked in the CPython GitHub Pull Request referenced by the advisory. Upgrade to a CPython release that incorporates the patch as soon as it becomes available for your supported version line. Distribution maintainers will publish backports through their standard security update channels. The Openwall OSS Security Discussion provides additional context on coordinated disclosure.

Workarounds

Reject input exceeding a conservative maximum length before calling unicodedata.normalize()
Filter or limit the count of consecutive combining characters in incoming strings
Wrap normalization calls in a timeout or run them in a process with strict CPU limits
Place rate limiting and request size caps on endpoints that perform Unicode processing

bash

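# Example Python guard: enforce an input length limit before normalization
python3 - <<'EOF'
import unicodedata

MAX_LEN = 256  # maximum number of code points accepted; tune to your business rules

def safe_normalize(form: str, value: str) -> str:
    """Normalize value only if it is short enough to be cheap to process."""
    if len(value) > MAX_LEN:
        raise ValueError("input too long for normalization")
    return unicodedata.normalize(form, value)

# Short input is normalized; oversized input is rejected before any work is done
print(safe_normalize("NFC", "cafe\u0301"))
EOF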
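The workaround of limiting consecutive combining characters can be sketched in the same style. The cap of 30 is an assumption borrowed from the Stream-Safe Text Format in Unicode Standard Annex #15, which limits runs of non-starters to 30; the helper names are hypothetical. The check runs on the raw input, so precomposed characters can add a few marks to a run once they are decomposed, and a lower cap may be appropriate.

```python
import unicodedata
from itertools import groupby

MAX_COMBINING_RUN = 30  # assumed cap, modeled on the UAX #15 stream-safe limit

def longest_combining_run(text: str) -> int:
    """Length of the longest stretch of consecutive non-zero-CCC code points."""
    longest = 0
    for is_mark, group in groupby(text, key=lambda ch: unicodedata.combining(ch) != 0):
        if is_mark:
            longest = max(longest, sum(1 for _ in group))
    return longest

def normalize_capped(form: str, value: str) -> str:
    """Reject inputs with an over-long run of combining marks, then normalize."""
    if longest_combining_run(value) > MAX_COMBINING_RUN:
        raise ValueError("too many consecutive combining characters")
    return unicodedata.normalize(form, value)
```

Because a run cap bounds the number of marks that can be reordered against each other, it limits the sorting work per run even when the overall input is long. It can be combined with an overall length limit rather than used in place of one.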