CVE-2026-7669 Overview
CVE-2026-7669 is a code injection vulnerability in sgl-project SGLang versions up to and including 0.5.9. The flaw resides in the get_tokenizer function within python/sglang/srt/utils/hf_transformers_utils.py, which handles HuggingFace transformer tokenizer loading. When a caller passes trust_remote_code=False, SGLang silently re-invokes AutoTokenizer.from_pretrained with trust_remote_code=True, overriding the explicit security setting. A model repository containing a malicious tokenizer.py referenced via auto_map in tokenizer_config.json will execute arbitrary Python in the SGLang process. The weakness is classified under CWE-74 (Improper Neutralization of Special Elements in Output Used by a Downstream Component, 'Injection').
Critical Impact
Loading an attacker-controlled HuggingFace model in SGLang executes arbitrary Python code in the inference process, even when callers explicitly set trust_remote_code=False.
Affected Products
- sgl-project SGLang versions up to and including 0.5.9
- Deployments using HuggingFace transformers==5.3.0 (pinned in pyproject.toml)
- Both tokenizer_mode="auto" and tokenizer_mode="slow" configurations
Discovery Timeline
- 2026-05-02 - CVE-2026-7669 published to NVD
- 2026-05-05 - Last updated in NVD database
Technical Details for CVE-2026-7669
Vulnerability Analysis
The vulnerability emerges from an interaction between SGLang's tokenizer loading logic and HuggingFace transformers v5. When get_tokenizer() requests a tokenizer with trust_remote_code=False, transformers v5 returns a TokenizersBackend instance as the generic fallback for tokenizer classes not present in its registry. SGLang treats this fallback as a failure and retries the call with trust_remote_code=True to recover. This silent escalation overrides the caller's explicit security boundary without emitting any log line or warning. Because transformers==5.3.0 is pinned in pyproject.toml, every current SGLang release exhibits the behavior. The exploit is public, and the vendor did not respond to early disclosure outreach.
Root Cause
The root cause is an unsafe fallback path that re-issues the tokenizer load with elevated trust when the first attempt does not return a recognized tokenizer class. The retry ignores the security intent encoded in the original trust_remote_code=False argument. The condition triggering the retry is reachable for any tokenizer class HuggingFace v5 routes through TokenizersBackend, which is the generic catch-all path.
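For illustration, the simplified sketch below reconstructs the unsafe fallback pattern described above. It is not the actual SGLang source; the function name and the class-name check are assumptions based on the public description of the flaw.

# Illustrative reconstruction of the unsafe fallback pattern described
# above -- NOT the actual SGLang source code
from transformers import AutoTokenizer

def get_tokenizer_sketch(name, trust_remote_code=False):
    tokenizer = AutoTokenizer.from_pretrained(
        name, trust_remote_code=trust_remote_code
    )
    # Flawed recovery logic: transformers v5 returns a generic
    # TokenizersBackend for tokenizer classes missing from its registry,
    # and the retry below treats that as a failure and escalates trust,
    # silently discarding the caller's trust_remote_code=False.
    if type(tokenizer).__name__ == "TokenizersBackend":
        tokenizer = AutoTokenizer.from_pretrained(
            name, trust_remote_code=True
        )
    return tokenizer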
Attack Vector
An attacker publishes a HuggingFace model repository containing a tokenizer_config.json with an auto_map entry pointing at a malicious tokenizer.py. When an SGLang operator or downstream service loads that model identifier, the second AutoTokenizer.from_pretrained call honors auto_map and imports the attacker's Python module. Code execution occurs in the SGLang process context, with access to model weights, GPU memory, environment secrets, and any network reachability the inference host has. The attack is network-reachable but requires the target to load a specific model, contributing to the high attack complexity rating.
No verified exploit code is reproduced here. See the GitHub PoC Repository and VulDB Vulnerability #360817 for technical artifacts.
Detection Methods for CVE-2026-7669
Indicators of Compromise
- Unexpected child processes or outbound network connections originating from the SGLang Python process after a model load
- Presence of auto_map entries in tokenizer_config.json of cached HuggingFace models under ~/.cache/huggingface/
- Loaded modules in the SGLang process with paths inside HuggingFace cache directories rather than site-packages
- Filesystem writes or credential access from the inference worker shortly after a new model identifier is requested
Detection Strategies
- Audit all SGLang model load requests and correlate the model repository identifier against an allowlist of trusted publishers
- Inspect tokenizer_config.json for any auto_map keys before permitting a model into the serving environment
- Hook or instrument AutoTokenizer.from_pretrained to log the effective trust_remote_code value and alert on True when the caller passed False (see the sketch after this list)
- Monitor for Python import events sourced from cache paths using EDR or eBPF-based file-execution telemetry
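The sketch below illustrates the instrumentation strategy from the third bullet above, assuming a standard transformers installation; the wrapper, logger name, and log format are our own and should be adapted to local tooling. It must run before SGLang constructs any tokenizer.

# Detection sketch: log the effective trust_remote_code value on every
# tokenizer load; wrapper and log format are illustrative
import logging

from transformers import AutoTokenizer

log = logging.getLogger("tokenizer-audit")
_original = AutoTokenizer.from_pretrained.__func__  # unwrap classmethod

def _audited(cls, *args, **kwargs):
    name = args[0] if args else kwargs.get("pretrained_model_name_or_path")
    effective = kwargs.get("trust_remote_code", False)
    log.warning("from_pretrained(%r) trust_remote_code=%s", name, effective)
    if effective:
        # An effective True after the external caller passed False is
        # the CVE-2026-7669 escalation signature
        log.error("remote-code tokenizer load attempted for %r", name)
    return _original(cls, *args, **kwargs)

AutoTokenizer.from_pretrained = classmethod(_audited)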
Monitoring Recommendations
- Forward SGLang stdout, stderr, and Python audit hook events into a centralized log pipeline for retention and analytics (an audit hook sketch follows this list)
- Alert on any process spawned by the inference worker that is not in a known-good baseline (shell, curl, wget, ssh)
- Track egress connections from inference hosts to non-HuggingFace destinations during model bootstrap windows
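One way to surface such events from inside the process is CPython's audit hook mechanism (PEP 578), sketched below; the cache path and the alert action are deployment-specific assumptions.

# Monitoring sketch: flag Python imports resolved from the HuggingFace
# cache instead of site-packages; install before any model load
import os
import sys

HF_CACHE = os.path.expanduser("~/.cache/huggingface")

def _audit(event, args):
    # The "import" audit event carries (module, filename, path,
    # meta_path, path_hooks); filename is None for built-ins
    if event == "import" and args[1] and str(args[1]).startswith(HF_CACHE):
        print(f"ALERT: import from HF cache: {args[0]} ({args[1]})",
              file=sys.stderr)

sys.addaudithook(_audit)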
How to Mitigate CVE-2026-7669
Immediate Actions Required
- Restrict SGLang deployments to load only models from a vetted internal registry or specific allowlisted HuggingFace repositories
- Run SGLang inference workers as unprivileged users inside containers with read-only filesystems and no outbound internet beyond model registries
- Pre-fetch and audit tokenizer artifacts in an isolated environment, rejecting any model whose tokenizer_config.json contains an auto_map entry (a sketch of this check follows this list)
- Block or proxy huggingface.co traffic from production inference hosts and serve approved models from an internal mirror
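The sketch below implements the pre-admission check from the third action above: it walks a pre-fetched model directory and rejects anything whose tokenizer_config.json declares an auto_map. The directory layout and exit-code convention are assumptions to adapt to your registry workflow.

# Mitigation sketch: reject pre-fetched models whose tokenizer config
# declares an auto_map entry (the remote-code hook)
import json
import pathlib
import sys

def audit_model_dir(model_dir):
    """Return True if the model directory is safe to admit."""
    for cfg in pathlib.Path(model_dir).rglob("tokenizer_config.json"):
        config = json.loads(cfg.read_text())
        if "auto_map" in config:
            print(f"REJECT: {cfg} declares auto_map={config['auto_map']}")
            return False
    return True

if __name__ == "__main__":
    sys.exit(0 if audit_model_dir(sys.argv[1]) else 1)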
Patch Information
No vendor patch has been published for SGLang at the time of NVD disclosure. The vendor was contacted prior to public disclosure but did not respond. Track the VulDB Vulnerability #360817 entry and the SGLang project for upstream fixes, and pin to a fixed release once available.
Workarounds
- Patch get_tokenizer locally to remove the fallback that re-invokes AutoTokenizer.from_pretrained with trust_remote_code=True
- Downgrade transformers below v5 if compatibility allows, since the TokenizersBackend fallback path is the trigger
- Wrap AutoTokenizer.from_pretrained with a monkeypatch that forces trust_remote_code=False regardless of internal callers (see the sketch after this list)
- Strip auto_map from any cached tokenizer_config.json before loading
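A minimal sketch of the monkeypatch workaround from the third bullet follows. It pins trust_remote_code=False for every caller, including SGLang's internal retry, so it also breaks models that legitimately need remote code; pair it with repository allowlisting. It must run before SGLang creates any tokenizer.

# Workaround sketch: force trust_remote_code=False for all callers,
# including SGLang's internal escalating retry
from transformers import AutoTokenizer

_original = AutoTokenizer.from_pretrained.__func__  # unwrap classmethod

def _pinned(cls, *args, **kwargs):
    kwargs["trust_remote_code"] = False  # override any internal True
    return _original(cls, *args, **kwargs)

AutoTokenizer.from_pretrained = classmethod(_pinned)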
# Configuration example: scan cached tokenizer configs for auto_map abuse
find ~/.cache/huggingface -name 'tokenizer_config.json' \
    -exec grep -l '"auto_map"' {} \;

# Launch SGLang fully offline against a pre-vetted local model; the
# offline flags prevent any fetch from huggingface.co during bootstrap
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
python -m sglang.launch_server --model-path /opt/models/approved/llama-3 \
    --tokenizer-mode slow