CVE-2026-0599 Overview
A resource exhaustion vulnerability exists in Hugging Face text-generation-inference version 3.3.6 when operating in VLM (Vision Language Model) mode. Unauthenticated remote attackers can abuse unbounded external image fetching during input validation: the router scans user inputs for Markdown image links, performs blocking HTTP GET requests, and reads entire response bodies into memory without size limits, enabling denial-of-service conditions.
Critical Impact
Unauthenticated remote attackers can crash host machines by triggering memory exhaustion, CPU overutilization, and network bandwidth saturation through malicious image URL requests—even when those requests are later rejected for exceeding token limits.
Affected Products
- Hugging Face text-generation-inference version 3.3.6 and earlier (VLM mode)
- Default deployments without memory usage limits
- Deployments lacking authentication controls
Discovery Timeline
- 2026-02-02 - CVE-2026-0599 published to NVD
- 2026-02-03 - Last updated in NVD database
Technical Details for CVE-2026-0599
Vulnerability Analysis
This vulnerability (CWE-400: Uncontrolled Resource Consumption) stems from a design flaw in how the text-generation-inference router processes incoming requests in VLM mode. When the router scans user inputs for Markdown image links (e.g., ![image](https://example.com/image.png)), it initiates a blocking HTTP GET request to fetch the external resource. The critical issue is that the entire response body is read into memory and cloned before any decoding or validation occurs.
The impact is particularly severe because the resource exhaustion occurs during the input validation phase—before the actual inference processing begins. This means that even requests destined to be rejected for exceeding token limits will still trigger the resource-intensive fetch operation. Attackers can exploit this to cause:
- Memory inflation: Large files fetched into memory without bounds
- CPU overutilization: Processing overhead from handling oversized responses
- Network bandwidth saturation: Continuous fetching of external resources
The default deployment configuration compounds the risk by lacking both memory usage limits and authentication requirements, making publicly accessible instances trivially exploitable.
Root Cause
The root cause is the absence of size limits and resource controls when fetching external images referenced in Markdown-formatted inputs. The router implementation performs unbounded memory allocation when reading HTTP response bodies, and this operation occurs synchronously during request validation rather than being deferred or bounded.
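The missing control can be illustrated language-agnostically (the actual router is written in Rust, and the function names below are hypothetical). A bounded reader aborts once a size cap is exceeded, which is precisely the behavior the vulnerable code path lacked:

```python
import io

MAX_IMAGE_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap

def read_bounded(stream, max_bytes=MAX_IMAGE_BYTES, chunk_size=64 * 1024):
    """Read a response body in chunks, aborting once the cap is exceeded.

    The vulnerable pattern is the opposite: body = stream.read() with no
    limit, buffering an attacker-controlled payload entirely in memory.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(buf)
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise ValueError(f"response body exceeds {max_bytes} byte limit")

# A multi-gigabyte response is rejected after a few chunks instead of
# exhausting host memory.
small = read_bounded(io.BytesIO(b"small image data"))
```

With a cap in place, an oversized fetch fails fast during validation rather than inflating the process's resident memory.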
Attack Vector
The vulnerability is exploitable over the network without authentication. An attacker can craft HTTP requests containing Markdown image syntax pointing to either:
- Large files: URLs returning multi-gigabyte responses to exhaust memory
- Slow endpoints: URLs with intentionally delayed responses to tie up resources
- Multiple concurrent requests: Amplifying the impact through parallel exploitation
The attack can be executed by submitting text inputs containing malicious Markdown image references to the text-generation-inference API endpoint when VLM mode is enabled.
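A hostile request need only embed a Markdown image reference in otherwise ordinary text. The sketch below constructs such a payload; the URL is attacker-controlled and the field names follow TGI's generate API but should be treated as illustrative:

```python
import json

# Hypothetical attacker-controlled URL serving a multi-gigabyte file
# or an intentionally slow (slowloris-style) response.
malicious_url = "http://attacker.example/huge.bin"

payload = {
    "inputs": f"Describe this picture: ![img]({malicious_url})",
    "parameters": {"max_new_tokens": 1},
}

# The router fetches the URL during input validation, so the resource
# cost is paid even if the request is later rejected for token limits.
body = json.dumps(payload)
```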
For detailed technical analysis of the vulnerability mechanism, refer to the Huntr Bug Bounty Report.
Detection Methods for CVE-2026-0599
Indicators of Compromise
- Unusual spikes in memory consumption on text-generation-inference hosts
- Increased outbound HTTP requests from the inference service to external URLs
- Service crashes or out-of-memory (OOM) errors in container/process logs
- Network traffic anomalies showing large inbound data transfers to the inference endpoint
Detection Strategies
- Monitor memory utilization patterns on hosts running text-generation-inference with alerting thresholds
- Implement network monitoring to detect unusual volumes of external HTTP requests originating from the inference service
- Review application logs for repeated image fetch operations or token limit rejections preceded by resource-intensive operations
- Deploy rate limiting at the API gateway level to detect and throttle suspicious request patterns
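As a rough detection aid, logged request bodies can be scanned for Markdown image references pointing at non-allowlisted hosts. The log format and allowlist below are invented; adapt them to your own logging pipeline:

```python
import re

# Matches Markdown image syntax with an http(s) URL, e.g. ![x](http://h/f)
IMAGE_REF = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)\)")

def suspicious_urls(log_lines, allowed_hosts=("cdn.example.com",)):
    """Return external image URLs found in logged request bodies."""
    hits = []
    for line in log_lines:
        for url in IMAGE_REF.findall(line):
            host = url.split("/")[2]
            if host not in allowed_hosts:
                hits.append(url)
    return hits

logs = [
    'POST /generate inputs="look: ![a](http://evil.test/big.bin)"',
    'POST /generate inputs="![b](https://cdn.example.com/cat.png)"',
]
```

Alerting on any hit from an unexpected host gives early warning before memory pressure becomes visible at the OS level.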
Monitoring Recommendations
- Set up resource monitoring dashboards for CPU, memory, and network bandwidth on inference hosts
- Configure container orchestration (Kubernetes/Docker) to enforce memory limits and restart policies
- Enable verbose logging for the text-generation-inference router to capture image fetch operations
- Implement application-level metrics for tracking external resource fetching behavior
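The container-orchestration recommendation above can be expressed as a resource-limited Kubernetes pod spec. This is a minimal sketch; the image tag, limit values, and model id are illustrative and should be tuned to your deployment:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tgi
spec:
  containers:
    - name: text-generation-inference
      image: ghcr.io/huggingface/text-generation-inference:3.3.7
      args: ["--model-id", "your-model-id"]
      resources:
        limits:
          memory: "8Gi"  # hard cap: the container is OOM-killed,
          cpu: "4"       # not the host, if image fetches inflate memory
  restartPolicy: Always  # restart the service after an OOM kill
```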
How to Mitigate CVE-2026-0599
Immediate Actions Required
- Upgrade text-generation-inference to version 3.3.7 or later immediately
- Implement authentication controls if not already in place to restrict access to trusted users
- Configure memory limits for the text-generation-inference process or container
- Consider temporarily disabling VLM mode if not required for operations
Patch Information
The vulnerability has been resolved in text-generation-inference version 3.3.7. The fix implements proper bounds checking and resource controls for external image fetching operations. The patch details can be reviewed in the GitHub Commit.
Organizations should prioritize upgrading to the patched version as the vulnerability is exploitable without authentication and can result in complete service disruption.
Workarounds
- Deploy a reverse proxy or API gateway in front of text-generation-inference to filter requests containing Markdown image syntax
- Configure container resource limits (memory and CPU) to prevent host-level crashes
- Implement network egress controls to restrict or monitor external HTTP requests from the inference service
- Add authentication mechanisms at the infrastructure level if the application lacks built-in authentication
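The reverse-proxy filtering workaround can be approximated with a small body check that rejects requests containing Markdown image syntax before they reach the router. A minimal sketch, not a production-ready gateway rule:

```python
import re

# Any Markdown image opener is enough to trigger a fetch in VLM mode.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(")

def should_block(request_body: str) -> bool:
    """Reject requests whose text could trigger an external image fetch."""
    return bool(MD_IMAGE.search(request_body))
```

Note that blanket filtering also breaks legitimate image inputs; prefer it only as a stopgap until the upgrade to 3.3.7 lands.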
# Example: Docker deployment with memory limits
docker run -d \
  --memory="8g" \
  --memory-swap="8g" \
  --cpus="4" \
  ghcr.io/huggingface/text-generation-inference:3.3.7 \
  --model-id your-model-id