CVE-2026-22773 Overview
CVE-2026-22773 is a Denial of Service vulnerability affecting vLLM, a popular inference and serving engine for large language models (LLMs). The vulnerability allows authenticated users to crash vLLM instances serving multimodal models that use the Idefics3 vision model implementation by sending a specially crafted 1x1 pixel image. This triggers a tensor dimension mismatch that results in an unhandled runtime error, causing complete server termination.
Critical Impact
Authenticated attackers can cause complete denial of service to vLLM inference servers by submitting malicious image payloads, disrupting AI/ML workloads and potentially affecting production LLM services.
Affected Products
- vLLM versions 0.6.4 through 0.11.x (prior to 0.12.0)
- vLLM deployments using Idefics3 vision model implementation
- Multimodal LLM serving configurations
Discovery Timeline
- 2026-01-10 - CVE-2026-22773 published to NVD
- 2026-01-13 - Last updated in NVD database
Technical Details for CVE-2026-22773
Vulnerability Analysis
This vulnerability exists in the image processing pipeline of vLLM's Idefics3 vision model implementation. When processing multimodal inputs that combine text and images, the Idefics3 model expects image tensors with specific dimensional requirements. The vulnerability stems from improper allocation of resources without limits (CWE-770), where the system fails to validate image dimensions before tensor operations.
When a malformed 1x1 pixel image is submitted to the inference endpoint, the vision model's tensor processing logic encounters a dimension mismatch during the image embedding phase. This mismatch triggers an unhandled runtime exception that propagates up the call stack, bypassing any error recovery mechanisms and causing the entire vLLM server process to terminate.
The attack is particularly impactful because vLLM is designed to handle high-throughput inference workloads, meaning a single malicious request can disrupt service for all concurrent users and queued requests.
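To make the failure mode concrete, consider how a patch-based vision encoder slices an image into fixed-size patches before embedding. The sketch below is illustrative only (it is not vLLM's actual code, and the patch size of 14 is an assumption): a 1x1 image yields zero patches, so any downstream reshape that expects at least one patch blows up at runtime.

```python
def patch_grid(width: int, height: int, patch: int = 14):
    """Patches per axis under floor division (illustrative, not vLLM's code)."""
    return width // patch, height // patch

def embed_patch_count(width: int, height: int, patch: int = 14) -> int:
    """Simulate the embedding stage's patch count; a 0-patch image crashes it."""
    cols, rows = patch_grid(width, height, patch)
    n = cols * rows
    if n == 0:
        # Without an up-front dimension check, this surfaces as an unhandled
        # runtime error deep inside the model's tensor operations.
        raise RuntimeError(f"shape mismatch: 0 patches for {width}x{height} image")
    return n

embed_patch_count(224, 224)  # 16 x 16 = 256 patches, fine
# embed_patch_count(1, 1)    # raises RuntimeError: the crash path this CVE exploits
```

The point of the sketch is that the error arises several layers below the API surface, which is why it escapes the request-handling error path and takes down the whole process.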
Root Cause
The root cause is insufficient input validation in the Idefics3 vision model's image preprocessing pipeline. The code assumes incoming images meet minimum dimensional requirements for tensor operations without explicitly validating these constraints. When boundary-case images (such as 1x1 pixel images) are processed, the resulting tensor shapes are incompatible with downstream operations, causing the runtime error.
The underlying issue is classified as CWE-770 (Allocation of Resources Without Limits or Throttling), as the system fails to properly constrain and validate the image input resources before processing them in tensor operations.
Attack Vector
The attack can be executed remotely over the network by any authenticated user with access to the vLLM inference API. The attacker needs to:
- Identify a vLLM deployment serving a multimodal model with Idefics3 vision capabilities
- Craft a valid API request containing a 1x1 pixel image payload
- Submit the request to the multimodal inference endpoint
The vulnerability manifests when the Idefics3 vision model attempts to process the malformed image, resulting in a tensor dimension mismatch during the embedding generation phase. This causes an unhandled runtime exception that terminates the server process. For technical implementation details, refer to the GitHub Security Advisory.
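For defensive testing in a controlled environment you own, a 1x1 image payload can be generated with the Python standard library alone; no imaging library is required because a PNG's dimensions live in its IHDR chunk. The sketch below builds a minimal, valid 1x1 RGB PNG:

```python
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    """Length-prefixed PNG chunk with CRC-32 over type + data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def make_1x1_png() -> bytes:
    """Minimal 1x1 8-bit RGB PNG: signature + IHDR + IDAT + IEND."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)  # 1x1, depth 8, RGB
    raw = b"\x00" + b"\x00\x00\x00"  # filter byte + one black pixel
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", zlib.compress(raw))
            + png_chunk(b"IEND", b""))
```

Submitting such a payload against a patched deployment should produce a rejection response rather than a crash, which makes it a useful regression check after upgrading.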
Detection Methods for CVE-2026-22773
Indicators of Compromise
- Unexpected vLLM server process terminations or crashes
- Error logs containing tensor dimension mismatch exceptions related to image processing
- API requests containing unusually small image payloads (particularly 1x1 pixel images)
- Repeated server restarts following multimodal inference requests
Detection Strategies
- Monitor vLLM server logs for unhandled runtime errors in the Idefics3 vision model components
- Implement request logging to capture image dimensions before processing
- Deploy application-level health checks to detect unexpected server terminations
- Analyze API traffic patterns for requests with minimal image payloads targeting multimodal endpoints
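The log-monitoring strategy above can be sketched as a simple pattern scan. The regexes below are assumptions for illustration; the exact error strings vary with the vLLM and PyTorch versions in use, so tune them against your own crash logs:

```python
import re

# Illustrative signatures; actual messages differ across vLLM/PyTorch versions.
CRASH_PATTERNS = [
    re.compile(r"RuntimeError: .*(shape|size) mismatch", re.IGNORECASE),
    re.compile(r"dimension .*does not match", re.IGNORECASE),
]

def suspicious_lines(log_lines):
    """Return log lines matching tensor-mismatch signatures worth alerting on."""
    return [ln for ln in log_lines
            if any(p.search(ln) for p in CRASH_PATTERNS)]
```

Feeding these matches into your alerting pipeline, correlated with the request payloads logged at ingestion, is what connects a crash back to the offending image.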
Monitoring Recommendations
- Configure alerting for vLLM process crashes or unexpected restarts
- Implement log aggregation to correlate tensor-related exceptions with incoming request payloads
- Monitor inference API latency spikes that may indicate service degradation before crashes
- Track request patterns from individual users for anomalous small image submissions
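Per-user tracking of anomalously small image submissions can be as simple as a counter keyed by user. A minimal sketch, with the pixel threshold and alert count chosen purely for illustration:

```python
from collections import Counter

SMALL_PIXEL_THRESHOLD = 64  # illustrative: flag images under 64 total pixels

class SmallImageTracker:
    """Count tiny-image submissions per user; alert past a threshold."""

    def __init__(self, alert_after: int = 3):
        self.alert_after = alert_after
        self.counts = Counter()

    def record(self, user: str, width: int, height: int) -> bool:
        """Record one request; return True once the user should trigger an alert."""
        if width * height < SMALL_PIXEL_THRESHOLD:
            self.counts[user] += 1
        return self.counts[user] >= self.alert_after
```

In production this state would live in whatever your gateway already uses for rate limiting (e.g. a shared cache) rather than in-process memory.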
How to Mitigate CVE-2026-22773
Immediate Actions Required
- Upgrade vLLM to version 0.12.0 or later immediately
- Review access controls to restrict multimodal inference endpoints to trusted users
- Implement request rate limiting on API endpoints as a temporary protective measure
- Deploy health monitoring to enable rapid restart of crashed instances
Patch Information
The vulnerability has been patched in vLLM version 0.12.0. Organizations should upgrade to this version or later to remediate the vulnerability. The patch adds proper validation of image dimensions before tensor processing operations, ensuring that malformed images are rejected with an appropriate error response rather than causing server crashes.
For detailed patch information, see the GitHub Security Advisory.
Workarounds
- Implement input validation at the API gateway level to reject images below minimum dimensional thresholds
- Deploy vLLM instances behind a reverse proxy that filters requests with malformed image payloads
- Use container orchestration with automatic restart policies to minimize downtime from crashes
- Consider temporarily disabling Idefics3 vision model support if not required for production workloads
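The gateway-level validation workaround can be implemented without decoding the image at all, since a PNG's width and height sit at fixed offsets in the IHDR chunk. A sketch of such a pre-filter (function names and the minimum dimension are illustrative assumptions; other formats like JPEG would need their own header parsers):

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data: bytes):
    """Read width/height straight from the IHDR chunk; no image decode needed."""
    if not data.startswith(PNG_SIG) or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    return struct.unpack(">II", data[16:24])

def gateway_allows(data: bytes, min_dim: int = 28) -> bool:
    """Reject payloads below a minimum dimension before they reach vLLM."""
    width, height = png_dimensions(data)
    return width >= min_dim and height >= min_dim
```

Because this never hands the bytes to an image library, it adds negligible latency and cannot itself be crashed by a malformed pixel buffer.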
# Configuration example - Upgrade vLLM to patched version
# (quote the version specifier so the shell does not treat ">=" as a redirect)
pip install --upgrade "vllm>=0.12.0"
# Verify installed version
pip show vllm | grep Version


