CVE-2026-9540 Overview
CVE-2026-9540 is a denial of service vulnerability affecting vllm-project vllm version 0.19.0. The flaw resides in the OpenAI-compatible Serving Path component, where a remote attacker can manipulate an unspecified operation to disrupt service availability. No authentication or user interaction is required, so any client that can reach the endpoint over the network can trigger the condition. The issue is classified under CWE-404 (Improper Resource Shutdown or Release). Public exploit details are available, and a pull request to remediate the defect is pending acceptance upstream.
Critical Impact
Remote, unauthenticated attackers can degrade or disable vLLM inference endpoints exposed through the OpenAI-compatible API, interrupting downstream LLM-dependent applications.
Affected Products
- vllm-project vllm 0.19.0
- Deployments exposing the OpenAI-compatible Serving Path
- LLM inference services built on the affected vLLM release
Discovery Timeline
- 2026-05-26 - CVE-2026-9540 published to NVD
- 2026-05-26 - Last updated in NVD database
Technical Details for CVE-2026-9540
Vulnerability Analysis
The vulnerability targets the OpenAI-compatible Serving Path inside vLLM 0.19.0. This component exposes REST endpoints that mimic the OpenAI API, enabling clients to send completion, chat, and embedding requests. An attacker can submit crafted input to this surface and trigger a denial of service condition. The defect maps to CWE-404, indicating that resources allocated during request handling are not properly released or shut down. As a result, repeated abusive requests can exhaust process resources or stall request servicing. The Exploit Prediction Scoring System currently places this issue in the lower probability range, but exploit material is already public per the VulDB vulnerability record.
Root Cause
The root cause is improper resource shutdown or release in the OpenAI-compatible Serving Path handlers. The component fails to reclaim resources tied to certain request patterns, leading to availability degradation over time or under repeated attacks. The upstream maintainers have proposed a fix in pull request #37594, tracked against issue #37343.
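The record does not say which code path in the serving layer leaks, so the following is only a generic illustration of the CWE-404 pattern in async Python, not vLLM's code. SlotPool, generate, leaky_handler and safe_handler are hypothetical names. The leaky handler acquires a per-request resource and releases it only on the success path. Every request that fails or is cancelled therefore shrinks the pool permanently, until new requests cannot be served. Releasing in a try/finally block is the usual remedy.

```python
import asyncio


class SlotPool:
    """Hypothetical fixed-size pool of per-request resources (buffers, cache blocks, etc.)."""

    def __init__(self, size: int) -> None:
        self.free = size

    def acquire(self) -> None:
        if self.free == 0:
            raise RuntimeError("pool exhausted: server cannot accept new requests")
        self.free -= 1

    def release(self) -> None:
        self.free += 1


async def generate(prompt: str) -> str:
    # Stand-in for model work; a "bad" prompt fails mid-request.
    await asyncio.sleep(0)
    if prompt == "bad":
        raise ValueError("malformed input")
    return prompt.upper()


async def leaky_handler(pool: SlotPool, prompt: str) -> str:
    pool.acquire()
    result = await generate(prompt)  # an exception here skips release()
    pool.release()
    return result


async def safe_handler(pool: SlotPool, prompt: str) -> str:
    pool.acquire()
    try:
        return await generate(prompt)
    finally:
        pool.release()  # runs on success, error, and task cancellation


async def attack(handler, pool: SlotPool, n: int) -> int:
    """Send n failing requests and report how many slots remain free."""
    for _ in range(n):
        try:
            await handler(pool, "bad")
        except ValueError:
            pass
    return pool.free


if __name__ == "__main__":
    print(asyncio.run(attack(leaky_handler, SlotPool(4), 4)))  # 0
    print(asyncio.run(attack(safe_handler, SlotPool(4), 4)))   # 4
```

With a pool of four slots, four failing requests leave the leaky handler with no free slots, while the try/finally version still has all four.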
Attack Vector
The attack vector is network-based and requires no authentication or user interaction. An attacker reaches the affected serving endpoint over HTTP and submits malformed or abusive input. Because vLLM is frequently deployed as a backend for chat assistants, RAG pipelines, and agentic systems, a successful attack interrupts dependent services. Public discussion of vLLM latency and resource behavior is available on the Ingero blog. Refer to the VulDB submission record for additional disclosure context.
Detection Methods for CVE-2026-9540
Indicators of Compromise
- Spikes in request latency or worker stalls on vLLM OpenAI-compatible endpoints with no corresponding legitimate traffic increase
- Repeated requests to /v1/completions, /v1/chat/completions, or /v1/embeddings from a small set of source IPs preceding service degradation
- Growth in process memory, file descriptors, or thread counts in vLLM workers without recovery between requests
- Unexplained restarts or health-check failures of vLLM serving processes
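You can spot repeated requests from a small set of sources by counting hits per client IP on the three endpoints named above. The sketch below assumes access logs in the NGINX/Apache "combined" format. The regex, the top_sources helper and the sample addresses are illustrative and are not part of any vLLM tooling.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumes the NGINX/Apache "combined" log format; adjust for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*"'
)

WATCHED_PATHS = ("/v1/completions", "/v1/chat/completions", "/v1/embeddings")


def top_sources(lines: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Count requests per source IP to the OpenAI-compatible endpoints."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines in an unexpected format
        path = m.group("path").split("?", 1)[0]  # ignore query strings
        if path in WATCHED_PATHS:
            counts[m.group("ip")] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [26/May/2026:10:00:00 +0000] "POST /v1/chat/completions HTTP/1.1" 200 512 "-" "curl/8.0"',
        '203.0.113.5 - - [26/May/2026:10:00:01 +0000] "POST /v1/completions HTTP/1.1" 200 512 "-" "curl/8.0"',
        '198.51.100.7 - - [26/May/2026:10:00:02 +0000] "GET /health HTTP/1.1" 200 2 "-" "probe"',
    ]
    print(top_sources(sample))  # [('203.0.113.5', 2)]
```

Run it over the log window that precedes a degradation event. If a handful of IPs dominate the counts, those sources are worth a closer look.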
Detection Strategies
- Baseline normal request volume and latency for the OpenAI-compatible endpoints and alert on sustained deviations
- Inspect HTTP access logs for high-rate or anomalously structured requests targeting the serving path
- Correlate vLLM worker resource metrics with request-level telemetry to identify resource non-release patterns
- Track CVE-2026-9540 indicators against threat intelligence sourced from the VulDB CTI feed
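Resource non-release tends to show up as counters that only ever climb. The sketch below assumes a Linux host where the monitoring user can read /proc/<pid> of the vLLM server process. It samples open file descriptors and thread counts, then flags a series that never drops and grows by a chosen amount. The helper names and the 50-descriptor default threshold are hypothetical; tune them against your own baseline.

```python
import os
import time
from typing import Dict, List, Sequence


def sample_process(pid: int) -> Dict[str, int]:
    """Read open file descriptor and thread counts for a process from /proc (Linux only)."""
    fds = len(os.listdir(f"/proc/{pid}/fd"))
    threads = 0
    with open(f"/proc/{pid}/status", errors="replace") as status:
        for line in status:
            if line.startswith("Threads:"):
                threads = int(line.split()[1])
                break
    return {"fds": fds, "threads": threads}


def grows_without_recovery(samples: Sequence[int], min_increase: int) -> bool:
    """True if the series never drops and rises by at least min_increase overall."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] - samples[0] >= min_increase


def watch(pid: int, metric: str = "fds", rounds: int = 10,
          interval_s: float = 60.0, min_increase: int = 50) -> bool:
    """Sample one metric periodically; True suggests a possible resource leak."""
    history: List[int] = []
    for i in range(rounds):
        history.append(sample_process(pid)[metric])
        if i < rounds - 1:
            time.sleep(interval_s)  # sampling during quiet periods gives a cleaner signal
    return grows_without_recovery(history, min_increase)
```

Pair a positive result with the request telemetry for the same window before drawing conclusions. Legitimate load growth also raises these counts, but it usually recedes when traffic does.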
Monitoring Recommendations
- Export vLLM Prometheus metrics and alert on queue depth (pending request counts) and GPU utilization stalls
- Forward reverse-proxy and API gateway logs to a centralized analytics platform for rate and pattern analysis
- Monitor container restart counts and OOM events on hosts running vLLM 0.19.0
- Add synthetic probes against inference endpoints to detect availability loss quickly
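You can read queue depth straight from the Prometheus endpoint. This minimal sketch makes two assumptions: the OpenAI-compatible server exposes metrics at /metrics on its own port, and the waiting-queue gauge is named vllm:num_requests_waiting. Metric names have changed between vLLM releases, so confirm them against your deployment's /metrics output. The helper names, base URL, and any alert threshold are hypothetical.

```python
import urllib.request

# Assumed metric name; verify against the /metrics output of your vLLM version.
QUEUE_METRIC = "vllm:num_requests_waiting"


def parse_metric(metrics_text: str, name: str) -> float:
    """Sum all samples of one metric in Prometheus text exposition format."""
    total = 0.0
    for raw in metrics_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            metric = line[: line.index("{")]
            fields = line[line.rindex("}") + 1 :].split()
        else:
            metric, *fields = line.split()
        if metric == name and fields:
            total += float(fields[0])  # value; an optional timestamp may follow
    return total


def queue_depth(base_url: str = "http://127.0.0.1:8000", timeout_s: float = 5.0) -> float:
    """Fetch /metrics from a vLLM server (assumed path) and return the waiting-queue size."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=timeout_s) as resp:
        return parse_metric(resp.read().decode("utf-8"), QUEUE_METRIC)
```

In production, Prometheus would scrape the endpoint and an alerting rule would fire. A parser like this suits ad hoc checks or a synthetic probe. A probe that times out while fetching /metrics is itself a useful availability signal.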
How to Mitigate CVE-2026-9540
Immediate Actions Required
- Inventory all vLLM deployments and identify any instances running version 0.19.0 that expose the OpenAI-compatible Serving Path
- Restrict network exposure of vLLM inference endpoints to trusted clients using firewalls, VPNs, or service mesh policies
- Place an authenticating reverse proxy or API gateway in front of vLLM to require credentials and enforce request validation
- Apply rate limiting and request size limits at the gateway to reduce abuse surface until a patched release is available
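For the inventory step, the standard library can check which vllm package is installed in a given Python environment. The sketch assumes deployments install vLLM as the `vllm` distribution. It inspects only the environment it runs in, so you must check each container image or virtual environment separately. AFFECTED_VERSIONS contains only the version named in this advisory.

```python
from importlib import metadata
from typing import Optional

AFFECTED_VERSIONS = {"0.19.0"}  # the version listed for CVE-2026-9540


def installed_vllm_version() -> Optional[str]:
    """Return the vllm version installed in this Python environment, or None."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


def is_affected(version: Optional[str]) -> bool:
    """True if the version matches a release listed as affected."""
    if version is None:
        return False
    # Drop local build tags such as "+cu128" before comparing.
    return version.split("+", 1)[0] in AFFECTED_VERSIONS


if __name__ == "__main__":
    v = installed_vllm_version()
    status = "AFFECTED" if is_affected(v) else "not listed as affected"
    print(f"vllm {v or 'not installed'}: {status}")
```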
Patch Information
At the time of publication, the upstream fix is staged in vllm pull request #37594 and awaits acceptance. Track the vllm-project repository and issue #37343 for the merged commit and a tagged release that includes the fix. Upgrade to the first vLLM release that incorporates the merged pull request once it is published.
Workarounds
- Terminate client connections with strict timeouts at the reverse proxy to limit resource hold time on vLLM workers
- Run vLLM under a process supervisor with resource limits (cgroups, Kubernetes requests/limits) and automatic restart on failure
- Block or throttle anonymous traffic to the OpenAI-compatible endpoints and require API keys validated upstream
- Isolate vLLM workloads in dedicated namespaces or nodes so that a denial of service event does not impact unrelated services
# Configuration example
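# Example NGINX snippet to rate-limit and bound request size in front of vLLM
http {
    # Per-client-IP zone: 10 MB of state, 10 requests per second steady rate
    limit_req_zone $binary_remote_addr zone=vllm_rl:10m rate=10r/s;
    # Reject request bodies over 1 MB with HTTP 413
    client_max_body_size 1m;
    # vLLM OpenAI-compatible server (placeholder address; 8000 is vLLM's default port)
    upstream vllm_backend {
        server 127.0.0.1:8000;
    }

    server {
        listen 443 ssl;
        # Placeholder certificate paths; "listen ... ssl" requires a certificate
        ssl_certificate     /etc/nginx/tls/vllm.crt;
        ssl_certificate_key /etc/nginx/tls/vllm.key;
        location /v1/ {
            # Allow bursts of 20 above the rate; answer excess requests with 429
            limit_req zone=vllm_rl burst=20 nodelay;
            limit_req_status 429;
            # Max gap between reads from vLLM; long non-streamed generations hit this
            proxy_read_timeout 30s;
            proxy_send_timeout 30s;
            proxy_pass http://vllm_backend;
        }
    }
}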
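To see what `rate=10r/s burst=20 nodelay` admits, the sketch below models limit_req's per-client leaky bucket in Python. It is a simplified model, not NGINX code. It ignores millisecond rounding, shared-memory eviction, and the queuing delay NGINX applies when nodelay is absent. LimitReqModel is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class LimitReqModel:
    """Simplified model of NGINX limit_req with nodelay.

    rate: steady requests per second; burst: excess requests tolerated above it.
    """
    rate: float
    burst: int
    state: Dict[str, Tuple[float, float]] = field(default_factory=dict)  # key -> (excess, last)

    def allow(self, key: str, now: float) -> bool:
        if key not in self.state:
            self.state[key] = (0.0, now)  # a client's first request is always admitted
            return True
        excess, last = self.state[key]
        # The bucket drains at `rate`; each admitted request adds one unit.
        new_excess = max(0.0, excess - self.rate * (now - last) + 1.0)
        if new_excess > self.burst:
            return False  # rejected requests do not update the stored state
        self.state[key] = (new_excess, now)
        return True


if __name__ == "__main__":
    limiter = LimitReqModel(rate=10, burst=20)
    admitted = sum(limiter.allow("203.0.113.5", now=0.0) for _ in range(25))
    print(admitted)  # 21
```

Under this model a single client can send 21 requests at once (one plus the burst of 20) before the limiter starts rejecting. After that, the client regains roughly one request every 100 ms.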
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


