CVE-2026-9540: vllm-project vllm DoS Vulnerability

CVE-2026-9540 Overview

CVE-2026-9540 is a denial-of-service vulnerability in vllm-project vllm version 0.19.0. The flaw resides in the OpenAI-compatible Serving Path component, where a remote attacker can manipulate an unspecified processing step to disrupt service availability. Triggering the condition requires no authentication and no user interaction, so any client that can reach the endpoint over the network can launch the attack. The issue is classified under CWE-404 (Improper Resource Shutdown or Release). Public exploit details are available, and a pull request that remediates the defect is pending acceptance upstream.

Critical Impact
Remote, unauthenticated attackers can degrade or disable vLLM inference endpoints exposed through the OpenAI-compatible API, interrupting downstream LLM-dependent applications.

Affected Products

vllm-project vllm 0.19.0
Deployments exposing the OpenAI-compatible Serving Path
LLM inference services built on the affected vLLM release

Discovery Timeline

2026-05-26 - CVE-2026-9540 published to NVD
2026-05-26 - Last updated in NVD database

Technical Details for CVE-2026-9540

Vulnerability Analysis

The vulnerability affects the OpenAI-compatible Serving Path in vLLM 0.19.0. This component exposes REST endpoints that mirror the OpenAI API so that clients can send completion, chat, and embedding requests. An attacker who submits crafted input to this surface can trigger a denial-of-service condition. The defect maps to CWE-404, which indicates that resources allocated during request handling are not properly released or shut down. Repeated abusive requests can therefore exhaust process resources or stall request servicing. The Exploit Prediction Scoring System (EPSS) currently places this issue in the lower probability range, but exploit material is already public according to the VulDB vulnerability record.

Root Cause

The root cause is improper resource shutdown or release in the OpenAI-compatible Serving Path handlers. The component fails to reclaim resources tied to certain request patterns, leading to availability degradation over time or under repeated attacks. The upstream maintainers have proposed a fix in pull request #37594, tracked against issue #37343.

Attack Vector

The attack vector is network-based and requires no authentication or user interaction. An attacker reaches the affected serving endpoint over HTTP and submits malformed or abusive input. Because vLLM is frequently deployed as a backend for chat assistants, RAG pipelines, and agentic systems, a successful attack interrupts dependent services. Public discussion of vLLM latency and resource behavior is available on the Ingero blog. Refer to the VulDB submission record for additional disclosure context.

Detection Methods for CVE-2026-9540

Indicators of Compromise

Spikes in request latency or worker stalls on vLLM OpenAI-compatible endpoints with no corresponding legitimate traffic increase
Repeated requests to /v1/completions, /v1/chat/completions, or /v1/embeddings from a small set of source IPs preceding service degradation
Growth in process memory, file descriptors, or thread counts in vLLM workers without recovery between requests
Unexplained restarts or health-check failures of vLLM serving processes
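
The source-IP indicator above lends itself to a quick check against proxy access logs. The following is a minimal sketch, assuming log lines in the common or combined log format that NGINX writes by default. The function name top_talkers, the min_requests threshold, and the regular expression are illustrative choices, not part of vLLM or of the advisory.

```python
import re
from collections import Counter

# Matches the start of a common/combined log format line:
# <ip> <ident> <user> [<timestamp>] "<METHOD> <path> <protocol>" ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')

# Endpoints named in the indicators above.
WATCHED_PATHS = ("/v1/completions", "/v1/chat/completions", "/v1/embeddings")


def top_talkers(lines, min_requests=100):
    """Return (ip, count) pairs, busiest first, for clients that hit the
    watched paths at least `min_requests` times."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, _method, path = match.groups()
        # Ignore any query string when comparing paths.
        if path.split("?", 1)[0] in WATCHED_PATHS:
            counts[ip] += 1
    return [(ip, n) for ip, n in counts.most_common() if n >= min_requests]
```

Passing an open log file to top_talkers yields the heaviest clients of the serving path. The threshold is a placeholder and needs tuning against the deployment's normal traffic before it is used for alerting.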

Detection Strategies

Baseline normal request volume and latency for the OpenAI-compatible endpoints and alert on sustained deviations
Inspect HTTP access logs for high-rate or anomalously structured requests targeting the serving path
Correlate vLLM worker resource metrics with request-level telemetry to identify resource non-release patterns
Cross-check observed indicators for CVE-2026-9540 against threat intelligence from the VulDB CTI feed
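
The baselining strategy above can be sketched as a simple threshold test. This example assumes latency samples (in seconds) have already been collected for a known-good baseline window and for the most recent window. The function name sustained_deviation and its default thresholds are hypothetical, and a production system would more likely express the same rule in its monitoring stack.

```python
from statistics import mean, stdev


def sustained_deviation(baseline, recent, z_threshold=3.0, min_consecutive=5):
    """Return True if the last `min_consecutive` samples in `recent` all exceed
    the baseline mean plus `z_threshold` baseline standard deviations."""
    if len(baseline) < 2 or len(recent) < min_consecutive:
        return False  # not enough data to judge
    limit = mean(baseline) + z_threshold * stdev(baseline)
    return all(sample > limit for sample in recent[-min_consecutive:])
```

Requiring several consecutive samples above the limit, rather than a single one, is what separates a sustained deviation from an isolated slow request.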

Monitoring Recommendations

Export vLLM Prometheus metrics and alert on queue depth, GPU utilization stalls, and pending request counts
Forward reverse-proxy and API gateway logs to a centralized analytics platform for rate and pattern analysis
Monitor container restart counts and OOM events on hosts running vLLM 0.19.0
Add synthetic probes against inference endpoints to detect availability loss quickly
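
A synthetic probe and a queue-depth check can be combined in a few lines. The sketch below assumes the vLLM server exposes a /health endpoint and Prometheus text output on /metrics, and that the waiting-request gauge is named vllm:num_requests_waiting. These paths and the metric name should be confirmed against the deployed version. The helper names read_gauge and probe_health are illustrative.

```python
import urllib.request


def read_gauge(metrics_text, metric_name):
    """Sum all samples of `metric_name` in Prometheus text exposition output.
    Returns None if the metric does not appear."""
    total, found = 0.0, False
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # "<name>{labels} <value>": label values may contain spaces
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :]
        else:
            # "<name> <value>"
            name, _, rest = line.partition(" ")
        if name != metric_name:
            continue
        fields = rest.split()
        if fields:
            total += float(fields[0])
            found = True
    return total if found else None


def probe_health(base_url, timeout=5.0):
    """Return True if GET <base_url>/health answers 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False
```

A scheduler can call probe_health every few seconds and raise an alert after consecutive failures, while read_gauge applied to the /metrics body gives a queue-depth value to compare against a threshold.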

How to Mitigate CVE-2026-9540

Immediate Actions Required

Inventory all vLLM deployments and identify every instance that runs version 0.19.0 and exposes the OpenAI-compatible Serving Path
Restrict network exposure of vLLM inference endpoints to trusted clients using firewalls, VPNs, or service mesh policies
Place an authenticating reverse proxy or API gateway in front of vLLM to require credentials and enforce request validation
Apply rate limiting and request size limits at the gateway to reduce abuse surface until a patched release is available
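
The inventory step can be partly automated by checking the installed package version on each host or container image. This is a minimal sketch that relies on the standard library's importlib.metadata. The set AFFECTED_VERSIONS contains only the release named in this advisory, and the helper names are illustrative.

```python
from importlib import metadata

AFFECTED_VERSIONS = {"0.19.0"}  # the only release named in this advisory


def is_affected(version):
    """True if `version`, ignoring any local '+suffix' build tag, is a release
    named in this advisory."""
    return version.split("+", 1)[0].strip() in AFFECTED_VERSIONS


def installed_vllm_version():
    """Return the installed vllm distribution version, or None if vllm is absent."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed in this environment")
    elif is_affected(found):
        print(f"vllm {found} is listed as affected by CVE-2026-9540")
    else:
        print(f"vllm {found} is not the release named in the advisory")
```

The check covers only the Python environment it runs in, so it has to be executed inside each virtual environment or container that might host vLLM.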

Patch Information

At the time of publication, the upstream fix is staged in vllm pull request #37594 and awaits acceptance. Track the vllm-project repository and issue #37343 for the merged commit and a tagged release that includes the fix. Upgrade to the first vLLM release that incorporates the merged pull request once it is published.

Workarounds

Enforce strict timeouts on client connections at the reverse proxy to limit how long a single connection can hold resources on vLLM workers
Run vLLM under a process supervisor with resource limits (cgroups, Kubernetes requests/limits) and automatic restart on failure
Block or throttle anonymous traffic to the OpenAI-compatible endpoints and require API keys that are validated at the gateway before requests reach vLLM
Isolate vLLM workloads in dedicated namespaces or nodes so that a denial of service event does not impact unrelated services

# Example NGINX configuration: rate-limit and bound request size in front of vLLM
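http {
    # Track request rate per client IP: 10 requests/second, 10 MB of shared state
    limit_req_zone $binary_remote_addr zone=vllm_rl:10m rate=10r/s;
    # Reject request bodies larger than 1 MB
    client_max_body_size 1m;

    # Assumed backend address (vLLM's default port); adjust to the real listener
    upstream vllm_backend {
        server 127.0.0.1:8000;
    }

    server {
        listen 443 ssl;
        # Placeholder certificate paths; "listen ... ssl" needs a certificate and key
        ssl_certificate     /etc/nginx/tls/vllm.crt;
        ssl_certificate_key /etc/nginx/tls/vllm.key;

        location /v1/ {
            # Serve bursts of up to 20 extra requests without delay; reject anything beyond
            limit_req zone=vllm_rl burst=20 nodelay;
            # Close upstream connections idle for 30s between reads or writes
            proxy_read_timeout 30s;
            proxy_send_timeout 30s;
            proxy_pass http://vllm_backend;
        }
    }
}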
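
The supervisor workaround listed above can be illustrated with a small restart loop. This is only a sketch of the restart-on-failure idea: the function name supervise and its parameters are hypothetical, it applies no resource limits, and real deployments would normally rely on systemd, Kubernetes, or a comparable supervisor that also enforces cgroup limits.

```python
import subprocess
import time


def supervise(cmd, max_restarts=3, backoff_seconds=1.0):
    """Run `cmd` and restart it after each non-zero exit, up to `max_restarts`
    times. Returns the number of restarts performed."""
    restarts = 0
    while True:
        exit_code = subprocess.call(cmd)  # blocks until the process exits
        if exit_code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        time.sleep(backoff_seconds)  # brief pause to avoid a tight crash loop
```

Capping the number of restarts keeps a persistently crashing server from cycling forever, so that an operator or alerting system can step in.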