CVE-2026-5497: vLLM Out-of-Memory DoS Vulnerability

CVE-2026-5497 Overview

CVE-2026-5497 is an unauthenticated Denial of Service (DoS) vulnerability in vLLM versions 0.8.0 and later. The flaw resides in the VideoMediaIO.load_base64() method, which processes video/jpeg data URLs without enforcing a frame count limit. An attacker can submit a single API request containing thousands of comma-separated base64-encoded JPEG frames. The server decodes every frame into memory, exhausts available RAM, and crashes. The vulnerability is reachable through the OpenAI-compatible chat completions API, making any internet-exposed vLLM inference endpoint a target. The weakness is categorized under [CWE-400] Uncontrolled Resource Consumption.

Critical Impact
Unauthenticated attackers can crash vLLM inference servers with a single crafted HTTP request, disrupting any LLM-backed application or service.

Affected Products

vLLM versions 0.8.0 and later that do not include the upstream fix (commit 58ee61422169ce17e08248f8efa1e9df434fe395)
vLLM deployments exposing the OpenAI-compatible chat completions API
Any application stack consuming vLLM as an inference backend with multimodal video input enabled

Discovery Timeline

2026-06-11 - CVE-2026-5497 published to NVD
2026-06-11 - Last updated in NVD database

Technical Details for CVE-2026-5497

Vulnerability Analysis

The vulnerability lies in how vLLM handles multimodal input, specifically the parsing of video/jpeg data URLs. When a client submits a chat completion request containing an embedded video data URL, VideoMediaIO.load_base64() splits the base64 payload on comma delimiters. Each resulting segment is treated as an individual JPEG frame and decoded into a memory buffer. No upper bound is applied to the number of frames extracted from a single request. An attacker can submit a request containing thousands of comma-separated segments, forcing the server to allocate memory proportional to the attacker-controlled frame count. Memory pressure escalates rapidly, triggering an Out-of-Memory (OOM) condition that terminates the inference worker.
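The scale of the allocation can be estimated with simple arithmetic. The sketch below is an illustration and does not reproduce vLLM code. It assumes each frame is decoded into an uncompressed 8-bit RGB pixel array and kept in memory until the whole request has been parsed. The 1280x720 resolution and the frame counts are assumed figures. Because a compressed JPEG is typically much smaller than its decoded pixel array, the request body can be far smaller than the memory it forces the server to allocate.

```python
def decoded_frame_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes needed for one decoded 8-bit frame held as a raw pixel array."""
    return width * height * channels


def total_decoded_bytes(frame_count: int, width: int, height: int) -> int:
    """Memory needed when every frame in a request is decoded and retained."""
    return frame_count * decoded_frame_bytes(width, height)


if __name__ == "__main__":
    # Assumed figures for illustration only: 1280x720 RGB frames.
    for frames in (32, 1_000, 10_000):
        gib = total_decoded_bytes(frames, 1280, 720) / 2**30
        print(f"{frames:>6} frames -> {gib:.2f} GiB decoded")
```

Under these assumptions memory grows linearly with the attacker-chosen frame count, so a request with a few thousand frames can exceed the RAM of a typical inference host.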

Root Cause

The root cause is missing input validation on a user-controlled resource quantity. VideoMediaIO.load_base64() derives the frame count directly from the structure of the input string. No configuration parameter, hard-coded ceiling, or memory budget gate prevents pathological inputs from being fully expanded. This pattern matches [CWE-400] Uncontrolled Resource Consumption.
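A hard-coded ceiling of the kind described as missing can be sketched as follows. This is a minimal illustration, not the upstream patch: the function name split_frames, the exception TooManyFramesError, and the MAX_FRAMES value are all hypothetical. The point it demonstrates is that the frame count can be derived from the comma count before any segment is decoded, so an oversized request is rejected without per-frame allocations.

```python
import base64
import binascii

# Hypothetical ceiling for illustration; the upstream fix may use a different
# value, name, or mechanism.
MAX_FRAMES = 64


class TooManyFramesError(ValueError):
    """Raised when a payload declares more frames than the configured ceiling."""


def split_frames(payload: str, max_frames: int = MAX_FRAMES) -> list[bytes]:
    """Split a comma-delimited base64 payload into raw JPEG byte strings.

    The frame count is checked before any segment is decoded, so an oversized
    request is rejected without allocating per-frame buffers.
    """
    # Counting commas scans the payload once and allocates nothing per segment.
    frame_count = payload.count(",") + 1
    if frame_count > max_frames:
        raise TooManyFramesError(
            f"payload has {frame_count} frames; limit is {max_frames}"
        )
    try:
        return [base64.b64decode(seg, validate=True) for seg in payload.split(",")]
    except binascii.Error as exc:
        raise ValueError("payload contains an invalid base64 segment") from exc
```

A count limit alone does not bound the size of each individual frame, so a production check would likely pair it with a per-request byte or memory budget.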

Attack Vector

The attack is delivered over the network and requires no authentication and no user interaction. An attacker sends a POST request to the vLLM /v1/chat/completions endpoint with a message containing a video/jpeg data URL composed of many comma-delimited base64 JPEG segments. The server enumerates and decodes each segment until host memory is exhausted, at which point the kernel OOM killer terminates the vLLM process. Repeated requests prevent recovery, sustaining the denial of service.

Verified proof-of-concept details are documented in the Huntr Bounty Report. The upstream fix is available in the GitHub Commit Details.

Detection Methods for CVE-2026-5497

Indicators of Compromise

Inbound HTTP requests to /v1/chat/completions containing payloads that begin with "data:video/jpeg;base64," and have abnormally high comma counts.
vLLM worker processes terminated by the Linux OOM killer, with "Out of memory: Killed process" entries in dmesg or journalctl.
Sudden spikes in resident set size (RSS) for the vLLM Python process immediately preceding a crash.
Repeated 5xx responses or connection resets from the vLLM API endpoint following a single malformed request.
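The OOM-killer indicator can be checked by scanning kernel log lines, for example the output of dmesg or journalctl -k. The sketch below is a minimal illustration. The function name find_oom_kills and the process-name hints ("python", "vllm") are assumptions, since the command name the kernel records depends on how vLLM is launched.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Matches the kernel's OOM-kill summary line, for example:
#   "Out of memory: Killed process 4242 (python3) total-vm:...kB, anon-rss:...kB"
OOM_LINE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\)"
    r"(?:.*?anon-rss:(?P<anon_rss_kb>\d+)kB)?"
)


class OomKill(NamedTuple):
    pid: int
    comm: str
    anon_rss_kb: Optional[int]  # None when the line carries no anon-rss field


def find_oom_kills(
    lines: Iterable[str],
    comm_hints: tuple[str, ...] = ("python", "vllm"),  # assumed process names
) -> Iterator[OomKill]:
    """Yield OOM kills whose process name contains one of the hints."""
    for line in lines:
        match = OOM_LINE.search(line)
        if match is None:
            continue
        comm = match.group("comm")
        if any(hint in comm.lower() for hint in comm_hints):
            rss = match.group("anon_rss_kb")
            yield OomKill(int(match.group("pid")), comm, int(rss) if rss else None)
```

Each match gives the PID and resident memory at kill time, which can then be correlated with API access logs from the seconds before the event.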

Detection Strategies

Inspect request bodies at an API gateway or reverse proxy and flag video/jpeg data URLs whose payload size or comma count exceeds operational baselines.
Correlate process termination events on vLLM hosts with preceding API requests to identify the triggering client and payload signature.
Monitor host-level memory utilization for sharp, short-duration spikes that align with single inbound requests.
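The request-body inspection described above could be implemented along the following lines at a gateway or in a log-analysis job. This is a sketch under stated assumptions: it scans every string in the JSON body rather than relying on a particular message schema, and the names max_video_jpeg_commas and should_flag and the MAX_COMMAS threshold are hypothetical. The threshold should be tuned to the frame counts legitimate clients actually send.

```python
import json
from typing import Any, Iterator

VIDEO_JPEG_PREFIX = "data:video/jpeg"
MAX_COMMAS = 64  # hypothetical baseline; tune to legitimate traffic


def iter_strings(node: Any) -> Iterator[str]:
    """Yield every string value found anywhere in a decoded JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def max_video_jpeg_commas(body: bytes) -> int:
    """Return the largest payload comma count among video/jpeg data URLs."""
    worst = 0
    for text in iter_strings(json.loads(body)):
        # Media types are case-insensitive; compare only the short prefix.
        if text[: len(VIDEO_JPEG_PREFIX)].lower() != VIDEO_JPEG_PREFIX:
            continue
        # The first comma separates the data URL header from its payload.
        _, _, payload = text.partition(",")
        worst = max(worst, payload.count(","))
    return worst


def should_flag(body: bytes, max_commas: int = MAX_COMMAS) -> bool:
    """True when any video/jpeg data URL exceeds the comma threshold."""
    return max_video_jpeg_commas(body) > max_commas
```

Counting commas is cheap relative to decoding, so the check can run on every request without reproducing the memory cost it is meant to detect.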

Monitoring Recommendations

Enable structured access logging on the vLLM endpoint and ship logs to a SIEM for retention and query.
Alert on oom-killer kernel events targeting vLLM or its Python interpreter.
Track request rate, payload size distribution, and 5xx error rate per source IP to surface abuse patterns.
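Per-source tracking can be prototyped over structured access-log records before it is moved into a SIEM query. The sketch below is illustrative only: the record fields client_ip, status, and body_bytes are hypothetical names, and the default thresholds are placeholders to be replaced with values that fit the deployment.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ClientStats:
    requests: int = 0
    server_errors: int = 0
    max_body_bytes: int = 0

    @property
    def error_rate(self) -> float:
        """Fraction of this client's requests that ended in a 5xx status."""
        return self.server_errors / self.requests if self.requests else 0.0


def summarize(records: Iterable[dict]) -> dict[str, ClientStats]:
    """Aggregate request count, 5xx count, and largest body per source IP."""
    stats: defaultdict[str, ClientStats] = defaultdict(ClientStats)
    for rec in records:
        entry = stats[rec["client_ip"]]  # hypothetical field names
        entry.requests += 1
        entry.max_body_bytes = max(entry.max_body_bytes, rec["body_bytes"])
        if 500 <= rec["status"] <= 599:
            entry.server_errors += 1
    return dict(stats)


def suspicious(
    stats: dict[str, ClientStats],
    min_requests: int = 1,
    max_error_rate: float = 0.5,
) -> list[str]:
    """Return source IPs whose 5xx rate exceeds the threshold, sorted."""
    return sorted(
        ip
        for ip, entry in stats.items()
        if entry.requests >= min_requests and entry.error_rate > max_error_rate
    )
```

Because a single request is enough to crash a worker, the largest body size per client is worth reviewing alongside the error rate rather than relying on request volume alone.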

How to Mitigate CVE-2026-5497

Immediate Actions Required

Upgrade vLLM to the patched release that includes commit 58ee61422169ce17e08248f8efa1e9df434fe395.
Restrict network exposure of the vLLM API to authenticated clients via a reverse proxy or service mesh.
Enforce request body size limits at the ingress layer to cap the maximum payload an attacker can submit.
Disable multimodal video input on deployments that do not require it.

Patch Information

The upstream fix introduces a frame count limit in VideoMediaIO.load_base64(), preventing unbounded expansion of comma-delimited base64 segments. Review the GitHub Commit Details for the exact code change and integrate it into your build pipeline. Operators running forks should backport the change rather than relying on configuration alone.

Workarounds

Place vLLM behind an API gateway that rejects requests whose video/jpeg data URLs contain more than a small, fixed number of commas.
Apply a maximum request body size (for example, 1 MB) at the reverse proxy to limit attacker leverage.
Run vLLM under a cgroup or container with a memory limit so OOM events terminate only the worker rather than destabilizing the host.
Require authentication and per-client rate limiting on the chat completions endpoint until patching is complete.

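nginx

# Example NGINX ingress hardening for vLLM.
# The first two directives belong in the http context; the location block
# belongs inside a server block.
client_max_body_size 1m;    # reject request bodies larger than 1 MB
limit_req_zone $binary_remote_addr zone=vllm:10m rate=10r/s;    # per-client-IP rate limit

location /v1/chat/completions {
    limit_req zone=vllm burst=20 nodelay;    # permit short bursts, reject the excess
    proxy_pass http://vllm_upstream;         # upstream group must be defined separately
    proxy_read_timeout 60s;
}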
