CVE-2025-62164 Overview
CVE-2025-62164 is a memory corruption vulnerability in vLLM, a popular inference and serving engine for large language models (LLMs). The vulnerability exists in the Completions API endpoint and affects versions 0.10.2 through 0.11.0. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default, allowing maliciously crafted tensors to bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense(). This memory corruption can crash vLLM and potentially lead to remote code execution on the server hosting vLLM.
Critical Impact
This vulnerability enables network-based attackers with low privileges to achieve denial-of-service (crash) and potentially execute arbitrary code on vLLM servers through maliciously crafted tensor payloads, compromising the confidentiality, integrity, and availability of AI inference infrastructure.
Affected Products
- vLLM versions 0.10.2 through 0.11.0 (patched in 0.11.1)
- vLLM version 0.11.1-rc0
- vLLM version 0.11.1-rc1
Discovery Timeline
- 2025-11-21 - CVE-2025-62164 published to NVD
- 2025-12-04 - Last updated in NVD database
Technical Details for CVE-2025-62164
Vulnerability Analysis
This vulnerability exploits a fundamental weakness in how vLLM handles untrusted serialized data through PyTorch's tensor deserialization mechanism. The Completions API endpoint accepts user-supplied prompt embeddings as serialized tensors, which are then loaded using torch.load(). The critical issue arises from PyTorch 2.8.0's default behavior change where sparse tensor integrity checks are no longer enforced automatically.
When a maliciously crafted sparse tensor is deserialized and subsequently converted to a dense representation via to_dense(), the lack of proper bounds validation allows an attacker to manipulate tensor dimensions and indices. This results in an out-of-bounds memory write that corrupts heap memory, potentially overwriting critical data structures. The memory corruption can be weaponized beyond simple crashes to achieve arbitrary code execution, particularly when combined with heap spraying or other memory manipulation techniques.
The attack requires only authenticated network access, making it particularly dangerous for vLLM deployments exposed to multiple users or services.
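Conceptually, the missing safeguard is a bounds check on the sparse tensor's indices against its declared dense shape: a densify step that trusts attacker-controlled indices writes wherever those indices point. A minimal pure-Python sketch of such a check (a hypothetical helper for illustration, not vLLM's or PyTorch's actual code):

```python
def validate_coo_indices(indices, shape):
    """Reject sparse COO indices that fall outside the declared dense shape.

    `indices` is a list of per-dimension index lists (COO layout) and `shape`
    is the declared dense shape. A conversion routine that skips this check
    and trusts the indices performs out-of-bounds writes on crafted input.
    """
    if len(indices) != len(shape):
        raise ValueError("index dimensionality does not match declared shape")
    for dim, (row, size) in enumerate(zip(indices, shape)):
        for i in row:
            if not (0 <= i < size):
                raise ValueError(
                    f"index {i} out of bounds for dimension {dim} (size {size})"
                )


# A well-formed 2x3 sparse structure passes silently...
validate_coo_indices([[0, 1], [2, 0]], (2, 3))

# ...while a crafted index past the end of the buffer is rejected.
try:
    validate_coo_indices([[0, 1], [2, 7]], (2, 3))
except ValueError as e:
    print("rejected:", e)
```

This is the class of validation that PyTorch 2.8.0 stopped enforcing by default during deserialization, and that the exploit relies on being absent.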
Root Cause
The root cause is improper input validation (CWE-20) in the tensor deserialization pathway. Specifically, vLLM trusts user-supplied serialized tensor data without validating the integrity of sparse tensor structures before processing them. The vulnerability is exacerbated by PyTorch 2.8.0's decision to disable sparse tensor bounds checking by default, creating a gap between the expected safety guarantees and actual runtime behavior.
Attack Vector
An attacker with network access to the vLLM Completions API can craft a malicious serialized sparse tensor with deliberately corrupted indices or dimensions. When submitted as prompt embeddings, the tensor passes through torch.load() deserialization. The subsequent call to to_dense() performs memory writes based on the corrupted tensor indices without bounds validation, resulting in out-of-bounds memory access. This can crash the vLLM service or, with precise memory layout knowledge, enable arbitrary code execution on the hosting server.
# Security patch requiring explicit flag for loading embeddings
# Source: https://github.com/vllm-project/vllm/commit/58fab50d82838d5014f4a14d991fdb9352c9c84b
# In examples/offline_inference/prithvi_geospatial_mae.py
dtype="float16",
enforce_eager=True,
model_impl="terratorch",
+ enable_mm_embeds=True,
)
# In examples/offline_inference/prithvi_geospatial_mae_io_processor.py
max_num_seqs=32,
io_processor_plugin="prithvi_to_tiff",
model_impl="terratorch",
+ enable_mm_embeds=True,
)
The patch introduces an explicit enable_mm_embeds=True flag that must be set to allow loading of multimodal embeddings, preventing automatic processing of untrusted tensor data.
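For server deployments, the same opt-in applies at startup. A sketch, assuming the engine argument is exposed on the serve CLI as `--enable-mm-embeds` (flag name inferred from the patch; the model name is a placeholder; confirm both against `vllm serve --help` on your build):

```shell
# Default after the patch: client-supplied embeddings are refused,
# so the vulnerable deserialization path is never reached.
vllm serve meta-llama/Llama-3.1-8B-Instruct

# Opt in only for deployments that genuinely need client-supplied
# embeddings, and only behind trusted-network access controls.
vllm serve meta-llama/Llama-3.1-8B-Instruct --enable-mm-embeds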
Detection Methods for CVE-2025-62164
Indicators of Compromise
- Unexpected vLLM server crashes or service restarts, particularly when processing API requests
- Anomalous serialized tensor payloads in Completions API request logs containing unusual sparse tensor structures
- Memory corruption signatures or segmentation faults in vLLM process logs
- Unusual network traffic patterns to the Completions API endpoint with large or malformed payloads
Detection Strategies
- Monitor vLLM service logs for segmentation faults, memory access violations, or unexpected process terminations
- Implement API request payload inspection to detect abnormally structured tensor data in prompt embeddings
- Deploy endpoint detection and response (EDR) tooling to identify memory corruption exploitation attempts on vLLM host systems
- Set up process monitoring for vLLM to track crash frequency and restart patterns
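The log-based strategies above can be sketched as a small scanner for crash signatures. The patterns below are illustrative examples of common glibc and kernel crash messages, not an exhaustive or vLLM-specific list; tune them to your logging format:

```python
import re

# Illustrative crash signatures; extend for your environment.
CRASH_PATTERNS = [
    re.compile(r"Segmentation fault"),
    re.compile(r"SIGSEGV"),
    re.compile(r"free\(\): invalid pointer"),
    re.compile(r"corrupted (size vs\. prev_size|double-linked list)"),
]


def scan_log_lines(lines):
    """Return (line_number, line) pairs that match a known crash signature."""
    hits = []
    for n, line in enumerate(lines, start=1):
        if any(p.search(line) for p in CRASH_PATTERNS):
            hits.append((n, line))
    return hits


sample = [
    "INFO: request served in 120 ms",
    "Segmentation fault (core dumped)",
    "INFO: engine restarted",
]
print(scan_log_lines(sample))  # only the segfault line is flagged
```

A scanner like this is most useful when paired with restart tracking, since a single crafted request crashing the worker will show up as a crash signature immediately followed by a restart entry.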
Monitoring Recommendations
- Configure alerting on vLLM process crashes or abnormal memory consumption patterns
- Implement request logging with payload size and structure analysis for the Completions API
- Deploy runtime application self-protection (RASP) solutions to detect memory corruption attempts
- Monitor for unusual CPU or memory spikes during tensor deserialization operations
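Payload-level monitoring can start with coarse structural gates applied before any deserialization is attempted. A minimal sketch, assuming embeddings arrive base64-encoded in the request body (the wire format and the size ceiling are assumptions; adjust to your deployment):

```python
import base64
import binascii

MAX_PAYLOAD_BYTES = 8 * 1024 * 1024  # assumed ceiling for one embedding payload


def gate_embedding_payload(b64_payload: str) -> bytes:
    """Decode and size-check an embedding payload before torch.load() sees it."""
    try:
        raw = base64.b64decode(b64_payload, validate=True)
    except binascii.Error as e:
        raise ValueError(f"payload is not valid base64: {e}") from e
    if len(raw) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload of {len(raw)} bytes exceeds limit")
    return raw
```

A gate like this does not detect a malicious sparse tensor by itself, but it bounds the attack surface, gives monitoring a single choke point to log payload sizes, and rejects obviously malformed submissions cheaply.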
How to Mitigate CVE-2025-62164
Immediate Actions Required
- Upgrade vLLM to version 0.11.1 or later immediately to apply the security patch
- Audit access controls on vLLM Completions API endpoints to limit exposure to authenticated and trusted users only
- Review and restrict which clients can submit custom prompt embeddings to the API
- Implement network segmentation to limit access to vLLM servers from untrusted networks
Patch Information
The vulnerability has been patched in vLLM version 0.11.1. The fix introduces an explicit enable_mm_embeds flag that must be enabled to allow loading of text and image embeddings, preventing automatic processing of potentially malicious serialized tensor data. For detailed patch information, see the GitHub Security Advisory and the Pull Request #27204.
Workarounds
- If upgrading is not immediately possible, restrict access to the Completions API endpoint to trusted internal clients only
- Implement an API gateway or web application firewall to filter and validate incoming tensor payloads before they reach vLLM
- Disable custom prompt embedding functionality if not required for your use case
- Consider running vLLM in isolated containers with limited privileges and resource constraints to minimize RCE impact
# Configuration example: Restrict vLLM API access via network firewall
# Allow only trusted internal network access to vLLM port 8000
iptables -A INPUT -p tcp --dport 8000 -s 10.0.0.0/8 -j ACCEPT
iptables -A INPUT -p tcp --dport 8000 -j DROP
# Run vLLM with resource limits in container
docker run --memory="16g" --cpus="4" \
--security-opt=no-new-privileges \
--read-only \
vllm/vllm-openai:v0.11.1