CVE-2025-24357 Overview
CVE-2025-24357 is an insecure deserialization vulnerability in vLLM, a popular library for Large Language Model (LLM) inference and serving. The vulnerability exists in the hf_model_weights_iterator function in vllm/model_executor/weight_utils.py, which calls PyTorch's torch.load without weights_only=True, leaving the parameter at its unsafe default of False. When loading model checkpoints downloaded from HuggingFace, this configuration allows malicious pickle data to execute arbitrary code during unpickling.
Critical Impact
Attackers can achieve remote code execution by crafting malicious model checkpoints that execute arbitrary code when loaded by vLLM, potentially compromising AI/ML infrastructure and sensitive training data.
Affected Products
- vLLM versions prior to v0.7.0
- vLLM model inference and serving deployments using HuggingFace model checkpoints
- Systems utilizing torch.load through vLLM's weight loading utilities
Discovery Timeline
- 2025-01-27 - CVE CVE-2025-24357 published to NVD
- 2025-06-27 - Last updated in NVD database
Technical Details for CVE-2025-24357
Vulnerability Analysis
This vulnerability is classified as CWE-502 (Deserialization of Untrusted Data). The core issue stems from Python's pickle serialization format, which is inherently unsafe when handling untrusted data. PyTorch model checkpoints are stored as pickle files, and when torch.load is called without weights_only=True, it allows arbitrary Python objects to be deserialized and instantiated, including objects with malicious __reduce__ methods that execute code during unpickling.
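The __reduce__ mechanism described above can be demonstrated in a few lines of pure-stdlib Python. Here run_payload is a harmless stand-in for attacker-controlled code such as os.system(...); the point is that the callable fires during deserialization, before the caller ever sees the loaded object:

```python
import pickle

RESULTS = []

def run_payload(msg):
    # Stand-in for attacker code; pickled by reference (module + name),
    # so the unpickler resolves and calls it at load time.
    RESULTS.append(msg)

class Malicious:
    def __reduce__(self):
        # pickle invokes the returned (callable, args) pair during loading.
        return (run_payload, ("payload executed",))

blob = pickle.dumps(Malicious())
obj = pickle.loads(blob)   # run_payload fires before loads() returns
print(RESULTS)             # -> ['payload executed']
```

Note that the deserialized value (obj) is merely run_payload's return value; the attacker does not need the victim to use the object at all, only to load it.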
The attack surface is particularly concerning in vLLM's context because model checkpoints are typically downloaded from external sources like HuggingFace Hub. If an attacker can publish a malicious model or compromise an existing one, any vLLM deployment loading that checkpoint would execute the attacker's code with the privileges of the vLLM process.
Root Cause
The root cause is the use of torch.load() with the default weights_only=False parameter when loading model weights from potentially untrusted sources. According to the PyTorch Documentation for torch.load, pickle is inherently unsafe and can execute arbitrary code during unpickling. The fix requires explicitly setting weights_only=True to restrict deserialization to only tensor data, primitive types, and safe containers.
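The effect of restricting deserialization can be sketched with pickle's own extension point for controlling which globals resolve. This is not PyTorch's implementation, only an illustrative analogue: weights_only=True similarly allow-lists what the unpickler may reconstruct and rejects arbitrary callables:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Illustrative analogue of weights_only=True: resolve only an
    explicit allow-list of globals and reject everything else."""
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Primitive containers need no global lookups and load fine...
print(safe_loads(pickle.dumps({"lr": 0.1, "steps": [1, 2, 3]})))

# ...but a stream referencing an arbitrary callable is rejected.
import os
try:
    safe_loads(pickle.dumps(os.getenv))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

This is the "restricting globals" pattern from the Python pickle documentation; it blocks code-execution gadgets while still deserializing plain tensor-like container data.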
Attack Vector
The attack is network-based, requiring user interaction to load a malicious model checkpoint. An attacker would need to:
- Create or compromise a model repository on HuggingFace or similar platform
- Embed malicious pickle payloads in the model checkpoint files (.pt, .bin files)
- Wait for victims to download and load the malicious checkpoint using vLLM
- Upon loading, the malicious payload executes during torch.load() unpickling
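The chain above can be reproduced in miniature with the standard library. The file name and payload are illustrative (os.getenv stands in for something destructive); vLLM reaches the same pickle machinery through torch.load:

```python
import os
import pickle
import tempfile

class EvilCheckpoint:
    """Stands in for a tampered .bin/.pt entry: any object whose
    __reduce__ returns a callable runs it at load time."""
    def __reduce__(self):
        return (os.getenv, ("HOME",))  # harmless stand-in for os.system(...)

# Attacker: embed the payload in a checkpoint-like file.
path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
with open(path, "wb") as f:
    pickle.dump({"weights": EvilCheckpoint()}, f)

# Victim: an unguarded load executes the payload while deserializing.
with open(path, "rb") as f:
    state = pickle.load(f)
print(type(state["weights"]))  # payload already ran; value is getenv's result
```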
The following patches from the official security fix demonstrate the mitigation:
Patch in vllm/assets/image.py:
"""
image_path = get_vllm_public_assets(filename=f"{self.name}.pt",
s3_prefix=VLM_IMAGES_DIR)
- return torch.load(image_path, map_location="cpu")
+ return torch.load(image_path, map_location="cpu", weights_only=True)
Source: GitHub Commit Update
Patch in vllm/lora/models.py:
                        new_embeddings_tensor_path)
        elif os.path.isfile(new_embeddings_bin_file_path):
            embeddings = torch.load(new_embeddings_bin_file_path,
-                                   map_location=device)
+                                   map_location=device,
+                                   weights_only=True)
        return cls.from_lora_tensors(
            lora_model_id=get_lora_id()
Source: GitHub Commit Update
Detection Methods for CVE-2025-24357
Indicators of Compromise
- Unexpected process spawning from Python/vLLM processes during model loading operations
- Network connections initiated during model checkpoint deserialization
- Unusual file system activity or modifications during vLLM model initialization
- Suspicious pickle files in model cache directories containing non-standard Python objects
Detection Strategies
- Audit vLLM deployments for versions prior to v0.7.0 using dependency scanning tools
- Monitor for torch.load() calls without weights_only=True parameter in application code
- Implement file integrity monitoring on model checkpoint directories
- Use runtime application security monitoring to detect pickle deserialization attacks
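Complementing these strategies, a checkpoint can be inspected statically with pickletools before it is ever loaded. The sketch below is a heuristic, not a complete scanner, and the module deny-list is an assumption; dedicated pickle-scanning tools cover far more cases:

```python
import pickle
import pickletools

SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins"}

def scan_pickle(data: bytes):
    """Report global references to modules commonly abused in pickle
    payloads, without ever executing the stream."""
    findings = []
    strings = []  # recent string operands (feed STACK_GLOBAL in protocol 4+)
    for op, arg, pos in pickletools.genops(data):
        if op.name == "GLOBAL":          # protocol <= 3: arg is "module name"
            module = arg.split(" ", 1)[0]
            if module in SUSPICIOUS_MODULES:
                findings.append(arg)
        elif op.name == "STACK_GLOBAL":  # protocol 4+: operands pushed as strings
            if len(strings) >= 2 and strings[-2] in SUSPICIOUS_MODULES:
                findings.append(f"{strings[-2]} {strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return findings

import os

class Payload:
    def __reduce__(self):
        return (os.getenv, ("HOME",))  # harmless stand-in for os.system

print(scan_pickle(pickle.dumps(Payload())))         # flags the os global
print(scan_pickle(pickle.dumps({"w": [1, 2, 3]})))  # -> []
```

Because pickletools.genops only disassembles opcodes, the scan is safe to run on untrusted files, unlike loading them.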
Monitoring Recommendations
- Enable verbose logging for model loading operations in vLLM deployments
- Monitor network egress from ML inference servers for unexpected connections
- Implement behavioral analysis for processes executing under vLLM service accounts
- Configure alerts for suspicious Python process activity during model initialization phases
How to Mitigate CVE-2025-24357
Immediate Actions Required
- Upgrade vLLM to version v0.7.0 or later immediately
- Audit all custom code for torch.load() calls and ensure weights_only=True is set
- Review and verify the integrity of all model checkpoints in use
- Restrict model downloads to trusted and verified sources only
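The code-audit step can be sketched with Python's ast module. find_unsafe_torch_load is a hypothetical helper, not part of vLLM, and it only flags literal torch.load(...) call sites (aliased imports or indirect calls need manual review):

```python
import ast

def find_unsafe_torch_load(source: str):
    """Return line numbers of torch.load(...) calls that do not pass
    the weights_only keyword -- a static-audit sketch."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "load"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "torch"
                and not any(kw.arg == "weights_only" for kw in node.keywords)):
            hits.append(node.lineno)
    return hits

code = """
import torch
a = torch.load(path, map_location="cpu")                      # flagged
b = torch.load(path, map_location="cpu", weights_only=True)   # ok
"""
print(find_unsafe_torch_load(code))  # -> [3]
```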
Patch Information
The vulnerability is fixed in vLLM version v0.7.0. The patch adds weights_only=True to torch.load() calls throughout the codebase; see the official fix commit in the vLLM GitHub repository for the full set of changes.
Workarounds
- If immediate upgrade is not possible, manually patch all torch.load() calls to include weights_only=True
- Implement network segmentation to isolate ML inference infrastructure from sensitive systems
- Use model scanning tools to detect potentially malicious pickle payloads before loading
- Consider using safetensors format instead of pickle-based checkpoints where supported
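For the safetensors workaround, files can be cheaply distinguished from pickle-based checkpoints before anything is loaded. The sketch assumes the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header); is_safetensors_file is a hypothetical helper, not part of the safetensors library:

```python
import json
import struct

def is_safetensors_file(path: str) -> bool:
    """True if the file starts with a plausible safetensors header:
    u64 little-endian header size, then that many bytes of JSON.
    Pickle-based .bin/.pt checkpoints fail this check."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > 100_000_000:  # implausibly large header; reject
            return False
        try:
            json.loads(f.read(header_len))
            return True
        except (UnicodeDecodeError, json.JSONDecodeError):
            return False
```

A loader could refuse any checkpoint failing this check unless it comes from an explicitly trusted source.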
# Upgrade vLLM to patched version
pip install --upgrade "vllm>=0.7.0"
# Verify installed version
pip show vllm | grep Version