CVE-2025-24357 Overview
CVE-2025-24357 is an insecure deserialization vulnerability in vLLM, a popular library for Large Language Model (LLM) inference and serving. The vulnerability exists in the hf_model_weights_iterator function in vllm/model_executor/weight_utils.py, which calls PyTorch's torch.load without weights_only=True, leaving the parameter at its unsafe default of False. When loading model checkpoints downloaded from HuggingFace, this configuration allows malicious pickle data to execute arbitrary code during unpickling.
Critical Impact
Attackers can achieve remote code execution by crafting malicious model checkpoints that execute arbitrary code when loaded by vLLM, potentially compromising AI/ML infrastructure and sensitive training data.
Affected Products
- vLLM versions prior to v0.7.0
- vLLM model inference and serving deployments using HuggingFace model checkpoints
- Systems utilizing torch.load through vLLM's weight loading utilities
Discovery Timeline
- 2025-01-27 - CVE CVE-2025-24357 published to NVD
- 2025-06-27 - Last updated in NVD database
Technical Details for CVE-2025-24357
Vulnerability Analysis
This vulnerability is classified as CWE-502 (Deserialization of Untrusted Data). The core issue stems from Python's pickle serialization format, which is inherently unsafe when handling untrusted data. PyTorch model checkpoints are stored as pickle files, and when torch.load is called without weights_only=True, it allows arbitrary Python objects to be deserialized and instantiated, including objects with malicious __reduce__ methods that execute code during unpickling.
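The __reduce__ mechanism described above can be demonstrated in a few lines of pure-stdlib Python. Here run_payload is a harmless stand-in for attacker-controlled code such as os.system(...); the point is that the callable fires during deserialization, before the caller ever sees the loaded object:

```python
import pickle

RESULTS = []

def run_payload(msg):
    # Stand-in for attacker code; pickled by reference (module + name),
    # so the unpickler resolves and calls it at load time.
    RESULTS.append(msg)

class Malicious:
    def __reduce__(self):
        # pickle invokes the returned (callable, args) pair during loading.
        return (run_payload, ("payload executed",))

blob = pickle.dumps(Malicious())
obj = pickle.loads(blob)   # run_payload fires before loads() returns
print(RESULTS)             # -> ['payload executed']
```

Note that the deserialized value (obj) is merely run_payload's return value; the attacker does not need the victim to use the object at all, only to load it.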
The attack surface is particularly concerning in vLLM's context because model checkpoints are typically downloaded from external sources like HuggingFace Hub. If an attacker can publish a malicious model or compromise an existing one, any vLLM deployment loading that checkpoint would execute the attacker's code with the privileges of the vLLM process.
Root Cause
The root cause is the use of torch.load() with the default weights_only=False parameter when loading model weights from potentially untrusted sources. According to the PyTorch Documentation for torch.load, pickle is inherently unsafe and can execute arbitrary code during unpickling. The fix requires explicitly setting weights_only=True to restrict deserialization to only tensor data, primitive types, and safe containers.
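The effect of restricting deserialization can be sketched with pickle's own extension point for controlling which globals resolve. This is not PyTorch's implementation, only an illustrative analogue: weights_only=True similarly allow-lists what the unpickler may reconstruct and rejects arbitrary callables:

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Illustrative analogue of weights_only=True: resolve only an
    explicit allow-list of globals and reject everything else."""
    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Primitive containers need no global lookups and load fine...
print(safe_loads(pickle.dumps({"lr": 0.1, "steps": [1, 2, 3]})))

# ...but a stream referencing an arbitrary callable is rejected.
import os
try:
    safe_loads(pickle.dumps(os.getenv))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

This is the "restricting globals" pattern from the Python pickle documentation; it blocks code-execution gadgets while still deserializing plain tensor-like container data.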
Attack Vector
The attack is network-based, requiring user interaction to load a malicious model checkpoint. An attacker would need to:
- Create or compromise a model repository on HuggingFace or similar platform
- Embed malicious pickle payloads in the model checkpoint files (.pt, .bin files)
- Wait for victims to download and load the malicious checkpoint using vLLM
- Upon loading, the malicious payload executes during torch.load() unpickling
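The chain above can be reproduced in miniature with the standard library. The file name and payload are illustrative (os.getenv stands in for something destructive); vLLM reaches the same pickle machinery through torch.load:

```python
import os
import pickle
import tempfile

class EvilCheckpoint:
    """Stands in for a tampered .bin/.pt entry: any object whose
    __reduce__ returns a callable runs it at load time."""
    def __reduce__(self):
        return (os.getenv, ("HOME",))  # harmless stand-in for os.system(...)

# Attacker: embed the payload in a checkpoint-like file.
path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
with open(path, "wb") as f:
    pickle.dump({"weights": EvilCheckpoint()}, f)

# Victim: an unguarded load executes the payload while deserializing.
with open(path, "rb") as f:
    state = pickle.load(f)
print(type(state["weights"]))  # payload already ran; value is getenv's result
```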
The following patches from the official security fix demonstrate the mitigation:
Patch in vllm/assets/image.py:
"""
image_path = get_vllm_public_assets(filename=f"{self.name}.pt",
s3_prefix=VLM_IMAGES_DIR)
- return torch.load(image_path, map_location="cpu")
+ return torch.load(image_path, map_location="cpu", weights_only=True)
Source: GitHub Commit Update
Patch in vllm/lora/models.py:
                        new_embeddings_tensor_path)
        elif os.path.isfile(new_embeddings_bin_file_path):
            embeddings = torch.load(new_embeddings_bin_file_path,
-                                   map_location=device)
+                                   map_location=device,
+                                   weights_only=True)
        return cls.from_lora_tensors(
            lora_model_id=get_lora_id()
Source: GitHub Commit Update
Detection Methods for CVE-2025-24357
Indicators of Compromise
- Unexpected process spawning from Python/vLLM processes during model loading operations
- Network connections initiated during model checkpoint deserialization
- Unusual file system activity or modifications during vLLM model initialization
- Suspicious pickle files in model cache directories containing non-standard Python objects
Detection Strategies
- Audit vLLM deployments for versions prior to v0.7.0 using dependency scanning tools
- Monitor for torch.load() calls without weights_only=True parameter in application code
- Implement file integrity monitoring on model checkpoint directories
- Use runtime application security monitoring to detect pickle deserialization attacks
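Complementing these strategies, a checkpoint can be inspected statically with pickletools before it is ever loaded. The sketch below is a heuristic, not a complete scanner, and the module deny-list is an assumption; dedicated pickle-scanning tools cover far more cases:

```python
import pickle
import pickletools

SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins"}

def scan_pickle(data: bytes):
    """Report global references to modules commonly abused in pickle
    payloads, without ever executing the stream."""
    findings = []
    strings = []  # recent string operands (feed STACK_GLOBAL in protocol 4+)
    for op, arg, pos in pickletools.genops(data):
        if op.name == "GLOBAL":          # protocol <= 3: arg is "module name"
            module = arg.split(" ", 1)[0]
            if module in SUSPICIOUS_MODULES:
                findings.append(arg)
        elif op.name == "STACK_GLOBAL":  # protocol 4+: operands pushed as strings
            if len(strings) >= 2 and strings[-2] in SUSPICIOUS_MODULES:
                findings.append(f"{strings[-2]} {strings[-1]}")
        elif isinstance(arg, str):
            strings.append(arg)
    return findings

import os

class Payload:
    def __reduce__(self):
        return (os.getenv, ("HOME",))  # harmless stand-in for os.system

print(scan_pickle(pickle.dumps(Payload())))         # flags the os global
print(scan_pickle(pickle.dumps({"w": [1, 2, 3]})))  # -> []
```

Because pickletools.genops only disassembles opcodes, the scan is safe to run on untrusted files, unlike loading them.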
Monitoring Recommendations
- Enable verbose logging for model loading operations in vLLM deployments
- Monitor network egress from ML inference servers for unexpected connections
- Implement behavioral analysis for processes executing under vLLM service accounts
- Configure alerts for suspicious Python process activity during model initialization phases
How to Mitigate CVE-2025-24357
Immediate Actions Required
- Upgrade vLLM to version v0.7.0 or later immediately
- Audit all custom code for torch.load() calls and ensure weights_only=True is set
- Review and verify the integrity of all model checkpoints in use
- Restrict model downloads to trusted and verified sources only
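The code-audit step can be sketched with Python's ast module. find_unsafe_torch_load is a hypothetical helper, not part of vLLM, and it only flags literal torch.load(...) call sites (aliased imports or indirect calls need manual review):

```python
import ast

def find_unsafe_torch_load(source: str):
    """Return line numbers of torch.load(...) calls that do not pass
    the weights_only keyword -- a static-audit sketch."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "load"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "torch"
                and not any(kw.arg == "weights_only" for kw in node.keywords)):
            hits.append(node.lineno)
    return hits

code = """
import torch
a = torch.load(path, map_location="cpu")                      # flagged
b = torch.load(path, map_location="cpu", weights_only=True)   # ok
"""
print(find_unsafe_torch_load(code))  # -> [3]
```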
Patch Information
The vulnerability is fixed in vLLM version v0.7.0. The patch adds weights_only=True to torch.load() calls throughout the codebase; see the official fix commit in the vLLM GitHub repository for the full set of changes.
Workarounds
- If immediate upgrade is not possible, manually patch all torch.load() calls to include weights_only=True
- Implement network segmentation to isolate ML inference infrastructure from sensitive systems
- Use model scanning tools to detect potentially malicious pickle payloads before loading
- Consider using safetensors format instead of pickle-based checkpoints where supported
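For the safetensors workaround, files can be cheaply distinguished from pickle-based checkpoints before anything is loaded. The sketch assumes the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header); is_safetensors_file is a hypothetical helper, not part of the safetensors library:

```python
import json
import struct

def is_safetensors_file(path: str) -> bool:
    """True if the file starts with a plausible safetensors header:
    u64 little-endian header size, then that many bytes of JSON.
    Pickle-based .bin/.pt checkpoints fail this check."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > 100_000_000:  # implausibly large header; reject
            return False
        try:
            json.loads(f.read(header_len))
            return True
        except (UnicodeDecodeError, json.JSONDecodeError):
            return False
```

A loader could refuse any checkpoint failing this check unless it comes from an explicitly trusted source.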
# Upgrade vLLM to patched version
pip install --upgrade "vllm>=0.7.0"
# Verify installed version
pip show vllm | grep Version