CVE-2025-30165 Overview
CVE-2025-30165 is an insecure deserialization vulnerability affecting vLLM, a popular inference and serving engine for large language models. Multi-node vLLM deployments using the V0 engine rely on ZeroMQ for inter-node communication. Secondary vLLM hosts connect to the primary host using a SUB ZeroMQ socket, and data received on this socket is deserialized with Python's pickle module. Because pickle can run code during deserialization, a crafted message can execute arbitrary code on the secondary machines in the vLLM deployment cluster.
Critical Impact
Attackers who compromise the primary vLLM host can leverage this vulnerability to execute arbitrary code on all secondary nodes in the deployment. Alternative attack vectors such as ARP cache poisoning can also be used to redirect traffic and deliver malicious payloads without direct access to the primary host.
Affected Products
- vLLM (versions using V0 engine with multi-node tensor parallelism)
- vLLM deployments prior to v0.8.0, where the V0 engine is enabled by default (later versions are affected only when the V0 engine is explicitly enabled)
- Multi-node vLLM deployments using tensor parallelism across multiple hosts
Discovery Timeline
- May 6, 2025 - CVE-2025-30165 published to NVD
- July 31, 2025 - Last updated in NVD database
Technical Details for CVE-2025-30165
Vulnerability Analysis
This vulnerability stems from the use of Python's pickle module for deserializing data in inter-node communications within vLLM's distributed architecture. The pickle module is inherently unsafe for deserializing untrusted data because it can execute arbitrary Python code during the deserialization process. When secondary vLLM nodes receive data from what they believe is the primary host, they blindly deserialize it using pickle, creating a critical code execution pathway.
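The behavior described above can be shown with a deliberately harmless sketch. The class name Innocuous is hypothetical and is not part of vLLM; the example only shows that pickle.loads() calls whatever callable the payload names, which is the standard pickle __reduce__ mechanism.

```python
import pickle


class Innocuous:
    """Pickles into an instruction to call int("42") when loaded."""

    def __reduce__(self):
        # pickle stores (callable, args); pickle.loads() calls callable(*args).
        # A harmless callable is used here. An attacker could name any
        # importable callable instead, such as os.system.
        return (int, ("42",))


payload = pickle.dumps(Innocuous())
result = pickle.loads(payload)  # runs int("42") on the receiving side
print(type(result).__name__, result)  # prints "int 42"
```

The receiver never gets an Innocuous object back. It gets the return value of a function call that the sender selected, which is why pickle is unsuitable for data arriving over a network.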
The vulnerability is particularly concerning in distributed AI infrastructure where multiple GPU nodes work together for tensor parallelism. While the attack requires adjacent network access (the attacker must be on the same network segment as the vLLM deployment), the potential for lateral movement across an entire AI compute cluster makes this a significant security risk.
The vLLM maintainers have explicitly chosen not to patch this vulnerability because the required fix would be invasive and V0 has been off by default since version 0.8.0. Instead, they recommend network-level mitigations for deployments still using the affected configuration.
Root Cause
The root cause of CVE-2025-30165 is the unsafe use of Python's pickle deserialization on data received from network sockets without proper validation or sanitization. The vulnerable code exists in the shm_broadcast.py module, specifically in the ZeroMQ subscriber implementation that handles inter-node communication. When the secondary nodes connect to the primary host's XPUB socket and receive messages on their SUB socket, the received data is passed directly to pickle.loads() without any integrity verification or origin authentication.
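As a rough illustration of what integrity verification before deserialization could look like, the sketch below authenticates each message with an HMAC and uses JSON instead of pickle. This is not vLLM's code or its planned fix. The function names pack and unpack, and the idea of a pre-shared key distributed to every node, are assumptions made for the example.

```python
import hashlib
import hmac
import json

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def pack(obj, key: bytes) -> bytes:
    """Serialize obj as JSON and prepend an HMAC-SHA256 tag."""
    body = json.dumps(obj).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def unpack(message: bytes, key: bytes):
    """Verify the tag first, then parse the JSON body."""
    tag, body = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking the match position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message authentication failed")
    return json.loads(body.decode("utf-8"))


key = b"example-shared-key"  # hypothetical; distribute real keys out of band
message = pack({"op": "broadcast", "step": 7}, key)
print(unpack(message, key))
```

The HMAC rejects messages injected by a host that does not hold the key, which addresses the network-level vector. Using JSON means that even a sender holding the key cannot trigger code execution through deserialization alone. The sketch does not cover replay protection or key rotation.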
Attack Vector
The attack can be executed through two primary vectors:
Primary Host Compromise: An attacker who gains control of the primary vLLM host can send malicious pickled payloads to all secondary nodes in the cluster. The secondary nodes will deserialize these payloads and execute the embedded malicious code, effectively allowing the attacker to compromise the entire vLLM deployment from a single point of entry.
Network-Level Attack: Without access to the primary host, an attacker on the same network segment can use techniques such as ARP cache poisoning to intercept and redirect ZeroMQ traffic. By impersonating the primary host, the attacker can inject malicious pickled payloads that secondary nodes will execute.
The vulnerability exists in the client-side code (secondary nodes), making it an escalation point that allows an attacker to move laterally through the distributed AI infrastructure.
Detection Methods for CVE-2025-30165
Indicators of Compromise
- Unexpected network connections to ZeroMQ ports from unauthorized IP addresses
- ARP table anomalies indicating potential ARP cache poisoning attacks
- Unusual process execution on vLLM secondary nodes that does not match expected ML workloads
- Network traffic patterns showing modified or injected ZeroMQ messages
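One of the indicators above, ARP table anomalies, can be checked with a small script. The sketch below assumes the Linux /proc/net/arp text layout (IP address in the first column, hardware address in the fourth) and flags MAC addresses that appear for more than one IP. The function name and the sample table are made up for illustration.

```python
from collections import defaultdict

SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
10.0.0.1         0x1         0x2         aa:bb:cc:dd:ee:01     *        eth0
10.0.0.7         0x1         0x2         aa:bb:cc:dd:ee:01     *        eth0
10.0.0.9         0x1         0x2         aa:bb:cc:dd:ee:09     *        eth0
"""


def macs_with_multiple_ips(arp_table: str) -> dict:
    """Return {mac: sorted IPs} for MACs claimed by more than one IP."""
    ips_by_mac = defaultdict(set)
    for line in arp_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, mac = fields[0], fields[3].lower()
        if mac == "00:00:00:00:00:00":  # incomplete entry, not a real mapping
            continue
        ips_by_mac[mac].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


# On a Linux node, the contents of /proc/net/arp could be passed in instead.
suspects = macs_with_multiple_ips(SAMPLE)
print(suspects)
```

A MAC address shared by several IPs is not proof of poisoning, since routers and multi-homed hosts can produce the same pattern. It is a cue to compare against the known address of the primary host.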
Detection Strategies
- Monitor for ARP spoofing attacks on networks hosting vLLM deployments
- Implement network intrusion detection rules for anomalous ZeroMQ traffic patterns
- Deploy endpoint detection to identify unexpected code execution on vLLM nodes
- Audit vLLM configuration to identify deployments using V0 engine with multi-node tensor parallelism
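For the configuration audit, a starting point is to check whether a node's environment explicitly selects the V0 engine. The sketch below assumes that setting the VLLM_USE_V1 environment variable to "0" selects V0. That semantic is an assumption and should be confirmed against the documentation for the installed vLLM version. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def v0_explicitly_selected(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when VLLM_USE_V1 is set to "0" in the given environment.

    Assumption: a value of "0" selects the V0 engine. An unset variable
    falls back to the installed version's default and returns False here,
    so releases before v0.8.0 still need a separate version check.
    """
    env = os.environ if env is None else env
    return env.get("VLLM_USE_V1", "").strip() == "0"


print(v0_explicitly_selected({"VLLM_USE_V1": "0"}))  # True
print(v0_explicitly_selected({"VLLM_USE_V1": "1"}))  # False
```

This check says nothing about whether the deployment is multi-node, so it narrows the audit rather than completing it.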
Monitoring Recommendations
- Enable logging of all inter-node ZeroMQ communications in vLLM deployments
- Implement network segmentation monitoring for vLLM cluster traffic
- Deploy SentinelOne agents on all vLLM nodes for real-time threat detection and response
- Monitor for process injection and suspicious Python execution patterns on GPU compute nodes
How to Mitigate CVE-2025-30165
Immediate Actions Required
- Upgrade to vLLM v0.8.0 or later and ensure V1 engine is enabled (default configuration)
- Audit existing deployments to identify any using V0 engine with multi-node tensor parallelism
- Implement strict network segmentation to isolate vLLM cluster traffic from untrusted networks
- Deploy network-level access controls to restrict ZeroMQ port access to authorized nodes only
Patch Information
The vLLM maintainers have decided not to release a patch for this vulnerability due to the invasive nature of the required fix and the fact that the V0 engine has been disabled by default since version 0.8.0. Users are advised to migrate to the V1 engine, which is not affected by this vulnerability. For detailed information, refer to the GitHub Security Advisory.
Workarounds
- Migrate from V0 to V1 engine if possible (V1 is the default since v0.8.0 and is not affected)
- Deploy vLLM clusters on isolated, secure network segments with no untrusted access
- Use VLANs and firewall rules to limit exposure to ARP spoofing and to block unauthorized network access
- Use IPsec or other network-layer encryption for inter-node communications
- Consider single-node deployments where multi-node tensor parallelism is not strictly required
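# Network isolation example - restrict ZeroMQ traffic to trusted nodes only
# PRIMARY_HOST_IP, SECONDARY_SUBNET, and port 5555 are placeholders; substitute the addresses and port used by your deployment
# On the primary host: accept connections to the ZeroMQ port only from secondary nodes
iptables -A INPUT -p tcp --dport 5555 -s SECONDARY_SUBNET -j ACCEPT
iptables -A INPUT -p tcp --dport 5555 -j DROP
# On each secondary node: allow outbound ZeroMQ connections only to the primary host
iptables -A OUTPUT -p tcp --dport 5555 ! -d PRIMARY_HOST_IP -j DROP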
# Explicitly select the V1 engine through the environment (V1 is already the default since v0.8.0)
export VLLM_USE_V1=1
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


