CVE-2025-33255: NVIDIA TensorRT-LLM RCE Vulnerability

CVE-2025-33255 Overview

CVE-2025-33255 is an unsafe deserialization vulnerability in the NVIDIA TensorRT-LLM (TRT-LLM) Message Passing Interface (MPI) server. The flaw affects all platforms running TensorRT-LLM and allows network-based attackers to send crafted serialized data to the MPI server without authentication. Successful exploitation can lead to remote code execution, denial of service, data tampering, and information disclosure. The vulnerability is classified under CWE-502: Deserialization of Untrusted Data.

Critical Impact
Unauthenticated network attackers can execute arbitrary code on systems hosting the TensorRT-LLM MPI server, compromising AI inference infrastructure and exposed model data.

Affected Products

NVIDIA TensorRT-LLM (all platforms)
AI inference deployments using the TRT-LLM MPI server component
Multi-node large language model serving environments built on TensorRT-LLM

Discovery Timeline

2026-05-20 - CVE-2025-33255 published to NVD
2026-05-21 - Last updated in NVD database

Technical Details for CVE-2025-33255

Vulnerability Analysis

The vulnerability resides in the MPI server component of NVIDIA TensorRT-LLM, which coordinates distributed inference workloads across multiple GPUs and nodes. The MPI server accepts serialized objects over the network and deserializes them without validating their origin or integrity. An attacker who can reach the MPI server endpoint can supply a malicious serialized payload that triggers code execution during the deserialization process.

The attack requires no authentication and no user interaction. Because TRT-LLM is commonly deployed in production AI inference clusters, exploitation can compromise hosted models, training data, customer prompts, and adjacent infrastructure. Outcomes documented by NVIDIA include arbitrary code execution, denial of service, data tampering, and information disclosure.

Root Cause

The root cause is unsafe deserialization of untrusted input by the MPI server [CWE-502]. The server reconstructs Python or framework objects from network-supplied byte streams without restricting permitted classes or verifying message authenticity. Deserialization routines such as pickle.loads import and call whatever callables the byte stream names, including those recorded by an object's __reduce__ method at serialization time, so attacker-controlled data gains direct access to execution primitives.
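The advisory does not show the server's deserialization code, so the following is only a minimal Python sketch of the general mechanism and of one standard countermeasure. It uses a harmless callable (len) to show that pickle.loads returns the result of a call chosen by the payload, then overrides pickle.Unpickler.find_class to refuse globals that are not on an allowlist. The names Marker, AllowlistUnpickler, and safe_loads, and the allowlist contents, are illustrative assumptions and not part of TensorRT-LLM.

```python
import io
import pickle


class Marker:
    """Benign stand-in for a hostile object (hypothetical name)."""

    def __reduce__(self):
        # pickle stores (callable, args); the unpickler later calls callable(*args).
        # A hostile payload would name a far more dangerous callable than len.
        return (len, ("abc",))


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not listed in ALLOWED."""

    # Hypothetical allowlist; a real one would list only the classes a protocol needs.
    ALLOWED = {("builtins", "dict"), ("builtins", "list")}

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)


def safe_loads(data: bytes):
    """Deserialize with the allowlist applied."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


payload = pickle.dumps(Marker())
result = pickle.loads(payload)  # the call len("abc") runs here; result is 3, not a Marker

try:
    safe_loads(payload)
    blocked = False
except pickle.UnpicklingError:
    blocked = True  # builtins.len is not on the allowlist
```

An allowlist narrows what a payload can reach but does not make pickle safe for hostile input in general; a data-only format such as JSON avoids the problem class entirely where the protocol allows it.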

Attack Vector

An attacker reaches the MPI server over the network and submits a crafted serialized message. When the server deserializes the payload, embedded gadget chains execute attacker-supplied logic in the context of the inference process. From that foothold, an attacker can pivot to other cluster nodes, exfiltrate model weights, alter inference outputs, or terminate the service. See the NVIDIA Security Bulletin for vendor technical details.
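The attack path above depends on the server deserializing bytes from any peer that can connect. One generic way to break that dependency is to authenticate each frame before it reaches the deserializer. The sketch below prepends an HMAC-SHA256 tag to each message and rejects frames whose tag does not verify. It assumes a pre-shared key distributed to cluster nodes out of band; the function names seal and open_sealed are hypothetical, and nothing here reflects how NVIDIA's fix works.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(key: bytes, body: bytes) -> bytes:
    """Return tag || body, where tag authenticates body under key."""
    return hmac.new(key, body, hashlib.sha256).digest() + body


def open_sealed(key: bytes, frame: bytes) -> bytes:
    """Verify the tag and return the body; raise before any deserialization."""
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if len(tag) != TAG_LEN or not hmac.compare_digest(tag, expected):
        raise ValueError("frame failed authentication; refusing to deserialize")
    return body
```

A receiver would call open_sealed on every inbound frame and hand only the returned body to its deserializer. This sketch provides no replay protection and is only as strong as the secrecy of the key.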

Detection Methods for CVE-2025-33255

Indicators of Compromise

Unexpected child processes spawned by TensorRT-LLM or MPI worker processes, such as shells, Python subprocesses, or network utilities.
Inbound connections to MPI server ports from hosts outside the trusted inference cluster subnet.
Anomalous outbound connections from inference nodes to external IP addresses or cloud metadata endpoints.
Modifications to TensorRT-LLM model files, configuration files, or Python site-packages directories outside change windows.
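The first indicator above can be checked mechanically once a process table is available. The following Python sketch flags any process whose parent is an inference process and whose own name is not on an expected list. The Proc record, the process names, and both name sets are hypothetical placeholders; a real deployment would populate them from its own baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    ppid: int
    name: str


# Hypothetical names; replace with the binaries your deployment actually runs.
INFERENCE_PARENTS = {"mpirun", "trtllm_worker"}
EXPECTED_CHILDREN = {"mpirun", "trtllm_worker"}


def unexpected_children(procs: list[Proc]) -> list[Proc]:
    """Return processes spawned by an inference process but not on the expected list."""
    by_pid = {p.pid: p for p in procs}
    flagged = []
    for proc in procs:
        parent = by_pid.get(proc.ppid)
        if parent is None or parent.name not in INFERENCE_PARENTS:
            continue  # only children of inference processes are of interest
        if proc.name not in EXPECTED_CHILDREN:
            flagged.append(proc)
    return flagged
```

The process table itself could come from an EDR export or from /proc; collection is left out of the sketch on purpose.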

Detection Strategies

Monitor MPI server processes for deserialization activity followed by execve calls to non-inference binaries.
Apply network detection rules that flag unauthenticated connections to MPI ports from outside trusted subnets.
Inspect Python process memory and runtime telemetry for evidence of pickle payloads originating from network sockets.
Correlate inference workload anomalies with process creation events on GPU host nodes.
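As a rough illustration of the subnet-based rule above, the Python sketch below takes the remote addresses observed on MPI ports and returns those that fall outside a trusted network list. The 10.0.10.0/24 value mirrors the example subnet used in this article's firewall snippet; the function names and the way peer addresses are collected are assumptions.

```python
import ipaddress

# Example trusted subnet; substitute the real inference cluster ranges.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.10.0/24")]


def is_trusted(addr: str) -> bool:
    """True if addr belongs to any trusted network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED_NETWORKS)


def untrusted_peers(peers: list[str]) -> list[str]:
    """Return the distinct peer addresses that lie outside the trusted networks."""
    return sorted({p for p in peers if not is_trusted(p)})
```

For example, untrusted_peers(["10.0.10.5", "203.0.113.7"]) would report only the second address, which could then be raised as an alert.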

Monitoring Recommendations

Forward host telemetry from inference nodes into a centralized data lake for cross-node correlation and retrospective hunting.
Track baseline process trees of TensorRT-LLM workloads and alert on deviations such as new outbound connections or file writes.
Audit firewall logs for any external traffic reaching MPI server ports, which should never be exposed beyond the cluster fabric.
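Baseline tracking for file writes can be approximated with a hash snapshot of the directories that should stay static between change windows, such as model and configuration directories. The sketch below is one possible approach using only the Python standard library; the function names are hypothetical and the choice of which directories to snapshot is left to the operator.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large model weights fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or changed since the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(k for k in shared if baseline[k] != current[k]),
    }
```

A snapshot taken after each approved deployment would serve as the baseline, with any non-empty diff outside a change window treated as a finding.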

How to Mitigate CVE-2025-33255

Immediate Actions Required

Apply the NVIDIA security update for TensorRT-LLM referenced in the NVIDIA Customer Support advisory.
Restrict network access to the MPI server so it is reachable only from trusted inference nodes within an isolated cluster network.
Audit existing TensorRT-LLM deployments for exposure of MPI ports to untrusted networks or shared tenant environments.
Rotate any credentials, API keys, or model assets that resided on inference hosts during the exposure window.
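For the exposure audit above, a quick first pass on a Linux host is to list TCP sockets that listen on the wildcard address, since those are reachable on every interface. The sketch below parses the text of /proc/net/tcp, where state 0A denotes LISTEN and addresses are hexadecimal. It assumes a little-endian host such as x86-64 and covers IPv4 only; the function names are hypothetical, and deciding which of the reported ports belong to the MPI server is left to the operator.

```python
import socket
import struct

LISTEN_STATE = "0A"  # TCP_LISTEN as printed in /proc/net/tcp


def parse_listeners(proc_net_tcp: str) -> list[tuple[str, int]]:
    """Return (ip, port) for every listening IPv4 socket in /proc/net/tcp text."""
    listeners = []
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4 or fields[3] != LISTEN_STATE:
            continue
        hex_ip, hex_port = fields[1].split(":")
        # On little-endian hosts the address is printed byte-swapped.
        ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
        listeners.append((ip, int(hex_port, 16)))
    return listeners


def wildcard_ports(proc_net_tcp: str) -> list[int]:
    """Ports bound to 0.0.0.0, meaning they are exposed on all interfaces."""
    return sorted(port for ip, port in parse_listeners(proc_net_tcp) if ip == "0.0.0.0")
```

On a live host the input would be the contents of /proc/net/tcp, for example read with Path("/proc/net/tcp").read_text().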

Patch Information

NVIDIA has published remediation guidance in advisory ID 5805. Administrators should consult the NVIDIA Security Bulletin for fixed versions and upgrade instructions for TensorRT-LLM. Refer to the NVD entry for CVE-2025-33255 and the CVE.org record for additional references.

Workarounds

Bind the MPI server to loopback or private cluster interfaces only, and block external access at the host firewall.
Place TensorRT-LLM inference clusters inside dedicated network segments with strict ingress and egress controls.
Require mutual TLS or VPN tunneling between inference nodes when MPI communication crosses host boundaries.
Run TensorRT-LLM processes under least-privileged service accounts to limit post-exploitation impact.

bash

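# Configuration example: restrict MPI server exposure with host firewall rules
# Port range and subnet are placeholders; use your own MPI ports and inference subnet
# Keep loopback and reply traffic, which also arrives on high ports
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Allow MPI traffic only from trusted inference subnet, drop all other inbound
sudo iptables -A INPUT -p tcp --dport 1024:65535 -s 10.0.10.0/24 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 1024:65535 -j DROP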