CVE-2026-24163: NVIDIA TensorRT-LLM RCE Vulnerability

CVE-2026-24163 Overview

CVE-2026-24163 is an unsafe deserialization vulnerability in NVIDIA TensorRT-LLM (TRT-LLM) affecting the Remote Procedure Call (RPC) testing component. The flaw is classified under CWE-502: Deserialization of Untrusted Data. An attacker can send malicious serialized data to the RPC interface, which deserializes it without proper validation. Successful exploitation can lead to remote code execution, denial of service, data tampering, and information disclosure on systems running vulnerable versions of TensorRT-LLM. The vulnerability is exploitable over the network and requires neither authentication nor user interaction, so any remote attacker who can reach the affected RPC endpoint can attempt exploitation.

Critical Impact
Unauthenticated remote attackers can achieve arbitrary code execution on hosts running NVIDIA TensorRT-LLM by sending crafted serialized payloads to the RPC testing interface.

Affected Products

NVIDIA TensorRT-LLM (all supported platforms)
Component: nvidia:tensorrt_llm RPC testing module
Vendor: NVIDIA

Discovery Timeline

2026-05-20 - CVE-2026-24163 published to NVD
2026-05-20 - Last updated in NVD database
2026-05-20 - NVIDIA security advisory published (NVIDIA Support Document 5805)

Technical Details for CVE-2026-24163

Vulnerability Analysis

The vulnerability resides in the RPC testing functionality of NVIDIA TensorRT-LLM. TensorRT-LLM is NVIDIA's library for optimizing large language model (LLM) inference on NVIDIA GPUs. The affected component deserializes data received over the network without verifying its integrity or restricting allowed object types. When untrusted serialized data is processed, attacker-controlled objects can be instantiated within the application process. NVIDIA states that a successful exploit may result in code execution, denial of service, data tampering, and information disclosure. Because the attack vector is network-based and no privileges or user interaction are required, an attacker only needs network reachability to a vulnerable RPC endpoint to attempt exploitation.

Root Cause

The root cause is unsafe deserialization of untrusted input within the RPC testing path. The deserializer reconstructs arbitrary Python objects (or equivalent serialized structures) without enforcing an allowlist, signature verification, or type constraints. This matches the CWE-502 pattern, in which constructors, magic methods, or reduction hooks of attacker-supplied classes execute while the data is being deserialized, before the application has any chance to inspect the result.
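To make the CWE-502 pattern concrete, the following is a minimal sketch that assumes the serialized format is Python's `pickle`. The advisory text does not name the format, so treat that as an assumption. The `Gadget` class, the `ALLOWED` set, and `safe_loads` are hypothetical names for illustration, and the payload is deliberately harmless: the only thing it calls is `os.getpid`.

```python
import io
import os
import pickle


class Gadget:
    """Hypothetical attacker-supplied class with a reduction hook."""

    def __reduce__(self):
        # pickle stores a (callable, args) pair; the loader later calls callable(*args).
        # os.getpid is a harmless stand-in for whatever an attacker would choose.
        return (os.getpid, ())


payload = pickle.dumps(Gadget())

# Unrestricted loading: the callable runs inside pickle.loads() and its return
# value replaces the object, so the result is an int (the PID), not a Gadget.
result = pickle.loads(payload)

# Allowlist of (module, name) pairs the loader may resolve.
# Hypothetical and empty here, so every global lookup is rejected.
ALLOWED = set()


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the payload references, before it is used.
        if (module, name) not in ALLOWED:
            raise pickle.UnpicklingError(f"blocked global: {module}.{name}")
        return super().find_class(module, name)


def safe_loads(data):
    """Load pickle data while refusing any global not on the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


try:
    safe_loads(payload)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Overriding `find_class` is the restriction technique described in the Python `pickle` documentation. It narrows what a payload can reach but is not a substitute for the vendor fix or for keeping untrusted input away from the deserializer.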

Attack Vector

An attacker sends a crafted serialized payload to the exposed RPC testing interface of a TensorRT-LLM deployment. When the service deserializes the payload, embedded gadgets execute attacker-controlled logic in the context of the TensorRT-LLM process. This typically grants the attacker the same privileges as the inference service account, which in many GPU inference deployments runs with elevated permissions and direct access to model weights, training data, and adjacent infrastructure.

No verified public proof-of-concept code is available at this time. Refer to the NVIDIA Support Document and the NVD entry for CVE-2026-24163 for authoritative technical details.

Detection Methods for CVE-2026-24163

Indicators of Compromise

Unexpected child processes spawned by the TensorRT-LLM service account, particularly shells, interpreters (python, bash, sh), or network utilities (curl, wget, nc).
Outbound network connections from TensorRT-LLM hosts to unfamiliar external IP addresses or domains immediately following RPC traffic.
Crashes, restarts, or anomalous memory allocations in the TensorRT-LLM process correlated with inbound RPC requests.
Modification or exfiltration of model artifacts, configuration files, or GPU workload data on inference hosts.
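The first indicator above can be checked against a process snapshot. The sketch below is illustrative only: the `(pid, ppid, name)` tuple layout, the `SUSPICIOUS_CHILDREN` set, and the service name in the sample data are assumptions, and a real deployment would feed it from EDR or audit telemetry.

```python
# Process names taken from the indicator list above; extend as needed.
SUSPICIOUS_CHILDREN = {"sh", "bash", "python", "python3", "curl", "wget", "nc"}


def suspicious_descendants(processes, service_pid):
    """Return sorted (pid, name) pairs for flagged descendants of service_pid.

    processes: iterable of (pid, ppid, name) tuples from any process snapshot.
    """
    children = {}
    for pid, ppid, name in processes:
        children.setdefault(ppid, []).append((pid, name))

    flagged, stack, seen = [], [service_pid], set()
    while stack:
        parent = stack.pop()
        if parent in seen:  # guard against malformed snapshots with cycles
            continue
        seen.add(parent)
        for pid, name in children.get(parent, []):
            if name in SUSPICIOUS_CHILDREN:
                flagged.append((pid, name))
            stack.append(pid)  # keep walking: shells often spawn further tools
    return sorted(flagged)


# Hypothetical snapshot: PID 100 is the inference service.
snapshot = [
    (100, 1, "trtllm-serve"),
    (101, 100, "python3"),
    (102, 101, "sh"),
    (103, 102, "curl"),
    (200, 1, "sshd"),
    (201, 200, "bash"),
]
flagged = suspicious_descendants(snapshot, service_pid=100)
```

Because the inference service is itself Python-based, legitimate `python` worker processes are likely; baseline the normal process tree first and treat interpreter children as lower-confidence signals than shells or network utilities.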

Detection Strategies

Monitor RPC endpoints exposed by TensorRT-LLM for inbound traffic from unauthorized sources or networks.
Inspect process trees for the inference service and alert when it spawns child processes shortly after handling RPC input.
Apply behavioral detection rules for unsafe deserialization patterns, including unexpected interpreter invocation from long-running service processes.
Correlate GPU host telemetry with network flow data to detect post-exploitation lateral movement.
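As a rough illustration of the first strategy, flow records can be filtered for RPC traffic that originates outside approved networks. The port number, the trusted CIDR, and the `(source_ip, destination_port)` record layout below are placeholders, not values from the advisory.

```python
import ipaddress

# Hypothetical values: replace with the real RPC port and management subnets.
RPC_PORT = 9000
TRUSTED_NETWORKS = [ipaddress.ip_network("10.20.0.0/24")]


def unauthorized_sources(flows, rpc_port=RPC_PORT, trusted=TRUSTED_NETWORKS):
    """Return source IPs that reached rpc_port from outside the trusted networks.

    flows: iterable of (source_ip, destination_port) pairs from flow logs.
    """
    hits = []
    for src, dst_port in flows:
        if dst_port != rpc_port:
            continue  # only RPC traffic is of interest here
        addr = ipaddress.ip_address(src)
        if not any(addr in net for net in trusted):
            hits.append(src)
    return hits


# Sample flow records using documentation address ranges.
flows = [
    ("10.20.0.15", 9000),   # management host, expected
    ("203.0.113.7", 9000),  # outside the trusted subnet
    ("198.51.100.4", 22),   # different port, ignored
]
alerts = unauthorized_sources(flows)
```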

Monitoring Recommendations

Enable verbose logging on the TensorRT-LLM RPC interface and forward logs to a centralized analytics platform.
Baseline normal RPC client identities and payload sizes, then alert on deviations.
Track file integrity for TensorRT-LLM binaries, configuration files, and model directories.
Continuously monitor for new CVEs and advisories from NVIDIA covering the TensorRT-LLM component.
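The payload-size baselining recommendation can be sketched with a simple z-score check. The sample sizes and the threshold of 3 standard deviations are illustrative assumptions; production baselines usually need larger windows and per-client statistics.

```python
import statistics


def build_baseline(sizes):
    """Return (mean, sample standard deviation) for observed payload sizes in bytes."""
    return statistics.mean(sizes), statistics.stdev(sizes)


def is_anomalous(size, baseline, threshold=3.0):
    """Flag a payload whose size deviates from the mean by more than threshold sigmas."""
    mean, stdev = baseline
    if stdev == 0:
        return size != mean  # every baseline sample was identical
    return abs(size - mean) / stdev > threshold


# Hypothetical sizes (bytes) observed during normal operation.
baseline = build_baseline([510, 495, 502, 498, 505, 490])

typical = is_anomalous(504, baseline)    # within normal variation
oversized = is_anomalous(4096, baseline)  # far outside the baseline
```

Size alone is a weak signal, since a malicious serialized payload can be small. It is best combined with the client-identity baseline and the process-level indicators listed earlier.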

How to Mitigate CVE-2026-24163

Immediate Actions Required

Apply the NVIDIA security update for TensorRT-LLM as described in NVIDIA Support Document 5805.
Restrict network access to TensorRT-LLM RPC endpoints to trusted management networks only.
Disable the RPC testing functionality in production deployments where it is not required.
Audit running TensorRT-LLM instances for indicators of prior exploitation, focusing on anomalous process and network activity.

Patch Information

NVIDIA has issued guidance and fixed versions through its product security advisory. Refer to the NVIDIA Support Document 5805 for the list of fixed TensorRT-LLM releases and upgrade instructions. Operators should upgrade to the vendor-specified patched version on every host running TensorRT-LLM.
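To audit hosts against the fixed release, a small script can compare the installed package version with the version named in the advisory. This is a sketch under assumptions: the distribution name `tensorrt_llm` is assumed to match how the package was installed, the version numbers in the usage lines are made up, and the fixed version must be taken from NVIDIA Support Document 5805.

```python
import re
from importlib import metadata


def _split_version(version):
    """Return (numeric_tuple, has_suffix) for strings such as '1.2.0' or '1.2.0rc1'."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    numbers = tuple(int(part) for part in match.group(0).split("."))
    return numbers, match.end() != len(version)


def needs_upgrade(installed, fixed):
    """True if installed is older than fixed, or is a suffixed build of the same number."""
    a, a_suffix = _split_version(installed)
    b, _ = _split_version(fixed)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    # Conservative: a pre-release or local build of the fixed number is flagged for review.
    return a < b or (a == b and a_suffix)


def installed_trtllm_version():
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version("tensorrt_llm")  # assumed distribution name
    except metadata.PackageNotFoundError:
        return None
```

For example, with made-up version numbers, `needs_upgrade("1.0.0", "1.2.0")` returns `True` and `needs_upgrade("1.2.0", "1.2.0")` returns `False`. Containers and virtual environments each carry their own copy of the package, so the check has to run inside every environment that serves inference.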

Workarounds

Block external access to TensorRT-LLM RPC ports at perimeter firewalls and host-based firewalls.
Run TensorRT-LLM under a dedicated, least-privileged service account with no interactive login.
Place inference workloads behind proxies that enforce mutual TLS (mTLS) so that unauthenticated clients cannot reach the RPC interface.
Isolate GPU inference hosts in a segmented network zone with strict egress controls.

bash

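# Configuration example: restrict TensorRT-LLM RPC port access with iptables (run as root)
# Set RPC_PORT to the actual port and TRUSTED_CIDR to your management subnet
RPC_PORT="<RPC_PORT>"
TRUSTED_CIDR="<TRUSTED_CIDR>"
# -A appends, so any earlier ACCEPT rule for this port still takes precedence; review the chain first
iptables -A INPUT -p tcp --dport "$RPC_PORT" -s "$TRUSTED_CIDR" -j ACCEPT
iptables -A INPUT -p tcp --dport "$RPC_PORT" -j DROP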
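For the mTLS workaround, the proxy or wrapper in front of the RPC interface needs a TLS context that rejects clients without a valid certificate. The sketch below uses Python's standard `ssl` module; the function name and the certificate paths are hypothetical, and nothing here reflects a built-in TensorRT-LLM option.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Build a server-side TLS context that requires a client certificate.

    All three paths are placeholders for files issued by your own PKI.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the handshake fail unless the client presents a
    # certificate that chains to a CA loaded via load_verify_locations().
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx


# Without paths the context is configured but not yet usable for serving.
context = build_mtls_server_context()
```

Loading only a private client CA, rather than the system trust store, keeps the set of accepted clients limited to certificates the operator has issued.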