CVE-2026-24142: NVIDIA TensorRT-LLM RCE Vulnerability

CVE-2026-24142 Overview

CVE-2026-24142 is an insecure deserialization vulnerability in NVIDIA TensorRT-LLM (TRT-LLM) affecting all supported platforms. The flaw stems from unsafe handling of serialized data and serialized handles within the library. Attackers can exploit the issue over the network without authentication or user interaction. Successful exploitation may lead to remote code execution, data tampering, and information disclosure on systems running affected versions of TensorRT-LLM. The vulnerability is tracked under CWE-502: Deserialization of Untrusted Data.

Critical Impact
Unauthenticated network attackers can achieve code execution, data tampering, and information disclosure on hosts running NVIDIA TensorRT-LLM.

Affected Products

NVIDIA TensorRT-LLM (TRT-LLM) - all platforms
nvidia:tensorrt_llm package distributions
Workloads embedding TensorRT-LLM for large language model inference

Discovery Timeline

2026-05-20 - CVE-2026-24142 published to the National Vulnerability Database
2026-05-21 - Last updated in the NVD

Technical Details for CVE-2026-24142

Vulnerability Analysis

The vulnerability resides in TensorRT-LLM's deserialization logic, where the library reconstructs objects and handles from serialized input without sufficient validation. When TRT-LLM processes attacker-controlled serialized payloads, it can instantiate unsafe object types or invoke dangerous methods during reconstruction. This class of flaw (CWE-502) commonly yields arbitrary code execution with the privileges of the inference service process. Because TRT-LLM is typically deployed on GPU servers that host model inference endpoints, successful exploitation can give attackers access to model weights, inference data, and adjacent network resources. NVIDIA has acknowledged the issue and published guidance in the NVIDIA Support Article.

Root Cause

The root cause is the trust placed in serialized input streams and serialized handles consumed by TensorRT-LLM. The library deserializes data without enforcing type allow-lists, integrity checks, or cryptographic validation. Any process or client able to deliver a crafted serialized payload to a TRT-LLM consumer can influence object construction during deserialization.

Attack Vector

The attack vector is network-based and requires no authentication or user interaction. An attacker submits a malicious serialized blob to an exposed TensorRT-LLM endpoint, a model-loading interface, or any application layer that forwards untrusted data into TRT-LLM deserialization routines. Once the payload is deserialized, it can execute code, modify in-memory state, or exfiltrate sensitive data.

No verified public proof-of-concept exploit is available for CVE-2026-24142 at the time of publication. Refer to the NVD entry and the CVE.org record for additional technical references.

Detection Methods for CVE-2026-24142

Indicators of Compromise

Unexpected child processes spawned by TensorRT-LLM inference workers or Python processes loading tensorrt_llm modules
Outbound network connections from GPU inference hosts to unrecognized destinations
Anomalous file writes or model artifact modifications in TRT-LLM working directories
Crashes or stack traces referencing TRT-LLM deserialization functions in application logs
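The first indicator can be checked from an ordinary process listing. This is a minimal sketch under stated assumptions: it assumes TensorRT-LLM workers can be recognized by `tensorrt_llm` or `trtllm` in their command line, and it uses a hand-picked list of shells and network tools. Both are assumptions to tune for your deployment, and `flag_suspicious_children` is a hypothetical name. Python children are deliberately not flagged, because inference servers commonly spawn Python worker processes.

```bash
# Hypothetical helper: reads "pid ppid comm args..." lines, as printed by
# `ps -eo pid=,ppid=,comm=,args=`, on stdin and prints an ALERT line for each
# shell or network tool whose parent command line mentions TensorRT-LLM.
flag_suspicious_children() {
  awk '
    {
      # Remember each process: parent PID, short name, full command line.
      pid = $1; ppid[pid] = $2; comm[pid] = $3
      line = ""; sep = ""
      for (i = 4; i <= NF; i++) { line = line sep $i; sep = " " }
      args[pid] = line
      order[NR] = pid
    }
    END {
      for (n = 1; n <= NR; n++) {
        pid = order[n]; parent = ppid[pid]
        # Assumed naming pattern for TRT-LLM worker command lines.
        if (args[parent] ~ /tensorrt_llm|trtllm/ &&
            comm[pid] ~ /^(sh|bash|dash|zsh|ksh|nc|ncat|curl|wget|perl|ruby)$/)
          printf "ALERT pid=%s comm=%s parent=%s parent_args=%s\n",
                 pid, comm[pid], parent, args[parent]
      }
    }'
}
```

Usage: `ps -eo pid=,ppid=,comm=,args= | flag_suspicious_children`. This is a point-in-time check, so short-lived processes that exit between runs will be missed; continuous process auditing catches more.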

Detection Strategies

Monitor the process lineage of inference services for shells, interpreters, or scripting binaries launched as children of TRT-LLM processes
Inspect network traffic for serialized Python or C++ object signatures arriving at inference API endpoints
Apply file integrity monitoring to model and configuration directories used by TensorRT-LLM
Correlate unauthenticated API requests with subsequent process or file system anomalies on the same host
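One concrete "serialized Python object signature" is the pickle protocol header: pickles written with protocol 2 or later begin with the PROTO opcode (byte 0x80) followed by the protocol number. The advisory summary does not say which serialization format the vulnerable code paths use, so treating pickle as the relevant format is an assumption. This minimal sketch checks a captured request body saved to a file. `looks_like_pickle` is a hypothetical helper, and protocol 0 and 1 pickles, which carry no header, slip past it.

```bash
# Hypothetical helper: returns 0 if FILE starts with a pickle PROTO header
# (byte 0x80 followed by a protocol number from 2 to 5), 1 otherwise.
# Protocol 0 and 1 pickles have no such header and are NOT detected here.
looks_like_pickle() {
  local file="$1" header
  # Read the first two bytes as lowercase hex, e.g. "8004" for protocol 4.
  header="$(head -c 2 -- "$file" | od -An -tx1 | tr -d ' \n')"
  case "$header" in
    8002|8003|8004|8005) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `looks_like_pickle body.bin && echo "possible pickle payload"`. A match on an endpoint that should only receive JSON is worth investigating; a non-match proves nothing on its own.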

Monitoring Recommendations

Enable verbose audit logging on all TRT-LLM inference endpoints and forward logs to a centralized analytics platform
Baseline normal GPU and CPU process activity on inference servers to surface deviations
Alert on any deserialization errors or unhandled exceptions originating from tensorrt_llm modules
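For the last recommendation, Python tracebacks are a practical signal: a traceback whose frames pass through a `tensorrt_llm` source file marks an exception raised inside the library. A minimal sketch, assuming plain-text logs in which tracebacks are written verbatim (a `Traceback (most recent call last):` line, indented frame lines, then an unindented exception line). Logs that add a timestamp or JSON wrapper to every line would need a different parser, and `find_trtllm_tracebacks` is a hypothetical name.

```bash
# Hypothetical helper: scans a log file ($1, or stdin if omitted) for Python
# tracebacks with at least one frame inside tensorrt_llm and prints the
# final exception line of each, prefixed with its line number.
find_trtllm_tracebacks() {
  awk '
    # A new traceback starts; reset the per-traceback flag.
    /^Traceback \(most recent call last\):/ { in_tb = 1; hit = 0; next }
    # Indented lines are frames or source lines belonging to the traceback.
    in_tb && /^[ \t]/ { if (/tensorrt_llm/) hit = 1; next }
    # The first unindented line ends the traceback: it names the exception.
    in_tb { if (hit) print NR ": " $0; in_tb = 0 }
  ' "${1:--}"
}
```

For example, `find_trtllm_tracebacks /var/log/trtllm/server.log` (path hypothetical) prints each matching exception line with its line number, ready to forward to the alerting pipeline.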

How to Mitigate CVE-2026-24142

Immediate Actions Required

Apply the fixed TensorRT-LLM release identified in the NVIDIA Support Article as soon as you can deploy it in your environment
Restrict network access to TensorRT-LLM inference endpoints using firewalls, service meshes, or private networking
Require authentication and authorization on any application layer that forwards data into TRT-LLM
Audit existing deployments for exposure of TRT-LLM ports or model-loading interfaces to untrusted networks
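The exposure audit can start on each host with `ss`, which lists listening sockets. This minimal sketch reads `ss -ltn` output and reports listeners on a given port that are bound to a wildcard address (`0.0.0.0`, `[::]` or `*`), meaning they accept connections on every interface unless a firewall intervenes. `find_wildcard_listeners` is a hypothetical name, and port 8000 in the usage line only mirrors the iptables example under Workarounds; substitute your service's port.

```bash
# Hypothetical helper: reads `ss -ltn` output on stdin and prints every TCP
# listener on port $1 that is bound to a wildcard address.
find_wildcard_listeners() {
  awk -v want="$1" '
    $1 == "LISTEN" {
      laddr = $4
      # Split "ADDRESS:PORT" on the last colon (IPv6 addresses contain colons).
      port = laddr; sub(/.*:/, "", port)
      addr = laddr; sub(/:[^:]*$/, "", addr)
      if (port == want && (addr == "0.0.0.0" || addr == "[::]" || addr == "*"))
        print "EXPOSED " laddr
    }'
}
```

Usage: `ss -ltn | find_wildcard_listeners 8000`. The `ss` header line is ignored because only `LISTEN` rows are parsed. A wildcard listener may still be blocked by a host or cloud firewall, so treat matches as candidates to verify rather than confirmed exposure.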

Patch Information

NVIDIA has published remediation guidance in the official security bulletin at the NVIDIA Support Article (Answer ID 5805). Administrators should upgrade to the fixed TensorRT-LLM version specified in that advisory and validate that no downstream applications continue to ship vulnerable copies of the library.
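To find vulnerable copies, including ones shipped inside downstream application environments, one option is to search for the `tensorrt_llm-<version>.dist-info` directories that pip-installed wheels leave in `site-packages`. This minimal sketch assumes that layout and takes the fixed version as an argument, because that value must come from the NVIDIA advisory; `version_lt` and `audit_trtllm_installs` are hypothetical names. It relies on GNU `sort -V`, which does not implement PEP 440 ordering for pre-release tags such as `rc`, so check any release-candidate results by hand.

```bash
# Hypothetical helper: succeeds (exit 0) if version $1 is older than $2,
# using GNU sort's version ordering (sort -V).
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: lists every tensorrt_llm distribution found under
# ROOT ($2, default /) and marks those older than FIXED ($1), the fixed
# version taken from the NVIDIA advisory.
audit_trtllm_installs() {
  local fixed="$1" root="${2:-/}" dir ver
  find "$root" -type d -name 'tensorrt_llm-*.dist-info' 2>/dev/null |
    while IFS= read -r dir; do
      ver="${dir##*/tensorrt_llm-}"   # strip the path and package name
      ver="${ver%.dist-info}"         # strip the metadata suffix
      if version_lt "$ver" "$fixed"; then
        echo "VULNERABLE $ver $dir"
      else
        echo "OK $ver $dir"
      fi
    done
}
```

Run `audit_trtllm_installs "$FIXED_VERSION" /` with `FIXED_VERSION` set to the release named in the advisory. Container images are only covered if their filesystems are mounted on the host or the function is run inside them, and copies installed without pip metadata are not detected.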

Workarounds

Place TensorRT-LLM behind an authenticated API gateway that rejects unexpected content types and serialized payloads
Run inference workers in isolated containers with minimal filesystem and network privileges to limit blast radius
Disable or gate any feature that accepts externally supplied serialized model artifacts or handles until patched
Validate and sign model artifacts before loading them into TensorRT-LLM

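```bash
# Example network restriction using iptables on a TRT-LLM inference host.
# Port 8000 and 10.0.10.0/24 are example values; substitute your service's
# port and the subnet of your trusted application tier. Because -A appends,
# these rules only work if no earlier INPUT rule already accepts this traffic.
# iptables filters IPv4 only; IPv6 traffic needs matching ip6tables rules.
# Allow inference traffic only from the trusted application tier
iptables -A INPUT -p tcp --dport 8000 -s 10.0.10.0/24 -j ACCEPT
# Drop all other TCP traffic to the inference port
iptables -A INPUT -p tcp --dport 8000 -j DROP
```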
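Validating model artifacts before loading them, and the file integrity monitoring strategy under Detection Strategies, can both start from a SHA-256 manifest. A minimal sketch, assuming GNU coreutils (`sha256sum`, `sort -z`) and GNU `xargs`; `make_manifest` and `verify_manifest` are hypothetical names. This establishes integrity only: to cover the signing part of the workaround, the manifest itself would also need to be signed and its signature checked (for example with GPG) before it is trusted, which the sketch leaves out.

```bash
# Hypothetical helper: records SHA-256 digests of every regular file under
# MODEL_DIR ($1) into MANIFEST ($2). Run it once from a trusted copy, and
# keep MANIFEST outside MODEL_DIR so it does not hash itself mid-write.
make_manifest() {
  local model_dir="$1" manifest="$2"
  # Paths are stored relative to MODEL_DIR; the redirect is resolved from the
  # caller's working directory, before the subshell changes directory.
  (cd -- "$model_dir" && find . -type f -print0 | sort -z |
    xargs -0 -r sha256sum) > "$manifest"
}

# Hypothetical helper: exits non-zero if any file listed in MANIFEST is
# missing or its digest changed. Files added after the manifest was made
# are NOT reported by sha256sum --check.
verify_manifest() {
  local model_dir="$1" manifest="$2"
  # Feed the manifest on stdin so a relative MANIFEST path still works.
  (cd -- "$model_dir" && sha256sum --check --quiet --strict) < "$manifest"
}
```

Typical use: run `make_manifest /srv/models/my-model /etc/trtllm/my-model.sha256` once, then call `verify_manifest /srv/models/my-model /etc/trtllm/my-model.sha256 || exit 1` in the service's start-up script before TensorRT-LLM loads anything (paths hypothetical).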