CVE-2026-24142 Overview
CVE-2026-24142 is an insecure deserialization vulnerability in NVIDIA TensorRT-LLM (TRT-LLM) affecting all supported platforms. The flaw stems from unsafe handling of serialized data and serialized handles within the library. Attackers can exploit the issue over the network without authentication or user interaction. Successful exploitation may lead to remote code execution, data tampering, and information disclosure on systems running affected versions of TensorRT-LLM. The vulnerability is tracked under CWE-502: Deserialization of Untrusted Data.
Critical Impact
Unauthenticated network attackers can achieve code execution, data tampering, and information disclosure on hosts running NVIDIA TensorRT-LLM.
Affected Products
- NVIDIA TensorRT-LLM (TRT-LLM) - all platforms
- nvidia:tensorrt_llm package distributions
- Workloads embedding TensorRT-LLM for large language model inference
Discovery Timeline
- 2026-05-20 - CVE-2026-24142 published to the National Vulnerability Database
- 2026-05-21 - Last updated in the National Vulnerability Database
Technical Details for CVE-2026-24142
Vulnerability Analysis
The vulnerability resides in TensorRT-LLM's deserialization logic, where the library reconstructs objects and handles from serialized input without sufficient validation. When TRT-LLM processes attacker-controlled serialized payloads, it can instantiate unsafe object types or invoke dangerous methods during reconstruction. This class of flaw, classified as CWE-502, commonly yields arbitrary code execution in the process context of the inference service. Because TRT-LLM is typically deployed on GPU servers hosting model inference endpoints, successful exploitation can expose model weights, inference data, and adjacent network resources to the attacker. NVIDIA has acknowledged the issue and published guidance in an NVIDIA Support Article.
Root Cause
The root cause is the trust placed in serialized input streams and serialized handles consumed by TensorRT-LLM. The library deserializes data without enforcing type allow-lists, integrity checks, or cryptographic validation. Any process or client able to deliver a crafted serialized payload to a TRT-LLM consumer can influence object construction during deserialization.
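To illustrate what a type allow-list looks like in practice, the following is a minimal sketch that assumes the serialized data is Python pickle. The advisory does not name the serialization format, and this is not TensorRT-LLM's code or its fix. The names `ALLOWED_GLOBALS`, `AllowListUnpickler`, and `safe_loads` are hypothetical; the technique (overriding `pickle.Unpickler.find_class`) is the one described in the Python standard library documentation.

```python
import io
import pickle

# Hypothetical allow-list: the only (module, name) pairs the unpickler may resolve.
# Plain containers and scalars (dict, list, str, int) never reach find_class.
ALLOWED_GLOBALS = frozenset({("collections", "OrderedDict")})


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses to import any global not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        # Reject before the object is imported or called.
        raise pickle.UnpicklingError(
            f"blocked global during deserialization: {module}.{name}"
        )


def safe_loads(data: bytes):
    """Deserialize bytes while enforcing the allow-list above."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

An allow-list narrows what a payload can construct, but it does not make untrusted pickle data safe in general. Where possible, a data-only format such as JSON is the more conservative choice.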
Attack Vector
The attack vector is network-based and requires no authentication or user interaction. An attacker submits a malicious serialized blob to an exposed TensorRT-LLM endpoint, model-loading interface, or any application layer that forwards untrusted data into TRT-LLM deserialization routines. Once the payload is deserialized, it can trigger code execution, modify in-memory state, or exfiltrate sensitive data.
No verified public proof-of-concept exploit is available for CVE-2026-24142 at the time of publication. Refer to the NVD entry and the CVE.org record for additional technical references.
Detection Methods for CVE-2026-24142
Indicators of Compromise
- Unexpected child processes spawned by TensorRT-LLM inference workers or Python processes loading tensorrt_llm modules
- Outbound network connections from GPU inference hosts to unrecognized destinations
- Anomalous file writes or model artifact modifications in TRT-LLM working directories
- Crashes or stack traces referencing TRT-LLM deserialization functions in application logs
Detection Strategies
- Monitor process lineage of inference services for shells, interpreters, or scripting binaries launched as children of TRT-LLM hosts
- Inspect network traffic for serialized Python or C++ object signatures arriving at inference API endpoints
- Apply file integrity monitoring to model and configuration directories used by TensorRT-LLM
- Correlate unauthenticated API requests with subsequent process or file system anomalies on the same host
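The traffic-inspection strategy above could be approximated with a simple byte-signature check. This sketch assumes the payloads of interest are Python pickle streams, which the advisory does not confirm, and the function names are hypothetical. It relies on the fact that pickle protocol 2 and later begins with the `PROTO` opcode (`0x80`) followed by the protocol number and ends with the `STOP` opcode (`.`). Protocols 0 and 1 carry no such header, so this heuristic misses them.

```python
import base64
import binascii
import pickle


def _has_pickle_framing(data: bytes) -> bool:
    """True if data starts with PROTO (0x80 + version) and ends with STOP ('.')."""
    return (
        len(data) >= 3
        and data[0] == 0x80
        and 2 <= data[1] <= pickle.HIGHEST_PROTOCOL
        and data[-1] == 0x2E
    )


def looks_like_pickle(payload: bytes) -> bool:
    """Heuristic: flag raw or base64-wrapped pickle streams in a request body."""
    if _has_pickle_framing(payload):
        return True
    try:
        # Strict decoding: bodies with non-base64 characters are not candidates.
        decoded = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return _has_pickle_framing(decoded)
```

A match is only a signal for further review. Binary request bodies can coincidentally begin with `0x80`, so the check is better suited to alerting than to blocking.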
Monitoring Recommendations
- Enable verbose audit logging on all TRT-LLM inference endpoints and forward logs to a centralized analytics platform
- Baseline normal GPU and CPU process activity on inference servers to surface deviations
- Alert on any deserialization errors or unhandled exceptions originating from tensorrt_llm modules
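As a rough sketch of the last recommendation, the snippet below scans log lines for error keywords that appear alongside a `tensorrt_llm` reference. The log format, the keyword list, and the `find_alerts` helper are all assumptions made for illustration; real TRT-LLM log output may differ and would need its own patterns.

```python
import re

# Hypothetical keywords; tune these to the messages your deployment actually emits.
ERROR_PATTERN = re.compile(r"UnpicklingError|deserializ|unhandled exception", re.IGNORECASE)


def find_alerts(lines):
    """Return (line_number, text) for lines naming tensorrt_llm with an error keyword."""
    alerts = []
    for number, line in enumerate(lines, start=1):
        if "tensorrt_llm" in line and ERROR_PATTERN.search(line):
            alerts.append((number, line.rstrip()))
    return alerts
```

This checks one line at a time, so a multi-line traceback is flagged only if a single line contains both the module name and a keyword. A production rule would more likely live in the centralized analytics platform and group related lines first.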
How to Mitigate CVE-2026-24142
Immediate Actions Required
- Apply the fixed TensorRT-LLM release identified in the NVIDIA Support Article as soon as it is available in your environment
- Restrict network access to TensorRT-LLM inference endpoints using firewalls, service meshes, or private networking
- Require authentication and authorization on any application layer that forwards data into TRT-LLM
- Audit existing deployments for exposure of TRT-LLM ports or model-loading interfaces to untrusted networks
Patch Information
NVIDIA has published remediation guidance in its official security bulletin, the NVIDIA Support Article (Answer ID 5805). Administrators should upgrade to the fixed TensorRT-LLM version specified in that advisory and confirm that no downstream applications continue to ship vulnerable copies of the library.
Workarounds
- Place TensorRT-LLM behind an authenticated API gateway that rejects unexpected content types and serialized payloads
- Run inference workers in isolated containers with minimal filesystem and network privileges to limit blast radius
- Disable or gate any feature that accepts externally supplied serialized model artifacts or handles until patched
- Validate and sign model artifacts before loading them into TensorRT-LLM
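# Example network restriction using iptables on a TRT-LLM inference host
# Port 8000 and 10.0.10.0/24 are placeholders; substitute your own port and trusted subnet
# Rules are appended (-A), so an earlier ACCEPT rule in the INPUT chain would still take precedence
# Allow inference traffic only from a trusted application tier
iptables -A INPUT -p tcp --dport 8000 -s 10.0.10.0/24 -j ACCEPT
# Drop inference traffic to that port from every other source
iptables -A INPUT -p tcp --dport 8000 -j DROP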
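For the workaround that calls for validating and signing model artifacts, the following is a minimal sketch using an HMAC-SHA256 tag computed over the artifact file. It is an illustration under stated assumptions, not a TensorRT-LLM feature: the helper names `sign_artifact` and `verify_artifact` are hypothetical, and the shared secret key is assumed to be stored outside the inference host's writable paths. Deployments that need to separate signer and verifier would likely prefer asymmetric signatures.

```python
import hashlib
import hmac
from pathlib import Path


def sign_artifact(path: Path, key: bytes) -> str:
    """Return the hex HMAC-SHA256 tag of the file at path, read in 1 MiB chunks."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_artifact(path: Path, key: bytes, expected_hex: str) -> bool:
    """Recompute the tag and compare it in constant time with the expected one."""
    return hmac.compare_digest(sign_artifact(path, key), expected_hex)
```

The intended use is to compute the tag when an artifact is produced, store it separately, and call `verify_artifact` before the artifact is handed to any loader. A failed check should stop the load rather than log and continue.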