CVE-2026-24160: NVIDIA TensorRT-LLM DoS Vulnerability

CVE-2026-24160 Overview

CVE-2026-24160 affects NVIDIA TensorRT-LLM (TRT-LLM) on all supported platforms. The vulnerability stems from an unchecked return value that leads to a null pointer dereference [CWE-690]. A local attacker can trigger this condition to crash the TensorRT-LLM process, resulting in denial of service. Exploitation requires local access and user interaction, which limits the practical attack surface. Confidentiality and integrity are not affected, but the availability impact is high because the affected inference runtime terminates abnormally. NVIDIA has published an advisory and a corresponding fix through its support portal.

Critical Impact
Successful exploitation crashes the TensorRT-LLM runtime, disrupting large language model inference workloads on affected NVIDIA platforms.

Affected Products

NVIDIA TensorRT-LLM (all platforms)
nvidia:tensorrt_llm component across supported releases
AI inference workloads built on NVIDIA TRT-LLM

Discovery Timeline

2026-05-20 - CVE-2026-24160 published to the National Vulnerability Database
2026-05-20 - Last updated in the NVD

Technical Details for CVE-2026-24160

Vulnerability Analysis

The defect resides in NVIDIA TensorRT-LLM, a library that optimizes and executes large language model inference on NVIDIA GPUs. The affected code path calls a function that returns a pointer and dereferences the result without first validating it. When the underlying call fails and returns a null pointer, the dereference triggers a segmentation fault, and the inference process terminates abruptly.

The issue is classified as [CWE-690]: Unchecked Return Value to NULL Pointer Dereference. The attack vector is local, with low complexity and no privileges required, but user interaction is needed to deliver the malformed input or model artifact. Confidentiality and integrity are unaffected, while the availability impact is high.

Root Cause

The root cause is missing validation logic between a function returning a pointer and the code that consumes that pointer. The caller assumes the allocation or lookup always succeeds. Under specific input conditions reachable by a local user, the call returns null, and the dereference faults the process.
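
The pattern described above can be illustrated with a small sketch. The affected TensorRT-LLM code is native and has not been published in this article, so the following is only a Python analogue: `None` stands in for a null pointer, and `Tensor`, `find_tensor`, and both `tensor_size_*` functions are hypothetical names invented for illustration. In Python the unchecked access raises an exception; in native code the equivalent dereference would terminate the process.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tensor:
    """Hypothetical stand-in for an object returned by a lookup."""
    name: str
    size: int


# Hypothetical registry; a real runtime would populate this from a model artifact.
_TENSORS = {"logits": Tensor("logits", 4096)}


def find_tensor(name: str) -> Optional[Tensor]:
    """Return the tensor, or None (the analogue of a null pointer) if it is missing."""
    return _TENSORS.get(name)


def tensor_size_unchecked(name: str) -> int:
    # Flawed pattern: assumes the lookup always succeeds.
    # A missing name raises AttributeError here; a native null dereference
    # would crash the process instead.
    return find_tensor(name).size


def tensor_size_checked(name: str) -> int:
    # Safer pattern: validate the return value and fail with a recoverable error.
    tensor = find_tensor(name)
    if tensor is None:
        raise KeyError(f"tensor {name!r} not found in artifact")
    return tensor.size
```

The checked variant turns a malformed artifact into an error the caller can handle, which is the general shape of a fix for this weakness class. Whether NVIDIA's patch takes this form is not stated in the advisory.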

Attack Vector

An attacker with local access supplies a crafted model, configuration, or input to a TensorRT-LLM workflow. User interaction is required: the victim must execute or load the attacker-supplied artifact within the TRT-LLM runtime. No elevated privileges are needed. Successful exploitation results in denial of service, not code execution. Because no proof-of-concept or public exploit is available, this article does not include exploitation code. Refer to the NVIDIA Support Article for vendor-supplied technical detail.

Detection Methods for CVE-2026-24160

Indicators of Compromise

Unexpected segmentation faults or SIGSEGV signals terminating TensorRT-LLM processes
Core dumps generated by TRT-LLM inference workers immediately after loading a user-supplied model or input batch
Repeated process restarts of the inference service correlated with specific input artifacts
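
As a minimal sketch of how the first indicator might be checked programmatically, the helper below maps a process exit status to a signal name. It assumes a POSIX host, where Python's `subprocess` reports death by signal N as return code `-N`, and it also treats codes above 128 as the shell and container convention of `128 + N`. The function names and the worker command passed to `run_and_report` are hypothetical.

```python
import signal
import subprocess
from typing import Optional, Sequence


def signal_from_exit(returncode: int) -> Optional[str]:
    """Return the signal name if the exit status indicates death by signal, else None."""
    if returncode < 0:
        signum = -returncode          # subprocess convention on POSIX
    elif returncode > 128:
        signum = returncode - 128     # shell/container convention (assumed here)
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None                   # not a known signal number


def run_and_report(cmd: Sequence[str]) -> Optional[str]:
    """Run a worker command (hypothetical) and report the terminating signal, if any."""
    result = subprocess.run(cmd)
    return signal_from_exit(result.returncode)
```

A result of `"SIGSEGV"` for a TRT-LLM worker is consistent with the crash described in this article, although a segmentation fault alone does not prove that this specific flaw was triggered.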

Detection Strategies

Monitor host telemetry for abnormal termination of TensorRT-LLM processes and correlate with the user account that initiated the workload
Inspect logs from AI inference orchestration platforms for crash loops tied to specific model files or prompt payloads
Audit local users with access to TRT-LLM execution environments and review recently loaded model artifacts
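
The correlation step in these strategies could look roughly like the sketch below, which groups crash records by artifact and initiating user. The `CrashRecord` fields and the `min_crashes` threshold are assumptions made for illustration; real records would come from whatever host telemetry or orchestration logs are available.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class CrashRecord(NamedTuple):
    """Hypothetical crash record assembled from host or orchestrator logs."""
    user: str
    artifact: str      # path or digest of the model/input that was loaded
    signal_name: str   # e.g. "SIGSEGV"


def suspicious_artifacts(
    records: Iterable[CrashRecord], min_crashes: int = 2
) -> List[Tuple[str, str, int]]:
    """Return (artifact, user, count) for artifacts tied to repeated SIGSEGV crashes."""
    counts = Counter(
        (r.artifact, r.user) for r in records if r.signal_name == "SIGSEGV"
    )
    return [
        (artifact, user, n)
        for (artifact, user), n in counts.most_common()
        if n >= min_crashes
    ]
```

An artifact that appears repeatedly in this output is a reasonable candidate for quarantine and closer inspection.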

Monitoring Recommendations

Enable crash reporting and core dump collection on hosts running TensorRT-LLM
Forward process termination events and GPU runtime errors to a centralized logging or SIEM platform
Alert on repeated failures of the same inference worker process within short time windows
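
The last recommendation amounts to a sliding-window counter per worker. The sketch below shows one way to express it; the class name, the default threshold of three crashes, and the five-minute window are illustrative assumptions and are not values taken from the advisory.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CrashRateAlert:
    """Flag a worker that crashes `threshold` times within `window_seconds`."""

    def __init__(self, threshold: int = 3, window_seconds: float = 300.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, worker: str, timestamp: float) -> bool:
        """Record a crash and return True if the worker should raise an alert."""
        events = self._events[worker]
        events.append(timestamp)
        # Drop crashes that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```

Timestamps are passed in explicitly so that the same logic works on live events and on replayed log data.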

How to Mitigate CVE-2026-24160

Immediate Actions Required

Apply the NVIDIA-supplied update for TensorRT-LLM referenced in the vendor advisory
Restrict local access to systems running TRT-LLM to trusted users and service accounts only
Validate the provenance of all model files and inference inputs before loading them into TRT-LLM
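
One simple way to approximate the provenance check is to compare each artifact's SHA-256 digest against an allowlist before it reaches the runtime. This is a sketch under stated assumptions: the function names are hypothetical, the allowlist of trusted digests is assumed to be maintained out of band, and a digest check confirms only that a file is one you already approved, not that it is free of defects.

```python
import hashlib
from typing import Set


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted_artifact(path: str, trusted_digests: Set[str]) -> bool:
    """Return True only if the file's digest is on the allowlist."""
    return sha256_of(path) in trusted_digests
```

A loader wrapper could call `is_trusted_artifact` and refuse to hand unapproved files to TRT-LLM, which narrows the set of inputs a local user can put in front of the vulnerable code path.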

Patch Information

NVIDIA has published remediation guidance and patched builds. Consult the NVIDIA Support Article for affected versions and fixed releases. Additional references are the NVD entry and the CVE.org record for CVE-2026-24160.

Workarounds

Limit which local users can submit models or inputs to TRT-LLM until patching is complete
Run TensorRT-LLM workers under process supervision so that a crashed worker is contained and restarted automatically without service-wide impact
Isolate inference workloads in dedicated containers or namespaces to reduce blast radius from a crashed process
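
The supervision workaround can be sketched as a restart loop with capped exponential backoff. In production this role is usually filled by an existing service manager or container orchestrator; the code below only illustrates the logic. The `supervise` function, its defaults, and the worker command are hypothetical.

```python
import subprocess
import time
from typing import Callable, Sequence


def supervise(
    cmd: Sequence[str],
    max_restarts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `cmd`, restarting it after abnormal exits.

    Returns the number of restarts performed once the command exits cleanly.
    Raises RuntimeError if the command keeps failing after `max_restarts` restarts.
    """
    restarts = 0
    while True:
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"worker failed {restarts + 1} times; last exit status {returncode}"
            )
        # Back off exponentially so a crash loop does not consume the host.
        sleep(min(max_delay, base_delay * 2 ** restarts))
        restarts += 1
```

Giving up after a bounded number of restarts is a deliberate choice here: if the same crafted artifact is reloaded on every start, unlimited restarts would only reproduce the crash.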