CVE-2026-24188 Overview
NVIDIA TensorRT contains an out-of-bounds write vulnerability that an attacker can exploit over the network without authentication or user interaction. Successful exploitation can lead to data tampering and limited availability impact within affected TensorRT deployments. The flaw is categorized under CWE-787 (Out-of-bounds Write), a class of memory corruption issues that frequently enables integrity compromise of inference workloads and model data. Organizations running TensorRT for AI inference in production should treat this as a high-priority patching item.
Critical Impact
Remote, unauthenticated attackers can trigger an out-of-bounds write in NVIDIA TensorRT, leading to data tampering and partial availability loss in AI inference workloads.
Affected Products
- NVIDIA TensorRT (see NVIDIA Support Answer 5836 for specific affected versions)
- AI inference deployments using vulnerable TensorRT runtime libraries
- Applications and services that embed TensorRT for GPU-accelerated model execution
Discovery Timeline
- 2026-05-20 - CVE-2026-24188 published to NVD
- 2026-05-20 - Last updated in NVD database
Technical Details for CVE-2026-24188
Vulnerability Analysis
The vulnerability resides in NVIDIA TensorRT, a high-performance deep learning inference library used to optimize and run neural network models on NVIDIA GPUs. An attacker can submit crafted input that causes TensorRT to write data outside the bounds of an allocated buffer. The CVSS vector indicates a network-reachable attack surface, no required privileges, and no user interaction, with high integrity impact and low availability impact.
Because TensorRT is commonly deployed as part of inference servers and AI pipelines, the out-of-bounds write can corrupt adjacent memory regions used by model weights, computation graphs, or runtime metadata. This can result in altered inference outputs, manipulated model behavior, or degraded service stability.
Root Cause
The root cause is an out-of-bounds write condition (CWE-787): TensorRT fails to validate the size or offset of a write operation against the allocated buffer. When attacker-controlled data drives the write index or length, memory beyond the intended buffer boundary is modified. NVIDIA has not publicly disclosed the specific component or function involved beyond what is documented in NVIDIA Support Answer 5836.
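As a generic illustration of the kind of check that CWE-787 describes as missing, the sketch below validates an untrusted offset and length before copying. It is not TensorRT code, and the function name `checked_write` is hypothetical. Python is memory-safe, so this only models the check a native library would need.

```python
def checked_write(buffer: bytearray, offset: int, data: bytes) -> None:
    """Copy data into buffer at offset, refusing writes that leave the buffer."""
    if offset < 0:
        raise ValueError("negative offset")
    # Compare against the remaining space rather than computing
    # offset + len(data) first; in fixed-width native integers that
    # sum could overflow and defeat the check.
    if len(data) > len(buffer) - offset:
        raise ValueError("write would exceed buffer bounds")
    buffer[offset:offset + len(data)] = data
```

A caller would treat the `ValueError` as a rejected request rather than retrying with a truncated payload.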
Attack Vector
An attacker reaches the vulnerable code path over the network by submitting malicious input to a service that uses TensorRT for inference. No authentication or user interaction is required. Because the affected component is not public, the exact carrier is unconfirmed; plausible candidates are crafted model artifacts, tensor inputs, or serialized engine data accepted by the target application. Refer to the NVD CVE-2026-24188 Detail and CVE.org Record for current technical references.
Detection Methods for CVE-2026-24188
Indicators of Compromise
- Unexpected crashes, segmentation faults, or restarts of processes hosting the TensorRT runtime
- Inference outputs that deviate from expected baselines, indicating possible tensor or weight tampering
- Anomalous network requests to inference endpoints carrying oversized or malformed tensor payloads
Detection Strategies
- Inventory hosts and containers running TensorRT and correlate versions against the fixed releases listed in the NVIDIA advisory
- Monitor process telemetry for memory access violations, abnormal heap behavior, or unexpected child process activity from inference workloads
- Inspect inference API gateways for malformed or oversized model inputs and reject requests that violate input schemas
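The schema check in the last item can be sketched as a small gateway-side function that rejects requests whose declared shape, dtype, and byte length disagree. The limits and the function name `validate_tensor_request` are assumptions for illustration; real limits should come from the deployed model's input signature.

```python
from math import prod

# Hypothetical limits; derive real ones from the model's input signature.
ALLOWED_DTYPES = {"float32": 4, "float16": 2, "int32": 4, "int8": 1}
MAX_DIMS = 8
MAX_ELEMENTS = 16 * 1024 * 1024


def validate_tensor_request(shape, dtype, payload_len):
    """Return None if the request is self-consistent, else a rejection reason."""
    if dtype not in ALLOWED_DTYPES:
        return "unsupported dtype"
    if not (1 <= len(shape) <= MAX_DIMS):
        return "bad rank"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if any(not isinstance(d, int) or isinstance(d, bool) or d <= 0 for d in shape):
        return "non-positive or non-integer dimension"
    elements = prod(shape)
    if elements > MAX_ELEMENTS:
        return "too many elements"
    if payload_len != elements * ALLOWED_DTYPES[dtype]:
        return "payload length does not match shape and dtype"
    return None
```

Rejecting on any mismatch, rather than truncating or padding, keeps malformed payloads away from the inference runtime entirely.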
Monitoring Recommendations
- Centralize logs from inference services, GPU drivers, and container runtimes for correlation and retention
- Alert on repeated crashes or restarts of TensorRT-based services within short time windows
- Track model output drift and integrity hashes of deployed engine files to detect tampering
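The crash-burst alert described above can be approximated with a sliding window over restart timestamps. This is a minimal sketch: the class name, threshold, and window length are assumptions, and it expects timestamps to arrive in non-decreasing order from whatever log or orchestrator feed is available.

```python
from collections import deque


class RestartBurstDetector:
    """Flag when `threshold` restarts occur within `window_seconds`."""

    def __init__(self, threshold: int = 3, window_seconds: float = 300.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()

    def record(self, timestamp: float) -> bool:
        """Record one restart; return True if the alert condition is met."""
        self._events.append(timestamp)
        # Drop restarts that have fallen out of the sliding window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

One detector instance per service keeps bursts in unrelated workloads from masking or triggering each other.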
How to Mitigate CVE-2026-24188
Immediate Actions Required
- Apply the fixed TensorRT version published by NVIDIA in Support Answer 5836 as soon as it is available for your platform
- Restrict network exposure of inference endpoints to trusted clients using network segmentation and authenticated gateways
- Validate and constrain all tensor inputs, model files, and serialized engines accepted from external sources
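One way to constrain which serialized engines a service will load is to compare each file against an allowlist of SHA-256 digests before deserializing it. The sketch below assumes a hypothetical allowlist keyed by file name and produced by a trusted build step; the function names are illustrative.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large engine files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def engine_is_trusted(path: Path, allowlist: dict[str, str]) -> bool:
    """Return True only if the file name is listed and its digest matches."""
    expected = allowlist.get(path.name)
    if expected is None:
        return False
    return hmac.compare_digest(sha256_file(path), expected.lower())
```

The same digests can serve as the integrity baseline for tamper detection, provided the allowlist itself lives on storage the inference process cannot write to.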
Patch Information
NVIDIA has published guidance and fixed versions through the NVIDIA Security Bulletin for TensorRT. Administrators should review the bulletin to identify affected versions and the corresponding patched releases, then upgrade all TensorRT installations, including those bundled inside containers and inference servers.
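To help find installations that still need the upgrade, a version check along these lines can run inside each Python environment or container. It is a sketch under stated assumptions: the pip distribution is assumed to be named `tensorrt`, the helper names are hypothetical, and the minimum fixed version must be taken from NVIDIA's bulletin because it is not stated here. System-level libraries installed outside pip need a separate check.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Turn '10.3.0.26' or '10.3.0.post1' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_patched(installed: str, minimum_fixed: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum_fixed)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_tensorrt_version():
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version("tensorrt")
    except metadata.PackageNotFoundError:
        return None
```

A fleet scan would call `installed_tensorrt_version()` on each host and pass the result, together with the bulletin's fixed version, to `is_patched`.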
Workarounds
- Place inference services behind an authenticated reverse proxy and enforce strict request size limits until patching is complete
- Disable or isolate any inference endpoints that accept untrusted serialized engines or model artifacts
- Run TensorRT workloads in dedicated containers with minimum privileges and read-only model storage to limit the blast radius of memory corruption
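# Configuration example: isolate the inference workload and pass it a request size limit
# (Illustrative - adapt to your environment)
# MAX_REQUEST_BYTES is a placeholder that your own serving code must read and enforce;
# the container image does not apply it by itself.
# --network expects an existing Docker network named inference-net.
# --gpus all assumes the workload needs GPU access and the NVIDIA Container Toolkit is installed.
docker run --rm \
  --gpus all \
  --read-only \
  --cap-drop=ALL \
  --network=inference-net \
  -e MAX_REQUEST_BYTES=1048576 \
  nvcr.io/nvidia/tensorrt:<patched-version>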


