CVE-2026-24188: NVIDIA TensorRT Buffer Overflow Flaw

CVE-2026-24188 Overview

NVIDIA TensorRT contains an out-of-bounds write vulnerability that an attacker can exploit over the network without authentication or user interaction. Successful exploitation can lead to data tampering and a limited availability impact in affected TensorRT deployments. The flaw is categorized under CWE-787 (Out-of-bounds Write), a class of memory corruption issues that can compromise the integrity of inference workloads and model data. Organizations running TensorRT for AI inference in production should treat patching as a high priority.

Critical Impact
Remote, unauthenticated attackers can trigger an out-of-bounds write in NVIDIA TensorRT, leading to data tampering and partial availability loss in AI inference workloads.

Affected Products

NVIDIA TensorRT (see NVIDIA Support Answer 5836 for specific affected versions)
AI inference deployments using vulnerable TensorRT runtime libraries
Applications and services that embed TensorRT for GPU-accelerated model execution

Discovery Timeline

2026-05-20 - CVE-2026-24188 published to NVD
2026-05-20 - Last updated in NVD database

Technical Details for CVE-2026-24188

Vulnerability Analysis

The vulnerability resides in NVIDIA TensorRT, a high-performance deep learning inference library used to optimize and run neural network models on NVIDIA GPUs. An attacker can submit crafted input that causes TensorRT to write data outside the bounds of an allocated buffer. The CVSS vector indicates a network-reachable attack surface, no required privileges, and no user interaction, with high integrity impact and low availability impact.

TensorRT is commonly deployed inside inference servers and AI pipelines, where an out-of-bounds write can corrupt adjacent memory regions that hold model weights, computation graphs, or runtime metadata. The result can be altered inference outputs, manipulated model behavior, or degraded service stability.

Root Cause

The root cause is missing or incorrect bounds validation: TensorRT fails to check the size or offset of a write operation against the allocated buffer, which results in an out-of-bounds write (CWE-787). When attacker-controlled data drives the write index or length, memory beyond the intended buffer boundary is modified. NVIDIA has not publicly disclosed the specific component or function involved beyond what is documented in NVIDIA Support Answer 5836.
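Because the affected TensorRT code is not public, the following is not the vulnerable code path. It is only a minimal Python sketch of the kind of offset-and-length check whose absence characterizes CWE-787; the function name `checked_write` is hypothetical.

```python
def checked_write(buffer: bytearray, offset: int, data: bytes) -> None:
    """Copy data into buffer at offset, refusing any write that leaves the buffer.

    Illustrative only: a generic bounds check, not TensorRT code.
    """
    if offset < 0:
        raise ValueError("negative offset")
    # Compare against the remaining space rather than computing offset + len(data),
    # which in fixed-width languages such as C or C++ could itself overflow.
    if len(data) > len(buffer) - offset:
        raise ValueError("write would exceed buffer bounds")
    buffer[offset:offset + len(data)] = data
```

A call such as `checked_write(bytearray(8), 5, b"abcd")` raises `ValueError` instead of touching memory past the eighth byte. In native code, skipping this comparison is what allows attacker-supplied offsets or lengths to reach adjacent memory.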

Attack Vector

An attacker reaches the vulnerable code path over the network by submitting malicious input to a service that uses TensorRT for inference. No authentication or user interaction is required. The attacker influences the write operation through crafted model artifacts, tensor inputs, or serialized engine data accepted by the target application. Refer to the NVD entry for CVE-2026-24188 and the CVE.org record for current technical references.

Detection Methods for CVE-2026-24188

Indicators of Compromise

Unexpected crashes, segmentation faults, or restarts of processes hosting the TensorRT runtime
Inference outputs that deviate from expected baselines, indicating possible tensor or weight tampering
Anomalous network requests to inference endpoints carrying oversized or malformed tensor payloads

Detection Strategies

Inventory hosts and containers running TensorRT and correlate versions against the fixed releases listed in the NVIDIA advisory
Monitor process telemetry for memory access violations, abnormal heap behavior, or unexpected child process activity from inference workloads
Inspect inference API gateways for malformed or oversized model inputs and reject requests that violate input schemas
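The inventory step can be sketched as a version comparison. This is a minimal example that assumes TensorRT versions are purely numeric dotted strings and that a single "first fixed" version applies; the real fixed releases must come from the NVIDIA advisory, and the host names and version numbers used here are placeholders, not real data.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version such as '1.2.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_patch(installed: str, first_fixed: str) -> bool:
    """Return True when the installed version is older than the first fixed one."""
    a, b = parse_version(installed), parse_version(first_fixed)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.2' and '1.2.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def hosts_needing_patch(inventory: dict, first_fixed: str) -> list:
    """Return the sorted host names whose TensorRT version predates the fix."""
    return sorted(
        host for host, version in inventory.items()
        if needs_patch(version, first_fixed)
    )
```

If the advisory lists different fixed versions per release branch, the comparison would need to be applied per branch rather than against one value.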

Monitoring Recommendations

Centralize logs from inference services, GPU drivers, and container runtimes for correlation and retention
Alert on repeated crashes or restarts of TensorRT-based services within short time windows
Track model output drift and integrity hashes of deployed engine files to detect tampering
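Two of the monitoring items above lend themselves to a short sketch: comparing deployed engine files against recorded SHA-256 hashes, and flagging bursts of service restarts. The names `changed_engines` and `RestartBurstDetector`, and the idea of feeding restart timestamps in from an orchestrator or log pipeline, are assumptions for illustration.

```python
import hashlib
from collections import deque
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20) -> str:
    """Hash a file in chunks so large engine files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_engines(baseline: dict) -> list:
    """Return paths that are missing or whose hash differs from the baseline."""
    changed = []
    for path, expected in baseline.items():
        if not Path(path).is_file() or sha256_file(path) != expected:
            changed.append(path)
    return sorted(changed)


class RestartBurstDetector:
    """Flag when `threshold` restarts occur within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()

    def record(self, timestamp: float) -> bool:
        """Record one restart; return True if the burst threshold is reached."""
        self.events.append(timestamp)
        # Drop restarts that have aged out of the sliding window.
        while self.events and timestamp - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold
```

File hashing only detects tampering with engine files on disk. Corruption that happens in process memory would not change these hashes, which is why output drift and crash monitoring remain necessary alongside it.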

How to Mitigate CVE-2026-24188

Immediate Actions Required

Apply the fixed TensorRT version published by NVIDIA in Support Answer 5836 as soon as it is available for your platform
Restrict network exposure of inference endpoints to trusted clients using network segmentation and authenticated gateways
Validate and constrain all tensor inputs, model files, and serialized engines accepted from external sources
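Input validation for tensor payloads might look like the sketch below, which runs before any data is handed to the inference runtime. The request layout (a shape, a dtype name, and a raw byte payload), the `ITEM_SIZES` table, and the limits passed in are all assumptions; a real service would derive them from its own model signature.

```python
from math import prod

# Bytes per element for the dtypes this hypothetical service accepts.
ITEM_SIZES = {"float32": 4, "float16": 2, "int32": 4, "int8": 1}


def validate_tensor(shape, dtype, payload, *, expected_dtype, max_dims):
    """Raise ValueError unless the tensor matches the declared input schema."""
    if dtype != expected_dtype or dtype not in ITEM_SIZES:
        raise ValueError(f"unexpected dtype: {dtype!r}")
    if len(shape) != len(max_dims):
        raise ValueError("unexpected tensor rank")
    for dim, limit in zip(shape, max_dims):
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(dim, int) or isinstance(dim, bool) or not 1 <= dim <= limit:
            raise ValueError(f"dimension {dim!r} outside 1..{limit}")
    expected_bytes = prod(shape) * ITEM_SIZES[dtype]
    if len(payload) != expected_bytes:
        raise ValueError(
            f"payload is {len(payload)} bytes, shape implies {expected_bytes}"
        )
```

Checking that the payload length matches the declared shape addresses the general pattern behind this bug class, where a declared size and the actual data disagree. It does not replace patching, since the vulnerable code path has not been disclosed.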

Patch Information

NVIDIA has published guidance and fixed versions through the NVIDIA Security Bulletin for TensorRT. Administrators should review the bulletin to identify affected versions and the corresponding patched releases, then upgrade all TensorRT installations, including those bundled inside containers and inference servers.

Workarounds

Place inference services behind an authenticated reverse proxy and enforce strict request size limits until patching is complete
Disable or isolate any inference endpoints that accept untrusted serialized engines or model artifacts
Run TensorRT workloads in dedicated containers with minimum privileges and read-only model storage to limit the blast radius of memory corruption

bash

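# Configuration example: isolate the inference workload and pass in a request size limit
# (Illustrative - adapt to your environment)
# Assumes the "inference-net" network already exists and that the NVIDIA Container
# Toolkit is installed so that --gpus works.
# MAX_REQUEST_BYTES is not interpreted by Docker or the base image; the serving
# application inside the container must read and enforce it.
docker run --rm \
  --gpus all \
  --read-only \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --network=inference-net \
  -e MAX_REQUEST_BYTES=1048576 \
  nvcr.io/nvidia/tensorrt:<patched-version>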
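
The MAX_REQUEST_BYTES variable in the example above has an effect only if the serving application reads it. As a hedged illustration, assuming the inference service is exposed through a Python WSGI application, a small middleware could enforce the limit before a request body reaches any TensorRT code. The name `limit_request_size` is hypothetical.

```python
import os

DEFAULT_LIMIT = 1_048_576  # 1 MiB, the same value used in the docker example


def limit_request_size(app, max_bytes=None):
    """Wrap a WSGI app and reject requests whose declared body is too large."""
    if max_bytes is None:
        max_bytes = int(os.environ.get("MAX_REQUEST_BYTES", DEFAULT_LIMIT))

    def middleware(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or "0")
        except ValueError:
            length = -1
        if length < 0:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"invalid Content-Length\n"]
        if length > max_bytes:
            start_response("413 Payload Too Large", [("Content-Type", "text/plain")])
            return [b"request body too large\n"]
        return app(environ, start_response)

    return middleware
```

This checks only the declared Content-Length. Requests sent with chunked transfer encoding carry no such header, so the reverse proxy in front of the service should also cap body size.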