CVE-2026-24213: Nvidia Triton Inference Server RCE Flaw

CVE-2026-24213 Overview

CVE-2026-24213 is an out-of-bounds read vulnerability in the NVIDIA Triton Inference Server DALI backend. The flaw allows a remote, unauthenticated attacker to read memory outside allocated buffer boundaries by sending crafted requests over the network. Successful exploitation can lead to code execution, data tampering, denial of service, or information disclosure. The vulnerability is tracked under CWE-125 and affects deployments using the Data Loading Library (DALI) backend for GPU-accelerated inference. NVIDIA has published remediation guidance through NVIDIA Support Answer #5828.

Critical Impact
Remote, unauthenticated attackers can trigger out-of-bounds memory reads in the DALI backend, potentially achieving code execution or leaking sensitive inference data.

Affected Products

NVIDIA Triton Inference Server (DALI backend)
Deployments exposing the Triton inference endpoint over the network
AI/ML inference workloads using the DALI data loading pipeline

Discovery Timeline

2026-05-20 - CVE-2026-24213 published to the National Vulnerability Database
2026-05-20 - Last updated in NVD database

Technical Details for CVE-2026-24213

Vulnerability Analysis

The vulnerability resides in the DALI backend of NVIDIA Triton Inference Server. DALI is a GPU-accelerated library that handles data preprocessing pipelines feeding inference models. The backend fails to enforce proper boundary checks when reading buffer contents during request processing, classified as [CWE-125: Out-of-bounds Read].

An attacker reachable over the network can submit malformed inference requests targeting the DALI backend. The server processes attacker-controlled input without validating that read operations remain within allocated memory regions. This can return data from adjacent memory or destabilize the server process. Per NVIDIA's advisory, exploitation paths include code execution, tampering with inference data, denial of service, and disclosure of sensitive memory contents.

The EPSS probability stands at 0.036% with a 10.76 percentile, indicating no observed exploitation activity at publication. No public proof-of-concept exploit is currently available.

Root Cause

The root cause is missing or insufficient bounds validation when the DALI backend deserializes and processes inference input. When the server reads from an input buffer beyond its allocated length, it returns adjacent process memory or triggers a memory access fault. Inference servers commonly handle complex tensor shapes and serialized payloads, expanding the input parsing surface where boundary checks must be applied consistently.

Attack Vector

The attack vector is network-based and requires no authentication or user interaction. An attacker sends a crafted inference request to a Triton endpoint configured with the DALI backend. The malformed payload triggers the out-of-bounds read during request handling. Production Triton deployments often run with GPU access and elevated privileges within ML infrastructure, increasing the value of any leaked memory or achieved code execution. Refer to the NVIDIA Security Bulletin for affected version details.

Detection Methods for CVE-2026-24213

Indicators of Compromise

Unexpected crashes, restarts, or segmentation faults in the tritonserver process
Anomalous inference requests targeting DALI backend models with malformed tensor shapes or oversized payloads
Outbound traffic from inference hosts to untrusted destinations following suspicious request patterns
Elevated error rates in Triton server logs referencing DALI backend deserialization failures

Detection Strategies

Monitor Triton HTTP and gRPC endpoints for malformed inference payloads and abnormally sized input tensors
Inspect server logs for repeated DALI backend errors, parser exceptions, or memory access violations
Apply behavioral analytics to identify unauthorized network access to inference endpoints exposed beyond trusted ML infrastructure
Correlate process crashes with preceding network requests to identify exploitation attempts

Monitoring Recommendations

Enable verbose logging on the Triton Inference Server and forward logs to a centralized SIEM
Track process integrity and memory usage of tritonserver processes on GPU hosts
Alert on new outbound connections initiated by inference server processes that typically only respond to requests
Audit model repositories and DALI pipeline configurations for unauthorized modifications

How to Mitigate CVE-2026-24213

Immediate Actions Required

Apply the patched Triton Inference Server release referenced in NVIDIA Support Answer #5828 without delay
Restrict network exposure of Triton endpoints to authenticated internal clients only
Inventory all Triton deployments and confirm whether the DALI backend is enabled
Review inference server logs for prior signs of exploitation attempts

Patch Information

NVIDIA has published remediation guidance in NVIDIA Support Answer #5828. Administrators should upgrade Triton Inference Server to the fixed version specified in the advisory. Additional details are available in the NVD entry for CVE-2026-24213 and the CVE.org record.

Workarounds

Disable the DALI backend in tritonserver configuration if it is not required for production inference workloads
Place Triton endpoints behind an authenticating reverse proxy or service mesh enforcing mutual TLS
Apply network segmentation and firewall rules to limit access to inference hosts to known client IP ranges
Run the Triton server process with the least privileges necessary and isolate it within a dedicated container or namespace

bash

# Configuration example: launch Triton without the DALI backend
tritonserver \
  --model-repository=/models \
  --backend-config=dali,enabled=false \
  --allow-http=true \
  --http-address=127.0.0.1 \
  --http-port=8000