CVE-2026-24215 Overview
CVE-2026-24215 is a denial of service vulnerability in NVIDIA Triton Inference Server. The flaw resides in the Data Loading Library (DALI) backend, which handles input preprocessing for inference workloads. An unauthenticated remote attacker can submit crafted requests that cause uncontrolled resource consumption [CWE-400]. A successful exploit results in availability loss for hosted AI inference services. The vulnerability is reachable over the network without user interaction or privileges.
Critical Impact
Remote attackers can exhaust server resources and disrupt AI inference services without authentication, degrading or halting production model serving.
Affected Products
- NVIDIA Triton Inference Server (DALI backend)
- Deployments using Triton Inference Server container images (CPE vendor and product: nvidia:triton_inference_server)
- AI inference workloads relying on DALI for input preprocessing
Discovery Timeline
- 2026-05-20 - CVE-2026-24215 published to the National Vulnerability Database
- 2026-05-20 - Last updated in the NVD
Technical Details for CVE-2026-24215
Vulnerability Analysis
The vulnerability is classified as uncontrolled resource consumption [CWE-400] in the DALI backend of NVIDIA Triton Inference Server. DALI accelerates data loading and preprocessing pipelines for deep learning models served by Triton. The backend runs preprocessing pipelines, which are deployed as models in the Triton model repository, on input tensors submitted through Triton's HTTP and gRPC inference APIs.
When a request reaches the DALI backend, the server allocates compute and memory resources to execute the preprocessing graph. The flaw allows an attacker to submit inputs that trigger disproportionate resource use relative to request size. Repeated requests can drive CPU, GPU, or memory to exhaustion, blocking legitimate inference traffic.
The attack surface is network-reachable and requires no authentication when Triton endpoints are exposed. Operators running Triton in shared or internet-facing environments face the highest risk. Confidentiality and integrity remain intact, but availability impact is high.
Root Cause
The root cause is the absence of sufficient bounds and quotas on resource allocation within DALI pipeline processing. The backend trusts client-supplied parameters that influence buffer sizes or computation depth without enforcing protective limits.
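The advisory does not publish the affected code, so the following is only an illustrative sketch of the class of guard that the root cause describes as missing: checking a client-declared tensor shape against a byte budget before any buffer is allocated. The names MAX_REQUEST_BYTES, DTYPE_SIZES, and check_declared_shape are hypothetical and are not part of Triton or DALI.

```python
import math

# Hypothetical per-request budget; not a Triton or DALI setting.
MAX_REQUEST_BYTES = 64 * 1024 * 1024  # 64 MiB

# Element sizes in bytes for a few Triton-style datatype names.
DTYPE_SIZES = {"UINT8": 1, "FP16": 2, "FP32": 4, "INT64": 8}


def check_declared_shape(shape, dtype, max_bytes=MAX_REQUEST_BYTES):
    """Return the byte size implied by a declared shape, or raise ValueError.

    The check runs before allocation, so an oversized declaration is
    rejected without consuming the memory it asks for.
    """
    if dtype not in DTYPE_SIZES:
        raise ValueError(f"unsupported dtype: {dtype}")
    if not shape or any((not isinstance(d, int)) or d <= 0 for d in shape):
        raise ValueError(f"invalid shape: {shape}")
    # Python integers do not overflow, so the product is exact.
    total = math.prod(shape) * DTYPE_SIZES[dtype]
    if total > max_bytes:
        raise ValueError(f"declared size {total} exceeds limit {max_bytes}")
    return total
```

A single 224x224 RGB image in UINT8 passes this check, while a declared shape of [100000, 100000] in FP32 is rejected before any memory is reserved. The same idea applies to any parameter that scales work, such as iteration counts or output dimensions.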
Attack Vector
An attacker sends crafted inference requests to a reachable Triton endpoint targeting a model that uses the DALI backend. The attacker does not need credentials or prior access. Sustained or amplified requests exhaust server resources, producing a denial of service condition for all model consumers.
No public proof-of-concept exploit is available, and the vulnerability is not listed in the CISA Known Exploited Vulnerabilities catalog. The EPSS score indicates a low predicted probability of exploitation at disclosure.
Detection Methods for CVE-2026-24215
Indicators of Compromise
- Sudden spikes in CPU, GPU, or memory utilization on Triton Inference Server hosts without proportional growth in legitimate traffic
- Increased latency, request timeouts, or 5xx responses from Triton HTTP and gRPC endpoints serving DALI-backed models
- Repeated inference requests from a narrow set of source IPs targeting DALI pipeline models
Detection Strategies
- Monitor Triton Prometheus metrics for anomalies in request latency, queue time, and GPU memory (for example nv_inference_request_duration_us, nv_inference_queue_duration_us, and nv_gpu_memory_used_bytes)
- Inspect access logs for unusually large or malformed inference payloads sent to DALI-backed models
- Correlate process-level resource exhaustion events on inference hosts with inbound request patterns
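One way to approximate the metric-based strategy above is to compare two scrapes of Triton's Prometheus output and derive the average request latency per model. The sketch below parses the text exposition format and divides the change in nv_inference_request_duration_us by the change in nv_inference_request_success. The helper names parse_metric and mean_latency_us are hypothetical, and treating the success counter as the request count is a simplifying assumption.

```python
import re

# Matches lines such as: nv_inference_request_duration_us{model="m",version="1"} 1234
_SAMPLE = re.compile(r'^(\w+)\{([^}]*)\}\s+([0-9.eE+-]+)$')
_MODEL = re.compile(r'model="([^"]*)"')


def parse_metric(text, name):
    """Sum one metric's samples per model from Prometheus text exposition."""
    totals = {}
    for line in text.splitlines():
        m = _SAMPLE.match(line.strip())
        if not m or m.group(1) != name:
            continue  # skips comments, blank lines, and other metrics
        model = _MODEL.search(m.group(2))
        if model:
            key = model.group(1)
            totals[key] = totals.get(key, 0.0) + float(m.group(3))
    return totals


def mean_latency_us(prev_text, curr_text):
    """Average request latency per model between two scrapes, in microseconds."""
    dur0 = parse_metric(prev_text, "nv_inference_request_duration_us")
    dur1 = parse_metric(curr_text, "nv_inference_request_duration_us")
    ok0 = parse_metric(prev_text, "nv_inference_request_success")
    ok1 = parse_metric(curr_text, "nv_inference_request_success")
    result = {}
    for model, total in dur1.items():
        requests = ok1.get(model, 0.0) - ok0.get(model, 0.0)
        if requests > 0:  # no completed requests means no latency sample
            result[model] = (total - dur0.get(model, 0.0)) / requests
    return result
```

In a deployment, the two text inputs would typically come from consecutive reads of the Triton metrics endpoint (port 8002, path /metrics by default), and the result would be compared against a baseline established for each DALI-backed model. The alert threshold is deployment-specific and is deliberately left out of the sketch.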
Monitoring Recommendations
- Forward Triton server logs and host telemetry to a centralized analytics platform for baseline and anomaly analysis
- Alert on sustained GPU memory saturation or out-of-memory restarts of the tritonserver process
- Track per-client request rates against DALI-backed models and flag deviations from expected workloads
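Per-client rate tracking can be sketched as a sliding-window counter keyed by source address. The class ClientRateTracker and its default window and limit are hypothetical placeholders for illustration; real thresholds should come from the observed baseline of each workload.

```python
from collections import defaultdict, deque


class ClientRateTracker:
    """Counts requests per client inside a sliding time window."""

    def __init__(self, window_seconds=60.0, max_requests=300):
        # Both defaults are placeholders; derive real values from a baseline.
        self.window = window_seconds
        self.max_requests = max_requests
        self._events = defaultdict(deque)

    def record(self, client, timestamp):
        """Record one request; return True if the client now exceeds the limit.

        Timestamps are assumed to be non-decreasing for each client.
        """
        events = self._events[client]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while events and events[0] <= timestamp - self.window:
            events.popleft()
        return len(events) > self.max_requests
```

Feeding this tracker with the source address and timestamp of each request to a DALI-backed model, taken from proxy or access logs, gives a simple signal for the deviation described above. It complements the resource alerts and does not replace them.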
How to Mitigate CVE-2026-24215
Immediate Actions Required
- Apply the fixed Triton Inference Server release referenced in the NVIDIA Support Response
- Restrict network access to Triton endpoints using firewall rules, service mesh policies, or private network placement
- Require authentication and TLS at an ingress proxy in front of Triton to limit anonymous request floods
Patch Information
NVIDIA has published guidance and patched versions through the official advisory. Review the NVIDIA Support Response and the NVD entry for CVE-2026-24215 for affected version ranges and upgrade instructions. Update container images that embed Triton with DALI to the patched release.
Workarounds
- Disable or remove DALI-backed models from production deployments until the patch is applied
- Enforce rate limiting and request-size caps at an API gateway in front of Triton
- Run Triton under cgroup or Kubernetes resource limits to contain the blast radius of resource exhaustion
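# Example: constrain Triton in Kubernetes to limit DoS blast radius
# (fragment of the Triton container spec)
resources:
  limits:
    cpu: "8"
    memory: "16Gi"
    nvidia.com/gpu: "1"
  requests:
    cpu: "2"
    memory: "4Gi"
---
# Restrict ingress to trusted sources
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: triton-trusted-ingress  # illustrative name
spec:
  podSelector:
    matchLabels:
      app: triton  # assumed pod label; match your deployment
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: trusted-inference-clients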
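The rate-limiting and request-size workaround is normally configured in the gateway or proxy itself. The logic it applies can be illustrated with a token bucket combined with a body-size cap, as in the minimal sketch below. TokenBucket, admit, and MAX_BODY_BYTES are hypothetical names, and the numeric values are placeholders rather than recommended settings.

```python
import time


class TokenBucket:
    """Token-bucket limiter: `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token if available; return whether the request may pass."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


MAX_BODY_BYTES = 8 * 1024 * 1024  # hypothetical cap; size it to real payloads


def admit(bucket, content_length, now=None):
    """Apply the size cap first, then the rate limit."""
    if content_length > MAX_BODY_BYTES:
        return False  # oversized requests do not consume a token
    return bucket.allow(now)
```

Keeping one bucket per client identity, such as an API key or source address, prevents a single sender from consuming the whole budget. Checking the size cap before the bucket means rejected oversized requests do not reduce the allowance for well-formed ones.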
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


