CVE-2026-24215: Nvidia Triton Inference Server DOS Flaw

CVE-2026-24215 Overview

CVE-2026-24215 is a denial of service vulnerability in NVIDIA Triton Inference Server. The flaw resides in the Data Loading Library (DALI) backend, which handles input preprocessing for inference workloads. An unauthenticated remote attacker can submit crafted requests that cause uncontrolled resource consumption [CWE-400]. A successful exploit results in availability loss for hosted AI inference services. The vulnerability is reachable over the network without user interaction or privileges.

Critical Impact
Remote attackers can exhaust server resources and disrupt AI inference services without authentication, degrading or halting production model serving.

Affected Products

NVIDIA Triton Inference Server (DALI backend)
Deployments using nvidia:triton_inference_server container images
AI inference workloads relying on DALI for input preprocessing

Discovery Timeline

2026-05-20 - CVE-2026-24215 published to the National Vulnerability Database
2026-05-20 - Last updated in NVD database

Technical Details for CVE-2026-24215

Vulnerability Analysis

The vulnerability is classified as uncontrolled resource consumption [CWE-400] in the DALI backend of NVIDIA Triton Inference Server. DALI accelerates data loading and preprocessing pipelines for deep learning models served by Triton. The backend processes input tensors and pipeline definitions submitted through Triton's HTTP and gRPC inference APIs.

When a request reaches the DALI backend, the server allocates compute and memory resources to execute the preprocessing graph. The flaw allows an attacker to submit inputs that trigger disproportionate resource use relative to request size. Repeated requests can drive CPU, GPU, or memory to exhaustion, blocking legitimate inference traffic.

The attack surface is network-reachable and requires no authentication when Triton endpoints are exposed. Operators running Triton in shared or internet-facing environments face the highest risk. Confidentiality and integrity remain intact, but availability impact is high.

Root Cause

The root cause is the absence of sufficient bounds and quotas on resource allocation within DALI pipeline processing. The backend trusts client-supplied parameters that influence buffer sizes or computation depth without enforcing protective limits.

Attack Vector

An attacker sends crafted inference requests to a reachable Triton endpoint targeting a model that uses the DALI backend. The attacker does not need credentials or prior access. Sustained or amplified requests exhaust server resources, producing a denial of service condition for all model consumers.

No public proof-of-concept exploit is available, and the vulnerability is not listed in the CISA Known Exploited Vulnerabilities catalog. The EPSS score reflects low observed exploitation activity at disclosure.

Detection Methods for CVE-2026-24215

Indicators of Compromise

Sudden spikes in CPU, GPU, or memory utilization on Triton Inference Server hosts without proportional growth in legitimate traffic
Increased latency, request timeouts, or 5xx responses from Triton HTTP and gRPC endpoints serving DALI-backed models
Repeated inference requests from a narrow set of source IPs targeting DALI pipeline models

Detection Strategies

Monitor Triton Prometheus metrics (nv_inference_request_duration_us, queue depth, GPU memory) for anomalies
Inspect access logs for unusually large or malformed inference payloads sent to DALI endpoints
Correlate process-level resource exhaustion events on inference hosts with inbound request patterns

Monitoring Recommendations

Forward Triton server logs and host telemetry to a centralized analytics platform for baseline and anomaly analysis
Alert on sustained GPU memory saturation or out-of-memory restarts of the tritonserver process
Track per-client request rates against DALI-backed models and flag deviations from expected workloads

How to Mitigate CVE-2026-24215

Immediate Actions Required

Apply the fixed Triton Inference Server release referenced in the NVIDIA Support Response
Restrict network access to Triton endpoints using firewall rules, service mesh policies, or private network placement
Require authentication and TLS at an ingress proxy in front of Triton to limit anonymous request floods

Patch Information

NVIDIA has published guidance and patched versions through the official advisory. Review the NVIDIA Support Response and the NVD entry for CVE-2026-24215 for affected version ranges and upgrade instructions. Update container images that embed Triton with DALI to the patched release.

Workarounds

Disable or remove DALI-backed models from production deployments until the patch is applied
Enforce rate limiting and request-size caps at an API gateway in front of Triton
Run Triton under cgroup or Kubernetes resource limits to contain the blast radius of resource exhaustion

bash

# Example: constrain Triton in Kubernetes to limit DoS blast radius
resources:
  limits:
    cpu: "8"
    memory: "16Gi"
    nvidia.com/gpu: "1"
  requests:
    cpu: "2"
    memory: "4Gi"
# Restrict ingress to trusted sources
networkPolicy:
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: trusted-inference-clients