CVE-2026-24207: NVIDIA Triton Inference Server Authentication Bypass Vulnerability

CVE-2026-24207 Overview

CVE-2026-24207 is an authentication bypass vulnerability in NVIDIA Triton Inference Server. An unauthenticated remote attacker can bypass authentication controls and interact with the inference server as a privileged user. Successful exploitation can lead to remote code execution, privilege escalation, data tampering, denial of service, or information disclosure.

The flaw is classified under CWE-288 (Authentication Bypass Using an Alternate Path or Channel). NVIDIA published the advisory on May 20, 2026, and the issue affects Triton Inference Server deployments exposed over the network.

Critical Impact
Unauthenticated attackers can gain full control of Triton Inference Server instances, enabling model tampering, theft of proprietary AI models, and execution of arbitrary code on the host.

Affected Products

NVIDIA Triton Inference Server (all versions prior to the vendor-fixed release identified in NVIDIA Security Bulletin 5828)
Triton Inference Server deployments running on Linux hosts
AI inference workloads exposed via Triton's HTTP and gRPC endpoints

Discovery Timeline

2026-05-20 - CVE-2026-24207 published to the National Vulnerability Database
2026-05-20 - NVIDIA security advisory published
2026-05-20 - Last updated in the NVD

Technical Details for CVE-2026-24207

Vulnerability Analysis

NVIDIA Triton Inference Server is an open-source serving platform for machine learning models that exposes inference, model management, and health endpoints over HTTP and gRPC. The vulnerability allows an attacker to bypass the authentication layer protecting these endpoints. Once that layer is bypassed, the attacker can reach privileged management functions intended only for authorized operators.

The NVIDIA advisory states that exploitation can result in code execution, escalation of privileges, data tampering, denial of service, or information disclosure. In practical terms, an attacker who reaches a network-exposed Triton endpoint can load attacker-controlled models, alter inference outputs, or extract proprietary models hosted on the server.

Root Cause

The root cause is improper enforcement of authentication on a request path or channel within Triton Inference Server, consistent with CWE-288. The vendor advisory does not publicly enumerate the affected code path. The NVD entry confirms that no privileges and no user interaction are required to exploit the issue over the network.

Attack Vector

The attack vector is network-based. Triton Inference Server typically listens on TCP ports 8000 (HTTP), 8001 (gRPC), and 8002 (metrics). An attacker with network reachability to one of these endpoints sends crafted requests that circumvent authentication and invoke privileged actions such as model loading via the model repository API. Because Triton supports custom Python and C++ backends, loading an attacker-controlled model can yield arbitrary code execution in the server process.
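
To gauge exposure, an operator can check which of these default endpoints answer from an untrusted network position. The following is a minimal sketch, assuming the default port assignments described above and Triton's standard /v2/health/ready and /metrics routes. The function names are hypothetical, and the probe should only be run against hosts you are authorized to test.

```shell
# Hypothetical helpers for illustration; not part of Triton or the NVIDIA advisory.

# Print the probe URLs for a host, assuming the default HTTP (8000) and metrics (8002) ports.
# gRPC on 8001 does not speak plain HTTP/1.1, so it is left out of this sketch.
triton_probe_urls() {
  host="$1"
  printf 'http://%s:8000/v2/health/ready\n' "$host"
  printf 'http://%s:8002/metrics\n' "$host"
}

# Print "<status> <url>" for each probe URL.
# curl reports status 000 when the connection fails or times out.
triton_probe() {
  triton_probe_urls "$1" | while IFS= read -r url; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")
    printf '%s %s\n' "$code" "$url"
  done
}
```

Running `triton_probe 203.0.113.10` (a documentation address used as a placeholder) from outside the trusted network shows whether either endpoint returns an HTTP status; any response other than 000 suggests the port is reachable from that position.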

No verified public proof-of-concept is available at the time of publication. EPSS data lists the exploit probability at 0.096%.

Detection Methods for CVE-2026-24207

Indicators of Compromise

Unexpected calls to Triton model management endpoints such as POST /v2/repository/models/{model_name}/load or /unload from untrusted source addresses
New or modified model directories appearing in the Triton model_repository path without a corresponding deployment change ticket
Triton server processes spawning unexpected child processes such as sh, bash, python, or outbound network utilities
Anomalous outbound connections from inference hosts to unfamiliar IP addresses or domains
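
The first indicator above can be searched for in request logs. The sketch below assumes the logs come from a reverse proxy in front of Triton and use a common-log-style layout in which the client address is the first field. The function name and the trusted prefix are hypothetical.

```shell
# Hypothetical helper; assumes proxy access logs with the client address as the first field.
# Reads log lines on stdin. $1 is a trusted source prefix such as "10.10.0."
# (keep the trailing dot so that "10.10.0." does not also match "10.10.01.x"-style strings).
# Prints model load/unload requests whose client address does not start with that prefix.
find_untrusted_model_control() {
  awk -v trusted="$1" '
    $0 ~ "POST /v2/repository/models/[^/ ]+/(load|unload)" && index($1, trusted) != 1 { print }
  '
}
```

A typical invocation would be `find_untrusted_model_control '10.10.0.' < access.log`, where `access.log` is a placeholder path. Prefix matching is a simplification; a production rule would compare against proper CIDR ranges.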

Detection Strategies

Inspect Triton request logs, or the access logs of any proxy in front of Triton, for requests to /v2/repository/* and /v2/models/*/config originating from outside the operations subnet
Alert on shifts in inference latency or output distribution that may indicate model tampering
Compare deployed model hashes against a signed manifest of approved models on a recurring schedule
Monitor host-level telemetry for process executions spawned by the Triton service account that fall outside known baselines
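
The manifest comparison can be approximated with standard checksum tooling. This is a minimal sketch, assuming GNU coreutils and a manifest stored outside the model repository; the function names are hypothetical, and verifying the manifest's signature is left out.

```shell
# Hypothetical helpers (GNU coreutils assumed); not part of Triton.

# Write a manifest for the repository at $1 to stdout, with paths relative to the repository root.
make_model_manifest() {
  (cd "$1" && find . -type f -exec sha256sum {} + | LC_ALL=C sort -k 2)
}

# Compare the repository at $1 against the manifest file $2.
# Prints "<path>: FAILED ..." for changed or missing files and
# "<path>: UNLISTED" for files on disk that the manifest does not mention.
# Prints nothing when the repository matches the manifest.
verify_model_manifest() {
  repo="$1"
  manifest="$2"
  on_disk=$(mktemp)
  listed=$(mktemp)

  # The manifest is opened before the subshell changes directory,
  # so a relative manifest path still resolves correctly.
  (cd "$repo" && LC_ALL=C sha256sum -c --quiet - 2>/dev/null) < "$manifest"

  # Set difference: files present on disk but absent from the manifest.
  (cd "$repo" && find . -type f -print) | LC_ALL=C sort > "$on_disk"
  sed 's/^[0-9a-f]* [ *]//' "$manifest" | LC_ALL=C sort > "$listed"
  LC_ALL=C comm -23 "$on_disk" "$listed" | sed 's/$/: UNLISTED/'

  rm -f "$on_disk" "$listed"
}
```

Generate the manifest once from a known-good deployment with `make_model_manifest /models > approved.sha256` (both paths are placeholders), store and sign it elsewhere, and run `verify_model_manifest /models approved.sha256` on a schedule. Any output line warrants investigation.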

Monitoring Recommendations

Enable verbose Triton logging with --log-verbose=1 and forward logs to a centralized SIEM
Capture network flow data for ports 8000, 8001, and 8002 and baseline expected client sources
Track file integrity on the model repository directory and configuration files
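
Where full flow capture is not yet in place, a point-in-time snapshot of connected clients can seed the baseline. The sketch below assumes the iproute2 `ss` utility and its default column layout (state, queues, local address, peer address); the function name is hypothetical.

```shell
# Hypothetical helper; reads `ss -Htn` output on stdin.
# Prints the unique peer addresses of established connections to local ports 8000-8002.
triton_peers() {
  awk '
    $1 == "ESTAB" {
      lport = $4; sub(/.*:/, "", lport)      # local port: text after the last colon
      peer = $5;  sub(/:[^:]*$/, "", peer)   # peer address: text before the last colon
      if (lport == "8000" || lport == "8001" || lport == "8002") print peer
    }
  ' | sort -u
}
```

Running `ss -Htn | triton_peers` periodically and diffing the results against a list of expected clients gives a rough view of who is talking to the server. It misses short-lived connections, so it complements flow data and does not replace it.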

How to Mitigate CVE-2026-24207

Immediate Actions Required

Apply the fixed Triton Inference Server release identified in NVIDIA Security Bulletin 5828
Restrict network access to Triton HTTP and gRPC ports to trusted client subnets using firewall or Kubernetes NetworkPolicy controls
Audit the model repository for unauthorized model additions or modifications made since the last known-good deployment
Rotate any credentials, API keys, or model artifacts that may have been exposed on affected hosts
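
For the repository audit, a quick first pass is to list files modified after the last known-good deployment. This is a rough sketch, assuming GNU find; the function name and the example timestamp are hypothetical.

```shell
# Hypothetical helper (GNU find assumed).
# $1: model repository root
# $2: timestamp of the last known-good deployment, e.g. "2026-05-01 00:00:00"
# Prints files whose modification time is later than that timestamp.
recent_model_changes() {
  find "$1" -type f -newermt "$2" -print | LC_ALL=C sort
}
```

For example, `recent_model_changes /models "2026-05-01 00:00:00"` lists candidates for review. Modification times can be reset by an attacker, so an empty result is weak evidence; the hash comparison listed under Detection Strategies is the stronger check.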

Patch Information

NVIDIA has released a fixed version of Triton Inference Server that addresses CVE-2026-24207. Patch details and download links are documented in NVIDIA Security Bulletin 5828 (Answer ID 5828). Organizations should consult the bulletin to identify the fixed build for their deployment channel, then redeploy container images or binaries accordingly.

Workarounds

Place Triton Inference Server behind an authenticating reverse proxy that terminates client connections and enforces mutual TLS
Disable the model control API by starting Triton with --model-control-mode=none to prevent runtime model loading until the patch is applied
Bind Triton listeners to internal interfaces only and block ports 8000, 8001, and 8002 from external networks
Run Triton in a network-segmented environment with egress filtering to limit post-exploitation activity

bash

# Configuration example: launch Triton with model control disabled and listeners bound to the loopback interface
tritonserver \
  --model-repository=/models \
  --model-control-mode=none \
  --http-address=127.0.0.1 \
  --grpc-address=127.0.0.1 \
  --log-verbose=1

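# Example iptables rules for hosts that must listen on a routable interface:
# allow the operations subnet (10.10.0.0/24 is an example range) and drop all other sources
iptables -A INPUT -p tcp -m multiport --dports 8000,8001,8002 -s 10.10.0.0/24 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 8000,8001,8002 -j DROP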
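
After applying the workarounds, it is worth confirming that no Triton listener is still bound to a routable address. The following sketch assumes the iproute2 `ss` utility and the default ports named above; the function name is hypothetical.

```shell
# Hypothetical helper; reads `ss -Hltn` output on stdin.
# Prints listening sockets on ports 8000-8002 that are bound to anything other than loopback.
exposed_triton_listeners() {
  awk '
    $1 == "LISTEN" {
      addr = $4; sub(/:[^:]*$/, "", addr)    # bind address: text before the last colon
      port = $4; sub(/.*:/, "", port)        # port: text after the last colon
      if ((port == "8000" || port == "8001" || port == "8002") &&
          addr != "127.0.0.1" && addr != "[::1]") print $4
    }
  '
}
```

Run `ss -Hltn | exposed_triton_listeners` on the inference host. Empty output means all three ports are either closed or bound to loopback; any printed socket still needs a firewall rule or a bind-address change.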