CVE-2025-23336: NVIDIA Triton Inference Server DoS Vulnerability

CVE-2025-23336 Overview

CVE-2025-23336 is a denial of service vulnerability in NVIDIA Triton Inference Server for both Windows and Linux. An attacker can trigger the condition by loading a misconfigured model into the server. Successful exploitation causes the inference service to become unavailable, disrupting AI workloads that depend on it.

The issue is classified as improper input validation [CWE-20] and carries a CVSS 3.1 base score of 7.5. The flaw is network-reachable, requires no privileges, and needs no user interaction; it impacts availability only.

Critical Impact
Remote attackers can disrupt AI inference workloads by submitting a malformed model configuration to the Triton Inference Server.

Affected Products

NVIDIA Triton Inference Server (Windows)
NVIDIA Triton Inference Server (Linux)
Deployments running on supported Linux and Microsoft Windows hosts

Discovery Timeline

2025-09-17 - CVE-2025-23336 published to NVD
2025-09-25 - Last updated in NVD

Technical Details for CVE-2025-23336

Vulnerability Analysis

NVIDIA Triton Inference Server hosts machine learning models and exposes inference endpoints over the network. The vulnerability resides in the model loading path, where input handling does not adequately validate model configuration data. A malformed or misconfigured model submitted during the load operation drives the server into an error state that results in service termination or unresponsiveness.

Because the attack vector is network-based and requires no authentication or user interaction, any actor able to reach the model management interface can trigger the condition. The impact is limited to availability; confidentiality and integrity are not affected. Hosts running mission-critical inference pipelines lose access to served models until the service is restarted and the offending configuration is removed.

Root Cause

The vulnerability stems from improper input validation [CWE-20] in the model configuration handling logic. Triton accepts model parameters and metadata during load operations but fails to enforce constraints sufficient to reject malformed inputs gracefully. Instead of returning a controlled error, the server enters a failure state that affects availability.

Attack Vector

An attacker submits a model with a crafted or invalid configuration to the Triton Inference Server through its network-exposed management interface. The server processes the configuration, fails to validate it correctly, and crashes or hangs. Environments that expose Triton endpoints to untrusted networks, including some internal staging or shared inference infrastructure, are at greater risk. No exploit code or public proof of concept is currently available, and the vulnerability is not listed in the CISA Known Exploited Vulnerabilities catalog.

Detection Methods for CVE-2025-23336

Indicators of Compromise

Unexpected Triton Inference Server crashes or restarts correlated with model load requests
Error log entries referencing failed model configuration parsing or load failures
Repeated POST requests to the model repository or model control API from unfamiliar source addresses
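
The third indicator can be checked with a short script. The following is a minimal sketch, assuming Triton sits behind a reverse proxy that writes access logs in common log format; the log format and the function name count_model_control_posts are assumptions about the deployment, not part of Triton. It counts POST requests to paths under /v2/repository/ (Triton's model repository API) per client address.

```bash
#!/usr/bin/env bash
# Sketch: count POST requests to Triton's model repository API per client address.
# Assumes common log format lines such as:
#   203.0.113.9 - - [17/Sep/2025:10:00:00 +0000] "POST /v2/repository/models/m/load HTTP/1.1" 200 0

count_model_control_posts() {
  local log_file="$1"
  # In this format, field 1 is the client address, field 6 is the quoted
  # HTTP method, and field 7 is the request path.
  awk '$6 == "\"POST" && $7 ~ /^\/v2\/repository\// { hits[$1]++ }
       END { for (ip in hits) print hits[ip], ip }' "$log_file" |
    sort -k1,1nr -k2,2
}
```

Calling count_model_control_posts with the path to the proxy access log prints one line per source address, highest count first. Addresses that do not belong to known deployment tooling are candidates for review.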

Detection Strategies

Monitor Triton server logs for abnormal model load failures, segmentation faults, or worker process exits
Compare loaded model inventories against an approved baseline to identify unauthorized load attempts
Alert on service availability gaps reported by health checks or upstream inference clients
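
The baseline comparison described above can be sketched as follows. The function name unapproved_models and the one-name-per-line file format are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch: list models that are loaded but missing from an approved baseline.
# Both inputs are plain-text files with one model name per line (assumed format).

unapproved_models() {
  local baseline_file="$1" loaded_file="$2"
  # -F fixed strings, -x whole-line match, -v invert:
  # keep only loaded names that match no baseline entry.
  grep -Fxv -f "$baseline_file" "$loaded_file" | sort -u
}
```

One way to produce the loaded-model list, assuming jq is installed and the server listens on the default HTTP port, is to query the repository index: curl -s -X POST http://127.0.0.1:8000/v2/repository/index | jq -r '.[].name' > loaded.txt. Any name printed by unapproved_models baseline.txt loaded.txt warrants investigation.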

Monitoring Recommendations

Forward Triton application and system logs to a centralized logging platform for correlation
Track HTTP and gRPC traffic to model management endpoints with network flow analytics
Establish baselines for model load frequency and alert on deviations
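
Availability gaps can be recorded with a simple readiness probe. The sketch below polls Triton's /v2/health/ready endpoint and prints a timestamped state. TRITON_URL, the 5-second timeout, and the state labels are assumptions; 8000 is Triton's default HTTP port.

```bash
#!/usr/bin/env bash
# Sketch: probe Triton's readiness endpoint and emit a timestamped state line.

classify_health_status() {
  # Map the HTTP status code returned by /v2/health/ready to a coarse state.
  case "$1" in
    200)    echo "ready" ;;
    000|"") echo "unreachable" ;;  # curl reports 000 when it received no response
    *)      echo "not_ready" ;;
  esac
}

probe_triton() {
  local url="${TRITON_URL:-http://127.0.0.1:8000}"  # assumed address
  local code
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url/v2/health/ready")" || true
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(classify_health_status "$code")"
}
```

Running probe_triton from cron or a loop and forwarding its output to the central log platform gives a timeline that can be correlated with model load requests.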

How to Mitigate CVE-2025-23336

Immediate Actions Required

Apply the security update referenced in the NVIDIA Support Article
Restrict network access to Triton model management interfaces to trusted clients only
Disable remote model loading where it is not required for production workloads
Review existing models in the repository for unauthorized or unexpected configurations

Patch Information

NVIDIA has published guidance and a fixed release in the vendor advisory. Refer to the NVIDIA Support Article for the affected versions and the patched build numbers. Upgrade all Windows and Linux Triton Inference Server deployments to the remediated version.

Workarounds

Place Triton Inference Server behind an authenticated reverse proxy that enforces allowlists
Use network segmentation and firewall rules to limit exposure of model control endpoints
Run Triton with the --model-control-mode=none option where dynamic model loading is not needed
Implement validation of model configuration files before placing them in the model repository

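```bash
# Configuration example: keep explicit model control but bind both listeners to localhost.
# Explicit mode leaves the load/unload API enabled, so only local clients should reach it;
# use --model-control-mode=none instead when dynamic model loading is not needed.
tritonserver \
  --model-repository=/opt/triton/models \
  --model-control-mode=explicit \
  --http-address=127.0.0.1 \
  --grpc-address=127.0.0.1 \
  --allow-http=true \
  --allow-grpc=true
```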
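
The last workaround, validating configuration files before they reach the model repository, can start with basic sanity checks. The sketch below is illustrative only: the advisory does not state which misconfiguration triggers the flaw, so these checks (file present, non-empty, under an assumed size cap, balanced braces) are hypothetical and do not guarantee protection. The function name check_model_config and the 64 KiB default are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: coarse pre-placement checks for a Triton config.pbtxt file.
# These are generic hygiene checks, not a detector for CVE-2025-23336.

check_model_config() {
  local config="$1"
  local max_bytes="${2:-65536}"  # assumed size cap, not a Triton limit

  [ -f "$config" ] || { echo "missing: $config"; return 1; }
  [ -s "$config" ] || { echo "empty: $config"; return 1; }

  local size
  size="$(wc -c < "$config" | tr -d ' ')"
  [ "$size" -le "$max_bytes" ] || { echo "too large: $config"; return 1; }

  # Crude structure check: opening and closing brace counts must match.
  # Braces inside quoted strings are counted too, so this can misreport.
  local opens closes
  opens="$(tr -cd '{' < "$config" | wc -c | tr -d ' ')"
  closes="$(tr -cd '}' < "$config" | wc -c | tr -d ' ')"
  [ "$opens" -eq "$closes" ] || { echo "unbalanced braces: $config"; return 1; }

  echo "ok: $config"
}
```

A deployment script could call check_model_config on each staged config.pbtxt and copy the model directory into the repository only when the function returns zero. Stricter validation, such as parsing the file against Triton's model configuration schema, would be a stronger gate.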