CVE-2025-23336 Overview
CVE-2025-23336 is a denial of service vulnerability in NVIDIA Triton Inference Server for both Windows and Linux. An attacker can trigger the condition by loading a misconfigured model into the server. Successful exploitation causes the inference service to become unavailable, disrupting AI workloads that depend on it.
The issue is classified under improper input validation [CWE-20] and carries a CVSS 3.1 base score of 7.5. The flaw is network-reachable, requires no privileges, and needs no user interaction, and it affects only availability.
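The 7.5 score can be reproduced from the CVSS 3.1 base equations. The sketch below assumes the vector AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H. Network access, no privileges, no user interaction, and availability-only impact come from the description above. Low attack complexity, unchanged scope, and a high availability rating are assumptions, chosen because they are consistent with the published score.

```python
import math

# CVSS 3.1 metric weights (only the values needed for the assumed vector).
AV_NETWORK = 0.85
AC_LOW = 0.77      # assumption: attack complexity is not stated in the text
PR_NONE = 0.85
UI_NONE = 0.85
IMPACT_NONE = 0.0
IMPACT_HIGH = 0.56  # assumption: availability impact rated High


def roundup(value: float) -> float:
    """Round up to one decimal place, following the CVSS 3.1 Roundup definition."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score_scope_unchanged(av, ac, pr, ui, c, i, a) -> float:
    """Base score for a vector whose scope is Unchanged (assumed here)."""
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * av * ac * pr * ui
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


score = base_score_scope_unchanged(
    AV_NETWORK, AC_LOW, PR_NONE, UI_NONE,
    IMPACT_NONE, IMPACT_NONE, IMPACT_HIGH,
)
print(score)  # 7.5
```

With these inputs the impact sub-score is about 3.60 and the exploitability sub-score about 3.89, so the sum of roughly 7.48 rounds up to 7.5.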
Critical Impact
Remote attackers can disrupt AI inference workloads by submitting a malformed model configuration to the Triton Inference Server.
Affected Products
- NVIDIA Triton Inference Server (Windows)
- NVIDIA Triton Inference Server (Linux)
- Deployments running on supported Linux and Microsoft Windows hosts
Discovery Timeline
- 2025-09-17 - CVE-2025-23336 published to NVD
- 2025-09-25 - Last updated in NVD database
Technical Details for CVE-2025-23336
Vulnerability Analysis
NVIDIA Triton Inference Server hosts machine learning models and exposes inference endpoints over the network. The vulnerability resides in the model loading path, where input handling does not adequately validate model configuration data. A malformed or misconfigured model submitted during the load operation drives the server into an error state that results in service termination or unresponsiveness.
Because the attack vector is network-based and requires no authentication or user interaction, any actor able to reach the model management interface can trigger the condition. The impact is limited to availability, with no breach of confidentiality or integrity. Hosts running mission-critical inference pipelines lose access to served models until the service is restarted and the offending configuration is purged.
Root Cause
The vulnerability stems from improper input validation [CWE-20] in the model configuration handling logic. Triton accepts model parameters and metadata during load operations but fails to enforce constraints sufficient to reject malformed inputs gracefully. Instead of returning a controlled error, the server enters a failure state that affects availability.
Attack Vector
An attacker submits a model with a crafted or invalid configuration to the Triton Inference Server through its network-exposed management interface. The server processes the configuration, fails to validate it correctly, and crashes or hangs. Environments that expose Triton endpoints to untrusted networks, including some internal staging or shared inference infrastructure, are at greater risk. No exploit code or public proof of concept is currently available, and the vulnerability is not listed in the CISA Known Exploited Vulnerabilities catalog.
Detection Methods for CVE-2025-23336
Indicators of Compromise
- Unexpected Triton Inference Server crashes or restarts correlated with model load requests
- Error log entries referencing failed model configuration parsing or load failures
- Repeated POST requests to the model repository or model control API from unfamiliar source addresses
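The third indicator above can be checked against request records with a short script. This is a minimal sketch, assuming the records come from a reverse proxy or flow log that has already been parsed into (source address, method, path) tuples. The allowlist network and the sample records are hypothetical. The path pattern follows the HTTP routes of Triton's model repository extension.

```python
import ipaddress
import re
from collections import Counter

# Routes of Triton's model repository extension: index, load, unload.
CONTROL_PATH = re.compile(r"^/v2/repository/(index|models/[^/]+/(load|unload))$")

# Hypothetical allowlist: replace with the networks your deployment tooling uses.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.5.0/24")]


def is_trusted(address: str) -> bool:
    """Return True when the address falls inside any trusted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in TRUSTED_NETWORKS)


def untrusted_control_requests(records):
    """Count model control POSTs per source address outside the allowlist.

    `records` is an iterable of (source_ip, method, path) tuples.
    """
    counts = Counter()
    for source_ip, method, path in records:
        if method == "POST" and CONTROL_PATH.match(path) and not is_trusted(source_ip):
            counts[source_ip] += 1
    return counts


# Illustrative records only; real data would come from your own logs.
sample = [
    ("10.0.5.12", "POST", "/v2/repository/models/resnet/load"),
    ("192.0.2.44", "POST", "/v2/repository/models/resnet/load"),
    ("192.0.2.44", "POST", "/v2/repository/index"),
    ("192.0.2.44", "POST", "/v2/models/resnet/infer"),
]
print(untrusted_control_requests(sample))
```

Inference requests are deliberately ignored so that normal client traffic does not drown out model control activity.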
Detection Strategies
- Monitor Triton server logs for abnormal model load failures, segmentation faults, or worker process exits
- Compare loaded model inventories against an approved baseline to identify unauthorized load attempts
- Alert on service availability gaps reported by health checks or upstream inference clients
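The baseline comparison above could be sketched as follows. The code assumes you have already fetched the repository index (Triton serves it at `POST v2/repository/index`) and hold the JSON body as text, with entries carrying `name`, `state`, and `reason` fields as in the model repository extension. The approved model names, the sample entries, and the sample reason string are hypothetical.

```python
import json

# Hypothetical approved baseline of model names for this deployment.
APPROVED_MODELS = {"resnet50", "bert_base"}


def unexpected_models(index_json: str, approved=APPROVED_MODELS):
    """Return sorted model names present in the index but not in the baseline."""
    entries = json.loads(index_json)
    names = {entry["name"] for entry in entries}
    return sorted(names - set(approved))


def failed_loads(index_json: str):
    """Return (name, reason) pairs for entries reporting state UNAVAILABLE."""
    entries = json.loads(index_json)
    return [
        (entry["name"], entry.get("reason", ""))
        for entry in entries
        if entry.get("state") == "UNAVAILABLE"
    ]


# Illustrative index body; the reason text is made up for this example.
sample_index = json.dumps([
    {"name": "resnet50", "version": "1", "state": "READY"},
    {"name": "probe_model", "version": "1", "state": "UNAVAILABLE",
     "reason": "failed to load"},
])

print(unexpected_models(sample_index))
print(failed_loads(sample_index))
```

A model that is both outside the baseline and in a failed state is a stronger signal than either condition alone, so it may be worth alerting on the intersection first.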
Monitoring Recommendations
- Forward Triton application and system logs to a centralized logging platform for correlation
- Track HTTP and gRPC traffic to model management endpoints with network flow analytics
- Establish baselines for model load frequency and alert on deviations
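One simple way to alert on deviations from a load-frequency baseline is a mean-plus-standard-deviation threshold. The sketch below is one possible approach, not a prescribed method. The window size, the multiplier of 3, and the minimum margin are assumptions to tune against your own traffic.

```python
import statistics


def load_rate_is_anomalous(history, current, threshold=3.0, min_margin=1.0):
    """Return True when `current` exceeds the baseline by a wide margin.

    `history` holds model load counts for past intervals (for example, per hour).
    The limit is mean + max(threshold * stdev, min_margin), so a perfectly flat
    history does not alert on a single extra load.
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    limit = mean + max(threshold * stdev, min_margin)
    return current > limit


# Hypothetical hourly load counts.
baseline = [2, 3, 2, 4, 3, 2, 3, 3]
print(load_rate_is_anomalous(baseline, 4))
print(load_rate_is_anomalous(baseline, 12))
```

For the sample baseline the mean is 2.75 and the sample standard deviation is about 0.71, which puts the limit near 4.87 loads per interval.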
How to Mitigate CVE-2025-23336
Immediate Actions Required
- Apply the security update referenced in the NVIDIA Support Article
- Restrict network access to Triton model management interfaces to trusted clients only
- Disable remote model loading where it is not required for production workloads
- Review existing models in the repository for unauthorized or unexpected configurations
Patch Information
NVIDIA has published guidance and a fixed release in the vendor advisory. Refer to the NVIDIA Support Article for the affected versions and the patched build numbers. Upgrade all Windows and Linux Triton Inference Server deployments to the remediated version.
Workarounds
- Place Triton Inference Server behind an authenticated reverse proxy that enforces allowlists
- Use network segmentation and firewall rules to limit exposure of model control endpoints
- Run Triton with the --model-control-mode=none option where dynamic model loading is not needed
- Implement validation of model configuration files before placing them in the model repository
# Configuration example: load models only on explicit request and bind both endpoints to localhost
# Note: explicit mode keeps the load/unload API enabled; use --model-control-mode=none to disable it
tritonserver \
  --model-repository=/opt/triton/models \
  --model-control-mode=explicit \
  --http-address=127.0.0.1 \
  --grpc-address=127.0.0.1 \
  --allow-http=true \
  --allow-grpc=true
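The workaround of validating model configuration files before they reach the repository could start from a pre-check like the one below. It is a heuristic sketch, not a reproduction of NVIDIA's fix: the advisory does not say which malformed input triggers the flaw, so the checks (size cap, UTF-8 decoding, balanced brackets, non-negative `max_batch_size`) are assumptions about what a sane `config.pbtxt` looks like. The size cap is hypothetical, and only double-quoted strings are recognized.

```python
import re
from pathlib import Path

MAX_CONFIG_BYTES = 64 * 1024  # hypothetical cap; tune for your models

# Double-quoted strings and '#' comments are skipped before structural checks.
_SKIP = re.compile(r'"(?:\\.|[^"\\\n])*"|#[^\n]*')
_BATCH = re.compile(r"\bmax_batch_size\s*:\s*(\S+)")
_CLOSERS = {"}": "{", "]": "["}


def check_config_text(text: str) -> list[str]:
    """Return a list of problems found in config.pbtxt text (empty if none)."""
    problems = []
    stripped = _SKIP.sub("", text)

    # Bracket balance: every '{' or '[' needs a matching closer in order.
    stack = []
    for ch in stripped:
        if ch in "{[":
            stack.append(ch)
        elif ch in _CLOSERS:
            if not stack or stack.pop() != _CLOSERS[ch]:
                problems.append("unbalanced brackets")
                break
    else:
        if stack:
            problems.append("unclosed brackets")

    # max_batch_size, when present, should be a plain non-negative integer.
    for value in _BATCH.findall(stripped):
        if not re.fullmatch(r"\d+", value):
            problems.append(f"max_batch_size is not a non-negative integer: {value}")
    return problems


def check_config_file(path: Path) -> list[str]:
    """Apply the size and encoding checks, then the text checks."""
    data = path.read_bytes()
    if len(data) > MAX_CONFIG_BYTES:
        return ["config exceeds size cap"]
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["config is not valid UTF-8"]
    return check_config_text(text)
```

A deployment pipeline might run `check_config_file` on each model directory's `config.pbtxt` and refuse to copy the model into the repository when the returned list is not empty. A stricter check would parse the file against Triton's model configuration protobuf schema instead of relying on regular expressions.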