CVE-2025-23326: NVIDIA Triton Inference Server DoS Flaw

CVE-2025-23326 Overview

CVE-2025-23326 affects NVIDIA Triton Inference Server on both Windows and Linux platforms. The vulnerability allows a remote attacker to trigger an integer overflow by sending a specially crafted input to the server. Successful exploitation can lead to a denial of service condition, disrupting AI inference workloads.

The flaw is classified under [CWE-680] (Integer Overflow to Buffer Overflow). The attack is network-based and requires neither authentication nor user interaction. Organizations running NVIDIA Triton Inference Server to host machine learning models in production should prioritize remediation to maintain service availability.

Critical Impact
A remote, unauthenticated attacker can crash the Triton Inference Server through a crafted request, disrupting hosted AI model availability.

Affected Products

NVIDIA Triton Inference Server (Windows)
NVIDIA Triton Inference Server (Linux)
Deployments on Linux and Microsoft Windows host platforms

Discovery Timeline

2025-08-06 - CVE-2025-23326 published to the National Vulnerability Database (NVD)
2025-08-12 - Last updated in the NVD
Vendor advisory - Published by NVIDIA in support article a_id/5687

Technical Details for CVE-2025-23326

Vulnerability Analysis

The vulnerability exists in the input-handling logic of NVIDIA Triton Inference Server. When the server processes a specially crafted input, an arithmetic operation produces a value that exceeds the range of its integer type and wraps around. The wrapped value can then corrupt downstream size calculations or memory-boundary computations.
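Because this write-up does not identify the affected code path, the following bash sketch is only a generic illustration of the wraparound pattern, not Triton code. The element count and element size are hypothetical values. Bash integers are 64-bit, so the helper masks the product to 32 bits to mimic unsigned 32-bit multiplication in C, which wraps modulo 2^32:

```bash
#!/usr/bin/env bash
# Generic illustration of CWE-680-style wraparound; NOT Triton source code.
# Bash integers are 64-bit, so u32_mul masks the product to 32 bits to mimic
# unsigned 32-bit (uint32_t) multiplication in C, which wraps modulo 2^32.
u32_mul() {
  local a=$1 b=$2
  echo $(( (a * b) & 0xFFFFFFFF ))
}

elem_count=1073741825   # hypothetical attacker-supplied element count (2^30 + 1)
elem_size=4             # hypothetical bytes per element (e.g. FP32)

true_size=$(( elem_count * elem_size ))              # exact 64-bit product
wrapped_size=$(u32_mul "$elem_count" "$elem_size")   # what a 32-bit size field holds

echo "true size:    $true_size"
echo "wrapped size: $wrapped_size"
```

Running it prints a true size of 4294967300 and a wrapped size of 4. A buffer sized from the wrapped value would be roughly a billion times smaller than the data the request describes, which is how an integer overflow turns into out-of-bounds memory access.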

The immediate consequence is a denial of service. The corrupted value causes the inference server process to terminate or become unresponsive, halting model serving for legitimate clients. According to the CVSS vector, the confidentiality and integrity impacts are None, while the availability impact is High.

Triton Inference Server is widely deployed to host machine learning models behind production APIs. Disruption of this service breaks dependent applications that rely on real-time inference, including recommendation engines, computer vision pipelines, and large language model endpoints.

Root Cause

The root cause is missing or insufficient bounds checking on integer arithmetic during input processing. An attacker-controlled value participates in a calculation that wraps around the maximum representable integer, producing an undersized or otherwise invalid result. The server then operates on this corrupted value, leading to a crash. This pattern aligns with [CWE-680], where an integer overflow propagates into buffer-related logic.
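The missing bounds check can be sketched as a guard that rejects a size computation before it can wrap. `checked_u32_mul` is a hypothetical helper, not a Triton function, and the sketch assumes non-negative decimal inputs and a 32-bit unsigned size type:

```bash
#!/usr/bin/env bash
# Hypothetical guard (not a Triton function): refuse a size computation that
# would exceed the range of an unsigned 32-bit type instead of letting it wrap.
# Assumes both arguments are non-negative decimal integers.
UINT32_MAX=4294967295

checked_u32_mul() {
  local a=$1 b=$2
  # Compare against UINT32_MAX / b so the check itself cannot overflow.
  if (( b != 0 )) && (( a > UINT32_MAX / b )); then
    echo "rejected: $a * $b exceeds UINT32_MAX" >&2
    return 1
  fi
  echo $(( a * b ))
}
```

`checked_u32_mul 1024 4` prints 4096, while `checked_u32_mul 1073741825 4` returns a non-zero status, so a caller can reject the request instead of allocating an undersized buffer. Dividing the limit by one operand, rather than multiplying first, keeps the check itself from overflowing.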

Attack Vector

The attack is remote and unauthenticated. An adversary sends a crafted request to the Triton Inference Server's exposed network endpoint, typically the HTTP or gRPC inference API. No prior privileges or user interaction are required. Servers exposed to untrusted networks or running in shared multi-tenant environments are at highest risk. Internal services that accept inference requests from less-trusted application tiers should also be treated as exposed.
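To check whether a given Triton instance is reachable from an untrusted network, a minimal sketch like the following can be run from a host on that network. It assumes Triton's default ports, coreutils `timeout`, and a bash build with `/dev/tcp` support; a port reported as "unreachable" may be closed, filtered, or simply timed out:

```bash
#!/usr/bin/env bash
# Report which Triton ports accept TCP connections from the host running this.
# Run it from an untrusted segment: any port reported "open" there is exposed.
# Defaults to Triton's default ports (8000 HTTP, 8001 gRPC, 8002 metrics).
# Requires coreutils `timeout` and a bash build with /dev/tcp support.
probe_triton_ports() {
  local host=$1 port
  shift
  local ports=("$@")
  (( ${#ports[@]} > 0 )) || ports=(8000 8001 8002)
  for port in "${ports[@]}"; do
    # A child bash opens fd 3 to host:port; success means the TCP handshake completed.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
      echo "${port} open"
    else
      echo "${port} unreachable"
    fi
  done
}
```

Example: `probe_triton_ports triton.internal.example` (a hypothetical hostname) prints one line per default port.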

At the time of writing, no public proof-of-concept exploit code has been published, and the vulnerability is not listed in the CISA Known Exploited Vulnerabilities catalog. See the NVIDIA support article for the vendor's technical details.

Detection Methods for CVE-2025-23326

Indicators of Compromise

Unexpected termination, restart loops, or crash logs from the tritonserver process
Anomalously large or malformed payloads sent to the Triton HTTP (default port 8000) or gRPC (default port 8001) endpoints
Spikes in 5xx errors or connection resets from inference clients
Sudden drops in inference throughput correlated with specific source IP addresses
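The 5xx spikes and per-IP correlation above can be checked from proxy logs. This sketch assumes Triton sits behind an NGINX reverse proxy writing the default `combined` log format, where the client address is the first whitespace-separated field and the status code is the ninth; the log path is an assumption:

```bash
#!/usr/bin/env bash
# Count 5xx responses per client IP from an NGINX "combined"-format access log.
# In that format, $1 is the client address and $9 is the HTTP status code.
# Reads the files given as arguments, or stdin when none are given.
count_5xx_by_ip() {
  awk '$9 ~ /^5[0-9][0-9]$/ { hits[$1]++ }
       END { for (ip in hits) print hits[ip], ip }' "$@" | sort -rn
}
```

Example: `count_5xx_by_ip /var/log/nginx/access.log | head` (hypothetical path) lists the clients with the most failed requests first. If the proxy itself sits behind another load balancer, the first field will hold that balancer's address rather than the real client.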

Detection Strategies

Monitor process supervisor logs (systemd, container orchestrator) for repeated Triton restarts
Inspect HTTP and gRPC request payloads at the API gateway for oversized or malformed tensor metadata fields
Correlate crash events with inbound source IPs to identify probing or exploitation attempts
Enable Triton verbose logging during incident response to capture the crashing request signature
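For hosts where Triton runs as a systemd service, repeated restarts can be detected with the unit's `NRestarts` property. This sketch assumes a unit named `tritonserver.service` (a hypothetical name) on systemd 235 or later; on Kubernetes, the RESTARTS column of `kubectl get pods` gives the equivalent signal:

```bash
#!/usr/bin/env bash
# Flag a Triton service that is stuck in a restart loop.
# Assumes a systemd unit named tritonserver.service (hypothetical name) on
# systemd 235 or later, where the NRestarts property counts automatic restarts.

# Succeeds (exit 0) when the restart count exceeds the threshold.
restart_loop_detected() {
  local restarts=$1 threshold=$2
  (( restarts > threshold ))
}

check_triton_restarts() {
  local unit=${1:-tritonserver.service} threshold=${2:-3} restarts
  restarts=$(systemctl show -p NRestarts --value "$unit") || return 2
  if restart_loop_detected "${restarts:-0}" "$threshold"; then
    echo "ALERT: $unit restarted $restarts times (threshold $threshold)"
  fi
}
```

The threshold logic is kept in its own function so it can be reused with restart counts taken from another supervisor or orchestrator.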

Monitoring Recommendations

Configure availability alerts on Triton health endpoints (/v2/health/live, /v2/health/ready)
Forward Triton stdout, stderr, and container runtime logs to a centralized SIEM for correlation
Track per-client request size distributions and alert on statistical outliers
Alert on repeated TCP resets or abnormal connection termination patterns against inference ports
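The health-endpoint alerting above can start from a simple poller like this sketch, run from cron or a monitoring agent. The base URL is an assumption for a local deployment, and the sketch treats any status other than 200 as a failure:

```bash
#!/usr/bin/env bash
# Poll Triton's liveness and readiness endpoints and print one line per check.
# The base URL (first argument) defaults to a local HTTP endpoint.
check_triton_health() {
  local base_url=${1:-http://localhost:8000} ep code
  for ep in live ready; do
    # curl reports 000 as the status when no HTTP response arrives.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
      "${base_url}/v2/health/${ep}") || true
    if [ "$code" = "200" ]; then
      echo "${ep}: ok"
    else
      echo "${ep}: FAIL (HTTP ${code:-000})"
    fi
  done
}
```

An alert on any "FAIL" line, combined with the timestamps of those failures, helps tie availability drops to the crash and traffic indicators listed earlier.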

How to Mitigate CVE-2025-23326

Immediate Actions Required

Apply the patched Triton Inference Server release referenced in the NVIDIA security bulletin
Inventory all Triton deployments across Windows and Linux hosts, including containerized workloads in Kubernetes
Restrict network exposure of Triton HTTP and gRPC ports to trusted application tiers only
Place a reverse proxy or API gateway in front of Triton to enforce request size limits and authentication
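For the inventory step, each instance's server-metadata endpoint (`GET /v2`) reports a version string. This sketch assumes a plain-text inventory file with one `host:port` per line (a hypothetical format). Note that the metadata reports the Triton server version, not the NGC container release tag, so NVIDIA's release notes are needed to map one to the other before comparing against the advisory:

```bash
#!/usr/bin/env bash
# Inventory helper: read Triton's server-metadata response (GET /v2) and
# extract its "version" field without depending on jq.
parse_triton_version() {
  grep -o '"version"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/.*"\([^"]*\)"$/\1/'
}

# Query every host:port listed (one per line) in the given file.
# The inventory file and its format are assumptions for this sketch.
inventory_triton_versions() {
  local target version
  while IFS= read -r target; do
    [ -n "$target" ] || continue
    version=$(curl -s --max-time 5 "http://${target}/v2" | parse_triton_version)
    echo "${target} ${version:-unknown}"
  done < "$1"
}
```

Instances that print "unknown" either did not respond or are not reachable from the inventory host, and should be checked by other means.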

Patch Information

NVIDIA has released updated versions of Triton Inference Server that remediate the integer overflow. Refer to the official NVIDIA advisory at answer ID 5687 for the fixed version numbers and download instructions. Operators should pull the patched container image from NVIDIA NGC or upgrade binary installations on affected hosts. Validate the upgrade in a staging environment before rolling to production model-serving fleets.
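During staging validation, a version comparison can confirm that an upgraded instance meets the fixed release. This sketch relies on `sort -V` (GNU coreutils version sort). The minimum fixed version is deliberately left as an input to be copied from the NVIDIA advisory, and both arguments must use the same scheme (container tag or server version):

```bash
#!/usr/bin/env bash
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort, available in GNU coreutils).
version_at_least() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# min_fixed must be taken from the fixed release listed in NVIDIA's advisory
# (answer ID 5687), using the same versioning scheme as running.
check_upgrade() {
  local running=$1 min_fixed=$2
  if version_at_least "$running" "$min_fixed"; then
    echo "OK: $running >= $min_fixed"
  else
    echo "VULNERABLE: $running < $min_fixed"
    return 1
  fi
}
```

Version sort is used instead of plain string comparison so that, for example, 2.10.0 correctly ranks above 2.9.0.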

Workarounds

Front Triton with an API gateway or reverse proxy that enforces strict input validation and maximum payload size
Apply network segmentation and firewall rules to limit access to Triton ports (defaults: 8000 HTTP, 8001 gRPC, 8002 metrics) to known clients
Run Triton under a process supervisor configured for automatic restart to shorten the service-impact window if a crash occurs; this limits downtime but does not fix the flaw
Enable rate limiting on inference endpoints to slow exploitation attempts

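```bash
# Example: restrict Triton ports to a known client subnet using iptables.
# Rules match in order, so the ACCEPT rules must come before the DROP rule.
# Ports published by Docker bypass INPUT; apply equivalent rules in DOCKER-USER.
# Keep local health checks working over loopback.
iptables -A INPUT -i lo -p tcp -m multiport --dports 8000,8001,8002 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 8000,8001,8002 \
  -s 10.20.30.0/24 -j ACCEPT
iptables -A INPUT -p tcp -m multiport --dports 8000,8001,8002 \
  -j DROP

# Example: pull the patched Triton container image.
# Replace the placeholder with the fixed release tag from the NVIDIA advisory.
TRITON_TAG="<fixed-version>-py3"
docker pull "nvcr.io/nvidia/tritonserver:${TRITON_TAG}"
```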
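The automatic-restart workaround can be sketched as a systemd drop-in. The unit name `tritonserver.service` is hypothetical, and the target directory can be overridden through `DROPIN_ROOT` for testing. `StartLimitIntervalSec=0` disables systemd's start rate limiting so repeated crashes keep being restarted, which is why it should be paired with restart-count alerting so a loop does not go unnoticed:

```bash
#!/usr/bin/env bash
# Install a systemd drop-in that restarts Triton automatically after a crash.
# Assumptions: the unit name tritonserver.service is hypothetical, and
# DROPIN_ROOT (default /etc/systemd/system) can be overridden for testing.
write_restart_dropin() {
  local unit=${1:-tritonserver.service}
  local dir="${DROPIN_ROOT:-/etc/systemd/system}/${unit}.d"
  mkdir -p "$dir" || return 1
  cat > "${dir}/restart.conf" <<'EOF'
[Unit]
# 0 disables start rate limiting, so repeated crashes keep being restarted.
StartLimitIntervalSec=0

[Service]
# Restart after a non-zero exit or a fatal signal (for example, a crash).
Restart=on-failure
RestartSec=5s
EOF
  echo "${dir}/restart.conf"
}

# After writing the drop-in, reload and restart (requires root):
#   systemctl daemon-reload && systemctl restart tritonserver.service
```

Automatic restart only reduces downtime between crashes; it does not stop a crafted request from crashing the server again, so patching remains the fix.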