CVE-2025-23335: NVIDIA Triton Inference Server DoS Flaw

CVE-2025-23335 Overview

CVE-2025-23335 affects NVIDIA Triton Inference Server on Windows and Linux when running the TensorRT backend. An attacker can trigger an integer underflow [CWE-191] by submitting a crafted model configuration combined with specific input data. Successful exploitation leads to denial of service against the inference server process. The flaw is reachable over the network without authentication or user interaction, making exposed inference endpoints a direct attack surface. NVIDIA published advisory a_id/5687 describing the issue and providing fixed versions.

Critical Impact
Unauthenticated network attackers can crash NVIDIA Triton Inference Server instances running the TensorRT backend, disrupting AI inference services on Windows and Linux deployments.

Affected Products

NVIDIA Triton Inference Server (TensorRT backend) on Linux
NVIDIA Triton Inference Server (TensorRT backend) on Microsoft Windows
Deployments exposing Triton HTTP/gRPC inference endpoints
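To see which models in a repository actually run on the TensorRT backend, a minimal sketch like the following can scan each model's config.pbtxt. It assumes the standard <repository>/<model>/config.pbtxt layout and matches the `platform: "tensorrt_plan"` and `backend: "tensorrt"` selectors. Models that rely on Triton's config auto-completion (for example a bare model.plan with no config.pbtxt) are not caught, and find_tensorrt_models is a hypothetical name.

```bash
# Sketch: print the directory of every model whose config.pbtxt selects TensorRT.
# Assumes the layout <repo>/<model>/config.pbtxt; the repository path is site-specific.
find_tensorrt_models() {
  local repo="$1" cfg
  find "$repo" -mindepth 2 -maxdepth 2 -name config.pbtxt -print |
    while IFS= read -r cfg; do
      # Match either the platform form or the backend form of the TensorRT selector.
      if grep -Eq '^[[:space:]]*(platform:[[:space:]]*"tensorrt_plan"|backend:[[:space:]]*"tensorrt")' "$cfg"; then
        dirname "$cfg"
      fi
    done
}
```

For example, `find_tensorrt_models /opt/triton/models` (an example path) lists the model directories that fall within the affected backend.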

Discovery Timeline

2025-08-06 - CVE-2025-23335 published to NVD
2025-08-12 - Last updated in NVD database

Technical Details for CVE-2025-23335

Vulnerability Analysis

The vulnerability is classified as an integer underflow [CWE-191] in the TensorRT backend of NVIDIA Triton Inference Server. Triton accepts model configurations that define tensor shapes, data types, and batching parameters. When it processes a specific combination of model configuration values and input tensor data, an unsigned integer calculation underflows: instead of going negative, the value wraps around. That underflowed value is then used in downstream memory or loop logic, producing a process-level failure that halts inference serving.
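The wraparound itself is easy to see in isolation. The sketch below is illustrative only and is not Triton code: shell integers are signed 64-bit, so the hypothetical helper u32_sub masks the result to 32 bits to mimic what a C uint32_t holds after a larger value is subtracted from a smaller one. The variable values are hypothetical.

```bash
# Illustration only: emulate 32-bit unsigned subtraction in shell arithmetic.
# Shell integers are signed 64-bit, so the result is masked to the low 32 bits,
# which is what a C uint32_t would contain after the subtraction.
u32_sub() {
  echo $(( ($1 - $2) & 0xFFFFFFFF ))
}

# Hypothetical values: a remaining-element count of 0 decremented by 1.
remaining=0
consumed=1
echo "remaining after subtraction: $(u32_sub "$remaining" "$consumed")"
# prints: remaining after subtraction: 4294967295
```

Any code that then uses 4294967295 as a length, offset, or loop bound is operating far outside the bounds its author assumed.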

Because Triton is commonly deployed as a network-facing service handling HTTP and gRPC inference requests, the underflow can be reached remotely without credentials. The impact is confined to availability: the advisory and CVSS vector indicate no loss of confidentiality or integrity.

Root Cause

The root cause is missing or insufficient bounds validation on values derived from attacker-influenced model configuration and inference inputs. When a computed size or offset is decremented past zero in unsigned arithmetic, it wraps to a very large value, breaking the assumptions of code that consumes it. Refer to the NVIDIA Support Advisory for the precise affected versions and patched releases.
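The missing check amounts to validating operands before subtracting. In this minimal sketch, a hypothetical checked_sub helper stands in for whatever size or offset arithmetic the real code performs; it refuses any subtraction whose result would be negative instead of letting it wrap.

```bash
# Sketch of a bounds guard: reject a subtraction that would drop below zero.
# checked_sub is a hypothetical helper, not a Triton identifier.
checked_sub() {
  local a="$1" b="$2" v
  # Accept only canonical non-negative decimal integers (no sign, no leading zeros).
  for v in "$a" "$b"; do
    case "$v" in
      ''|*[!0-9]*|0?*)
        echo "checked_sub: '$v' is not a canonical non-negative integer" >&2
        return 2 ;;
    esac
  done
  # Fail instead of wrapping when the subtrahend exceeds the minuend.
  if [ "$b" -gt "$a" ]; then
    echo "checked_sub: $a - $b would underflow" >&2
    return 1
  fi
  echo $(( a - b ))
}
```

`checked_sub 5 3` prints 2, while `checked_sub 3 5` prints nothing on stdout and returns 1, so the caller must handle the error rather than continue with a wrapped value.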

Attack Vector

An attacker with network reachability to a Triton inference endpoint loads or references a malicious model configuration, then submits a tailored inference request. The combination of configuration and input drives execution through the vulnerable arithmetic path. The Triton server process terminates or hangs, causing denial of service for all consumers of that endpoint. No authentication, privilege, or user interaction is required.

No public proof-of-concept exploit code is available for CVE-2025-23335. See the CVE.org record and NVD entry for authoritative technical references.

Detection Methods for CVE-2025-23335

Indicators of Compromise

Unexpected crashes, restarts, or core dumps of the tritonserver process on Linux or Windows hosts running the TensorRT backend
Spikes in failed inference requests followed by connection resets on Triton HTTP (default 8000) or gRPC (default 8001) endpoints
Model load events referencing unusual or attacker-supplied model configurations from untrusted clients
Container or systemd service restart loops correlated with inbound inference traffic
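A quick way to check captured logs for the crash indicators above is a pattern scan. The pattern list in this sketch (segmentation-fault messages, core-dump notices, a `Signal (N) received` line, the C++ runtime's `terminate called`) is an assumption about typical crash output rather than a documented Triton log format, and scan_triton_log is a hypothetical name.

```bash
# Sketch: print crash signatures (with line numbers) found in a captured Triton log.
# Returns 0 if any signature matched, 1 if none did.
# The pattern list is an assumption; tune it to what your build actually emits.
scan_triton_log() {
  local log="$1"
  grep -nE 'Segmentation fault|core dumped|Signal \([0-9]+\) received|terminate called' "$log"
}
```

Because it succeeds only when a signature is found, it can drive an alert directly, e.g. `scan_triton_log /var/log/triton/server.log && echo "possible tritonserver crash"` (the log path is an example). On hosts using systemd-coredump, `coredumpctl list tritonserver` lists recorded core dumps for the process.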

Detection Strategies

Monitor Triton server logs for abnormal termination, segmentation faults, or TensorRT backend errors immediately following inference requests
Alert on repeated process restarts of tritonserver within short time windows on managed inference hosts
Inspect inference request patterns for malformed tensor shapes or atypical model configuration uploads from unexpected source IPs
Correlate availability degradation of inference endpoints with concurrent network telemetry to identify probing sources
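The restart-frequency alert reduces to counting restart timestamps inside a time window. In this sketch the hypothetical restart_burst helper reads epoch-second timestamps on stdin, prints how many fall within the window, and succeeds when that count reaches the threshold. How you collect the timestamps (journald, container events, supervisor logs) and the 300-second, 3-restart values in the usage line are assumptions to tune.

```bash
# Sketch: detect a burst of tritonserver restarts.
# Usage: restart_burst WINDOW_SECONDS THRESHOLD NOW_EPOCH < timestamps
# Prints the number of restarts within the window; returns 0 if >= THRESHOLD.
restart_burst() {
  local window="$1" threshold="$2" now="$3" count=0 ts
  while IFS= read -r ts; do
    [ -n "$ts" ] || continue            # skip blank lines
    if [ $(( now - ts )) -le "$window" ]; then
      count=$(( count + 1 ))
    fi
  done
  echo "$count"
  [ "$count" -ge "$threshold" ]
}
```

For example, `printf '%s\n' "$t1" "$t2" "$t3" | restart_burst 300 3 "$(date +%s)" && echo "tritonserver restart burst"` alerts when three restarts land within five minutes.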

Monitoring Recommendations

Track Triton /v2/health/live and /v2/health/ready endpoint responses for sustained failures
Capture and retain stderr and stdout from Triton containers to support forensic analysis of crashes
Baseline normal model load operations and alert on unauthorized model registrations or repository changes
Forward inference gateway logs to a centralized logging platform for retention and correlation across hosts
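The liveness and readiness checks above can be scripted with curl against Triton's HTTP endpoints. This sketch assumes the default HTTP port 8000 on localhost and treats anything other than HTTP 200 from either endpoint as unhealthy; check_triton_health is a hypothetical name.

```bash
# Sketch: probe Triton's /v2/health/live and /v2/health/ready endpoints.
# Returns 0 only if both answer HTTP 200; the default base URL is an assumption.
check_triton_health() {
  local base="${1:-http://localhost:8000}" ep code
  for ep in /v2/health/live /v2/health/ready; do
    # curl prints 000 as the status code when it cannot connect.
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$base$ep") || true
    if [ "$code" != "200" ]; then
      echo "UNHEALTHY $base$ep (HTTP ${code:-none})" >&2
      return 1
    fi
  done
  echo "OK $base"
}
```

Running it on a short interval from a timer or loop, and alerting only after several consecutive failures, distinguishes a sustained outage from a single slow response.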

How to Mitigate CVE-2025-23335

Immediate Actions Required

Apply the patched Triton Inference Server release identified in the NVIDIA Support Advisory to all Windows and Linux deployments using the TensorRT backend
Restrict network exposure of Triton HTTP and gRPC ports to authenticated, trusted clients only via firewall rules or service mesh policies
Audit the model repository and remove any model configurations from untrusted or unverified sources
Place an authenticating reverse proxy or API gateway in front of Triton until patching is complete
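For the repository audit, listing every config.pbtxt with its owner and modification time, newest first, makes unexpected additions easier to spot. This sketch assumes GNU find (for `-printf`); list_model_configs is a hypothetical name.

```bash
# Sketch: list model configurations as "DATE TIME OWNER PATH", newest first.
# Requires GNU find for -printf; the repository path is site-specific.
list_model_configs() {
  local repo="$1"
  find "$repo" -name config.pbtxt -printf '%TY-%Tm-%Td %TH:%TM %u %p\n' | sort -r
}
```

For example, `list_model_configs /opt/triton/models` (an example path) puts the most recently changed configurations at the top, where entries owned by an unexpected account or changed outside a deployment window stand out.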

Patch Information

NVIDIA has released fixed versions of Triton Inference Server addressing the TensorRT backend underflow. Consult the NVIDIA Support Advisory a_id/5687 for the exact patched version numbers corresponding to your deployed branch. Rebuild container images and redeploy inference workloads to pick up the fixed binaries.
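Once you have the fixed version for your branch from the advisory, the running server's version can be checked against it. Triton's HTTP endpoint serves server metadata at /v2, including a version field. The sketch below extracts it with sed (adequate for that flat JSON object, not a general JSON parser) and compares versions with GNU `sort -V`. The function names are hypothetical, and MIN_FIXED in the usage line is a placeholder to fill in from advisory a_id/5687. NGC container releases are tagged YY.MM, while /v2 reports the server's own 2.x version number, so compare like with like.

```bash
# Sketch: compare the running Triton version with a minimum fixed version.
version_ge() {
  # True if $1 >= $2 under version-number ordering (GNU sort -V).
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

triton_server_version() {
  # GET /v2 returns server metadata JSON that includes a "version" field.
  curl -s --max-time 5 "${1:-http://localhost:8000}/v2" |
    sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Usage: set `MIN_FIXED` to the advisory's value, then run `v=$(triton_server_version); version_ge "$v" "$MIN_FIXED" || echo "Triton ${v:-unreachable} is below $MIN_FIXED"`. An empty version means the endpoint did not answer, so treat it as unknown rather than as unpatched.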

Workarounds

Disable the TensorRT backend if it is not required and serve models through an unaffected backend
Enforce strict access control on model repository write paths so only trusted operators can publish model configurations
Apply rate limiting to inference endpoints to slow repeated exploitation attempts and surface anomalies sooner
Run Triton under a process supervisor with restart-on-failure and alerting to maintain availability while patches are staged

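```bash
# Configuration example: restrict Triton exposure with iptables and enforce model repo permissions.
# 10.0.0.0/24 stands in for your trusted client subnet.
# -A appends, so these rules only take effect if no earlier INPUT rule already accepts
# the traffic. They are not persistent across reboots unless saved. If Triton runs in
# Docker with published ports, traffic traverses FORWARD (filter it in DOCKER-USER), not INPUT.
sudo iptables -A INPUT -p tcp --dport 8000 -s 10.0.0.0/24 -j ACCEPT   # HTTP
sudo iptables -A INPUT -p tcp --dport 8001 -s 10.0.0.0/24 -j ACCEPT   # gRPC
sudo iptables -A INPUT -p tcp --dport 8000 -j DROP
sudo iptables -A INPUT -p tcp --dport 8001 -j DROP

# Lock down the model repository to a trusted owner (triton-admin and triton are example names).
# Directories become 750; files become 640 unless they were already executable.
sudo chown -R triton-admin:triton /opt/triton/models
sudo chmod -R u=rwX,g=rX,o= /opt/triton/models
```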
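For the process-supervisor workaround, a systemd drop-in can add restart-on-failure behavior to an existing Triton unit. This sketch assumes Triton runs as a systemd service. The unit name tritonserver.service, the restart limits (5 starts within 600 seconds, 5 seconds between attempts), and the write_restart_dropin helper are all hypothetical and should be adjusted.

```bash
# Sketch: write a systemd drop-in that restarts Triton on failure and caps restart loops.
# "tritonserver.service" is a hypothetical unit name; pass your unit's .d directory.
write_restart_dropin() {
  local dir="${1:-/etc/systemd/system/tritonserver.service.d}"
  mkdir -p "$dir" || return 1
  cat > "$dir/restart.conf" <<'EOF'
[Unit]
# Stop retrying (leaving the unit failed, which alerting can catch)
# after 5 starts within 600 seconds.
StartLimitIntervalSec=600
StartLimitBurst=5

[Service]
Restart=on-failure
RestartSec=5
EOF
}
```

After writing the drop-in, run `sudo systemctl daemon-reload` and restart the unit. Restart=on-failure also covers termination by an unclean signal such as SIGSEGV; it does not help if the process hangs instead of exiting.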