CVE-2025-23335 Overview
CVE-2025-23335 affects NVIDIA Triton Inference Server on Windows and Linux when running the TensorRT backend. An attacker can trigger an integer underflow [CWE-191] by submitting a crafted model configuration combined with specific input data. Successful exploitation leads to denial of service against the inference server process. The flaw is reachable over the network without authentication or user interaction, making exposed inference endpoints a direct attack surface. NVIDIA published advisory a_id/5687 describing the issue and providing fixed versions.
Critical Impact
Unauthenticated network attackers can crash NVIDIA Triton Inference Server instances running the TensorRT backend, disrupting AI inference services on Windows and Linux deployments.
Affected Products
- NVIDIA Triton Inference Server (TensorRT backend) on Linux
- NVIDIA Triton Inference Server (TensorRT backend) on Microsoft Windows
- Deployments exposing Triton HTTP/gRPC inference endpoints
Discovery Timeline
- 2025-08-06 - CVE-2025-23335 published to NVD
- 2025-08-12 - Last updated in NVD database
Technical Details for CVE-2025-23335
Vulnerability Analysis
The vulnerability is classified as an integer underflow [CWE-191] in the TensorRT backend of NVIDIA Triton Inference Server. Triton accepts model configurations that define tensor shapes, data types, and batching parameters. When the backend processes a specific combination of model configuration values and input tensor data, a subtraction on an unsigned integer goes below zero and wraps around to a very large value. That wrapped value is then used in downstream size, offset, or loop logic, producing a process-level failure that halts inference serving.
Because Triton is commonly deployed as a network-facing service handling HTTP and gRPC inference requests, the underflow can be reached remotely without credentials. The impact is confined to availability; the advisory and CVSS vector indicate no confidentiality or integrity loss.
Root Cause
The root cause is missing or insufficient bounds validation on values derived from attacker-influenced model configuration and inference inputs. When a computed size or offset is decremented past zero in unsigned arithmetic, it wraps to a very large value, breaking the assumptions of code that consumes it. Refer to the NVIDIA Support Advisory for the precise affected versions and patched releases.
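To make the wraparound concrete, the bash sketch below emulates 32-bit unsigned subtraction by masking bash's signed 64-bit arithmetic. It is not Triton or TensorRT code and does not reproduce this CVE. The function names `u32_sub_unchecked` and `u32_sub_checked` and the sample values are hypothetical. The sketch contrasts an unchecked subtraction with the kind of bounds check whose absence the root cause describes.

```bash
#!/usr/bin/env bash
# Illustrative only (hypothetical helpers, not Triton code): emulate uint32_t
# subtraction. Bash integers are signed 64-bit, so mask the result to 32 bits.
UINT32_MASK=$(( 0xFFFFFFFF ))

# Unchecked: mirrors C's uint32_t a - b, which wraps when b > a.
u32_sub_unchecked() {
  local a=$1 b=$2
  echo $(( (a - b) & UINT32_MASK ))
}

# Checked: refuses to compute a - b when the result would underflow.
u32_sub_checked() {
  local a=$1 b=$2
  if (( b > a )); then
    echo "underflow: $a - $b" >&2
    return 1
  fi
  echo $(( a - b ))
}
```

With these definitions, `u32_sub_unchecked 16 24` prints 4294967288. By contrast, `u32_sub_checked 16 24` reports an error and returns 1 instead of handing a huge "size" to later code.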
Attack Vector
An attacker with network access to a Triton inference endpoint supplies or references a malicious model configuration, then submits a tailored inference request. The combination of configuration and input drives execution through the vulnerable arithmetic path. The Triton server process then terminates or hangs, denying service to every consumer of that endpoint. No authentication, privilege, or user interaction is required. The scenario does assume the attacker can influence which model configuration is loaded, for example through a writable model repository or an exposed model-loading interface.
No public proof-of-concept exploit code is available for CVE-2025-23335. See the CVE.org record and NVD entry for authoritative technical references.
Detection Methods for CVE-2025-23335
Indicators of Compromise
- Unexpected crashes, restarts, or core dumps of the tritonserver process on Linux or Windows hosts running the TensorRT backend
- Spikes in failed inference requests followed by connection resets on Triton HTTP (default 8000) or gRPC (default 8001) endpoints
- Model load events referencing unusual or attacker-supplied model configurations from untrusted clients
- Container or systemd service restart loops correlated with inbound inference traffic
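The crash-related indicators above can be screened for with a simple log filter. This is a minimal sketch. The function name `count_crash_indicators` and the regex are assumptions based on generic crash wording from glibc, shells, and the C++ runtime. They are not log strings that NVIDIA documents for this CVE, so tune them against your own tritonserver logs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count log lines that look like a tritonserver crash.
# The pattern is an assumption (generic crash wording), not NVIDIA-documented.
CRASH_PATTERN='Segmentation fault|core dumped|SIGSEGV|SIGABRT|terminate called'

count_crash_indicators() {
  # Reads log text on stdin and prints the number of matching lines.
  # grep -c prints 0 but exits 1 when nothing matches; ignore that status.
  grep -Ec "$CRASH_PATTERN" || true
}
```

For example, `docker logs triton 2>&1 | count_crash_indicators` scans a container's output. Here `triton` is a hypothetical container name.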
Detection Strategies
- Monitor Triton server logs for abnormal termination, segmentation faults, or TensorRT backend errors immediately following inference requests
- Alert on repeated process restarts of tritonserver within short time windows on managed inference hosts
- Inspect inference request patterns for malformed tensor shapes or atypical model configuration uploads from unexpected source IPs
- Correlate availability degradation of inference endpoints with concurrent network telemetry to identify probing sources
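The "repeated restarts within short time windows" check can be expressed as a sliding window over restart timestamps. The sketch below assumes you already extract one Unix epoch timestamp per restart, for example from systemd or container restart events. That extraction is environment-specific and not shown. The function name `restart_burst` and its window and threshold arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "burst" if at least THRESHOLD restarts fall within
# any WINDOW-second span, else "ok". Input: one epoch timestamp per line on stdin.
restart_burst() {
  local window=$1 threshold=$2
  sort -n | awk -v w="$window" -v t="$threshold" '
    BEGIN { w += 0; t += 0; n = 0 }
    NF { ts[++n] = $1 + 0 }          # skip blank lines, store timestamps
    END {
      start = 1
      for (i = 1; i <= n; i++) {
        # Advance the window start until it spans at most w seconds.
        while (ts[i] - ts[start] > w) start++
        if (i - start + 1 >= t) { print "burst"; exit 0 }
      }
      print "ok"
    }'
}
```

For instance, three restarts at epochs 100, 110, and 120 produce `burst` for `restart_burst 60 3`. Restarts spaced 400 seconds apart produce `ok`.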
Monitoring Recommendations
- Track Triton /v2/health/live and /v2/health/ready endpoint responses for sustained failures
- Capture and retain stdout and stderr from Triton containers for forensic analysis of crash conditions
- Baseline normal model load operations and alert on unauthorized model registrations or repository changes
- Forward inference gateway logs to a centralized logging platform for retention and correlation across hosts
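Health tracking can be scripted with curl against the `/v2/health/live` and `/v2/health/ready` endpoints named above. Both return HTTP 200 when the server is live or ready. The sketch below is a minimal example, assuming the default HTTP port 8000 on localhost. `TRITON_URL`, `classify_health`, and `probe_triton` are hypothetical names. The classification logic is kept separate from the network call so that it is easy to test.

```bash
#!/usr/bin/env bash
# Sketch: poll Triton health endpoints and classify the result.
# TRITON_URL is an assumption; 8000 is Triton's default HTTP port.
TRITON_URL=${TRITON_URL:-http://localhost:8000}

# Pure classifier: takes the HTTP status codes from /v2/health/live and /v2/health/ready.
classify_health() {
  local live=$1 ready=$2
  if [[ $live != 200 ]]; then
    echo "down"        # process not answering: possible crash or hang
  elif [[ $ready != 200 ]]; then
    echo "degraded"    # process up but not ready to serve
  else
    echo "healthy"
  fi
}

# Probe both endpoints; curl's %{http_code} is 000 when the connection fails.
probe_triton() {
  local live ready
  live=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$TRITON_URL/v2/health/live" || true)
  ready=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$TRITON_URL/v2/health/ready" || true)
  classify_health "$live" "$ready"
}
```

Run `probe_triton` on a schedule. Alert only after several consecutive non-`healthy` results, so that a single transient blip is not treated as a sustained failure.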
How to Mitigate CVE-2025-23335
Immediate Actions Required
- Apply the patched Triton Inference Server release identified in the NVIDIA Support Advisory to all Windows and Linux deployments using the TensorRT backend
- Restrict network exposure of Triton HTTP and gRPC ports to authenticated, trusted clients only via firewall rules or service mesh policies
- Audit the model repository and remove any model configurations from untrusted or unverified sources
- Place an authenticating reverse proxy or API gateway in front of Triton until patching is complete
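Part of the model repository audit can be automated by flagging model configuration files (`config.pbtxt`) that users other than the owner could modify. This is a sketch that relies on GNU find's `-perm /mode` syntax. The function name `audit_model_repo` is hypothetical, and the repository path is whatever you pass in. It checks permission bits only, not configuration contents or provenance.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list config.pbtxt files in a model repository that are
# writable by group or others (i.e. editable by non-owners).
audit_model_repo() {
  if [[ -z ${1:-} ]]; then
    echo "usage: audit_model_repo <model-repo-dir>" >&2
    return 2
  fi
  # -perm /022 (GNU find) matches if the group-write OR other-write bit is set.
  find "$1" -type f -name config.pbtxt -perm /022 -print | sort
}
```

For example, `audit_model_repo /opt/triton/models` prints one path per risky file. An empty result means no configuration file is group- or world-writable.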
Patch Information
NVIDIA has released fixed versions of Triton Inference Server addressing the TensorRT backend underflow. Consult the NVIDIA Support Advisory a_id/5687 for the exact patched version numbers corresponding to your deployed branch. Rebuild container images and redeploy inference workloads to pick up the fixed binaries.
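When checking many hosts, a version comparison helps decide which deployments still need the update. The sketch below deliberately leaves the fixed version as an argument, because the exact numbers must come from advisory a_id/5687 for your branch. The version strings in the examples are placeholders. `version_lt` and `needs_patch` are hypothetical names, and the ordering relies on GNU `sort -V`. Compare like with like: server versions against server versions, or container release tags against container release tags.

```bash
#!/usr/bin/env bash
# Sketch: compare a deployed version with the fixed version from the advisory.
# The fixed version is supplied by the caller; it is not filled in here.

# Returns 0 (true) when $1 sorts strictly before $2 in version order (GNU sort -V).
version_lt() {
  [[ $1 != "$2" ]] && [[ $(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1) == "$1" ]]
}

needs_patch() {
  local deployed=$1 fixed=$2
  if version_lt "$deployed" "$fixed"; then
    echo "vulnerable: $deployed < $fixed, upgrade required"
  else
    echo "ok: $deployed >= $fixed"
  fi
}
```

For example, with placeholder values `needs_patch 25.06 25.07` reports `vulnerable`, while `needs_patch 25.07 25.07` reports `ok`.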
Workarounds
- Disable the TensorRT backend if it is not required and serve models through an unaffected backend
- Enforce strict access control on model repository write paths so only trusted operators can publish model configurations
- Apply rate limiting to inference endpoints to slow repeated exploitation attempts and surface anomalies sooner
- Run Triton under a process supervisor with restart-on-failure and alerting to maintain availability while patches are staged
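Where systemd or a container restart policy is not available, the supervisor idea can be approximated with a small bash wrapper. This is a minimal sketch, not a replacement for a real supervisor. `supervise`, `alert`, `MAX_RESTARTS`, and `RESTART_DELAY` are hypothetical names. It restarts the command on any non-zero exit, including signal deaths, which bash reports as 128+N (139 for SIGSEGV). Each failure raises an alert, and the wrapper gives up after a bounded number of restarts.

```bash
#!/usr/bin/env bash
# Minimal restart-on-failure wrapper (hypothetical settings and alert hook).
MAX_RESTARTS=${MAX_RESTARTS:-5}
RESTART_DELAY=${RESTART_DELAY:-5}

alert() {
  # Placeholder hook: replace with your paging or log-shipping integration.
  echo "ALERT: $*" >&2
}

supervise() {
  local restarts=0 status
  while true; do
    status=0
    "$@" || status=$?              # run the command; record a failing exit code
    if (( status == 0 )); then
      return 0                     # clean exit: stop supervising
    fi
    if (( restarts >= MAX_RESTARTS )); then
      alert "$1 exited with status $status; giving up after $restarts restarts"
      return "$status"
    fi
    restarts=$((restarts + 1))
    alert "$1 exited with status $status; restart $restarts/$MAX_RESTARTS"
    sleep "$RESTART_DELAY"
  done
}
```

Usage might look like `supervise tritonserver --model-repository=/opt/triton/models`, where the repository path is an example. Bounding restarts prevents a repeatedly triggered crash from turning into a silent restart loop.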
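```bash
# Configuration example: restrict Triton exposure with iptables and enforce model repo permissions.
# 10.0.0.0/24 stands in for your trusted client subnet; use ip6tables for IPv6.
# -A appends in order: ACCEPT the trusted subnet first, then DROP everyone else.
# If an earlier INPUT rule already accepts this traffic, these DROP rules never match.
sudo iptables -A INPUT -p tcp --dport 8000 -s 10.0.0.0/24 -j ACCEPT   # HTTP
sudo iptables -A INPUT -p tcp --dport 8001 -s 10.0.0.0/24 -j ACCEPT   # gRPC
sudo iptables -A INPUT -p tcp --dport 8000 -j DROP
sudo iptables -A INPUT -p tcp --dport 8001 -j DROP

# Lock down the model repository to a trusted owner (example user, group, and path).
sudo chown -R triton-admin:triton /opt/triton/models
# X sets execute only on directories, so config files are not marked executable.
sudo chmod -R u=rwX,g=rX,o= /opt/triton/models
```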
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


