CVE-2025-23331 Overview
CVE-2025-23331 affects NVIDIA Triton Inference Server on both Windows and Linux. The flaw allows a remote, unauthenticated attacker to trigger an excessive memory allocation by sending an invalid request. The faulty allocation leads to a segmentation fault that crashes the inference service.
The weakness is classified under CWE-789: Memory Allocation with Excessive Size Value. Successful exploitation leads to denial of service against AI inference workloads. No authentication or user interaction is required, and the attack is reachable over the network.
Critical Impact
An unauthenticated network attacker can crash NVIDIA Triton Inference Server by submitting a single malformed request, disrupting AI inference availability across affected Windows and Linux deployments.
Affected Products
- NVIDIA Triton Inference Server (Linux)
- NVIDIA Triton Inference Server (Windows)
- All deployments on Linux and Microsoft Windows host operating systems
Discovery Timeline
- 2025-08-06 - CVE-2025-23331 published to NVD
- 2025-08-12 - Last updated in the NVD database
Technical Details for CVE-2025-23331
Vulnerability Analysis
NVIDIA Triton Inference Server exposes HTTP and gRPC endpoints that accept inference requests describing tensors, shapes, and data payloads. The vulnerability stems from insufficient validation of size-related fields supplied by the client.
When a malformed request specifies an oversized length or shape value, Triton attempts to allocate a buffer matching the attacker-supplied size. The allocation either fails or produces an invalid pointer that is later dereferenced. The process terminates with a segmentation fault, and the inference server becomes unavailable until restarted.
The impact is limited to availability. Confidentiality and integrity are not affected, since no attacker-controlled code executes and no memory contents are returned. Production AI workloads relying on Triton for model serving lose access to inference capacity for the duration of the outage.
Root Cause
The root cause is missing or insufficient bounds checking on size parameters extracted from inbound requests, mapped to [CWE-789]. The server trusts client-supplied size values when reserving memory for inference inputs, and no upper-bound check rejects values that exceed available system memory or sane request sizes.
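To illustrate the missing check, the sketch below validates a declared tensor's required buffer size against a hard cap before any allocation occurs. This is a hypothetical illustration of the CWE-789 pattern, not Triton's actual internals; the function names, dtype table, and 512 MiB cap are all assumptions.

```python
# Hypothetical sketch of the missing bounds check (CWE-789): compute the
# memory a declared tensor would require and reject it before allocating.
# Names and limits are illustrative, not Triton's real implementation.
import math

DTYPE_SIZES = {"FP32": 4, "FP16": 2, "INT64": 8, "UINT8": 1}  # bytes per element
MAX_INPUT_BYTES = 512 * 1024 * 1024  # illustrative per-input cap: 512 MiB

def required_bytes(shape, datatype):
    """Compute the buffer size a declared tensor shape would need."""
    if any(d < 0 for d in shape):
        raise ValueError(f"negative dimension in shape {shape}")
    return math.prod(shape) * DTYPE_SIZES[datatype]

def validate_input(shape, datatype):
    """Reject attacker-controlled sizes before any allocation happens."""
    size = required_bytes(shape, datatype)
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"declared input of {size} bytes exceeds cap")
    return size

# A plausible image-batch input passes; an absurd declared shape is rejected.
print(validate_input([1, 3, 224, 224], "FP32"))  # → 602112 (~588 KiB)
```

The key point is that the size check is derived from the declared shape and dtype, so the decision is made before the attacker-supplied value ever reaches an allocator.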
Attack Vector
The attack vector is network-based and requires no privileges or user interaction. An attacker sends a single crafted request to a Triton HTTP or gRPC endpoint. The malformed payload contains a size field large enough to trigger the faulty allocation path. The Triton process crashes, producing denial of service. Repeated requests against an auto-restarting deployment sustain the outage.
Refer to the NVIDIA Security Bulletin for vendor-supplied technical details and fixed versions.
Detection Methods for CVE-2025-23331
Indicators of Compromise
- Unexpected SIGSEGV terminations of the tritonserver process recorded in system or container logs
- Crash loops in Kubernetes pods or systemd units hosting Triton, with restart counters incrementing rapidly
- Inbound HTTP or gRPC requests to Triton inference endpoints containing abnormally large shape, byte_size, or content-length values
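The first indicator above, repeated SIGSEGV terminations, can be hunted with a simple sliding-window count over host or container logs. This is a minimal sketch; the log line format and regex are assumptions to adapt to your journald or container runtime output.

```python
# Minimal sketch: flag a crash loop by counting SIGSEGV terminations of
# tritonserver within a sliding time window. The log format and regex are
# assumptions; adapt them to your journald/container log output.
import re
from datetime import datetime, timedelta

SEGV_RE = re.compile(r"^(?P<ts>\S+) .*tritonserver.*(SIGSEGV|signal 11)")

def crash_loop(log_lines, threshold=3, window=timedelta(minutes=5)):
    """Return True if `threshold` SIGSEGV events fall inside `window`."""
    times = []
    for line in log_lines:
        m = SEGV_RE.match(line)
        if m:
            times.append(datetime.fromisoformat(m.group("ts")))
    times.sort()
    for i in range(len(times) - threshold + 1):
        if times[i + threshold - 1] - times[i] <= window:
            return True
    return False

logs = [
    "2025-08-12T10:00:01 host kernel: tritonserver[412]: segfault, SIGSEGV",
    "2025-08-12T10:01:10 host kernel: tritonserver[518]: segfault, SIGSEGV",
    "2025-08-12T10:02:30 host kernel: tritonserver[611]: segfault, SIGSEGV",
]
print(crash_loop(logs))  # → True
```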
Detection Strategies
- Monitor process exit codes and core dumps for the Triton server binary and correlate with inbound request timestamps
- Inspect reverse proxy or service mesh logs for inference requests with payload size fields exceeding expected model input dimensions
- Alert on sudden drops in inference request success rates combined with elevated 5xx responses from Triton endpoints
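The second strategy above, inspecting proxy logs for payload sizes beyond expected model input dimensions, can be sketched as a filter over structured access-log entries. The JSON log schema here is hypothetical; substitute the field names your proxy or service mesh actually emits.

```python
# Sketch of the log-inspection strategy: flag inference requests whose
# declared body size vastly exceeds the largest expected model input.
# The JSON access-log schema is a hypothetical example.
import json

EXPECTED_MAX_BODY = 16 * 1024 * 1024  # align with the gateway limit (16 MiB)

def suspicious_requests(access_log_lines):
    """Return log entries for inference calls with oversized bodies."""
    hits = []
    for line in access_log_lines:
        entry = json.loads(line)
        if ("/infer" in entry.get("path", "")
                and entry.get("content_length", 0) > EXPECTED_MAX_BODY):
            hits.append(entry)
    return hits

log = [
    '{"path": "/v2/models/resnet50/infer", "content_length": 602112, "status": 200}',
    '{"path": "/v2/models/resnet50/infer", "content_length": 9999999999, "status": 500}',
]
for e in suspicious_requests(log):
    print(e["path"], e["content_length"])
```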
Monitoring Recommendations
- Forward Triton stdout, stderr, and crash logs to a centralized SIEM for correlation with network telemetry
- Track memory allocation spikes on Triton hosts using node-level metrics and trigger alerts on near-OOM conditions
- Apply rate limiting and request size validation at an upstream gateway and log requests that exceed defined thresholds
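The rate-limiting recommendation above can be expressed in the same NGINX gateway that enforces size limits. The zone name, rate, and burst values below are illustrative and should be tuned to your expected inference traffic.

```nginx
# Example: rate-limit inference requests at an NGINX gateway (illustrative values)
limit_req_zone $binary_remote_addr zone=triton_infer:10m rate=20r/s;

server {
    listen 8000;
    client_max_body_size 16m;

    location / {
        limit_req zone=triton_infer burst=40 nodelay;
        proxy_pass http://triton_backend:8000;
    }
}
```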
How to Mitigate CVE-2025-23331
Immediate Actions Required
- Upgrade NVIDIA Triton Inference Server to the fixed version listed in the NVIDIA Support Answer 5687
- Restrict network exposure of Triton HTTP and gRPC ports to trusted clients only, using network policies, firewalls, or service mesh authorization
- Place an authenticating reverse proxy or API gateway in front of Triton endpoints to filter malformed requests
Patch Information
NVIDIA has published a security bulletin for this issue. Administrators should consult the NVIDIA Security Bulletin for Triton Inference Server to obtain the fixed release version and apply the upgrade. The official record is available on the NVD CVE-2025-23331 detail page and at CVE.org.
Workarounds
- Enforce maximum request size limits at an upstream proxy to reject oversized inference payloads before they reach Triton
- Run Triton inside a container or systemd unit with strict memory limits so a single crash does not exhaust host resources
- Require client authentication on Triton endpoints and isolate the service on a dedicated internal network segment
# Example: enforce request size limit in an NGINX reverse proxy fronting Triton
server {
    listen 8000;
    client_max_body_size 16m;

    location / {
        proxy_pass http://triton_backend:8000;
        proxy_read_timeout 60s;
    }
}
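The container/systemd memory-limit workaround can be captured in a systemd drop-in. The unit path, MemoryMax value, and restart backoff below are illustrative and should be sized to the host.

```ini
# /etc/systemd/system/tritonserver.service.d/limits.conf (illustrative)
[Service]
# Cap the process so a runaway allocation is killed before exhausting the host
MemoryMax=16G
# Restart after a crash, but back off so repeated crashes cannot thrash the host
Restart=on-failure
RestartSec=10
```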
SentinelOne customers can leverage Singularity Platform behavioral AI to detect anomalous process terminations and crash patterns on Triton hosts. Singularity Data Lake supports OCSF-normalized ingestion of Triton, proxy, and host logs for correlation and alerting on the indicators described above.


