CVE-2025-65890: OneFlow v0.9.0 DoS Vulnerability

CVE-2025-65890 Overview

A device-ID validation flaw in OneFlow v0.9.0 allows attackers to cause a Denial of Service (DoS) by calling flow.cuda.synchronize() with an invalid or out-of-range GPU device index. This vulnerability (CWE-400: Uncontrolled Resource Consumption) enables remote attackers to disrupt service availability without requiring authentication or user interaction.

Critical Impact
Unauthenticated attackers can remotely crash OneFlow-based applications by exploiting improper validation of GPU device indices, leading to service disruption for machine learning workloads.

Affected Products

OneFlow v0.9.0
Applications utilizing OneFlow CUDA synchronization functions
Systems running OneFlow with GPU acceleration enabled

Discovery Timeline

2026-01-28 - CVE-2025-65890 published to NVD
2026-01-29 - Last updated in NVD database

Technical Details for CVE-2025-65890

Vulnerability Analysis

This vulnerability stems from insufficient input validation in OneFlow's CUDA synchronization functionality. The flow.cuda.synchronize() function accepts a device index parameter that specifies which GPU device to synchronize. When an attacker supplies an invalid or out-of-range GPU device index, the function fails to properly validate this input before attempting to interact with the CUDA runtime.

Supplying such an index forces the application into an error state in which it crashes or becomes unresponsive; the advisory classifies this as uncontrolled resource consumption (CWE-400). Where an application passes a client-supplied device index to this function, the flaw is reachable over the network and requires no privileges or user interaction, which presents a significant risk to systems exposing OneFlow-based services.

Root Cause

The root cause is a missing bounds check in the device index handling logic of the flow.cuda.synchronize() function, tracked under CWE-400. The function does not verify that the provided GPU device index corresponds to an actual, available GPU device on the system before attempting synchronization operations. As a result, attackers can specify arbitrary device indices, including negative values and values equal to or greater than the number of available GPUs.
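The missing bounds check described above might look roughly like the following sketch. It is not OneFlow's code or a proposed patch: checked_synchronize is a hypothetical helper, and the device-count and synchronize functions are passed in as parameters so the sketch makes no claims about OneFlow internals.

```python
from typing import Callable


def checked_synchronize(
    device_index: int,
    device_count: Callable[[], int],
    synchronize: Callable[[int], None],
) -> None:
    """Synchronize a GPU only after confirming the index is in range.

    Hypothetical helper: `device_count` and `synchronize` are injected so the
    check can be exercised without a GPU or a OneFlow installation.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(device_index, bool) or not isinstance(device_index, int):
        raise TypeError("device index must be an int")

    available = device_count()
    # Valid indices are 0 .. available - 1; negatives are rejected outright.
    if not 0 <= device_index < available:
        raise ValueError(
            f"device index {device_index} is out of range "
            f"for {available} device(s)"
        )
    synchronize(device_index)
```

In an application, the two callables would presumably be flow.cuda.device_count and flow.cuda.synchronize, assuming flow.cuda.device_count is available and reports the number of visible GPUs. Raising an ordinary Python exception lets the caller reject the request instead of reaching the CUDA runtime with an invalid index.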

Attack Vector

The attack vector is network-based, requiring no authentication or user interaction. An attacker can exploit this vulnerability by:

Identifying a target system running an application built on OneFlow v0.9.0 with exposed API endpoints
Crafting a malicious request that invokes flow.cuda.synchronize() with an invalid device index (e.g., a negative number or an index exceeding the available GPU count)
Sending the request to trigger the DoS condition, causing the application to crash or become unresponsive

The vulnerability is particularly concerning in production machine learning inference environments where OneFlow handles GPU workloads and service availability is critical.

For technical implementation details, refer to OneFlow Issue #10662, which documents the device-ID validation flaw.

Detection Methods for CVE-2025-65890

Indicators of Compromise

Unexpected application crashes in OneFlow-based services, particularly those involving CUDA operations
Abnormal error messages related to invalid GPU device indices in application logs
Repeated calls to flow.cuda.synchronize() with unusual or out-of-range device parameters
Service availability issues correlating with incoming API requests targeting CUDA functions

Detection Strategies

Monitor application logs for CUDA-related errors involving device index validation failures
Implement anomaly detection for API calls to CUDA synchronization functions with unusual device parameters
Deploy intrusion detection rules to flag requests containing negative or excessively large device index values
Use application performance monitoring to detect sudden service degradation patterns
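As a rough illustration of the log-monitoring strategy, the sketch below scans log lines for two kinds of indicator. Both patterns are assumptions: "invalid device ordinal" is the CUDA runtime's message for cudaErrorInvalidDevice, but the source does not confirm that OneFlow v0.9.0 writes it to application logs, and the device_index=... form is a hypothetical application log format.

```python
import re
from typing import Iterable, List

# Assumed indicators, to be adapted to the log format actually in use.
SUSPICIOUS_PATTERNS = [
    # CUDA runtime error string for cudaErrorInvalidDevice.
    re.compile(r"invalid device ordinal", re.IGNORECASE),
    # Hypothetical request log field: negative or 3+ digit device index.
    re.compile(r"device[_ ]index\s*[=:]\s*(-\d+|\d{3,})", re.IGNORECASE),
]


def find_suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that match any of the assumed indicators."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS)
    ]
```

The three-digit threshold is an arbitrary cut-off chosen for the sketch; a deployment that knows its GPU count can flag any index at or above that count instead.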

Monitoring Recommendations

Configure alerting for repeated CUDA synchronization failures in OneFlow applications
Establish baseline metrics for normal GPU device access patterns to identify anomalous behavior
Review access logs for API endpoints that expose CUDA synchronization functionality
Implement rate limiting and input validation at the application boundary for GPU-related operations
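Alerting on repeated synchronization failures can be approximated with a sliding-window counter, sketched below. FailureWindow is a hypothetical class, and the threshold and window length are placeholders to be tuned against the baseline metrics mentioned above.

```python
import time
from collections import deque
from typing import Deque, Optional


class FailureWindow:
    """Count failures inside a sliding time window (hypothetical helper)."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events: Deque[float] = deque()

    def record_failure(self, now: Optional[float] = None) -> bool:
        """Record one failure; return True once the threshold is reached."""
        now = time.monotonic() if now is None else now
        self._events.append(now)
        # Discard failures that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0] < cutoff:
            self._events.popleft()
        return len(self._events) >= self.threshold
```

A service could call record_failure() from the error handler that catches a failed synchronization and raise an alert, or start throttling the client, when it returns True.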

How to Mitigate CVE-2025-65890

Immediate Actions Required

Review all OneFlow v0.9.0 deployments and identify systems exposed to network access
Implement input validation at the application layer to sanitize GPU device index parameters before passing to OneFlow
Restrict network access to OneFlow-based services to trusted sources only
Monitor for patches or updates from the OneFlow development team via the OneFlow GitHub Repository

Patch Information

No official patch has been confirmed at the time of this publication. Organizations should monitor the OneFlow GitHub Repository and OneFlow Issue #10662 for updates regarding security fixes. Contact the OneFlow development team for guidance on remediation timelines.

Workarounds

Implement application-level validation to ensure GPU device indices are within the valid range (0 to N-1, where N is the number of available GPUs)
Use network segmentation to limit exposure of OneFlow-based services to trusted internal networks only
Deploy a reverse proxy or API gateway to filter and validate incoming requests before they reach the OneFlow application
Consider containerization with resource limits to contain the impact of potential DoS attacks

```bash
# Example: enumerate available GPUs to determine the valid index range,
# then use that range when implementing input validation.
nvidia-smi --list-gpus
# Example output: GPU 0: NVIDIA A100 (UUID: GPU-xxxx)
# Valid device indices are 0 to (GPU_COUNT - 1)
```
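To feed the enumeration above into application-level validation, the output can be parsed into a list of indices. The sketch below assumes the "GPU N: name (UUID: ...)" line format shown in the example output; valid_device_indices is a hypothetical helper. The indices reported by nvidia-smi may not match the ordinals the CUDA runtime exposes to a process, for example when CUDA_VISIBLE_DEVICES or CUDA_DEVICE_ORDER is set, so the result should be treated as an upper bound to verify rather than as ground truth.

```python
import re
from typing import List

# Matches lines such as "GPU 0: NVIDIA A100 (UUID: GPU-xxxx)".
_GPU_LINE = re.compile(r"^GPU (\d+):")


def valid_device_indices(list_gpus_output: str) -> List[int]:
    """Extract GPU indices from `nvidia-smi --list-gpus` style output."""
    indices = []
    for line in list_gpus_output.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)
```

The input string would typically come from running the command through subprocess.run with capture_output=True and text=True, and a request-handling layer could then reject any device index not present in the returned list.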