CVE-2025-65890 Overview
A device-ID validation flaw in OneFlow v0.9.0 allows attackers to cause a Denial of Service (DoS) by calling flow.cuda.synchronize() with an invalid or out-of-range GPU device index. This vulnerability (CWE-400: Uncontrolled Resource Consumption) enables remote attackers to disrupt service availability without requiring authentication or user interaction.
Critical Impact
Unauthenticated attackers can remotely crash OneFlow-based applications by exploiting improper validation of GPU device indices, leading to service disruption for machine learning workloads.
Affected Products
- OneFlow v0.9.0
- Applications utilizing OneFlow CUDA synchronization functions
- Systems running OneFlow with GPU acceleration enabled
Discovery Timeline
- 2026-01-28 - CVE-2025-65890 published to NVD
- 2026-01-29 - Last updated in NVD database
Technical Details for CVE-2025-65890
Vulnerability Analysis
This vulnerability stems from insufficient input validation in OneFlow's CUDA synchronization functionality. The flow.cuda.synchronize() function accepts a device index parameter that specifies which GPU device to synchronize. When an attacker supplies an invalid or out-of-range GPU device index, the function fails to properly validate this input before attempting to interact with the CUDA runtime.
An out-of-range index forces the application into an error state in which it crashes or stops responding, which is why the issue is tracked as uncontrolled resource consumption. The CVE is rated as exploitable over the network with no privileges or user interaction, so the risk is greatest for services that pass externally supplied device indices to OneFlow.
Root Cause
The root cause is missing input validation in the device index handling logic of flow.cuda.synchronize(); the CVE is classified under CWE-400 (Uncontrolled Resource Consumption). The function does not verify that the provided GPU device index corresponds to an available GPU on the system before attempting synchronization. Without this bounds check, a caller can supply negative values or indices that exceed the number of available GPUs.
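The missing bounds check can be illustrated with a minimal sketch. The helper name validate_device_index and its device_count parameter are hypothetical and are not part of OneFlow; the sketch only shows the kind of check the root cause description says is absent, not how OneFlow does or will implement it.

```python
def validate_device_index(index, device_count):
    """Return index unchanged if it names one of device_count GPUs, else raise.

    Hypothetical helper for illustration; not a OneFlow API.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(index, bool) or not isinstance(index, int):
        raise TypeError(f"device index must be an int, got {type(index).__name__}")
    if device_count <= 0:
        raise ValueError("no GPU devices are available")
    # Reject negative values and indices at or beyond the device count.
    if not 0 <= index < device_count:
        raise ValueError(
            f"device index {index} is outside the valid range 0..{device_count - 1}"
        )
    return index
```

A caller would run this check on any externally supplied index and only pass the returned value on to the synchronization call.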
Attack Vector
The attack vector is network-based, requiring no authentication or user interaction. An attacker can exploit this vulnerability by:
- Identifying a target system running an application built on OneFlow v0.9.0 with exposed API endpoints
- Crafting a malicious request that invokes flow.cuda.synchronize() with an invalid device index (e.g., a negative number or an index exceeding the available GPU count)
- Sending the request to trigger the DoS condition, causing the application to crash or become unresponsive
The vulnerability is particularly concerning in production machine learning inference environments where OneFlow handles GPU workloads and service availability is critical.
For technical implementation details regarding this vulnerability, refer to OneFlow Issue #10662, which documents the device-ID validation flaw.
Detection Methods for CVE-2025-65890
Indicators of Compromise
- Unexpected application crashes in OneFlow-based services, particularly those involving CUDA operations
- Abnormal error messages related to invalid GPU device indices in application logs
- Repeated calls to flow.cuda.synchronize() with unusual or out-of-range device parameters
- Service availability issues correlating with incoming API requests targeting CUDA functions
Detection Strategies
- Monitor application logs for CUDA-related errors involving device index validation failures
- Implement anomaly detection for API calls to CUDA synchronization functions with unusual device parameters
- Deploy intrusion detection rules to flag requests containing negative or excessively large device index values
- Use application performance monitoring to detect sudden service degradation patterns
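As a rough sketch of the log monitoring strategy, the following scans log lines for messages about invalid device indices. The regular expression is an assumption: "invalid device ordinal" is the CUDA runtime's wording for an invalid device error, but the exact text a OneFlow application writes to its logs is not documented in this advisory, so the pattern should be adjusted to match real log output.

```python
import re

# Assumed message pattern; adjust it to whatever your application actually logs.
INVALID_DEVICE_PATTERN = re.compile(
    r"invalid device (?:ordinal|index|id)", re.IGNORECASE
)


def find_invalid_device_errors(log_lines):
    """Return (line_number, text) pairs for lines that match the pattern."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if INVALID_DEVICE_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

The function accepts any iterable of strings, so an open log file can be passed directly.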
Monitoring Recommendations
- Configure alerting for repeated CUDA synchronization failures in OneFlow applications
- Establish baseline metrics for normal GPU device access patterns to identify anomalous behavior
- Review access logs for API endpoints that expose CUDA synchronization functionality
- Implement rate limiting and input validation at the application boundary for GPU-related operations
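Alerting on repeated synchronization failures could be sketched with a sliding time window, as below. The class name FailureRateAlert and the threshold and window values are hypothetical; the sketch assumes failures are reported with non-decreasing timestamps in seconds.

```python
from collections import deque


class FailureRateAlert:
    """Signal when too many failures occur within a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_failure(self, timestamp):
        """Record one failure; return True if the alert threshold is reached."""
        self._timestamps.append(timestamp)
        # Drop failures that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold
```

For example, FailureRateAlert(threshold=3, window_seconds=60) would signal on the third failure recorded within one minute; in practice the timestamp would usually come from time.monotonic().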
How to Mitigate CVE-2025-65890
Immediate Actions Required
- Review all OneFlow v0.9.0 deployments and identify systems exposed to network access
- Implement input validation at the application layer to sanitize GPU device index parameters before passing them to OneFlow
- Restrict network access to OneFlow-based services to trusted sources only
- Monitor for patches or updates from the OneFlow development team via the OneFlow GitHub Repository
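To help with the deployment review, a small script can report whether a Python environment has the affected release installed. This is a sketch under assumptions: the distribution name "oneflow" and the handling of a local build tag such as "+cu118" are assumptions about how the package is installed, and the check covers only v0.9.0 because that is the only version this advisory names.

```python
from importlib import metadata

AFFECTED_VERSION = "0.9.0"  # the only version named in the advisory


def installed_version(package="oneflow"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_affected(version):
    """Return True if the version string refers to the 0.9.0 release."""
    if version is None:
        return False
    # Drop a local build tag (the part after "+") before comparing.
    return version.split("+", 1)[0] == AFFECTED_VERSION


if __name__ == "__main__":
    found = installed_version()
    print(f"oneflow version: {found}, affected: {is_affected(found)}")
```

A result of False does not mean a version is safe; it only means the installed version is not the one listed under Affected Products.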
Patch Information
No official patch has been confirmed at the time of this publication. Organizations should monitor the OneFlow GitHub Repository and OneFlow Issue #10662 for updates regarding security fixes. Contact the OneFlow development team for guidance on remediation timelines.
Workarounds
- Implement application-level validation to ensure GPU device indices are within the valid range (0 to N-1, where N is the number of available GPUs)
- Use network segmentation to limit exposure of OneFlow-based services to trusted internal networks only
- Deploy a reverse proxy or API gateway to filter and validate incoming requests before they reach the OneFlow application
- Consider containerization with resource limits to contain the impact of potential DoS attacks
# Example: Enumerate available CUDA devices to determine valid index range
# Use this information to implement input validation
nvidia-smi --list-gpus
# Output example: GPU 0: NVIDIA A100 (UUID: GPU-xxxx)
# Valid device indices would be 0 to (GPU_COUNT - 1)
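The listing produced by the command above can be turned into an allow-list for validation, as in the sketch below. The function names are hypothetical, and the parser assumes each GPU line starts with "GPU <n>:" as in the sample output. Note that indices reported by nvidia-smi may not match the ordinals the CUDA runtime exposes to a process, for example when CUDA_VISIBLE_DEVICES is set, so treat the result as a starting point rather than an authoritative range.

```python
import re

# Matches lines such as "GPU 0: NVIDIA A100 (UUID: GPU-xxxx)".
GPU_LINE = re.compile(r"^GPU (\d+):")


def gpu_indices_from_listing(listing):
    """Extract GPU indices from `nvidia-smi --list-gpus` style output."""
    indices = []
    for line in listing.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            indices.append(int(match.group(1)))
    return indices


def is_valid_device(index, listing):
    """Return True only if index is an int that appears in the listing."""
    if isinstance(index, bool) or not isinstance(index, int):
        return False
    return index in gpu_indices_from_listing(listing)
```

The listing text could be captured with subprocess.run(["nvidia-smi", "--list-gpus"], capture_output=True, text=True, check=True).stdout on a host where the tool is installed.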
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.

