CVE-2025-70999: OneFlow CUDA DoS Vulnerability

CVE-2025-70999 Overview

A GPU device-ID validation flaw exists in the flow.cuda.get_device_capability() function of OneFlow v0.9.0. An attacker who can supply a crafted device ID can bypass input validation checks and cause a Denial of Service (DoS) condition. The flaw stems from improper validation of user-supplied device identifiers before they are passed to the underlying CUDA calls.

Critical Impact
Attackers can remotely trigger service disruption by exploiting improper input validation in GPU device handling, potentially causing system crashes or resource exhaustion in machine learning infrastructure.

Affected Products

OneFlow v0.9.0
Systems that call flow.cuda.get_device_capability()
Machine learning pipelines dependent on OneFlow CUDA operations

Discovery Timeline

2026-01-28 - CVE-2025-70999 published to NVD
2026-01-29 - Last updated in NVD database

Technical Details for CVE-2025-70999

Vulnerability Analysis

This vulnerability is classified under CWE-400 (Uncontrolled Resource Consumption), indicating that the flaw allows attackers to consume excessive resources through malicious input. The flow.cuda.get_device_capability() function fails to properly validate device ID parameters before attempting to query GPU capabilities, allowing specially crafted values to trigger unexpected behavior.

The vulnerability is described as exploitable over the network without authentication or user interaction. In practice, this applies wherever a remote caller can influence the device ID that an application passes to the function. When a malicious device ID is passed, the application can enter an error state, consume excessive resources, or crash entirely, denying service to legitimate users.

Root Cause

The root cause lies in insufficient input validation within the flow.cuda.get_device_capability() function. The code does not properly verify that the provided device ID corresponds to a valid, accessible GPU device before attempting operations. This allows out-of-range or malformed device IDs to be processed, triggering error conditions that lead to service disruption.

According to GitHub Issue #10660, the vulnerability was identified through boundary testing of the device ID parameter, revealing that negative values, excessively large integers, or specially formatted inputs can bypass existing validation logic.
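The boundary testing described above can be approximated with a small harness that calls the function once per candidate value, each time in a separate Python process, and reports how that process ended. This is a minimal sketch, not the reproduction from Issue #10660: the helper names `classify_exit` and `probe_device_id`, the `python3` interpreter (overridable through a hypothetical `PYTHON` variable), the 30-second limit, and the use of GNU `timeout` are all assumptions. It relies on standard shell conventions: status 124 means `timeout` stopped the command, a status above 128 means the child was killed by signal (status - 128), for example 139 for SIGSEGV, and an uncaught Python exception exits with status 1.

```bash
#!/usr/bin/env bash
# Hypothetical boundary-test harness for flow.cuda.get_device_capability().
# Each probe runs in its own Python process, so a native crash in the child
# cannot take the harness down with it.

# Map a child exit status to a coarse outcome label.
classify_exit() {
  local status=$1
  if (( status == 0 )); then
    echo "ok"                          # call returned normally
  elif (( status == 124 )); then
    echo "timeout"                     # GNU timeout stopped it: possible hang
  elif (( status > 128 )); then
    echo "signal $(( status - 128 ))"  # killed by a signal, e.g. 11 = SIGSEGV
  else
    echo "error"                       # e.g. an uncaught Python exception
  fi
}

# Probe one candidate ID with a 30-second limit; prints "<id> -> <outcome>".
probe_device_id() {
  local id=$1 status=0
  timeout 30 "${PYTHON:-python3}" -c \
    'import sys, oneflow as flow; print(flow.cuda.get_device_capability(int(sys.argv[1])))' \
    "$id" >/dev/null 2>&1 || status=$?
  echo "$id -> $(classify_exit "$status")"
}
```

On a GPU host, a loop such as `for id in -1 0 1 65535 2147483648; do probe_device_id "$id"; done` would show which values end in a clean error and which crash or hang the process. Because each value goes through `int()` before the call, non-numeric inputs stop at that conversion and would need a separate wrapper to test.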

Attack Vector

The attack can be conducted remotely over the network by any unauthenticated user who can interact with OneFlow's CUDA functions. The attacker crafts a malicious device ID value and passes it to the flow.cuda.get_device_capability() function. Since no special privileges or user interaction are required, the attack has a low barrier to entry.

Exploitation consists of passing an invalid device identifier to the CUDA capability query function. When the function receives a device ID that does not correspond to a valid GPU or falls outside the expected range, it fails to handle the error gracefully, leading to resource exhaustion or an application crash. Technical details and the original vulnerability report are in GitHub Issue #10660 in the OneFlow repository.

Detection Methods for CVE-2025-70999

Indicators of Compromise

Unexpected crashes or restarts of OneFlow-based applications
Unusual error messages related to CUDA device queries in application logs
Spike in failed GPU device capability requests
Application memory or CPU usage anomalies during CUDA operations

Detection Strategies

Monitor application logs for repeated errors from flow.cuda.get_device_capability() calls
Implement input validation logging to capture malformed device ID attempts
Set up alerts for abnormal patterns in CUDA API call failures
Deploy network-level monitoring for suspicious requests targeting ML endpoints
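The log-based strategies above can start from something as simple as counting error lines that mention the affected function. In this sketch the helper names, the keyword list, and the default threshold of 20 are assumptions, and the pattern must be adapted to whatever your application actually logs; a multi-line Python traceback, for instance, will not match because the function name and the error message sit on different lines.

```bash
#!/usr/bin/env bash
# Hypothetical log check: flag a file with too many failed capability queries.
# The keyword list and default threshold are assumptions; tune to your logs.

# Count lines that mention the function AND an error-like keyword.
count_capability_errors() {
  local logfile=$1
  grep -i 'get_device_capability' "$logfile" 2>/dev/null \
    | grep -ciE 'error|exception|invalid' || true
}

# Print a status line; return 1 when the count reaches the threshold.
check_capability_errors() {
  local logfile=$1 threshold=${2:-20} count
  count=$(count_capability_errors "$logfile")
  if (( count >= threshold )); then
    echo "ALERT: $count get_device_capability errors in $logfile (threshold $threshold)"
    return 1
  fi
  echo "ok: $count get_device_capability errors in $logfile"
}
```

Run periodically against the current log file (for example `check_capability_errors /var/log/myapp/app.log 50`, where the path is a placeholder), a non-zero exit can be routed to whatever alerting the deployment already uses.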

Monitoring Recommendations

Enable verbose logging for OneFlow CUDA operations in production environments
Configure resource utilization alerts for systems running OneFlow workloads
Implement rate limiting on APIs that expose CUDA functionality
Regularly audit device ID parameters in application request logs
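For the resource utilization alerts above, a host without a dedicated monitoring stack could begin with a periodic check of a service process's resident memory. This minimal sketch assumes Linux (it reads `VmRSS` from `/proc/<pid>/status`), a known PID for the OneFlow process, and an illustrative limit in MiB; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical memory alert for a OneFlow service process (Linux /proc assumed).

# Print the resident set size of a PID in MiB; fail if it cannot be read.
rss_mib() {
  local kib
  kib=$(awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null) || return 1
  [[ -n "$kib" ]] || return 1   # e.g. kernel threads report no VmRSS
  echo $(( kib / 1024 ))
}

# Exit status: 0 = within limit, 1 = over limit, 2 = process not readable.
check_rss() {
  local pid=$1 limit_mib=$2 mib
  if ! mib=$(rss_mib "$pid"); then
    echo "ALERT: cannot read memory usage for PID $pid (not running?)"
    return 2
  fi
  if (( mib > limit_mib )); then
    echo "ALERT: PID $pid uses ${mib} MiB, limit ${limit_mib} MiB"
    return 1
  fi
  echo "ok: PID $pid uses ${mib} MiB"
}
```

Called from cron or a systemd timer, for example `check_rss 12345 32768` with 12345 standing in for the service's PID, the distinct exit codes separate "over the limit" from "process gone", which after a DoS-induced crash is often the more important signal.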

How to Mitigate CVE-2025-70999

Immediate Actions Required

Review all code paths that call flow.cuda.get_device_capability() and implement input validation
Restrict network access to OneFlow services where possible
Implement input sanitization for any user-controllable device ID parameters
Consider deploying a Web Application Firewall (WAF) to filter malicious requests
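For the first action, a recursive search is a quick way to build the list of call sites to review. A minimal sketch, assuming Python sources under one directory; the helper name is hypothetical, and a plain text search will miss indirect references such as the function bound to another name, so treat the output as a starting list rather than a complete audit.

```bash
#!/usr/bin/env bash
# List every line in *.py files under a directory that calls get_device_capability.
# Output format is grep's "path:line:source"; exit status 1 means no matches.
find_capability_calls() {
  local root=${1:-.}
  grep -rnE --include='*.py' 'get_device_capability[[:space:]]*\(' "$root"
}
```

For example, `find_capability_calls ./src` prints one `path:line:source` entry per call, and each entry is a place to check whether the device ID can originate from user input.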

Patch Information

Users should monitor the OneFlow GitHub repository for official patches addressing this vulnerability. As of the last NVD update on 2026-01-29, no vendor advisory or patch has been published. Users are encouraged to check the OneFlow Homepage for security announcements.
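Until a fix is available, it helps to know which hosts run the version listed above. This sketch reads the installed distribution version with Python's standard `importlib.metadata` (Python 3.8 or later, assuming the package is installed under the distribution name `oneflow`) and compares it with v0.9.0 after dropping any PEP 440 local suffix. Only v0.9.0 appears in the affected list, so a "not-listed" result means the advisory does not name that version, not that it has been shown to be safe.

```bash
#!/usr/bin/env bash
# Compare an installed OneFlow version with the advisory's affected list (v0.9.0).

# Print the installed version via importlib.metadata; empty if not installed.
installed_oneflow_version() {
  "${PYTHON:-python3}" -c \
    'import importlib.metadata as m; print(m.version("oneflow"))' 2>/dev/null
}

# Classify a version string; a PEP 440 local suffix (e.g. "+cu117") is ignored.
classify_oneflow_version() {
  local base=${1%%+*}
  case "$base" in
    0.9.0) echo "listed-affected" ;;
    "")    echo "not-installed" ;;
    *)     echo "not-listed" ;;   # not named in the advisory; not proof of safety
  esac
}
```

Typical use is `classify_oneflow_version "$(installed_oneflow_version)"` inside each relevant virtual environment or container image.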

Workarounds

Validate device IDs against the actual number of available GPUs before calling flow.cuda.get_device_capability()
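Wrap CUDA device queries in exception handling (try/except in Python) to prevent crash propagation; this only helps when the failure surfaces as a Python exception rather than a native crash of the process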
Limit access to OneFlow APIs to trusted networks only
Consider running OneFlow services in isolated containers with resource limits to contain DoS impact

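```bash
# Example validation workaround: before calling flow.cuda.get_device_capability(),
# accept only a small non-negative integer below the host's GPU count.
# Implementation will vary by deployment (CUDA_VISIBLE_DEVICES narrows what a process sees).
device_id="${1:-}"
gpu_count=$(nvidia-smi --list-gpus 2>/dev/null | wc -l)
if [[ ! "$device_id" =~ ^[0-9]{1,4}$ ]] || (( 10#$device_id >= gpu_count )); then
  echo "rejecting device ID '$device_id' (GPUs visible: $gpu_count)" >&2
  exit 1
fi
```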
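The container workaround above can be sketched as a wrapper around `docker run` that caps memory (with swap disabled by setting `--memory-swap` equal to `--memory`), CPUs, and process count, exposes a single GPU, and restarts the container a bounded number of times after a crash. The function name, image name, and every limit are placeholders, and the hypothetical `DOCKER` variable lets you substitute `echo` to print the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical launcher for a OneFlow service with DoS-containing limits.
# All values below are placeholders to be tuned for the workload.
run_oneflow_contained() {
  local image=$1; shift
  "${DOCKER:-docker}" run --detach \
    --gpus device=0 \
    --memory 16g --memory-swap 16g \
    --cpus 4 \
    --pids-limit 512 \
    --restart on-failure:5 \
    "$image" "$@"
}
```

For example, `DOCKER=echo run_oneflow_contained registry.example.com/oneflow-svc:latest` shows the exact command before it is run for real. The bounded `on-failure` policy keeps a repeatedly crashing service from restarting indefinitely, which makes sustained exploitation visible instead of silently absorbed.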