CVE-2026-6607: lm-sys FastChat DoS Vulnerability

CVE-2026-6607 Overview

A resource exhaustion vulnerability has been identified in lm-sys FastChat versions up to and including 0.2.36. The flaw affects the api_generate function of the Worker API Endpoint component. It stems from blocking synchronous calls made directly inside async request handlers: a remote attacker can send requests that tie up the server's event loop, causing excessive resource consumption and potentially a denial of service.

Critical Impact
Remote attackers can exploit this vulnerability to exhaust server resources through the Worker API Endpoint, causing service degradation or complete denial of service for FastChat deployments.

Affected Products

lm-sys FastChat versions up to and including 0.2.36
FastChat Worker API Endpoint component
FastChat base_model_worker.py and huggingface_api_worker.py modules

Discovery Timeline

April 20, 2026 - CVE-2026-6607 published to NVD
April 22, 2026 - Last updated in NVD database

Technical Details for CVE-2026-6607

Vulnerability Analysis

This vulnerability is classified as CWE-400 (Uncontrolled Resource Consumption). The core issue lies in the synchronous execution of blocking operations within asynchronous API handlers in FastChat's Worker API. When blocking calls like worker.get_embeddings() or worker.generate_gate() are executed directly in async functions without proper thread offloading, they block the event loop, preventing the server from processing other requests.
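The effect is easy to reproduce without FastChat. The sketch below is illustrative only: `slow_model_call` is a hypothetical stand-in for a blocking call such as `worker.generate_gate()` (simulated with `time.sleep`), and the `ticker` coroutine stands in for other requests the server should keep serving. When the blocking call runs directly in the coroutine, the ticker makes no progress; when it is offloaded with `asyncio.to_thread` (Python 3.9+), the ticker keeps running.

```python
import asyncio
import time


def slow_model_call(seconds: float) -> str:
    """Hypothetical stand-in for a blocking call like worker.generate_gate()."""
    time.sleep(seconds)  # blocks whichever thread calls it
    return "done"


async def ticker(counter: list, stop: asyncio.Event) -> None:
    """Stands in for other requests: counts how often the loop lets it run."""
    while not stop.is_set():
        counter[0] += 1
        await asyncio.sleep(0.01)


async def handler(offload: bool) -> str:
    if offload:
        # Patched pattern: the blocking call runs in a worker thread.
        return await asyncio.to_thread(slow_model_call, 0.5)
    # Vulnerable pattern: the blocking call runs on the event loop thread.
    return slow_model_call(0.5)


async def measure(offload: bool) -> int:
    """Return how many times the ticker ran while one handler call was in flight."""
    counter = [0]
    stop = asyncio.Event()
    tick_task = asyncio.create_task(ticker(counter, stop))
    await asyncio.sleep(0)  # let the ticker start
    start_count = counter[0]
    await handler(offload)
    progress = counter[0] - start_count
    stop.set()
    await tick_task
    return progress


if __name__ == "__main__":
    print("ticks while blocking: ", asyncio.run(measure(offload=False)))
    print("ticks while offloaded:", asyncio.run(measure(offload=True)))
```

The blocking case always reports 0 ticks, because the coroutine never yields while `time.sleep` runs; the offloaded count depends on timing but is typically a few dozen.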

An attacker can exploit this by sending multiple concurrent requests to the affected API endpoints. Since blocking calls hold up the async event loop, the server's ability to handle legitimate traffic degrades rapidly. The exploit has been publicly disclosed, and proof-of-concept code is available through a GitHub Gist.

Root Cause

The vulnerability originates from improper async/await implementation in the FastChat worker components. The api_generate, api_get_embeddings, and related functions in base_model_worker.py and huggingface_api_worker.py were calling blocking synchronous methods directly within async handlers. This design flaw causes the Python asyncio event loop to stall during CPU-intensive or I/O-blocking operations, creating a bottleneck that attackers can exploit to exhaust server resources.

The initial patch (commit ff66426) addressed the issue in api_generate but missed other entry points, leaving additional vulnerable endpoints exposed until the complete fix was implemented.

Attack Vector

The attack can be initiated remotely over the network without authentication or user interaction. Attackers send requests to the Worker API endpoints. Each vulnerable handler runs its model call on the event loop thread, so every such request stalls the loop for the duration of that call, and a steady stream of them consumes server resources and starves legitimate requests.

python

# Security patch in base_model_worker.py - wrapping blocking calls with asyncio.to_thread
 async def api_get_embeddings(request: Request):
     params = await request.json()
     await acquire_worker_semaphore()
-    embedding = worker.get_embeddings(params)
+    embedding = await asyncio.to_thread(worker.get_embeddings, params)
     release_worker_semaphore()
     return JSONResponse(content=embedding)

Source: GitHub Commit Details

python

# Security patch in huggingface_api_worker.py - preventing DoS through thread offloading
     params = await request.json()
     worker = worker_map[params["model"]]
     await acquire_worker_semaphore(worker)
-    output = worker.generate_gate(params)
+    output = await asyncio.to_thread(worker.generate_gate, params)
     release_worker_semaphore(worker)
     return JSONResponse(output)

Source: GitHub Commit Details
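Both diffs follow one pattern: acquire the worker's concurrency semaphore, run the blocking model call in a thread with `asyncio.to_thread`, then release the semaphore. A minimal, framework-free sketch of that pattern follows. `FakeWorker`, `handle_generate`, `run_demo`, and the limit of 2 are hypothetical. One deliberate difference from the excerpts above: they release the semaphore on a plain line after the call, while the sketch uses `async with`, so an exception from the model call cannot leave the semaphore held.

```python
import asyncio
import threading
import time
from typing import List, Tuple


class FakeWorker:
    """Hypothetical stand-in for a FastChat model worker with a blocking generate_gate()."""

    def __init__(self, limit: int) -> None:
        # Created inside a running event loop (see run_demo) for Python 3.9 compatibility.
        self.semaphore = asyncio.Semaphore(limit)  # caps concurrent model calls
        self.active = 0
        self.peak = 0
        self._lock = threading.Lock()  # generate_gate runs on several threads

    def generate_gate(self, params: dict) -> dict:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.05)  # simulated blocking inference
        with self._lock:
            self.active -= 1
        return {"text": params["prompt"].upper()}


async def handle_generate(worker: FakeWorker, params: dict) -> dict:
    # `async with` releases the semaphore even if generate_gate raises.
    async with worker.semaphore:
        return await asyncio.to_thread(worker.generate_gate, params)


async def run_demo(n_requests: int = 6, limit: int = 2) -> Tuple[List[str], int]:
    worker = FakeWorker(limit)
    requests = [{"prompt": f"hi {i}"} for i in range(n_requests)]
    results = await asyncio.gather(*(handle_generate(worker, p) for p in requests))
    return [r["text"] for r in results], worker.peak


if __name__ == "__main__":
    print(asyncio.run(run_demo()))
```

The semaphore bounds how many blocking calls run at once (peak concurrency never exceeds the limit), while `to_thread` keeps the event loop free to accept and queue other requests.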

Detection Methods for CVE-2026-6607

Indicators of Compromise

Unusual spike in concurrent connections to FastChat Worker API endpoints
Significant increase in response latency for the generation and embedding endpoints (/worker_generate and /worker_get_embeddings on FastChat model workers)
High CPU utilization without corresponding legitimate workload increase
Server logs showing request timeouts or dropped connections during normal traffic periods

Detection Strategies

Monitor request rates to Worker API endpoints for anomalous patterns indicating resource exhaustion attacks
Implement connection rate limiting and alerting thresholds for the FastChat API endpoints
Deploy application performance monitoring (APM) to detect event loop blocking conditions
Audit FastChat version in deployment to confirm whether vulnerable versions (≤0.2.36) are in use
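For the version audit, a small check against installed package metadata can help. This sketch assumes FastChat was installed from PyPI under its distribution name `fschat`. The fix landed as a commit (c9e84b89c91d45191dc24466888de526fa04cf33), so the version number alone cannot distinguish a patched source checkout from an unpatched one. Treat a flagged result as "verify the commit", not as proof of exposure.

```python
import re
from importlib import metadata

# Highest release listed as affected in this advisory.
LAST_AFFECTED = (0, 2, 36)


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '0.2.36.post1' -> (0, 2, 36)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_potentially_vulnerable(version: str) -> bool:
    return parse_release(version) <= LAST_AFFECTED


def check_installed(dist_name: str = "fschat") -> str:
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name} is not installed in this environment"
    if is_potentially_vulnerable(installed):
        return f"{dist_name} {installed}: within the affected range, verify the patch commit"
    return f"{dist_name} {installed}: newer than the affected range"


if __name__ == "__main__":
    print(check_installed())
```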

Monitoring Recommendations

Enable detailed logging for the FastChat Worker API to capture request metadata and timing
Set up alerts for sustained high latency on API endpoints exceeding normal operational baselines
Monitor system resource utilization (CPU, memory, thread counts) for FastChat worker processes
Review connection queuing metrics to identify potential DoS conditions early
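Event-loop stalls like the one behind this CVE appear as scheduling lag: a coroutine that asks to sleep for a fixed interval wakes up late. The sketch below measures that lag from inside the process. It is a generic asyncio technique, not a FastChat feature. The 0.05 s interval, the 0.1 s threshold, and the `report` callback are assumptions. asyncio's built-in debug mode (`loop.set_debug(True)` together with `loop.slow_callback_duration`) offers a complementary, log-based view.

```python
import asyncio
import time
from typing import Callable, List, Optional, Tuple


async def monitor_loop_lag(
    report: Callable[[float], None],
    stop: Optional[asyncio.Event] = None,
    interval: float = 0.05,
    threshold: float = 0.1,
) -> float:
    """Sleep in a loop; call report(lag) when a wake-up is late by more than threshold.

    Runs until `stop` is set (forever if stop is None) and returns the worst lag seen.
    """
    worst = 0.0
    while stop is None or not stop.is_set():
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lag = time.perf_counter() - start - interval
        worst = max(worst, lag)
        if lag > threshold:
            report(lag)
    return worst


async def demo() -> Tuple[float, List[float]]:
    """Start the monitor, then block the loop thread for 0.3 s as a vulnerable handler would."""
    stop = asyncio.Event()
    reports: List[float] = []
    monitor = asyncio.create_task(monitor_loop_lag(reports.append, stop))
    await asyncio.sleep(0.1)
    time.sleep(0.3)  # simulated blocking call on the event loop thread
    await asyncio.sleep(0.1)
    stop.set()
    worst = await monitor
    return worst, reports


if __name__ == "__main__":
    worst, reports = asyncio.run(demo())
    print(f"worst lag {worst:.3f}s, {len(reports)} alert(s)")
```

In a real service, the `report` callback would feed a log line or metric that the alerting described above can watch. How to start the monitor task on the worker's event loop (for example from an application startup hook) depends on the deployment and is not shown.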

How to Mitigate CVE-2026-6607

Immediate Actions Required

Update FastChat to a version containing the security patch (commit c9e84b89c91d45191dc24466888de526fa04cf33 or later)
Review and restrict network access to Worker API endpoints to trusted sources only
Implement rate limiting at the network or application layer for exposed API endpoints
Monitor for exploitation attempts using the detection strategies outlined above
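For rate limiting at the application layer, a per-client token bucket is a common choice. The sketch below is a generic, hypothetical implementation: `TokenBucketLimiter` is not part of FastChat. Its example rate of 10 requests/s and burst of 20 roughly mirror the nginx example in the Workarounds section, though nginx's `burst` semantics differ slightly. Wiring it into the worker, for example as middleware that returns HTTP 429 when `allow()` is False, is not shown.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens/second, stores at most `burst`."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable for testing
        self._buckets: Dict[str, Tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def allow(self, client: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(client, (float(self.burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[client] = (tokens - 1.0, now)
            return True
        self._buckets[client] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = TokenBucketLimiter(rate=10, burst=20)
    allowed = sum(limiter.allow("203.0.113.7") for _ in range(25))
    print(f"{allowed} of 25 back-to-back requests allowed")
```

A production version would also evict idle clients so the bucket map cannot grow without bound.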

Patch Information

The vulnerability has been patched in the FastChat repository. The fix involves wrapping blocking synchronous calls with asyncio.to_thread() to prevent event loop starvation. The patch is available in commit c9e84b89c91d45191dc24466888de526fa04cf33. Additional details can be found in the GitHub Issue Tracker and Pull Request #3835.

Workarounds

Deploy a reverse proxy with rate limiting in front of FastChat Worker API endpoints
Restrict API endpoint access to internal networks or authenticated clients only
Implement connection timeouts to prevent resource exhaustion from long-running requests
Consider containerization with resource limits to contain potential DoS impact

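nginx

# Example nginx rate limiting for a FastChat model worker.
# limit_req_zone, upstream and server all belong in the "http" context.
limit_req_zone $binary_remote_addr zone=fastchat_api:10m rate=10r/s;

upstream fastchat_backend {
    server 127.0.0.1:21002;  # assumed worker address; adjust to your deployment
}

server {
    listen 80;
    location / {  # covers every worker endpoint
        limit_req zone=fastchat_api burst=20 nodelay;
        proxy_pass http://fastchat_backend;
        proxy_read_timeout 30s;
        proxy_connect_timeout 10s;
    }
}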