CVE-2026-6607 Overview
A resource exhaustion vulnerability has been identified in lm-sys FastChat versions up to and including 0.2.36. The flaw affects the api_generate function in the Worker API Endpoint component. Blocking calls are executed directly inside async request handlers, so a remote attacker can stall the worker's event loop with ordinary network requests and cause excessive resource consumption, potentially leading to denial of service.
Critical Impact
Remote attackers can exploit this vulnerability to exhaust server resources through the Worker API Endpoint, causing service degradation or complete denial of service for FastChat deployments.
Affected Products
- lm-sys FastChat versions up to and including 0.2.36
- FastChat Worker API Endpoint component
- FastChat base_model_worker.py and huggingface_api_worker.py modules
Discovery Timeline
- April 20, 2026 - CVE-2026-6607 published to NVD
- April 22, 2026 - Last updated in NVD database
Technical Details for CVE-2026-6607
Vulnerability Analysis
This vulnerability is classified as CWE-400 (Uncontrolled Resource Consumption). The core issue is that blocking operations run synchronously inside asynchronous API handlers in FastChat's Worker API. When blocking calls such as worker.get_embeddings() or worker.generate_gate() are executed directly in an async function, without being offloaded to a thread, they occupy the event loop until they return. During that time the server cannot process any other request.
An attacker can exploit this by sending multiple concurrent requests to the affected API endpoints. Since blocking calls hold up the async event loop, the server's ability to handle legitimate traffic degrades rapidly. The exploit has been publicly disclosed, and proof-of-concept code is available through a GitHub Gist.
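The effect described above can be reproduced in isolation. The following is a minimal sketch, not FastChat code: blocking_inference, handler_blocking, handler_offloaded, and status_latency_during are hypothetical names, and time.sleep stands in for a synchronous model call. It measures how long a cheap 10 ms task has to wait while one slow request is being served.

```python
import asyncio
import time


def blocking_inference(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous model call; blocks its thread."""
    time.sleep(seconds)
    return "done"


async def handler_blocking(seconds: float) -> str:
    # Vulnerable pattern: the synchronous call runs on the event loop thread.
    return blocking_inference(seconds)


async def handler_offloaded(seconds: float) -> str:
    # Patched pattern: the call runs in a worker thread (Python 3.9+).
    return await asyncio.to_thread(blocking_inference, seconds)


async def status_latency_during(handler, seconds: float = 0.3) -> float:
    """Start one slow request, then time a task that only needs ~10 ms of loop time."""
    slow = asyncio.create_task(handler(seconds))  # the slow request arrives first
    start = time.perf_counter()
    await asyncio.sleep(0.01)  # stands in for a cheap request such as a status check
    latency = time.perf_counter() - start
    await slow
    return latency


if __name__ == "__main__":
    print("blocking handler: ", asyncio.run(status_latency_during(handler_blocking)))
    print("offloaded handler:", asyncio.run(status_latency_during(handler_offloaded)))
```

With the blocking handler the cheap task cannot resume until the slow call returns, so its latency is roughly the full duration of the slow call. With the offloaded handler it resumes after about 10 ms.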
Root Cause
The vulnerability originates from improper async/await implementation in the FastChat worker components. The api_generate, api_get_embeddings, and related functions in base_model_worker.py and huggingface_api_worker.py were calling blocking synchronous methods directly within async handlers. This design flaw causes the Python asyncio event loop to stall during CPU-intensive or I/O-blocking operations, creating a bottleneck that attackers can exploit to exhaust server resources.
The initial patch (commit ff66426) addressed the issue in api_generate but missed other entry points, leaving additional vulnerable endpoints exposed until the complete fix was implemented.
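Because the first patch missed some entry points, a quick static audit of every async handler may be worthwhile. The sketch below is one possible approach, assuming the blocking method names are the two called out in this advisory; find_unoffloaded_calls and BLOCKING_METHODS are hypothetical names, not part of FastChat. It uses Python's ast module to flag async functions that call those methods directly rather than passing them to asyncio.to_thread.

```python
import ast

# Assumption: method names treated as blocking, taken from the calls
# named in this advisory. Extend the set for other deployments.
BLOCKING_METHODS = {"generate_gate", "get_embeddings"}


def find_unoffloaded_calls(source: str, blocking=BLOCKING_METHODS):
    """Return (async function, method, line) for each direct blocking call."""
    findings = []
    tree = ast.parse(source)
    for func in ast.walk(tree):
        if not isinstance(func, ast.AsyncFunctionDef):
            continue
        for node in ast.walk(func):
            # worker.generate_gate(params) is a Call whose func is an Attribute.
            # asyncio.to_thread(worker.generate_gate, params) only passes the
            # attribute as an argument, so it is not flagged.
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in blocking
            ):
                findings.append((func.name, node.func.attr, node.lineno))
    return findings


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for name, method, line in find_unoffloaded_calls(fh.read()):
                print(f"{path}:{line}: {name} calls {method}() on the event loop")
```

Run it against base_model_worker.py and huggingface_api_worker.py in the deployed checkout. An empty result only shows that the listed method names are not called directly; other blocking calls would need to be added to the set.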
Attack Vector
The attack can be initiated remotely over the network without requiring authentication or user interaction. Attackers send requests to the Worker API endpoints; every request that reaches a blocking synchronous call holds the event loop for the duration of that call, which consumes server capacity and starves legitimate requests.
# Security patch in base_model_worker.py - wrapping blocking calls with asyncio.to_thread
 async def api_get_embeddings(request: Request):
     params = await request.json()
     await acquire_worker_semaphore()
-    embedding = worker.get_embeddings(params)
+    embedding = await asyncio.to_thread(worker.get_embeddings, params)
     release_worker_semaphore()
     return JSONResponse(content=embedding)
Source: GitHub Commit Details
# Security patch in huggingface_api_worker.py - preventing DoS through thread offloading
     params = await request.json()
     worker = worker_map[params["model"]]
     await acquire_worker_semaphore(worker)
-    output = worker.generate_gate(params)
+    output = await asyncio.to_thread(worker.generate_gate, params)
     release_worker_semaphore(worker)
     return JSONResponse(output)
Source: GitHub Commit Details
Detection Methods for CVE-2026-6607
Indicators of Compromise
- Unusual spike in concurrent connections to FastChat Worker API endpoints
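- Significant increase in response latency on the worker's generation and embedding routes (registered as /worker_generate and /worker_get_embeddings in the upstream FastChat source, served by the api_generate and api_get_embeddings handlers)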
- High CPU utilization without corresponding legitimate workload increase
- Server logs showing request timeouts or dropped connections during normal traffic periods
Detection Strategies
- Monitor request rates to Worker API endpoints for anomalous patterns indicating resource exhaustion attacks
- Implement connection rate limiting and alerting thresholds for the FastChat API endpoints
- Deploy application performance monitoring (APM) to detect event loop blocking conditions
- Audit FastChat version in deployment to confirm whether vulnerable versions (≤0.2.36) are in use
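The version audit can be scripted. The sketch below assumes FastChat was installed from PyPI under the distribution name fschat; parse_version, is_affected, and installed_fastchat_version are hypothetical helper names. It compares the installed version against the last affected release named in this advisory.

```python
from importlib import metadata

# From this advisory: versions up to and including 0.2.36 are affected.
LAST_AFFECTED = (0, 2, 36)


def parse_version(text: str) -> tuple:
    """Parse leading numeric dotted components, e.g. '0.2.36' -> (0, 2, 36)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'post1'
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version_text: str) -> bool:
    """True if the version is at or below the last affected release."""
    return parse_version(version_text) <= LAST_AFFECTED


def installed_fastchat_version(dist_name: str = "fschat"):
    """Return the installed version string, or None if the package is absent.

    Assumption: the distribution is installed under the name 'fschat'.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_fastchat_version()
    if found is None:
        print("fschat is not installed in this environment")
    else:
        print(found, "affected" if is_affected(found) else "not in the affected range")
```

A version number alone may not settle the question for source installs: a checkout that already contains the fix commit may still report an affected version string, so the deployed commit should be checked as well.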
Monitoring Recommendations
- Enable detailed logging for the FastChat Worker API to capture request metadata and timing
- Set up alerts for sustained high latency on API endpoints exceeding normal operational baselines
- Monitor system resource utilization (CPU, memory, thread counts) for FastChat worker processes
- Review connection queuing metrics to identify potential DoS conditions early
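Event loop stalls can also be measured from inside the process. The following is a minimal sketch of a lag monitor, assuming it can be started as a background task in the same event loop as the worker; monitor_loop_lag and its parameters are hypothetical, not a FastChat API. It sleeps for a fixed interval and reports how much later than expected it was woken up.

```python
import asyncio
import time


async def monitor_loop_lag(interval: float, threshold: float, on_lag) -> None:
    """Report event loop lag, in seconds, whenever it exceeds `threshold`.

    Lag is the extra delay beyond `interval` before this coroutine is resumed.
    A large value means something held the event loop and kept it from running.
    """
    while True:
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lag = time.perf_counter() - start - interval
        if lag > threshold:
            on_lag(lag)


async def demo() -> list:
    """Block the loop for 0.3 s on purpose and collect what the monitor reports."""
    lags = []
    task = asyncio.create_task(monitor_loop_lag(0.01, 0.1, lags.append))
    await asyncio.sleep(0.03)   # let the monitor start
    time.sleep(0.3)             # simulate a blocking call on the loop thread
    await asyncio.sleep(0.03)   # give the monitor a chance to observe the stall
    task.cancel()
    return lags


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In production the on_lag callback would typically forward the value to a logger or metrics system. Python's asyncio debug mode offers a related built-in check: it logs callbacks that run longer than loop.slow_callback_duration, which defaults to 100 ms.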
How to Mitigate CVE-2026-6607
Immediate Actions Required
- Update FastChat to a version containing the security patch (commit c9e84b89c91d45191dc24466888de526fa04cf33 or later)
- Review and restrict network access to Worker API endpoints to trusted sources only
- Implement rate limiting at the network or application layer for exposed API endpoints
- Monitor for exploitation attempts using the detection strategies outlined above
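For application-layer rate limiting, a token bucket is one common choice. The sketch below is a generic illustration, not FastChat functionality: the TokenBucket class, its parameters, and the injected clock are all assumptions, and wiring it into the worker's request handling is left out.

```python
import time


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock          # injectable so the limiter can be tested
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller should be rejected."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=10, burst=20)
    accepted = sum(bucket.allow() for _ in range(50))
    print(f"{accepted} of 50 back-to-back requests accepted")
```

A deployment would usually keep one bucket per client, for example in a dictionary keyed by source address, and answer with HTTP 429 when allow() returns False. Rate limiting reduces exposure but does not remove the underlying event loop blocking, so it complements the patch rather than replacing it.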
Patch Information
The vulnerability has been patched in the FastChat repository. The fix wraps blocking synchronous calls in asyncio.to_thread() (available in Python 3.9 and later) so that they run in a worker thread instead of on the event loop, which prevents event loop starvation. The patch is available in commit c9e84b89c91d45191dc24466888de526fa04cf33. Additional details can be found in the GitHub Issue Tracker and Pull Request #3835.
Workarounds
- Deploy a reverse proxy with rate limiting in front of FastChat Worker API endpoints
- Restrict API endpoint access to internal networks or authenticated clients only
- Implement connection timeouts to prevent resource exhaustion from long-running requests
- Consider containerization with resource limits to contain potential DoS impact
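# Example nginx rate limiting configuration for FastChat API
# limit_req_zone belongs in the http context; the location block goes inside a server block.
# fastchat_backend is assumed to be an upstream defined elsewhere in the configuration.
# Adjust the /api/ prefix to the paths your deployment actually exposes.
limit_req_zone $binary_remote_addr zone=fastchat_api:10m rate=10r/s;

location /api/ {
    limit_req zone=fastchat_api burst=20 nodelay;
    proxy_pass http://fastchat_backend;
    proxy_read_timeout 30s;
    proxy_connect_timeout 10s;
}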


