CVE-2026-3059: LMSYS SGLang RCE Vulnerability

CVE-2026-3059 Overview

CVE-2026-3059 is an unauthenticated remote code execution vulnerability in the SGLang large language model serving framework maintained by LMSYS. The flaw resides in the multimodal generation module, where the ZeroMQ (ZMQ) broker deserializes incoming messages using Python's pickle.loads() without any authentication or integrity checks. Any attacker with network access to the ZMQ socket can submit a crafted pickle payload and execute arbitrary code in the context of the SGLang process. The vulnerability is tracked under CWE-502: Deserialization of Untrusted Data.

Critical Impact
Unauthenticated attackers can achieve remote code execution on hosts running the SGLang multimodal generation service, compromising model serving infrastructure and any data accessible to the process.

Affected Products

LMSYS SGLang multimodal generation module (versions prior to v0.5.10)
Deployments exposing the SGLang scheduler ZMQ broker on reachable interfaces
Multimodal inference pipelines built on affected SGLang releases

Discovery Timeline

2026-03-12 - CVE-2026-3059 published to the National Vulnerability Database (NVD)
2026-04-07 - Last updated in the NVD

Technical Details for CVE-2026-3059

Vulnerability Analysis

SGLang is a serving framework for large language models and multimodal models. Its multimodal generation runtime uses a ZMQ-based scheduler to dispatch work between client and worker processes. The scheduler client component, implemented in python/sglang/multimodal_gen/runtime/scheduler_client.py, reads messages from a ZMQ socket and reconstructs Python objects with pickle.loads().

Pickle is not a safe deserialization format. During unpickling, the value returned by an object's __reduce__ method can name any importable callable together with its arguments, and pickle calls it while rebuilding the object. Because the ZMQ broker accepts connections without authentication, message signing, or transport-level access control, anyone who can reach the listening socket controls what the server deserializes.

An attacker who reaches the broker can deliver a pickle payload whose __reduce__ method returns a tuple referencing os.system, subprocess.Popen, or any other callable. Deserialization then invokes that callable inside the SGLang worker, giving the attacker code execution with the privileges of the serving process.
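
The mechanism can be illustrated with a deliberately harmless sketch. The class name Demo is hypothetical, operator.add stands in for the dangerous callables named above, and none of this code is taken from SGLang. It only shows that pickle.loads() runs whatever callable the sender selected.

```python
import operator
import pickle


class Demo:
    """Hypothetical class; its __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # pickle stores (callable, args) and calls callable(*args) when loading.
        # operator.add is a harmless stand-in for os.system or subprocess.Popen.
        return (operator.add, (2, 3))


payload = pickle.dumps(Demo())

# The receiver never gets a Demo instance back. It gets the return value of
# the callable that the sender chose.
result = pickle.loads(payload)
print(result)
```

The receiving side has no opportunity to inspect the object before the call happens, which is why validating data after pickle.loads() returns does not help.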

Root Cause

The root cause is unsafe use of pickle.loads() on data received from an untrusted network source. The ZMQ broker channel was treated as trusted internal traffic, yet nothing enforced that assumption: there was no binding restriction, no authentication layer (such as CurveZMQ), and no safer serialization format (such as JSON or protobuf).
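
As a rough illustration of the safer pattern described above, the following sketch combines a data-only format (JSON) with an HMAC integrity tag. It is not the change made in SGLang; the function names, the frame layout, and SECRET_KEY are assumptions introduced only for this example.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice it would come from a secret store.
SECRET_KEY = b"replace-with-a-real-secret"


def encode_message(obj, key=SECRET_KEY):
    """Serialize obj as JSON and prepend a 32-byte HMAC-SHA256 tag."""
    body = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def decode_message(frame, key=SECRET_KEY):
    """Verify the tag, then parse JSON. Raises ValueError on any mismatch."""
    tag, body = frame[:32], frame[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if len(tag) != 32 or not hmac.compare_digest(tag, expected):
        raise ValueError("message failed integrity check")
    # json.loads only builds dicts, lists, strings, numbers, booleans, None.
    return json.loads(body.decode("utf-8"))


frame = encode_message({"prompt": "a cat", "steps": 4})
print(decode_message(frame))
```

Two properties matter here: the tag is checked before any parsing takes place, and the parser cannot be instructed to call code. Either property alone would already narrow the attack surface considerably.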

Attack Vector

Exploitation requires network reachability to the SGLang multimodal scheduler ZMQ endpoint. No user interaction or prior authentication is needed. The attacker crafts a malicious pickle object, transmits it through a ZMQ client connection, and the broker deserializes it on receipt. Deployments that bind the broker to 0.0.0.0 or expose it through container networking, Kubernetes services, or shared inference clusters are directly reachable. See the SGLang Security Advisory GHSA-3cp7-c6q2-94xr and the Orca Security RCE analysis for technical details.

Detection Methods for CVE-2026-3059

Indicators of Compromise

Unexpected child processes spawned by the SGLang Python worker, such as shells, curl, wget, or interpreters launched outside normal model serving flows
Outbound network connections from SGLang processes to unfamiliar hosts shortly after inbound ZMQ traffic
New files written under the SGLang working directory or /tmp that contain shell scripts, loaders, or cron entries
ZMQ traffic to the multimodal scheduler port from clients outside the documented inference cluster

Detection Strategies

Inspect process trees for python or SGLang workers spawning /bin/sh, bash, python -c, or other interpreters as child processes
Monitor for anomalous deserialization-related stack frames in application logs, including unhandled exceptions originating in pickle or scheduler_client.py
Apply network analytics to flag external connections reaching the ZMQ scheduler port on inference hosts
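
The process-tree check can be sketched as a small filter over a process snapshot. The field names, WORKER_NAMES, and SUSPICIOUS_CHILDREN are assumptions for illustration; a real deployment would feed in EDR telemetry or parsed ps output and tune both sets.

```python
# Hypothetical names for illustration; tune both sets to the environment.
WORKER_NAMES = {"python", "python3", "sglang"}
SUSPICIOUS_CHILDREN = {"sh", "bash", "dash", "zsh", "curl", "wget", "nc"}


def find_suspicious_children(processes):
    """Return (parent, child) pairs where a worker spawned a flagged program.

    processes is an iterable of dicts with "pid", "ppid", and "name" keys.
    """
    processes = list(processes)
    by_pid = {p["pid"]: p for p in processes}
    hits = []
    for proc in processes:
        parent = by_pid.get(proc["ppid"])
        if parent is None:
            continue  # parent not in the snapshot
        if parent["name"] in WORKER_NAMES and proc["name"] in SUSPICIOUS_CHILDREN:
            hits.append((parent, proc))
    return hits


# Made-up snapshot: a worker spawning bash is flagged, sshd spawning bash is not.
snapshot = [
    {"pid": 100, "ppid": 1, "name": "python3"},
    {"pid": 101, "ppid": 100, "name": "python3"},
    {"pid": 102, "ppid": 101, "name": "bash"},
    {"pid": 200, "ppid": 1, "name": "sshd"},
    {"pid": 201, "ppid": 200, "name": "bash"},
]
for parent, child in find_suspicious_children(snapshot):
    print(parent["pid"], "->", child["pid"], child["name"])
```

Python workers that spawn other Python processes are left unflagged here because multiprocessing makes that common; catching python -c would additionally require inspecting command-line arguments.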

Monitoring Recommendations

Forward SGLang stdout, stderr, and crash logs to a centralized logging pipeline and alert on tracebacks referencing pickle.loads
Capture process telemetry on GPU and inference nodes to baseline expected child processes and detect deviations
Track ingress traffic to model serving ports and alert on connections from non-allowlisted source addresses
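
A log alert of this kind can be prototyped with a short scanner. The pattern and the sample lines below are assumptions about what such tracebacks might contain, not output captured from SGLang.

```python
import re

# Assumed markers of a deserialization failure in a Python traceback.
PICKLE_PATTERN = re.compile(r"pickle\.loads|UnpicklingError|scheduler_client\.py")


def find_pickle_tracebacks(lines):
    """Yield (line_number, text) for log lines that match PICKLE_PATTERN."""
    for number, line in enumerate(lines, start=1):
        if PICKLE_PATTERN.search(line):
            yield number, line.rstrip("\n")


# Made-up sample log for illustration only.
sample_log = [
    "INFO request finished\n",
    "Traceback (most recent call last):\n",
    "    obj = pickle.loads(data)\n",
    "_pickle.UnpicklingError: invalid load key, 'x'.\n",
]
for number, text in find_pickle_tracebacks(sample_log):
    print(number, text)
```

Failed unpickling attempts are a useful signal because malformed or probing payloads tend to raise exceptions, although a well-formed malicious payload may leave no traceback at all.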

How to Mitigate CVE-2026-3059

Immediate Actions Required

Upgrade SGLang to release v0.5.10 or later, which contains the fix shipped in pull request #20904
Restrict network reachability of the multimodal scheduler ZMQ broker to loopback or a trusted internal segment until patches are applied
Rotate any credentials, API keys, or model artifacts that were accessible from hosts running an unpatched SGLang instance
Audit inference hosts for the indicators of compromise listed above before returning them to service

Patch Information

The maintainers released SGLang v0.5.10 addressing the unsafe deserialization path in the multimodal scheduler client. Review the fix in pull request #20904 and the GHSA-3cp7-c6q2-94xr advisory for the full list of changes.

Workarounds

Bind the ZMQ broker to 127.0.0.1 and route all inference traffic through an authenticated reverse proxy when an immediate upgrade is not possible
Enforce host firewall rules or Kubernetes NetworkPolicies that allow only known client IPs to reach the scheduler port
Run SGLang workers as a low-privilege account inside a hardened container to limit blast radius if exploitation occurs
Place the inference cluster behind a mutually authenticated transport such as a service mesh with mTLS until the patched release is deployed
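
The first two workarounds can be checked mechanically before a service starts. The sketch below uses only the standard library; the helper names, the port number, and the CIDR block are hypothetical, and it assumes endpoints are written in the tcp://host:port form that ZMQ uses.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_endpoint(endpoint):
    """Return True if a tcp://host:port endpoint binds only to loopback."""
    parts = urlsplit(endpoint)
    host = parts.hostname
    if parts.scheme != "tcp" or host is None:
        return False
    if host == "*":
        return False  # ZMQ wildcard: listens on every interface
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal; treat as unsafe


def is_allowed_client(address, allowed_networks):
    """Return True if address falls inside any allowlisted CIDR block."""
    ip = ipaddress.ip_address(address)
    return any(ip in ipaddress.ip_network(net) for net in allowed_networks)


# Port 5555 and 10.0.1.0/24 are placeholders, not SGLang defaults.
print(is_loopback_endpoint("tcp://127.0.0.1:5555"))
print(is_loopback_endpoint("tcp://0.0.0.0:5555"))
print(is_allowed_client("10.0.1.7", ["10.0.1.0/24"]))
```

Checks like these belong in deployment validation rather than in the request path. They complement firewall rules and NetworkPolicies but do not replace them.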