CVE-2026-31234: Horovod KVStore RCE Vulnerability

CVE-2026-31234 Overview

CVE-2026-31234 is an insecure deserialization vulnerability [CWE-502] in Horovod, an open-source distributed deep learning training framework. Versions up to and including 0.28.1 are affected. The flaw exists in the KVStore HTTP server component, which coordinates distributed training tasks across workers. The server lacks authentication and authorization controls, allowing any remote attacker to write arbitrary data via HTTP PUT requests. Horovod workers later retrieve this data via HTTP GET and deserialize it with cloudpickle.loads() without source or integrity verification. An attacker who races the legitimate writer can deliver a malicious pickle payload, causing the victim worker to execute arbitrary code.

Critical Impact
Unauthenticated remote attackers can achieve arbitrary code execution on Horovod worker nodes by poisoning the KVStore before legitimate data is written.

Affected Products

Horovod distributed training framework versions up to and including 0.28.1
Horovod KVStore HTTP server component
Deployments using cloudpickle.loads() for KVStore value deserialization

Discovery Timeline

2026-05-12 - CVE-2026-31234 published to NVD
2026-05-14 - Last updated in NVD database

Technical Details for CVE-2026-31234

Vulnerability Analysis

Horovod uses a KVStore HTTP server to coordinate state between distributed workers during training. The server exposes simple HTTP endpoints where workers PUT serialized values under named keys and GET them back when needed. The server does not enforce authentication, authorization, or value integrity checks. Any host able to reach the KVStore port can write arbitrary bytes under any key.

When a worker reads a value, it passes the response body directly to cloudpickle.loads(). Python's pickle protocol lets an object specify, through __reduce__, an arbitrary callable and arguments that are invoked during deserialization. If an attacker writes a crafted pickle payload to a key before the legitimate writer does, the victim worker executes attacker-controlled Python code in its own process context. That access typically extends to the training data, model weights, GPU resources, and credentials available to the worker process.

Root Cause

The root cause combines two design flaws. First, the KVStore HTTP server omits authentication and authorization, treating any HTTP client as trusted. Second, the consuming code calls cloudpickle.loads() on untrusted bytes without integrity validation such as a signed digest or transport authentication. Either control alone would block exploitation. Their joint absence converts a coordination channel into an unauthenticated code execution primitive.

Attack Vector

The attacker reaches the KVStore HTTP port over the network and issues an HTTP PUT containing a malicious pickle payload under a key the victim worker is expected to read. When the worker performs its scheduled GET, cloudpickle.loads() reconstructs the payload object and triggers attacker-supplied callables. The race window depends on training topology, but distributed jobs frequently have predictable key names and read timings. No user interaction or prior credentials are required. See the Horovod project repository and the CVE analysis notes for additional context.

Detection Methods for CVE-2026-31234

Indicators of Compromise

HTTP PUT requests to the Horovod KVStore port from hosts outside the expected worker set
KVStore values that begin with a pickle protocol header (\x80\x04 or \x80\x05 for protocols 4 and 5) but did not originate from a known worker process; payloads pickled with protocol 0 or 1 carry no such header
Worker processes spawning unexpected child processes such as sh, bash, python -c, or outbound network utilities immediately after a KVStore GET
Unexpected outbound connections from training nodes to attacker infrastructure during job startup
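A minimal sketch of the byte-prefix check from the list above, assuming KVStore values have already been captured to files (for example by a proxy or packet capture; that step is not shown). The helper name `pickle_proto_header` is hypothetical. It relies on the fact that pickles written with protocol 2 or later start with the PROTO opcode 0x80 followed by the protocol number. Protocol 0 and 1 pickles have no such header, so an empty result does not prove a value is safe.

```bash
# Hypothetical helper: print the pickle protocol number declared by a captured
# KVStore value, or nothing if the file does not start with the PROTO opcode (0x80).
pickle_proto_header() {
  local hdr
  # Read the first two bytes and render them as lowercase hex, e.g. "8005".
  hdr=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
  case $hdr in
    # The second byte is the protocol number; convert it from hex to decimal.
    80??) printf '%d\n' "0x${hdr#80}" ;;
  esac
}

# Example: report every captured value that carries a pickle protocol header.
# for f in captured/*.bin; do
#   proto=$(pickle_proto_header "$f")
#   [ -n "$proto" ] && echo "$f: pickle protocol $proto"
# done
```

Legitimate worker writes are pickles too, so a flagged value still has to be matched against the source of the PUT that stored it.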

Detection Strategies

Inspect network flow logs for connections to the KVStore HTTP port originating outside the training cluster subnet
Hunt for Python worker processes invoking os.system, subprocess.Popen, or socket calls shortly after KVStore reads
Correlate Horovod worker logs with EDR process telemetry to identify deserialization events followed by anomalous child processes
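On a single Linux node, the process-hunting strategy can be approximated by filtering `ps` output for shells or network utilities whose parent is a known worker PID. The function name and the list of suspicious command names below are assumptions for illustration. Catching `python -c` children requires full command lines, which EDR telemetry captures more reliably than this sketch.

```bash
# Hypothetical helper: read "PID PPID COMM" lines on stdin and print those whose
# parent PID is one of the worker PIDs passed as arguments and whose command
# name is a shell or a common network utility.
find_suspicious_children() {
  awk -v workers="$*" '
    BEGIN { n = split(workers, w, " "); for (i = 1; i <= n; i++) worker[w[i]] = 1 }
    ($2 in worker) && $3 ~ /^(sh|bash|dash|nc|ncat|socat|curl|wget)$/ { print }
  '
}

# Example: inspect children of processes whose command line mentions horovod.
# $worker_pids is left unquoted on purpose so each PID becomes its own argument.
# worker_pids=$(pgrep -f horovod)
# ps -eo pid=,ppid=,comm= | find_suspicious_children $worker_pids
```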

Monitoring Recommendations

Enable verbose logging on the KVStore server to record source IP, key, and payload size for every PUT and GET
Alert on any KVStore client IP that is not a member of the authorized training node list
Forward worker process telemetry to a centralized data lake and apply detection rules for command execution that follows deserialization
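The allowlist alert can be sketched with `awk`, assuming the KVStore access log (or the log of a proxy in front of it) has been normalized to whitespace-separated `SRC_IP METHOD KEY BYTES` lines. That log format and the function name are hypothetical; Horovod is not claimed to emit logs in this shape.

```bash
# Hypothetical helper: print log lines whose source IP is not on the allowlist.
#   $1     allowlist file, one authorized worker IP per line
#   stdin  log lines in the assumed format "SRC_IP METHOD KEY BYTES"
# A missing allowlist file leaves the set empty, so every line is flagged.
flag_unlisted_clients() {
  awk -v allow="$1" '
    BEGIN {
      # Load the allowlist; the first field of each non-empty line is an IP.
      while ((getline line < allow) > 0) {
        if (split(line, f, " ") > 0) ok[f[1]] = 1
      }
      close(allow)
    }
    NF > 0 && !($1 in ok) { print }
  '
}

# Example: flag_unlisted_clients workers.txt < kvstore_access.log
```

Flagged PUT lines deserve the most attention, since they indicate a possible write to a key that workers will later deserialize.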

How to Mitigate CVE-2026-31234

Immediate Actions Required

Restrict KVStore HTTP server exposure to a private, isolated network segment reachable only by authorized training workers
Apply host firewall rules or Kubernetes NetworkPolicies that block KVStore ports from all non-worker sources
Audit running Horovod deployments for unauthorized PUT activity against KVStore keys before trusting values already stored there
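To confirm that the network restrictions actually hold, a check can be run from a host that is not a worker: the KVStore port should refuse or drop the connection. A minimal sketch, assuming bash (for its `/dev/tcp` redirection) and coreutils `timeout`; the function name, address, and port are placeholders.

```bash
# Hypothetical helper: exit 0 if a TCP connection to HOST:PORT opens within
# 3 seconds, non-zero otherwise. Uses bash's /dev/tcp pseudo-device, so no
# netcat is required. Inside bash -c, $0 is the host and $1 is the port.
kvstore_reachable() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, run from a NON-worker host (replace the address and port):
# if kvstore_reachable 10.0.0.5 8080; then
#   echo "KVStore port is reachable from this host: restrict it"
# else
#   echo "KVStore port is not reachable from this host"
# fi
```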

Patch Information

No fixed version is identified in the published CVE record for Horovod at the time of writing. Monitor the Horovod GitHub repository for security releases and upgrade as soon as a patched version is published. Until then, treat all KVStore traffic as untrusted and rely on network-level controls.
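Until a fixed release exists, a quick inventory check can flag installations inside the affected range stated in the CVE record (up to and including 0.28.1). A minimal sketch assuming GNU `sort -V`; the function name is hypothetical, and pre-release suffixes such as `rc` tags are not handled.

```bash
# Hypothetical helper: exit 0 if VERSION is <= 0.28.1, the affected upper bound.
horovod_in_affected_range() {
  local v=$1 bound=0.28.1
  # sort -V orders version strings; if VERSION sorts first (or equals the
  # bound), it falls within the affected range.
  [ "$(printf '%s\n%s\n' "$v" "$bound" | sort -V | head -n 1)" = "$v" ]
}

# Example: check the Horovod version installed in the current Python environment.
# ver=$(python3 -m pip show horovod 2>/dev/null | awk '/^Version:/ {print $2}')
# [ -n "$ver" ] && horovod_in_affected_range "$ver" && echo "horovod $ver is affected"
```

Because no fixed version is named yet, a version above the bound should not be read as confirmation of a fix without checking the project's release notes.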

Workarounds

Place the KVStore server behind a mutually authenticated TLS tunnel or a service mesh that enforces workload identity
Replace the HTTP coordination backend with an authenticated alternative such as a managed key-value store with access controls
Sign KVStore payloads with an HMAC keyed to the training job and reject values that fail verification before calling cloudpickle.loads()
Run Horovod workers under minimal-privilege service accounts with no outbound internet access to limit post-exploitation impact

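```bash
# Configuration example: restrict the KVStore port to the worker subnet only.
# Replace 10.0.0.0/24 with your training cluster CIDR and 8080 with the KVStore port.
# -I inserts at the top of INPUT so an earlier ACCEPT rule cannot bypass the DROP;
# the DROP is inserted first so the subnet ACCEPT ends up above it.
iptables -I INPUT -p tcp --dport 8080 -j DROP
iptables -I INPUT -p tcp --dport 8080 -s 10.0.0.0/24 -j ACCEPT
# Repeat with ip6tables if the nodes are reachable over IPv6.
```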
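The HMAC workaround can be sketched with the `openssl` CLI to show the verify-before-deserialize ordering. This is an illustration under stated assumptions: the helper names are hypothetical, the key is assumed to be a per-job secret distributed to workers out of band, and in a real deployment the check would run inside the Python worker immediately before cloudpickle.loads() (for example with Python's `hmac.compare_digest`), not as a separate shell step.

```bash
# Hypothetical helpers illustrating HMAC-SHA256 tagging of KVStore payloads.
# Passing the key with -hmac exposes it in the process list; a production
# implementation should keep the key out of argv.

# Print the hex HMAC-SHA256 tag of a payload file under the given key.
kv_sign() {
  # openssl prints e.g. "HMAC-SHA2-256(stdin)= <hex>"; the tag is the last field.
  openssl dgst -sha256 -hmac "$1" < "$2" | awk '{print $NF}'
}

# Exit 0 only if the payload's tag matches the expected tag.
# Plain string comparison is not constant-time; acceptable for a sketch only.
kv_verify() {
  local actual
  actual=$(kv_sign "$1" "$2") || return 1
  [ -n "$actual" ] && [ "$actual" = "$3" ]
}

# Example gate: never hand the payload to the deserializer unless it verifies.
# kv_verify "$JOB_KEY" value.bin "$EXPECTED_TAG" || { echo "reject" >&2; exit 1; }
```

An attacker who can write to the KVStore can still replace the tag, but cannot produce a valid one without the key. A valid tag does not, however, stop replay of an older legitimate value, so keys or tags should also be bound to the job and step if that matters.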