CVE-2026-31222: Snorkel RCE Vulnerability

CVE-2026-31222 Overview

CVE-2026-31222 is an insecure deserialization vulnerability [CWE-502] in the Snorkel library through version v0.10.0. The flaw resides in the Trainer.load() method of the Trainer class, which loads model checkpoint files using torch.load() without the weights_only=True parameter. Without that restriction, torch.load() deserializes arbitrary Python objects through Python's pickle module. A remote attacker who supplies a maliciously crafted model file can achieve arbitrary code execution on the victim's system when the file is loaded.

Critical Impact
Loading an attacker-supplied model checkpoint triggers arbitrary code execution under the privileges of the user running the Snorkel training pipeline.

Affected Products

Snorkel library versions up to and including v0.10.0
Python machine learning pipelines that invoke Trainer.load() on untrusted checkpoint files
Downstream applications that bundle the snorkel-team/snorkel package

Discovery Timeline

2026-05-12 - CVE-2026-31222 published to NVD
2026-05-15 - Last updated in NVD database

Technical Details for CVE-2026-31222

Vulnerability Analysis

The Snorkel library provides programmatic data labeling and weak supervision tooling for machine learning workflows. Its Trainer class persists model state through checkpoint files and restores that state via Trainer.load(). The load implementation delegates to PyTorch's torch.load() without setting weights_only=True, so the function deserializes Pickle streams as full Python object graphs.

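Python's pickle format can instruct the unpickler to call any importable callable with arguments chosen by whoever produced the stream. An object's __reduce__ method determines, at serialization time, which callable and arguments are written into the stream; the unpickler invokes them when it reaches the corresponding REDUCE opcode. An attacker who controls the contents of a checkpoint file can therefore craft a payload that calls os.system, subprocess.Popen, or any other importable callable. The result is arbitrary code execution during deserialization, before any model state is validated.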
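The mechanism can be illustrated with a harmless payload. The sketch below is not an exploit for Snorkel: the Demo class is a hypothetical name, and the builtin len stands in for whatever callable an attacker would choose. It shows that pickle.loads() returns the result of the recorded call instead of rebuilding a Demo instance.

```python
import pickle


class Demo:
    """Hypothetical class whose pickled form is 'call len("payload")'."""

    def __reduce__(self):
        # Runs at pickling time and records a callable plus its arguments.
        # A malicious file would record something like os.system here.
        return (len, ("payload",))


# Serializing stores a reference to builtins.len and the argument tuple.
stream = pickle.dumps(Demo())

# Deserializing invokes len("payload"); no Demo object is ever rebuilt.
restored = pickle.loads(stream)
```

Because the call happens inside the unpickler, no code in the loading application has a chance to inspect the object before the callable has already run.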

Successful exploitation compromises confidentiality, integrity, and availability of the host running the training or inference job. User interaction is required because a victim must invoke Trainer.load() on the malicious file.

Root Cause

The root cause is the call to torch.load() without weights_only=True. PyTorch releases before 2.6 default to weights_only=False, which permits full pickle deserialization. PyTorch's documentation explicitly warns that unpickling untrusted data can execute arbitrary code, and the weights_only=True argument was introduced to restrict deserialization to tensors, primitive types, and other allow-listed classes. Snorkel's Trainer.load() does not pass this flag and does not perform any signature or origin validation on the input file.
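The idea behind restricting deserialization can be sketched with the allow-list pattern described in the Python pickle documentation. This is an analogy only, not PyTorch's actual weights_only implementation; AllowListUnpickler, ALLOWED, and restricted_loads() are hypothetical names chosen for illustration.

```python
import io
import pickle

# Hypothetical allow-list: the only globals this loader will resolve.
ALLOWED = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not named in ALLOWED."""

    def find_class(self, module, name):
        # find_class is consulted whenever the stream references a global,
        # which is the step a malicious payload relies on.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize data, rejecting references to non-allow-listed globals."""
    return AllowListUnpickler(io.BytesIO(data)).load()
```

Plain containers and allow-listed classes load normally, while a stream that references os, subprocess, or any other unlisted module raises pickle.UnpicklingError before the callable is resolved.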

Attack Vector

Exploitation requires an attacker to deliver a crafted checkpoint file and convince a user to load it. Delivery channels include shared model registries, public model hubs, supply chain repositories, email attachments, and collaborative notebook environments. Once Trainer.load("malicious.pt") runs, the embedded Pickle reducer executes immediately under the calling process's permissions.

No verified public proof-of-concept exploit is available at the time of writing. The vulnerability pattern follows the well-documented Pickle reducer technique used against other PyTorch-based projects. See the Snorkel project repository for source-level context.

Detection Methods for CVE-2026-31222

Indicators of Compromise

Unexpected child processes spawned by Python interpreters running Snorkel training scripts, such as sh, bash, powershell.exe, or curl.
Checkpoint files (.pt, .pth, .pkl) originating from untrusted sources or recently downloaded to model directories.
Outbound network connections from ML training hosts to unfamiliar domains immediately after a checkpoint load operation.

Detection Strategies

Scan Python codebases for calls to Trainer.load() and torch.load() that omit weights_only=True.
Inspect pickle streams with tooling such as pickletools to flag GLOBAL and STACK_GLOBAL opcodes that reference the os, subprocess, builtins, posix, or nt modules.
Use endpoint behavioral analytics to correlate Python process activity with shell, network, and filesystem operations that deviate from normal training behavior.
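The opcode inspection strategy above can be sketched with the standard-library pickletools module, which disassembles a stream without executing it. The function name find_suspicious_globals and the SUSPICIOUS_MODULES set are assumptions for illustration, and the STACK_GLOBAL handling is a heuristic that takes the two most recently pushed strings as module and name. The function expects raw pickle bytes; checkpoints that torch.save() writes as zip archives would first need the embedded pickle member extracted.

```python
import pickletools

# Hypothetical deny-list of top-level modules worth flagging.
SUSPICIOUS_MODULES = {
    "os", "posix", "nt", "subprocess", "builtins", "__builtin__", "sys", "shutil",
}


def find_suspicious_globals(data: bytes):
    """Return 'module.name' strings for flagged global references in a pickle."""
    findings = []
    recent_strings = []  # string arguments seen so far, in stream order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name".
            module, _, name = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL":
            # Module and name were pushed as strings just before this opcode.
            if len(recent_strings) < 2:
                continue
            module, name = recent_strings[-2], recent_strings[-1]
        else:
            if isinstance(arg, str):
                recent_strings.append(arg)
            continue
        if module.split(".")[0] in SUSPICIOUS_MODULES:
            findings.append(f"{module}.{name}")
    return findings
```

A non-empty result is a reason to quarantine the file, but an empty result does not prove the file is safe: a payload can fetch strings from the memo or reach dangerous callables through modules that are not on the list.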

Monitoring Recommendations

Log and alert on process trees where python or jupyter is the parent of interactive shells or remote download utilities.
Track checkpoint file provenance through hash inventories and signed manifests in model registries.
Forward ML host telemetry to a centralized data lake for cross-correlation of file-load events with subsequent process and network behavior.

How to Mitigate CVE-2026-31222

Immediate Actions Required

Audit all uses of snorkel.classification.Trainer.load() and replace them with safe loading routines until an upstream patch is applied.
Treat any existing checkpoint files from external sources as untrusted and quarantine them pending verification.
Restrict who can publish to internal model registries and require signed artifacts for checkpoint distribution.
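The audit step can be partly automated with the standard-library ast module. The following is a minimal sketch under stated assumptions: find_unsafe_torch_load is a hypothetical helper, it only recognizes calls written literally as torch.load(...), and it will miss aliased imports such as "from torch import load". Calls to Trainer.load() still need a manual search because the receiver name varies between codebases.

```python
import ast


def find_unsafe_torch_load(source: str, filename: str = "<string>"):
    """Return line numbers of torch.load(...) calls lacking weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Only a literal weights_only=True counts as safe; a missing keyword,
        # False, or a non-literal expression is reported for manual review.
        is_safe = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not is_safe:
            findings.append(node.lineno)
    return sorted(findings)
```

Running the function over each Python file in a repository gives a review list; every reported line should either gain weights_only=True or be confirmed to load only trusted files.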

Patch Information

No fixed version is referenced in the published NVD entry at the time of writing. Monitor the Snorkel project repository for a release that enforces weights_only=True or replaces Pickle-based serialization. Until a patched release is available, apply the workarounds below.

Workarounds

Wrap or monkey-patch Trainer.load() to call torch.load(path, weights_only=True) and reject files that fail to load under that constraint.
Run training and inference jobs in isolated containers or sandboxes with no outbound network access and minimal filesystem permissions.
Verify checkpoint integrity with cryptographic signatures or hashes generated from a trusted publisher before loading.

bash

# Configuration example: write a wrapper module that enforces weights_only loading
cat > safe_loader.py <<'EOF'
import torch


def safe_load(path):
    # weights_only=True restricts unpickling to tensors and other allow-listed types
    return torch.load(path, weights_only=True, map_location="cpu")
EOF
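The integrity check from the workarounds list can be sketched as a SHA-256 comparison performed before any loader touches the file. The helper names sha256_of and verify_checkpoint are hypothetical, and the sketch assumes the expected digest was obtained from the trusted publisher over a separate channel; a digest shipped alongside the checkpoint by the same untrusted source proves nothing.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path: str, expected_sha256: str) -> bool:
    """Return True only if the file's digest matches the trusted value."""
    # compare_digest compares in constant time; hex case is normalized first.
    return hmac.compare_digest(sha256_of(path), expected_sha256.strip().lower())
```

A caller would refuse to load any checkpoint for which verify_checkpoint() returns False, and would still pass weights_only=True when loading files that do match.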
