CVE-2026-31222 Overview
CVE-2026-31222 is an insecure deserialization vulnerability [CWE-502] in the Snorkel library through version v0.10.0. The flaw resides in the Trainer.load() method of the Trainer class, which loads model checkpoint files using torch.load() without the weights_only=True parameter. This default behavior permits deserialization of arbitrary Python objects through the Pickle module. A remote attacker who supplies a maliciously crafted model file can achieve arbitrary code execution on the victim's system when the file is loaded.
Critical Impact
Loading an attacker-supplied model checkpoint triggers arbitrary code execution under the privileges of the user running the Snorkel training pipeline.
Affected Products
- Snorkel library versions up to and including v0.10.0
- Python machine learning pipelines that invoke Trainer.load() on untrusted checkpoint files
- Downstream applications that bundle the snorkel package (source repository: snorkel-team/snorkel)
Discovery Timeline
- 2026-05-12 - CVE-2026-31222 published to NVD
- 2026-05-15 - Last updated in NVD database
Technical Details for CVE-2026-31222
Vulnerability Analysis
The Snorkel library provides programmatic data labeling and weak supervision tooling for machine learning workflows. Its Trainer class persists model state through checkpoint files and restores that state via Trainer.load(). The load implementation delegates to PyTorch's torch.load() without setting weights_only=True, so the function deserializes Pickle streams as full Python object graphs.
When Python unpickles a stream, it calls whatever callable the stream names, with the arguments the stream supplies. An object's __reduce__ method is the usual way to place such a (callable, arguments) pair into a pickle, so an attacker who controls the contents of a checkpoint file can craft one that calls os.system, subprocess.Popen, or any other importable callable. The code runs during deserialization, before any model state is validated.
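The reducer mechanism described above can be observed with a deliberately harmless payload. The following is a minimal sketch using only the standard library; the class name Demo is hypothetical, and len() stands in for the dangerous callable an attacker would choose.

```python
import pickle


class Demo:
    """Stand-in for an attacker-controlled object; the payload is harmless."""

    def __reduce__(self):
        # pickle records this (callable, args) pair in the stream.
        # A real attack would name os.system or similar; len() is used
        # here so the demonstration has no side effects.
        return (len, ("hello",))


blob = pickle.dumps(Demo())

# Unpickling calls len("hello") and returns its result.
# No Demo instance is ever reconstructed.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The load returns an int rather than a Demo object, which shows that the callable named in the stream ran as part of deserialization.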
Successful exploitation compromises confidentiality, integrity, and availability of the host running the training or inference job. User interaction is required because a victim must invoke Trainer.load() on the malicious file.
Root Cause
The root cause is that Trainer.load() calls torch.load() without weights_only=True. PyTorch releases before 2.6 default this argument to False, which lets the unpickler reconstruct arbitrary Python objects. PyTorch's documentation has explicitly warned that loading pickle data is equivalent to executing arbitrary code, and weights_only=True was introduced to restrict deserialization to tensors, primitive types, dictionaries, and explicitly allowlisted types. Snorkel's Trainer.load() does not pass this flag and does not perform any signature or origin validation on the input file.
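The idea behind restricting deserialization can be illustrated with the standard library alone. The sketch below is not PyTorch's implementation of weights_only; it shows the general allowlisting technique from the Python pickle documentation, and the names ALLOWED_GLOBALS, AllowlistUnpickler, restricted_loads, and Payload are hypothetical.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist for illustration; PyTorch's real list differs.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream asks the unpickler to import.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    """Unpickle bytes while refusing any global outside the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        return (len, ("hello",))  # harmless stand-in for os.system


# A plain container of floats loads normally.
safe_blob = pickle.dumps(OrderedDict(weight=[1.0, 2.0]))
print(restricted_loads(safe_blob))

# A stream that names any other callable is rejected before it runs.
try:
    restricted_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

An unrestricted loader never consults an allowlist, which is why a call without the restriction accepts any importable callable.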
Attack Vector
Exploitation requires an attacker to deliver a crafted checkpoint file and convince a user to load it. Delivery channels include shared model registries, public model hubs, supply chain repositories, email attachments, and collaborative notebook environments. Once Trainer.load("malicious.pt") runs, the embedded Pickle reducer executes immediately under the calling process's permissions.
No verified public proof-of-concept exploit is available at the time of writing. The vulnerability pattern follows the well-documented Pickle reducer technique used against other PyTorch-based projects. See the Snorkel project repository for source-level context.
Detection Methods for CVE-2026-31222
Indicators of Compromise
- Unexpected child processes spawned by Python interpreters running Snorkel training scripts, such as sh, bash, powershell.exe, or curl.
- Checkpoint files (.pt, .pth, .pkl) originating from untrusted sources or recently downloaded to model directories.
- Outbound network connections from ML training hosts to unfamiliar domains immediately after a checkpoint load operation.
Detection Strategies
- Scan Python codebases for calls to Trainer.load() and torch.load() that omit weights_only=True.
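- Inspect pickle streams with tooling such as pickletools to flag GLOBAL and STACK_GLOBAL opcodes that reference the os, subprocess, builtins, or posix modules. Checkpoints written by recent torch.save() versions are zip archives, so inspect the pickle member inside the archive (typically data.pkl) rather than the outer file.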
- Use endpoint behavioral analytics to correlate Python process activity with shell, network, and filesystem operations that deviate from normal training behavior.
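The opcode inspection mentioned in the list above could be sketched as follows. This is a triage aid under stated assumptions, not a complete scanner: the function names and the SUSPICIOUS_MODULES set are hypothetical, STACK_GLOBAL targets are resolved with a simple heuristic that memoized or obfuscated streams can evade, and for non-zip files only the first pickle in the file is examined.

```python
import pickle
import pickletools
import zipfile

# Hypothetical denylist; "nt" and "__builtin__" cover Windows and
# Python 2 style module names for the same functionality.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "__builtin__"}


def referenced_globals(pickle_bytes):
    """List (module, name) pairs a pickle stream imports, without executing it."""
    found = set()
    strings = []  # string constants seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        if opcode.name in ("GLOBAL", "INST"):
            module, _, name = arg.partition(" ")
            found.add((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: module and name are usually the two latest strings.
            found.add((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found


def scan_checkpoint(path):
    """Return suspicious globals referenced by a checkpoint file."""
    if zipfile.is_zipfile(path):
        # Assumption: zip-format checkpoints keep the object graph in *.pkl members.
        with zipfile.ZipFile(path) as archive:
            streams = [archive.read(n) for n in archive.namelist() if n.endswith(".pkl")]
    else:
        with open(path, "rb") as handle:
            streams = [handle.read()]
    hits = set()
    for stream in streams:
        for module, name in referenced_globals(stream):
            if module.split(".")[0] in SUSPICIOUS_MODULES:
                hits.add((module, name))
    return hits


class Payload:
    def __reduce__(self):
        return (len, ("hello",))  # harmless stand-in for a dangerous callable


blob = pickle.dumps(Payload(), protocol=4)
print(referenced_globals(blob))
```

Because genops only decodes opcodes and never builds objects, the scan itself does not run the payload. An empty result is not proof that a file is safe.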
Monitoring Recommendations
- Log and alert on process trees where python or jupyter is the parent of interactive shells or remote download utilities.
- Track checkpoint file provenance through hash inventories and signed manifests in model registries.
- Forward ML host telemetry to a centralized data lake for cross-correlation of file-load events with subsequent process and network behavior.
How to Mitigate CVE-2026-31222
Immediate Actions Required
- Audit all uses of snorkel.classification.Trainer.load() and replace them with safe loading routines until an upstream patch is applied.
- Treat any existing checkpoint files from external sources as untrusted and quarantine them pending verification.
- Restrict who can publish to internal model registries and require signed artifacts for checkpoint distribution.
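The code audit in the first item above could begin with a small static check. The sketch below uses the standard ast module to report torch.load() calls that do not pass weights_only=True as a literal; the function name find_unsafe_torch_load is hypothetical. It only matches calls written as torch.load(...), so aliased imports and Trainer.load() call sites, whose receiver variable can have any name, still need a manual or text-based search.

```python
import ast


def find_unsafe_torch_load(source, filename="<string>"):
    """Return line numbers of torch.load(...) calls lacking weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Only a literal True counts; variables and False are reported.
        safe = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not safe:
            findings.append(node.lineno)
    return sorted(findings)


sample = '''
import torch
state = torch.load("ckpt.pt")
ok = torch.load("ckpt.pt", weights_only=True)
bad = torch.load("ckpt.pt", weights_only=False)
'''
print(find_unsafe_torch_load(sample))  # prints [3, 5]
```

Running the function over each file in a repository gives a starting list of call sites to review by hand.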
Patch Information
No fixed version is referenced in the published NVD entry at the time of writing. Monitor the Snorkel project repository for a release that enforces weights_only=True or replaces Pickle-based serialization. Until a patched release is available, apply the workarounds below.
Workarounds
- Wrap or monkey-patch Trainer.load() to call torch.load(path, weights_only=True) and reject files that fail to load under that constraint.
- Run training and inference jobs in isolated containers or sandboxes with no outbound network access and minimal filesystem permissions.
- Verify checkpoint integrity with cryptographic signatures or hashes generated from a trusted publisher before loading.
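# Configuration example: enforce weights_only loading in a wrapper module
# safe_loader.py
import torch


def safe_load(path):
    """Load a checkpoint while refusing arbitrary pickled objects."""
    # weights_only=True limits unpickling to tensors and basic containers;
    # map_location="cpu" avoids needing a GPU just to open the file.
    return torch.load(path, weights_only=True, map_location="cpu")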
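The integrity check listed among the workarounds can be sketched in the same spirit. This is a minimal example under stated assumptions: the names sha256_of and load_if_trusted are hypothetical, the expected digest is assumed to come from a trusted publisher over a separate channel, and the loader argument is any callable that takes a path.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_if_trusted(path, expected_sha256, loader):
    """Call loader(path) only when the file matches the pinned digest."""
    actual = sha256_of(path)
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"checkpoint digest mismatch for {path}")
    return loader(path)
```

A caller might write load_if_trusted("model.pt", pinned_digest, safe_load), passing the safe_load wrapper as the loader. A matching digest only shows that the file is the one the publisher released, so it complements restricted loading and does not replace it.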


