CVE-2026-31224 Overview
CVE-2026-31224 is an insecure deserialization vulnerability [CWE-502] in the Snorkel machine learning library through version 0.10.0. The flaw resides in the load() method of the MultitaskClassifier class, which invokes torch.load() without setting weights_only=True and therefore allows arbitrary Python objects to be deserialized through the pickle module. A remote attacker can craft a malicious model file that executes arbitrary code when a victim loads it. The vulnerability carries a CVSS 3.1 score of 8.8 and affects users who load untrusted Snorkel model checkpoints.
Critical Impact
Loading a malicious Snorkel model file triggers arbitrary code execution in the context of the user running the Python process.
Affected Products
- Snorkel library versions up to and including 0.10.0
- Applications and pipelines invoking MultitaskClassifier.load()
- Downstream ML workflows that ingest third-party Snorkel checkpoints
Discovery Timeline
- 2026-05-12 - CVE-2026-31224 published to NVD
- 2026-05-13 - Last updated in NVD database
Technical Details for CVE-2026-31224
Vulnerability Analysis
The Snorkel library provides weak supervision and multi-task learning utilities for training machine learning models. The MultitaskClassifier class exposes a load() method that restores previously serialized model weights from disk. Internally, the method calls PyTorch's torch.load() function without specifying weights_only=True. Without that restriction, torch.load() unpickles the file with Python's pickle module. During unpickling, pickle rebuilds each object by calling the callable, with the arguments, that the object's __reduce__ method recorded in the file when it was serialized.
An attacker who controls the model file can embed a malicious object whose __reduce__ returns a callable such as os.system or subprocess.Popen along with attacker-supplied arguments. When the victim calls MultitaskClassifier.load() on the file, the embedded payload executes immediately in the Python interpreter. The attacker gains the same privileges as the user running the workflow.
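The mechanism can be illustrated with a deliberately harmless payload. The sketch below is not Snorkel or PyTorch code: the class name BenignPayload is hypothetical, and eval on a constant arithmetic expression stands in for the os.system or subprocess.Popen call an attacker would choose.

```python
import pickle


class BenignPayload:
    """Hypothetical stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # pickle records this (callable, args) pair in the byte stream.
        # A real attack would name os.system or subprocess.Popen here.
        return (eval, ("6 * 7",))


# Serialization stores a reference to builtins.eval plus its argument.
blob = pickle.dumps(BenignPayload())

# Deserialization calls eval("6 * 7"); no BenignPayload instance comes back.
result = pickle.loads(blob)
print(result)
```

Because the unpickler runs the recorded call before it returns, the code executes even if the caller would later reject the loaded object.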
Root Cause
The root cause is that torch.load() is called without restricting what it may deserialize. PyTorch introduced weights_only=True to constrain deserialization to tensors and basic data types. Snorkel's MultitaskClassifier.load() does not pass this parameter, leaving pickle-based object reconstruction enabled. Any file accepted by the loader is implicitly treated as trusted code.
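Broadly, a restricted loader works by limiting which globals the unpickler may resolve. The standard-library sketch below shows that idea with a pickle.Unpickler subclass. It is an analogue for illustration only, not PyTorch's implementation, and the names ALLOWED, AllowlistUnpickler, restricted_loads, and EvalPayload are hypothetical.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allowlist: the only global this loader will resolve.
ALLOWED = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks to import a global.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Unpickle data, refusing any global outside ALLOWED."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


class EvalPayload:
    """Benign stand-in for a malicious object."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


# Plain data built from allowlisted types loads normally.
weights = restricted_loads(pickle.dumps(OrderedDict(layer=[0.1, 0.2])))

# A stream that references builtins.eval is rejected before anything runs.
try:
    restricted_loads(pickle.dumps(EvalPayload()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(weights, blocked)
```

The design choice is an allowlist rather than a denylist: anything not explicitly expected in a weights file is refused.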
Attack Vector
Exploitation requires the victim to load a model file supplied by the attacker. Distribution channels include public model hubs, shared training artifacts, supply-chain compromise of ML repositories, and email or messaging attachments. User interaction is required, which is reflected in the CVSS vector component UI:R. Delivery over the network is straightforward because model files are routinely exchanged across organizational boundaries.
No verified public proof-of-concept code is available. The exploitation pattern follows the well-documented Pickle-based code execution technique used against other unsafe torch.load() callers. See the GitHub Snorkel Repository for source context.
Detection Methods for CVE-2026-31224
Indicators of Compromise
- Unexpected child processes spawned by Python interpreters running Snorkel workloads, such as sh, bash, cmd.exe, or powershell.exe.
- Outbound network connections initiated immediately after a call to MultitaskClassifier.load().
- Snorkel model files (.pt, .pth, .pkl) originating from untrusted sources or recently downloaded from public repositories.
Detection Strategies
- Inspect model files with tools such as pickletools or fickling and flag opcodes that import or call objects, such as GLOBAL, STACK_GLOBAL, and REDUCE, especially when they reference os, subprocess, or builtins.eval.
- Add static analysis rules that flag calls to torch.load() without weights_only=True in CI pipelines.
- Monitor process trees in data science environments for Python parents spawning shell or interpreter children.
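The opcode inspection described above can be sketched with the standard library alone. The following is a heuristic under stated assumptions, not a replacement for a dedicated scanner: the DENYLIST contents and all function names are hypothetical, STACK_GLOBAL targets are guessed from the two most recent string pushes, and zip-format checkpoints are assumed to keep their pickle streams in members ending in .pkl.

```python
import io
import pickle
import pickletools
import zipfile

# Hypothetical denylist of top-level modules; tune it for your environment.
DENYLIST = {"os", "posix", "nt", "subprocess", "builtins", "__builtin__",
            "sys", "socket", "runpy", "importlib"}
STRING_OPS = {"UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8"}


def pickle_streams(blob: bytes) -> list[bytes]:
    """Return raw pickle streams: *.pkl zip members, or the blob itself."""
    buf = io.BytesIO(blob)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            return [zf.read(n) for n in zf.namelist() if n.endswith(".pkl")]
    return [blob]


def referenced_globals(stream: bytes) -> list[tuple[str, str]]:
    """List (module, name) pairs imported by the stream, without loading it."""
    found, strings = [], []
    for opcode, arg, _pos in pickletools.genops(stream):
        if opcode.name in STRING_OPS:
            strings.append(arg)
        elif opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: the two most recent string pushes are module and name.
            found.append((strings[-2], strings[-1]))
    return found


def suspicious_globals(blob: bytes) -> list[tuple[str, str]]:
    """Return referenced globals whose top-level module is on the denylist."""
    return [
        (module, name)
        for stream in pickle_streams(blob)
        for module, name in referenced_globals(stream)
        if module.split(".")[0] in DENYLIST
    ]


class EvalPayload:
    """Benign stand-in for a malicious object."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


print(suspicious_globals(pickle.dumps(EvalPayload())))
print(suspicious_globals(pickle.dumps({"weights": [0.1, 0.2]})))
```

The scan only disassembles the stream and never unpickles it, so it is safe to run on untrusted files. Legitimate checkpoints also contain GLOBAL and REDUCE opcodes, which is why the sketch reports what is imported rather than the mere presence of those opcodes.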
Monitoring Recommendations
- Forward endpoint process telemetry from ML training and inference hosts to a centralized analytics platform for behavioral analysis.
- Alert on Snorkel and PyTorch processes performing file writes outside expected model directories or reading from credential stores.
- Track ingress of model artifacts and correlate file hashes against allowlists of vetted checkpoints.
How to Mitigate CVE-2026-31224
Immediate Actions Required
- Stop loading Snorkel model files from untrusted sources until a patched version is deployed.
- Audit existing pipelines for direct or indirect calls to MultitaskClassifier.load() and quarantine any checkpoints of unknown provenance.
- Isolate ML training and inference workloads in sandboxed environments with restricted network egress and limited filesystem access.
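For in-house code, the pipeline audit and the CI rule suggested under Detection Strategies could start from a small syntax-tree check. This is a minimal sketch with a hypothetical function name: it only recognizes calls written literally as torch.load(...), so aliased imports such as "from torch import load" would need extra handling.

```python
import ast


def find_unsafe_torch_load(source: str, filename: str = "<string>") -> list[int]:
    """Return line numbers of torch.load(...) calls lacking weights_only=True."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Only a literal weights_only=True keyword counts as safe.
        safe = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not safe:
            hits.append(node.lineno)
    return sorted(hits)


sample = (
    "import torch\n"
    "a = torch.load('m.pt')\n"
    "b = torch.load('m.pt', weights_only=True)\n"
    "c = torch.load('m.pt', weights_only=False)\n"
)
print(find_unsafe_torch_load(sample))
```

In CI, the function could be run over every tracked .py file and the job failed when the returned list is non-empty.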
Patch Information
No vendor advisory or fixed release was referenced in the NVD entry at the time of publication. Monitor the GitHub Snorkel Repository for an upstream fix that passes weights_only=True to torch.load() or migrates to a safer serialization format such as safetensors. Refer to the Notion CVE-2026-31224 Documentation for additional reporter notes.
Workarounds
- Wrap MultitaskClassifier.load() calls with a custom loader that invokes torch.load(path, weights_only=True) before reconstructing the classifier state.
- Validate model files against cryptographic signatures or hashes from a trusted internal registry before loading.
- Execute model loading inside containers or virtual machines with no credentials, no production network access, and ephemeral filesystems.
# Example: scan a Snorkel checkpoint for unsafe pickle opcodes (model_checkpoint.pt is a placeholder path)
pip install fickling
fickling --check-safety model_checkpoint.pt
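The custom-loader and hash-validation workarounds listed above can be combined. The following is a minimal sketch, assuming the expected digest comes from a trusted internal registry and that PyTorch is installed where the checkpoint is loaded. The function names sha256_of and load_vetted_state are hypothetical and are not part of Snorkel.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_vetted_state(path: str, expected_sha256: str):
    """Verify the checkpoint hash, then load it with weights_only=True."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checkpoint hash mismatch for {path}: {actual}")
    # Imported lazily so the hash check itself does not need PyTorch.
    import torch

    return torch.load(path, map_location="cpu", weights_only=True)
```

Assuming the checkpoint holds a plain state dictionary, the returned object could then be passed to the classifier's load_state_dict() instead of calling MultitaskClassifier.load() on the file. Whether that matches what the installed Snorkel version writes on save should be confirmed before relying on it.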