CVE-2026-31214: ml-engineering torch-checkpoint RCE Flaw

CVE-2026-31214 Overview

CVE-2026-31214 is an insecure deserialization vulnerability [CWE-502] in the torch-checkpoint-shrink.py script from the ml-engineering project. The vulnerable code exists in commit 0099885db36a8f06556efe1faf552518852cb1e0. The script calls torch.load() on PyTorch checkpoint files without setting weights_only=True. This allows the underlying pickle module to deserialize arbitrary Python objects supplied by an attacker. A remote attacker who delivers a crafted .pt checkpoint file can achieve arbitrary code execution as the user running the script. Machine learning workflows that consume third-party checkpoints face direct exposure.

Critical Impact
Loading an attacker-supplied PyTorch checkpoint triggers arbitrary code execution in the context of the user running torch-checkpoint-shrink.py.

Affected Products

ml-engineering project, torch-checkpoint-shrink.py script
Commit 0099885db36a8f06556efe1faf552518852cb1e0 and earlier revisions containing the unsafe torch.load() call
Any downstream tooling or pipeline that invokes this script on untrusted .pt files

Discovery Timeline

2026-05-12 - CVE-2026-31214 published to NVD
2026-05-13 - Last updated in NVD database

Technical Details for CVE-2026-31214

Vulnerability Analysis

The vulnerability lives in the checkpoint loading path of torch-checkpoint-shrink.py. The script uses torch.load() to read PyTorch checkpoint files with the .pt extension. In PyTorch releases before 2.6, torch.load() deserializes with Python's unrestricted pickle module unless weights_only=True is passed explicitly. The pickle format lets a serialized object name any importable callable together with its arguments, and the unpickler invokes that callable while loading. An attacker who controls a checkpoint file can therefore run code on the host that processes it. The attack vector is network-reachable because checkpoint files are routinely downloaded from model hubs, shared via collaboration platforms, or pulled from artifact registries.

Root Cause

The root cause is the absence of the weights_only=True parameter on the torch.load() invocation at line 57 of the script. Without that flag, PyTorch releases before 2.6 use the unrestricted pickle-based loader. That loader does not constrain which classes or callables can be reconstructed during deserialization, so trust is implicitly granted to whoever produced the checkpoint.

Attack Vector

An attacker crafts a malicious .pt file whose pickle payload is generated from an object with a __reduce__ method that returns a callable, such as os.system or subprocess.Popen, together with attacker-controlled arguments. The attacker distributes the file through a model repository, a shared bucket, or a download link. When a victim runs torch-checkpoint-shrink.py against the file, torch.load() invokes the pickle deserializer, which calls the embedded callable. The code runs with the privileges of the user executing the script. See the GitHub Script Example for the affected call site.
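The mechanism can be illustrated with a deliberately harmless payload. The sketch below uses only the standard pickle module rather than torch.load(), and substitutes os.getcwd for a dangerous callable. The class name HarmlessPayload is hypothetical and does not come from the affected script.

```python
import os
import pickle


class HarmlessPayload:
    """Hypothetical stand-in for a malicious object (illustration only)."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair in the stream. A real
        # attack would name a dangerous callable here; os.getcwd is used
        # because it has no side effects.
        return (os.getcwd, ())


stream = pickle.dumps(HarmlessPayload())

# Unpickling calls os.getcwd() and returns its result. No HarmlessPayload
# instance is rebuilt: the loader receives whatever the chosen callable
# returned.
loaded = pickle.loads(stream)
print(type(loaded).__name__)
```

The point of the example is that the call happens inside the load itself, before the consuming script has any chance to inspect the result.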

Detection Methods for CVE-2026-31214

Indicators of Compromise

Unexpected child processes spawned by Python interpreters running torch-checkpoint-shrink.py, such as shells, curl, wget, or package managers
Outbound network connections from Python processes immediately after a checkpoint load operation
New files written under home directories, /tmp, or model cache paths during checkpoint processing
.pt files originating from untrusted sources or with anomalous size relative to declared model dimensions

Detection Strategies

Hunt process trees where a python process is the parent of non-ML utilities such as /bin/sh, bash, nc, or powershell during checkpoint loading windows
Inspect .pt files for suspicious opcodes by scanning their pickle streams for GLOBAL or STACK_GLOBAL references to modules such as os, subprocess, posix, or builtins
Flag torch.load() call sites in code review and CI scanning that omit weights_only=True
Correlate file download events for .pt artifacts with subsequent process and network activity on the same host
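A minimal sketch of the opcode inspection described above, using only the standard library. It never unpickles anything: pickletools.genops() only parses the stream. The deny-list, the function names, and the assumption that the checkpoint is a zip archive whose pickle stream sits in a member ending in .pkl are illustrative choices, not part of the advisory. The STACK_GLOBAL handling is a heuristic that assumes the module and attribute names are the two most recently pushed strings, so a determined attacker could evade it. Treat a clean result as one signal, not as proof of safety.

```python
import pickletools
import zipfile

# Assumed deny-list for illustration; extend it to match your threat model.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "__builtin__"}


def find_suspicious_globals(data):
    """Return (module, name) pairs imported by a pickle stream, without loading it."""
    hits = []
    recent_strings = []  # string arguments seen so far, for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # genops reports the argument as "module name" in one string.
            module, _, name = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL":
            if len(recent_strings) < 2:
                continue
            module, name = recent_strings[-2], recent_strings[-1]
        else:
            if isinstance(arg, str):
                recent_strings.append(arg)
            continue
        if module.split(".")[0] in SUSPICIOUS_MODULES:
            hits.append((module, name))
    return hits


def scan_checkpoint(path):
    """Scan every .pkl member of a zip-format checkpoint file."""
    if not zipfile.is_zipfile(path):
        raise ValueError("not a zip-format checkpoint; legacy formats are out of scope")
    hits = []
    with zipfile.ZipFile(path) as archive:
        for member in archive.namelist():
            if member.endswith(".pkl"):
                hits.extend(find_suspicious_globals(archive.read(member)))
    return hits
```

A call such as scan_checkpoint("checkpoint.pt") returns an empty list when no deny-listed import is found. Any non-empty result is a reason to quarantine the file.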

Monitoring Recommendations

Enable command-line and process-creation auditing on ML training and inference hosts
Log all file reads of .pt and .pth artifacts and the user account performing the load
Alert on Python processes initiating outbound TCP connections to non-allowlisted destinations
Track integrity hashes of checkpoint files against a known-good registry before consumption
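The integrity check in the last item could look roughly like the following sketch. The manifest format (a dictionary mapping file names to SHA-256 hex digests) and both function names are assumptions made for illustration. How the manifest is produced and protected is outside the scope of this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path, manifest):
    """Return True only if the file is listed in the manifest and its digest matches.

    `manifest` is a hypothetical dict of {file name: SHA-256 hex digest}.
    Files missing from the manifest are rejected rather than allowed.
    """
    expected = manifest.get(Path(path).name)
    return expected is not None and expected == sha256_of(path)
```

Rejecting unlisted files by default keeps a newly downloaded checkpoint from being consumed before someone has recorded a known-good hash for it.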

How to Mitigate CVE-2026-31214

Immediate Actions Required

Stop running torch-checkpoint-shrink.py against checkpoints from untrusted or unverified sources
Patch the script locally by adding weights_only=True to the torch.load() call at line 57
Audit the repository for additional torch.load() usages that omit the safe-loading flag
Quarantine .pt files received from external collaborators until they are scanned and validated
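The repository audit can be partly automated. The sketch below walks a Python source string with the standard ast module and reports the line numbers of torch.load() calls that do not pass weights_only=True as a literal keyword. The function name is hypothetical, and the check is intentionally narrow: it only recognizes the spelling torch.load(...), so aliased imports such as "from torch import load" need a separate pass.

```python
import ast


def find_unsafe_torch_load(source, filename="<string>"):
    """Return sorted line numbers of torch.load() calls lacking weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Only a literal True counts; weights_only=False or a variable is flagged.
        explicitly_safe = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not explicitly_safe:
            findings.append(node.lineno)
    return sorted(findings)
```

Running it over each .py file in the repository, for example with pathlib.Path.rglob("*.py"), gives a list of call sites to review by hand.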

Patch Information

No upstream patch is currently referenced in the NVD record. Apply a local fix by modifying the torch.load() invocation to pass weights_only=True, which restricts deserialization to tensors, primitive types, and an allowlist of known-safe types, and rejects arbitrary Python objects. PyTorch versions 2.6 and later enable this behavior by default, but the script should still set the parameter explicitly for clarity and so that it remains safe on older PyTorch installations. Monitor the upstream ml-engineering repository for a forthcoming fix.

Workarounds

Run the script inside an ephemeral container or sandbox with no network egress and no access to credentials
Pre-validate checkpoint files by loading them with weights_only=True in an isolated process before any conversion step
Restrict execution to dedicated service accounts without write access to source code or secrets
Enforce checksum verification against a trusted manifest for every .pt artifact entering the pipeline

bash

# Safe-load check: exits non-zero if the checkpoint cannot be loaded with weights_only=True
python -c "import torch; torch.load('checkpoint.pt', map_location='cpu', weights_only=True)"
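The same check can be wrapped so that it runs in a separate interpreter before any conversion step, in line with the pre-validation workaround above. This is a sketch under stated assumptions: the function name and timeout are illustrative, torch must be installed for the interpreter that is launched, and a child process is not a sandbox by itself, so it should still run inside the restricted container or service account described earlier.

```python
import subprocess
import sys

# Code executed in the child interpreter; the checkpoint path arrives as argv[1].
_LOADER = (
    "import sys, torch; "
    "torch.load(sys.argv[1], map_location='cpu', weights_only=True)"
)


def checkpoint_loads_safely(path, timeout=300):
    """Return True only if a child process loads `path` with weights_only=True.

    Any failure is treated as a rejection: a missing file, a missing torch
    installation, a payload refused by the restricted unpickler, or a timeout.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", _LOADER, path],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Failing closed means an environment problem shows up as a rejected checkpoint rather than as a silently skipped check.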