CVE-2026-31253: Flash-Attention Framework RCE Vulnerability

CVE-2026-31253 Overview

CVE-2026-31253 is an insecure deserialization vulnerability [CWE-502] in the flash-attention training framework maintained by Dao-AILab. The flaw exists in the framework's checkpoint loading mechanism through commit e724e2588cbe754beb97cf7c011b5e7e34119e62. The load_checkpoint() function in checkpoint.py and the checkpoint loading logic in eval.py call torch.load() without the weights_only=True parameter. This permits deserialization of arbitrary Python objects via the pickle module. An attacker who supplies a maliciously crafted checkpoint file can execute arbitrary code when a victim loads it during model warmstarting or evaluation.

Critical Impact
Arbitrary code execution on the victim's system when loading an attacker-supplied checkpoint file into flash-attention.

Affected Products

flash-attention training framework (Dao-AILab) through commit e724e2588cbe754beb97cf7c011b5e7e34119e62
checkpoint.py load_checkpoint() function
eval.py checkpoint loading code path

Discovery Timeline

2026-05-11 - CVE-2026-31253 published to NVD
2026-05-12 - Last updated in NVD database

Technical Details for CVE-2026-31253

Vulnerability Analysis

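The flash-attention framework loads model checkpoints using PyTorch's torch.load() function. When weights_only is not enabled, which was the default before PyTorch 2.6, torch.load() deserializes data using Python's pickle module. During unpickling, pickle can import arbitrary callables and invoke them with attacker-chosen arguments, which is how the instructions recorded by an object's __reduce__ method are replayed. PyTorch introduced the weights_only=True parameter to restrict deserialization to tensors, primitive types, plain containers, and an explicit allowlist of types. The vulnerable code paths in checkpoint.py and eval.py omit this parameter, so wherever the installed PyTorch default is permissive, the unrestricted behavior remains in place.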

This pattern is common across machine learning frameworks that share pretrained weights between users and organizations. Researchers, model hubs, and collaborative training pipelines routinely exchange checkpoint files. A single malicious checkpoint can compromise any system that loads it.

Root Cause

The root cause is missing input validation on deserialized data. The load_checkpoint() function trusts the contents of the checkpoint file and passes it directly to pickle-based deserialization. There is no integrity check, signature verification, or restriction on which Python classes may be reconstructed during the load process.
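The missing restriction on reconstructable classes can be illustrated with the standard library alone. The sketch below follows the restricted-unpickler pattern described in the Python pickle documentation: it overrides find_class() so that only allowlisted globals resolve. The names SAFE_GLOBALS, AllowlistUnpickler, and restricted_loads are hypothetical, and the allowlist is far too small for real PyTorch checkpoints. It shows the kind of control that weights_only=True provides and is not a replacement for it.

```python
import io
import pickle

# Hypothetical allowlist for illustration only. A real checkpoint loader would
# also need the tensor-rebuilding helpers that PyTorch's weights_only mode permits.
SAFE_GLOBALS = {
    ("collections", "OrderedDict"),
    ("builtins", "dict"),
    ("builtins", "list"),
}


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not named in SAFE_GLOBALS."""

    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Deserialize bytes, rejecting references to non-allowlisted callables."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

With this in place, a payload that references a callable outside the allowlist raises pickle.UnpicklingError before the callable is ever invoked.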

Attack Vector

An attacker crafts a checkpoint file containing a Python object whose __reduce__ method returns a callable such as os.system or subprocess.Popen along with attacker-controlled arguments. The attacker distributes this file through model-sharing platforms, pull requests, supply-chain compromise, or social engineering targeting researchers. When the victim invokes flash-attention's warmstart or evaluation flow against the file, the embedded payload runs with the privileges of the loading process. See the flash-attention GitHub repository for the affected code paths.
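The mechanism can be observed safely with a harmless callable. In the sketch below, the hypothetical Payload class returns the built-in len from __reduce__ in place of os.system, so loading the bytes calls len() and returns its result instead of rebuilding a Payload object. Nothing here touches PyTorch or flash-attention; it only demonstrates the pickle behavior that the permissive torch.load() path inherits.

```python
import pickle


class Payload:
    """Benign stand-in for a malicious object embedded in a checkpoint."""

    def __reduce__(self):
        # pickle records "call len('ran during load')" instead of object state.
        # An attacker would return os.system or subprocess.Popen here.
        return (len, ("ran during load",))


blob = pickle.dumps(Payload())

# Loading the bytes invokes the recorded callable with the recorded arguments.
result = pickle.loads(blob)

# The loader gets back whatever the callable returned, not a Payload instance.
print(type(result).__name__, result)
```

The victim never has to call a method on the loaded object: the call happens inside pickle.loads() itself, which is why inspecting the returned value after loading is too late.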

Detection Methods for CVE-2026-31253

Indicators of Compromise

Checkpoint files (.pt, .pth, .ckpt) obtained from untrusted or unverified sources prior to patching.
Unexpected child processes spawned by Python training or evaluation jobs, particularly shells, network tools, or package installers.
Outbound network connections from training hosts to unfamiliar domains shortly after a checkpoint load operation.

Detection Strategies

Use static analysis tools to scan repositories and shared storage for torch.load() calls that omit weights_only=True.
Use pickletools or equivalent inspection utilities to dump opcodes from suspect checkpoint files and flag REDUCE, GLOBAL, or STACK_GLOBAL references to dangerous modules such as os, subprocess, or posix.
Monitor process trees rooted in Python interpreters running flash-attention for anomalous descendant processes.
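The opcode inspection described above can be sketched with pickletools.genops(), which parses a pickle stream without executing it. The function names and the SUSPICIOUS_MODULES set are assumptions made for illustration. The sketch also assumes that zip-based checkpoints keep their pickle stream in a member whose name ends in .pkl (torch.save archives use data.pkl). It is a heuristic: it tracks only the most recent string pushes before STACK_GLOBAL, so a payload that routes module names through the memo can evade it.

```python
import pickletools
import zipfile

# Assumed starting list; extend it to match local policy.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess"}


def find_suspicious_globals(data: bytes):
    """Return (offset, 'module.name') pairs for risky global references."""
    findings = []
    recent_strings = []  # string operands that STACK_GLOBAL may consume
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # These opcodes carry "module name" as a single operand.
            module, _, name = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Module and name were pushed as the two preceding strings.
            module, name = recent_strings[-2:]
        else:
            if isinstance(arg, str):
                recent_strings.append(arg)
            continue
        if module.split(".")[0] in SUSPICIOUS_MODULES:
            findings.append((pos, f"{module}.{name}"))
    return findings


def iter_pickle_streams(path):
    """Yield pickle bytes from a zip-based archive or a bare pickle file."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                if member.endswith(".pkl"):
                    yield archive.read(member)
    else:
        with open(path, "rb") as handle:
            yield handle.read()


def scan_checkpoint(path):
    """Collect findings from every pickle stream found in the file."""
    findings = []
    for stream in iter_pickle_streams(path):
        findings.extend(find_suspicious_globals(stream))
    return findings
```

A non-empty result from scan_checkpoint() is a reason to quarantine the file. An empty result does not prove the file is safe.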

Monitoring Recommendations

Forward endpoint and host telemetry from GPU and training nodes to a centralized analytics platform for behavioral review.
Alert on file writes to autostart locations, SSH key paths, or cron entries from training workloads.
Correlate checkpoint file ingress events with subsequent process and network activity to identify weaponized files.

How to Mitigate CVE-2026-31253

Immediate Actions Required

Identify all hosts that run flash-attention and inventory checkpoint files loaded by load_checkpoint() or eval.py workflows.
Block ingestion of checkpoint files from untrusted sources until the framework is patched.
Audit recent training and evaluation runs for evidence of unexpected process or network activity following checkpoint loads.

Patch Information

No official patch commit is referenced in the published advisory at this time. Track the flash-attention GitHub repository for updates that add weights_only=True to the affected torch.load() calls. Until a fixed release is available, apply the workarounds below.
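To check whether a local checkout still contains unguarded calls, a small script built on Python's ast module can list them. This is a minimal sketch: find_unsafe_torch_load is a hypothetical helper, it matches only calls written literally as torch.load(...), and it treats anything other than a literal weights_only=True as unsafe. Aliased imports such as "from torch import load" are not detected.

```python
import ast


def find_unsafe_torch_load(source: str, filename: str = "<string>"):
    """Return line numbers of torch.load(...) calls lacking weights_only=True."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # Only a literal True counts; variables or False are reported.
        has_safe_flag = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not has_safe_flag:
            hits.append(node.lineno)
    return sorted(hits)
```

Running the helper over the text of checkpoint.py and eval.py before and after applying a local fix gives a quick confirmation that no call was missed.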

Workarounds

Modify local copies of checkpoint.py and eval.py to pass weights_only=True to every torch.load() invocation.
Restrict checkpoint loading to files originating from internal, signed, and access-controlled storage.
Run training and evaluation jobs inside isolated containers or sandboxes with no outbound network access and minimal filesystem privileges.
Verify checkpoint files with cryptographic signatures or hash allowlists before loading.

bash

# Configuration example: enforce safe checkpoint loading in patched code
# Replace vulnerable calls such as:
#   state = torch.load(path)
# with:
#   state = torch.load(path, weights_only=True, map_location="cpu")

# Optional: validate checkpoint integrity before loading
sha256sum /path/to/checkpoint.pt
# Compare against an internally maintained allowlist of approved hashes
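
The allowlist comparison can also be enforced in the loading script itself, so that a file is never passed to torch.load() unless its digest is approved. The following is a minimal sketch using only hashlib. The function names are hypothetical, and how the allowlist is stored and distributed is left as an assumption (here it is simply a set of hex digests).

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved(path, allowlist):
    """Return True only if the file's digest appears in the allowlist."""
    return sha256_of(path) in allowlist
```

A caller would refuse to load the checkpoint when is_approved() returns False. A hash check confirms only that the file matches a known artifact, so it complements weights_only=True and does not replace it.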