CVE-2026-41486: Ray AI Compute Engine RCE Vulnerability

CVE-2026-41486 Overview

CVE-2026-41486 is a remote code execution vulnerability in Ray, an open-source AI compute engine maintained by the Ray project. The flaw affects Ray Data versions from 2.54.0 up to (but not including) 2.55.0. Ray Data registers custom Arrow extension types (ray.data.arrow_tensor, ray.data.arrow_tensor_v2, ray.data.arrow_variable_shaped_tensor) globally in PyArrow. When PyArrow reads a Parquet file containing one of these extension types, it invokes __arrow_ext_deserialize__ on the field's metadata bytes. Ray's implementation passes those bytes directly to cloudpickle.loads(), triggering arbitrary code execution during schema parsing before any row data is read. The issue is classified under CWE-94 (Improper Control of Generation of Code).

Critical Impact
A malicious Parquet file can execute attacker-controlled Python code in the Ray worker process the moment its schema is parsed, with no row data access required.

Affected Products

Ray Data versions from 2.54.0 up to (but not including) 2.55.0
Any application that reads untrusted Parquet files using Ray Data with these versions installed

Discovery Timeline

2026-05-08 - CVE-2026-41486 published to NVD
2026-05-13 - Last updated in NVD database

Technical Details for CVE-2026-41486

Vulnerability Analysis

The vulnerability lives in Ray Data's tensor extension implementation in python/ray/data/_internal/tensor_extensions/arrow.py. Ray registers three custom PyArrow extension types globally at import time. PyArrow stores per-field metadata for extension types inside the Parquet schema. When a consumer opens a Parquet file, PyArrow looks up the registered extension class and calls its __arrow_ext_deserialize__(storage_type, serialized) method, passing the raw bytes from the file. Ray's implementation forwards serialized directly to cloudpickle.loads(), which fully deserializes arbitrary Python objects, including __reduce__ gadgets that execute code. Because extension type registration is process-global, any code path that reads a Parquet file in a Python process with Ray Data imported is vulnerable, not only explicit ray.data.read_parquet calls.
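The "__reduce__ gadget" idea is easy to show with the standard library alone. In current cloudpickle releases, cloudpickle.loads is simply the standard pickle.loads (cloudpickle only extends the dumping side), so the sketch below uses pickle directly. The class name MaliciousMetadata is hypothetical, and the payload is a harmless eval("6 * 7") standing in for something like os.system(...).

```python
import pickle


class MaliciousMetadata:
    """Hypothetical attacker-side object; any class can define __reduce__."""

    def __reduce__(self):
        # Instructs the unpickler to call eval("6 * 7") at load time.
        # A harmless expression stands in for a real os.system(...) payload.
        return (eval, ("6 * 7",))


# Attacker: serialize the gadget. These bytes are what would be embedded
# in the extension-type metadata of a Parquet field.
payload = pickle.dumps(MaliciousMetadata())

# Victim: merely "deserializing metadata" performs the attacker-chosen call.
result = pickle.loads(payload)
print(result)       # 42, computed by eval() during loads(), not stored data
print(payload[:1])  # b'\x80', the PROTO opcode that starts protocol 2+ pickles
```

The victim never calls anything named in the payload; pickle.loads does it on the attacker's behalf, which is why no amount of care in downstream code can make loading untrusted pickle bytes safe.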

Root Cause

The root cause is the unsafe use of cloudpickle.loads() on attacker-controlled bytes inside __arrow_ext_deserialize__. Pickle-family deserializers must never operate on untrusted input, yet Arrow extension metadata travels inside the file itself, so whoever produces the Parquet file controls those bytes. The fix in pull request #62056 replaces pickle-based deserialization with a safe parser and is shipped in Ray 2.55.0.
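The general remedy is to parse extension metadata with a data-only format that can produce plain values but never callables. The sketch below is a hypothetical parser assuming an illustrative JSON encoding such as b'{"shape": [3, 4]}'; it is not Ray's actual on-disk format and not the code from pull request #62056.

```python
import json


def parse_tensor_extension_metadata(serialized: bytes) -> tuple:
    """Hypothetical data-only parser for tensor extension metadata.

    Assumes an illustrative JSON encoding such as b'{"shape": [3, 4]}';
    this is NOT Ray's real serialization format.
    """
    try:
        meta = json.loads(serialized.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Pickle bytes (for example starting with b"\x80") fail here
        # instead of being executed.
        raise ValueError(f"malformed extension metadata: {exc}") from exc
    if not isinstance(meta, dict):
        raise ValueError("extension metadata must be a JSON object")
    shape = meta.get("shape")
    # type(d) is int rejects bools, which are a subclass of int.
    if not isinstance(shape, list) or not all(
        type(d) is int and d >= 0 for d in shape
    ):
        raise ValueError("'shape' must be a list of non-negative integers")
    return tuple(shape)


print(parse_tensor_extension_metadata(b'{"shape": [3, 4]}'))  # (3, 4)
```

The design choice that matters is that json.loads can only build dicts, lists, strings, numbers, booleans, and None; schema validation on top of that then rejects anything the reader does not expect.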

Attack Vector

An attacker crafts a Parquet file whose schema declares a field using one of Ray's registered extension type names and embeds a malicious pickle payload in the extension metadata. The attacker then induces a victim Ray cluster, data pipeline, or notebook to read that file from a shared object store, S3 bucket, dataset registry, or HTTP URL. Code execution occurs during schema parsing, before any filtering, projection, or row read, so even a read_parquet followed by .schema() is sufficient.
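Why schema parsing alone is enough can be shown with a toy model of a process-global extension registry. Everything here is hypothetical (EXTENSION_REGISTRY, open_parquet, and the str-keyed metadata dict are simplifications, not PyArrow or Ray internals); the only real names are the ARROW:extension:name and ARROW:extension:metadata keys that the Arrow format uses to tag extension-typed fields.

```python
import os
import pickle

# Toy model of a process-global extension registry. All names here are
# hypothetical; this is not PyArrow's or Ray's actual code.
EXTENSION_REGISTRY = {}
TRACE = []


def register_extension(name, deserialize):
    # Stands in for registration done as a side effect of importing Ray Data.
    EXTENSION_REGISTRY[name] = deserialize


def vulnerable_deserialize(serialized: bytes):
    # The pre-2.55.0 pattern described above: file bytes go straight to pickle.
    return pickle.loads(serialized)


def open_parquet(fields, read_rows=True):
    """Resolve each field's extension type, then optionally read row data."""
    for meta in fields:
        handler = EXTENSION_REGISTRY.get(meta.get("ARROW:extension:name"))
        if handler is not None:
            TRACE.append(("schema", handler(meta["ARROW:extension:metadata"])))
    if read_rows:
        TRACE.append(("rows", "read"))


class Gadget:
    def __reduce__(self):
        # Harmless stand-in for a real payload such as os.system(...).
        return (eval, ("__import__('os').getpid()",))


register_extension("ray.data.arrow_tensor", vulnerable_deserialize)
crafted_field = {
    "ARROW:extension:name": "ray.data.arrow_tensor",
    "ARROW:extension:metadata": pickle.dumps(Gadget()),
}

# Only the schema is inspected (like calling .schema()); no rows are read.
open_parquet([crafted_field], read_rows=False)
print(TRACE[0][1] == os.getpid())  # True: attacker-chosen code ran in this process
```

Because the handler is looked up by name from a registry shared by the whole process, the caller does not need to ask for Ray's types explicitly; the file's own metadata selects the vulnerable code path.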

```diff
# Patch excerpt: python/ray/data/_internal/tensor_extensions/arrow.py
# Source: https://github.com/ray-project/ray/commit/c02bd31ae31996805868baa446a131a8d304525f
 import functools
 import json
 import logging
+import os
 import sys
 import threading
 import warnings
```

The full patch removes the cloudpickle.loads() call path from __arrow_ext_deserialize__ and introduces a structured deserializer. See the security advisory GHSA-mw35-8rx3-xf9r for the complete diff.

Detection Methods for CVE-2026-41486

Indicators of Compromise

Parquet files from untrusted sources whose schema metadata references the ray.data.arrow_tensor, ray.data.arrow_tensor_v2, or ray.data.arrow_variable_shaped_tensor extension names.
Ray worker or driver processes spawning unexpected child processes such as sh, bash, python -c, curl, or wget immediately after a read_parquet operation.
Outbound network connections from Ray nodes to unknown hosts shortly after dataset ingestion.

Detection Strategies

Inventory all Python environments and container images and flag any with ray==2.54.* installed alongside pyarrow.
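Scan Parquet files at ingestion using a tool that inspects Arrow schema metadata and rejects files whose extension type payloads look like pickle data. Protocol 2 and later pickles begin with the byte \x80, but protocol 0 and 1 pickles do not, so an allowlist of expected metadata formats is more robust than a prefix check alone.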
Enable Python audit hooks (sys.addaudithook) on Ray workers to log pickle.find_class events during Parquet reads.
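A minimal audit-hook sketch, assuming CPython 3.8 or later, where unpicklers raise the "pickle.find_class" audit event with (module, name) each time they resolve a global. Ray itself relies on cloudpickle for normal task serialization, so this hook only logs; raising from it would block legitimate unpickling too. The logger name and the Gadget class are hypothetical.

```python
import logging
import pickle
import sys

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("pickle-audit")  # hypothetical logger name
FIND_CLASS_EVENTS = []


def pickle_audit_hook(event: str, args: tuple) -> None:
    # CPython raises "pickle.find_class" with (module, name) whenever an
    # unpickler resolves a global such as a class or function.
    if event == "pickle.find_class":
        module, name = args
        FIND_CLASS_EVENTS.append(f"{module}.{name}")
        log.warning("pickle.find_class %s.%s", module, name)


# Audit hooks cannot be removed once added; install early in the worker.
sys.addaudithook(pickle_audit_hook)


class Gadget:
    def __reduce__(self):
        # Harmless payload used only to trigger a find_class lookup.
        return (eval, ("1 + 1",))


pickle.loads(pickle.dumps(Gadget()))
print(FIND_CLASS_EVENTS)  # expected to include 'builtins.eval'
```

In a real worker, the useful signal is a find_class for modules such as builtins, os, subprocess, or posix that occurs while a Parquet read is in progress, correlated with the job and dataset identifiers in the surrounding logs.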

Monitoring Recommendations

Forward Ray driver, worker, and dashboard logs to a centralized analytics platform and alert on process-execution anomalies tied to dataset operations.
Monitor egress traffic from Ray clusters and baseline expected destinations for object storage; alert on deviations.
Track Ray version metadata across the fleet and alert when vulnerable versions appear after a patch deadline.
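Version tracking can start with a per-environment check like the sketch below, which reads the installed ray distribution through importlib.metadata and compares it to the affected range stated in this advisory. The function names are hypothetical, and the simplified parser ignores pre-release suffixes (so 2.55.0rc1 would be treated as 2.55.0); use packaging.version.Version if exact PEP 440 ordering matters.

```python
import re
from importlib import metadata

AFFECTED_FROM = (2, 54, 0)  # first affected release per this advisory
FIXED_IN = (2, 55, 0)       # first patched release


def release_tuple(version: str) -> tuple:
    """Numeric (major, minor, patch) prefix; pre-release tags are ignored."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version: str) -> bool:
    return AFFECTED_FROM <= release_tuple(version) < FIXED_IN


def check_installed_ray() -> str:
    try:
        version = metadata.version("ray")
    except metadata.PackageNotFoundError:
        return "ray is not installed in this environment"
    if is_affected(version):
        return f"ray {version}: AFFECTED, upgrade to >=2.55.0"
    return f"ray {version}: outside the affected range"


print(check_installed_ray())
```

Running this inside each container image at build time, and emitting the result as a metric or log line, gives the fleet-wide inventory needed to alert on vulnerable versions.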

How to Mitigate CVE-2026-41486

Immediate Actions Required

Upgrade Ray to version 2.55.0 or later on all driver, worker, and head nodes, then rebuild and redeploy any container images that pin earlier versions.
Treat Parquet files from external partners, public buckets, and shared dataset hubs as untrusted until the upgrade is complete.
Audit recent Ray Data jobs that ingested third-party Parquet data on vulnerable versions and review host telemetry for signs of code execution.

Patch Information

The fix is delivered in Ray 2.55.0 via pull request #62056 and commit c02bd31. The patched __arrow_ext_deserialize__ no longer calls cloudpickle.loads() on file-supplied bytes. Full advisory details are in GHSA-mw35-8rx3-xf9r.

Workarounds

If upgrading immediately is not possible, restrict Ray Data ingestion to Parquet files produced by trusted internal pipelines and block reads from external object storage.
Run Ray workers under least-privilege service accounts and within sandboxed containers with no outbound internet access to limit blast radius.
Pre-validate Parquet schemas with a hardened Arrow reader in an isolated process that does not import Ray, so Ray's extension types are never registered and no pickle deserialization occurs, before handing files to Ray.
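A pre-validation sketch follows. The pure helper audit_field_metadata takes one field's Arrow metadata (a bytes-to-bytes dict) and flags Ray extension names and pickle-looking payloads. The optional scan_parquet_schema wrapper assumes that, when an extension type is not registered, PyArrow surfaces the field as its storage type and keeps the ARROW:extension:* keys in field.metadata; verify that against your PyArrow version, and run it only in a process that has not imported Ray. Both function names are hypothetical, and only top-level fields are checked.

```python
RAY_EXTENSION_NAMES = {
    b"ray.data.arrow_tensor",
    b"ray.data.arrow_tensor_v2",
    b"ray.data.arrow_variable_shaped_tensor",
}


def audit_field_metadata(field_name: str, metadata) -> list:
    """Return findings for one field's Arrow metadata (bytes -> bytes)."""
    findings = []
    if not metadata:
        return findings
    ext_name = metadata.get(b"ARROW:extension:name")
    payload = metadata.get(b"ARROW:extension:metadata", b"")
    if ext_name in RAY_EXTENSION_NAMES:
        findings.append(f"{field_name}: declares Ray extension type {ext_name.decode()}")
        if payload[:1] == b"\x80":
            # Protocol 2+ pickle; protocol 0/1 pickles would not match this,
            # so the extension name above is flagged regardless.
            findings.append(f"{field_name}: extension metadata looks like a pickle")
    return findings


def scan_parquet_schema(path: str) -> list:
    """Check top-level fields only; run in a process that has NOT imported Ray."""
    import pyarrow.parquet as pq  # imported lazily so the helper above has no deps

    findings = []
    for field in pq.read_schema(path):
        findings.extend(audit_field_metadata(field.name, field.metadata))
    return findings
```

Any non-empty result for a file from an external source is a reason to quarantine it; nested struct or list children would need a recursive walk that this sketch does not attempt.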

```bash
# Upgrade Ray Data to the patched release
pip install --upgrade "ray[data]>=2.55.0"

# Verify the installed version
python -c "import ray; print(ray.__version__)"
```