CVE-2025-54920: Apache Spark RCE Vulnerability

CVE-2025-54920 Overview

CVE-2025-54920 is an insecure deserialization vulnerability [CWE-502] in the Apache Spark History Web UI. The flaw stems from overly permissive Jackson polymorphic deserialization of event log data. An attacker with write access to the Spark event logs directory can inject crafted JSON payloads that cause the History Server to instantiate arbitrary Java classes, leading to code execution on the host. Apache Spark versions before 3.5.7, 4.0.0 (including its release candidates), and 4.0.1-rc1 are affected. Apache fixed the issue in versions 3.5.7 and 4.0.1.

Critical Impact
An authenticated attacker with write access to Spark event logs can achieve arbitrary code execution on the Spark History Server, potentially compromising the entire host.

Affected Products

Apache Spark versions before 3.5.7
Apache Spark 4.0.0 (including release candidates rc1 through rc7)
Apache Spark 4.0.1-rc1

Discovery Timeline

2026-03-16 - CVE-2025-54920 published to NVD
2026-03-20 - Last updated in NVD database

Technical Details for CVE-2025-54920

Vulnerability Analysis

The Spark History Server reads JSON-formatted event logs to reconstruct historical job state for the web UI. To support its extensible event model, Spark uses Jackson polymorphic deserialization, configured with @JsonTypeInfo(use = JsonTypeInfo.Id.CLASS) on the SparkListenerEvent type. This configuration lets the JSON payload itself declare which Java class is instantiated during deserialization.

The deserializer does not restrict the set of acceptable target classes. An attacker who can write to the event log directory can therefore force the server to instantiate arbitrary classes from the classpath. Classes such as org.apache.hive.jdbc.HiveConnection perform network operations during construction, providing a primitive for outbound connections and further exploitation chains.

The vulnerability is reachable whenever the History Server starts up or loads an attacker-controlled event log file. Because deserialization occurs server-side with the privileges of the Spark History Server process, successful exploitation can lead to full host compromise.

Root Cause

The root cause is unsafe Jackson polymorphic type handling. Spark trusts the "Event" field in JSON event logs as a fully qualified class name and passes it to Jackson for instantiation. No allow-list or class hierarchy check constrains which classes can be loaded, violating safe deserialization practices defined in [CWE-502].

Attack Vector

Exploitation requires write access to the directory containing Spark event logs (commonly spark-logs). The attacker prepends or modifies an event log file with a JSON object whose "Event" value points to a sensitive class. The published proof of concept references org.apache.hive.jdbc.HiveConnection with attacker-controlled uri and hive.metastore.uris parameters. When the History Server processes the log, it instantiates HiveConnection, which establishes a JDBC connection to the attacker-controlled endpoint and enables further command injection through the Hive client.

Detailed PoC content is available in the Apache mailing list advisory. The upstream fixes are in GitHub Pull Request 51312 and GitHub Pull Request 51323.

Detection Methods for CVE-2025-54920

Indicators of Compromise

Event log files under the Spark History Server log directory containing "Event" values that are fully qualified Java class names outside the expected org.apache.spark.* listener event types.
JSON payloads referencing org.apache.hive.jdbc.HiveConnection, JDBC URIs such as jdbc:hive2://, or thrift:// metastore endpoints inside Spark event logs.
Unexpected outbound network connections originating from the Spark History Server JVM process to untrusted hosts.
Child processes spawned by the History Server that are inconsistent with normal Spark operation.

Detection Strategies

Scan event log directories for JSON objects whose "Event" field does not match the documented Spark listener event class names.
Monitor the Spark History Server process for new TCP connections, particularly to JDBC, Thrift, or LDAP endpoints not previously observed.
Enable JVM-level auditing or Java Flight Recorder on History Server hosts to capture reflective class loading from deserialized event logs.
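
The first strategy above can be approximated with a small shell helper. This is a minimal sketch, not a vetted detection rule. The function name scan_event_log is hypothetical. It assumes uncompressed logs with one JSON object per line, and it assumes that legitimate "Event" values are either short names without dots or class names under org.apache.spark. (a default allow-list prefix that should be tuned to the deployment).

```shell
# Hypothetical helper: print "file:line:class" for every "Event" value that
# looks like a fully qualified class name outside the allowed prefix.
# Assumptions: uncompressed log, one JSON object per line, and only the first
# "Event" key on each line is inspected.
scan_event_log() {
  log_file=$1
  allowed_prefix=${2:-org.apache.spark.}
  awk -v prefix="$allowed_prefix" '
    match($0, /"Event"[ \t]*:[ \t]*"[^"]*"/) {
      value = substr($0, RSTART, RLENGTH)
      sub(/^"Event"[ \t]*:[ \t]*"/, "", value)   # drop the key and opening quote
      sub(/"$/, "", value)                       # drop the closing quote
      # Dotted names that do not start with the allowed prefix are suspicious.
      if (index(value, ".") > 0 && index(value, prefix) != 1)
        printf "%s:%d:%s\n", FILENAME, FNR, value
    }
  ' "$log_file"
}
```

Calling scan_event_log on a log file prints nothing for a clean log and one record per suspicious entry otherwise. A hit is a lead for manual review rather than proof of compromise, and compressed logs would need to be decompressed before scanning.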

Monitoring Recommendations

Forward Spark History Server logs and host process telemetry to a centralized analytics platform for behavioral baselining.
Alert on file writes to the event log directory by any identity other than the Spark driver service account.
Track installed Spark versions across the environment to confirm patch coverage for 3.5.7 and 4.0.1.
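
Patch coverage tracking comes down to comparing each recorded version string against the fixed releases. The sketch below is one way to do that in shell, under stated assumptions: the function name spark_version_is_patched is hypothetical, later releases in the 3.x and 4.x lines are assumed to retain the fix, and any string with a suffix (such as 4.0.1-rc1 or a vendor build tag) is reported as unconfirmed instead of being guessed at.

```shell
# Hypothetical helper: exit status 0 only when the version string is a plain
# release at or above the fixed versions named in this advisory (3.5.7, 4.0.1).
spark_version_is_patched() {
  version=$1
  # Suffixes (-rc1, -SNAPSHOT, vendor tags) and stray characters are treated
  # as "not confirmed patched".
  case $version in
    '' | *[!0-9.]*) return 1 ;;
  esac
  old_ifs=$IFS
  IFS=.
  set -- $version          # split "3.5.7" into 3, 5, 7
  IFS=$old_ifs
  major=${1:-0}
  minor=${2:-0}
  patch=${3:-0}
  if [ "$major" -gt 4 ]; then
    return 0
  elif [ "$major" -eq 4 ]; then
    # 4.0.0 is affected; 4.0.1 and later 4.x releases are assumed fixed.
    [ "$minor" -gt 0 ] || [ "$patch" -ge 1 ]
  elif [ "$major" -eq 3 ]; then
    # Everything before 3.5.7 is affected.
    [ "$minor" -gt 5 ] || { [ "$minor" -eq 5 ] && [ "$patch" -ge 7 ]; }
  else
    return 1
  fi
}
```

For example, spark_version_is_patched 3.5.6 returns a non-zero status, while spark_version_is_patched 3.5.7 returns zero. Vendor distributions that backport fixes under their own version scheme need a separate check.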

How to Mitigate CVE-2025-54920

Immediate Actions Required

Upgrade Apache Spark to version 3.5.7 or 4.0.1, which constrain Jackson deserialization to safe SparkListenerEvent subtypes.
Restrict write permissions on the Spark event log directory to the Spark driver service account only.
Audit existing event log directories for tampered files and remove any entries containing non-Spark class names in the "Event" field.
Rotate credentials and review network egress logs on History Server hosts if tampering is detected.
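
As a starting point for the audit, files in a local event log directory that are not owned by the expected writer can be listed with find. This is a sketch for local filesystems only. The function name list_foreign_owned_logs is hypothetical, and the expected owner (for example spark) is an assumption about the deployment. An HDFS directory would need the equivalent check through its own tooling.

```shell
# Hypothetical helper: list regular files under a local event log directory
# whose owner differs from the account expected to write them.
list_foreign_owned_logs() {
  log_dir=$1
  expected_user=$2   # user name or numeric UID of the Spark driver account
  find "$log_dir" -type f ! -user "$expected_user" -print
}
```

An empty result means every file is owned by the expected account. It does not rule out tampering by someone who compromised that account, so a content scan of the "Event" fields is still needed.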

Patch Information

Apache addressed the issue in Spark 3.5.7 and 4.0.1. The fixes, tracked in Apache JIRA SPARK-52381 and merged through GitHub Pull Request 51312 and GitHub Pull Request 51323, replace permissive polymorphic deserialization with a constrained type registry that rejects arbitrary class names. Additional context is available in the OpenWall OSS Security discussion.

Workarounds

Disable the Spark History Server in environments where upgrading is not immediately feasible.
Set filesystem ACLs so that only the Spark driver identity can write to the configured spark.eventLog.dir, and make the directory read-only for the History Server service account.
Place the History Server on an isolated network segment with strict egress filtering to block outbound JDBC, Thrift, and LDAP traffic to untrusted destinations.

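```bash
# Configuration example: restrict event log directory permissions on HDFS.
# Mode 750 lets the owner (the Spark driver account) write, while group members
# such as the History Server account (assumed here to be in the spark group)
# can only read and list the directory.
hdfs dfs -chown spark:spark /spark-logs
hdfs dfs -chmod 750 /spark-logs

# Local filesystem equivalent
chown spark:spark /var/log/spark-events
chmod 750 /var/log/spark-events
```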
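
After tightening permissions, it can help to verify the result mechanically. The following sketch checks a local directory's mode bits only. The function name dir_is_write_restricted is hypothetical, and filesystem ACLs, which can grant write access that the mode bits do not show, are outside what it inspects.

```shell
# Hypothetical helper: exit status 0 when the directory is writable by its
# owner only, 1 when group or others can write, 2 when it is not a directory.
dir_is_write_restricted() {
  dir=$1
  [ -d "$dir" ] || return 2
  # -prune keeps find on the directory itself; the path is printed only if the
  # group-write (020) or other-write (002) bit is set.
  loose=$(find -H "$dir" -prune \( -perm -020 -o -perm -002 \) -print)
  [ -z "$loose" ]
}
```

For example, dir_is_write_restricted /var/log/spark-events returns zero once group and world write access have been removed, and a non-zero status otherwise.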