CVE-2022-25168 Overview
CVE-2022-25168 is a command injection vulnerability in Apache Hadoop's FileUtil.unTar(File, File) API. The function does not escape input file names before passing them to the shell, allowing attackers to inject arbitrary commands [CWE-78]. In Hadoop 2.x, this API is used during YARN localization, so a remote user who can submit jobs can achieve remote code execution. In Hadoop 3.3, the API is reached only through InMemoryAliasMap.completeBootstrapTransfer, which is invoked only by a local user. Apache Spark is also affected through the ADD ARCHIVE SQL command, although this path grants no new permissions because ADD ARCHIVE already loads binaries into the classpath.
Critical Impact
Remote attackers can execute arbitrary shell commands on Hadoop 2.x clusters through YARN localization by submitting crafted file names containing shell metacharacters.
Affected Products
- Apache Hadoop 2.x prior to 2.10.2
- Apache Hadoop 3.2.x prior to 3.2.4
- Apache Hadoop 3.3.x prior to 3.3.3
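The ranges above can be checked mechanically. The following Java sketch is illustrative only: the class and method names are hypothetical, and it encodes just the three release lines listed here. Release lines that this list does not mention (for example 3.0.x or 3.1.x) are reported as UNKNOWN rather than guessed.

```java
import java.util.Arrays;

// Hypothetical helper: classifies a Hadoop version string against the
// fixed releases named in this advisory (2.10.2, 3.2.4, 3.3.3).
class HadoopVersionCheck {
    enum Status { VULNERABLE, FIXED, UNKNOWN }

    static Status classify(String version) {
        // Keep only the numeric part, dropping suffixes such as "-SNAPSHOT".
        String[] parts = version.trim().split("-", 2)[0].split("\\.");
        if (parts.length < 3) return Status.UNKNOWN;
        int[] v;
        try {
            v = Arrays.stream(parts).limit(3).mapToInt(Integer::parseInt).toArray();
        } catch (NumberFormatException e) {
            return Status.UNKNOWN;
        }
        int[] fixed;
        if (v[0] == 2) fixed = new int[] {2, 10, 2};
        else if (v[0] == 3 && v[1] == 2) fixed = new int[] {3, 2, 4};
        else if (v[0] == 3 && v[1] >= 3) fixed = new int[] {3, 3, 3};
        else return Status.UNKNOWN; // release line not covered by the list above
        // Lexicographic comparison of {major, minor, patch}.
        return Arrays.compare(v, fixed) < 0 ? Status.VULNERABLE : Status.FIXED;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2.10.1", "3.2.4", "3.3.2", "3.1.0"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

The version string would typically come from the first line of `hadoop version` output. Vendor builds often backport fixes without changing the upstream number, so a VULNERABLE result is a prompt to check the vendor advisory rather than a final verdict.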
Discovery Timeline
- 2022-08-04 - CVE-2022-25168 published to NVD
- 2024-11-21 - Last updated in NVD database
Technical Details for CVE-2022-25168
Vulnerability Analysis
The vulnerability resides in Apache Hadoop's FileUtil.unTar(File, File) API, which extracts tar archives by invoking the system shell. The implementation passes the input file name directly to a shell command without sanitization or escaping. When the file name contains shell metacharacters such as backticks, semicolons, or $(), those constructs are interpreted by the shell rather than treated as literal path components. This results in attacker-controlled command execution under the privileges of the Hadoop process.
The exposure depends on which code path reaches unTar. In Hadoop 2.x, YARN localization downloads and extracts archives supplied by job submitters, which makes the flaw reachable across the network. In Hadoop 3.3, the only caller is InMemoryAliasMap.completeBootstrapTransfer, which runs in a local context. Apache Spark's ADD ARCHIVE SQL command also exercises the vulnerable path, although that command already permits adding arbitrary binaries to the classpath.
Root Cause
The root cause is missing input sanitization before constructing a shell command string. FileUtil.unTar concatenates the file name into a shell invocation rather than executing the extraction utility with arguments passed as a separate array. This pattern is a textbook OS Command Injection weakness [CWE-78].
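The difference between the two patterns can be sketched in a few lines of Java. This is not Hadoop's source code: the class, the method names, and the exact tar flags are assumptions chosen for illustration, and the commands are only constructed and printed, never executed.

```java
import java.util.List;

// Illustration only, not Hadoop's implementation. Contrasts a shell command
// string built by concatenation with an argument-array invocation.
class UntarCommandDemo {
    // Vulnerable pattern: the file name becomes part of a string that a
    // shell parses, so metacharacters inside it are interpreted.
    static List<String> shellStringCommand(String archive, String destDir) {
        String script = "tar -xf " + archive + " -C " + destDir;
        return List.of("/bin/sh", "-c", script);
    }

    // Safer pattern: each value is its own argv entry and no shell parses it.
    static List<String> argvCommand(String archive, String destDir) {
        return List.of("tar", "-xf", archive, "-C", destDir);
    }

    public static void main(String[] args) {
        String hostile = "data.tar; touch /tmp/pwned"; // hypothetical hostile name
        // Printed, never run: in the first form the shell would see
        // "; touch /tmp/pwned" as a second command.
        System.out.println(shellStringCommand(hostile, "/tmp/out"));
        // In the second form the whole name stays a single argument to tar.
        System.out.println(argvCommand(hostile, "/tmp/out"));
    }
}
```

A caller that wanted to run the second form could pass the list to `new ProcessBuilder(...)`, which starts the program directly without an intermediate shell.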
Attack Vector
In vulnerable Hadoop 2.x deployments, an attacker submits a YARN application referencing a localized resource whose file name embeds shell metacharacters. When the NodeManager localizes the resource and invokes FileUtil.unTar, the embedded commands execute on the cluster node. No prior authentication is required if the cluster accepts unauthenticated job submissions, and the executed commands run with the privileges of the Hadoop service account.
No verified public exploit code is associated with this CVE. See the Apache Mailing List Discussion for the official advisory and remediation context.
Detection Methods for CVE-2022-25168
Indicators of Compromise
- Unexpected child processes spawned by Hadoop daemons such as NodeManager or ResourceManager, or by Spark executors, particularly shells like /bin/sh -c with unusual command strings.
- File or archive names submitted to YARN or Spark jobs that contain shell metacharacters including backticks, ;, &&, |, or $().
- Outbound network connections from Hadoop worker nodes to unfamiliar hosts shortly after job submission or archive localization.
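The file-name indicator above lends itself to a simple pattern match. The sketch below is a minimal example under stated assumptions: the class name is hypothetical, the input is taken to be a list of resource names already extracted from job or localization records, and the pattern covers only the metacharacters named in the list. A match is a reason to investigate, not proof of exploitation.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical triage helper: flags resource names that contain the shell
// metacharacters listed above (backtick, ';', '|', '&&', '$(').
class SuspiciousNameScan {
    private static final Pattern META = Pattern.compile("[`;|]|&&|\\$\\(");

    static boolean isSuspicious(String name) {
        return META.matcher(name).find();
    }

    static List<String> flag(List<String> names) {
        return names.stream()
                .filter(SuspiciousNameScan::isSuspicious)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample names are invented for the demonstration.
        List<String> names = List.of("logs.tar.gz", "deps$(id).tar", "model`whoami`.tgz");
        flag(names).forEach(n -> System.out.println("review: " + n));
    }
}
```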
Detection Strategies
- Inspect Hadoop application submission logs and YARN localization logs for archive resources with suspicious file names.
- Correlate process execution telemetry on cluster nodes to identify shell invocations descended from JVM processes running Hadoop or Spark.
- Hunt for tar or shell command lines issued during archive extraction whose arguments contain more than the expected archive path, such as chained or substituted commands.
Monitoring Recommendations
- Enable audit logging on the Hadoop ResourceManager and NodeManager and forward logs to a centralized analytics platform.
- Monitor Spark SQL query history for ADD ARCHIVE statements referencing externally controlled paths.
- Alert on process lineage where Java processes running Hadoop spawn sh, bash, tar, or similar command-line utilities outside of normal job execution.
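The process-lineage rule above can be prototyped with the JDK's ProcessHandle API (Java 9 and later). This is a rough sketch, not a substitute for endpoint telemetry: the class name and the watched-binary set are assumptions, it cannot tell a Hadoop JVM from any other Java process, and a point-in-time snapshot will miss short-lived children.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical snapshot check: reports watched binaries whose direct parent
// is a java executable.
class ShellChildWatch {
    // Binaries named in the monitoring recommendation above.
    private static final Set<String> WATCHED = Set.of("sh", "bash", "tar");

    // Returns the last path component of an executable path.
    static String baseName(String command) {
        int slash = command.lastIndexOf('/');
        return slash >= 0 ? command.substring(slash + 1) : command;
    }

    // True when a watched binary was started directly by a java executable.
    static boolean isWatchedChildOfJava(String parentCommand, String childCommand) {
        return baseName(parentCommand).equals("java")
                && WATCHED.contains(baseName(childCommand));
    }

    public static void main(String[] args) {
        ProcessHandle.allProcesses().forEach(child -> {
            Optional<String> childCmd = child.info().command();
            Optional<String> parentCmd = child.parent().flatMap(p -> p.info().command());
            if (childCmd.isPresent() && parentCmd.isPresent()
                    && isWatchedChildOfJava(parentCmd.get(), childCmd.get())) {
                System.out.println("pid " + child.pid() + " (" + childCmd.get()
                        + ") has Java parent " + parentCmd.get());
            }
        });
    }
}
```

Hadoop legitimately launches shells for some tasks, so output from a check like this needs a baseline of normal job behavior before it is useful for alerting.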
How to Mitigate CVE-2022-25168
Immediate Actions Required
- Upgrade Apache Hadoop to 2.10.2, 3.2.4, 3.3.3, or later, all of which include the HADOOP-18136 fix.
- Upgrade Apache Spark to 3.1.4, 3.2.2, 3.3.0, or later to pick up SPARK-38305, which validates file existence before extraction and prevents shell execution regardless of the Hadoop library version.
- Restrict YARN job submission to authenticated, trusted users and enable Kerberos authentication on the cluster.
Patch Information
The Apache Software Foundation addressed the issue under HADOOP-18136 by escaping file names before they reach the shell. Fixed versions are Apache Hadoop 2.10.2, 3.2.4, and 3.3.3. The Apache Spark project independently mitigated the issue in SPARK-38305, included in Spark 3.1.4, 3.2.2, and 3.3.0. Refer to the Apache Mailing List Discussion and NetApp Security Advisory NTAP-20220915-0007 for product-specific guidance.
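To make "escaping file names" concrete, the sketch below shows one common POSIX technique: wrapping the value in single quotes and rewriting any embedded single quote. It is an illustration of the general idea only. The class and method are hypothetical, and the code is not taken from the HADOOP-18136 patch.

```java
// Hypothetical helper showing POSIX single-quote escaping.
class ShellQuote {
    // Inside single quotes the shell treats every character literally, so the
    // only character needing special handling is the single quote itself:
    // close the quoted string, emit an escaped quote, then reopen it.
    static String quote(String value) {
        return "'" + value.replace("'", "'\\''") + "'";
    }

    public static void main(String[] args) {
        // The metacharacters stay inside one quoted word and are not interpreted.
        System.out.println("tar -xf " + quote("data.tar; touch /tmp/pwned"));
        System.out.println("tar -xf " + quote("it's.tar"));
    }
}
```

Escaping keeps the shell in the loop and therefore depends on the quoting being applied at every call site. Where possible, passing arguments as an array without a shell avoids the problem entirely.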
Workarounds
- Validate or reject archive file names containing shell metacharacters before passing them to Hadoop or Spark APIs.
- Disable or restrict YARN localization of user-supplied archives where feasible until patched versions are deployed.
- Apply network segmentation to ensure Hadoop services are not exposed to untrusted networks while remediation is in progress.
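For the first workaround, an allowlist is usually more robust than trying to enumerate dangerous characters. The following sketch assumes a deliberately strict policy (letters, digits, dot, underscore, and hyphen, with a leading letter, digit, or underscore and at most 255 characters). The class name and the policy are assumptions to adapt, and the check applies to the base file name rather than a full path.

```java
import java.util.regex.Pattern;

// Hypothetical pre-submission check for archive base names.
class ArchiveNameValidator {
    // Assumed policy: no whitespace, no path separators, no shell
    // metacharacters, and no leading '-' or '.'.
    private static final Pattern SAFE =
            Pattern.compile("[A-Za-z0-9_][A-Za-z0-9._-]{0,254}");

    static boolean isSafe(String fileName) {
        return fileName != null && SAFE.matcher(fileName).matches();
    }

    // Returns the name unchanged, or throws if it violates the policy.
    static String requireSafe(String fileName) {
        if (!isSafe(fileName)) {
            throw new IllegalArgumentException("rejected archive name: " + fileName);
        }
        return fileName;
    }

    public static void main(String[] args) {
        System.out.println(isSafe("data-2022.tar.gz"));
        System.out.println(isSafe("data.tar; touch /tmp/pwned"));
    }
}
```

A wrapper that submits jobs or issues ADD ARCHIVE statements could call `requireSafe` on each archive name before handing it to Hadoop or Spark.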
# Check the installed Hadoop version against the fixed releases (2.10.2, 3.2.4, 3.3.3)
hadoop version
# Example only: applies when Hadoop was installed from a Debian package repository;
# the package name and version string depend on that repository
sudo apt-get update
sudo apt-get install --only-upgrade hadoop=3.3.3