CVE-2024-0520 Overview
A critical path traversal vulnerability in MLflow versions prior to 2.9.0 enables remote code execution through improper neutralization of special elements in the mlflow.data.http_dataset_source.py module. When MLflow loads a dataset from a source URL with an HTTP scheme, it builds the final file path from the filename taken from the Content-Disposition response header or from the URL path, without sanitizing it. An attacker who controls that filename can supply path traversal sequences (e.g., ../../tmp/poc.txt) or absolute paths (e.g., /tmp/poc.txt) and so fully control where the file is written. The resulting arbitrary file write can be escalated to command execution.
Critical Impact
Attackers can achieve arbitrary file write leading to remote code execution, potentially compromising ML model data, training datasets, and underlying server infrastructure.
Affected Products
- MLflow versions prior to 2.9.0
- lfprojects mlflow (all configurations using HTTP dataset sources)
- Systems that use the mlflow.data.http_dataset_source.py module
Discovery Timeline
- 2024-06-06 - CVE-2024-0520 published to NVD
- 2025-10-15 - Last updated in NVD database
Technical Details for CVE-2024-0520
Vulnerability Analysis
This vulnerability is a classic path traversal flaw that yields an arbitrary file write. The root issue lies in MLflow's HTTP dataset source handler, which processes remote dataset URLs and downloads their content to local storage. The module does not validate or sanitize the filename it extracts from either the Content-Disposition HTTP response header or the URL path.
An attacker controlling the remote HTTP server can craft malicious responses that include path traversal sequences in the filename. When MLflow processes this response, it constructs a file path using the unsanitized filename, allowing the attacker to write files to arbitrary locations on the filesystem. This arbitrary file write primitive can be escalated to remote code execution by overwriting critical application files, configuration files, or placing executable content in locations that will be subsequently executed by the system or application.
The attack requires user interaction in that a victim must load a dataset from an attacker-controlled URL, but once initiated, the exploitation is straightforward and reliable.
Root Cause
The vulnerability stems from insufficient input validation in the mlflow.data.http_dataset_source.py module. Specifically, the code extracts filenames from HTTP responses without checking for or removing path traversal sequences such as ../ or handling absolute paths. The module trusts the externally-provided filename data to construct local filesystem paths, violating the principle of never trusting user-controlled input for security-sensitive operations.
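The pattern described above can be illustrated with a minimal sketch. This is not MLflow's actual code: the helper names `filename_from_header` and `naive_destination`, the simplified header regex, and the `/data/mlflow` download directory are all assumptions made for illustration. It only shows why joining an untrusted filename onto a base directory is unsafe.

```python
import posixpath
import re
from typing import Optional


def filename_from_header(content_disposition: str) -> Optional[str]:
    """Pull the filename parameter out of a Content-Disposition value (simplified)."""
    match = re.search(r'filename="?([^";]+)"?', content_disposition)
    return match.group(1) if match else None


def naive_destination(download_dir: str, filename: str) -> str:
    """Vulnerable pattern: join an untrusted filename onto the download directory."""
    # posixpath.join discards download_dir when filename is absolute, and
    # normpath collapses "../" segments, so both forms can escape the directory.
    return posixpath.normpath(posixpath.join(download_dir, filename))


# Hypothetical header values that a malicious server could send.
for header in (
    'attachment; filename="train.csv"',
    'attachment; filename="../../tmp/poc.txt"',
    'attachment; filename="/tmp/poc.txt"',
):
    name = filename_from_header(header)
    print(name, "->", naive_destination("/data/mlflow", name))
```

In this sketch, both the traversal payload and the absolute path resolve to /tmp/poc.txt, outside the intended /data/mlflow directory, while the benign name stays inside it.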
Attack Vector
The attack is network-based and low in complexity. A typical exploitation chain looks like this:
- The attacker sets up a malicious HTTP server that responds with a crafted Content-Disposition header containing a path traversal payload
- The attacker convinces a user or automated process to load a dataset from this malicious URL using MLflow's HTTP dataset source functionality
- MLflow uses the malicious filename without validation and writes the downloaded content to the attacker-chosen path
- The attacker leverages the file write to achieve code execution (e.g., by overwriting startup scripts, cron jobs, or application configuration)
The vulnerability is particularly dangerous in ML pipeline environments where automated data loading from external sources is common practice. An attacker could craft payloads like ../../.bashrc to inject malicious commands that execute on user login, or target application-specific files for immediate code execution.
Detection Methods for CVE-2024-0520
Indicators of Compromise
- Unusual file creation or modification in sensitive directories such as /tmp, /etc, or user home directories originating from MLflow processes
- Outbound HTTP requests from MLflow processes to external hosts that are followed by file writes outside the expected dataset storage locations
- Log entries showing dataset loading operations with suspicious filenames containing ../ sequences or absolute paths
- Unexpected files appearing in system directories with timestamps correlating to MLflow dataset operations
Detection Strategies
- Implement file integrity monitoring (FIM) on critical system and application directories to detect unauthorized file writes
- Monitor MLflow process file system activity for writes outside designated dataset directories
- Deploy network monitoring to identify MLflow HTTP dataset requests to external or unknown hosts
- Configure logging to capture full dataset source URLs and extracted filenames for forensic analysis
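If extracted filenames are captured in logs, a simple filter can flag the suspicious ones. The following is a minimal sketch under that assumption; the function names and the sample `logged` list are hypothetical, and how the filenames are pulled out of your logs depends on your logging setup.

```python
import ntpath
import posixpath
from typing import Iterable, List


def is_suspicious_filename(filename: str) -> bool:
    """Return True for anything that is not a plain file name."""
    if filename in ("", ".", ".."):
        return True
    # A name that differs from its own basename carries a directory
    # component, an absolute path, or (on Windows) a drive prefix.
    return (
        posixpath.basename(filename) != filename
        or ntpath.basename(filename) != filename
    )


def flag_suspicious(filenames: Iterable[str]) -> List[str]:
    """Keep only the filenames that warrant a closer look."""
    return [name for name in filenames if is_suspicious_filename(name)]


# Hypothetical filenames extracted from dataset-loading log entries.
logged = ["train.csv", "../../tmp/poc.txt", "/tmp/poc.txt", "..\\evil.bat"]
print(flag_suspicious(logged))
```

Checking against both POSIX and Windows path rules is a deliberate choice here, so that the same filter works regardless of which platform produced the log.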
Monitoring Recommendations
- Enable detailed audit logging for all file creation and modification events by MLflow processes
- Implement anomaly detection for MLflow network connections to identify requests to potentially malicious external sources
- Set up alerts for file write attempts targeting sensitive paths such as /etc, /root, or application configuration directories
- Monitor for execution of newly created files in non-standard locations following MLflow dataset operations
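An alerting rule for writes outside the designated dataset directory could be built around a containment check like the one sketched below. The directory `/srv/mlflow-data` and the list of written paths are placeholders; in practice the paths would come from whatever audit source you use (for example, file audit events), which is outside the scope of this sketch.

```python
import os
from typing import Iterable, List


def is_within_directory(root: str, candidate: str) -> bool:
    """Return True if candidate resolves to a location inside root."""
    root_real = os.path.realpath(root)
    candidate_real = os.path.realpath(candidate)
    try:
        return os.path.commonpath([root_real, candidate_real]) == root_real
    except ValueError:
        # Raised, for example, when the paths are on different Windows drives.
        return False


def writes_outside(root: str, written_paths: Iterable[str]) -> List[str]:
    """List the written paths that fall outside the designated directory."""
    return [p for p in written_paths if not is_within_directory(root, p)]


# Hypothetical paths reported by an audit log for an MLflow process.
events = ["/srv/mlflow-data/train.csv", "/srv/mlflow-data/../x", "/tmp/poc.txt"]
print(writes_outside("/srv/mlflow-data", events))
```

Comparing resolved paths with `os.path.commonpath` avoids the false match that a plain string prefix check would give for a sibling directory such as /srv/mlflow-data-evil.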
How to Mitigate CVE-2024-0520
Immediate Actions Required
- Upgrade MLflow to version 2.9.0 or later immediately, as this version contains the security fix for CVE-2024-0520
- Audit recent MLflow dataset loading operations for any signs of exploitation, particularly checking for unexpected files in sensitive directories
- Restrict MLflow's filesystem permissions to limit the impact of potential exploitation
- Implement network egress controls to restrict MLflow's ability to fetch datasets from untrusted external sources
Patch Information
The vulnerability is addressed in MLflow version 2.9.0. The fix implements proper filename sanitization to prevent path traversal attacks when processing HTTP dataset sources. The security patch is available in the GitHub commit 400c226953b4568f4361bc0a0c223511652c2b9d.
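For readers who want to see what filename sanitization of this kind can look like, here is a minimal sketch. It is not the code from the MLflow commit: `safe_destination` and the `/srv/mlflow-data` directory are hypothetical, and the approach shown (reject anything that is not a plain file name, then confirm the resolved path stays inside the download directory) is one common way to close this class of bug.

```python
import ntpath
import os
import posixpath


def safe_destination(download_dir: str, filename: str) -> str:
    """Return a path inside download_dir, or raise ValueError for unsafe names."""
    is_plain_name = (
        filename not in ("", ".", "..")
        and posixpath.basename(filename) == filename
        and ntpath.basename(filename) == filename
    )
    if not is_plain_name:
        raise ValueError(f"Refusing unsafe filename: {filename!r}")

    root = os.path.realpath(download_dir)
    destination = os.path.realpath(os.path.join(root, filename))
    # Defense in depth: the resolved path must still live under the root,
    # which also catches a symlink that points outside the directory.
    if os.path.commonpath([root, destination]) != root:
        raise ValueError(f"Destination escapes download directory: {filename!r}")
    return destination


print(safe_destination("/srv/mlflow-data", "train.csv"))
for bad_name in ("../../tmp/poc.txt", "/tmp/poc.txt"):
    try:
        safe_destination("/srv/mlflow-data", bad_name)
    except ValueError as exc:
        print(exc)
```

Rejecting unsafe names outright, rather than silently stripping them down to a basename, makes tampering attempts visible to the caller and to logs.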
Organizations should upgrade to the patched version using standard Python package management:
pip install --upgrade "mlflow>=2.9.0"
Additional details about the vulnerability discovery and disclosure can be found in the Huntr bounty report.
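To confirm that an environment is running a fixed release, the installed version can be compared against 2.9.0. The sketch below uses only the standard library; the helper names are hypothetical, and the parser deliberately ignores pre-release suffixes (so a release candidate of 2.9.0 would be reported as patched), which is a simplification you may want to tighten.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

PATCHED_RELEASE = (2, 9, 0)  # first fixed version according to the advisory


def parse_release(version_string: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '2.10.2' -> (2, 10, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad '2.9' to (2, 9, 0)
    return tuple(parts)


def is_patched(version_string: str) -> bool:
    """Compare numerically so that 2.10.x sorts after 2.9.x."""
    return parse_release(version_string) >= PATCHED_RELEASE


def installed_mlflow_is_patched() -> Optional[bool]:
    """Return None when MLflow is not installed in this environment."""
    try:
        return is_patched(version("mlflow"))
    except PackageNotFoundError:
        return None


print(installed_mlflow_is_patched())
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.10.2" would sort before "2.9.0" and be misreported as vulnerable.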
Workarounds
- If immediate patching is not possible, restrict MLflow dataset loading to only trusted internal URLs via network controls or application configuration
- Route MLflow's outbound dataset requests through an egress proxy that allowlists destinations and rejects responses whose Content-Disposition filename contains path separators or traversal sequences
- Run MLflow processes in containerized environments with read-only root filesystems and limited writable directories
- Disable HTTP dataset source functionality if not required for your ML workflows
# Example: Restrict MLflow container filesystem access (Docker)
docker run --read-only \
  --tmpfs /tmp:rw,noexec,nosuid \
  -v /safe/mlflow/data:/mlflow/data:rw \
  mlflow:latest
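The workaround of restricting dataset loading to trusted internal URLs can also be enforced in application code, before a URL is ever handed to MLflow. The sketch below is one way to do that; the `TRUSTED_HOSTS` set, the host name in it, and the function name are assumptions for illustration, not MLflow configuration options.

```python
from typing import AbstractSet
from urllib.parse import urlsplit

# Hypothetical allowlist of internal dataset hosts.
TRUSTED_HOSTS = frozenset({"datasets.internal.example"})


def is_trusted_dataset_url(url: str, trusted_hosts: AbstractSet[str] = TRUSTED_HOSTS) -> bool:
    """Allow only http(s) URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # urlsplit().hostname is lowercased and excludes any userinfo or port,
    # so "https://trusted@evil.example/" is evaluated as evil.example.
    return parts.hostname is not None and parts.hostname in trusted_hosts


for candidate in (
    "https://datasets.internal.example/train.csv",
    "http://evil.example/train.csv",
    "https://datasets.internal.example@evil.example/train.csv",
):
    print(candidate, "->", is_trusted_dataset_url(candidate))
```

A check like this only covers the URL that is requested. It does not account for HTTP redirects to other hosts, so it is best combined with the network egress controls described above.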
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.

