CVE-2025-64712: Unstructured Path Traversal Vulnerability

CVE-2025-64712 Overview

CVE-2025-64712 is a critical path traversal vulnerability affecting the Unstructured library, an open-source Python framework used for ingesting and pre-processing images and text documents including PDFs, HTML, Word docs, and many more file formats. Prior to version 0.18.18, a path traversal vulnerability in the partition_msg function allows an attacker to write or overwrite arbitrary files on the filesystem when processing malicious MSG files with attachments.

Critical Impact
Attackers can achieve arbitrary file write or overwrite capabilities on the filesystem by crafting malicious MSG files with specially constructed attachment filenames, potentially leading to remote code execution or system compromise.

Affected Products

Unstructured library versions prior to 0.18.18
Python applications using the partition_msg function for MSG file processing
Document processing pipelines handling untrusted MSG attachments

Discovery Timeline

2026-02-04 - CVE-2025-64712 published to NVD
2026-02-27 - Last updated in NVD database

Technical Details for CVE-2025-64712

Vulnerability Analysis

This vulnerability stems from insufficient input validation in the MSG file attachment processing functionality within the Unstructured library. The partition_msg function fails to properly sanitize attachment filenames extracted from MSG files before using them in file system operations. When processing an MSG file, the library extracts attachment metadata including filenames, which are then used to determine where attachments are written on disk.

The lack of proper filename sanitization means that maliciously crafted filenames containing path traversal sequences (such as ../ for Unix systems or ..\ for Windows) are not neutralized. This allows an attacker to escape the intended directory and write files to arbitrary locations on the filesystem where the application has write permissions.
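
As a rough illustration of the escape described above, the sketch below joins an attacker-supplied filename to an intended output directory and normalizes the result. The directory `/tmp/attachments` and the helper name `resolve_target` are assumptions made for this example, not names taken from the Unstructured codebase.

```python
import posixpath


def resolve_target(base_dir: str, attachment_name: str) -> str:
    """Return the normalized path a naive join of base_dir and attachment_name yields."""
    return posixpath.normpath(posixpath.join(base_dir, attachment_name))


# Hypothetical directory where an application intends to store attachments.
attachments_dir = "/tmp/attachments"

# A benign name stays inside the intended directory.
print(resolve_target(attachments_dir, "report.pdf"))

# A name with traversal sequences resolves outside the intended directory.
print(resolve_target(attachments_dir, "../../etc/cron.d/malicious"))
```

The second call resolves to a path under `/etc`, which is the behavior an unsanitized attachment filename makes possible whenever the process has write permission there.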

Root Cause

The root cause is a classic CWE-22 (Path Traversal) flaw: the file_name property of MSG attachments returned the attacker-supplied value without any sanitization. The original implementation simply returned self._attachment.file_name or "unknown", without removing path components, null bytes, or other characters that could be used to traverse directories.

Attack Vector

An attacker can exploit this vulnerability by crafting a malicious MSG file containing attachments with specially crafted filenames. The attack is network-accessible, requiring no authentication or user interaction beyond the victim's application processing the malicious MSG file.

The attack scenario involves:

Creating an MSG file with an attachment whose filename contains path traversal sequences (e.g., ../../../etc/cron.d/malicious)
Delivering this MSG file to a target system running the vulnerable Unstructured library
When the application processes the MSG file using partition_msg, the attachment is written to an attacker-controlled path on the filesystem

diff

        """The original name of the attached file, no path.
 
         This value is 'unknown' if it is not present in the MSG file (not expected).
+        The filename is sanitized to prevent path traversal attacks.
         """
-        return self._attachment.file_name or "unknown"
+        raw_filename = self._attachment.file_name or "unknown"
+
+        # Sanitize the filename to prevent path traversal attacks
+        # Remove any path components for both Unix and Windows paths
+        # Use both separators to handle cross-platform attacks
+        safe_filename = os.path.basename(raw_filename.replace("\\", "/"))
+
+        # Remove null bytes and other control characters
+        safe_filename = safe_filename.replace("\0", "")
+
+        # If the filename becomes empty after sanitization, use a default
+        if not safe_filename or safe_filename in (".", ".."):
+            safe_filename = "unknown"
+
+        return safe_filename
 
     @lazyproperty
     def _attachment_last_modified(self) -> str | None:

Source: GitHub Commit Update
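
For readers who want to experiment with the sanitization logic outside the library, the following is a minimal standalone sketch that mirrors the steps in the patch. The function name `sanitize_attachment_filename` is hypothetical; in the library the logic lives inside the attachment's file_name property.

```python
import os


def sanitize_attachment_filename(raw_filename):
    """Return a bare filename with directory components and null bytes removed."""
    name = raw_filename or "unknown"
    # Treat Windows separators like Unix ones, then keep only the last component.
    name = os.path.basename(name.replace("\\", "/"))
    # Drop null bytes.
    name = name.replace("\0", "")
    # Fall back to a default if nothing usable is left.
    if not name or name in (".", ".."):
        return "unknown"
    return name


for candidate in ("report.pdf", "../../etc/cron.d/malicious", "..\\..\\evil.exe", ".."):
    print(repr(candidate), "->", repr(sanitize_attachment_filename(candidate)))
```

Only the final path component survives, so the traversal prefix is discarded rather than rejected. Applications that prefer to refuse such files outright would need an additional check.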

Detection Methods for CVE-2025-64712

Indicators of Compromise

Unexpected files appearing in sensitive directories such as /etc/, /var/, or application configuration paths
MSG files containing attachments with suspicious filenames including ../ or ..\ sequences
File system events showing write operations to paths outside expected document processing directories
Log entries indicating MSG file processing followed by unexpected file creation events

Detection Strategies

Implement file integrity monitoring on critical system directories and application paths
Deploy runtime application security monitoring to detect path traversal attempts during file operations
Analyze MSG files at ingestion points for attachments with path traversal patterns in filenames
Monitor Python application logs for unusual file write operations during document processing
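
One way to approach the filename analysis mentioned above is a small predicate applied to each attachment name once it has been extracted from the MSG file. This is a sketch under assumptions: `is_suspicious_attachment_name` is a hypothetical helper, and how the names are obtained from the MSG file is left to whatever parser the pipeline already uses.

```python
from pathlib import PureWindowsPath


def is_suspicious_attachment_name(filename: str) -> bool:
    """Flag names that are anything other than a single plain path component."""
    if "\0" in filename:
        return True
    # PureWindowsPath treats both "/" and "\" as separators, so one parse
    # covers Unix-style and Windows-style traversal attempts.
    path = PureWindowsPath(filename)
    # A non-empty anchor means a root, drive letter, or UNC prefix is present.
    if path.anchor:
        return True
    # A legitimate attachment name is exactly one component and is not "..".
    # Empty names and "." produce zero components and are flagged as well.
    return len(path.parts) != 1 or path.parts[0] == ".."


for name in ("report.pdf", "../../etc/cron.d/malicious", "..\\..\\evil.exe", "/etc/passwd"):
    print(repr(name), is_suspicious_attachment_name(name))
```

The check is deliberately strict: any directory separator is treated as suspicious, since an attachment filename has no reason to carry a path.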

Monitoring Recommendations

Configure alerts for file creation or modification in sensitive system directories during MSG processing workflows
Implement behavioral analysis to baseline normal file write patterns for document processing applications
Deploy endpoint detection and response (EDR) solutions capable of correlating MSG file processing with suspicious filesystem activity
Enable audit logging for all file system operations in environments processing untrusted documents

How to Mitigate CVE-2025-64712

Immediate Actions Required

Upgrade the Unstructured library to version 0.18.18 or later immediately
Audit existing deployments to identify systems running vulnerable versions using pip show unstructured
Review file systems for evidence of exploitation, particularly unexpected files in sensitive directories
Implement network-level controls to quarantine MSG files pending library updates
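
The version audit can also be scripted from inside each Python environment. The sketch below uses the standard library's importlib.metadata to read the installed version and compares it with 0.18.18. The helper names are hypothetical, and the comparison looks only at the leading numeric release segment, so a pre-release or dev build of 0.18.18 would be reported as patched. Treat the result as a first-pass check.

```python
import re
from importlib import metadata

PATCHED_RELEASE = (0, 18, 18)


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release segment, e.g. '0.18.15.dev1' -> (0, 18, 15)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_vulnerable(version: str) -> bool:
    """Return True if the release segment is lower than the patched release."""
    return release_tuple(version) < PATCHED_RELEASE


def check_installed() -> str:
    """Describe the state of the 'unstructured' package in the current environment."""
    try:
        version = metadata.version("unstructured")
    except metadata.PackageNotFoundError:
        return "unstructured is not installed in this environment"
    status = "VULNERABLE" if is_vulnerable(version) else "patched"
    return f"unstructured {version}: {status}"


print(check_installed())
```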

Patch Information

The vulnerability is patched in version 0.18.18 of the Unstructured library. The fix sanitizes attachment filenames by replacing backslashes with forward slashes so that Windows-style separators are handled, applying os.path.basename() to keep only the final filename component, stripping null bytes, and falling back to "unknown" when the result is empty or is "." or "..". See the GitHub Security Advisory for complete details.

Workarounds

If immediate patching is not possible, implement input validation at the application layer to reject MSG files with suspicious attachment filenames
Process MSG files in isolated sandbox environments with restricted filesystem permissions
Configure application-level controls to limit write operations to specific directories only
Deploy web application firewalls or content filters to scan incoming MSG files for path traversal patterns in attachment metadata
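
For the application-level control that limits writes to specific directories, a containment check along the following lines may help. It is a sketch rather than a drop-in fix: `safe_output_path` and the `/srv/attachments` directory are assumptions for illustration, and it only protects code paths where the application itself chooses the write location.

```python
from pathlib import Path


def safe_output_path(base_dir: str, filename: str) -> Path:
    """Resolve filename under base_dir, refusing any path that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / filename).resolve()
    # Reject the base directory itself and anything that resolves outside it.
    if candidate == base or not candidate.is_relative_to(base):
        raise ValueError(f"refusing to write outside {base}: {filename!r}")
    return candidate


# Hypothetical attachment directory used only for this example.
print(safe_output_path("/srv/attachments", "report.pdf"))

try:
    safe_output_path("/srv/attachments", "../../etc/cron.d/malicious")
except ValueError as exc:
    print("blocked:", exc)
```

Path.is_relative_to requires Python 3.9 or later. Resolving both paths before comparing them means that `..` segments and symbolic links are taken into account instead of being matched as raw strings.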

bash

# Upgrade to patched version
# (the requirement is quoted so the shell does not treat ">" as a redirect)
pip install --upgrade "unstructured>=0.18.18"

# Verify installed version
pip show unstructured | grep Version

# List installed copies under /opt, including virtual environments
# (the version is part of each dist-info directory name)
find /opt -type d -name "unstructured-*.dist-info"