CVE-2026-26831: textract OS Command Injection Vulnerability

CVE-2026-26831 Overview

CVE-2026-26831 is an OS command injection vulnerability in textract, a Node.js text extraction library, affecting versions through 2.5.0. Several modules pass the filePath parameter directly to child_process.exec() without adequate sanitization: lib/extractors/doc.js, rtf.js, dxf.js, and images.js, as well as lib/util.js. As a result, processing a file with a maliciously crafted name can execute attacker-supplied shell commands.

Critical Impact
Attackers can achieve arbitrary OS command execution by crafting malicious filenames that escape the command context and inject shell commands, potentially leading to complete system compromise.

Affected Products

textract versions through 2.5.0
Node.js applications using the vulnerable textract npm package
Systems processing user-supplied files with textract extractors

Discovery Timeline

2026-03-25 - CVE-2026-26831 published to NVD
2026-03-26 - Last updated in the NVD database

Technical Details for CVE-2026-26831

Vulnerability Analysis

The textract library extracts text from document formats such as DOC, RTF, DXF, and image files. The vulnerability stems from improper handling of user-controlled file paths, which are concatenated directly into shell commands executed via the Node.js child_process.exec() function.

When textract processes a file, it incorporates the filename or path into command strings that invoke external tools (such as antiword for DOC files and unrtf for RTF files). Because the library does not sanitize or escape shell metacharacters in the file path, an attacker who controls the filename can inject arbitrary shell commands.

For example, a filename containing shell metacharacters like backticks, semicolons, or pipe characters could break out of the intended command context and execute attacker-controlled commands with the privileges of the Node.js process.
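The following harmless local demonstration shows why these characters matter. It is a shell sketch, not textract's code: each hypothetical filename carries a payload that only creates a marker file in a throwaway directory, and the filename is pasted into a command string that sh -c re-parses. That mirrors what exec() does when it hands its command string to /bin/sh.

```bash
#!/usr/bin/env bash
# Harmless demo of shell re-parsing. The filenames are hypothetical; each payload
# only creates a marker file inside a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

payloads=(
  'report;touch semicolon_ran.doc'
  'report|touch pipe_ran.doc'
  'report`touch backtick_ran`.doc'
  'report$(touch subst_ran).doc'
)

for filename in "${payloads[@]}"; do
  # UNSAFE: the filename is pasted into a string that /bin/sh re-parses,
  # mirroring how exec() hands its whole command string to a shell.
  sh -c "cat $filename" 2>/dev/null || true
done

ls   # each marker file is evidence that an injected command ran
```

The cat commands themselves fail because no file named report or report.doc exists. The marker files appear anyway, because the shell executed the injected part of each name before or alongside cat.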

Root Cause

The root cause is the use of child_process.exec() with unsanitized user input. The exec() function spawns a shell to run the command string, so any user-controlled data interpolated into that string is parsed as shell syntax. The affected modules lib/extractors/doc.js, lib/extractors/rtf.js, lib/extractors/dxf.js, lib/extractors/images.js, and lib/util.js all exhibit this unsafe pattern. A proper fix is to use child_process.execFile() or child_process.spawn() with arguments passed as an array (and without the shell option enabled), so that no shell interprets the file path.
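The argument-array approach can be illustrated in shell terms. This sketch is an analogue of calling execFile() or spawn() with an argument array, not textract code. The program name and the hypothetical filename are separate argv entries, so the $(...) sequence in the name stays literal text and no marker file is created.

```bash
#!/usr/bin/env bash
# Shell analogue of execFile()/spawn() with an argument array: the program and each
# argument are separate argv entries, so no shell re-parses the filename.
safe_dir=$(mktemp -d)
cd "$safe_dir" || exit 1

filename='report$(touch injected).doc'   # hypothetical malicious name
printf 'hello\n' > "$filename"           # create a file that really has this name

# The filename is one literal argument; "--" also stops it being read as an option.
cmd=(cat -- "$filename")
"${cmd[@]}"                              # prints: hello
```

In Node.js, the corresponding change would be to pass the path as an element of the args array given to execFile() instead of concatenating it into the exec() command string. A "--" separator is still worth adding for tools that support it, since an argument array does not stop a name beginning with "-" from being read as an option.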

Attack Vector

An attacker can exploit this vulnerability against any application that passes an attacker-supplied or attacker-influenced filename to textract. Common attack scenarios include:

Web applications that accept file uploads and extract text using textract
Document processing pipelines that handle files from untrusted sources
APIs that process files with user-controlled names

The malicious filename contains shell metacharacters that the shell spawned by exec() interprets as command syntax, so the injected commands run alongside the intended extraction tool. For the vulnerable code patterns, refer to the textract doc.js extractor and the utility functions in lib/util.js.

Detection Methods for CVE-2026-26831

Indicators of Compromise

Unusual child process spawning from Node.js applications using textract
Files with unusual names containing shell metacharacters (;, |, `, $(), etc.) being processed
Unexpected network connections or file system modifications originating from Node.js processes
Command execution logs showing suspicious commands invoked alongside document extraction tools
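To check for the second indicator on disk, a sketch like the following lists files whose names contain common shell metacharacters. The function name and the directory it is pointed at (for example, an upload directory) are hypothetical. The character set covers ; | & $ ` < > ( ) and can be widened as needed.

```bash
#!/usr/bin/env bash
# Print regular files under $1 whose names contain ; | & $ ` < > ( or ).
# The directory argument (for example an upload directory) is up to the caller.
find_suspicious_names() {
  find "$1" -type f -name '*[;|&$`<>()]*' -print
}

# Example (hypothetical path): find_suspicious_names /srv/uploads
```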

Detection Strategies

Monitor for anomalous process trees where Node.js spawns unexpected child processes
Implement application-level logging to track filenames processed by textract
Use static analysis tools to identify child_process.exec() calls with user-controlled input in Node.js applications
Deploy runtime application self-protection (RASP) to detect command injection attempts
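For a quick static check without a dedicated analysis tool, a grep over the installed package can locate exec() and execSync() call sites. The path node_modules/textract/lib assumes the standard npm install layout, and the function name is hypothetical. This is a coarse text match: it also reports unrelated calls such as RegExp exec(), so every hit needs manual review.

```bash
#!/usr/bin/env bash
# Print file:line:text for exec( / execSync( call sites in .js files under $1.
# Coarse text match: execFile( and spawn( are not reported, but RegExp .exec( calls are.
find_exec_calls() {
  grep -rnE --include='*.js' 'exec(Sync)?[[:space:]]*\(' "$1"
}

# Example, assuming the standard npm layout: find_exec_calls node_modules/textract/lib
```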

Monitoring Recommendations

Enable detailed audit logging for file operations in applications using textract
Monitor for processes executed with unusual command-line arguments
Review system calls and process execution patterns in containerized environments
Alert on any shell command execution anomalies from document processing services
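One way to act on the process-tree recommendations is to compare a process snapshot against an allowlist. This sketch is a starting point under several assumptions. It treats a process as started by Node.js if its parent is node, or if its parent is a shell (sh, bash, dash) whose parent is node, which matches how exec() runs commands through /bin/sh. The allowlist of extraction tools is hypothetical and must match the extractors actually in use. The process may be named nodejs rather than node on some systems, and the input format assumes procps ps on Linux. A single ps snapshot also misses short-lived processes, so audit or eBPF-based execution logging is better suited to continuous monitoring.

```bash
#!/usr/bin/env bash
# Reads "pid ppid comm" lines on stdin (the format of: ps -eo pid=,ppid=,comm=) and prints
# "pid comm" for processes started by node, directly or through one shell, whose command
# name is not in the comma-separated allowlist passed as $1.
flag_unexpected_children() {
  awk -v allow="$1" '
    BEGIN { n = split(allow, a, ","); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    { order[NR] = $1; ppid[$1] = $2; comm[$1] = $3 }
    END {
      for (i = 1; i <= NR; i++) {
        p = order[i]; parent = ppid[p]; grand = ppid[parent]
        via_node = comm[parent] == "node" ||
                   (comm[parent] ~ /^(sh|bash|dash)$/ && comm[grand] == "node")
        if (via_node && !(comm[p] in ok)) print p, comm[p]
      }
    }'
}

# Hypothetical allowlist: the shell used by exec() plus the extraction tools you expect.
ALLOW="sh,bash,dash,antiword,unrtf,tesseract"
# Live use: ps -eo pid=,ppid=,comm= | flag_unexpected_children "$ALLOW"
```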

How to Mitigate CVE-2026-26831

Immediate Actions Required

Audit all applications using the textract npm package for exposure to this vulnerability
Implement strict filename sanitization before passing files to textract
Consider using alternative text extraction libraries that do not use shell execution
Restrict file upload functionality to only accept files with validated, sanitized filenames
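For the audit step, the sketch below lists first-party package.json files under a directory that declare textract as a dependency, skipping node_modules. The function name and example path are hypothetical. The sketch only finds direct dependencies; running npm ls textract inside each project also reveals transitive use.

```bash
#!/usr/bin/env bash
# Print every package.json under $1 (outside node_modules) that lists "textract" as a key,
# i.e. projects that depend on textract directly.
find_textract_projects() {
  find "$1" -name node_modules -prune -o -name package.json -type f -print |
    while IFS= read -r manifest; do
      if grep -q '"textract"[[:space:]]*:' "$manifest"; then
        printf '%s\n' "$manifest"
      fi
    done
}

# Example (hypothetical path): find_textract_projects /srv/apps
```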

Patch Information

At the time of publication, no official patch has been released for this vulnerability. Monitor the textract GitHub repository and npm package page for security updates. For detailed vulnerability information, refer to the CVE-2026-26831 disclosure.

Workarounds

Implement a strict allowlist for filename characters, rejecting any files with shell metacharacters
Sanitize all filenames by removing or escaping special characters before processing with textract
Run textract processing in an isolated container or sandbox environment with minimal privileges
Fork the textract library and replace exec() calls with safer alternatives like execFile() or spawn() with argument arrays

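```bash
# Example: Sanitize filenames before processing
# Remove shell metacharacters from uploaded filenames (keeps letters, digits, ".", "_", "-")
# printf is used instead of echo so names such as "-n" or "-e" are not read as options
sanitized_filename=$(printf '%s' "$filename" | tr -cd '[:alnum:]._-')
```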
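Stripping characters silently changes names and can make two uploads collide, so the allowlist workaround can also be applied as a reject-rather-than-rewrite check. This sketch uses a hypothetical function name. It accepts only ASCII letters, digits, ".", "_" and "-", and additionally rejects names that start with "." or "-". Whichever approach is used, the file on disk must actually carry the validated name (or, more simply, a server-generated name) before its path reaches textract, because textract builds its commands from the path it is given.

```bash
#!/usr/bin/env bash
# Return 0 only for names built from ASCII letters, digits, ".", "_" and "-"
# that do not start with "." or "-"; anything else should be rejected.
is_safe_filename() {
  local name=$1
  [[ $name =~ ^[A-Za-z0-9_][A-Za-z0-9._-]*$ ]]
}

for f in 'quarterly_report.doc' 'report;touch x.doc' '-rf.doc'; do
  if is_safe_filename "$f"; then echo "accept: $f"; else echo "reject: $f"; fi
done
```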