CVE-2026-26831 Overview
CVE-2026-26831 is an OS Command Injection vulnerability in textract, a Node.js text extraction library, affecting versions through 2.5.0. Several extractor modules (lib/extractors/doc.js, rtf.js, dxf.js, and images.js) and lib/util.js pass the filePath parameter to child_process.exec() as part of a command string without adequate sanitization, so a file with a maliciously crafted name can inject shell commands.
Critical Impact
Attackers can achieve arbitrary OS command execution by crafting malicious filenames that escape the command context and inject shell commands, potentially leading to complete system compromise.
Affected Products
- textract versions through 2.5.0
- Node.js applications using the vulnerable textract npm package
- Systems processing user-supplied files with textract extractors
Discovery Timeline
- 2026-03-25 - CVE-2026-26831 published to NVD
- 2026-03-26 - Last updated in NVD database
Technical Details for CVE-2026-26831
Vulnerability Analysis
The textract library extracts text from various document formats, including DOC, RTF, DXF, and image files. The vulnerability stems from improper handling of user-controlled file paths, which are concatenated directly into shell commands executed via the Node.js child_process.exec() function.
When textract processes a file, the filename or path is incorporated into command strings that invoke external tools (such as antiword for DOC files, unrtf for RTF files, drawingtotext for DXF files, and tesseract for images). Because the library does not sanitize or escape shell metacharacters in the file path, an attacker who controls the filename can inject arbitrary shell commands.
For example, a filename containing shell metacharacters like backticks, semicolons, or pipe characters could break out of the intended command context and execute attacker-controlled commands with the privileges of the Node.js process.
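A minimal sketch of why these characters matter, assuming a POSIX system where child_process.exec() runs its command string through `/bin/sh -c`. The filename is hypothetical, `echo extracting` stands in for an extractor such as antiword, and the injected payload is a harmless `echo`.

```shell
# Hypothetical attacker-chosen filename; ";" is legal in POSIX filenames.
filename='report.doc; echo INJECTED'

# Vulnerable pattern: the filename is concatenated into a single command string
# that a shell parses, as child_process.exec() does. "echo extracting" stands in
# for an extractor binary such as antiword.
vulnerable_output=$(sh -c "echo extracting $filename")

# Prints two lines, "extracting report.doc" and then "INJECTED":
# the shell treated the text after ";" as a second command.
printf '%s\n' "$vulnerable_output"
```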
Root Cause
The root cause is the use of child_process.exec() with unsanitized user input. exec() spawns a shell to run its command string, so any user-controlled data interpolated into that string can be interpreted as shell syntax. The affected code in lib/extractors/doc.js, lib/extractors/rtf.js, lib/extractors/dxf.js, lib/extractors/images.js, and lib/util.js all follows this unsafe pattern. A proper fix is to use child_process.execFile() or child_process.spawn() with the arguments passed as an array (and without the shell option enabled), so the file path reaches the external tool as a single argument and is never interpreted by a shell.
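The same principle at the shell level, as a rough analogue of execFile() or spawn() with an argument array: the command text stays fixed and the filename is passed as a separate argument (`$1`), so it is treated as data rather than parsed as code. The filename and the `printf` stand-in for an extractor are hypothetical; this illustrates the technique and is not textract's code.

```shell
# Hypothetical attacker-chosen filename containing a ";" payload.
filename='report.doc; echo INJECTED'

# Fixed command text; the filename arrives as positional argument $1 ("sh" is $0),
# so the shell expands it as a single word instead of parsing it as commands.
safe_output=$(sh -c 'printf "extracting %s\n" "$1"' sh "$filename")

printf '%s\n' "$safe_output"
```

The output is the single line `extracting report.doc; echo INJECTED`: the semicolon is passed through as part of the name and never reaches the shell's parser.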
Attack Vector
An attacker can exploit this vulnerability by providing a maliciously crafted filename to any application using the textract library. The attack requires the ability to supply or influence filenames that are processed by textract. Common attack scenarios include:
- Web applications that accept file uploads and extract text using textract
- Document processing pipelines that handle files from untrusted sources
- APIs that process files with user-controlled names
The malicious filename would contain shell metacharacters that, once the name is interpolated into the command string passed to exec(), cause the spawned shell to run arbitrary commands. For technical details on the vulnerable code patterns, refer to lib/extractors/doc.js and the helper functions in lib/util.js in the textract source.
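Wrapping the path in double quotes inside the command string would not be enough on its own, because a POSIX shell still performs `$(...)` and backtick command substitution inside double quotes. The sketch below shows this general shell behavior with a hypothetical filename and a harmless payload; it does not reproduce textract's actual command strings.

```shell
# Hypothetical filename that relies on command substitution rather than ";".
filename='$(echo INJECTED).doc'

# The path is wrapped in double quotes inside the command string, a common but
# insufficient "fix": the inner shell still expands $(...) and backticks there.
quoted_output=$(sh -c "echo extracting \"$filename\"")

# Prints "extracting INJECTED.doc": the payload ran inside the quotes.
printf '%s\n' "$quoted_output"
```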
Detection Methods for CVE-2026-26831
Indicators of Compromise
- Unusual child process spawning from Node.js applications using textract
- Files with unusual names containing shell metacharacters (;, |, `, $(), etc.) being processed
- Unexpected network connections or file system modifications originating from Node.js processes
- Command execution logs showing suspicious commands invoked alongside document extraction tools
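To check for the filename indicator in an upload or working directory, a simple search can list files whose names fall outside a conservative character set. This is a sketch: `find_suspicious_names` is a hypothetical helper, and it deliberately also flags harmless names (for example, ones containing spaces), so its results need manual review.

```shell
# Hypothetical helper: print regular files under directory $1 whose basename
# contains any character outside A-Z, a-z, 0-9, ".", "_" and "-".
# LC_ALL=C keeps the ranges byte-based, so non-ASCII bytes are flagged too.
find_suspicious_names() {
  LC_ALL=C find "$1" -type f -name '*[!A-Za-z0-9._-]*'
}

# Usage (example path): find_suspicious_names /srv/uploads
```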
Detection Strategies
- Monitor for anomalous process trees where Node.js spawns unexpected child processes
- Implement application-level logging to track filenames processed by textract
- Use static analysis tools to identify child_process.exec() calls with user-controlled input in Node.js applications
- Deploy runtime application self-protection (RASP) to detect command injection attempts
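As a lightweight complement to dedicated static analysis tools, a text search can locate `exec()` call sites, for example in `node_modules/textract/lib`. The helper name `find_exec_calls` is hypothetical, and a regex will miss aliased calls while also matching unrelated methods such as `RegExp.prototype.exec()`, so hits are leads to review rather than confirmed findings.

```shell
# Hypothetical helper: list lines in *.js files under $1 that call exec( or
# execSync( as a standalone name or method. execFile( is not matched because
# "exec" must be followed directly by "(" or "Sync(".
find_exec_calls() {
  grep -rnE --include='*.js' '(^|[^A-Za-z0-9_])exec(Sync)?[[:space:]]*\(' "$1"
}

# Usage: find_exec_calls node_modules/textract/lib
```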
Monitoring Recommendations
- Enable detailed audit logging for file operations in applications using textract
- Monitor the command lines of shell processes spawned by Node.js, and of the extraction tools they invoke, for unusual arguments such as shell metacharacters
- Review system calls and process execution patterns in containerized environments
- Alert on any shell command execution anomalies from document processing services
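Because exec() starts `/bin/sh -c` with the full command string as an argument, an injected payload generally appears in that shell's recorded command line. The hypothetical filter below assumes command lines have been exported one per line from whatever process-auditing tool is in use, and flags those that mention an assumed extractor tool name (antiword, unrtf, drawingtotext, tesseract) together with a shell metacharacter.

```shell
# Hypothetical filter: read command lines on stdin, keep those that invoke an
# assumed extractor tool AND contain a shell metacharacter (; | & ` $ < >).
flag_suspicious_cmdlines() {
  grep -E '(antiword|unrtf|drawingtotext|tesseract)' |
    grep -E '[;|&`$<>]'
}

# Usage: flag_suspicious_cmdlines < exported_cmdlines.txt
```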
How to Mitigate CVE-2026-26831
Immediate Actions Required
- Audit all applications using the textract npm package for exposure to this vulnerability
- Implement strict filename sanitization before passing files to textract
- Consider using alternative text extraction libraries that do not use shell execution
- Restrict file upload functionality to only accept files with validated, sanitized filenames
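When auditing, `npm ls textract` shows where the package appears in a project's dependency tree. The hypothetical helper below then checks whether a reported version falls within the published affected range (through 2.5.0); it assumes `sort -V` is available, as it is in GNU coreutils.

```shell
# Hypothetical helper: exit status 0 if version $1 is <= 2.5.0, the last
# version listed as affected. Pre-release suffixes (e.g. -beta) are not handled.
is_affected_version() {
  # Version-sort $1 together with 2.5.0; if $1 comes first, it is <= 2.5.0.
  [ "$(printf '%s\n%s\n' "$1" 2.5.0 | sort -V | head -n 1)" = "$1" ]
}

# Usage: is_affected_version 2.4.1 && echo "in affected range"
```

A version above 2.5.0 is only outside the published range. Since no patch had been released at publication time, any newer release should be checked against the advisory before it is treated as fixed.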
Patch Information
At the time of publication, no official patch has been released for this vulnerability. Monitor the textract GitHub repository and npm package page for security updates. For detailed vulnerability information, refer to the CVE-2026-26831 disclosure.
Workarounds
- Implement a strict allowlist for filename characters, rejecting any files with shell metacharacters
- Sanitize all filenames by removing or escaping special characters before processing with textract
- Run textract processing in an isolated container or sandbox environment with minimal privileges
- Fork the textract library and replace exec() calls with safer alternatives like execFile() or spawn() with argument arrays
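```shell
# Example: Sanitize filenames before processing
# Keep only ASCII letters, digits, ".", "_" and "-"; printf avoids echo's option and escape quirks
sanitized_filename=$(printf '%s' "$filename" | LC_ALL=C tr -cd '[:alnum:]._-')
# Drop leading "-" and "." so the name cannot be read as an option or as ".."; reject an empty result
sanitized_filename=$(printf '%s' "$sanitized_filename" | sed 's/^[-.]*//')
```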
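The example above rewrites names; the allowlist workaround can instead reject them outright. A minimal sketch, with `is_allowed_filename` as a hypothetical helper: it accepts only letters, digits, `.`, `_` and `-`, and also refuses names starting with `-` (which an external tool could read as an option) or `.` (hidden files and `..`).

```shell
# Hypothetical validator: exit status 0 means the name may be processed.
is_allowed_filename() {
  case $1 in
    # Reject empty names, names that could be parsed as options ("-x.doc"),
    # and names starting with "." (hidden files, "." and "..").
    ''|-*|.*) return 1 ;;
    # Reject anything containing a character outside the allowlist.
    *[!A-Za-z0-9._-]*) return 1 ;;
  esac
  return 0
}

# Usage: is_allowed_filename "$filename" || { echo "rejected" >&2; exit 1; }
```

Rejecting is stricter than stripping: a name like `report.doc;id` is refused instead of silently becoming `report.docid`, which keeps suspicious uploads visible in logs.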
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


