CVE-2026-35375 Overview
A logic error in the split utility of uutils coreutils corrupts output filenames when the prefix or suffix argument is not valid UTF-8. The implementation calls to_string_lossy() when constructing chunk filenames, which silently replaces each invalid byte sequence with the Unicode replacement character (U+FFFD). This behavior diverges from GNU split, which preserves the raw pathname bytes. In environments that use non-UTF-8 encodings, the result is files created under incorrect names, which can cause filename collisions, broken automation, or misdirected output data.
Critical Impact
Filename corruption in non-UTF-8 environments can lead to data misdirection, filename collisions, and broken automation pipelines relying on predictable output paths.
Affected Products
- uutils coreutils versions prior to 0.8.0
Discovery Timeline
- 2026-04-22 - CVE-2026-35375 published to NVD
- 2026-04-22 - Last updated in NVD database
Technical Details for CVE-2026-35375
Vulnerability Analysis
This vulnerability stems from improper Unicode encoding handling in the uutils coreutils implementation of the split utility. The core issue relates to CWE-176 (Improper Handling of Unicode Encoding), where the application fails to preserve raw byte sequences when constructing output filenames.
When a user provides a prefix or suffix containing non-UTF-8 byte sequences to the split command, the utility internally converts these values using Rust's to_string_lossy() method. This method is designed for safe string conversion but replaces any invalid UTF-8 sequences with the Unicode replacement character (U+FFFD, displayed as �). While this prevents crashes from invalid UTF-8, it fundamentally alters the intended filename.
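The substitution described above can be reproduced in a few lines of Rust. This is a minimal sketch, not code taken from uutils: the helper name lossy_name and the sample bytes are illustrative assumptions, and the example is Unix-only because it builds an OsStr from raw bytes.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: allows building an OsStr from raw bytes

/// Hypothetical helper: converts raw filename bytes the way a lossy
/// conversion does, replacing invalid UTF-8 with U+FFFD.
fn lossy_name(raw: &[u8]) -> String {
    OsStr::from_bytes(raw).to_string_lossy().into_owned()
}

fn main() {
    // 0xE9 is "é" in ISO-8859-1, but it is not valid UTF-8 on its own.
    let raw: &[u8] = b"caf\xe9_";
    let lossy = lossy_name(raw);

    // The invalid byte has become U+FFFD, which is three bytes in UTF-8,
    // so the resulting name no longer matches the bytes the user supplied.
    assert_eq!(lossy, "caf\u{FFFD}_");
    assert_ne!(lossy.as_bytes(), raw);

    println!("raw bytes:   {:?}", raw);
    println!("lossy bytes: {:?}", lossy.as_bytes());
}
```

Valid UTF-8 input passes through unchanged, which is why the problem only surfaces with legacy encodings or arbitrary byte strings.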
The practical impact manifests in systems using legacy encodings (such as ISO-8859-1, Shift-JIS, or other non-UTF-8 character sets) where filenames may contain bytes that are valid in those encodings but invalid in UTF-8. Automation scripts expecting specific output filenames will fail, and multiple split operations could create filename collisions if different non-UTF-8 prefixes are normalized to the same replacement character sequence.
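The collision scenario can be illustrated with two prefixes that differ only in one invalid byte. The sketch below assumes a simplified naming scheme (prefix followed by a two-letter counter); lossy_chunk_name is a hypothetical helper, not the uutils function.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only

/// Hypothetical helper: builds a chunk name from a lossily converted prefix.
fn lossy_chunk_name(prefix: &[u8], counter: &str) -> String {
    format!("{}{}", OsStr::from_bytes(prefix).to_string_lossy(), counter)
}

fn main() {
    let a: &[u8] = b"out_\xe9_"; // "é" in ISO-8859-1
    let b: &[u8] = b"out_\xe8_"; // "è" in ISO-8859-1

    // The two prefixes are distinct byte strings...
    assert_ne!(a, b);

    // ...but both invalid bytes map to U+FFFD, so the chunk names collide
    // and the second split run would write over the first run's output.
    assert_eq!(lossy_chunk_name(a, "aa"), lossy_chunk_name(b, "aa"));
    println!("both prefixes yield: {}", lossy_chunk_name(a, "aa"));
}
```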
Root Cause
The root cause is the use of to_string_lossy() for filename construction in the split utility, which performs a lossy UTF-8 conversion rather than preserving raw bytes. This design choice prioritizes UTF-8 string safety over filename fidelity and diverges from the behavior of GNU coreutils, which treats filenames as raw byte sequences and makes no encoding assumptions.
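For contrast, a byte-preserving approach keeps the name in an OsString from start to finish and never round-trips through str. The following is a sketch of that general technique under assumed names (chunk_name and its parameters are hypothetical); it is not the code from the actual patch.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::OsStrExt; // Unix-only: used here to inspect raw bytes

/// Hypothetical helper: concatenates prefix, counter, and additional suffix
/// without any UTF-8 conversion, so invalid bytes survive unchanged.
fn chunk_name(prefix: &OsStr, counter: &str, additional_suffix: &OsStr) -> OsString {
    let mut name =
        OsString::with_capacity(prefix.len() + counter.len() + additional_suffix.len());
    name.push(prefix);
    name.push(counter);
    name.push(additional_suffix);
    name
}

fn main() {
    let prefix = OsStr::from_bytes(b"out_\xe9_");
    let name = chunk_name(prefix, "aa", OsStr::new(".txt"));

    // The 0xE9 byte is still present in the final filename.
    assert_eq!(name.as_bytes(), &b"out_\xe9_aa.txt"[..]);
    println!("{:?}", name);
}
```

Because OsString can be passed directly to std::fs::File::create, no string conversion is needed at any point between argument parsing and file creation.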
Attack Vector
The vulnerability requires local access and can be triggered by invoking the split command with prefix or suffix arguments containing non-UTF-8 byte sequences. An attacker with local privileges could exploit this to:
- Cause automation scripts to fail by making output filenames unpredictable
- Create filename collisions that could overwrite existing files
- Misdirect output data to unexpected file paths
The attack surface is limited to local command execution, requiring the attacker to have the ability to run commands or influence command-line arguments passed to the split utility.
Detection Methods for CVE-2026-35375
Indicators of Compromise
- Presence of files whose names contain the Unicode replacement character (U+FFFD, displayed as �) in unexpected locations
- Split output files with names that do not match expected prefix/suffix patterns
- Automation or script failures related to missing or incorrectly named split output files
- Filename collisions or unexpected overwrites when running split operations with non-UTF-8 prefixes
Detection Strategies
- Audit systems for uutils coreutils installations and verify version numbers are 0.8.0 or later
- Review automation scripts that utilize the split command with dynamic or user-supplied prefix/suffix arguments
- Implement file integrity monitoring on directories where split operations write output files
- Search for anomalous filenames containing U+FFFD replacement characters in split output directories
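The filename search in the last item can be sketched as a small directory scan. The function names below are illustrative assumptions, the scan is not recursive, and a hit only indicates a suspicious name, since a file may legitimately contain U+FFFD for other reasons.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true if the name is valid UTF-8 and contains U+FFFD.
/// Names that are not valid UTF-8 were not produced by a lossy
/// conversion, so they are not flagged.
fn has_replacement_char(name: &OsStr) -> bool {
    name.to_str().map_or(false, |s| s.contains('\u{FFFD}'))
}

/// Lists the entries directly inside `dir` whose names contain U+FFFD.
fn find_suspect_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut hits = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        if has_replacement_char(&entry.file_name()) {
            hits.push(entry.path());
        }
    }
    Ok(hits)
}

fn main() -> io::Result<()> {
    // Scan the directory given as the first argument, or the current one.
    let dir = std::env::args_os()
        .nth(1)
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("."));
    for path in find_suspect_files(&dir)? {
        println!("{}", path.display());
    }
    Ok(())
}
```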
Monitoring Recommendations
- Monitor for unexpected file creation patterns in directories used by split operations
- Configure alerts for script failures in pipelines that depend on predictable split output filenames
- Track uutils coreutils version deployments across infrastructure to identify unpatched systems
- Review logs for split command invocations with non-ASCII arguments in multi-language environments, paying particular attention to arguments that are not valid UTF-8
How to Mitigate CVE-2026-35375
Immediate Actions Required
- Upgrade uutils coreutils to version 0.8.0 or later, which addresses this filename handling issue
- Audit scripts and automation that use the split command with prefix or suffix values that may not be valid UTF-8
- Temporarily switch to GNU coreutils split if non-UTF-8 filename preservation is critical for operations
- Validate that split output filenames match expected patterns before downstream processing
Patch Information
The vulnerability has been addressed in uutils coreutils version 0.8.0. The fix modifies the filename construction logic to properly handle non-UTF-8 byte sequences. For detailed information about the patch, see GitHub Pull Request #11397. The patched version is available from GitHub Release 0.8.0.
Workarounds
- Use GNU coreutils split instead of uutils coreutils in environments requiring non-UTF-8 filename support
- Ensure all prefix and suffix arguments to split are valid UTF-8 strings
- Implement wrapper scripts that validate prefix/suffix arguments for UTF-8 compliance before passing to split
- Pre-process filenames to encode non-UTF-8 sequences in a reversible format (such as URL encoding) before use with split
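# Workaround: check that the prefix is valid UTF-8 before invoking uutils split
PREFIX="your_prefix_here"
# printf avoids the trailing newline and escape handling that echo may introduce
if printf '%s' "$PREFIX" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1; then
    # split expects the input file first, then the output prefix
    split -d --additional-suffix=.txt inputfile "$PREFIX"
else
    echo "Warning: Non-UTF-8 prefix detected, using GNU split instead" >&2
    # Assumes /usr/bin/split is GNU split on this system; adjust the path if it is not
    /usr/bin/split -d --additional-suffix=.txt inputfile "$PREFIX"
fi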
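The reversible-encoding workaround can be sketched as a percent-encoding round trip. This is an illustrative assumption about how such pre-processing might look: encode_prefix and decode_prefix are hypothetical helpers, and the safe set is deliberately conservative. Because "/" is also encoded, the helper is meant for the final path component only.

```rust
/// Percent-encodes every byte outside [A-Za-z0-9._-], so the result is
/// always plain ASCII and therefore valid UTF-8. "%" itself becomes "%25",
/// which keeps the encoding unambiguous.
fn encode_prefix(raw: &[u8]) -> String {
    let mut out = String::with_capacity(raw.len());
    for &b in raw {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'_' | b'.') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Reverses encode_prefix. Returns None on a truncated or non-hex escape.
fn decode_prefix(encoded: &str) -> Option<Vec<u8>> {
    let bytes = encoded.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            let hex = bytes.get(i + 1..i + 3)?;
            if !hex.iter().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            let hex = std::str::from_utf8(hex).ok()?;
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    Some(out)
}

fn main() {
    let raw: &[u8] = b"caf\xe9_";
    let encoded = encode_prefix(raw);
    assert_eq!(encoded, "caf%E9_");
    // Decoding restores the original bytes exactly.
    assert_eq!(decode_prefix(&encoded).as_deref(), Some(raw));
    println!("{}", encoded);
}
```

A pipeline using this approach would pass the encoded prefix to split and rename the chunks afterward with a tool that handles raw bytes.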