CVE-2026-35346 Overview
The comm utility in uutils coreutils contains a data integrity vulnerability that silently corrupts output by performing lossy UTF-8 conversion on all output lines. The implementation uses String::from_utf8_lossy(), which replaces invalid UTF-8 byte sequences with the Unicode replacement character (U+FFFD). This behavior differs from GNU comm, which processes raw bytes and preserves the original input. As a result, output is silently altered whenever the utility is used to compare binary files or files in non-UTF-8 legacy encodings.
Critical Impact
Silent data corruption in file comparisons can produce incorrect results for binary data or legacy-encoded files, undermining automated pipelines and scripts that depend on exact comm output.
Affected Products
- uutils coreutils (versions prior to 0.6.0)
Discovery Timeline
- 2026-04-22 - CVE-2026-35346 published to NVD
- 2026-04-22 - Last updated in NVD database
Technical Details for CVE-2026-35346
Vulnerability Analysis
This vulnerability falls under CWE-176 (Improper Handling of Unicode Encoding), an input-handling weakness: the comm utility mishandles byte sequences that do not conform to UTF-8. The core issue stems from a design decision to use Rust's String::from_utf8_lossy() function when processing line data.
When processing files, the comm utility reads input and converts it to UTF-8 strings. The from_utf8_lossy() function, while convenient for handling potentially malformed text, replaces any byte sequence that doesn't represent valid UTF-8 with the Unicode replacement character (U+FFFD, displayed as �). This lossy conversion occurs silently without any warning or error message to the user, making it difficult to detect when data corruption has occurred.
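The substitution is easy to reproduce with the Rust standard library alone. The sketch below assumes an ISO-8859-1 input line; lossy_round_trip is a hypothetical helper written for illustration, not code taken from uutils. It shows the single byte 0xE9 coming out as the three bytes EF BF BD, and that the only signal a replacement happened is which Cow variant from_utf8_lossy returns.

```rust
use std::borrow::Cow;

/// Hypothetical helper: the bytes emitted after a lossy UTF-8 round trip.
fn lossy_round_trip(line: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(line).into_owned().into_bytes()
}

fn main() {
    // "café" in ISO-8859-1: 'é' is the single byte 0xE9, which is invalid UTF-8.
    let latin1: &[u8] = b"caf\xe9";

    // from_utf8_lossy reports replacement only through the Cow variant:
    // Borrowed means the input was valid UTF-8, Owned means bytes were replaced.
    let decoded = String::from_utf8_lossy(latin1);
    let replaced = matches!(decoded, Cow::Owned(_));

    // 0xE9 has become U+FFFD, which UTF-8 encodes as EF BF BD.
    let out = lossy_round_trip(latin1);
    println!("replaced={replaced} in={latin1:?} out={out:?}");
}
```

Nothing is printed to stderr and no error is returned, which is why the change goes unnoticed unless the caller inspects the Cow variant.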
The vulnerability has significant implications for users who rely on comm to compare files that contain binary data, use legacy character encodings (such as ISO-8859-1, Windows-1252, or various East Asian encodings), or otherwise contain byte sequences that are not valid UTF-8. Unlike GNU comm, which operates on raw bytes and reproduces its input faithfully regardless of encoding, the affected uutils implementation alters the data stream during processing.
Root Cause
The root cause of this vulnerability is the use of String::from_utf8_lossy() in the uutils coreutils comm implementation. This function prioritizes producing valid UTF-8 strings over preserving the original byte content. When encountering byte sequences that are not valid UTF-8, rather than failing with an error or preserving the raw bytes, the function silently substitutes the replacement character. This design choice creates an incompatibility with GNU coreutils behavior and violates the principle that utilities should preserve data integrity by default.
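The alternative to lossy conversion is to keep each line as raw bytes from input to output. The sketch below is a minimal, hypothetical comm-style merge (comm_bytes is an illustrative name, and this is not the code from the upstream fix). It assumes both inputs are already sorted in byte order, compares slices byte-wise, and writes each line verbatim, so an invalid sequence such as 0xE9 passes through unchanged.

```rust
use std::cmp::Ordering;
use std::io::{self, Write};

/// Merges two sorted lists of raw byte lines in comm's three-column layout:
/// column 1 (only in `a`) has no prefix, column 2 (only in `b`) one tab,
/// column 3 (in both) two tabs. Line bytes are written verbatim; no UTF-8
/// decoding happens anywhere, so invalid sequences pass through unchanged.
fn comm_bytes<W: Write>(a: &[&[u8]], b: &[&[u8]], out: &mut W) -> io::Result<()> {
    let (mut i, mut j) = (0, 0);
    while i < a.len() || j < b.len() {
        // Byte-wise lexicographic order, matching GNU comm under LC_ALL=C.
        let ord = match (a.get(i), b.get(j)) {
            (Some(x), Some(y)) => x.cmp(y),
            (Some(_), None) => Ordering::Less,
            _ => Ordering::Greater, // only `b` has lines left
        };
        match ord {
            Ordering::Less => {
                out.write_all(a[i])?;
                i += 1;
            }
            Ordering::Greater => {
                out.write_all(b"\t")?;
                out.write_all(b[j])?;
                j += 1;
            }
            Ordering::Equal => {
                out.write_all(b"\t\t")?;
                out.write_all(a[i])?;
                i += 1;
                j += 1;
            }
        }
        out.write_all(b"\n")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // "caf\xe9" is ISO-8859-1 "café"; 0xE9 on its own is not valid UTF-8.
    let a: Vec<&[u8]> = vec![&b"apple"[..], &b"caf\xe9"[..]];
    let b: Vec<&[u8]> = vec![&b"caf\xe9"[..], &b"zebra"[..]];
    let mut stdout = io::stdout().lock();
    comm_bytes(&a, &b, &mut stdout)
}
```

Writing through io::Write with &[u8] rather than through String is the design choice that matters: at no point does the data have to be valid UTF-8.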
Attack Vector
This is a local vulnerability requiring the attacker or user to execute the comm utility on files containing non-UTF-8 byte sequences. The attack vector involves providing input files that contain binary data or legacy-encoded text to the comm utility, which then produces corrupted output without warning.
Exploitation scenarios include:
- Using comm in automated data processing pipelines where binary or legacy-encoded files are compared
- Relying on comm output for deduplication or merging operations where data integrity is critical
- Processing log files or data exports that contain non-UTF-8 characters
The vulnerability does not require elevated privileges to trigger, as any user running the comm utility with appropriate input files can experience data corruption.
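The deduplication scenario can be illustrated with a small sketch. Replacement is many-to-one: distinct invalid bytes all map to the same U+FFFD. So two different legacy-encoded lines can become byte-identical after lossy output, and a downstream step that deduplicates comm's output would merge them. The helpers lossy and distinct_count are hypothetical, and the example models only the downstream effect of lossy output, not comm's internal comparison.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: the bytes a lossy UTF-8 round trip emits for one line.
fn lossy(line: &[u8]) -> Vec<u8> {
    String::from_utf8_lossy(line).into_owned().into_bytes()
}

/// Number of distinct lines a downstream dedup step would keep.
fn distinct_count<I: IntoIterator<Item = Vec<u8>>>(lines: I) -> usize {
    lines.into_iter().collect::<BTreeSet<_>>().len()
}

fn main() {
    // ISO-8859-1 "café" (0xE9) and "cafè" (0xE8): different words, different bytes.
    let inputs: Vec<&[u8]> = vec![&b"caf\xe9"[..], &b"caf\xe8"[..]];

    // Raw bytes keep the two lines apart; lossy output collapses both to "caf\u{FFFD}".
    let raw = distinct_count(inputs.iter().map(|l| l.to_vec()));
    let after_lossy = distinct_count(inputs.iter().map(|l| lossy(l)));

    println!("distinct before: {raw}, after lossy output: {after_lossy}");
}
```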
Detection Methods for CVE-2026-35346
Indicators of Compromise
- Presence of Unicode replacement characters (U+FFFD, displayed as �) in comm output when processing binary or legacy-encoded files
- Unexpected differences in file comparison results between uutils comm and GNU comm
- Discrepancies in automated pipeline outputs that rely on the comm utility
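The first indicator can be checked mechanically, because UTF-8 always encodes U+FFFD as the three bytes EF BF BD. The hypothetical scanner below reads a byte stream, for example piped comm output, line by line without decoding it and reports which lines contain that sequence. A match is a signal rather than proof, since an input file may legitimately contain U+FFFD.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
const REPLACEMENT: &[u8] = b"\xef\xbf\xbd";

/// Returns 1-based numbers of lines that contain the UTF-8 encoding of U+FFFD.
fn lines_with_replacement<R: Read>(input: R) -> io::Result<Vec<usize>> {
    let mut reader = BufReader::new(input);
    let mut buf = Vec::new();
    let mut hits = Vec::new();
    let mut lineno = 0;
    // read_until keeps raw bytes, so non-UTF-8 content cannot abort the scan.
    while reader.read_until(b'\n', &mut buf)? > 0 {
        lineno += 1;
        if buf.windows(REPLACEMENT.len()).any(|w| w == REPLACEMENT) {
            hits.push(lineno);
        }
        buf.clear();
    }
    Ok(hits)
}

fn main() -> io::Result<()> {
    // Intended use (binary name hypothetical): comm file1.txt file2.txt | scan_fffd
    let hits = lines_with_replacement(io::stdin().lock())?;
    for n in &hits {
        eprintln!("line {n}: contains U+FFFD");
    }
    if !hits.is_empty() {
        std::process::exit(1);
    }
    Ok(())
}
```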
Detection Strategies
- Compare output from uutils comm against GNU comm when processing files with known non-UTF-8 content
- Implement validation checks for the presence of U+FFFD replacement characters in comm output
- Review scripts and automation pipelines that use the comm utility for binary or legacy-encoded file processing
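A differential check against GNU comm can be automated. In the sketch below, first_difference is a hypothetical helper that compares two outputs as raw byte lines. The binary locations are assumptions to adjust per system: /usr/bin/comm is taken to be GNU comm, and uutils is reached through its multicall binary as "coreutils comm". On some distributions /usr/bin/comm may itself be the uutils build.

```rust
use std::io;
use std::process::Command;

/// Returns the 1-based number of the first line where two outputs differ,
/// or None if they are byte-identical. Lines are compared as raw bytes.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    let mut la = a.split(|&c| c == b'\n');
    let mut lb = b.split(|&c| c == b'\n');
    let mut n = 0;
    loop {
        n += 1;
        match (la.next(), lb.next()) {
            (None, None) => return None,
            (x, y) if x != y => return Some(n),
            _ => {}
        }
    }
}

fn main() -> io::Result<()> {
    // Usage (hypothetical): comm_diff file1.txt file2.txt
    let args: Vec<String> = std::env::args().skip(1).collect();
    // Assumed locations; adjust to where each implementation is installed.
    let gnu = Command::new("/usr/bin/comm").args(&args).output()?;
    let uu = Command::new("coreutils").arg("comm").args(&args).output()?;
    match first_difference(&gnu.stdout, &uu.stdout) {
        None => println!("outputs identical"),
        Some(n) => {
            println!("outputs differ starting at line {n}");
            std::process::exit(1);
        }
    }
    Ok(())
}
```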
Monitoring Recommendations
- Audit systems using uutils coreutils to identify any reliance on the comm utility for processing non-UTF-8 data
- Monitor data integrity in automated pipelines that use comm for file comparison operations
- Record checksums of data before and after pipeline stages that involve comm, so unexpected byte-level changes can be traced to a specific step
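For an audit, one way to find at-risk inputs is to scan the files a script passes to comm for lines that are not valid UTF-8, since only those lines are altered by lossy conversion. The sketch below is a hypothetical pre-flight checker built on std::str::from_utf8 and Utf8Error::valid_up_to. It reports the line number and the byte offset of the first invalid sequence on each affected line.

```rust
use std::fs;
use std::io;

/// Returns (1-based line number, byte offset of first invalid byte) for each
/// line of `data` that is not valid UTF-8.
fn invalid_utf8_lines(data: &[u8]) -> Vec<(usize, usize)> {
    data.split(|&b| b == b'\n')
        .enumerate()
        .filter_map(|(i, line)| std::str::from_utf8(line).err().map(|e| (i + 1, e.valid_up_to())))
        .collect()
}

fn main() -> io::Result<()> {
    // Usage (hypothetical): check_inputs file1.txt file2.txt
    for path in std::env::args().skip(1) {
        let data = fs::read(&path)?;
        for (line, offset) in invalid_utf8_lines(&data) {
            println!("{path}:{line}: invalid UTF-8 at byte {offset}");
        }
    }
    Ok(())
}
```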
How to Mitigate CVE-2026-35346
Immediate Actions Required
- Upgrade to uutils coreutils version 0.6.0 or later, which addresses this vulnerability
- For systems that cannot be immediately upgraded, switch to GNU coreutils comm for processing binary or non-UTF-8 files
- Review and validate output from any automated processes that use the comm utility with potentially affected input
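Before upgrading, it helps to confirm which comm a system actually runs. The sketch below assumes that uutils prints a first --version line shaped like "comm (uutils coreutils) X.Y.Z" and that the version is the last token; that format is an assumption, so verify it on the target system. The helpers uutils_version and is_patched are hypothetical, and pre-release suffixes such as "-rc1" are reported as unrecognized rather than guessed.

```rust
use std::io;
use std::process::Command;

/// Extracts the version token from `--version` output, assuming a first line
/// shaped like "comm (uutils coreutils) X.Y.Z". Returns None for other builds.
fn uutils_version(version_output: &str) -> Option<&str> {
    let first = version_output.lines().next()?;
    if !first.contains("uutils") {
        return None;
    }
    first.split_whitespace().last()
}

/// True if "MAJOR.MINOR[.PATCH]" is at least 0.6.0; None if it does not parse.
fn is_patched(version: &str) -> Option<bool> {
    let mut parts = version.trim().split('.').map(|p| p.parse::<u64>().ok());
    let major = parts.next()??;
    let minor = parts.next()??;
    let patch = parts.next().unwrap_or(Some(0))?;
    Some((major, minor, patch) >= (0, 6, 0))
}

fn main() -> io::Result<()> {
    let out = Command::new("comm").arg("--version").output()?;
    // Lossy decoding is acceptable here: only ASCII version text is inspected.
    let text = String::from_utf8_lossy(&out.stdout);
    match uutils_version(&text).map(|v| (v, is_patched(v))) {
        None => println!("no uutils version string found"),
        Some((v, Some(true))) => println!("uutils comm {v}: at or above 0.6.0"),
        Some((v, Some(false))) => println!("uutils comm {v}: below 0.6.0, upgrade"),
        Some((v, None)) => println!("uutils comm {v}: unrecognized version format"),
    }
    Ok(())
}
```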
Patch Information
The vulnerability has been addressed in uutils coreutils version 0.6.0. The fix was implemented via Pull Request #10206, which modifies the comm implementation to properly handle non-UTF-8 byte sequences without lossy conversion. Users should upgrade to version 0.6.0 or later to receive the fix.
For additional context on the vulnerability, refer to GitHub Issue #10192 which documents the original bug report and discussion.
Workarounds
- Use GNU coreutils comm instead of uutils comm when processing binary files or files with legacy encodings
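- Pre-convert input files to valid UTF-8 using iconv or similar tools before processing with the affected comm version (this applies only to text whose source encoding is known; binary data cannot be converted this way)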
- Implement post-processing validation to detect and flag any output containing the U+FFFD replacement character
```bash
# Example: check comm output for the UTF-8 encoding of U+FFFD (bytes EF BF BD).
# A match means replacement characters are present; the output may have been corrupted.
if comm file1.txt file2.txt | LC_ALL=C grep -q -F $'\xEF\xBF\xBD'; then
    echo "Warning: possible data corruption detected" >&2
fi
```
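The pre-conversion workaround differs from lossy conversion in one key respect: a correct decoder is reversible. For ISO-8859-1 specifically, every byte value maps to the Unicode code point with the same number, so decoding needs no lookup table. The sketch below uses hypothetical helpers latin1_to_utf8 and utf8_to_latin1 to show that round trip. It assumes the input really is ISO-8859-1. Windows-1252 assigns different characters to 0x80 through 0x9F, and East Asian encodings need a real decoder such as iconv.

```rust
/// Decodes ISO-8859-1 bytes: each byte value is the code point it represents.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Encodes back to ISO-8859-1; None if any char lies outside U+0000..=U+00FF.
fn utf8_to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars().map(|c| u8::try_from(c).ok()).collect()
}

fn main() {
    let legacy: &[u8] = b"caf\xe9";
    let text = latin1_to_utf8(legacy);
    println!("{text}"); // prints café
    // Unlike from_utf8_lossy, the original bytes are recoverable.
    assert_eq!(utf8_to_latin1(&text).as_deref(), Some(legacy));
}
```

If the converted text later picks up U+FFFD from some other tool, utf8_to_latin1 returns None instead of inventing a byte, so corruption surfaces as an error rather than passing silently.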
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.

