CVE-2026-35346: uutils coreutils comm Data Corruption Vulnerability

CVE-2026-35346 Overview

The comm utility in uutils coreutils contains a data integrity vulnerability: it silently corrupts output by performing a lossy UTF-8 conversion on every output line. The implementation uses String::from_utf8_lossy(), which replaces invalid UTF-8 byte sequences with the Unicode replacement character (U+FFFD). GNU comm, by contrast, processes raw bytes and preserves the original input. As a result, uutils comm produces corrupted output when it compares binary files or files in non-UTF-8 legacy encodings.

Critical Impact
Silent data corruption in file comparison operations can lead to incorrect results when processing binary data or legacy encoded files, potentially causing data integrity issues in automated pipelines and scripts that rely on accurate file comparisons.

Affected Products

uutils coreutils (versions prior to 0.6.0)

Discovery Timeline

2026-04-22 - CVE-2026-35346 published to NVD
2026-04-22 - Last updated in NVD database

Technical Details for CVE-2026-35346

Vulnerability Analysis

This vulnerability falls under CWE-176 (Improper Handling of Unicode Encoding), representing an input validation error where the comm utility improperly handles byte sequences that do not conform to UTF-8 encoding standards. The core issue stems from a design decision to use Rust's String::from_utf8_lossy() function for processing input data.

When processing files, the comm utility reads input and converts it to UTF-8 strings. The from_utf8_lossy() function, while convenient for handling potentially malformed text, replaces any byte sequence that doesn't represent valid UTF-8 with the Unicode replacement character (U+FFFD, displayed as �). This lossy conversion occurs silently without any warning or error message to the user, making it difficult to detect when data corruption has occurred.
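The replacement behavior can be reproduced in isolation. The following is a minimal Rust sketch, not code taken from uutils: lossy_line is a hypothetical helper, and the sample bytes are an assumed ISO-8859-1 encoding of "café".

```rust
/// Hypothetical helper: converts raw bytes the way the advisory describes,
/// turning any invalid UTF-8 sequence into U+FFFD.
fn lossy_line(raw: &[u8]) -> String {
    String::from_utf8_lossy(raw).into_owned()
}

fn main() {
    // "café" in ISO-8859-1: the final byte 0xE9 is not valid UTF-8 on its own.
    let latin1: &[u8] = b"caf\xE9";
    let converted = lossy_line(latin1);

    println!("input bytes:  {:02X?}", latin1);
    println!("output bytes: {:02X?}", converted.as_bytes());

    // The single byte 0xE9 is gone; the three-byte encoding of U+FFFD
    // (EF BF BD) stands in its place, with no error reported.
    assert_eq!(converted, "caf\u{FFFD}");
    assert_eq!(converted.as_bytes(), b"caf\xEF\xBF\xBD");
}
```

The conversion returns normally in both the valid and the invalid case, so a caller that never inspects the result cannot tell that bytes were replaced.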

The vulnerability has significant implications for users who depend on the comm utility for comparing files that contain binary data, use legacy character encodings (such as ISO-8859-1, Windows-1252, or various East Asian encodings), or contain raw byte sequences that happen to not be valid UTF-8. Unlike GNU comm, which operates on raw bytes and faithfully reproduces input regardless of encoding, the uutils implementation modifies the data stream during processing.

Root Cause

The root cause of this vulnerability is the use of String::from_utf8_lossy() in the uutils coreutils comm implementation. This function prioritizes producing valid UTF-8 strings over preserving the original byte content. When encountering byte sequences that are not valid UTF-8, rather than failing with an error or preserving the raw bytes, the function silently substitutes the replacement character. This design choice creates an incompatibility with GNU coreutils behavior and violates the principle that utilities should preserve data integrity by default.
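The difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not the code from the uutils fix: emit_lossy and emit_raw are hypothetical names, and the input line is made-up sample data.

```rust
use std::io::{self, Write};

/// Lossy path (hypothetical): routes the line through a string conversion,
/// so invalid UTF-8 is replaced with U+FFFD before it is written.
fn emit_lossy<W: Write>(out: &mut W, line: &[u8]) -> io::Result<()> {
    out.write_all(String::from_utf8_lossy(line).as_bytes())?;
    out.write_all(b"\n")
}

/// Byte-preserving path (hypothetical): writes the line exactly as it was read.
fn emit_raw<W: Write>(out: &mut W, line: &[u8]) -> io::Result<()> {
    out.write_all(line)?;
    out.write_all(b"\n")
}

fn main() -> io::Result<()> {
    // Sample line starting with two bytes that are never valid in UTF-8.
    let line: &[u8] = b"\xFF\xFEdata";

    let mut lossy = Vec::new();
    let mut raw = Vec::new();
    emit_lossy(&mut lossy, line)?;
    emit_raw(&mut raw, line)?;

    println!("raw:   {:02X?}", raw);
    println!("lossy: {:02X?}", lossy);

    // Only the raw path round-trips the original bytes.
    assert_eq!(raw, b"\xFF\xFEdata\n");
    assert_ne!(lossy, raw);
    Ok(())
}
```

Keeping lines as byte slices from read to write avoids any encoding assumption, which matches the GNU behavior described above.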

Attack Vector

This is a local vulnerability requiring the attacker or user to execute the comm utility on files containing non-UTF-8 byte sequences. The attack vector involves providing input files that contain binary data or legacy-encoded text to the comm utility, which then produces corrupted output without warning.

Exploitation scenarios include:

Using comm in automated data processing pipelines where binary or legacy-encoded files are compared
Relying on comm output for deduplication or merging operations where data integrity is critical
Processing log files or data exports that contain non-UTF-8 characters

The vulnerability does not require elevated privileges to trigger, as any user running the comm utility with appropriate input files can experience data corruption.

Detection Methods for CVE-2026-35346

Indicators of Compromise

Presence of Unicode replacement characters (U+FFFD, displayed as �) in comm output when processing binary or legacy-encoded files
Unexpected differences in file comparison results between uutils comm and GNU comm
Discrepancies in automated pipeline outputs that rely on the comm utility

Detection Strategies

Compare output from uutils comm against GNU comm when processing files with known non-UTF-8 content
Implement validation checks for the presence of U+FFFD replacement characters in comm output
Review scripts and automation pipelines that use the comm utility for binary or legacy-encoded file processing
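A check for introduced replacement characters might look like the sketch below. It is a heuristic only: it flags output that contains the UTF-8 encoding of U+FFFD (EF BF BD) when none of the inputs did, so inputs that legitimately contain U+FFFD are not reported. The function names and sample data are hypothetical.

```rust
/// UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
const REPLACEMENT: [u8; 3] = [0xEF, 0xBF, 0xBD];

/// Counts occurrences of the EF BF BD byte sequence.
fn count_replacements(bytes: &[u8]) -> usize {
    bytes
        .windows(3)
        .filter(|w| w[..] == REPLACEMENT[..])
        .count()
}

/// Returns true when the output contains U+FFFD but no input did,
/// which suggests the character was introduced during processing.
fn introduced_replacement(inputs: &[&[u8]], output: &[u8]) -> bool {
    let in_inputs: usize = inputs.iter().map(|i| count_replacements(i)).sum();
    in_inputs == 0 && count_replacements(output) > 0
}

fn main() {
    // Hypothetical sorted inputs; file1 holds one ISO-8859-1 line.
    let file1: &[u8] = b"alpha\ncaf\xE9\n";
    let file2: &[u8] = b"alpha\nzeta\n";
    // Simulated output of an affected comm: 0xE9 has become EF BF BD.
    let output: &[u8] = b"\t\talpha\ncaf\xEF\xBF\xBD\n\tzeta\n";

    if introduced_replacement(&[file1, file2], output) {
        println!("warning: output contains U+FFFD that is absent from the inputs");
    }
}
```

Because the check compares against the inputs, it stays quiet for valid UTF-8 files that already contain U+FFFD, but it will also miss corruption when such files are mixed with non-UTF-8 data.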

Monitoring Recommendations

Audit systems using uutils coreutils to identify any reliance on the comm utility for processing non-UTF-8 data
Monitor data integrity in automated pipelines that use comm for file comparison operations
Implement checksum validation before and after file processing operations involving comm
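Whole-file checksums do not map directly onto comm output, because the output interleaves lines from both inputs and adds tab prefixes for columns 2 and 3. One alternative is a line-level byte comparison, sketched below under those assumptions: every non-empty output line, with up to two leading tabs removed, should match some input line exactly. unexplained_lines is a hypothetical helper, and the check assumes default comm output with no column-suppression or delimiter options.

```rust
use std::collections::HashSet;

/// Returns the output lines that match no input line byte-for-byte,
/// even after stripping the tab prefixes comm adds for columns 2 and 3.
fn unexplained_lines<'a>(inputs: &[&[u8]], output: &'a [u8]) -> Vec<&'a [u8]> {
    // Every line that appears in any input, kept as raw bytes.
    let known: HashSet<&[u8]> = inputs
        .iter()
        .flat_map(|file| file.split(|b| *b == b'\n'))
        .collect();

    output
        .split(|b| *b == b'\n')
        .filter(|line| !line.is_empty())
        .filter(|line| {
            // Try the line with zero, one, and two leading tabs removed.
            let mut candidate: &[u8] = line;
            for _ in 0..3 {
                if known.contains(candidate) {
                    return false;
                }
                match candidate.strip_prefix(b"\t") {
                    Some(rest) => candidate = rest,
                    None => break,
                }
            }
            true
        })
        .collect()
}

fn main() {
    // Hypothetical inputs and a simulated corrupted output.
    let file1: &[u8] = b"alpha\ncaf\xE9\n";
    let file2: &[u8] = b"alpha\nzeta\n";
    let output: &[u8] = b"\t\talpha\ncaf\xEF\xBF\xBD\n\tzeta\n";

    for line in unexplained_lines(&[file1, file2], output) {
        println!("line not present in any input: {:02X?}", line);
    }
}
```

Unlike a search for U+FFFD alone, this comparison also works when the inputs themselves contain replacement characters, at the cost of holding all input lines in memory.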

How to Mitigate CVE-2026-35346

Immediate Actions Required

Upgrade to uutils coreutils version 0.6.0 or later, which addresses this vulnerability
For systems that cannot be immediately upgraded, switch to GNU coreutils comm for processing binary or non-UTF-8 files
Review and validate output from any automated processes that use the comm utility with potentially affected input

Patch Information

The vulnerability has been addressed in uutils coreutils version 0.6.0. The fix was implemented via Pull Request #10206, which modifies the comm implementation to properly handle non-UTF-8 byte sequences without lossy conversion. Users should upgrade to version 0.6.0 or later to receive the fix.

For additional context on the vulnerability, refer to GitHub Issue #10192 which documents the original bug report and discussion.

Workarounds

Use GNU coreutils comm instead of uutils comm when processing binary files or files with legacy encodings
Pre-convert input files to valid UTF-8 using iconv or similar tools before processing with the affected comm version (this applies only to text in a known legacy encoding, not to binary data)
Implement post-processing validation to detect and flag any output containing the U+FFFD replacement character

bash

# Example: check comm output for the UTF-8 encoding of U+FFFD (bytes EF BF BD)
# A match means the output may be corrupted, unless the inputs already contained U+FFFD
# Note: comm expects both input files to be sorted
if comm file1.txt file2.txt | LC_ALL=C grep -qaF $'\xEF\xBF\xBD'; then
    echo "Warning: Possible data corruption detected" >&2
fi
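Where an affected version must stay in use, inputs can also be screened before comm runs. The sketch below is one possible pre-flight check, with hypothetical names throughout: it reports the byte offset of the first invalid UTF-8 sequence in each file passed on the command line and exits with a non-zero status if any file is unsafe to pass to the affected comm.

```rust
use std::{env, fs, process};

/// Returns the byte offset of the first invalid UTF-8 sequence, if any.
fn first_invalid_offset(bytes: &[u8]) -> Option<usize> {
    std::str::from_utf8(bytes).err().map(|e| e.valid_up_to())
}

fn main() {
    let mut unsafe_inputs = 0;

    // Each command-line argument is treated as a file to check.
    for path in env::args().skip(1) {
        match fs::read(&path) {
            Ok(bytes) => {
                if let Some(offset) = first_invalid_offset(&bytes) {
                    eprintln!("{}: invalid UTF-8 at byte offset {}", path, offset);
                    unsafe_inputs += 1;
                }
            }
            Err(err) => {
                eprintln!("{}: {}", path, err);
                unsafe_inputs += 1;
            }
        }
    }

    // A non-zero exit lets a calling script fall back to GNU comm.
    if unsafe_inputs > 0 {
        process::exit(1);
    }
}
```

Files that pass this check contain only valid UTF-8, so a lossy conversion leaves them unchanged.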