CVE-2026-31963: HTSlib CRAM Buffer Overflow Vulnerability

CVE-2026-31963 Overview

CVE-2026-31963 is a heap buffer overflow vulnerability in HTSlib, a widely used library for reading and writing bioinformatics file formats. The flaw exists in the CRAM (Compressed Reference-oriented Alignment Map) file format decoder, specifically in how the library handles features positioned beyond the end of a CRAM record's sequence. An off-by-one error in the boundary validation logic allows an attacker to write one controlled byte beyond the end of a heap buffer, potentially leading to arbitrary code execution.

Critical Impact
This heap buffer overflow vulnerability in HTSlib's CRAM decoder can be exploited through maliciously crafted files to crash applications, corrupt heap memory structures, or achieve arbitrary code execution on systems processing untrusted bioinformatics data.

Affected Products

HTSlib versions prior to 1.21.1
HTSlib versions 1.22.x prior to 1.22.2
HTSlib version 1.23 (fixed in 1.23.1)

Discovery Timeline

2026-03-18 - CVE-2026-31963 published to NVD
2026-03-19 - Last updated in NVD database

Technical Details for CVE-2026-31963

Vulnerability Analysis

The vulnerability resides in HTSlib's CRAM format decoder, which handles compressed DNA sequence alignment data. CRAM employs reference-based compression to reduce file sizes by storing only the differences between alignment records and a reference sequence, rather than complete sequence data. These differences are encoded as "features" that indicate variations at specific positions.

The flaw stems from an off-by-one error in the boundary checking logic in cram/cram_decode.c. When decoding CRAM features, the code validates that feature positions fall within the bounds of the record sequence. However, the original implementation applied one upper bound to every feature type. It did not distinguish operations that can legitimately sit one position past the last base of the sequence (deletions, reference skips, padding, and hard clips) from operations that must fall on an existing base.

The vulnerable code path allowed a feature with an attacker-controlled position value to write one byte beyond the allocated heap buffer. While a single-byte overflow may seem limited, it can corrupt heap metadata or adjacent heap objects, enabling exploitation techniques such as heap grooming to achieve arbitrary code execution.

Root Cause

The root cause is an insufficient boundary validation check in the CRAM feature decoding logic. The original code used a single comparison, if (pos > cr->len+1), for every feature, which ignored the fact that the valid position range depends on the feature operation type. The operations N (reference skip), P (padding), H (hard clip), and D (deletion) do not add bases to the read, so they can legitimately occur at position cr->len+1. All other operations must be restricted to positions up to cr->len. Because the old check also accepted those other operations at cr->len+1, a malicious CRAM file could specify a feature position that triggers a write one byte past the end of the sequence buffer.
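The difference between the two rules can be sketched as a pair of standalone predicates. This is a simplified model written for illustration, not HTSlib code: the function names old_check_accepts and new_check_accepts are hypothetical, the record length is passed in directly instead of being read from cr->len, and the surrounding decoder state (such as seq_pos) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the pre-patch rule: every feature type shares
 * the same upper bound of len + 1. */
static bool old_check_accepts(int32_t pos, int32_t len) {
    return pos > 0 && pos <= len + 1;
}

/* Simplified model of the patched rule: only N, P, H and D may sit
 * one position past the last base; everything else stops at len.
 * A length of 0 means the sequence is absent, so the end check is
 * skipped, mirroring the "cr->len != 0" condition in the fix. */
static bool new_check_accepts(char op, int32_t pos, int32_t len) {
    if (pos <= 0)
        return false;              /* before start of read */
    if (len != 0 && pos > len) {
        int32_t valid_end = (op == 'N' || op == 'P' || op == 'H' || op == 'D')
            ? len + 1
            : len;
        return pos <= valid_end;   /* after end of read if false */
    }
    return true;
}
```

For a 10-base record, the old rule accepts any feature at position 11, while the patched rule accepts a deletion ('D') there but rejects a feature with any other code, for example 'X' (used here to stand for a substitution).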

Attack Vector

The attack requires a user to open a maliciously crafted CRAM file. This could occur through:

Research Data Exchange: Bioinformatics researchers frequently share and download sequence alignment files from public repositories or collaborators
Pipeline Processing: Automated bioinformatics pipelines that process externally-sourced CRAM files
Web Services: Online bioinformatics tools that accept user-uploaded files for analysis

When a vulnerable version of HTSlib processes the malicious file, the heap buffer overflow is triggered during the feature decoding phase, potentially allowing the attacker to:

Crash the application (denial of service)
Corrupt heap metadata or an adjacent heap object
Achieve arbitrary code execution by carefully crafting heap layout

         if (r) return r;
         pos += prev_pos;
 
+        // Misplaced feature detection - before start is easy
         if (pos <= 0) {
             hts_log_error("Feature position %d before start of read", pos);
             return -1;
         }
 
-        if (pos > seq_pos) {
-            if (pos > cr->len+1)
+        // After end is more complicated as the sequence may be absent,
+        // and operations like deletions could occur after the end
+        // of the stored sequence.  First quickly find out if the feature is
+        // on or after the last base.
+        if (cr->len != 0 && pos > cr->len) {
+            // Now check carefully to ensure it's allowed.
+            int32_t valid_end = (op == 'N' || op == 'P' || op == 'H' || op == 'D')
+                ? cr->len+1
+                : cr->len;
+            if (pos > valid_end) {
+                hts_log_error("Feature position %d after end of read", pos);
                 return -1;
+            }
+        }
 
+        if (pos > seq_pos) {
             if (s->ref && cr->ref_id >= 0) {
                 if (ref_pos + pos - seq_pos > bfd->ref[cr->ref_id].len) {
                     static int whinged = 0;

Source: GitHub Commit Details

Detection Methods for CVE-2026-31963

Indicators of Compromise

Unexpected crashes in applications using HTSlib when processing CRAM files
Abnormal memory consumption patterns in bioinformatics pipeline processes
Core dumps or segmentation faults from tools like samtools, bcftools, or custom applications linked against HTSlib
Log entries reporting a feature position outside the read (patched versions log "Feature position N after end of read" when they reject such a CRAM file)

Detection Strategies

Monitor file processing applications for signs of heap corruption or unexpected termination
Implement file integrity checks and source validation for CRAM files from external sources
Use memory safety tools (AddressSanitizer, Valgrind) during development to detect heap overflows
Review system logs for repeated crashes of bioinformatics applications processing external data

Monitoring Recommendations

Enable application-level logging for HTSlib operations to identify malformed file processing attempts
Configure crash monitoring for systems running bioinformatics pipelines that process external data
Audit incoming CRAM files from untrusted sources before processing in production environments
Deploy endpoint detection solutions to identify post-exploitation activities if code execution is achieved
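As one way to act on the logging recommendation above, the sketch below flags log lines that contain the misplaced-feature messages visible in the patch ("Feature position %d before start of read" and "Feature position %d after end of read"). The function name is hypothetical, and the sketch assumes HTSlib error output is being captured as text lines. It makes no assumption about any prefix the logger adds before the message.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if a captured log line contains one of the
 * misplaced-feature messages emitted by the CRAM decoder.
 * Matching is by substring, so any logger prefix is ignored. */
static bool is_misplaced_feature_log(const char *line) {
    if (line == NULL)
        return false;
    const char *p = strstr(line, "Feature position ");
    if (p == NULL)
        return false;
    return strstr(p, " after end of read") != NULL
        || strstr(p, " before start of read") != NULL;
}
```

A match only shows that a malformed file was rejected. It does not by itself indicate a successful attack, but repeated matches for files from the same source may be worth investigating.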

How to Mitigate CVE-2026-31963

Immediate Actions Required

Update HTSlib to version 1.23.1, 1.22.2, or 1.21.1 depending on your version branch
Audit systems to identify all applications and pipelines using HTSlib for CRAM file processing
Restrict processing of CRAM files from untrusted sources until patches are applied
Review access controls for systems handling external bioinformatics data

Patch Information

The HTSlib maintainers have released fixed versions addressing this vulnerability:

Version 1.23.1 for users on the 1.23 release
Version 1.22.2 for users on the 1.22 release branch
Version 1.21.1 for users on the 1.21 release branch

The fix improves the boundary checking logic to properly validate feature positions based on the operation type, ensuring that features cannot be placed beyond valid positions for the given sequence. For detailed information, refer to the GitHub Security Advisory and the patch commit 8bcc9907.
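When auditing many hosts or builds, the affected ranges listed in this advisory can be encoded as a small check. The following is a minimal sketch with a hypothetical function name. It assumes a plain "major.minor[.patch]" version string, treats releases after 1.23 as fixed, and treats unparseable strings as affected so that the caller errs on the side of caution. Development builds with suffixed version strings are classified by their leading numbers only, which may be inaccurate.

```c
#include <stdbool.h>
#include <stdio.h>

/* Returns true if the version string falls in a range this advisory
 * lists as affected: anything before 1.21.1, 1.22.x before 1.22.2,
 * and 1.23 before 1.23.1. */
static bool htslib_version_is_affected(const char *version) {
    int major = 0, minor = 0, patch = 0;

    if (version == NULL)
        return true;               /* unknown: assume affected */

    /* A missing patch field leaves patch at 0, e.g. "1.23" -> 1.23.0 */
    int fields = sscanf(version, "%d.%d.%d", &major, &minor, &patch);
    if (fields < 2)
        return true;               /* unparseable: assume affected */

    if (major != 1)
        return major < 1;
    if (minor < 21)
        return true;
    if (minor == 21)
        return patch < 1;
    if (minor == 22)
        return patch < 2;
    if (minor == 23)
        return patch < 1;
    return false;                  /* assumption: later releases carry the fix */
}
```

In a program linked against HTSlib, the input could be the string returned by hts_version(). For command-line tools, it could be the version number printed by htsfile --version.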

Workarounds

No workaround removes the vulnerability itself; upgrading to a patched version is required
As a temporary risk reduction measure, avoid processing CRAM files from untrusted or unverified sources
Consider using alternative formats (BAM/SAM) for untrusted data until patching is complete, though this only reduces exposure, not the underlying risk
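One way to enforce the temporary restriction above is to refuse CRAM input at an intake point until the patched library is deployed. CRAM files begin with the four ASCII bytes "CRAM", so a gate can inspect the first bytes of an upload. The sketch below uses a hypothetical function name and assumes the caller has already read the start of the file into a buffer. It identifies the format only and does not validate the file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the buffer starts with the CRAM file signature.
 * buf holds the first len bytes of the file under inspection. */
static bool looks_like_cram(const unsigned char *buf, size_t len) {
    return buf != NULL && len >= 4 && memcmp(buf, "CRAM", 4) == 0;
}
```

A pipeline could call this before handing a file to HTSlib and quarantine anything that matches when the file comes from an untrusted source. Checking content rather than the file extension matters here because an attacker controls the file name.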

bash

# Update HTSlib to the latest patched version
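# For systems using package managers (package names vary by distribution;
# on Debian/Ubuntu the development package is typically libhts-dev, so
# confirm the name and that the packaged version includes the fix):
apt-get update && apt-get install --only-upgrade libhts-dev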

# For manual builds, download and compile the patched version:
wget https://github.com/samtools/htslib/releases/download/1.23.1/htslib-1.23.1.tar.bz2
tar -xjf htslib-1.23.1.tar.bz2
cd htslib-1.23.1
# make install may need root privileges, depending on the install prefix
./configure && make && make install

# Verify the installed version
htsfile --version