CVE-2026-31970 Overview
CVE-2026-31970 is a heap buffer overflow vulnerability in HTSlib, a widely used library for reading and writing bioinformatics file formats. The vulnerability exists in the GZI (GZIP index) file loading function bgzf_index_load_hfile(), where an integer overflow can be triggered while computing the size of the index buffer. The overflow causes an undersized or zero-sized buffer to be allocated, and the function then writes out of bounds, corrupting heap structures.
GZI files are used to index block-compressed GZIP (BGZF) files in bioinformatics workflows. When processing a maliciously crafted .gzi file, the vulnerable function writes sixteen zero bytes to the incorrectly-sized buffer, and depending on the overflow result, may attempt to load additional file data into the buffer as well. While the function eventually fails due to record count mismatches, the heap corruption has already occurred by this point.
Critical Impact
Successful exploitation of this vulnerability can lead to program crashes, data corruption, and potentially arbitrary code execution through heap manipulation when processing malicious GZI index files.
Affected Products
- HTSlib versions prior to 1.21.1
- HTSlib versions 1.22.x prior to 1.22.2
- HTSlib version 1.23 (prior to 1.23.1)
Discovery Timeline
- 2026-03-18 - CVE-2026-31970 published to NVD
- 2026-03-19 - Last updated in NVD database
Technical Details for CVE-2026-31970
Vulnerability Analysis
The vulnerability is classified as CWE-122 (Heap-based Buffer Overflow) and resides in the bgzf_index_load_hfile() function within HTSlib's BGZF handling code. When loading a GZI index file, the function reads a 64-bit unsigned integer (uint64_t) value that specifies the number of index entries. This value is then used to calculate the buffer size needed for storing index structures.
The root issue is that the multiplication of the entry count by sizeof(bgzidx1_t) can overflow, resulting in a much smaller buffer being allocated than required. The function then proceeds to write data to this undersized buffer, causing heap memory corruption. While the operation eventually fails when the expected record count doesn't match, the damage to heap structures has already been done.
The attack vector is network-based but requires user interaction: a user must open or process a maliciously crafted GZI file. This is particularly concerning in bioinformatics environments, where researchers routinely process data from external sources, including public genomic databases.
Root Cause
The root cause is the lack of integer overflow validation when calculating buffer sizes in bgzf_index_load_hfile(). The function reads an untrusted 64-bit value from the file and uses it directly in size calculations without verifying that the resulting allocation size doesn't wrap around due to integer overflow. When x is a sufficiently large value, the expression (x + 1) * sizeof(bgzidx1_t) overflows, producing a small or zero value that leads to an inadequate memory allocation.
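The wrap-around described above can be reproduced with plain 64-bit unsigned arithmetic. The following is a minimal sketch, not HTSlib code: entry_t is a hypothetical stand-in for bgzidx1_t, assumed to hold two 64-bit offsets (16 bytes, consistent with the sixteen zero bytes mentioned earlier), and unchecked_alloc_size() is an illustrative name.

```c
#include <stdint.h>

/* Hypothetical stand-in for HTSlib's bgzidx1_t: two 64-bit offsets,
 * 16 bytes in total (assumed layout, for illustration only). */
typedef struct {
    uint64_t uaddr; /* offset in the uncompressed stream */
    uint64_t caddr; /* offset in the compressed file */
} entry_t;

/* Reproduces the unchecked size calculation in 64-bit unsigned arithmetic.
 * Unsigned overflow is well defined in C: results are reduced modulo 2^64. */
uint64_t unchecked_alloc_size(uint64_t x)
{
    uint64_t count = x + 1;          /* wraps to 0 when x == UINT64_MAX */
    return count * sizeof(entry_t);  /* wraps once count reaches 2^60 */
}
```

With a 16-byte entry, x = UINT64_MAX yields a size of 0, and x = 2^60 yields 16 bytes for what is nominally 2^60 + 1 entries. Either result would be passed to malloc() as if it were a valid size.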
Attack Vector
An attacker can exploit this vulnerability by crafting a malicious .gzi index file whose 64-bit entry count is chosen to trigger the integer overflow. The attack requires user interaction: the victim must process the malicious file using a vulnerable version of HTSlib or an application that depends on it (such as samtools, bcftools, or other bioinformatics tools).
The attack flow is:
- Attacker creates a malicious GZI file with an oversized entry count value
- Victim attempts to load the GZI file using a vulnerable HTSlib version
- Integer overflow occurs during buffer size calculation
- Undersized heap buffer is allocated
- Data is written beyond buffer boundaries, corrupting heap structures
- Potential for arbitrary code execution through heap exploitation techniques
     if (fp->idx == NULL) goto fail;
     uint64_t x;
     if (hread_uint64(&x, idx) < 0) goto fail;
+    if (x >= ((SIZE_MAX < UINT64_MAX ? SIZE_MAX : UINT64_MAX)
+              / sizeof(bgzidx1_t) / 2))
+        goto fail;
     fp->idx->noffs = fp->idx->moffs = x + 1;
     fp->idx->offs = (bgzidx1_t*) malloc(fp->idx->moffs*sizeof(bgzidx1_t));
Source: GitHub Commit Details
The patch adds a bounds check on x before it is used in any size calculation. Loading fails if x is at least min(SIZE_MAX, UINT64_MAX) / sizeof(bgzidx1_t) / 2, so the subsequent computation (x + 1) * sizeof(bgzidx1_t) stays within half of the representable size range and cannot wrap around.
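The same bound can be expressed as a standalone predicate, which may be useful when auditing similar allocation patterns elsewhere. This is a sketch that mirrors the comparison in the patch. The function name entry_count_is_safe() and the elem_size parameter are illustrative and not part of HTSlib.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the bound used in the patch: accept x only while it is below
 * min(SIZE_MAX, UINT64_MAX) / elem_size / 2. For any accepted x,
 * (x + 1) * elem_size is at most half the maximum size, so it cannot wrap. */
bool entry_count_is_safe(uint64_t x, size_t elem_size)
{
    uint64_t max_bytes = SIZE_MAX < UINT64_MAX ? SIZE_MAX : UINT64_MAX;

    if (elem_size == 0)
        return false;                      /* avoid division by zero */
    return x < max_bytes / elem_size / 2;  /* same comparison, inverted */
}
```

Dividing the limit rather than multiplying the untrusted value is the key design choice: the check itself contains no operation that can overflow.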
Detection Methods for CVE-2026-31970
Indicators of Compromise
- Unexpected crashes in applications using HTSlib when processing GZI files
- Segmentation faults or memory access violations in the bgzf_index_load_hfile() function
- Core dumps showing heap corruption patterns in processes using samtools, bcftools, or other HTSlib-dependent applications
- Anomalously large values in GZI file headers when inspecting file contents
Detection Strategies
- Monitor for process crashes in bioinformatics applications with heap corruption signatures
- Implement file integrity checks on GZI files from external sources before processing
- Deploy memory safety tools (AddressSanitizer, Valgrind) in development/testing environments to detect heap overflows
- Use software composition analysis or dependency scanning tools to identify which HTSlib versions your software stack depends on, including statically linked or vendored copies
Monitoring Recommendations
- Audit systems for installed HTSlib versions using package managers (dpkg -l, rpm -qa, or similar)
- Monitor bioinformatics pipelines for unexpected failures when processing indexed BGZF files
- Implement logging for GZI file processing operations to identify suspicious activity
- Track HTSlib library usage across containers and virtual environments in research infrastructure
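When auditing, a version string reported by a package manager or by the library can be compared against the ranges listed under Affected Products. The sketch below assumes version strings of the form "major.minor" or "major.minor.patch", possibly followed by a suffix, and assumes that releases after 1.23.x carry the fix. The function name is hypothetical.

```c
#include <stdio.h>

/* Returns 1 if the version falls in an affected range, 0 if not,
 * and -1 if the string cannot be parsed. Ranges follow the advisory:
 * anything before 1.21.1, 1.22.x before 1.22.2, and 1.23 before 1.23.1.
 * Assumption: versions after 1.23.x include the fix. */
int htslib_version_is_affected(const char *version)
{
    int major = 0, minor = 0, patch = 0;

    if (version == NULL)
        return -1;
    /* A missing patch component (e.g. "1.23") leaves patch at 0. */
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) < 2)
        return -1;

    if (major < 1) return 1;
    if (major > 1) return 0;
    if (minor < 21) return 1;
    if (minor == 21) return patch < 1;
    if (minor == 22) return patch < 2;
    if (minor == 23) return patch < 1;
    return 0;
}
```

A result of -1 should be treated as "needs manual review" rather than as safe, since distribution packages sometimes backport fixes without changing the upstream version number.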
How to Mitigate CVE-2026-31970
Immediate Actions Required
- Update HTSlib to version 1.23.1, 1.22.2, or 1.21.1 (depending on your version branch)
- Rebuild any statically linked applications that include HTSlib
- Audit and remove untrusted .gzi index files from systems processing bioinformatics data
- Restrict processing of GZI files to those from trusted, verified sources only
Patch Information
HTSlib maintainers have released patched versions addressing this vulnerability. The fix adds an overflow check in bgzf_index_load_hfile() that validates the index entry count before performing size calculations. The security patch is available in the GitHub commit with commit hash 6dd0d7d0e9e7e2e173a28969e624db8bc8bb5828. Organizations should upgrade to HTSlib 1.23.1, 1.22.2, or 1.21.1 as appropriate for their environment.
For additional details, refer to the GitHub Security Advisory.
Workarounds
- Discard any .gzi index files from untrusted sources before processing
- Recreate GZI index files locally using the bgzip -r option on trusted BGZF files
- Implement input validation to check GZI file header values before passing to HTSlib functions
- Isolate bioinformatics processing in sandboxed environments to limit exploitation impact
# Remove the untrusted .gzi file and rebuild the index locally
# from a BGZF file obtained from a trusted source
rm data.gz.gzi
bgzip -r data.gz
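The input-validation workaround can be sketched as a pre-check that runs before a .gzi file is handed to HTSlib. It assumes the GZI layout documented for bgzip: a little-endian 64-bit entry count followed by that many pairs of 64-bit offsets, so a well-formed file is exactly 8 + 16 * count bytes long. All function names here are hypothetical, and the check is a structural sanity test only, not a substitute for patching.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a little-endian 64-bit value from 8 bytes. */
static uint64_t le64(const unsigned char *b)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/* True if the declared entry count is consistent with the file size.
 * The comparison divides the size instead of multiplying the count,
 * so the check itself cannot overflow. */
bool gzi_count_matches_size(uint64_t count, uint64_t file_size)
{
    if (file_size < 8)
        return false;               /* too short to hold the count */
    uint64_t payload = file_size - 8;
    if (payload % 16 != 0)
        return false;               /* not a whole number of entries */
    return count == payload / 16;
}

/* True if the file at path passes the structural check above. */
bool gzi_file_looks_sane(const char *path)
{
    unsigned char hdr[8];
    bool ok = false;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return false;
    if (fread(hdr, 1, sizeof hdr, f) == sizeof hdr &&
        fseek(f, 0, SEEK_END) == 0) {
        long end = ftell(f);
        ok = end >= 0 && gzi_count_matches_size(le64(hdr), (uint64_t)end);
    }
    fclose(f);
    return ok;
}
```

A crafted file with an oversized count fails this check, because no real file can be large enough to match a count near 2^60.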
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.

