CVE-2026-7482 Overview
CVE-2026-7482 is a heap out-of-bounds read vulnerability in Ollama versions prior to 0.17.1. The flaw resides in the GGUF model loader, specifically in fs/ggml/gguf.go and the WriteTo() function in server/quantization.go. An attacker submits a crafted GGUF file to the unauthenticated /api/create endpoint with tensor offset and size values that exceed the file's actual length. During quantization, the server reads past the allocated heap buffer. Leaked memory may contain environment variables, API keys, system prompts, and conversation data from concurrent users. Attackers can exfiltrate the leaked contents by pushing the resulting model artifact to an attacker-controlled registry through /api/push [CWE-125].
Critical Impact
Unauthenticated network attackers can read sensitive heap memory, including API keys, system prompts, and other users' conversation data, then exfiltrate it through the model push API.
Affected Products
- Ollama versions prior to 0.17.1
- Deployments exposing the API via OLLAMA_HOST=0.0.0.0
- Any Ollama instance reachable on the network without an authentication proxy
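To triage a fleet quickly, version strings reported by ollama --version or the GET /api/version endpoint can be compared against the fixed release. The sketch below is a minimal comparison helper, assuming plain major.minor.patch strings; the function name and parsing rules are ours, not part of Ollama.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isVulnerable reports whether an Ollama version string falls before the
// fixed release 0.17.1. Simplified sketch: assumes "major.minor.patch",
// tolerates a leading "v" and ignores pre-release suffixes like "-rc1".
func isVulnerable(version string) bool {
	fixed := [3]int{0, 17, 1}
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	for i := 0; i < 3; i++ {
		have := 0
		if i < len(parts) {
			numeric := strings.SplitN(parts[i], "-", 2)[0]
			if n, err := strconv.Atoi(numeric); err == nil {
				have = n
			}
		}
		if have != fixed[i] {
			return have < fixed[i]
		}
	}
	return false // exactly 0.17.1 is patched
}

func main() {
	for _, v := range []string{"0.16.3", "0.17.0", "0.17.1", "0.18.0"} {
		fmt.Printf("%s vulnerable: %v\n", v, isVulnerable(v))
	}
}
```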
Discovery Timeline
- 2026-05-04 - CVE-2026-7482 published to NVD
- 2026-05-05 - Last updated in NVD database
Technical Details for CVE-2026-7482
Vulnerability Analysis
The vulnerability exists in Ollama's GGUF model loading and quantization pipeline. GGUF (GPT-Generated Unified Format) files declare tensor metadata, including offsets and sizes, that the loader uses to map tensor data into memory. The pre-patch code in fs/ggml/gguf.go trusted the declared offset and size without comparing them against the actual file length. When WriteTo() in server/quantization.go later read tensor bytes for quantization, the read extended beyond the mapped file region into adjacent heap memory.
Because Ollama processes multiple concurrent requests in the same process, the leaked heap region can contain residual data from other users, including system prompts, conversation history, environment variables, and API keys held in memory by the inference runtime.
Root Cause
The loader did not validate that tensorOffset + tensor.Offset + tensor.Size() stays within the file bounds. The quantization path additionally failed to verify that the buffer returned for a tensor matched the expected size declared in the model header. Either gap is sufficient to trigger an out-of-bounds read.
Attack Vector
An unauthenticated attacker with network access to the Ollama API uploads a malicious GGUF file via /api/create. The server quantizes the model and reads beyond the file boundary into heap memory. The attacker then calls /api/push to send the resulting artifact, now containing leaked memory bytes baked into tensor data, to a registry they control. Neither endpoint requires authentication in the upstream distribution.
// Security patch in fs/ggml/gguf.go - ensure tensor size is valid
padding := ggufPadding(offset, int64(alignment))
llm.tensorOffset = uint64(offset + padding)
// get file size to validate tensor bounds
fileSize, err := rs.Seek(0, io.SeekEnd)
if err != nil {
    return fmt.Errorf("failed to determine file size: %w", err)
}
if _, err := rs.Seek(offset, io.SeekStart); err != nil {
    return fmt.Errorf("failed to seek back after size check: %w", err)
}
for _, tensor := range llm.tensors {
    tensorEnd := llm.tensorOffset + tensor.Offset + tensor.Size()
    if tensorEnd > uint64(fileSize) {
        return fmt.Errorf("tensor %q offset+size (%d) exceeds file size (%d)", tensor.Name, tensorEnd, fileSize)
    }
}
Source: GitHub Commit 88d57d0
The companion fix in server/quantization.go rejects buffers smaller than the declared tensor size, preventing the quantizer from operating on truncated reads:
// Security patch in server/quantization.go
if uint64(len(data)) < q.from.Size() {
    return 0, fmt.Errorf("tensor %s data size %d is less than expected %d from shape %v", q.from.Name, len(data), q.from.Size(), q.from.Shape)
}
Source: GitHub Commit 88d57d0
Detection Methods for CVE-2026-7482
Indicators of Compromise
- POST requests to /api/create carrying GGUF payloads from unexpected source IPs
- Outbound /api/push calls referencing registries outside the organization's allowlist
- Ollama process error logs containing messages such as "failed to get current offset" or unexplained tensor read errors
- Ollama instances bound to 0.0.0.0 reachable from the public internet
Detection Strategies
- Inspect HTTP request bodies to /api/create for GGUF magic bytes followed by oversized tensor offset or size fields
- Alert on any successful /api/push operation targeting a registry hostname that is not in an approved list
- Correlate /api/create and /api/push calls from the same client within a short window, which matches the leak-and-exfiltrate pattern
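The first strategy above can be implemented by matching the GGUF magic bytes ("GGUF") at the start of a request body before handing the payload to deeper analysis. The sketch below is a defensive helper for a WAF or log pipeline, not Ollama code; the function names and the version-field offset follow the public GGUF header layout (4-byte magic followed by a little-endian uint32 version).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// looksLikeGGUF reports whether a request body starts with the GGUF magic
// bytes. A fuller inspection would go on to parse declared tensor offsets
// and sizes and compare them against Content-Length.
func looksLikeGGUF(body []byte) bool {
	return len(body) >= 4 && bytes.Equal(body[:4], []byte("GGUF"))
}

// ggufVersion extracts the format version that follows the magic,
// a little-endian uint32 in the GGUF header.
func ggufVersion(body []byte) (uint32, bool) {
	if !looksLikeGGUF(body) || len(body) < 8 {
		return 0, false
	}
	return binary.LittleEndian.Uint32(body[4:8]), true
}

func main() {
	sample := append([]byte("GGUF"), 3, 0, 0, 0) // magic + version 3
	fmt.Println(looksLikeGGUF(sample))           // true
	if v, ok := ggufVersion(sample); ok {
		fmt.Println(v) // 3
	}
	fmt.Println(looksLikeGGUF([]byte("not a model"))) // false
}
```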
Monitoring Recommendations
- Capture and retain Ollama HTTP access logs for forensic review of model upload activity
- Monitor egress traffic from inference hosts to identify pushes to unknown registries
- Track environment variables and secrets accessible to the Ollama process and rotate any that may have been resident in memory during exposure
How to Mitigate CVE-2026-7482
Immediate Actions Required
- Upgrade Ollama to version 0.17.1 or later on every host running the service
- Restrict the Ollama listener to 127.0.0.1 or a private interface unless external access is required
- Place an authenticating reverse proxy in front of /api/create and /api/push if remote access is needed
- Rotate API keys, tokens, and secrets that were present in the Ollama process environment
Patch Information
The fix shipped in Ollama v0.17.1 and was merged through pull request #14406. The patch adds file-size validation in the GGUF loader and a tensor data length check in the quantizer. Review the commit details for the full diff.
Workarounds
- Block inbound access to TCP port 11434 at the host firewall and network edge
- Disable /api/create and /api/push at a reverse proxy layer until patched
- Run Ollama in an isolated container with no access to host secrets or production credentials
- Require authentication and request allowlisting through an API gateway for any exposed instance
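The proxy-layer workarounds can be combined in a small gateway that refuses the two vulnerable endpoints outright and requires a bearer token for everything else. The sketch below is one possible deployment choice, not part of Ollama; the path list, the shared-token scheme, and the ports are assumptions to adapt to your environment.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
	"strings"
)

// blocked lists the endpoints to refuse until the host is patched.
func blocked(path string) bool {
	return strings.HasPrefix(path, "/api/create") || strings.HasPrefix(path, "/api/push")
}

func main() {
	upstream, err := url.Parse("http://127.0.0.1:11434") // local Ollama listener
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	token := os.Getenv("PROXY_TOKEN") // pre-shared secret for all callers

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if blocked(r.URL.Path) {
			http.Error(w, "endpoint disabled pending CVE-2026-7482 patch", http.StatusForbidden)
			return
		}
		if r.Header.Get("Authorization") != "Bearer "+token {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})

	// Only bind the port when explicitly requested, so the sketch can be
	// compiled and its policy exercised without starting a server.
	if os.Getenv("RUN_PROXY") == "1" {
		log.Fatal(http.ListenAndServe(":8443", handler))
	}
	fmt.Println(blocked("/api/create"), blocked("/api/generate")) // true false
}
```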
# Configuration example: bind Ollama to loopback only
export OLLAMA_HOST=127.0.0.1:11434
systemctl restart ollama
# Verify the listener is not exposed externally
ss -tlnp | grep 11434