CVE-2025-49847 Overview
CVE-2025-49847 is a buffer overflow vulnerability affecting llama.cpp, a popular open-source C/C++ implementation for inference of large language models (LLMs). Prior to version b5662, an attacker-supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp's vocabulary-loading code. The flaw lies in the _try_copy helper function in llama.cpp/src/vocab.cpp, used by llama_vocab::impl::token_to_piece(): a very large size_t token length is cast to an int32_t, which defeats the length check (if (length < (int32_t)size)) and allows memcpy to be called with an oversized size argument. As a result, a malicious model can overwrite memory beyond the intended buffer, leading to arbitrary memory corruption and potential code execution.
Critical Impact
This vulnerability allows attackers to achieve arbitrary memory corruption and potential remote code execution through a maliciously crafted GGUF model file. Users who load untrusted model files are at significant risk.
Affected Products
- ggml llama.cpp versions prior to b5662
Discovery Timeline
- 2025-06-17 - CVE-2025-49847 published to NVD
- 2025-08-27 - Last updated in NVD database
Technical Details for CVE-2025-49847
Vulnerability Analysis
The vulnerability stems from an integer truncation issue in the vocabulary-loading mechanism of llama.cpp. When processing token data from a GGUF model file, the token_to_piece() function in vocab.cpp handles token length values. The function holds the token length in a size_t, which is then cast to an int32_t for a boundary check. When an attacker provides an extremely large token length value (greater than INT32_MAX), the cast keeps only the low 32 bits of the value, producing a negative number or a small positive number. This bypasses the intended length validation check, causing memcpy to copy far more data than the destination buffer can hold.
Root Cause
The root cause is classified as CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer). Specifically, the narrowing conversion from size_t to int32_t in the _try_copy helper function is a numeric truncation error. Combined with the signed/unsigned mismatch, the truncated (possibly negative) value defeats the bounds-checking logic, enabling a heap or stack buffer overflow depending on where the target buffer is allocated.
Attack Vector
The attack vector is network-based and requires user interaction. An attacker must craft a malicious GGUF model file containing vocabulary tokens with specially crafted oversized length values. When a victim loads this model using llama.cpp (for example, by downloading and running inference on an untrusted model), the vulnerability is triggered during the vocabulary parsing phase. The attacker gains the ability to:
- Corrupt adjacent memory regions with attacker-controlled data
- Potentially overwrite function pointers or return addresses
- Achieve arbitrary code execution within the context of the llama.cpp process
The security patch reworks the length handling to avoid the truncation; the excerpt below shows one visible part of the change, an added <cinttypes> include:
#include <set>
#include <unordered_map>
#include <cctype>
+#include <cinttypes>
//
// helpers
Source: GitHub Commit
Detection Methods for CVE-2025-49847
Indicators of Compromise
- Presence of GGUF model files with abnormally large vocabulary token lengths in metadata
- Unexpected crashes or segmentation faults when loading untrusted model files
- Memory corruption indicators such as heap corruption errors or stack smashing detection alerts
- Unusual process behavior following model loading operations
Detection Strategies
- Implement file integrity monitoring for GGUF model files in production environments
- Deploy application-level crash monitoring to detect exploitation attempts
- Use memory sanitizers (AddressSanitizer, MemorySanitizer) during development and testing
- Monitor for anomalous memory allocation patterns during model loading
Monitoring Recommendations
- Log and audit all model file loading operations, especially from external or untrusted sources
- Implement runtime memory protection mechanisms where available
- Configure endpoint detection to alert on heap spray or buffer overflow exploitation patterns
- Monitor llama.cpp process behavior for indicators of successful exploitation
How to Mitigate CVE-2025-49847
Immediate Actions Required
- Upgrade llama.cpp to version b5662 or later immediately
- Audit all deployed GGUF model files for provenance and integrity
- Restrict model loading to trusted, verified sources only
- Implement network-level controls to prevent downloading of untrusted model files
Patch Information
The vulnerability has been patched in llama.cpp version b5662. The fix addresses the integer overflow issue by implementing proper type handling and bounds checking in the vocabulary loading code. Users should update to this version or later by pulling the latest changes from the official repository. The security patch is available in commit 3cfbbdb44e08fd19429fed6cc85b982a91f0efd5. For additional details, refer to the GitHub Security Advisory.
Workarounds
- Only load GGUF models from trusted and verified sources until the patch can be applied
- Implement sandboxing or containerization for llama.cpp processes to limit the blast radius of potential exploitation
- Deploy application-level memory protection mechanisms such as ASLR and stack canaries
- Consider running llama.cpp in a restricted environment with minimal privileges
# Update llama.cpp to the patched release
cd llama.cpp
git fetch --tags origin
git checkout b5662
cmake -B build
cmake --build build --config Release
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


