CVE-2025-49847 Overview
CVE-2025-49847 is a buffer overflow vulnerability affecting llama.cpp, a popular open-source C/C++ implementation for inference of large language models (LLMs). Prior to version b5662, an attacker-supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp's vocabulary-loading code. The flaw lies in the _try_copy helper function in llama.cpp/src/vocab.cpp, used by llama_vocab::impl::token_to_piece(): a very large size_t token length is cast to an int32_t, which defeats the length check (if (length < (int32_t)size)) and allows memcpy to be called with an oversized size argument. As a result, a malicious model can overwrite memory beyond the intended buffer, leading to arbitrary memory corruption and potential code execution.
Critical Impact
This vulnerability allows attackers to achieve arbitrary memory corruption and potential remote code execution through a maliciously crafted GGUF model file. Users who load untrusted model files are at significant risk.
Affected Products
- ggml llama.cpp versions prior to b5662
Discovery Timeline
- 2025-06-17 - CVE-2025-49847 published to NVD
- 2025-08-27 - Last updated in NVD database
Technical Details for CVE-2025-49847
Vulnerability Analysis
The vulnerability stems from an integer truncation issue in the vocabulary-loading mechanism of llama.cpp. When processing token data from a GGUF model file, the token_to_piece() function in vocab.cpp handles token length values. The function holds the token length in a size_t, which is then cast to an int32_t for a boundary check. When an attacker provides an extremely large token length value (greater than INT32_MAX), the cast keeps only the low 32 bits of the value, producing a negative number or a small positive number. This bypasses the intended length validation check, causing memcpy to copy far more data than the destination buffer can hold.
Root Cause
The root cause is classified as CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer). Specifically, the narrowing conversion from size_t to int32_t in the _try_copy helper function is a numeric truncation error. Combined with the signed/unsigned mismatch, the truncated (possibly negative) value defeats the bounds-checking logic, enabling a heap or stack buffer overflow depending on where the target buffer is allocated.
Attack Vector
The attack vector is network-based and requires user interaction. An attacker must craft a malicious GGUF model file containing vocabulary tokens with specially crafted oversized length values. When a victim loads this model using llama.cpp (for example, by downloading and running inference on an untrusted model), the vulnerability is triggered during the vocabulary parsing phase. The attacker gains the ability to:
- Corrupt adjacent memory regions with attacker-controlled data
- Potentially overwrite function pointers or return addresses
- Achieve arbitrary code execution within the context of the llama.cpp process
The security patch reworks the length handling to avoid the truncation; the excerpt below shows one visible part of the change, an added <cinttypes> include:
#include <set>
#include <unordered_map>
#include <cctype>
+#include <cinttypes>
//
// helpers
Source: GitHub Commit
Detection Methods for CVE-2025-49847
Indicators of Compromise
- Presence of GGUF model files with abnormally large vocabulary token lengths in metadata
- Unexpected crashes or segmentation faults when loading untrusted model files
- Memory corruption indicators such as heap corruption errors or stack smashing detection alerts
- Unusual process behavior following model loading operations
Detection Strategies
- Implement file integrity monitoring for GGUF model files in production environments
- Deploy application-level crash monitoring to detect exploitation attempts
- Use memory sanitizers (AddressSanitizer, MemorySanitizer) during development and testing
- Monitor for anomalous memory allocation patterns during model loading
Monitoring Recommendations
- Log and audit all model file loading operations, especially from external or untrusted sources
- Implement runtime memory protection mechanisms where available
- Configure endpoint detection to alert on heap spray or buffer overflow exploitation patterns
- Monitor llama.cpp process behavior for indicators of successful exploitation
How to Mitigate CVE-2025-49847
Immediate Actions Required
- Upgrade llama.cpp to version b5662 or later immediately
- Audit all deployed GGUF model files for provenance and integrity
- Restrict model loading to trusted, verified sources only
- Implement network-level controls to prevent downloading of untrusted model files
Patch Information
The vulnerability has been patched in llama.cpp version b5662. The fix addresses the integer overflow issue by implementing proper type handling and bounds checking in the vocabulary loading code. Users should update to this version or later by pulling the latest changes from the official repository. The security patch is available in commit 3cfbbdb44e08fd19429fed6cc85b982a91f0efd5. For additional details, refer to the GitHub Security Advisory.
Workarounds
- Only load GGUF models from trusted and verified sources until the patch can be applied
- Implement sandboxing or containerization for llama.cpp processes to limit the blast radius of potential exploitation
- Deploy application-level memory protection mechanisms such as ASLR and stack canaries
- Consider running llama.cpp in a restricted environment with minimal privileges
# Update llama.cpp to the patched release
cd llama.cpp
git fetch --tags origin
git checkout b5662
cmake -B build
cmake --build build --config Release
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.


