CVE-2024-14021: LlamaIndex Unsafe Deserialization Vulnerability

CVE-2024-14021 Overview

CVE-2024-14021 is an unsafe deserialization vulnerability affecting LlamaIndex (run-llama/llama_index) versions up to and including 0.11.6. The vulnerability exists in the BGEM3Index.load_from_disk() function located in llama_index/indices/managed/bge_m3/base.py. The function uses Python's pickle.load() to deserialize multi_embed_store.pkl from a user-supplied persist_dir without proper validation. An attacker who can provide a crafted persist directory containing a malicious pickle file can trigger arbitrary code execution when a victim loads the index from disk.

Critical Impact
This insecure deserialization vulnerability (CWE-502) allows attackers to achieve arbitrary code execution by crafting malicious pickle files that are loaded without validation, potentially compromising systems that use LlamaIndex for AI/ML applications.

Affected Products

LlamaIndex (run-llama/llama_index) versions up to and including 0.11.6

Discovery Timeline

2026-01-12 - CVE-2024-14021 published to NVD
2026-01-13 - Last updated in NVD database

Technical Details for CVE-2024-14021

Vulnerability Analysis

This vulnerability is classified as Insecure Deserialization (CWE-502), a dangerous class of vulnerabilities that occurs when untrusted data is deserialized without proper validation. In this case, the BGEM3Index.load_from_disk() function directly deserializes pickle files from user-controlled directories without verifying their integrity or origin.

Python's pickle module is inherently unsafe for deserializing untrusted data because it can execute arbitrary code during the deserialization process. When a maliciously crafted pickle file is loaded, it can instantiate arbitrary Python objects and execute code through the __reduce__ method or similar mechanisms. This makes pickle-based deserialization particularly dangerous when the source of the serialized data is not fully trusted.
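
To make the mechanism concrete, the following minimal sketch shows how a __reduce__ method causes a callable to run during unpickling. The Payload class is hypothetical and is not taken from any real exploit. The harmless built-in print stands in for whatever an attacker would actually invoke, such as os.system.

```python
import pickle


class Payload:
    """Hypothetical attacker-defined class; the name is illustrative only."""

    def __reduce__(self):
        # pickle records a (callable, args) pair and calls callable(*args)
        # at load time. print is a harmless stand-in for os.system or similar.
        return (print, ("this ran during unpickling",))


blob = pickle.dumps(Payload())

# Loading the bytes invokes print(...); no Payload instance is ever rebuilt.
result = pickle.loads(blob)
```

The victim only needs to call a load function on the bytes. The serialized stream names the callable itself, so the attacker's class does not have to exist on the victim's system.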

The attack requires local access and user interaction: a victim must be convinced to load an index from a directory that the attacker controls. This could happen when users download shared model indices, load indices from untrusted sources, or are targeted through a supply chain attack.

Root Cause

The root cause is the use of Python's native pickle.load() function to deserialize the multi_embed_store.pkl file without any validation or safety measures. The pickle module documentation explicitly warns that it is not secure against erroneous or maliciously constructed data and should never be used to deserialize data from untrusted sources.

The vulnerable code path in llama_index/indices/managed/bge_m3/base.py accepts a persist_dir parameter from the user and loads pickle files from that directory, trusting the serialized data implicitly. Safe alternatives such as JSON serialization, cryptographic signature verification, or allowlist-based deserialization were not implemented.
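
As an illustration of allowlist-based deserialization, the sketch below follows the "restricting globals" pattern from the Python pickle documentation. It is not the fix shipped by LlamaIndex. ALLOWED_GLOBALS, RestrictedUnpickler, and restricted_loads are names chosen for this example, and the allowlist contents are an assumption, since a real embedding store would likely need additional, carefully reviewed entries.

```python
import io
import pickle

# Assumed allowlist for illustration; a real one depends on what the store holds.
ALLOWED_GLOBALS = {
    ("builtins", "dict"),
    ("builtins", "list"),
    ("collections", "OrderedDict"),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Resolve only globals that were explicitly approved.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Deserialize data, refusing any global outside ALLOWED_GLOBALS."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers of numbers and strings load normally, while a stream that references any other callable raises pickle.UnpicklingError before that callable is invoked. An allowlist narrows the attack surface but does not make pickle safe in general, so a data-only format remains the more robust choice where it is practical.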

Attack Vector

The attack vector is local: an attacker must place a malicious pickle file in a directory that the victim subsequently loads with BGEM3Index.load_from_disk(). Attack scenarios include:

Distribution of malicious model indices through file sharing platforms, compromised repositories, or social engineering campaigns in which users are convinced to download and load pre-built indices
Supply chain attacks in which legitimate index repositories are compromised to include malicious pickle files
Internal attacks within organizations where shared storage locations contain indices that multiple users may load

When the victim calls BGEM3Index.load_from_disk() pointing to the attacker-controlled directory, the malicious multi_embed_store.pkl file is deserialized, executing the attacker's payload with the privileges of the user running the Python process.

Detection Methods for CVE-2024-14021

Indicators of Compromise

Unexpected multi_embed_store.pkl files in LlamaIndex persist directories with unusual file sizes or modification times
Process execution anomalies following calls to BGEM3Index.load_from_disk() including unexpected child processes or network connections
File system access patterns indicating deserialization of pickle files from untrusted or newly created directories
Python process executing system commands or accessing sensitive resources after loading index data

Detection Strategies

Monitor for calls to pickle.load() in LlamaIndex-related code paths, particularly when loading data from user-specified directories
Implement file integrity monitoring on directories used for LlamaIndex index persistence
Use application-level logging to track BGEM3Index.load_from_disk() calls and their source directories
Deploy endpoint detection and response (EDR) solutions to identify suspicious post-exploitation behavior following Python process activity
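
One way to observe unpickling activity at the application level is a Python audit hook. The sketch below assumes CPython 3.8 or later, where the unpickler raises the pickle.find_class audit event with the module and name of every global it resolves. The logger name and the seen_globals list are choices made for this example, not part of LlamaIndex.

```python
import logging
import pickle
import sys

logger = logging.getLogger("pickle_audit")  # logger name is an arbitrary choice
seen_globals = []  # (module, name) pairs resolved by any unpickler


def pickle_audit_hook(event, args):
    # CPython raises "pickle.find_class" each time an unpickler resolves a global.
    if event == "pickle.find_class":
        module, name = args[:2]
        seen_globals.append((module, name))
        logger.warning("unpickler resolved %s.%s", module, name)


# Audit hooks cannot be removed once installed, so register once at startup.
sys.addaudithook(pickle_audit_hook)
```

With the hook installed, any pickle load that resolves a global such as os.system or subprocess.Popen leaves a log entry that can be forwarded to existing alerting. The hook only observes; it does not block the load unless it is extended to raise an exception.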

Monitoring Recommendations

Enable verbose logging for LlamaIndex operations to track index loading activities
Monitor for unusual Python process behavior including unexpected subprocess spawning or network connections
Implement alerts for file creation or modification in LlamaIndex persist directories from untrusted sources
Review access logs for any indices loaded from external or shared storage locations

How to Mitigate CVE-2024-14021

Immediate Actions Required

Upgrade LlamaIndex to a version newer than 0.11.6 that addresses the unsafe deserialization vulnerability
Audit all existing LlamaIndex persist directories for potentially malicious pickle files
Restrict loading of indices to trusted, verified sources only
Implement file integrity verification for any shared or downloaded index files before loading
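
When auditing persist directories, a pickle file can be inspected without being loaded. The sketch below uses the standard library's pickletools.genops to list opcodes that import or invoke callables. The SUSPICIOUS_OPCODES set and the suspicious_opcodes helper are assumptions made for illustration.

```python
import pickle
import pickletools

# Opcodes that import or invoke callables; plain data pickles rarely need them.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
                      "NEWOBJ", "NEWOBJ_EX", "BUILD"}


def suspicious_opcodes(data: bytes) -> list[str]:
    """Return names of risky opcodes found in a pickle stream without loading it."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)
            if op.name in SUSPICIOUS_OPCODES]


# A pickle of plain containers contains none of the flagged opcodes.
benign = pickle.dumps({"ids": [1, 2, 3]})
print(suspicious_opcodes(benign))  # → []
```

A hit is a prompt for review rather than proof of compromise, because legitimate stores that hold arrays or custom objects also use these opcodes. The referenced globals, visible with pickletools.dis, are what distinguish a benign file from a malicious one.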

Patch Information

Users should upgrade to a patched version of LlamaIndex that addresses this vulnerability. Check the LlamaIndex GitHub Repository for the latest secure release. Additional details about this vulnerability can be found in the Huntr Bounty Listing and the VulnCheck Advisory.

Workarounds

Avoid using BGEM3Index.load_from_disk() with directories from untrusted sources until the vulnerability is patched
Implement directory allowlisting to restrict index loading to specific pre-approved paths only
Manually validate the contents of persist directories before loading, checking for unexpected or modified pickle files
Consider running LlamaIndex operations in sandboxed environments with restricted system access to limit the impact of potential exploitation

bash

# Verify the integrity of the index file before loading it.
# KNOWN_GOOD_SHA256 is a placeholder for a hash obtained from a trusted source.
echo "${KNOWN_GOOD_SHA256}  /path/to/trusted/persist_dir/multi_embed_store.pkl" | sha256sum -c -
# Call BGEM3Index.load_from_disk() only if the check above reports OK
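
The directory allowlisting and integrity verification workarounds can also be enforced in application code before the index is loaded. The following is a minimal sketch assuming Python 3.9 or later. TRUSTED_ROOTS, is_trusted_dir, and sha256_matches are hypothetical names, and /srv/indices is a placeholder path.

```python
import hashlib
import hmac
from pathlib import Path

# Hypothetical allowlist root; substitute the directories your deployment trusts.
TRUSTED_ROOTS = [Path("/srv/indices")]


def is_trusted_dir(persist_dir: str, roots=TRUSTED_ROOTS) -> bool:
    """True if persist_dir resolves (symlinks followed) inside an allowlisted root."""
    resolved = Path(persist_dir).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in roots)


def sha256_matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 digest with a known-good value from a trusted source."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return hmac.compare_digest(digest, expected_hex.lower())
```

A caller would invoke BGEM3Index.load_from_disk() only when both checks pass. Resolving the path first prevents ".." segments or symlinks from escaping the allowlisted root. The hash check is only as trustworthy as the channel through which the expected digest was obtained.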
