CVE-2024-7042: Langchain GraphCypherQAChain SQLi Vulnerability

CVE-2024-7042 Overview

A critical prompt injection vulnerability exists in the GraphCypherQAChain class of langchain-ai/langchainjs version 0.2.5 and all versions containing this class. This vulnerability allows attackers to inject malicious prompts that are subsequently interpreted as SQL injection payloads, enabling unauthorized database operations against Neo4j graph databases. The flaw stems from insufficient input sanitization when processing user-provided natural language queries that are converted to Cypher queries.

Critical Impact
This vulnerability permits unauthorized data manipulation, data exfiltration, denial of service through complete data deletion, breaches in multi-tenant security environments, and severe data integrity issues. Attackers can create, update, or delete nodes and relationships without proper authorization.

Affected Products

langchain-ai/langchainjs version 0.2.5
All langchainjs versions containing the GraphCypherQAChain class
Applications utilizing GraphCypherQAChain for Neo4j graph database queries

Discovery Timeline

2024-10-29 - CVE-2024-7042 published to NVD
2024-10-31 - Last updated in NVD database

Technical Details for CVE-2024-7042

Vulnerability Analysis

The GraphCypherQAChain class in langchainjs is designed to convert natural language queries into Cypher queries for Neo4j graph databases. The vulnerability exists because user-supplied input is not properly sanitized before being incorporated into Cypher query construction. When an LLM generates Cypher queries based on user prompts, malicious actors can craft prompts that manipulate the query generation process to include arbitrary Cypher commands.

This attack chain involves prompt injection as the initial vector, which then leads to what is functionally equivalent to SQL injection (Cypher injection) in the graph database context. The lack of parameterized queries or proper input validation allows attackers to break out of intended query structures and execute arbitrary database operations.

Root Cause

The root cause is classified under CWE-89 (Improper Neutralization of Special Elements used in an SQL Command - SQL Injection). Specifically, the GraphCypherQAChain class fails to implement proper input validation and sanitization mechanisms when processing user queries. The generated Cypher statements directly incorporate unsanitized user input, allowing injection of malicious Cypher syntax that modifies the intended query behavior.

Attack Vector

The attack is executed over the network without requiring authentication or user interaction. An attacker can exploit this vulnerability by:

Submitting crafted natural language queries to applications using GraphCypherQAChain
The malicious prompt manipulates the LLM to generate Cypher queries containing injected commands
The injected Cypher commands execute against the Neo4j database with the application's privileges
Attackers can extract sensitive data, modify or delete records, or access data belonging to other tenants in multi-tenant deployments

The security patch modifies how the LanceDB integration handles vector store operations, removing direct database connection handling from examples and implementing safer patterns for database interactions:

typescript

// Security patch example - safer vector store initialization
// Source: https://github.com/langchain-ai/langchainjs/commit/615b9d9ab30a2d23a2f95fb8d7acfdf4b41ad7a6

import fs from "node:fs/promises";
import path from "node:path";
import os from "node:os";

// Create docs with a loader
const loader = new TextLoader("src/document_loaders/example_data/example.txt");
const docs = await loader.load();

export const run = async () => {
  const vectorStore = await LanceDB.fromDocuments(docs, new OpenAIEmbeddings());

  const resultOne = await vectorStore.similaritySearch("hello world", 1);
  console.log(resultOne);

  // [
  //   Document {
  //     pageContent: 'Foo\nBar\nBaz\n\n',
  //     metadata: { source: 'src/document_loaders/example_data/example.txt' }
  //   }
  // ]
};

export const run_with_existing_table = async () => {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "lancedb-"));

Source: GitHub Commit 615b9d9

Detection Methods for CVE-2024-7042

Indicators of Compromise

Unusual Cypher query patterns in Neo4j logs containing DELETE, DETACH DELETE, CREATE, or SET operations from application queries
Unexpected data modifications or deletions in graph database nodes and relationships
Evidence of data exfiltration through abnormally large query result sets
Cross-tenant data access patterns in multi-tenant deployments
Application logs showing malformed or unusually complex natural language queries

Detection Strategies

Monitor Neo4j database audit logs for destructive operations (DELETE, DETACH DELETE) originating from GraphCypherQAChain queries
Implement query pattern analysis to detect Cypher injection attempts in generated queries
Deploy application-layer logging to capture all user inputs to GraphCypherQAChain endpoints
Use database activity monitoring to track privilege escalation or unauthorized data access patterns
Review application logs for prompt injection indicators such as encoded characters or query manipulation syntax

Monitoring Recommendations

Enable comprehensive audit logging on Neo4j database servers to capture all query activity
Implement real-time alerting for bulk data operations or schema modifications
Monitor for unusual patterns in LangChain application request/response cycles
Track data egress patterns from database servers for potential exfiltration activity
Deploy SIEM rules to correlate langchainjs application logs with database activity

How to Mitigate CVE-2024-7042

Immediate Actions Required

Update langchainjs to a patched version that addresses the GraphCypherQAChain vulnerability
Audit all applications using GraphCypherQAChain for exposure to untrusted user input
Implement input validation and sanitization at the application layer before processing queries
Review database permissions to ensure principle of least privilege for application connections
Consider temporarily disabling GraphCypherQAChain functionality in production until patching is complete

Patch Information

The langchain-ai team has released a security patch addressing this vulnerability. The fix is available in commit 615b9d9ab30a2d23a2f95fb8d7acfdf4b41ad7a6. Organizations should update to the latest version of langchainjs that incorporates this fix. The patch can be reviewed at the GitHub Commit Change page. Additional technical details about the vulnerability discovery are available at the Huntr Bounty Listing.

Workarounds

Implement strict input validation to reject queries containing Cypher-specific syntax or keywords
Deploy a Web Application Firewall (WAF) with rules to detect and block injection attempts
Use read-only database credentials for GraphCypherQAChain operations where write access is not required
Implement query result limiting and monitoring to detect potential data exfiltration
Consider wrapping GraphCypherQAChain with additional sanitization layers that validate generated Cypher queries before execution

bash

# Configuration example - Restrict Neo4j user permissions
# Create a read-only user for GraphCypherQAChain operations
# Connect to Neo4j and execute:

# Create restricted user
# CREATE USER langchain_readonly SET PASSWORD 'secure_password' CHANGE NOT REQUIRED;

# Grant only read access
# GRANT ROLE reader TO langchain_readonly;

# Revoke any write permissions
# DENY WRITE ON GRAPH * TO langchain_readonly;

# Update application connection string to use restricted credentials
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USER="langchain_readonly"
export NEO4J_PASSWORD="secure_password"