CVE-2026-5970 Overview
CVE-2026-5970 is a code injection vulnerability affecting FoundationAgents MetaGPT up to and including version 0.8.1. The flaw resides in the check_solution function within the HumanEvalBenchmark and MBPPBenchmark components. An attacker who can manipulate the input to this function can inject and execute arbitrary code remotely. The vulnerability is classified under CWE-74 (Improper Neutralization of Special Elements in Output Used by a Downstream Component, "Injection"). Public exploit details exist. The maintainers were notified through a pull request, but no fixed release is available yet.
Critical Impact
Remote attackers can inject code into the MetaGPT benchmark evaluation path, leading to arbitrary code execution within the host process running the agent framework.
Affected Products
- FoundationAgents MetaGPT versions up to and including 0.8.1
- HumanEvalBenchmark component (check_solution function)
- MBPPBenchmark component (check_solution function)
Discovery Timeline
- 2026-04-09 - CVE-2026-5970 published to NVD
- 2026-04-29 - Last updated in NVD database
Technical Details for CVE-2026-5970
Vulnerability Analysis
MetaGPT is a multi-agent framework that orchestrates large language model (LLM) workflows for software engineering tasks. The framework includes benchmark modules such as HumanEvalBenchmark and MBPPBenchmark for evaluating generated code. The check_solution function in these benchmark components processes candidate solutions and executes them to verify correctness.
The function does not sufficiently neutralize special elements in the supplied solution string before passing it to an interpreter. An attacker who controls the input fed into check_solution can embed malicious Python statements that the function will execute during evaluation. Because the framework is typically driven by LLM-generated content or user-supplied test harnesses, the attack surface extends to anyone able to influence benchmark inputs.
The issue was reported to the maintainers through pull request #1988 and tracked in issue #1942, but no patched release is available at publication time.
Root Cause
The root cause is unsafe handling of untrusted code strings inside check_solution. The function evaluates solution payloads without sandboxing or sanitization, allowing arbitrary statements to execute in the host Python interpreter alongside the agent process.
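The pattern described above can be illustrated with a minimal sketch. The function below is hypothetical and is not MetaGPT's source code; it assumes only that a solution string and a test string are run in-process through Python's built-in exec, which is one common way such checkers are written. The payload is deliberately harmless: it sets an environment variable (the name DEMO_INJECTED is made up for this example) to show that top-level statements run inside the host interpreter before any test assertion is evaluated.

```python
import os

# Hypothetical stand-in for an in-process checker; NOT the MetaGPT source.
def check_solution_unsafe(solution, test, entry_point):
    namespace = {}
    exec(solution, namespace)   # every top-level statement in `solution` runs here
    exec(test, namespace)       # assumed to define check(candidate)
    try:
        namespace["check"](namespace[entry_point])
    except AssertionError:
        return False
    return True

# Benign payload: the os.environ line stands in for attacker-chosen code.
payload = (
    "import os\n"
    "os.environ['DEMO_INJECTED'] = '1'\n"
    "def add(a, b):\n"
    "    return a + b\n"
)
test_code = "def check(candidate):\n    assert candidate(1, 2) == 3\n"

passed = check_solution_unsafe(payload, test_code, "add")
# The solution "passes" while also having modified the host process state.
print(passed, os.environ.get("DEMO_INJECTED"))
```

Because the injected statement shares the interpreter with the caller, it has the same file, network, and credential access as the process that invoked the checker.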
Attack Vector
The vulnerability is exploitable over the network with no authentication or user interaction. An attacker delivers a crafted solution payload to a MetaGPT instance running HumanEvalBenchmark or MBPPBenchmark. When the benchmark invokes check_solution, the embedded code executes with the privileges of the MetaGPT process. Public exploit information is available through VulDB entry #356524.
The vendor has not published verified exploit code. See MetaGPT GitHub issue #1942 for technical discussion of the injection path.
Detection Methods for CVE-2026-5970
Indicators of Compromise
- Unexpected child processes spawned by the Python interpreter hosting MetaGPT, especially shells or network utilities
- Outbound network connections from MetaGPT worker processes to unfamiliar destinations during benchmark runs
- Modifications to files outside the MetaGPT working directory shortly after check_solution invocations
Detection Strategies
- Monitor execution of HumanEvalBenchmark and MBPPBenchmark modules for anomalous activity such as execve or socket syscalls and file writes to sensitive paths
- Inspect benchmark input payloads for Python constructs like __import__, exec, eval, subprocess, or os.system strings
- Correlate MetaGPT process telemetry with identity context to flag benchmark runs that deviate from baseline behavior
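The payload inspection strategy above can be sketched with Python's standard ast module, which is harder to fool with whitespace or comment tricks than plain string matching. The function name and both name sets below are illustrative assumptions, not part of MetaGPT. A denylist of this kind is useful for triage and alerting but can be bypassed, so it is not a substitute for isolation.

```python
import ast

# Illustrative denylists; tune them for the environment being monitored.
SUSPICIOUS_CALLS = {"exec", "eval", "compile", "__import__"}
SUSPICIOUS_MODULES = {"os", "subprocess", "socket", "ctypes", "importlib"}

def find_suspicious_constructs(source):
    """Return human-readable findings for risky constructs in `source`."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return ["unparseable payload: %s" % exc.msg]

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in SUSPICIOUS_MODULES:
                    findings.append("line %d: import %s" % (node.lineno, alias.name))
        elif isinstance(node, ast.ImportFrom):
            module = node.module or ""
            if module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append("line %d: from %s import ..." % (node.lineno, module))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPICIOUS_CALLS:
                findings.append("line %d: call to %s()" % (node.lineno, node.func.id))
    return findings

benign = "def add(a, b):\n    return a + b\n"
risky = "import os\n__import__('subprocess').run(['id'])\n"
print(find_suspicious_constructs(benign))
print(find_suspicious_constructs(risky))
```

Findings from a scanner like this are best treated as a signal to log and review, since obfuscated payloads (for example, attribute lookups built from strings) will not match.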
Monitoring Recommendations
- Enable verbose logging on MetaGPT benchmark invocations and forward logs to a centralized analytics platform
- Track process lineage for the MetaGPT runtime to identify post-exploitation activity
- Alert on new outbound connections originating from agent evaluation hosts
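One way to collect in-process telemetry for the recommendations above is a Python audit hook (sys.addaudithook, available since Python 3.8). The sketch below is an assumption about how an operator might wrap a benchmark run; the hook name, the watched event set, and the in-memory list are all illustrative. A real deployment would forward events to its logging pipeline. Audit hooks cannot be removed once installed and can be evaded by native code, so this is telemetry, not enforcement.

```python
import sys

# Standard CPython audit event names for process creation and outbound sockets.
WATCHED_EVENTS = {"subprocess.Popen", "os.system", "socket.connect"}

audit_events = []  # illustrative sink; replace with a log forwarder in practice

def benchmark_audit_hook(event, args):
    """Record watched runtime events raised inside this interpreter."""
    if event in WATCHED_EVENTS:
        # Truncate the repr so an oversized argument cannot flood the log.
        audit_events.append((event, repr(args)[:200]))

sys.addaudithook(benchmark_audit_hook)

# Raise a synthetic event to confirm the hook is active without running a command.
sys.audit("os.system", b"echo demo")
print(audit_events[-1][0])
```

Installing the hook before the benchmark modules are invoked means any child process or connection attempt made by evaluated code is recorded from within the same interpreter.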
How to Mitigate CVE-2026-5970
Immediate Actions Required
- Disable the HumanEvalBenchmark and MBPPBenchmark features in production deployments until a patch is released
- Restrict access to MetaGPT instances by placing them behind authenticated network boundaries
- Run MetaGPT processes under least-privilege accounts and isolate them in containers or sandboxes
Patch Information
No official patched release of MetaGPT is available at the time of publication. A community-submitted fix is pending review at pull request #1988. Track the MetaGPT repository for release updates and apply the fix immediately when published.
Workarounds
- Apply the patch from pull request #1988 manually after independent code review
- Execute benchmark evaluations inside ephemeral sandboxes such as gVisor, Firecracker, or rootless containers without network egress
- Validate and filter solution payloads to reject Python constructs associated with code execution before passing them to check_solution; treat this as defense in depth only, because denylist filters can be bypassed
- Remove the benchmark modules from deployments that do not require evaluation workflows
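As a rough sketch of moving evaluation out of the host process, the helper below runs a candidate solution in a separate Python interpreter with isolated mode (-I), a throwaway working directory, a reduced environment, and a timeout. The function name, parameters, and the environment allowlist are hypothetical and are not taken from pull request #1988. A child process is only a process boundary, not a sandbox, so it should still run inside one of the sandboxes listed above.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Only these variables are passed to the child, so API keys and tokens are not.
ENV_ALLOWLIST = {"PATH", "LD_LIBRARY_PATH", "SYSTEMROOT"}

def run_solution_isolated(solution, test, entry_point, timeout_s=10.0):
    """Run solution + test in a child interpreter; True only if it exits cleanly."""
    program = "%s\n\n%s\n\ncheck(%s)\n" % (solution, test, entry_point)
    child_env = {k: v for k, v in os.environ.items() if k in ENV_ALLOWLIST}
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(program, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                env=child_env,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
    return result.returncode == 0

test_code = "def check(candidate):\n    assert candidate(1, 2) == 3\n"
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
print(run_solution_isolated(good, test_code, "add"))
print(run_solution_isolated(bad, test_code, "add"))
```

With this design, statements embedded in the solution still execute, but they do so in a short-lived child whose environment and working directory are disposable, which limits what they can reach once the surrounding container or sandbox removes network egress and filesystem access.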