CVE-2026-25873 Overview
OmniGen2-RL contains an unauthenticated remote code execution vulnerability in the reward server component that allows remote attackers to execute arbitrary commands by sending malicious HTTP POST requests. Attackers can exploit insecure pickle deserialization of request bodies to achieve code execution on the host system running the exposed service.
This critical vulnerability (CWE-502: Deserialization of Untrusted Data) affects the reward server component of OmniGen2-RL, a reinforcement learning framework. The flaw stems from the application's use of Python's pickle module to deserialize untrusted data from incoming HTTP requests without any authentication or input validation, enabling complete system compromise.
Critical Impact
Unauthenticated attackers can achieve full remote code execution on systems running the OmniGen2-RL reward server by sending specially crafted HTTP POST requests containing malicious pickle payloads, potentially leading to complete host compromise.
Affected Products
- OmniGen2-RL reward server component (reward_proxy.py)
- OmniGen2-RL reward server component (reward_server.py)
- VectorSpaceLab OmniGen2 framework with RL components
Discovery Timeline
- 2026-03-18 - CVE-2026-25873 published to NVD
- 2026-03-19 - Last updated in NVD database
Technical Details for CVE-2026-25873
Vulnerability Analysis
The vulnerability exists in the OmniGen2-RL reward server component, specifically within the HTTP request handling logic of reward_proxy.py and reward_server.py. The application processes incoming HTTP POST requests by directly deserializing the request body using Python's native pickle.loads() function without any prior authentication checks or data validation.
Python's pickle module is inherently unsafe when used with untrusted data because it can execute arbitrary code during the deserialization process. When an attacker sends a malicious pickle payload to the vulnerable endpoint, the server blindly deserializes it, triggering code execution with the privileges of the running process.
The vulnerability is particularly severe because:
- No authentication is required to reach the vulnerable endpoint
- The reward server is designed to be network-accessible for distributed RL training
- Successful exploitation yields code execution with the privileges of the server process, which can lead to complete control of the host system
Root Cause
The root cause is the insecure use of Python's pickle deserialization on untrusted network input. The vulnerable code in reward_proxy.py (lines 208 and 224) and reward_server.py (line 118) accepts HTTP POST request bodies and passes them directly to pickle.loads() without any sanitization, authentication, or use of safer deserialization alternatives.
This represents a fundamental security anti-pattern, as the Python documentation explicitly warns against using pickle with untrusted data: "Warning: The pickle module is not secure. Only unpickle data you trust."
Attack Vector
The attack is network-based and requires no user interaction or authentication. An attacker can craft a malicious pickle payload that, when deserialized, executes arbitrary Python code. This is typically achieved by defining a class with a __reduce__ method that returns a callable (such as os.system or subprocess.Popen) along with command arguments.
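The mechanism can be shown with a deliberately harmless callable. The sketch below is illustrative only: the class name `Demo` and the choice of `len` are assumptions made for this example, not code from OmniGen2-RL. It shows that `pickle.loads()` calls whatever callable the serialized stream names, which is why an attacker-chosen callable is dangerous.

```python
import pickle


class Demo:
    """Harmless stand-in for an attacker-controlled class (hypothetical)."""

    def __reduce__(self):
        # pickle stores "call len('harmless')" as the recipe for rebuilding
        # this object. An attack would name a dangerous callable instead.
        return (len, ("harmless",))


payload = pickle.dumps(Demo())

# Loading runs the recorded callable; no Demo instance is ever created.
result = pickle.loads(payload)
print(type(result).__name__, result)  # prints: int 8
```

The receiving side never needs the `Demo` class to be importable, because the stream only references the callable and its arguments.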
The exploitation process involves:
- Attacker identifies an exposed OmniGen2-RL reward server endpoint
- Attacker constructs a malicious pickle object containing a payload (e.g., reverse shell)
- Attacker sends an HTTP POST request with the malicious pickle as the request body
- The server deserializes the pickle, triggering immediate code execution
- Attacker gains command execution with the privileges of the server process
For detailed technical analysis of the exploitation technique, see the Chocapikk blog post and the VulnCheck Security Advisory.
Detection Methods for CVE-2026-25873
Indicators of Compromise
- Unexpected HTTP POST requests to the reward server port with binary or unusual payload content
- Process spawning from the Python reward server process (e.g., /bin/sh, bash, curl, wget)
- Reverse shell connections originating from hosts running OmniGen2-RL components
- Suspicious network traffic from reward server hosts to unknown external IPs
Detection Strategies
- Monitor for HTTP POST requests whose bodies begin with pickle magic bytes (\x80 followed by the protocol version byte; protocol 4 streams typically start with \x80\x04\x95)
- Implement network-level monitoring for unexpected outbound connections from reward server hosts
- Deploy application-level logging to capture deserialization events, including the module and callable names that a payload resolves during unpickling
- Use endpoint detection solutions to identify anomalous child processes spawned by Python interpreters
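As a rough illustration of the magic-byte strategy, the following sketch inspects a captured request body without deserializing it. It relies on the standard library's `pickletools.genops`, which only parses the opcode stream. The function names and the opcode list are assumptions chosen for this example and would need tuning in a real detection pipeline.

```python
import pickle
import pickletools

# Opcodes that import a global or invoke a callable during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def looks_like_pickle(body: bytes) -> bool:
    """Cheap check: protocol 2+ streams start with PROTO (0x80) and a version."""
    return (
        len(body) >= 2
        and body[0] == 0x80
        and 2 <= body[1] <= pickle.HIGHEST_PROTOCOL
    )


def suspicious_opcodes(body: bytes) -> set:
    """Return the names of risky opcodes found, without executing the pickle."""
    found = set()
    try:
        for opcode, _arg, _pos in pickletools.genops(body):
            if opcode.name in SUSPICIOUS_OPCODES:
                found.add(opcode.name)
    except Exception:
        # Malformed stream: keep whatever was seen before the parse error.
        pass
    return found
```

A body that passes `looks_like_pickle` and also contains `REDUCE` together with `GLOBAL` or `STACK_GLOBAL` is a reasonable candidate for alerting, since plain containers of numbers and strings do not need those opcodes.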
Monitoring Recommendations
- Enable verbose logging on all OmniGen2-RL reward server instances
- Implement network segmentation to isolate reward servers from untrusted networks
- Deploy intrusion detection rules for pickle deserialization attack patterns
- Monitor system call activity on hosts running the reward server for suspicious execve patterns
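For in-process visibility, Python 3.8 and later raise audit events that a hook can record. The sketch below is a minimal example under stated assumptions: the logger name and the in-memory `seen_events` list are hypothetical, and a production deployment would ship these records to a central log rather than keep them in memory.

```python
import logging
import pickle
import sys

log = logging.getLogger("reward_server.audit")  # hypothetical logger name

# Audit events raised by the standard library (Python 3.8+).
WATCHED_EVENTS = {"pickle.find_class", "os.system", "subprocess.Popen"}

seen_events = []  # kept in memory only to make this sketch easy to inspect


def audit_hook(event, args):
    """Record global lookups during unpickling and process-spawn attempts."""
    if event in WATCHED_EVENTS:
        seen_events.append((event, tuple(args)))
        log.warning("audit event %s args=%r", event, args)


# Hooks cannot be removed once installed, so register once at startup.
sys.addaudithook(audit_hook)
```

The `pickle.find_class` event fires with the module and name of every global a payload resolves, so an entry such as `("os", "system")` or `("subprocess", "Popen")` in these logs is a strong indicator of an exploitation attempt.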
How to Mitigate CVE-2026-25873
Immediate Actions Required
- Restrict network access to reward server endpoints using firewall rules or network ACLs
- Avoid exposing the reward server to untrusted networks or the public internet
- Implement network segmentation to ensure only trusted training nodes can reach the reward server
- Consider temporarily disabling the reward server component until a patch is applied
Patch Information
A fix has been proposed in GitHub Pull Request #139 for the OmniGen2 repository. Organizations using OmniGen2-RL should monitor the official repository for merged patches and update to a patched version as soon as one becomes available.
The recommended remediation involves replacing pickle deserialization with safer alternatives such as:
- Using json for data serialization when possible
- Implementing signature verification for pickle data
- Adopting restricted unpicklers that allowlist the globals a payload may resolve
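The last two options can be combined, as in the sketch below. It is not the fix proposed upstream; it is a minimal illustration that assumes a shared secret distributed to trusted nodes and a hypothetical allowlist (`ALLOWED_GLOBALS`). The `find_class` override follows the pattern described in the Python `pickle` documentation.

```python
import hashlib
import hmac
import io
import pickle

# Hypothetical allowlist: only these (module, name) pairs may be resolved.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def sign(body: bytes, key: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the serialized body."""
    return hmac.new(key, body, hashlib.sha256).digest()


def safe_loads(body: bytes, signature: bytes, key: bytes):
    """Verify the tag first, then load with the restricted unpickler."""
    if not hmac.compare_digest(sign(body, key), signature):
        raise ValueError("bad signature")
    return RestrictedUnpickler(io.BytesIO(body)).load()
```

Signature verification only proves that the sender holds the key, so any compromised key holder could still submit a hostile payload. The allowlist limits what such a payload can reach. Where the exchanged data is simple, switching to `json` avoids the problem altogether.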
Workarounds
- Bind the reward server to localhost only (127.0.0.1) and use SSH tunneling or VPN for remote access
- Implement a reverse proxy with authentication (e.g., mTLS or API keys) in front of the reward server
- Deploy host-based firewall rules to restrict access to specific trusted IP addresses
- Run the reward server in an isolated container or VM to limit blast radius
# Configuration example - Restrict reward server to localhost only
# In reward_server.py startup or configuration:
# Change: server.bind("0.0.0.0", PORT)
# To: server.bind("127.0.0.1", PORT)
# Using iptables to restrict access to reward server port (example port 8080)
iptables -A INPUT -p tcp --dport 8080 -s 127.0.0.1 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -s TRUSTED_TRAINING_NODE_IP -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j DROP
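The localhost binding and API-key workarounds can also be sketched at the application level. The example below is a standalone illustration built on the standard library's `http.server`; it is not the OmniGen2-RL server. The handler name, the `X-API-Key` header, the `API_KEY` value, the body size limit, and port 8080 are all assumptions made for this sketch. It parses request bodies as JSON, so no pickle data is ever deserialized.

```python
import hmac
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

API_KEY = "change-me"  # hypothetical shared secret; load from a secret store
MAX_BODY = 1 << 20     # refuse request bodies larger than 1 MiB


class RewardHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Authenticate before touching the request body.
        supplied = self.headers.get("X-API-Key", "")
        if not hmac.compare_digest(supplied.encode(), API_KEY.encode()):
            self._reply(401, {"error": "unauthorized"})
            return
        try:
            length = int(self.headers.get("Content-Length", "0"))
        except ValueError:
            self._reply(400, {"error": "bad Content-Length"})
            return
        if length < 0 or length > MAX_BODY:
            self._reply(413, {"error": "body too large"})
            return
        try:
            # JSON parsing cannot trigger code execution, unlike pickle.
            request = json.loads(self.rfile.read(length))
        except ValueError:
            self._reply(400, {"error": "invalid JSON"})
            return
        self._reply(200, {"received": request})

    def _reply(self, status, obj):
        data = json.dumps(obj).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; real deployments should log requests


def make_server(port: int = 8080) -> HTTPServer:
    """Create a server bound to the loopback interface only."""
    return HTTPServer(("127.0.0.1", port), RewardHandler)
```

Calling `make_server().serve_forever()` would start the listener. Remote training nodes would then reach it through an SSH tunnel or VPN, as described in the workarounds above.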


