CVE-2020-13091 Overview
CVE-2020-13091 is an insecure deserialization vulnerability affecting pandas through version 1.0.3. When read_pickle() is called on an untrusted file, an attacker can have arbitrary commands executed during unpickling, for example by supplying a pickled object whose __reduce__ method specifies an os.system call. This is the classic Python pickle deserialization attack, and it can lead to remote code execution.
It is important to note that third parties dispute this issue because the read_pickle() function is documented as unsafe, and it is the user's responsibility to use the function in a secure manner. However, the vulnerability remains significant in environments where pickle files from untrusted sources may be processed.
Critical Impact
Successful exploitation allows attackers to execute arbitrary system commands on the target machine, potentially leading to complete system compromise, data exfiltration, or lateral movement within an organization's network.
Affected Products
- NumFocus pandas versions through 1.0.3
Discovery Timeline
- 2020-05-15 - CVE-2020-13091 published to NVD
- 2024-11-21 - Last updated in NVD database
Technical Details for CVE-2020-13091
Vulnerability Analysis
This vulnerability falls under CWE-502 (Deserialization of Untrusted Data), a well-known class of security issues in Python applications. The pandas read_pickle() function uses Python's native pickle module to deserialize data, which inherently allows arbitrary code execution during the deserialization process.
When a malicious pickle file produced from an object with a crafted __reduce__ method is processed, the deserialization routine calls whatever callable the file names, including functions that run system commands. This behavior is by design in Python's pickle module, but it has serious security implications when applications process pickle files from untrusted sources without proper validation.
The vulnerability is exploitable over the network, requiring no authentication or user interaction. An attacker only needs to convince an application to process a malicious pickle file, which could be achieved through various attack scenarios including supply chain attacks, malicious data feeds, or compromised data storage systems.
Root Cause
The root cause stems from Python's pickle serialization format, which supports arbitrary object reconstruction through the __reduce__ method. When pandas calls pickle.load() internally within read_pickle(), every object described in the pickle file is reconstructed by calling the callable recorded for it, so a file generated from an object with a malicious __reduce__ method can direct the loader to execute system commands.
The pandas library does not implement additional validation or sandboxing around the deserialization process, relying instead on user awareness of pickle's inherent risks. While this is documented behavior, it creates a significant attack surface in production environments where data provenance may not be strictly controlled.
Attack Vector
The attack requires an attacker to craft a malicious pickle file containing a specially constructed Python object. When the target application calls pandas.read_pickle() on this file, the malicious payload executes with the privileges of the running process.
Attack scenarios include:
- Data Pipeline Poisoning: Injecting malicious pickle files into data processing pipelines that automatically consume serialized data
- Supply Chain Attacks: Compromising data sources or storage systems to replace legitimate pickle files with malicious versions
- Social Engineering: Tricking users or automated systems into processing attacker-controlled pickle files
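The malicious object typically implements a __reduce__ method that returns a tuple containing os.system (or a similar callable) and its command arguments. Pickle calls this method when the attacker serializes the object and stores the callable and its arguments in the file. When the victim deserializes the file, pickle calls that callable with those arguments, executing the attacker's commands.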
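The mechanism can be sketched with a harmless payload. This is a minimal illustration that uses only the standard pickle module, which read_pickle() relies on, with print standing in for os.system. The Payload class name is hypothetical.

```python
import pickle


class Payload:
    """Hypothetical attacker-controlled class; the name is illustrative."""

    def __reduce__(self):
        # Called when the object is pickled. It tells pickle to "rebuild"
        # the object by calling print(...). A real attack would return
        # something like (os.system, ("<command>",)) instead.
        return (print, ("callable executed during unpickling",))


# Pickling records a reference to the callable plus its arguments.
blob = pickle.dumps(Payload())

# Unpickling looks up the callable and calls it. The Payload class is not
# needed on the loading side, and the call's return value replaces the object.
result = pickle.loads(blob)
```

Here result is None (the return value of print), which shows that the loader never reconstructs a Payload instance; it simply runs whatever callable the file names.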
For detailed technical analysis and proof-of-concept information, refer to the GitHub Vulnerability Report.
Detection Methods for CVE-2020-13091
Indicators of Compromise
- Unexpected calls to pandas.read_pickle() on files from external or untrusted sources
- Process spawning from Python processes that typically only perform data analysis
- Network connections or system command execution originating from pandas-based applications
- Presence of pickle files with unusual or obfuscated object structures
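One way to look for unusual object structures without loading a file is to walk its opcode stream with the standard pickletools module. The sketch below is a heuristic, not a complete scanner: the inspect_pickle helper and the Payload class are hypothetical names, and the STACK_GLOBAL handling assumes the module and name strings are pushed immediately before the opcode, which holds for ordinary pickle output but can be evaded by a hand-crafted file.

```python
import pickle
import pickletools


class Payload:
    """Hypothetical stand-in for a malicious object; print replaces os.system."""

    def __reduce__(self):
        return (print, ("demo",))


def inspect_pickle(data):
    """Walk the opcode stream without executing it.

    Returns (imports, has_reduce): the (module, name) pairs the pickle would
    import, and whether it contains a REDUCE opcode (a callable invocation).
    """
    strings = []      # string arguments seen so far, used for STACK_GLOBAL
    imports = []
    has_reduce = False
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            # Older protocols carry "module name" inline as one string.
            module, _, name = arg.partition(" ")
            imports.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Newer protocols push module and name as the two preceding strings.
            imports.append((strings[-2], strings[-1]))
        elif opcode.name == "REDUCE":
            has_reduce = True
        if isinstance(arg, str):
            strings.append(arg)
    return imports, has_reduce


imports, has_reduce = inspect_pickle(pickle.dumps(Payload()))
```

Legitimate DataFrame pickles also contain imports and REDUCE opcodes, so the useful signal is which names are imported: references to modules such as os, posix, subprocess, or builtins in a data file deserve scrutiny.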
Detection Strategies
- Monitor for calls to os.system, subprocess, or similar functions originating from pickle deserialization contexts
- Implement application-level logging around all read_pickle() invocations to track file sources
- Use static analysis tools to identify code paths that may process untrusted pickle files
- Deploy runtime application security monitoring to detect anomalous behavior during deserialization
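As one possible way to implement the runtime monitoring listed above, Python 3.8 and later expose audit hooks that report events such as pickle.find_class, os.system, and subprocess.Popen. The sketch below assumes such an interpreter; the hook and variable names are illustrative, and collecting events in a list is a placeholder for whatever logging or alerting a deployment actually uses.

```python
import pickle
import sys

WATCHED_EVENTS = {"pickle.find_class", "os.system", "subprocess.Popen"}
observed = []  # (event, args) pairs; a real deployment might forward these to a log


def audit_hook(event, args):
    # Keep the hook cheap: it runs for every audit event in the process.
    if event in WATCHED_EVENTS:
        observed.append((event, args))


sys.addaudithook(audit_hook)  # note: hooks cannot be removed once installed

# Harmless demonstration: unpickling a reference to builtins.print
# raises a pickle.find_class event naming the module and attribute.
pickle.loads(pickle.dumps(print))
```

A pickle.find_class event followed shortly by os.system or subprocess.Popen in a process that normally only does data analysis is the pattern worth alerting on.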
Monitoring Recommendations
- Enable comprehensive logging for all data ingestion pipelines that process serialized data
- Implement file integrity monitoring on directories where pickle files are stored or processed
- Monitor Python process behavior for unexpected child process creation or network activity
- Alert on any pickle file processing from user-controllable input sources
How to Mitigate CVE-2020-13091
Immediate Actions Required
- Audit all code paths that call pandas.read_pickle() and verify data sources are trusted
- Replace pickle serialization with safer alternatives such as JSON, CSV, or Parquet formats where possible
- Implement strict access controls on any directories containing pickle files
- Consider using pandas.read_parquet() or pandas.read_csv() as safer alternatives for data interchange
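The audit step in the list above can be partly automated. The following is a minimal sketch using the standard ast module to list call sites named read_pickle in a piece of source code; the helper name and the sample snippet are hypothetical, and matching on the function name alone will miss aliased or dynamically resolved calls.

```python
import ast

SAMPLE = '''import pandas as pd
df = pd.read_pickle("data.pkl")
other = pd.read_csv("data.csv")
'''


def find_read_pickle_calls(source, filename="<string>"):
    """Return the line numbers of calls to anything named read_pickle."""
    lines = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr          # pd.read_pickle(...)
        elif isinstance(func, ast.Name):
            name = func.id            # read_pickle(...) after "from pandas import"
        else:
            name = None
        if name == "read_pickle":
            lines.append(node.lineno)
    return sorted(lines)


hits = find_read_pickle_calls(SAMPLE)
```

Each reported line still needs manual review to decide whether the file being loaded can be influenced by an untrusted party.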
Patch Information
The pandas documentation explicitly warns that read_pickle() is unsafe for untrusted data. The pandas maintainers consider this documented behavior rather than a security vulnerability requiring a patch. Users are expected to use read_pickle() only with data from trusted sources.
Organizations should treat this as a secure coding issue and implement application-level mitigations rather than waiting for a library patch.
Workarounds
- Never use read_pickle() on data from untrusted or unverified sources
- Migrate data interchange formats to safer alternatives like Parquet (pd.read_parquet()), Feather, or CSV
- Implement cryptographic signatures to verify pickle file integrity and authenticity before processing
- Use containerization or sandboxing to isolate processes that must handle pickle files
# Configuration example - Replace pickle usage with safer Parquet format
# Instead of: df = pd.read_pickle('data.pkl')
# Use: df = pd.read_parquet('data.parquet')
# If pickle is required, verify the file against a SHA-256 checksum obtained from a trusted source before loading
sha256sum -c data.pkl.sha256 && python -c "import pandas as pd; df = pd.read_pickle('data.pkl')"
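A plain checksum only detects modification if the checksum itself is trustworthy. The signature-based workaround listed above could be sketched with a keyed HMAC, as below. This assumes the producer and consumer share a secret key that the attacker cannot read; the function names and the placeholder key are illustrative only, and the sketch uses the standard pickle module rather than pandas so that it stays self-contained.

```python
import hashlib
import hmac
import pickle


def sign(data, key):
    """Return a hex HMAC-SHA256 tag for the serialized bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verified_loads(data, signature, key):
    """Unpickle only if the tag matches; raise ValueError otherwise."""
    if not hmac.compare_digest(sign(data, key), signature):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(data)


# Producer side (trusted): serialize and sign.
key = b"replace-with-a-real-secret"   # placeholder; load from a secret store
data = pickle.dumps({"rows": 3})
tag = sign(data, key)

# Consumer side: verify before deserializing.
obj = verified_loads(data, tag, key)
```

In a pandas workflow the same check would run on the file's bytes before they reach read_pickle(). The signature proves only that the file came from a key holder, so a compromised producer can still sign a malicious file.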

