CVE-2023-47248 Overview
CVE-2023-47248 is a critical insecure deserialization vulnerability affecting Apache PyArrow versions 0.14.0 through 14.0.0. The vulnerability exists in the IPC and Parquet readers, allowing arbitrary code execution when an application processes Arrow IPC, Feather, or Parquet data from untrusted sources such as user-supplied input files.
This vulnerability specifically impacts PyArrow and does not affect other Apache Arrow implementations or language bindings. The flaw stems from unsafe deserialization practices in the PyExtensionType autoload functionality, which can be exploited to execute malicious code during data processing.
Critical Impact
Successful exploitation allows remote attackers to achieve arbitrary code execution on systems that process untrusted Arrow IPC, Feather, or Parquet files, potentially leading to complete system compromise.
Affected Products
- Apache PyArrow versions 0.14.0 through 14.0.0
- Applications processing Arrow IPC data from untrusted sources
- Applications processing Feather or Parquet files from untrusted sources
Discovery Timeline
- 2023-11-09 - CVE-2023-47248 published to NVD
- 2025-02-13 - Last updated in NVD database
Technical Details for CVE-2023-47248
Vulnerability Analysis
CVE-2023-47248 is an insecure deserialization vulnerability classified under CWE-502 (Deserialization of Untrusted Data). An attacker can execute arbitrary code by crafting a malicious Arrow IPC, Feather, or Parquet file that exploits the automatic type loading mechanism in PyArrow.
When an affected PyArrow version reads a data file, it automatically deserializes the extension type metadata carried in that file. This deserialization happens without validating the serialized data, so an attacker can embed a malicious payload that runs while the file is being read. The attack requires no privileges and can be carried out remotely if an application accepts file uploads or processes files from network sources.
The impact is severe: successful exploitation grants the attacker the same privileges as the application processing the malicious file, which can lead to data theft, system compromise, or lateral movement within a network.
Root Cause
The root cause lies in the PyExtensionType autoload functionality within PyArrow's deserialization routines. PyExtensionType relies on pickle to serialize and deserialize extension type instances, and affected versions automatically load and instantiate these types when reading IPC data, without adequate security controls. A maliciously crafted serialized object is therefore unpickled and instantiated during the read, which triggers arbitrary code execution.
The vulnerable code path exists in python/pyarrow/types.pxi, where extension type metadata is processed during file reading operations without validating the safety of the deserialized content.
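To illustrate why deserializing attacker-controlled bytes is dangerous, the following minimal sketch uses only Python's standard pickle module with a deliberately harmless payload. It assumes that the extension type metadata is handled by pickle, as described for PyExtensionType. The Payload class is hypothetical and is not PyArrow code.

```python
import pickle


class Payload:
    """Hypothetical stand-in for attacker-controlled serialized metadata."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair in the byte stream and
        # calls it during loading. A real attacker would choose a dangerous
        # callable; int("42") is used here because it is harmless.
        return (int, ("42",))


blob = pickle.dumps(Payload())

# Loading the bytes invokes the callable named inside them. The caller gets
# back whatever that call returned, not a Payload instance.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The point of the sketch is that the byte stream, not the reading application, decides which callable runs. Any reader that unpickles data from an untrusted file inherits that behavior.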
Attack Vector
The attack vector is network-based, requiring the victim application to process a maliciously crafted file. Attack scenarios include:
- File Upload Attacks: Applications accepting user-uploaded Parquet or Feather files for data processing
- Data Pipeline Poisoning: Compromising data sources that feed into analytics pipelines using PyArrow
- Supply Chain Attacks: Distributing malicious data files through shared datasets or data marketplaces
The security patch disables the automatic PyExtensionType autoload functionality to prevent unsafe deserialization. The excerpt below, taken from the commit, shows docstring updates that accompany the change:
Parameters
----------
storage_type : DataType
+ The underlying storage type for the extension type.
extension_name : str
+ A unique name distinguishing this extension type. The name will be
+ used when deserializing IPC data.
Examples
--------
Source: GitHub Apache Arrow Commit
Detection Methods for CVE-2023-47248
Indicators of Compromise
- Unexpected process spawning from applications that process Arrow/Parquet/Feather files
- Unusual network connections originating from data processing applications
- Anomalous file system activity following Parquet or Feather file processing
- Memory anomalies or crashes in PyArrow-dependent applications
Detection Strategies
- Monitor for suspicious Python process behavior including unexpected child processes or network connections
- Implement file integrity monitoring for applications processing untrusted data files
- Deploy application-level logging to track Parquet and Feather file processing activities
- Use SentinelOne's behavioral AI to detect anomalous code execution patterns following file operations
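One way to triage files before any PyArrow reader touches them is a raw byte scan for the extension name used by PyExtensionType. The sketch below is a heuristic only. The marker value "arrow.py_extension_type" is an assumption based on PyArrow's source and should be verified against the version in use, and the function name contains_marker is hypothetical. A hit does not prove malicious intent, because legitimate files that use PyExtensionType carry the same name. A miss does not prove safety either: Parquet files usually store the Arrow schema base64-encoded in their footer metadata, so a raw scan can miss it.

```python
import mmap
import tempfile
from pathlib import Path

# Assumed extension name for PyExtensionType; verify for your PyArrow version.
MARKER = b"arrow.py_extension_type"


def contains_marker(path: Path, marker: bytes = MARKER) -> bool:
    """Return True if the raw bytes of the file contain the marker."""
    if path.stat().st_size == 0:
        return False  # mmap cannot map an empty file
    with path.open("rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view.find(marker) != -1


# Demo with synthetic byte blobs. These are NOT valid Arrow files; they only
# exercise the scanner.
with tempfile.TemporaryDirectory() as tmp:
    suspicious = Path(tmp) / "suspicious.arrow"
    suspicious.write_bytes(b"ARROW1\x00" + MARKER + b"\x00")
    clean = Path(tmp) / "clean.arrow"
    clean.write_bytes(b"ARROW1\x00plain data")
    flagged = contains_marker(suspicious)
    clean_flagged = contains_marker(clean)

print(flagged, clean_flagged)
```

Because the scan never parses the file, it is safe to run on unpatched systems, which makes it usable as a quarantine step ahead of the actual reader.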
Monitoring Recommendations
- Audit all applications using PyArrow to identify vulnerable versions in your environment
- Implement strict input validation for file upload functionality accepting Arrow data formats
- Enable enhanced logging for data pipeline applications to capture file processing events
- Deploy endpoint detection to monitor for exploitation attempts targeting PyArrow vulnerabilities
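The audit step can be partly automated. The following sketch checks whether the PyArrow version installed in the current environment falls inside the affected range stated in this advisory (0.14.0 through 14.0.0). The helper names are hypothetical, and the parser deliberately ignores pre-release and local suffixes, so development builds should be reviewed by hand.

```python
import re
from importlib import metadata

FIRST_AFFECTED = (0, 14, 0)
LAST_AFFECTED = (14, 0, 0)


def parse_release(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '14.0.1' -> (14, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = [int(p) for p in match.group().split(".")]
    parts += [0] * (3 - len(parts))  # pad '14.0' to (14, 0, 0)
    return tuple(parts[:3])


def is_affected(version: str) -> bool:
    """True if the version lies within the affected range (inclusive)."""
    return FIRST_AFFECTED <= parse_release(version) <= LAST_AFFECTED


def installed_pyarrow_status() -> str:
    """Describe the PyArrow installation in the current environment."""
    try:
        version = metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return "pyarrow is not installed"
    state = "AFFECTED" if is_affected(version) else "not in the affected range"
    return f"pyarrow {version}: {state}"


print(installed_pyarrow_status())
```

Running this in every virtual environment, container image, and CI job that imports PyArrow gives a quick inventory. It does not detect vendored or statically bundled copies.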
How to Mitigate CVE-2023-47248
Immediate Actions Required
- Upgrade PyArrow to version 14.0.1 or later immediately
- Audit all applications and dependencies that use PyArrow for vulnerable versions
- Review data ingestion pipelines that process untrusted Parquet, Feather, or Arrow IPC files
- Implement input validation to restrict file processing to trusted sources only
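As one possible way to restrict processing to trusted files, an application can refuse any file whose SHA-256 digest is not on an allowlist published by the data producer. This is a sketch under that assumption. The function names and the idea of a digest allowlist are illustrative and are not part of PyArrow or the official advisory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large data files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_trusted(path: Path, trusted_digests: set) -> bool:
    """True only if the file's digest appears in the allowlist."""
    return sha256_of(path) in trusted_digests


# Demo with placeholder files; real use would load digests from a signed
# manifest or another channel the attacker cannot modify.
with tempfile.TemporaryDirectory() as tmp:
    known = Path(tmp) / "known.parquet"
    known.write_bytes(b"trusted bytes")
    unknown = Path(tmp) / "unknown.parquet"
    unknown.write_bytes(b"unreviewed bytes")
    allowlist = {sha256_of(known)}
    known_ok = is_trusted(known, allowlist)
    unknown_ok = is_trusted(unknown, allowlist)

print(known_ok, unknown_ok)
```

The check runs before the file reaches any Arrow, Feather, or Parquet reader, so it remains effective even on a host that has not yet been patched.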
Patch Information
Apache has released PyArrow version 14.0.1, which addresses this vulnerability. The fix is available via PyPI and conda-forge. For detailed patch information, see the GitHub Apache Arrow Commit.
Downstream library maintainers should update their dependency requirements to specify PyArrow 14.0.1 or later to ensure consumers receive the patched version.
Workarounds
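- If an immediate upgrade is not possible, install the pyarrow-hotfix package from PyPI and import it (import pyarrow_hotfix) in your code before reading untrusted data; the hotfix has no effect until it is imported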
- Restrict file processing to trusted sources only until patching is complete
- Implement network segmentation to isolate data processing applications
- Consider sandboxing applications that must process untrusted Arrow data files
# Install the hotfix package for older PyArrow versions
pip install pyarrow-hotfix
# Or upgrade directly to the patched version
# (quote the requirement so the shell does not treat ">" as a redirect)
pip install "pyarrow>=14.0.1"
# Verify the installed version
pip show pyarrow | grep Version
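Installing pyarrow-hotfix is not sufficient by itself. According to the package's documentation, the fix takes effect only after pyarrow_hotfix has been imported in the running process. The sketch below shows one way to attempt that import at application startup and report the outcome. The helper name enable_pyarrow_hotfix is hypothetical.

```python
def enable_pyarrow_hotfix() -> bool:
    """Try to activate pyarrow-hotfix; return True if the import succeeded."""
    try:
        # Imported for its side effect: the package applies the hotfix
        # when it is imported.
        import pyarrow_hotfix  # noqa: F401
    except ImportError:
        return False
    return True


hotfix_active = enable_pyarrow_hotfix()
if not hotfix_active:
    print("pyarrow_hotfix is not importable; upgrade PyArrow or install the hotfix")
```

Calling the helper before any code path that reads untrusted Arrow IPC, Feather, or Parquet data keeps the protection in place regardless of import order elsewhere in the application. An application that must not run unprotected could raise an error instead of printing a message.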

