CVE-2023-47038 Overview
A heap-based buffer overflow was discovered in Perl versions 5.30.0 through 5.38.0. The flaw is triggered when the Perl interpreter compiles a crafted regular expression containing an illegal user-defined Unicode property. An attacker can use it to write attacker-controlled bytes past the end of a heap-allocated buffer, potentially leading to arbitrary code execution or application crashes.
Critical Impact
This vulnerability lets attackers write past buffer boundaries when a malicious regular expression is compiled, which could result in code execution, denial of service, or memory corruption in applications that compile untrusted regular expressions or run untrusted Perl code.
Affected Products
- Perl versions 5.30.0 through 5.38.0
- Fedora 39
- Red Hat Enterprise Linux 8.0 and 9.0
- Red Hat Enterprise Linux AUS 9.4
- Red Hat Enterprise Linux EUS 9.4
Discovery Timeline
- December 18, 2023 - CVE-2023-47038 published to NVD
- November 4, 2025 - Last updated in NVD database
Technical Details for CVE-2023-47038
Vulnerability Analysis
The vulnerability resides in Perl's regular expression compilation engine, specifically in the regcomp.c file. When processing user-defined Unicode properties within regular expressions, the parser fails to properly validate property names before accessing internal buffers. The issue stems from improper initialization and tracking of index variables during the parsing of Unicode property specifications.
An attacker can craft a malicious regular expression using illegal user-defined Unicode property names (such as \p{utf8::perl x}) to trigger read and write operations past the allocated buffer boundaries. This heap-based buffer overflow can corrupt adjacent memory structures, potentially allowing code execution or causing application crashes.
Root Cause
The root cause lies in how regcomp.c initializes and tracks index variables while parsing a Unicode property name. The original declaration (unsigned int i, j = 0;) initialized only j and left i without an initial value, so buffer operations could proceed with incorrect index values. The fix initializes every index variable (i = 0, i_zero = 0, j = 0) and adds validation that rejects illegal user-defined property names before the out-of-bounds access can occur.
Attack Vector
The attack requires local access and the ability to supply crafted Perl code to a vulnerable interpreter. The attacker must construct a regular expression with a malformed Unicode property specification that triggers the buffer overflow condition during regex compilation. Applications that process untrusted Perl regular expressions from external sources are at elevated risk.
// Security patch in regcomp.c - Fix read/write past buffer end: perl-security#140
* compile perl to know about them) */
bool is_nv_type = FALSE;
- unsigned int i, j = 0;
+ unsigned int i = 0, i_zero = 0, j = 0;
int equals_pos = -1; /* Where the '=' is found, or negative if none */
int slash_pos = -1; /* Where the '/' is found, or negative if none */
int table_index = 0; /* The entry number for this property in the table
Source: GitHub Perl Commit 12c313c
// Security patch in t/re/pat_advanced.t - Test case for perl-security#140
{ # perl-security#140, read/write past buffer end
fresh_perl_like('qr/\p{utf8::perl x}/',
qr/Illegal user-defined property name "utf8::perl x" in regex/,
{}, "perl-security#140");
fresh_perl_is('qr/\p{utf8::_perl_surrogate}/', "",
{}, "perl-security#140");
}
Source: GitHub Perl Commit 12c313c
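The regression test above suggests a simple way to check how a given interpreter reacts to the probe pattern. The following is a minimal sketch, not an official detection tool: classify_probe_output is a hypothetical helper name, and the two result labels are illustrative. It only recognizes the diagnostic that the upstream test expects from a patched build; any other outcome is reported for manual review rather than declared vulnerable.

```shell
#!/bin/sh
# Hypothetical helper: classify the output produced when compiling the probe
# pattern from the upstream regression test (perl-security#140).
classify_probe_output() {
    case $1 in
        # Patched interpreters reject the name with this diagnostic,
        # according to the test in t/re/pat_advanced.t.
        *'Illegal user-defined property name'*) echo "patched-behavior" ;;
        # Anything else (crash, empty output, other error) needs review.
        *) echo "needs-review" ;;
    esac
}

# Example invocation against the local interpreter (test hosts only):
#   out=$(perl -e 'qr/\p{utf8::perl x}/' 2>&1)
#   classify_probe_output "$out"
```

Because an unpatched interpreter may corrupt memory while compiling the probe, it is safer to run the commented invocation only on disposable test systems.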
Detection Methods for CVE-2023-47038
Indicators of Compromise
- Unexpected crashes in Perl interpreter or applications during regex compilation
- Memory corruption errors or segmentation faults in Perl processes
- Attempts to compile regular expressions containing unusual Unicode property patterns like \p{utf8::...} with invalid names
- Heap corruption detected by memory analysis tools during Perl execution
Detection Strategies
- Monitor system logs for Perl process crashes during regex operations, particularly segmentation faults
- Implement code scanning to identify usage of user-defined Unicode properties in regex patterns
- Use vulnerability scanners to identify affected Perl versions in your environment
- Deploy application-level monitoring for regex compilation failures with memory-related errors
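As one possible starting point for the code-scanning strategy, the sketch below greps Perl sources for \p{...} or \P{...} escapes whose name is package-qualified (contains "::"), as in the upstream test pattern. The function name scan_qualified_properties and the file extensions are assumptions made for illustration. This is a heuristic: it does not catch unqualified user-defined properties, and a hit is not proof of an exploit attempt.

```shell
#!/bin/sh
# Hypothetical helper: list lines in Perl sources under directory $1 that use
# \p{...} or \P{...} with a package-qualified name (one containing "::").
# Exit status follows grep: 0 if something matched, 1 if nothing matched.
scan_qualified_properties() {
    grep -rnE --include='*.pl' --include='*.pm' --include='*.t' \
        '\\[pP]\{[^}]*::[^}]*\}' "$1"
}

# Example: scan_qualified_properties /path/to/project
```

Each hit is printed as file:line:content, so the output can be fed to a review queue or a CI check.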
Monitoring Recommendations
- Enable core dump collection for Perl processes to assist with post-incident analysis
- Monitor for unusual patterns in Perl script execution, especially those processing external input
- Track Perl interpreter version deployments across your infrastructure using asset inventory tools
- Set up alerts for memory violation signals (SIGSEGV, SIGABRT) in Perl processes
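For version tracking, an inventory script could flag upstream version strings that fall in the affected range. The sketch below is illustrative only: perl_version_in_affected_range is a hypothetical name, and it expects the plain dotted form such as 5.36.0. Distribution packages often backport fixes without changing the upstream version number, so a match here should be confirmed against the vendor advisory rather than treated as conclusive.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when an upstream Perl version string
# such as "5.36.0" lies inside the affected range 5.30.0 through 5.38.0.
# Returns 2 for input that is not a plain dotted version.
perl_version_in_affected_range() {
    case $1 in
        ''|*[!0-9.]*|*.*.*.*|*..*|.*|*.) return 2 ;;
    esac
    # Split into revision, version, and subversion fields.
    IFS=. read -r pv_rev pv_ver pv_sub <<EOF
$1
EOF
    # Encode as one integer, e.g. 5.36.0 -> 5036000.
    pv_num=$(( ${pv_rev:-0} * 1000000 + ${pv_ver:-0} * 1000 + ${pv_sub:-0} ))
    [ "$pv_num" -ge 5030000 ] && [ "$pv_num" -le 5038000 ]
}

# Example with the local interpreter:
#   perl_version_in_affected_range "$(perl -e 'printf "%vd", $^V')" \
#       && echo "upstream version is in the affected range"
```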
How to Mitigate CVE-2023-47038
Immediate Actions Required
- Upgrade Perl to version 5.38.2 or later, which contains the security fix
- Review and audit applications that process untrusted regular expressions
- Restrict user-supplied input that could influence regex compilation
- Apply vendor-supplied patches from your Linux distribution
Patch Information
The vulnerability has been addressed in Perl version 5.38.2. The fix spans multiple commits, including 12c313c, 7047915, and ff1f9f5. Red Hat has released security advisories RHSA-2024:2228 and RHSA-2024:3128 for Enterprise Linux. Fedora users should apply the updates listed in the Fedora Package Announcements.
Workarounds
- Sanitize and validate all regex patterns before compilation, rejecting patterns with user-defined Unicode properties
- Implement application-level sandboxing for Perl processes that handle untrusted input
- Use the Safe module to restrict Perl operations in environments processing external code
- Consider containerization with resource limits to contain potential exploitation attempts
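The first workaround can be approximated with a gate that runs before a user-supplied pattern ever reaches the Perl interpreter. The sketch below is a deliberately conservative illustration, not a complete sanitizer: pattern_is_allowed and ALLOWED_PROPERTIES are hypothetical names, and the allowlist is a small sample of built-in properties that would need to be extended for real use. It rejects any \p{...} or \P{...} whose name is not allowlisted, which also rejects package-qualified names, names with spaces, and unterminated escapes.

```shell
#!/bin/sh
# Hypothetical allowlist of built-in property names; extend as needed.
ALLOWED_PROPERTIES="L Lu Ll N Nd P S Z Alpha Digit Space Word Latin Greek Han"

# Succeed (exit 0) only if every \p{...} / \P{...} in pattern $1 names an
# allowlisted property.
pattern_is_allowed() {
    # Strip well-formed property escapes; anything left that still opens
    # \p{ or \P{ is empty or unterminated, so reject it.
    pa_rest=$(printf '%s\n' "$1" | sed -E 's/\\[pP]\{[^}]+\}//g')
    case $pa_rest in
        *'\p{'*|*'\P{'*) return 1 ;;
    esac
    # Extract the property names, one per line.
    pa_names=$(printf '%s\n' "$1" | grep -oE '\\[pP]\{[^}]+\}' |
        sed -E 's/^\\[pP]\{(.*)\}$/\1/')
    while IFS= read -r pa_name; do
        [ -n "$pa_name" ] || continue
        case $pa_name in
            *[!A-Za-z0-9_]*) return 1 ;;   # rejects "::", spaces, "=", "^"
        esac
        case " $ALLOWED_PROPERTIES " in
            *" $pa_name "*) ;;
            *) return 1 ;;
        esac
    done <<EOF
$pa_names
EOF
    return 0
}

# Example: pattern_is_allowed "$user_pattern" || echo "pattern rejected"
```

An allowlist was chosen over a blocklist here because it fails closed: property names the gate does not recognize are refused instead of being passed through to the regex compiler.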
# Check installed Perl version
perl -v
# Update Perl on RHEL/CentOS systems
sudo dnf update perl
# Update Perl on Fedora systems
sudo dnf upgrade perl
# Update only the perl package on Debian/Ubuntu systems
sudo apt update && sudo apt install --only-upgrade perl
Disclaimer: This content was generated using AI. While we strive for accuracy, please verify critical information with official sources.

