CVE-2006-10002 Overview
CVE-2006-10002 is a heap-based buffer overflow vulnerability affecting XML::Parser versions through 2.47 for Perl. The vulnerability occurs in the parse_stream() function within Expat.xs when processing XML input streams with a :utf8 PerlIO layer. Due to a mismatch between how Perl's read() returns decoded characters and how SvPV() provides multi-byte UTF-8 bytes, the pre-allocated buffer can be overflowed, leading to heap corruption (double free or corruption) and application crashes.
Critical Impact
Attackers can exploit this vulnerability remotely by providing specially crafted UTF-8 XML content to cause heap corruption and denial of service through application crashes.
Affected Products
- XML::Parser versions through 2.47 for Perl
- Applications using XML::Parser with :utf8 PerlIO layer for stream parsing
- Systems processing untrusted UTF-8 encoded XML content via XML::Parser
Discovery Timeline
- 2026-03-19 - CVE-2006-10002 published to NVD
- 2026-03-19 - Last updated in NVD database
Technical Details for CVE-2006-10002
Vulnerability Analysis
This heap-based buffer overflow (CWE-122) stems from a fundamental miscalculation in buffer allocation when handling UTF-8 encoded XML streams. When the XML::Parser module processes input through a filehandle with the :utf8 PerlIO layer enabled, the parse_stream() function in Expat.xs pre-allocates a buffer based on character count rather than byte count.
The core issue arises because Perl's read() function returns decoded characters when operating on a :utf8 layer, while SvPV() returns the underlying multi-byte UTF-8 representation. For content containing multi-byte characters (most CJK characters, for example, take 3 bytes each in UTF-8), the byte count can be several times the character count and exceed the pre-allocated buffer size, causing a heap buffer overflow.
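The character/byte mismatch described above can be reproduced in pure Perl without touching XML::Parser. The following is a minimal sketch, assuming an in-memory filehandle as a stand-in for an XML input stream; the variable names are illustrative and none of this is taken from Expat.xs.

```perl
use strict;
use warnings;

# Raw UTF-8 bytes for U+4E16 U+754C repeated 4 times: 8 characters, 24 bytes.
my $raw_bytes = "\xe4\xb8\x96\xe7\x95\x8c" x 4;

# An in-memory filehandle stands in for an XML input stream (assumption).
open(my $fh, '<:utf8', \$raw_bytes) or die "Cannot open in-memory handle: $!";

# Ask for 5 units; on a :utf8 handle those units are characters, not bytes.
my $chars_read = read($fh, my $chunk, 5);
close($fh);

# Re-encode a copy of the chunk to see how many bytes those characters occupy.
my $encoded = $chunk;
utf8::encode($encoded);
my $bytes_needed = length($encoded);

printf "read() returned %d, encoded size is %d bytes\n", $chars_read, $bytes_needed;
```

A caller that sized its buffer from the requested length (5) would need three times that many bytes to hold this chunk.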
Root Cause
The root cause lies in the buffer size calculation within parse_stream(). The original code calculated the buffer size as BUFSIZE * 6 bytes, assuming this would accommodate the maximum UTF-8 expansion. However, when Perl's read operation returns decoded characters and those are then accessed via SvPV() as raw bytes, the mismatch causes the byte count to overflow the pre-allocated XML_GetBuffer size. This results in heap corruption, manifesting as double free conditions or memory corruption errors that crash the application.
Attack Vector
The vulnerability is exploitable over the network without requiring authentication or user interaction. An attacker can trigger this condition by providing XML content with a high concentration of multi-byte UTF-8 characters to any application that:
- Uses XML::Parser for parsing
- Opens the XML input stream with the :utf8 PerlIO layer
- Processes untrusted or externally-provided XML content
The following patch demonstrates the fix applied to Expat.xs:
     }
     else {
       tbuff = newSV(0);
-      tsiz = newSViv(BUFSIZE); /* in UTF-8 characters */
-      buffsize = BUFSIZE * 6; /* in bytes that encode an UTF-8 string */
+      tsiz = newSViv(BUFSIZE);
+      buffsize = BUFSIZE;
     }
     while (! done)
Source: GitHub Patch Commit
The following excerpt from the regression test added with the fix sets up the vulnerable scenario using Chinese characters (3 bytes each in UTF-8):
+BEGIN { print "1..2\n"; }
+END { print "not ok 1\n" unless $loaded; }
+use XML::Parser;
+$loaded = 1;
+print "ok 1\n";
+
+################################################################
+# Test parsing from a filehandle with :utf8 layer
+# Regression test for rt.cpan.org #19859 / GitHub issue #64
+# A UTF-8 stream caused buffer overflow because SvPV byte count
+# could exceed the pre-allocated XML_GetBuffer size.
+
+use File::Temp qw(tempfile);
+
+# Create a temp file with UTF-8 XML content containing multi-byte chars
+my ($fh, $tmpfile) = tempfile(UNLINK => 1);
+binmode($fh, ':raw');
+# Write raw UTF-8 bytes: XML with Chinese characters (3 bytes each in UTF-8)
+# U+4E16 U+754C (世界 = "world") repeated to create substantial multi-byte content
+my $body = "\xe4\xb8\x96\xe7\x95\x8c" x 20000; # 120000 bytes / 40000 chars of 3-byte UTF-8
+print $fh qq(<?xml version="1.0" encoding="UTF-8"?>\n<doc>$body</doc>\n);
+close($fh);
+
+my $text = '';
+my $parser = XML::Parser->new(
+    Handlers => {
+        Char => sub { $text .= $_[1]; },
+    }
+);
+
Source: GitHub Patch Commit
Detection Methods for CVE-2006-10002
Indicators of Compromise
- Application crashes with heap corruption errors (double free, memory corruption) when processing UTF-8 XML content
- Segmentation faults or core dumps originating from Expat.xs or XML::Parser module code paths
- Unexpected process terminations during XML parsing operations involving multi-byte character content
Detection Strategies
- Monitor for abnormal process terminations in applications using XML::Parser for XML processing
- Implement application-level logging to track XML parsing failures and memory errors
- Use memory debugging tools (Valgrind, AddressSanitizer) during development to detect heap corruption patterns
- Audit Perl applications for XML::Parser usage with :utf8 filehandle layers
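The audit step in the list above can be partly automated. The following is a rough sketch, assuming that a simple line-based pattern match is good enough for a first pass; find_utf8_layer_lines is a hypothetical helper, and it will miss layers applied in other ways (for example through the open pragma or a layer string held in a variable).

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based line numbers in $source where
# open() or binmode() is called with a :utf8 layer.
sub find_utf8_layer_lines {
    my ($source) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\n/, $source) {
        $line_no++;
        next if $line =~ /^\s*#/;    # skip full-line comments
        push @hits, $line_no
            if $line =~ /\b(?:open|binmode)\b.*:utf8\b/;
    }
    return @hits;
}

# Made-up sample source, used only to exercise the helper.
my $sample = <<'END_SAMPLE';
use XML::Parser;
open(my $in, '<:utf8', 'input.xml') or die $!;
# open(my $old, '<:utf8', 'old.xml');
open(my $raw, '<:raw', 'input.xml') or die $!;
XML::Parser->new->parse($in);
END_SAMPLE

my @hits = find_utf8_layer_lines($sample);
print "lines with a :utf8 layer: @hits\n";
```

Each flagged line still needs manual review to confirm that the filehandle actually reaches XML::Parser.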
Monitoring Recommendations
- Configure crash monitoring and alerting for services that process XML content via Perl applications
- Review application logs for repeated XML parsing failures or memory allocation errors
- Implement resource monitoring to detect unusual memory patterns in XML processing services
How to Mitigate CVE-2006-10002
Immediate Actions Required
- Upgrade XML::Parser to a patched version that includes the fix from commit 6b291f4d260fc124a6ec80382b87a918f372bc6b
- Audit Perl applications for XML::Parser usage with :utf8 PerlIO layers on untrusted input
- Consider using binary mode for XML input handling as a temporary workaround
- Implement input validation to limit or sanitize multi-byte UTF-8 content in XML streams
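To decide whether a host needs the upgrade, the installed module version can be compared against the affected range stated in this advisory ("through 2.47"). The following is a minimal sketch; is_affected_version is a hypothetical helper, and a version number alone cannot tell whether a distribution has backported the fix into an older release.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: true when $version falls in the affected range
# ("through 2.47") given in this advisory.
sub is_affected_version {
    my ($version) = @_;
    return version->parse($version) <= version->parse('2.47') ? 1 : 0;
}

# Report on the locally installed module, if there is one.
if (eval { require XML::Parser; 1 }) {
    my $installed = XML::Parser->VERSION;
    printf "XML::Parser %s: %s\n", $installed,
        is_affected_version($installed) ? 'in the affected range' : 'newer than 2.47';
}
else {
    print "XML::Parser is not installed\n";
}
```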
Patch Information
The vulnerability has been addressed in commit 6b291f4d260fc124a6ec80382b87a918f372bc6b in the XML::Parser repository. The fix corrects the buffer size calculation by aligning the pre-allocated buffer size with the actual byte requirements. Users should update to the latest version of XML::Parser that incorporates this patch.
For detailed patch information, refer to the commit referenced above in the XML::Parser GitHub repository.
Workarounds
- Avoid using the :utf8 PerlIO layer when opening filehandles for XML::Parser input streams
- Process XML content in binary mode and handle encoding conversion separately
- Implement input size limits on XML content containing multi-byte UTF-8 characters
- Consider using alternative XML parsing modules that do not exhibit this buffer handling issue
# Example: Open XML file in binary mode instead of :utf8
use strict;
use warnings;
use XML::Parser;

# Vulnerable approach (avoid):
# open(my $fh, '<:utf8', 'input.xml');
# Safer approach: read raw bytes and let the parser handle the declared encoding
open(my $fh, '<:raw', 'input.xml') or die "Cannot open file: $!";
my $parser = XML::Parser->new();
$parser->parse($fh);
close($fh);