CVE-2026-28350 Overview
CVE-2026-28350 is a security vulnerability in lxml_html_clean, a Python project providing HTML cleaning functionality originally copied from lxml.html.clean. Prior to version 0.4.4, the <base> tag passes through the default Cleaner configuration, so an attacker who can submit HTML can inject one and hijack relative links on the page. The page_structure=True setting removes html, head, and title tags, but there is no specific handling for <base> tags, which leaves a gap in the sanitization process.
Critical Impact
Attackers can inject <base> tags to hijack all relative URLs on a page, potentially redirecting users to malicious destinations and enabling phishing or credential theft attacks.
Affected Products
- lxml_html_clean versions prior to 0.4.4
- Applications using lxml_html_clean with default Cleaner configuration
- Systems relying on lxml_html_clean for HTML sanitization without explicit <base> tag removal
Discovery Timeline
- 2026-03-05 - CVE-2026-28350 published to NVD
- 2026-03-05 - Last updated in NVD database
Technical Details for CVE-2026-28350
Vulnerability Analysis
This vulnerability is classified as improper encoding or escaping of output (CWE-116). The lxml_html_clean library's Cleaner class is designed to sanitize HTML content by removing potentially dangerous elements and attributes. However, the default configuration fails to account for the <base> HTML tag, which defines the base URL for all relative URLs in a document.
When an attacker injects a <base> tag with a malicious href attribute into HTML content that gets processed by the Cleaner, the tag passes through unfiltered. This causes all relative links, image sources, and other URL references on the page to resolve against the attacker-controlled base URL rather than the legitimate origin. The attack requires user interaction (clicking a link or loading resources), but once the malicious base tag is in place, all relative navigation becomes compromised.
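The effect of an injected base URL on relative references can be illustrated with the standard library's urljoin, which follows the same general resolution rules browsers apply. This is a minimal sketch: the two URLs and the resolve helper are hypothetical and exist only for illustration.

```python
from typing import Optional
from urllib.parse import urljoin

# Hypothetical URLs, used only for illustration.
LEGITIMATE_PAGE = "https://app.example.com/account/settings"
INJECTED_BASE = "https://attacker.example/"


def resolve(relative_url: str, document_url: str, base_href: Optional[str] = None) -> str:
    """Resolve a relative URL roughly the way a browser would.

    When the document contains <base href=...>, that value (itself resolved
    against the document URL) replaces the document URL as the base.
    """
    base = urljoin(document_url, base_href) if base_href else document_url
    return urljoin(base, relative_url)


# Without <base>, the link stays on the legitimate origin.
print(resolve("login", LEGITIMATE_PAGE))
# With the injected <base>, the same link points at the attacker's host.
print(resolve("login", LEGITIMATE_PAGE, INJECTED_BASE))
```

Root-relative references such as /static/app.js are affected in the same way, because they are resolved against the scheme and host of the base URL.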
Root Cause
The root cause is the absence of explicit handling for <base> tags in the Cleaner's default configuration. According to HTML specifications, <base> must reside within the <head> element. However, browsers may interpret misplaced <base> tags even when outside of <head>, allowing the attack to succeed even when page_structure=True removes the <head> tag itself. The Cleaner's logic did not account for this browser behavior, leaving a sanitization gap.
Attack Vector
The attack is network-based and requires no authentication. An attacker can exploit this vulnerability by submitting HTML content containing a <base> tag through any input that gets processed by the vulnerable lxml_html_clean Cleaner. When this content is rendered to users, the injected <base> tag redirects all relative URLs to an attacker-controlled domain.
The following patch was applied to address the vulnerability:
         if self.annoying_tags:
             remove_tags.update(('blink', 'marquee'))
 
+        # Remove <base> tags whenever <head> is being removed.
+        # According to HTML spec, <base> must be in <head>, but browsers
+        # may interpret it even when misplaced, allowing URL hijacking attacks.
+        if 'head' in kill_tags or 'head' in remove_tags:
+            kill_tags.add('base')
+
         _remove = deque()
         _kill = deque()
         for el in doc.iter():
Source: GitHub Commit
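The rule introduced by the patch can be modeled in isolation. The function below is a simplified sketch, not the library's code: effective_tag_sets and its parameters are hypothetical, and only the tags named in this advisory are modeled.

```python
def effective_tag_sets(page_structure=True, annoying_tags=True,
                       kill_tags=(), remove_tags=()):
    """Simplified model of how the kill/remove tag sets are assembled.

    Hypothetical helper for illustration; the real Cleaner handles many
    more options than the ones shown here.
    """
    kill = set(kill_tags)
    remove = set(remove_tags)
    if page_structure:
        # Per the advisory, page_structure=True removes these tags.
        remove.update(("head", "html", "title"))
    if annoying_tags:
        remove.update(("blink", "marquee"))
    # Patched rule: whenever <head> is dropped, drop <base> with its content.
    if "head" in kill or "head" in remove:
        kill.add("base")
    return kill, remove


kill, remove = effective_tag_sets()
print("base" in kill)  # default configuration now covers <base>
```

Note that under this rule a configuration that keeps <head> (for example page_structure=False with no custom tag lists) also keeps <base>.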
Detection Methods for CVE-2026-28350
Indicators of Compromise
- Presence of <base> tags in sanitized HTML output where none should exist
- User reports of being redirected to unexpected or suspicious domains
- Web application logs showing unusual URL patterns or external domain references in relative paths
- Anomalous outbound connections from user browsers to unknown external hosts
Detection Strategies
- Implement content security policies (CSP) with base-uri directives to restrict allowed base URLs
- Audit HTML output from lxml_html_clean for unexpected <base> tag presence
- Monitor web server logs for signs of URL manipulation or unexpected redirects
- Deploy web application firewalls (WAF) with rules to detect <base> tag injection attempts
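Auditing sanitized output for <base> tags can be done with the standard library alone. The following is a minimal sketch; BaseTagFinder and find_base_tags are hypothetical names, and a production audit would likely also record where the markup came from.

```python
from html.parser import HTMLParser


class BaseTagFinder(HTMLParser):
    """Collect the href value of every <base> tag seen in the input."""

    def __init__(self):
        super().__init__()
        self.base_hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "base":
            self.base_hrefs.append(dict(attrs).get("href"))


def find_base_tags(html: str):
    """Return the href (or None) of each <base> tag found in the markup."""
    finder = BaseTagFinder()
    finder.feed(html)
    finder.close()
    return finder.base_hrefs


sample = '<div><base href="https://attacker.example/"><a href="login">Sign in</a></div>'
print(find_base_tags(sample))
```

Any non-empty result for content that has already passed through the sanitizer is worth investigating.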
Monitoring Recommendations
- Enable logging for all HTML sanitization operations to track input/output patterns
- Set up alerts for CSP violations related to base-uri restrictions
- Implement automated scanning of rendered HTML for unauthorized <base> tags
- Monitor user-facing applications for phishing indicators or redirect anomalies
How to Mitigate CVE-2026-28350
Immediate Actions Required
- Upgrade lxml_html_clean to version 0.4.4 or later immediately
- Review all applications using lxml_html_clean to ensure the updated version is deployed
- Implement Content Security Policy headers with explicit base-uri 'self' or base-uri 'none' directives
- Audit existing sanitized content for potentially injected <base> tags
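One way to apply the base-uri directive is at the application boundary. The sketch below assumes a WSGI application; add_base_uri_policy is a hypothetical helper, and most frameworks and reverse proxies offer their own way to set response headers.

```python
def add_base_uri_policy(app, policy="base-uri 'none'"):
    """Wrap a WSGI app so every response carries a base-uri CSP directive.

    The header is appended rather than merged: when a response carries
    several Content-Security-Policy headers, browsers enforce all of them.
    """
    def middleware(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            headers = list(headers) + [("Content-Security-Policy", policy)]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start_response)
    return middleware


def demo_app(environ, start_response):
    # Hypothetical application used only to exercise the wrapper.
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"ok"]


wrapped_app = add_base_uri_policy(demo_app)
```

Use base-uri 'self' instead of 'none' if the application legitimately relies on a same-origin <base> tag.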
Patch Information
The vulnerability has been patched in lxml_html_clean version 0.4.4. The fix ensures that <base> tags are automatically removed whenever the <head> tag is removed (via page_structure=True or manual configuration). This aligns with the HTML specification, which requires <base> to reside within <head>.
For detailed patch information, refer to the GitHub Security Advisory and the commit implementing the fix.
Workarounds
- Manually add base to the kill_tags parameter when instantiating the Cleaner class
- Implement post-processing to strip <base> tags from sanitized HTML output
- Deploy CSP headers with strict base-uri directives as an additional layer of defense
# Upgrade lxml_html_clean to patched version
# (quote the requirement so the shell does not treat ">" as a redirect)
pip install --upgrade "lxml_html_clean>=0.4.4"
# Verify installed version
pip show lxml_html_clean | grep Version
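Where upgrading is not immediately possible, the kill_tags workaround listed above might look like the following. This is a sketch under the assumption that lxml_html_clean is installed; make_cleaner_without_base and clean_untrusted are hypothetical helper names.

```python
try:
    from lxml_html_clean import Cleaner
except ImportError:  # library not installed; the helpers below will raise
    Cleaner = None


def make_cleaner_without_base():
    """Return a Cleaner that drops <base> elements regardless of version."""
    if Cleaner is None:
        raise RuntimeError("lxml_html_clean is not installed")
    # kill_tags removes the listed elements together with their content.
    return Cleaner(kill_tags=["base"])


def clean_untrusted(html: str) -> str:
    """Sanitize untrusted markup with <base> explicitly killed."""
    return make_cleaner_without_base().clean_html(html)
```

If the application already passes its own kill_tags, add base to that collection rather than replacing it.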

