CVE-2026-28350: lxml_html_clean XSS Vulnerability

CVE-2026-28350 Overview

CVE-2026-28350 is a security vulnerability in lxml_html_clean, a Python project providing HTML cleaning functionalities originally copied from lxml.html.clean. Prior to version 0.4.4, the <base> tag passes through the default Cleaner configuration, allowing attackers to inject it and hijack relative links on the page. While the page_structure=True setting removes html, head, and title tags, there is no specific handling for <base> tags, creating a gap in the sanitization process.

Critical Impact
Attackers can inject <base> tags to hijack all relative URLs on a page, potentially redirecting users to malicious destinations and enabling phishing or credential theft attacks.

Affected Products

lxml_html_clean versions prior to 0.4.4
Applications using lxml_html_clean with default Cleaner configuration
Systems relying on lxml_html_clean for HTML sanitization without explicit <base> tag removal

Discovery Timeline

2026-03-05 - CVE-2026-28350 published to NVD
2026-03-05 - Last updated in NVD database

Technical Details for CVE-2026-28350

Vulnerability Analysis

This vulnerability is classified as improper encoding or escaping of output (CWE-116). The lxml_html_clean library's Cleaner class is designed to sanitize HTML content by removing potentially dangerous elements and attributes. However, in versions before 0.4.4 the default configuration does not account for the <base> HTML tag, which defines the base URL for all relative URLs in a document.

When an attacker injects a <base> tag with a malicious href attribute into HTML content that gets processed by the Cleaner, the tag passes through unfiltered. This causes all relative links, image sources, and other URL references on the page to resolve against the attacker-controlled base URL rather than the legitimate origin. The attack requires user interaction (clicking a link or loading resources), but once the malicious base tag is in place, all relative navigation becomes compromised.
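
The effect of an injected <base> tag on relative URLs can be illustrated with the standard library's urljoin, which follows the same resolution rules browsers apply. This is a minimal sketch: the two hostnames and the resolve helper are hypothetical and exist only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical URLs, used only to illustrate the resolution rules.
LEGIT_PAGE = "https://app.example.com/account/settings"
ATTACKER_BASE = "https://evil.example.net/"


def resolve(relative_url, page_url, base_href=None):
    """Resolve a relative URL, honoring an optional <base href> value."""
    # A <base href> is itself resolved against the page URL first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, relative_url)


# Without <base>, the link stays on the legitimate origin.
print(resolve("login", LEGIT_PAGE))
# With an injected <base>, the same link points at the attacker's host.
print(resolve("login", LEGIT_PAGE, ATTACKER_BASE))
```

The same substitution applies to relative image sources, form actions, and script paths, which is why a single injected tag affects the whole page.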

Root Cause

The root cause is the absence of explicit handling for <base> tags in the Cleaner's default configuration. According to HTML specifications, <base> must reside within the <head> element. However, browsers may interpret misplaced <base> tags even when outside of <head>, allowing the attack to succeed even when page_structure=True removes the <head> tag itself. The Cleaner's logic did not account for this browser behavior, leaving a sanitization gap.

Attack Vector

The attack is network-based and requires no authentication. An attacker can exploit this vulnerability by submitting HTML content containing a <base> tag through any input that gets processed by the vulnerable lxml_html_clean Cleaner. When this content is rendered to users, the injected <base> tag redirects all relative URLs to an attacker-controlled domain.

The following patch was applied to address the vulnerability:

python

         if self.annoying_tags:
             remove_tags.update(('blink', 'marquee'))
 
+        # Remove <base> tags whenever <head> is being removed.
+        # According to HTML spec, <base> must be in <head>, but browsers
+        # may interpret it even when misplaced, allowing URL hijacking attacks.
+        if 'head' in kill_tags or 'head' in remove_tags:
+            kill_tags.add('base')
+
         _remove = deque()
         _kill = deque()
         for el in doc.iter():

Source: GitHub Commit
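
The rule added by the patch can be restated as a standalone function to make its behavior easier to reason about. This is an illustrative sketch, not the library's code: the function name effective_kill_tags is hypothetical, and the tag sets are passed in explicitly rather than derived from a Cleaner instance.

```python
def effective_kill_tags(kill_tags, remove_tags):
    """Return the kill set after applying the rule shown in the patch."""
    kill = set(kill_tags)
    # Mirror of the patched condition: <base> is killed only when <head>
    # is already being killed or removed.
    if "head" in kill or "head" in remove_tags:
        kill.add("base")
    return kill


# page_structure=True removes html, head, and title, so <base> is killed.
print(effective_kill_tags(set(), {"html", "head", "title"}))
# If <head> is neither killed nor removed, the rule leaves <base> alone.
print(effective_kill_tags(set(), set()))
```

As the diff reads, configurations that keep <head> would still keep <base>, so such deployments may want to list base in kill_tags explicitly.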

Detection Methods for CVE-2026-28350

Indicators of Compromise

Presence of <base> tags in sanitized HTML output where none should exist
User reports of being redirected to unexpected or suspicious domains
Web application logs showing unusual URL patterns or external domain references in relative paths
Anomalous outbound connections from user browsers to unknown external hosts

Detection Strategies

Implement content security policies (CSP) with base-uri directives to restrict allowed base URLs
Audit HTML output from lxml_html_clean for unexpected <base> tag presence
Monitor web server logs for signs of URL manipulation or unexpected redirects
Deploy web application firewalls (WAF) with rules to detect <base> tag injection attempts
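
Auditing sanitized output for <base> tags, as suggested above, could be sketched with the standard library's HTMLParser. The class and function names below are hypothetical, and the scan only reports findings; it does not modify the HTML.

```python
from html.parser import HTMLParser


class BaseTagFinder(HTMLParser):
    """Collect the href value of every <base> element encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "base":
            self.hrefs.append(dict(attrs).get("href"))


def find_base_tags(html):
    """Return the href of each <base> tag found (None if href is absent)."""
    finder = BaseTagFinder()
    finder.feed(html)
    finder.close()
    return finder.hrefs


sample = '<div><BASE HREF="https://evil.example.net/"><a href="login">Log in</a></div>'
print(find_base_tags(sample))
```

A non-empty result for content that already passed through the sanitizer suggests that a vulnerable version or configuration produced it.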

Monitoring Recommendations

Enable logging for all HTML sanitization operations to track input/output patterns
Set up alerts for CSP violations related to base-uri restrictions
Implement automated scanning of rendered HTML for unauthorized <base> tags
Monitor user-facing applications for phishing indicators or redirect anomalies
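
Alerting on base-uri violations depends on recognizing them in incoming CSP reports. The sketch below assumes reports arrive in the JSON shape used by the report-uri mechanism, with a top-level "csp-report" object; the function name is hypothetical, and deployments using the newer Reporting API would need to adapt the field names.

```python
import json


def is_base_uri_violation(report_body):
    """Return True if a CSP report body describes a base-uri violation."""
    try:
        report = json.loads(report_body)["csp-report"]
        # Some browsers send the full directive text in violated-directive,
        # so only the first token (the directive name) is compared.
        directive = (
            report.get("effective-directive")
            or report.get("violated-directive")
            or ""
        )
        return str(directive).split()[0:1] == ["base-uri"]
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed or unexpected payloads are treated as non-matching.
        return False


example = json.dumps({
    "csp-report": {
        "document-uri": "https://app.example.com/profile",
        "violated-directive": "base-uri 'self'",
        "blocked-uri": "https://evil.example.net/",
    }
})
print(is_base_uri_violation(example))
```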

How to Mitigate CVE-2026-28350

Immediate Actions Required

Upgrade lxml_html_clean to version 0.4.4 or later immediately
Review all applications using lxml_html_clean to ensure the updated version is deployed
Implement Content Security Policy headers with explicit base-uri 'self' or base-uri 'none' directives
Audit existing sanitized content for potentially injected <base> tags
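
The base-uri directive mentioned above could be added at the application layer with a small WSGI wrapper. This is a sketch under the assumption that the application is served through WSGI; the name with_base_uri_policy is hypothetical. The header is appended rather than merged, because browsers enforce every Content-Security-Policy header they receive.

```python
def with_base_uri_policy(app, policy="base-uri 'self'"):
    """Wrap a WSGI app so every response carries a base-uri CSP header."""

    def wrapped(environ, start_response):
        def start_with_csp(status, headers, exc_info=None):
            # Append a separate CSP header; existing policies stay in force.
            headers = list(headers) + [("Content-Security-Policy", policy)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_csp)

    return wrapped


def demo_app(environ, start_response):
    """Tiny example application used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<p>hello</p>"]


protected_app = with_base_uri_policy(demo_app)
```

With base-uri 'self', a browser ignores an injected <base> that points to another origin; base-uri 'none' disables <base> entirely.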

Patch Information

The vulnerability has been patched in lxml_html_clean version 0.4.4. The fix ensures that <base> tags are automatically removed whenever the <head> tag is removed (via page_structure=True or manual configuration). This aligns with the HTML specification, which requires <base> to reside within <head>.

For detailed patch information, refer to the GitHub Security Advisory and the commit implementing the fix.

Workarounds

Manually add 'base' to the kill_tags parameter when instantiating the Cleaner class, so that <base> elements are dropped even on unpatched versions
Implement post-processing to strip <base> tags from sanitized HTML output
Deploy CSP headers with strict base-uri directives as an additional layer of defense
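
The kill_tags workaround could look like the sketch below. The helper names merged_kill_tags and build_cleaner are hypothetical; Cleaner, its kill_tags and page_structure options, and clean_html come from lxml_html_clean. The import is kept inside the function so the helper module loads even where the package is not installed.

```python
def merged_kill_tags(extra=()):
    """Return a sorted kill list that always contains 'base'."""
    return sorted({"base", *extra})


def build_cleaner(extra_kill_tags=()):
    """Build a Cleaner that drops <base> elements on any library version."""
    # Third-party import, done lazily; requires lxml_html_clean to be installed.
    from lxml_html_clean import Cleaner

    return Cleaner(
        page_structure=True,
        kill_tags=merged_kill_tags(extra_kill_tags),
    )
```

A caller would then sanitize input with build_cleaner().clean_html(untrusted_html). Keeping this in place after upgrading is harmless, since the patched library adds base to the same kill set.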

bash

# Upgrade lxml_html_clean to patched version
# Quote the requirement so the shell does not treat ">" as a redirect
pip install --upgrade "lxml_html_clean>=0.4.4"

# Verify installed version
pip show lxml_html_clean | grep Version
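
The same version check could be done from Python, for example in a startup check or CI step. This is a rough sketch: the helper names are hypothetical, and the comparison looks only at the leading numeric release segment, so pre-release or local version suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

PATCHED = (0, 4, 4)  # first fixed release according to the advisory


def release_tuple(version_string):
    """Extract the leading numeric release, e.g. '0.4.4' -> (0, 4, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def is_patched(version_string):
    """Return True if the version is at or above the patched release."""
    return release_tuple(version_string) >= PATCHED


def installed_is_patched(dist="lxml_html_clean"):
    """Check the installed distribution; None means it is not installed."""
    try:
        return is_patched(version(dist))
    except PackageNotFoundError:
        return None
```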
