CVE-2026-34760: vLLM Audio Processing Vulnerability

CVE-2026-34760 Overview

CVE-2026-34760 is an Input Validation Error vulnerability affecting vLLM, a popular inference and serving engine for large language models (LLMs). The vulnerability stems from Librosa's use of numpy.mean for mono downmixing (to_mono), which deviates from the international standard ITU-R BS.775-4 that specifies a weighted downmixing algorithm. This discrepancy creates an inconsistency between audio as heard by humans (through headphones or speakers) and audio processed by AI models using Librosa-based infrastructure such as vLLM and transformer libraries.

Critical Impact
Audio processed by affected vLLM versions may be interpreted differently by AI models compared to human perception, potentially leading to incorrect model outputs, audio classification errors, or manipulation of audio-based AI systems through adversarial inputs that exploit the downmixing inconsistency.

Affected Products

vLLM versions 0.5.5 through 0.17.x (prior to version 0.18.0)
Systems using Librosa for audio preprocessing in AI/ML pipelines
Transformer-based audio processing implementations relying on affected vLLM versions

Discovery Timeline

2026-04-02 - CVE-2026-34760 published to NVD
2026-04-02 - Last updated in NVD database

Technical Details for CVE-2026-34760

Vulnerability Analysis

The vulnerability is classified under CWE-20 (Improper Input Validation) and relates to how audio data is preprocessed before being fed into large language models. When stereo or multi-channel audio is converted to mono format, the affected versions of vLLM rely on Librosa's default behavior which uses a simple arithmetic mean (numpy.mean) to combine channels. This approach does not account for psychoacoustic principles defined in ITU-R BS.775-4, which specifies weighted coefficients for proper stereo-to-mono conversion.

The practical impact allows an attacker to craft audio that sounds benign or expected to human listeners but is processed differently by the AI model due to the non-standard downmixing. This creates a potential attack surface for audio-based adversarial inputs where the human-audible content differs from what the model "hears" and processes.

Root Cause

The root cause lies in vLLM's dependency on the Librosa library for audio processing. Librosa's to_mono function applies equal weighting to all audio channels rather than the internationally standardized weighted algorithm. This design choice, while computationally simpler, introduces a semantic gap between human audio perception and machine audio processing.
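The gap between the two downmixing approaches can be sketched in a few lines. This is an illustration, not vLLM's or Librosa's code: plain Python lists stand in for NumPy arrays, the function names are hypothetical, and the five-channel weights are an assumed example set that should be checked against ITU-R BS.775-4 before any real use.

```python
from typing import List, Sequence

# ASSUMPTION: illustrative weights for a five-channel (L, R, C, Ls, Rs) layout.
# They are not copied from the advisory; verify against ITU-R BS.775-4.
ASSUMED_WEIGHTS = (0.7071, 0.7071, 1.0, 0.5, 0.5)


def mean_downmix(channels: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight average per sample, i.e. what a mean over the channel axis yields."""
    count = len(channels)
    return [sum(frame) / count for frame in zip(*channels)]


def weighted_downmix(
    channels: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Per-sample weighted sum, one weight per channel."""
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per channel")
    return [sum(w * s for w, s in zip(weights, frame)) for frame in zip(*channels)]


if __name__ == "__main__":
    # One sample of signal placed only in the centre or only in a surround channel.
    centre_only = [[0.0], [0.0], [1.0], [0.0], [0.0]]
    surround_only = [[0.0], [0.0], [0.0], [1.0], [0.0]]
    for name, audio in (("centre", centre_only), ("surround", surround_only)):
        print(name, mean_downmix(audio), weighted_downmix(audio, ASSUMED_WEIGHTS))
```

Under the arithmetic mean, the centre and surround channels contribute equally. Under the assumed weights, the surround channel contributes half as much as the centre, so the relative balance of the two mono signals differs.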

Attack Vector

An attacker with network access and low-level privileges could exploit this vulnerability by submitting specially crafted multi-channel audio to a vLLM-based audio processing endpoint. The audio could be designed such that the standard weighted downmix produces one interpretation while the arithmetic mean produces a different interpretation, potentially causing the model to:

Misclassify audio content
Transcribe audio incorrectly
Generate inappropriate or unexpected responses based on manipulated audio input
Bypass audio-based content filtering or verification systems

The attack requires high complexity (AC:H) as it necessitates understanding of both the downmixing algorithms and the target model's behavior to craft effective adversarial audio.

diff

# Dependency change to the "audio" extra in setup.py (from the patch)
# librosa, whose to_mono uses numpy.mean for mono downmixing, is removed
        "audio": [
-            "librosa",
+            "av",
+            "resampy",
             "scipy",
             "soundfile",
             "mistral_common[audio]",
-            "av",
         ],  # Required for audio processing

Source: GitHub Commit Details

Detection Methods for CVE-2026-34760

Indicators of Compromise

Unexpected or anomalous transcription results from audio inputs with significant stereo separation
Audio processing outputs that differ from expected behavior based on human-audible content
Unusual patterns in audio submissions that contain highly divergent stereo channels

Detection Strategies

Audit Python environments for vLLM versions between 0.5.5 and 0.17.x using pip list | grep vllm
Review application dependencies for the presence of librosa in audio processing pipelines
Implement audio input validation to compare weighted vs. unweighted mono conversion results for anomaly detection
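The last strategy above, comparing weighted and unweighted mono conversions, could be prototyped roughly as follows. This is a minimal sketch: the function name is hypothetical, the weights are supplied by the caller (no normative coefficients are assumed here), and any alerting threshold would need tuning against real traffic. Both mixes are scaled to unit RMS first so that a plain gain difference is not reported as an anomaly.

```python
import math
from typing import List, Sequence


def _rms(signal: Sequence[float]) -> float:
    """Root-mean-square level; 0.0 for an empty signal."""
    if not signal:
        return 0.0
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def _unit_rms(signal: Sequence[float]) -> List[float]:
    """Scale to unit RMS; a silent signal is returned unchanged."""
    level = _rms(signal)
    return [v / level for v in signal] if level > 0 else list(signal)


def downmix_divergence(
    channels: Sequence[Sequence[float]], weights: Sequence[float]
) -> float:
    """RMS difference between the level-normalised mean mix and weighted mix.

    0.0 means both downmixes carry the same waveform up to gain; values near
    1.0 mean one mix is silent or very different where the other is not.
    """
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per channel")
    count = len(channels)
    frames = list(zip(*channels))
    mean_mix = _unit_rms([sum(f) / count for f in frames])
    weighted_mix = _unit_rms([sum(w * s for w, s in zip(weights, f)) for f in frames])
    return _rms([a - b for a, b in zip(mean_mix, weighted_mix)])
```

An input whose channels cancel under the arithmetic mean but not under the weighted sum produces a high score. Ordinary stereo content with equal weights scores close to zero.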

Monitoring Recommendations

Monitor audio processing endpoints for inputs with unusual stereo channel characteristics or phase relationships
Log and analyze cases where audio model outputs are flagged as inconsistent with expected behavior
Establish baseline metrics for audio processing fidelity to detect deviations
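One inexpensive signal for the "unusual phase relationships" mentioned above is the correlation between two channels. The sketch below computes a Pearson correlation coefficient in plain Python. The function name is hypothetical, and the cut-off that should count as suspicious is deployment-specific and not taken from the advisory.

```python
import math
from typing import Sequence


def channel_correlation(left: Sequence[float], right: Sequence[float]) -> float:
    """Pearson correlation of two equal-length channels, in [-1.0, 1.0].

    Values near +1 indicate near-identical channels, values near -1 indicate
    phase-inverted channels, and 0.0 is returned if either channel is constant.
    """
    n = len(left)
    if n == 0 or n != len(right):
        raise ValueError("channels must be non-empty and of equal length")
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(left, right))
    var_l = sum((a - mean_l) ** 2 for a in left)
    var_r = sum((b - mean_r) ** 2 for b in right)
    if var_l == 0 or var_r == 0:
        return 0.0
    return cov / math.sqrt(var_l * var_r)
```

Logging this value per request alongside the model output gives the baseline a concrete number to track. Strongly negative correlations are worth a closer look because phase-inverted channels cancel when averaged.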

How to Mitigate CVE-2026-34760

Immediate Actions Required

Upgrade vLLM to version 0.18.0 or later immediately on all production systems
Review and update any custom audio preprocessing pipelines that depend on Librosa's to_mono function
Validate that audio processing infrastructure uses ITU-R BS.775-4 compliant downmixing
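To confirm whether a given environment falls inside the affected range stated above (0.5.5 up to, but not including, 0.18.0), a small check along these lines may help. It is a sketch with hypothetical helper names. It compares only the leading numeric release components, so pre-release or local version suffixes are ignored, which is a simplification.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIRST_AFFECTED = (0, 5, 5)   # lower bound from the advisory
FIRST_FIXED = (0, 18, 0)     # first patched release from the advisory


def parse_release(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """True if the version lies in [FIRST_AFFECTED, FIRST_FIXED)."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED


def installed_vllm_is_affected() -> Optional[bool]:
    """Check the vllm distribution in the current environment; None if absent."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None
```

Running installed_vllm_is_affected() in each deployment's interpreter before and after the upgrade gives a quick yes/no answer that can be wired into CI or a fleet audit script.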

Patch Information

The vulnerability has been addressed in vLLM version 0.18.0. The patch removes the Librosa dependency from the audio processing stack and replaces it with the av and resampy libraries. The fix was implemented via Pull Request #37058 and is detailed in the GitHub Security Advisory GHSA-6c4r-fmh3-7rh8.

diff

# Patched audio asset handling in vllm/assets/audio.py
# Librosa dependency removed and replaced with vllm.multimodal.media.audio

import numpy.typing as npt

-from vllm.utils.import_utils import PlaceholderModule
+from vllm.multimodal.media.audio import load_audio

from .base import VLLM_S3_BUCKET_URL, get_vllm_public_assets

-try:
-    import librosa
-except ImportError:
-    librosa = PlaceholderModule("librosa")  # type: ignore[assignment]

Source: GitHub Commit Details

Workarounds

Implement custom audio preprocessing that applies ITU-R BS.775-4 weighted downmixing before passing audio to vLLM
Convert all audio inputs to mono format using compliant tools before processing with affected vLLM versions
Restrict audio input sources to trusted origins while awaiting upgrade to the patched version
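For the pre-conversion workaround, one dependency-free option is to downmix WAV files to mono before they reach an affected vLLM version. The following sketch uses only the standard library and handles 16-bit PCM WAV input. The function name is hypothetical, and the weights are passed in by the caller, who is responsible for choosing coefficients that match ITU-R BS.775-4 for the channel layout in use.

```python
import array
import sys
import wave
from typing import Sequence


def downmix_wav_to_mono(src_path: str, dst_path: str, weights: Sequence[float]) -> None:
    """Write a mono copy of a 16-bit PCM WAV using per-channel weights."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if len(weights) != n_channels:
            raise ValueError("need exactly one weight per channel")
        frame_rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))

    # WAV data is little-endian; array("h") uses the host byte order.
    if sys.byteorder == "big":
        samples.byteswap()

    mono = array.array("h")
    for start in range(0, len(samples), n_channels):
        frame = samples[start:start + n_channels]
        mixed = sum(w * s for w, s in zip(weights, frame))
        # Clamp to the int16 range, since a weighted sum can exceed it.
        mono.append(max(-32768, min(32767, round(mixed))))

    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(frame_rate)
        dst.writeframes(mono.tobytes())
```

Because the output has a single channel, any later to_mono step has nothing left to average, so the downmix choice is made once, explicitly, and before the model sees the audio.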

bash

# Upgrade vLLM to patched version
# (the requirement is quoted so the shell does not treat ">" as a redirection)
pip install --upgrade "vllm>=0.18.0"

# Verify installation
pip show vllm | grep Version

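# Review vLLM's declared dependencies (optional extras such as "audio" may not be listed here)
pip show vllm | grep -i '^Requires'

# Check whether librosa is still installed in the environment
pip show librosa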

