CVE-2026-34760 Overview
CVE-2026-34760 is an Input Validation Error vulnerability affecting vLLM, a popular inference and serving engine for large language models (LLMs). The vulnerability stems from Librosa's use of numpy.mean for mono downmixing (to_mono), which deviates from the international standard ITU-R BS.775-4 that specifies a weighted downmixing algorithm. This discrepancy creates an inconsistency between audio as heard by humans (through headphones or speakers) and audio processed by AI models using Librosa-based infrastructure such as vLLM and transformer libraries.
Critical Impact
Audio processed by affected vLLM versions may be interpreted differently by AI models compared to human perception, potentially leading to incorrect model outputs, audio classification errors, or manipulation of audio-based AI systems through adversarial inputs that exploit the downmixing inconsistency.
Affected Products
- vLLM versions 0.5.5 through 0.17.x (prior to version 0.18.0)
- Systems using Librosa for audio preprocessing in AI/ML pipelines
- Transformer-based audio processing implementations relying on affected vLLM versions
Discovery Timeline
- 2026-04-02 - CVE-2026-34760 published to NVD
- 2026-04-02 - Last updated in NVD database
Technical Details for CVE-2026-34760
Vulnerability Analysis
The vulnerability is classified under CWE-20 (Improper Input Validation) and concerns how audio is preprocessed before it reaches the model. When stereo or multi-channel audio is converted to mono, affected versions of vLLM rely on Librosa's default behavior, which combines channels with a simple arithmetic mean (numpy.mean). ITU-R BS.775-4 instead specifies a weighted downmix in which channels contribute with different coefficients, so the two methods can produce different mono signals from the same multi-channel input.
The practical impact allows an attacker to craft audio that sounds benign or expected to human listeners but is processed differently by the AI model due to the non-standard downmixing. This creates a potential attack surface for audio-based adversarial inputs where the human-audible content differs from what the model "hears" and processes.
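The gap between the two downmixes can be illustrated with a small NumPy sketch. It is not vLLM's or Librosa's code: the channel order (L, R, C, LS, RS), the helper names, and the coefficient values are assumptions for illustration. The coefficients are the ones commonly quoted for the BS.775 five-channel to mono downmix and should be checked against the standard itself.

```python
import numpy as np

# Assumed channel order for a 5-channel signal: L, R, C, LS, RS.
# Assumed coefficients (commonly quoted for the BS.775 3/2 -> mono downmix);
# verify against the standard before relying on them.
WEIGHTS_5CH_TO_MONO = np.array([0.7071, 0.7071, 1.0, 0.5, 0.5])


def mean_downmix(audio: np.ndarray) -> np.ndarray:
    """Equal-weight downmix: numpy.mean over the channel axis."""
    return np.mean(audio, axis=0)


def weighted_downmix(audio: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of channels; audio has shape (channels, samples)."""
    return weights @ audio


# One second of a 440 Hz tone at 16 kHz, placed in C and, inverted, in LS.
t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)
audio = np.zeros((5, t.size))
audio[2] = tone    # C
audio[3] = -tone   # LS (phase-inverted copy)

mean_mono = mean_downmix(audio)
weighted_mono = weighted_downmix(audio, WEIGHTS_5CH_TO_MONO)

# The arithmetic mean cancels the tone, while the weighted downmix
# keeps it at half amplitude (1.0 * tone - 0.5 * tone).
print("peak after mean downmix:    ", np.max(np.abs(mean_mono)))
print("peak after weighted downmix:", np.max(np.abs(weighted_mono)))
```

In this constructed case a listener hearing the weighted downmix would still hear the tone, whereas a pipeline using the arithmetic mean would pass silence to the model.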
Root Cause
The root cause lies in vLLM's dependency on the Librosa library for audio processing. Librosa's to_mono function applies equal weighting to all audio channels rather than the internationally standardized weighted algorithm. This design choice, while computationally simpler, introduces a semantic gap between human audio perception and machine audio processing.
Attack Vector
An attacker with network access and low-level privileges could exploit this vulnerability by submitting specially crafted multi-channel audio to a vLLM-based audio processing endpoint. The audio could be designed such that the standard weighted downmix produces one interpretation while the arithmetic mean produces a different interpretation, potentially causing the model to:
- Misclassify audio content
- Transcribe audio incorrectly
- Generate inappropriate or unexpected responses based on manipulated audio input
- Bypass audio-based content filtering or verification systems
The attack requires high complexity (AC:H) as it necessitates understanding of both the downmixing algorithms and the target model's behavior to craft effective adversarial audio.
# Dependency change in setup.py (diff from the patch)
# librosa, whose to_mono uses numpy.mean for mono downmixing, is removed
"audio": [
- "librosa",
+ "av",
+ "resampy",
"scipy",
"soundfile",
"mistral_common[audio]",
- "av",
], # Required for audio processing
Source: GitHub Commit Details
Detection Methods for CVE-2026-34760
Indicators of Compromise
- Unexpected or anomalous transcription results from audio inputs with significant stereo separation
- Audio processing outputs that differ from expected behavior based on human-audible content
- Unusual patterns in audio submissions that contain highly divergent stereo channels
Detection Strategies
- Audit Python environments for vLLM versions between 0.5.5 and 0.17.x using pip list | grep vllm
- Review application dependencies for the presence of librosa in audio processing pipelines
- Implement audio input validation to compare weighted vs. unweighted mono conversion results for anomaly detection
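The comparison of weighted and unweighted mono conversions suggested above could be sketched as a divergence score. The function name, the gain normalization, and any threshold applied to the result are hypothetical choices, not part of vLLM or the advisory. The caller supplies the weights for the channel layout in use.

```python
import numpy as np


def _rms(signal: np.ndarray) -> float:
    """Root-mean-square level of a 1-D signal."""
    return float(np.sqrt(np.mean(np.square(signal))))


def downmix_divergence(audio: np.ndarray, weights: np.ndarray,
                       eps: float = 1e-12) -> float:
    """Relative RMS difference between the mean and a weighted downmix.

    audio has shape (channels, samples); weights has shape (channels,).
    Weights are normalized to sum to 1 so that identical channels give
    a score near 0. Silent input also returns 0.0.
    """
    if audio.ndim != 2 or weights.shape != (audio.shape[0],):
        raise ValueError("expected audio (channels, samples) and one weight per channel")
    mean_mono = audio.mean(axis=0)
    weighted_mono = (weights / weights.sum()) @ audio
    reference = max(_rms(mean_mono), _rms(weighted_mono))
    if reference < eps:
        return 0.0
    return _rms(weighted_mono - mean_mono) / reference


# Example with assumed 5-channel weights (L, R, C, LS, RS).
weights = np.array([0.7071, 0.7071, 1.0, 0.5, 0.5])
tone = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)

benign = np.tile(tone, (5, 1))   # same content in every channel
crafted = np.zeros((5, tone.size))
crafted[2], crafted[3] = tone, -tone   # C and inverted LS

print("benign score: ", downmix_divergence(benign, weights))
print("crafted score:", downmix_divergence(crafted, weights))
```

A deployment would need to pick its own alert threshold from scores observed on legitimate traffic, since ordinary surround mixes will not score exactly zero.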
Monitoring Recommendations
- Monitor audio processing endpoints for inputs with unusual stereo channel characteristics or phase relationships
- Log and analyze cases where audio model outputs are flagged as inconsistent with expected behavior
- Establish baseline metrics for audio processing fidelity to detect deviations
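One way to quantify unusual phase relationships between channels for logging is the lowest pairwise correlation among channels. This is a minimal sketch; the function name and the choice of Pearson correlation as the metric are assumptions, and other measures may suit a given pipeline better.

```python
import numpy as np


def min_interchannel_correlation(audio: np.ndarray) -> float:
    """Lowest pairwise Pearson correlation among non-silent channels.

    audio has shape (channels, samples). Values near -1.0 indicate that
    two channels are close to phase-inverted copies of each other.
    Returns 1.0 when fewer than two channels carry any signal.
    """
    active = audio[np.std(audio, axis=1) > 0.0]   # drop constant/silent channels
    if active.shape[0] < 2:
        return 1.0
    corr = np.corrcoef(active)
    pairs = corr[np.triu_indices(active.shape[0], k=1)]   # each pair once
    return float(pairs.min())


tone = np.sin(2 * np.pi * 440.0 * np.arange(16000) / 16000.0)
inverted_pair = np.stack([tone, -tone])
print(min_interchannel_correlation(inverted_pair))
```

Logging this value per request alongside the model output makes it possible to correlate flagged outputs with strongly anti-phase inputs later.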
How to Mitigate CVE-2026-34760
Immediate Actions Required
- Upgrade vLLM to version 0.18.0 or later immediately on all production systems
- Review and update any custom audio preprocessing pipelines that depend on Librosa's to_mono function
- Validate that audio processing infrastructure uses ITU-R BS.775-4 compliant downmixing
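To confirm whether a given environment falls in the affected range (0.5.5 up to, but not including, 0.18.0), a standard-library check along these lines may help. The helper names are hypothetical, and the parser only reads the leading numeric components, so a pre-release such as 0.18.0rc1 is treated as 0.18.0.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIRST_AFFECTED = (0, 5, 5)
FIRST_FIXED = (0, 18, 0)


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading 'X[.Y[.Z]]' of a version string; missing parts become 0."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_affected(version: str) -> bool:
    """True if the version lies in the affected range stated in the advisory."""
    return FIRST_AFFECTED <= release_tuple(version) < FIRST_FIXED


def installed_vllm_is_affected() -> Optional[bool]:
    """Check the installed vllm package; None if vllm is not installed."""
    try:
        return is_affected(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed vllm affected:", installed_vllm_is_affected())
```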
Patch Information
The vulnerability has been addressed in vLLM version 0.18.0. The patch removes the Librosa dependency from the audio processing stack and replaces it with av and resampy libraries. The fix was implemented via Pull Request #37058 and is detailed in the GitHub Security Advisory GHSA-6c4r-fmh3-7rh8.
# Patched audio asset handling in vllm/assets/audio.py
# Librosa dependency removed and replaced with vllm.multimodal.media.audio
import numpy.typing as npt
-from vllm.utils.import_utils import PlaceholderModule
+from vllm.multimodal.media.audio import load_audio
from .base import VLLM_S3_BUCKET_URL, get_vllm_public_assets
-try:
- import librosa
-except ImportError:
- librosa = PlaceholderModule("librosa") # type: ignore[assignment]
Source: GitHub Commit Details
Workarounds
- Implement custom audio preprocessing that applies ITU-R BS.775-4 weighted downmixing before passing audio to vLLM
- Convert all audio inputs to mono format using compliant tools before processing with affected vLLM versions
- Restrict audio input sources to trusted origins while awaiting upgrade to the patched version
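Where audio is converted to mono upstream, a guard in front of the affected vLLM version can reject anything that is still multi-channel, so the equal-weight downmix path is never exercised. The following is a sketch only: the function name is hypothetical and it assumes decoded audio arrives as a NumPy array shaped (samples,) or (channels, samples).

```python
import numpy as np


def require_mono(audio: np.ndarray) -> np.ndarray:
    """Return audio as a 1-D array, rejecting multi-channel input.

    Assumes shape (samples,) or (channels, samples).
    """
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio
    if audio.ndim == 2 and audio.shape[0] == 1:
        return audio[0]   # single channel stored as a 2-D array
    raise ValueError(
        f"expected mono audio, got array with shape {audio.shape}; "
        "downmix with a compliant tool before submitting"
    )


mono = require_mono(np.zeros((1, 16000)))
print(mono.shape)
```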
# Upgrade vLLM to patched version
pip install --upgrade "vllm>=0.18.0"
# Verify installation
pip show vllm | grep Version
# Check for librosa removal in audio dependencies
pip show vllm | grep -A 20 Requires


