CVE-2025-46722 Overview
CVE-2025-46722 affects vLLM, an inference and serving engine for large language models (LLMs). The vulnerability exists in vllm/multimodal/hasher.py, where the MultiModalHasher class serializes PIL.Image.Image objects using only obj.tobytes(). This method returns raw pixel data without metadata such as image width, height, or mode, so two images of different dimensions but identical pixel byte sequences produce the same hash value. The flaw enables hash collisions, incorrect cache hits, and potential data leakage between requests. The issue affects vLLM versions 0.7.0 through 0.8.x, is fixed in 0.9.0, and is tracked under CWE-1023 (Incomplete Comparison with Missing Factors).
Critical Impact
Hash collisions in the multimodal cache can return another user's cached inference results, leading to cross-request data leakage in shared vLLM deployments.
Affected Products
- vLLM versions 0.7.0 through 0.8.x
- vLLM multimodal inference deployments using MultiModalHasher
- LLM serving infrastructure relying on PIL image caching in vLLM
Discovery Timeline
- 2025-05-29 - CVE-2025-46722 published to NVD
- 2025-06-24 - Last updated in NVD database
Technical Details for CVE-2025-46722
Vulnerability Analysis
The vulnerability resides in the input serialization logic used by vLLM's multimodal cache. The MultiModalHasher class generates content-addressable hashes for incoming multimodal inputs, including images. These hashes drive cache lookups that reuse precomputed embeddings across inference requests.
The original implementation called Image.tobytes() on PIL.Image.Image objects. This method emits only the raw pixel buffer and omits dimensional metadata. An attacker can craft two images with identical pixel byte streams but different shapes, for example a 30x100 image and a 100x30 image. Both images hash to the same value and collide in the cache.
When a collision occurs, vLLM may return cached embeddings or model outputs computed for a different user's image. The result is incorrect inference output and potential exposure of another tenant's data in shared serving environments.
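The collision can be illustrated with a minimal, standard-library-only sketch. RawImage is a hypothetical stand-in for PIL.Image.Image, and SHA-256 stands in for whatever hash function the deployment uses; neither is taken from the vLLM source. The only property that matters is that tobytes() returns the pixel buffer and nothing else.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RawImage:
    """Hypothetical stand-in for PIL.Image.Image (not a vLLM or Pillow type)."""
    width: int
    height: int
    mode: str
    pixels: bytes

    def tobytes(self) -> bytes:
        # Like Image.tobytes(), return only the pixel buffer, no metadata.
        return self.pixels


def naive_hash(img: RawImage) -> str:
    # Mirrors the vulnerable pattern: hash the pixel bytes alone.
    return hashlib.sha256(img.tobytes()).hexdigest()


# 3000 single-channel pixels, interpreted once as 100x30 and once as 30x100.
buf = bytes(range(250)) * 12
wide = RawImage(width=100, height=30, mode="L", pixels=buf)
tall = RawImage(width=30, height=100, mode="L", pixels=buf)

# The two images are different objects with different shapes,
# yet the metadata-free hash cannot tell them apart.
collides = naive_hash(wide) == naive_hash(tall)
print(collides)
```

The two inputs differ only in how the same buffer is laid out, which is exactly the information the hash input leaves out.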
Root Cause
The root cause is incomplete object serialization for hashing. The serializer ignored width, height, and color mode when converting images to bytes. Numeric scalars and NumPy arrays were likewise serialized as raw buffers without dtype or shape context. Hash equality therefore did not imply semantic equality of the inputs.
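The same ambiguity applies to raw numeric buffers in general. The sketch below uses only the struct module, not NumPy, to show two cases: values of different width that share a byte representation, and nested rows of different shape that flatten to the same buffer. The variable names are illustrative.

```python
import struct

# Two little-endian int32 values [1, 0] versus one little-endian int64 value [1].
as_two_int32 = struct.pack("<2i", 1, 0)
as_one_int64 = struct.pack("<q", 1)

# Both encode to the same 8 bytes, so a hash over the raw buffer is identical.
same_bytes = as_two_int32 == as_one_int64

# A 2x3 and a 3x2 grid of uint8 values flatten to the same row-major buffer.
rows_2x3 = [[1, 2, 3], [4, 5, 6]]
rows_3x2 = [[1, 2], [3, 4], [5, 6]]
flat_2x3 = bytes(v for row in rows_2x3 for v in row)
flat_3x2 = bytes(v for row in rows_3x2 for v in row)
same_flat = flat_2x3 == flat_3x2

print(same_bytes, same_flat)
```

In both cases the element type or the shape is needed to recover the original value, so it has to be part of the hash input.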
Attack Vector
The attack vector is network-based and requires no authentication when the vLLM endpoint is exposed. An attacker submits crafted images to the inference API. If a colliding cache entry exists, the server returns embeddings or outputs derived from another request's input. Exploitation does not require code execution or privilege escalation on the vLLM host.
# Patch from vllm/multimodal/hasher.py (vLLM 0.9.0)
             return obj.encode("utf-8")
         if isinstance(obj, bytes):
             return obj
-        if isinstance(obj, Image.Image):
-            return obj.tobytes()
+        if isinstance(obj, (int, float)):
+            return np.array(obj).tobytes()
-        # Convertible to NumPy arrays
+        if isinstance(obj, Image.Image):
+            return cls.item_to_bytes("image", np.array(obj.convert("RGBA")))
         if isinstance(obj, torch.Tensor):
-            obj = obj.numpy()
-        if isinstance(obj, (int, float)):
-            obj = np.array(obj)
+            return cls.item_to_bytes("tensor", obj.numpy())
         if isinstance(obj, np.ndarray):
-            return obj.tobytes()
+            return cls.item_to_bytes(
+                "ndarray", {
+                    "dtype": obj.dtype.str,
+                    "shape": obj.shape,
+                    "data": obj.data.tobytes(),
+                })
Source: vLLM Security Patch Commit 99404f5
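The patch converts images to RGBA NumPy arrays and serializes ndarrays with explicit dtype and shape fields. Because the shape is now part of the hash input, images with identical pixel bytes but different dimensions no longer collide, and converting to RGBA first gives every image a common channel layout regardless of its original mode.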
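The idea behind the fix can be sketched without NumPy. The encoder below is a hypothetical illustration, not vLLM's item_to_bytes: it length-prefixes a type tag, a dtype string, the shape, and the raw buffer before hashing, and it uses SHA-256 as a stand-in hash. The "|u1" string matches what NumPy reports as dtype.str for uint8.

```python
import hashlib
import struct


def _field(data: bytes) -> bytes:
    # Length-prefix each field so adjacent fields cannot blur together.
    return struct.pack("<Q", len(data)) + data


def serialize_array(dtype: str, shape: tuple, data: bytes) -> bytes:
    """Hypothetical encoder: type tag + dtype + shape + raw buffer."""
    shape_bytes = struct.pack(f"<{len(shape)}q", *shape)
    return b"".join(
        _field(part)
        for part in (b"ndarray", dtype.encode("utf-8"), shape_bytes, data)
    )


def content_hash(dtype: str, shape: tuple, data: bytes) -> str:
    return hashlib.sha256(serialize_array(dtype, shape, data)).hexdigest()


# One zero-filled RGBA buffer for 3000 pixels, described with two shapes.
buf = bytes(30 * 100 * 4)
hash_tall = content_hash("|u1", (100, 30, 4), buf)
hash_wide = content_hash("|u1", (30, 100, 4), buf)

# Same pixel bytes, different shape: the hashes now differ.
print(hash_tall != hash_wide)
```

The length prefixes are a design choice of this sketch: they keep a boundary between fields so that bytes from one field cannot be reinterpreted as part of the next.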
Detection Methods for CVE-2025-46722
Indicators of Compromise
- Unexpected or duplicated inference responses returned to different multimodal API requests
- Anomalously high cache hit rates on the vLLM multimodal cache for distinct input images
- API logs showing requests with images of differing dimensions producing identical processing times consistent with cached responses
- vLLM server versions reported between 0.7.0 and 0.8.x in deployment manifests or pip show vllm output
Detection Strategies
- Inventory running vLLM instances and compare installed versions against the fixed release 0.9.0
- Review container images, Kubernetes pods, and Python environments for the vulnerable vllm/multimodal/hasher.py implementation
- Audit application logs for multimodal inference requests where output content does not correspond to the submitted image
- Test deployments with two images that share pixel bytes but differ in shape and verify that responses differ
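The version inventory step can be scripted. The following is a rough sketch that uses only importlib.metadata from the standard library. The helper names are hypothetical, and the parser reads only the leading numeric release components, so pre-release, development, and locally patched builds still need manual review.

```python
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Parse leading numeric components, e.g. '0.8.5.post1' -> (0, 8, 5)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version_text: str) -> bool:
    # Affected range per the advisory: 0.7.0 <= version < 0.9.0.
    # Pad to three components so '0.7' compares like '0.7.0'.
    release = (parse_version(version_text) + (0, 0, 0))[:3]
    return (0, 7, 0) <= release < (0, 9, 0)


def installed_vllm_status() -> str:
    """Report whether the vllm package in this environment is in range."""
    try:
        found = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vllm not installed"
    label = "AFFECTED" if is_affected(found) else "not in affected range"
    return f"vllm {found}: {label}"


print(installed_vllm_status())
```

Running this inside each container image or virtual environment gives a quick first pass; it does not inspect hasher.py itself, so a backported fix would still be reported as affected.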
Monitoring Recommendations
- Monitor vLLM cache metrics for abnormal hit ratios that may indicate collision-driven reuse
- Log image metadata (dimensions, mode, hash) per request to support post-incident correlation
- Alert on outbound responses containing data unrelated to the requesting tenant in multi-tenant inference platforms
- Track network access to vLLM API endpoints and restrict exposure to trusted clients only
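Per-request metadata logging makes collisions directly observable: if one cache key is ever seen with two different (width, height, mode) tuples, the key does not identify the image. The class below is a hypothetical helper for a logging or proxy layer in front of vLLM, not part of vLLM, and the SHA-256 key is a stand-in for the pre-patch cache key.

```python
import hashlib
from collections import defaultdict


class CollisionMonitor:
    """Hypothetical helper: flags a cache key seen with different image metadata."""

    def __init__(self) -> None:
        self._seen = defaultdict(set)

    def record(self, cache_key: str, width: int, height: int, mode: str) -> bool:
        """Record a request; return True if the key now maps to >1 metadata tuple."""
        self._seen[cache_key].add((width, height, mode))
        return len(self._seen[cache_key]) > 1


monitor = CollisionMonitor()
pixels = bytes(3000)
key = hashlib.sha256(pixels).hexdigest()  # stand-in for a pixel-bytes-only key

first = monitor.record(key, 30, 100, "L")    # first sighting of this key
second = monitor.record(key, 100, 30, "L")   # same key, different shape
print(first, second)
```

A production version would need to bound the size of the tracking table, for example by expiring keys on the same schedule as the cache they shadow.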
How to Mitigate CVE-2025-46722
Immediate Actions Required
- Upgrade vLLM to version 0.9.0 or later, which contains the fix in commit 99404f5
- Identify all vLLM deployments running versions 0.7.0 through 0.8.x and schedule remediation
- Flush existing multimodal caches after upgrading to remove entries computed with the vulnerable hash function
- Restrict network access to vLLM inference endpoints until patching is complete
Patch Information
The fix is included in vLLM 0.9.0 via pull request vllm-project/vllm#17378 and commit 99404f53c72965b41558aceb1bc2380875f5d848. Full details are available in GitHub Security Advisory GHSA-c65p-x677-fgj6. The patch reroutes image serialization through np.array(obj.convert("RGBA")) and adds dtype and shape fields to ndarray serialization.
Workarounds
- Disable the multimodal input cache on vLLM serving instances if upgrading is not immediately feasible
- Place vLLM behind an authenticated reverse proxy and segregate tenants into isolated inference instances
- Pre-normalize all incoming images to a fixed canonical size and mode before submission to vLLM
# Upgrade vLLM to the patched release
pip install --upgrade "vllm>=0.9.0"
# Verify installed version
python -c "import vllm; print(vllm.__version__)"
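# Optional: restart the vLLM server so in-memory multimodal cache entries are dropped.
# The path below applies only if your deployment persists a cache there; adjust or skip as needed.
rm -rf ~/.cache/vllm/multimodal/*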