CVE-2025-46722: vLLM Information Disclosure Vulnerability

CVE-2025-46722 Overview

CVE-2025-46722 affects vLLM, an inference and serving engine for large language models (LLMs). The vulnerability exists in vllm/multimodal/hasher.py, where the MultiModalHasher class serializes PIL.Image.Image objects using only obj.tobytes(). This method returns raw pixel data without metadata such as image width, height, or mode. Two images of different dimensions but with identical pixel byte sequences can therefore produce the same hash value. The flaw enables hash collisions, incorrect cache hits, and potential data leakage between requests. The issue affects vLLM versions 0.7.0 up to, but not including, 0.9.0 and is tracked under CWE-1023 (Incomplete Comparison with Missing Factors).

Critical Impact
Hash collisions in the multimodal cache can return another user's cached inference results, leading to cross-request data leakage in shared vLLM deployments.

Affected Products

vLLM versions 0.7.0 through 0.8.x
vLLM multimodal inference deployments using MultiModalHasher
LLM serving infrastructure relying on PIL image caching in vLLM

Discovery Timeline

2025-05-29 - CVE-2025-46722 published to NVD
2025-06-24 - Last updated in NVD database

Technical Details for CVE-2025-46722

Vulnerability Analysis

The vulnerability resides in the input serialization logic used by vLLM's multimodal cache. The MultiModalHasher class generates content-addressable hashes for incoming multimodal inputs, including images. These hashes drive cache lookups that reuse precomputed embeddings across inference requests.

The original implementation called Image.tobytes() on PIL.Image.Image objects. This method emits only the raw pixel buffer and omits dimensional metadata. An attacker can craft two images with identical pixel byte streams but different shapes, for example a 30x100 image and a 100x30 image. Both images hash to the same value and collide in the cache.

When a collision occurs, vLLM may return cached embeddings or model outputs computed for a different user's image. The result is incorrect inference output and potential exposure of another tenant's data in shared serving environments.
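The collision can be reproduced in miniature without vLLM or Pillow. The sketch below is a simplified model, not vLLM's code: ToyImage is a hypothetical stand-in for PIL.Image.Image, and SHA-256 stands in for whatever hash function the cache actually uses. It only illustrates that hashing the pixel buffer alone cannot distinguish a 30x100 image from a 100x30 one.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyImage:
    """Hypothetical stand-in for PIL.Image.Image: size, mode, and raw pixels."""
    width: int
    height: int
    mode: str
    pixels: bytes

    def tobytes(self) -> bytes:
        # Like the vulnerable code path, expose only the raw pixel buffer.
        return self.pixels


def naive_hash(img: ToyImage) -> str:
    """Hash only the pixel bytes, ignoring width, height, and mode."""
    return hashlib.sha256(img.tobytes()).hexdigest()


# 3000 single-channel pixels, interpreted once as 100x30 and once as 30x100.
payload = bytes(i % 256 for i in range(30 * 100))
wide = ToyImage(width=100, height=30, mode="L", pixels=payload)
tall = ToyImage(width=30, height=100, mode="L", pixels=payload)

# The two objects differ, yet their naive hashes are equal.
collides = naive_hash(wide) == naive_hash(tall)
```

Here collides evaluates to True even though wide and tall are different images, which is the condition that lets one request hit a cache entry created by another.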

Root Cause

The root cause is incomplete object serialization for hashing. The serializer ignored shape, dtype, and color mode when converting images to bytes. Numeric scalars and NumPy arrays were also serialized without type or shape context. Hash equality therefore did not imply semantic equality of the inputs.
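The same ambiguity applies to element type, not only to shape. As a minimal illustration using only the standard library (the tagged helper and its header layout are assumptions made for this example, not part of vLLM), four 8-bit values and one little-endian 32-bit value can share an identical byte buffer:

```python
import struct

# Four unsigned 8-bit values: 1, 0, 0, 0.
as_uint8 = struct.pack("<4B", 1, 0, 0, 0)
# One little-endian unsigned 32-bit value: 1.
as_uint32 = struct.pack("<I", 1)

# Without type context the two buffers are indistinguishable.
same_bytes = as_uint8 == as_uint32


def tagged(dtype: str, shape: tuple, data: bytes) -> bytes:
    """Hypothetical helper: prefix the raw buffer with its dtype and shape."""
    header = f"{dtype}|{','.join(map(str, shape))}|".encode("utf-8")
    return header + data


# With dtype and shape included, the serialized forms differ.
distinct_when_tagged = (
    tagged("<u1", (4,), as_uint8) != tagged("<u4", (1,), as_uint32)
)
```

Both same_bytes and distinct_when_tagged are True: the raw buffers match, and adding type and shape context is enough to separate them.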

Attack Vector

The attack vector is network-based and requires no authentication when the vLLM endpoint is exposed. An attacker submits crafted images to the inference API. If a colliding cache entry exists, the server returns embeddings or outputs derived from another request's input. Exploitation does not require code execution or privilege escalation on the vLLM host.

python

# Patch from vllm/multimodal/hasher.py (vLLM 0.9.0)
             return obj.encode("utf-8")
         if isinstance(obj, bytes):
             return obj
-        if isinstance(obj, Image.Image):
-            return obj.tobytes()
+        if isinstance(obj, (int, float)):
+            return np.array(obj).tobytes()

-        # Convertible to NumPy arrays
+        if isinstance(obj, Image.Image):
+            return cls.item_to_bytes("image", np.array(obj.convert("RGBA")))
         if isinstance(obj, torch.Tensor):
-            obj = obj.numpy()
-        if isinstance(obj, (int, float)):
-            obj = np.array(obj)
+            return cls.item_to_bytes("tensor", obj.numpy())
         if isinstance(obj, np.ndarray):
-            return obj.tobytes()
+            return cls.item_to_bytes(
+                "ndarray", {
+                    "dtype": obj.dtype.str,
+                    "shape": obj.shape,
+                    "data": obj.data.tobytes(),
+                })

Source: vLLM Security Patch Commit 99404f5

The patch converts images to RGBA NumPy arrays and serializes ndarrays with explicit dtype and shape fields. Images with different dimensions therefore no longer produce the same hash input, and the color mode is normalized to RGBA before hashing.
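The idea behind the fix can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not the patched vLLM implementation: the field and serialize_array helpers, the length-prefixed layout, and the use of SHA-256 are all choices made for this example. The dtype strings and the (height, width, channels) shape mirror what NumPy would report for an RGBA image.

```python
import hashlib
import struct


def field(name: str, value: bytes) -> bytes:
    """Length-prefix a named field so field boundaries cannot shift."""
    key = name.encode("utf-8")
    return struct.pack("<II", len(key), len(value)) + key + value


def serialize_array(dtype: str, shape: tuple[int, ...], data: bytes) -> bytes:
    """Serialize an array-like value together with its dtype and shape."""
    shape_bytes = struct.pack(f"<{len(shape)}Q", *shape)
    return (
        field("dtype", dtype.encode("utf-8"))
        + field("shape", shape_bytes)
        + field("data", data)
    )


def array_hash(dtype: str, shape: tuple[int, ...], data: bytes) -> str:
    """Hash the shape-aware serialization instead of the raw buffer."""
    return hashlib.sha256(serialize_array(dtype, shape, data)).hexdigest()


# One RGBA buffer (4 bytes per pixel) interpreted with two different shapes.
pixels = bytes(30 * 100 * 4)
hash_tall = array_hash("|u1", (100, 30, 4), pixels)  # 30 wide, 100 high
hash_wide = array_hash("|u1", (30, 100, 4), pixels)  # 100 wide, 30 high
```

With shape included in the hashed bytes, hash_tall and hash_wide differ even though the pixel buffer is identical.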

Detection Methods for CVE-2025-46722

Indicators of Compromise

Unexpected or duplicated inference responses returned to different multimodal API requests
Anomalously high cache hit rates on the vLLM multimodal cache for distinct input images
API logs showing requests with images of differing dimensions completing with latencies typical of cached responses
vLLM server versions reported between 0.7.0 and 0.8.x in deployment manifests or pip show vllm output
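Version strings collected from manifests or pip show vllm output can be checked against the affected range with a small helper. This is a minimal sketch: parse_version and in_affected_range are hypothetical names, and the parser deliberately reads only the leading numeric components, so pre-release and post-release suffixes are ignored and should be reviewed by hand.

```python
import re


def parse_version(text: str) -> tuple[int, int, int]:
    """Extract the leading MAJOR.MINOR[.PATCH] numbers from a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    # A missing patch component defaults to 0.
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def in_affected_range(version: str) -> bool:
    """True for versions from 0.7.0 up to, but not including, 0.9.0."""
    return (0, 7, 0) <= parse_version(version) < (0, 9, 0)
```

For example, in_affected_range("0.8.5.post1") returns True, while in_affected_range("0.9.0") and in_affected_range("0.6.6") return False.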

Detection Strategies

Inventory running vLLM instances and compare installed versions against the fixed release 0.9.0
Review container images, Kubernetes pods, and Python environments for the vulnerable vllm/multimodal/hasher.py implementation
Audit application logs for multimodal inference requests where output content does not correspond to the submitted image
Test deployments with two images that share pixel bytes but differ in shape and verify that responses differ
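A probe pair for the last test can be generated without any imaging library by writing binary PGM (P5) files, a simple grayscale format. The sketch below assumes the deployment under test accepts or can be fed PGM input; if it does not, the files would need converting to another format first. The function names and file names are illustrative.

```python
from pathlib import Path


def pgm_bytes(width: int, height: int, pixels: bytes) -> bytes:
    """Encode 8-bit grayscale pixels as a binary PGM (P5) file."""
    if len(pixels) != width * height:
        raise ValueError("pixel count must equal width * height")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


def write_collision_pair(directory: Path) -> tuple[Path, Path]:
    """Write two images with the same pixel bytes but swapped dimensions."""
    pixels = bytes((x * 7) % 256 for x in range(30 * 100))
    tall = directory / "probe_30x100.pgm"
    wide = directory / "probe_100x30.pgm"
    tall.write_bytes(pgm_bytes(30, 100, pixels))
    wide.write_bytes(pgm_bytes(100, 30, pixels))
    return tall, wide
```

Submitting the first file, then the second, and comparing the two responses gives a direct check: on a fixed deployment the second response should reflect the second image rather than repeat the first.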

Monitoring Recommendations

Monitor vLLM cache metrics for abnormal hit ratios that may indicate collision-driven reuse
Log image metadata (dimensions, mode, hash) per request to support post-incident correlation
Alert on outbound responses containing data unrelated to the requesting tenant in multi-tenant inference platforms
Track network access to vLLM API endpoints and restrict exposure to trusted clients only
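The metadata logging recommendation can be turned into an active check. The following sketch assumes a request-handling layer in front of vLLM that can see each decoded image's width, height, mode, and pixel bytes; ShapeCollisionMonitor is a hypothetical class, and SHA-256 is used here only as a convenient digest.

```python
import hashlib
from collections import defaultdict


class ShapeCollisionMonitor:
    """Flag images whose pixel bytes match an earlier image of another shape."""

    def __init__(self) -> None:
        # pixel digest -> set of (width, height, mode) seen with that digest
        self._seen: dict[str, set[tuple[int, int, str]]] = defaultdict(set)

    def observe(self, width: int, height: int, mode: str, pixels: bytes) -> bool:
        """Record one image; return True if it collides with a different shape."""
        digest = hashlib.sha256(pixels).hexdigest()
        shapes = self._seen[digest]
        suspicious = bool(shapes) and (width, height, mode) not in shapes
        shapes.add((width, height, mode))
        return suspicious


monitor = ShapeCollisionMonitor()
payload = bytes(3000)
first = monitor.observe(30, 100, "L", payload)    # nothing seen yet
repeat = monitor.observe(30, 100, "L", payload)   # same image again
swapped = monitor.observe(100, 30, "L", payload)  # same bytes, new shape
```

Only swapped is True. The dictionary grows without bound in this form, so a production version would need eviction or a time window.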

How to Mitigate CVE-2025-46722

Immediate Actions Required

Upgrade vLLM to version 0.9.0 or later, which contains the fix in commit 99404f5
Identify all vLLM deployments running versions 0.7.0 through 0.8.x and schedule remediation
Flush existing multimodal caches after upgrading to remove entries computed with the vulnerable hash function
Restrict network access to vLLM inference endpoints until patching is complete

Patch Information

The fix is included in vLLM 0.9.0 via pull request vllm-project/vllm#17378 and commit 99404f53c72965b41558aceb1bc2380875f5d848. Full details are available in GitHub Security Advisory GHSA-c65p-x677-fgj6. The patch reroutes image serialization through np.array(obj.convert("RGBA")) and adds dtype and shape fields to ndarray serialization.

Workarounds

Disable the multimodal input cache on vLLM serving instances if upgrading is not immediately feasible
Place vLLM behind an authenticated reverse proxy and segregate tenants into isolated inference instances
Pre-normalize all incoming images to a fixed canonical size and mode before submission to vLLM

bash

# Upgrade vLLM to the patched release
pip install --upgrade "vllm>=0.9.0"

# Verify installed version
python -c "import vllm; print(vllm.__version__)"

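# Optional: clear any on-disk multimodal cache after upgrade. The path below is
# deployment-specific; if the cache is held in memory, restart the server instead.
rm -rf ~/.cache/vllm/multimodal/*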

