fix(deps): update dependency vllm to ^0.9.0 [security] #119
This PR contains the following updates:
| Package | Change |
| --- | --- |
| vllm | `^0.5.0` -> `^0.9.0` |
GitHub Vulnerability Alerts
CVE-2025-24357
Description
vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load model checkpoints downloaded from Hugging Face. It uses the torch.load function with the weights_only parameter left at its default value of False. The security warning at https://pytorch.org/docs/stable/generated/torch.load.html notes that when torch.load loads malicious pickle data, it will execute arbitrary code during unpickling.
Impact
This vulnerability can be exploited to execute arbitrary code and OS commands on the machine of a victim who fetches the pretrained repo remotely.
Note that most models now use the safetensors format, which is not vulnerable to this issue.
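For illustration, the sketch below contrasts the unsafe default with safer loading paths; it is not vLLM's actual loader, and the file paths are placeholders.

```python
# Minimal sketch (not vLLM's actual loader): safer ways to load checkpoint
# weights than torch.load() with its default weights_only=False.
import torch
from safetensors.torch import load_file

# Unsafe for untrusted files: the default weights_only=False unpickles
# arbitrary objects and can execute attacker-controlled code.
# state_dict = torch.load("pytorch_model.bin")

# Safer: restrict unpickling to plain tensors/containers
# (fully effective only on PyTorch >= 2.6.0, see GHSA-ggpf-24jw-3fcw below).
state_dict = torch.load("pytorch_model.bin", weights_only=True, map_location="cpu")

# Safest: prefer the safetensors format, which stores raw tensors and
# involves no pickle at all.
state_dict = load_file("model.safetensors")
```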
References
CVE-2025-25183
Summary
Maliciously constructed prompts can lead to hash collisions, resulting in prefix cache reuse, which can interfere with subsequent responses and cause unintended behavior.
Details
vLLM's prefix caching makes use of Python's built-in hash() function. As of Python 3.12, the behavior of hash(None) has changed to be a predictable constant value. This makes it more feasible that someone could try to exploit hash collisions.
Impact
The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use.
Solution
We address this problem by initializing hashes in vLLM with a value that is no longer constant and predictable; it will be different each time vLLM runs. This restores the behavior of Python versions prior to 3.12.
Using a hashing algorithm that is less prone to collision (sha256, for example) would be the best way to avoid the possibility of a collision. However, it would impact both performance and memory footprint. Hash collisions may still occur, though they are no longer straightforward to predict.
To give an idea of the likelihood of a collision, for randomly generated hash values (assuming the hash generation built into Python is uniformly distributed), with a cache capacity of 50,000 messages and an average prompt length of 300, a collision will occur on average once every 1 trillion requests.
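A minimal sketch of the idea behind the fix is shown below; the function and variable names are illustrative and do not mirror vLLM's internals.

```python
# Illustrative sketch of per-process hash seeding for prefix-cache keys;
# names are hypothetical, not vLLM's actual implementation.
import secrets
from typing import Optional, Tuple

# Chosen once per process, so cache keys are not predictable across runs.
_PREFIX_CACHE_SEED = secrets.token_bytes(16)

def prefix_block_hash(parent_hash: Optional[int], token_ids: Tuple[int, ...]) -> int:
    """Hash one block of token IDs, chained to the parent block's hash."""
    # Mixing in a random per-process seed restores the unpredictability that
    # hash(None) lost in Python 3.12.
    return hash((_PREFIX_CACHE_SEED, parent_hash, token_ids))
```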
References
CVE-2025-29770
Impact
The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesystem. This cache has been on by default in vLLM. Outlines is also available by default through the OpenAI compatible API server.
The affected code in vLLM is vllm/model_executor/guided_decoding/outlines_logits_processors.py, which unconditionally uses the cache from outlines. vLLM should have this off by default and allow administrators to opt-in due to the potential for abuse.
A malicious user can send a stream of very short decoding requests with unique schemas, resulting in an addition to the cache for each request. This can result in a Denial of Service if the filesystem runs out of space.
Note that even if vLLM was configured to use a different backend by default, it is still possible to choose outlines on a per-request basis using the `guided_decoding_backend` key of the `extra_body` field of the request.
This issue applies to the V0 engine only. The V1 engine is not affected.
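For reference, per-request backend selection through the OpenAI-compatible API looks roughly like the sketch below; the server URL, model name, and schema are placeholders, not the advisory's test setup.

```python
# Rough sketch of selecting a guided-decoding backend per request via extra_body;
# base_url and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Reply with a JSON object."}],
    extra_body={
        # vLLM-specific fields passed through extra_body:
        "guided_json": {"type": "object"},       # structured output schema
        "guided_decoding_backend": "outlines",   # per-request backend choice
    },
)
print(response.choices[0].message.content)
```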
Patches
The fix is to disable this cache by default, since it does not provide an option to limit its size. If you want to use this cache anyway, you may set the `VLLM_V0_USE_OUTLINES_CACHE` environment variable to `1`.
Workarounds
There is no way to workaround this issue in existing versions of vLLM other than preventing untrusted access to the OpenAI compatible API server.
References
GHSA-ggpf-24jw-3fcw
Description
GHSA-rh4j-5rhw-hr54 reported a vulnerability where loading a malicious model could result in code execution on the vLLM host. The fix, which specified `weights_only=True` in calls to `torch.load()`, did not solve the problem prior to PyTorch 2.6.0. PyTorch has issued a new CVE about this problem: GHSA-53q9-r3pm-6pq6.
This means that versions of vLLM using PyTorch before 2.6.0 are vulnerable to this problem.
Background Knowledge
When users install vLLM according to the official manual, the PyTorch version is pinned in the requirements.txt file, so by default installing vLLM also installs PyTorch 2.5.1.
In CVE-2025-24357, weights_only=True was used as the patch, but this is not sufficient: using weights_only=True with PyTorch 2.5.1 and earlier is still unsafe, and we used this interface to demonstrate that it is not safe.
Fix
Update the PyTorch version to 2.6.0.
Credit
This vulnerability was found by Ji'an Zhou and Li'shuo Song.
CVE-2025-30202
Impact
In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens an `XPUB` ZeroMQ socket and binds it to ALL interfaces. While the socket is always opened for a multi-node deployment, it is only used when doing tensor parallelism across multiple hosts.
Any client with network access to this host can connect to this `XPUB` socket unless its port is blocked by a firewall. Once connected, these arbitrary clients will receive all of the same data broadcasted to all of the secondary vLLM hosts. This data is internal vLLM state information that is not useful to an attacker.
By potentially connecting to this socket many times and not reading data published to them, an attacker can also cause a denial of service by slowing down or potentially blocking the publisher.
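To illustrate why binding an `XPUB` socket to all interfaces exposes it, the minimal pyzmq sketch below shows that any client able to reach the port can subscribe and receive whatever the publisher sends; the port and payload are placeholders, and this is not vLLM code.

```python
# Minimal pyzmq sketch (not vLLM code): an XPUB socket bound to all interfaces
# accepts subscriptions from any client that can reach the port.
import zmq

ctx = zmq.Context()

# "Primary host" side: XPUB bound to every interface on an arbitrary port.
pub = ctx.socket(zmq.XPUB)
pub.bind("tcp://*:5556")

# "Arbitrary client" side: nothing authenticates the subscriber.
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5556")
sub.setsockopt(zmq.SUBSCRIBE, b"")  # subscribe to all topics

# XPUB surfaces the subscription event to the publisher; after that, data
# flows to every connected subscriber, wanted or not.
print(pub.recv())          # b"\x01" subscription notification
pub.send(b"internal vLLM state")
print(sub.recv())          # the subscriber receives the broadcast
```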
Detailed Analysis
The `XPUB` socket in question is created here:
https://github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py#L236-L237
Data is published over this socket via `MessageQueue.enqueue()`, which is called by `MessageQueue.broadcast_object()`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/device_communicators/shm_broadcast.py#L452-L453
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/device_communicators/shm_broadcast.py#L475-L478
The `MessageQueue.broadcast_object()` method is called by the `GroupCoordinator.broadcast_object()` method in `parallel_state.py`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L364-L366
The broadcast over ZeroMQ is only done if the `GroupCoordinator` was created with `use_message_queue_broadcaster` set to `True`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L216-L219
The only case where `GroupCoordinator` is created with `use_message_queue_broadcaster` is the coordinator for the tensor parallelism group:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L931-L936
To determine what data is broadcast to the tensor parallelism group, we must continue tracing. `GroupCoordinator.broadcast_object()` is called by `GroupCoordinator.broadcast_tensor_dict()`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L489
which is called by `broadcast_tensor_dict()` in `communication_op.py`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/communication_op.py#L29-L34
If we look at `_get_driver_input_and_broadcast()` in the V0 `worker_base.py`, we'll see how this tensor dict is formed:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/worker/worker_base.py#L332-L352
but the data actually sent over ZeroMQ is the `metadata_list` portion that is split from this `tensor_dict`. The tensor parts are sent via `torch.distributed`, and only metadata about those tensors is sent via ZeroMQ:
https://github.com/vllm-project/vllm/blob/54a66e5fee4a1ea62f1e4c79a078b20668e408c6/vllm/distributed/parallel_state.py#L61-L83
Patches
Workarounds
Prior to the fix, your options include blocking untrusted network access to the `XPUB` socket. Note that the port used is random.
References
CVE-2025-46570
This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.
Description
When a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.
For instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim's input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.
Unlike token-by-token sharing mechanisms, vLLM’s chunk-based approach (PageAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim's prompt at the chunk level.
Environment
Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.
Leakage
We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.
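For context, TTFT can be measured from a streaming request roughly as in the sketch below; the endpoint, model name, and prompt are placeholders, and this is not the benchmarking harness used in our tests.

```python
# Rough TTFT (time to first token) measurement against an OpenAI-compatible
# endpoint; base_url, model, and prompt are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def measure_ttft(prompt: str) -> float:
    """Return seconds from request start until the first streamed token arrives."""
    start = time.perf_counter()
    stream = client.completions.create(
        model="my-model",
        prompt=prompt,
        max_tokens=16,
        stream=True,
    )
    for _ in stream:  # the first streamed event marks the first generated token
        return time.perf_counter() - start
    return float("inf")

print(measure_ttft("example prompt"))
```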

Results
In our experiment, we analyzed the response-time differences between cache hits and misses in vLLM's PageAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we found the timing gap distinguishable enough to reliably identify prefix matches.
Fixes
Release Notes
vllm-project/vllm (vllm)
v0.9.0
Compare Source
Highlights
This release features 649 commits, from 215 contributors (82 new contributors!)
- FlashInfer: `pip install https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.5%2Bcu128torch2.7-cp38-abi3-linux_x86_64.whl`, then set `VLLM_ATTENTION_BACKEND=FLASHINFER` for better performance.
Notable Changes
- `top_k` can now be disabled with `0` (`-1` is still accepted for now) (#17773)
- The default seed is now `0` for the V1 Engine, meaning that different vLLM runs now yield the same outputs even if `temperature > 0`. This does not modify the random state in user code, since workers are run in separate processes unless `VLLM_USE_V1_MULTIPROCESSING=0`. (#17929, #18741)
Model Enhancements
- Install `transformers` (from source) to use Falcon-H1.
Performance, Production and Scaling
- `torchrun` (#17827)
Security
- `VLLM_ALLOW_INSECURE_SERIALIZATION` env var (#17490)
Features
- `deprecated=True` (#17426)
- `chat_template_kwargs` in `LLM.chat` (#17356), `/classify` endpoint (#17032), truncation control for embedding models (#14776), `cached_tokens` in response usage (#18149)
- `nvidia/DeepSeek-R1-FP4` (#16362), Quark MXFP4 format (#16943), AutoRound (#17850), torchao models with `AOPerModuleConfig` (#17826), CUDA Graph support for V1 GGUF support (#18646)
- `--enable-reasoning` (#17452)
- `tool_choice: required` for Xgrammar (#17845), Structural Tag with Guidance backend (#17333)
Hardwares
Documentation
- `--torch-backend=auto` (#18505)
Developer Facing
- `vllm.multimodal` (#18031)
- `ruff format` (#17656, #18068, #18400)
What's Changed
- `numel()` downcast in fused_layernorm_dynamic_per_token_quant.cu by @r-barnes in https://github.com/vllm-project/vllm/pull/17316
- `'<string>'` filepath by @zou3519 in https://github.com/vllm-project/vllm/pull/17330
- `pre-commit autoupdate` by @hmellor in https://github.com/vllm-project/vllm/pull/17380
- `chat_template_kwargs` in `LLM.chat` by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/17356
- `cutlass_mla_decode` for ROCm build by @tywuAMD in https://github.com/vllm-project/vllm/pull/17289
- `python3 setup.py develop` with standard `pip install --e` on TPU by @NickLucche in https://github.com/vllm-project/vllm/pull/17374
- `ModelConfig` by @hmellor in https://github.com/vllm-project/vllm/pull/17130
- `logger.info_once` by @hmellor in https://github.com/vllm-project/vllm/pull/17416
- `ObservabilityConfig` by @hmellor in https://github.com/vllm-project/vllm/pull/17453
- `awscli` dependency by @DarkLight1337 in https://github.com/vllm-project/vllm/pull/17532
- `arg_utils.py` to be in their final groups by @hmellor in https://github.com/vllm-project/vllm/pull/17531
- `pt_load_map_location` to allow loading to cuda by @jerryzh168 in https://github.com/vllm-project/vllm/pull/16869
Configuration
📅 Schedule: Branch creation - "" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Enabled.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.