Commit 6ae196d
[ci, vllm] chore: update vllm-omni 0.18.0 official release and Miscellaneous (#5809)
### What does this PR do?

> Add **concise** overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

- **vLLM / vllm-omni 0.18.0**: install the official release in the vLLM-Omni CI workflow (replacing a git SHA install); the release adds TP support.
- Aligns the FlowGRPO `QwenImagePipelineWithLogProb` example with upstream's test pipeline `__init__` pattern [in vllm-omni](https://github.com/vllm-project/vllm-omni/blob/v0.18.0/tests/e2e/offline_inference/custom_pipeline/qwen_image_pipeline_with_logprob.py).
- Updates Omni sampling tests to use `true_cfg_scale` instead of `guidance_scale` for Qwen-Image-style CFG.
- Enables `tensor_model_parallel_size = 2` in the diffusion agent loop test.

**Remark:** `tiny-random/Qwen-Image` has `num_attention_heads` set to 1 in its transformer config, so we create a temporary copy of the model with a TP-compatible head count in order to properly test TP behavior.

### Checklist Before Starting

- [x] Search for similar PRs. Paste at least one query link here: ...
- [x] Format the PR title as `[{modules}] {type}: {description}` (This will be checked by the CI)
  - `{modules}` include `fsdp`, `megatron`, `veomni`, `sglang`, `vllm`, `rollout`, `trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`, `ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`, `env`, `tool`, `ckpt`, `doc`, `data`, `cfg`, `reward`, `fully_async`, `one_step_off`
  - If this PR involves multiple modules, separate them with `,` like `[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
  - If this PR breaks any API (CLI arguments, config, function signature, etc.), add `[BREAKING]` to the beginning of the title.
    - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

`tests/experimental/agent_loop/test_diffusion_agent_loop.py` and `test_vllm_omni_generate.py` updated and passed:

```
tests/experimental/agent_loop/test_diffusion_agent_loop.py .                                [100%]
======================================== warnings summary =========================================
../../miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/requests/__init__.py:113
  /scratch/fq9hpsac/mikecheung/miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.6.3) or chardet (6.0.0.post1)/charset_normalizer (3.4.4) doesn't match a supported version!
    warnings.warn(

../../miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/ray/util/state/util.py:55
  /scratch/fq9hpsac/mikecheung/miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/ray/util/state/util.py:55: DeprecationWarning: Ray state API is no longer experimental. Please import from `ray.util.state` instead. Importing from `ray.experimental` will be deprecated in future releases.
    warnings.warn(

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/torch/jit/_script.py:362: 14 warnings
  /scratch/fq9hpsac/mikecheung/miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

tests/experimental/agent_loop/test_diffusion_agent_loop.py::test_single_turn
  /scratch/fq9hpsac/mikecheung/gitlocal/verl/tests/experimental/agent_loop/test_diffusion_agent_loop.py:63: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
    with initialize_config_dir(config_dir=os.path.abspath("verl/trainer/config")):

tests/experimental/agent_loop/test_diffusion_agent_loop.py::test_single_turn
  /scratch/fq9hpsac/mikecheung/miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/ray/_private/worker.py:2052: FutureWarning: Tip: In future versions of Ray, Ray will no longer override accelerator visible devices env var if num_gpus=0 or num_gpus=None (default). To enable this behavior and turn off this error message, set RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
    warnings.warn(

tests/experimental/agent_loop/test_diffusion_agent_loop.py::test_single_turn
  /scratch/fq9hpsac/mikecheung/miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/pydub/utils.py:14: DeprecationWarning: 'audioop' is deprecated and slated for removal in Python 3.13
    import audioop

tests/experimental/agent_loop/test_diffusion_agent_loop.py::test_single_turn
  /scratch/fq9hpsac/mikecheung/miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
    warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

tests/experimental/agent_loop/test_diffusion_agent_loop.py::test_single_turn
  /scratch/fq9hpsac/mikecheung/miniforge3/envs/vllm-omni-dev/lib/python3.12/site-packages/vllm_omni/entrypoints/openai/protocol/audio.py:112: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
    class CreateAudio(BaseModel):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
========================= 1 passed, 23 warnings in 105.69s (0:01:45) ==========================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute
```

> For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### API and Usage Example

NA

> Demonstrate how the API changes if any, and provide usage example(s) if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the specific changes.

NA

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

- [x] Read the [Contribute Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [x] Apply [pre-commit checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting): `pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always`
- [x] Add / Update [the documentation](https://github.com/volcengine/verl/tree/main/docs).
- [x] Add unit or end-to-end test(s) to [the CI workflow](https://github.com/volcengine/verl/tree/main/.github/workflows) to cover all the code. If not feasible, explain why: ...
- [x] Once your PR is ready for CI, send a message in [the `ci-request` channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the `verl` Slack workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ). (If not accessible, please try [the Feishu group (飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)
- [x] If your PR is related to the `recipe` submodule, please also update the reference to the submodule commit via `git submodule update --remote` or `cd recipe && git pull origin main`.
1 parent 4b9c14f commit 6ae196d

File tree

4 files changed: +74 −79 lines changed

.github/workflows/vllm_omni.yml — 1 addition & 1 deletion

```diff
@@ -111,7 +111,7 @@ jobs:
           pip3 install --no-deps -e .
       - name: Install vllm-omni
         run: |
-          pip3 install git+https://github.com/vllm-project/vllm-omni.git@a90a769
+          pip3 install 'vllm-omni==0.18.0'
      - name: Test vLLM Omni generate
        run: |
          ray stop --force
```

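Pinning an exact release instead of a moving git SHA makes the CI environment reproducible and comparable across runs. As a minimal sketch of why exact pins are deterministic, the hypothetical helper below (not part of the workflow or repo) compares release version strings as integer tuples:

```python
# Hypothetical helper, not from this repo: turn a release string into a
# comparable tuple, illustrating that an exact pin like 'vllm-omni==0.18.0'
# selects one well-ordered point in the release sequence, unlike a git SHA.
def parse_version(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))

pinned = parse_version("0.18.0")
assert parse_version("0.17.9") < pinned < parse_version("0.18.1")
```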
examples/flowgrpo_trainer/vllm_omni/pipeline_qwenimage.py — 1 addition & 39 deletions

```diff
@@ -15,15 +15,10 @@
 from typing import Any, Literal
 
 import torch
-from diffusers.models.autoencoders.autoencoder_kl_qwenimage import AutoencoderKLQwenImage
-from transformers import Qwen2_5_VLForConditionalGeneration
 from vllm_omni.diffusion.data import DiffusionOutput, OmniDiffusionConfig
 from vllm_omni.diffusion.distributed.utils import get_local_device
-from vllm_omni.diffusion.model_loader.diffusers_loader import DiffusersPipelineLoader
 from vllm_omni.diffusion.models.qwen_image import QwenImagePipeline
-from vllm_omni.diffusion.models.qwen_image.qwen_image_transformer import QwenImageTransformer2DModel
 from vllm_omni.diffusion.request import OmniDiffusionRequest
-from vllm_omni.diffusion.utils.tf_utils import get_transformer_config_kwargs
 
 from ..scheduler import FlowMatchSDEDiscreteScheduler
 
@@ -38,19 +33,7 @@ def _maybe_to_cpu(v):
 # This is compatible with API of vllm-omni custom pipeline
 class QwenImagePipelineWithLogProb(QwenImagePipeline):
     def __init__(self, *, od_config: OmniDiffusionConfig, prefix: str = ""):
-        super(QwenImagePipeline, self).__init__()
-        self.od_config = od_config
-        self.parallel_config = od_config.parallel_config
-        self.weights_sources = [
-            DiffusersPipelineLoader.ComponentSource(
-                model_or_path=od_config.model,
-                subfolder="transformer",
-                revision=None,
-                prefix="transformer.",
-                fall_back_to_pt=True,
-            )
-        ]
-
+        super().__init__(od_config=od_config, prefix=prefix)
         self.device = get_local_device()
         model = od_config.model
         # Check if model is a local path
@@ -59,27 +42,6 @@ def __init__(self, *, od_config: OmniDiffusionConfig, prefix: str = ""):
         self.scheduler = FlowMatchSDEDiscreteScheduler.from_pretrained(
             model, subfolder="scheduler", local_files_only=local_files_only
         )
-        self.text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
-            model, subfolder="text_encoder", local_files_only=local_files_only
-        )
-        self.vae = AutoencoderKLQwenImage.from_pretrained(model, subfolder="vae", local_files_only=local_files_only).to(
-            self.device
-        )
-        transformer_kwargs = get_transformer_config_kwargs(od_config.tf_model_config, QwenImageTransformer2DModel)
-
-        self.transformer = QwenImageTransformer2DModel(od_config=od_config, **transformer_kwargs)
-
-        self.stage = None
-
-        self.vae_scale_factor = 2 ** len(self.vae.temperal_downsample) if getattr(self, "vae", None) else 8
-        # QwenImage latents are turned into 2x2 patches and packed.
-        # This means the latent width and height has to be divisible
-        # by the patch size. So the vae scale factor is multiplied by the patch size to account for this
-        # self.image_processor = VaeImageProcessor(
-        #     vae_scale_factor=self.vae_scale_factor * 2
-        # )
-        self.prompt_template_encode_start_idx = 34
-        self.default_sample_size = 128
 
     def _get_qwen_prompt_embeds(
         self,
```

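The refactor above replaces hand-rolled component construction with a call to the parent `__init__`, then overrides only the scheduler. A minimal sketch of this delegate-then-override pattern, using stand-in classes rather than the real vllm-omni pipeline types:

```python
# Stand-in classes (not the real vllm-omni API) illustrating the
# delegate-then-override __init__ pattern used by the refactor.
class BasePipeline:
    def __init__(self, *, model: str, prefix: str = ""):
        self.model = model
        self.prefix = prefix
        self.scheduler = "default_scheduler"  # parent wires up all components


class PipelineWithLogProb(BasePipeline):
    def __init__(self, *, model: str, prefix: str = ""):
        # Let the parent build every component once...
        super().__init__(model=model, prefix=prefix)
        # ...then replace only the piece that must differ.
        self.scheduler = "flow_match_sde_scheduler"


pipe = PipelineWithLogProb(model="tiny-random/Qwen-Image")
assert pipe.scheduler == "flow_match_sde_scheduler"
assert pipe.model == "tiny-random/Qwen-Image"
```

Delegating keeps the subclass in sync with upstream: when the parent's component setup changes (as it did between the pinned git SHA and 0.18.0), the override only needs to track the one component it swaps.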
tests/experimental/agent_loop/test_diffusion_agent_loop.py — 69 additions & 36 deletions

```diff
@@ -12,6 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import os
+import shutil
+import tempfile
 
 import numpy as np
 import pytest
@@ -24,49 +26,80 @@
 pytestmark = pytest.mark.vllm_omni
 
 
+def _create_tp_compatible_model(parent_dir, src_model_path, num_attention_heads=2):
+    """Copy base model and recreate transformer on-the-fly with TP-compatible head count.
+
+    The tiny-random Qwen-Image model has num_attention_heads=1 in its transformer config,
+    which is not divisible by tensor_model_parallel_size=2. This helper copies the full
+    model directory (vae, text_encoder, tokenizer, scheduler) and overwrites only the
+    transformer component with a freshly-initialized one that has the desired head count.
+    """
+    from diffusers import QwenImageTransformer2DModel
+
+    dst = os.path.join(parent_dir, "Qwen-Image")
+    shutil.copytree(src_model_path, dst)
+
+    transformer = QwenImageTransformer2DModel(
+        num_attention_heads=num_attention_heads,
+        attention_head_dim=32,
+        num_layers=2,
+        in_channels=64,
+        out_channels=16,
+        patch_size=2,
+        joint_attention_dim=32,
+        axes_dims_rope=(8, 12, 12),
+        guidance_embeds=False,
+    )
+    transformer.save_pretrained(os.path.join(dst, "transformer"))
+
+    return dst
+
+
 @pytest.fixture
 def init_config() -> DictConfig:
     from hydra import compose, initialize_config_dir
 
     with initialize_config_dir(config_dir=os.path.abspath("verl/trainer/config")):
         config = compose(config_name="diffusion_trainer")
 
-    model_path = os.path.expanduser("~/models/tiny-random/Qwen-Image")
-    config.actor_rollout_ref.model.path = model_path
-    config.actor_rollout_ref.model.tokenizer_path = os.path.join(model_path, "tokenizer")
-    config.actor_rollout_ref.rollout.name = "vllm_omni"
-    config.actor_rollout_ref.rollout.mode = "async"
-    config.actor_rollout_ref.rollout.enforce_eager = True
-    config.actor_rollout_ref.rollout.n = 4
-    config.actor_rollout_ref.rollout.num_inference_steps = 10
-    config.actor_rollout_ref.rollout.calculate_log_probs = True
-    config.actor_rollout_ref.rollout.agent.num_workers = 2
-    config.actor_rollout_ref.rollout.agent.default_agent_loop = "diffusion_single_turn_agent"
-    tokenizer_max_length = 1024
-    prompt_template_encode_start_idx = 34
-    max_length = tokenizer_max_length + prompt_template_encode_start_idx
-
-    with open_dict(config.actor_rollout_ref.model.extra_configs):
-        config.actor_rollout_ref.model.extra_configs.true_cfg_scale = 4.0
-        config.actor_rollout_ref.model.extra_configs.max_sequence_length = max_length
-        config.actor_rollout_ref.model.extra_configs.noise_level = 1.0
-        config.actor_rollout_ref.model.extra_configs.sde_window_size = 2
-        config.actor_rollout_ref.model.extra_configs.sde_window_range = [0, 5]
-
-    config.actor_rollout_ref.rollout.nnodes = 1
-
-    qwen_pipeline = "examples.flowgrpo_trainer.vllm_omni.pipeline_qwenimage.QwenImagePipelineWithLogProb"
-    config.actor_rollout_ref.rollout.engine_kwargs.vllm_omni = {"custom_pipeline": qwen_pipeline}
-    config.reward.reward_manager.name = "image"
-    config.trainer.n_gpus_per_node = 4
-
-    config.data.apply_chat_template_kwargs = dict(max_length=max_length, padding=True, truncation=True)
-    config.data.max_prompt_length = max_length
-    config.actor_rollout_ref.rollout.max_model_len = max_length
-
-    # TODO (mike): test with TP later
-    config.actor_rollout_ref.rollout.tensor_model_parallel_size = 1
-    return config
+    base_model_path = os.path.expanduser("~/models/tiny-random/Qwen-Image")
+    with tempfile.TemporaryDirectory() as tmp_dir:
+        model_path = _create_tp_compatible_model(tmp_dir, base_model_path, num_attention_heads=2)
+        config.actor_rollout_ref.model.path = model_path
+        config.actor_rollout_ref.model.tokenizer_path = os.path.join(model_path, "tokenizer")
+        config.actor_rollout_ref.rollout.name = "vllm_omni"
+        config.actor_rollout_ref.rollout.mode = "async"
+        config.actor_rollout_ref.rollout.enforce_eager = True
+        config.actor_rollout_ref.rollout.n = 4
+        config.actor_rollout_ref.rollout.num_inference_steps = 10
+        config.actor_rollout_ref.rollout.calculate_log_probs = True
+        config.actor_rollout_ref.rollout.agent.num_workers = 2
+        config.actor_rollout_ref.rollout.agent.default_agent_loop = "diffusion_single_turn_agent"
+        tokenizer_max_length = 1024
+        prompt_template_encode_start_idx = 34
+        max_length = tokenizer_max_length + prompt_template_encode_start_idx
+
+        with open_dict(config.actor_rollout_ref.model.extra_configs):
+            config.actor_rollout_ref.model.extra_configs.true_cfg_scale = 4.0
+            config.actor_rollout_ref.model.extra_configs.max_sequence_length = max_length
+            config.actor_rollout_ref.model.extra_configs.noise_level = 1.0
+            config.actor_rollout_ref.model.extra_configs.sde_window_size = 2
+            config.actor_rollout_ref.model.extra_configs.sde_window_range = [0, 5]
+
+        config.actor_rollout_ref.rollout.nnodes = 1
+
+        qwen_pipeline = "examples.flowgrpo_trainer.vllm_omni.pipeline_qwenimage.QwenImagePipelineWithLogProb"
+        config.actor_rollout_ref.rollout.engine_kwargs.vllm_omni = {"custom_pipeline": qwen_pipeline}
+        config.reward.reward_manager.name = "image"
+        config.trainer.n_gpus_per_node = 4
+
+        config.data.apply_chat_template_kwargs = dict(max_length=max_length, padding=True, truncation=True)
+        config.data.max_prompt_length = max_length
+        config.actor_rollout_ref.rollout.max_model_len = max_length
+
+        config.actor_rollout_ref.rollout.tensor_model_parallel_size = 2
+
+        yield config
 
 
 def test_single_turn(init_config):
```

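The helper in the diff exists because tensor parallelism shards attention heads evenly across ranks, so the head count must be divisible by the TP size; `tiny-random/Qwen-Image` ships with a single head, which cannot be split across two ranks. A small illustrative check (hypothetical function, not from the codebase):

```python
# Hypothetical illustration of the TP constraint the test helper works around:
# attention heads are sharded evenly, so the count must divide by the TP size.
def heads_per_rank(num_attention_heads: int, tensor_model_parallel_size: int) -> int:
    if num_attention_heads % tensor_model_parallel_size != 0:
        raise ValueError(
            f"num_attention_heads={num_attention_heads} is not divisible by "
            f"tensor_model_parallel_size={tensor_model_parallel_size}"
        )
    return num_attention_heads // tensor_model_parallel_size

assert heads_per_rank(2, 2) == 1  # patched tiny-random model: 1 head per rank
# heads_per_rank(1, 2) would raise: the unpatched config has only 1 head
```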
tests/workers/rollout/rollout_vllm/test_vllm_omni_generate.py — 3 additions & 3 deletions

```diff
@@ -160,7 +160,7 @@ def test_generate(init_server):
         prompt_ids=prompt_ids,
         sampling_params={
             "num_inference_steps": 10,
-            "guidance_scale": 4.0,
+            "true_cfg_scale": 4.0,
             "height": 512,
             "width": 512,
         },
@@ -195,7 +195,7 @@ def test_generate_with_logprobs(init_server):
         prompt_ids=prompt_ids,
         sampling_params={
             "num_inference_steps": 10,
-            "guidance_scale": 4.0,
+            "true_cfg_scale": 4.0,
             "height": 512,
             "width": 512,
             "logprobs": True,
@@ -244,7 +244,7 @@ def test_generate_concurrent(init_server):
         prompt_ids=_tokenize_prompt(prompts[i]),
         sampling_params={
             "num_inference_steps": 10,
-            "guidance_scale": 4.0,
+            "true_cfg_scale": 4.0,
             "height": 512,
             "width": 512,
         },
```

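All three hunks perform the same key rename in the sampling params dict: Qwen-Image-style CFG reads `true_cfg_scale` rather than `guidance_scale`. A hypothetical migration shim (not part of the PR, shown only to make the rename concrete) for callers still passing the old key:

```python
# Hypothetical shim, not from this repo: rename the old `guidance_scale`
# key to `true_cfg_scale`, the key the updated tests pass for
# Qwen-Image-style CFG.
def migrate_sampling_params(params: dict) -> dict:
    out = dict(params)
    if "guidance_scale" in out and "true_cfg_scale" not in out:
        out["true_cfg_scale"] = out.pop("guidance_scale")
    return out

old = {"num_inference_steps": 10, "guidance_scale": 4.0, "height": 512, "width": 512}
new = migrate_sampling_params(old)
assert new["true_cfg_scale"] == 4.0 and "guidance_scale" not in new
```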