[Core] Supports stage abstraction in the diffusion model #391
hsliuustc0106 merged 19 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
@@ -33,25 +69,465 @@ def __init__(self, *args, **kwargs):
    args[0] = model
Avoid assigning to immutable args tuple in Omni init
When Omni is instantiated with the model passed positionally (e.g. Omni("Qwen/Qwen-Image")), the constructor assigns to args[0], but args is a tuple, so the assignment raises TypeError: 'tuple' object does not support item assignment before any initialization occurs. This makes the new entrypoint unusable for positional calls that previously worked with OmniLLM; callers must now pass model as a keyword or hit a hard crash.
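A minimal sketch of one possible fix (the class body and `_resolve_model` helper here are hypothetical; only the tuple-reassignment pattern is the point): since `args` is an immutable tuple, rebuild it instead of assigning into it.

```python
def _resolve_model(name: str) -> str:
    # Placeholder for whatever normalization the real constructor performs.
    return name

class Omni:
    def __init__(self, *args, **kwargs):
        if args:
            # `args[0] = model` would raise TypeError on a tuple;
            # construct a new tuple with the resolved model instead.
            args = (_resolve_model(args[0]),) + args[1:]
        elif "model" in kwargs:
            kwargs["model"] = _resolve_model(kwargs["model"])
        self.model = args[0] if args else kwargs.get("model")
```

With this pattern, both `Omni("Qwen/Qwen-Image")` and `Omni(model="Qwen/Qwen-Image")` initialize without error.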
Looking forward to it.

Let's get the initial version done before the 1230 release.

Does this PR support reusing vLLM as the text-encoding stage for diffusion models?

Not yet. This PR only encapsulates the entire diffusion model into a single stage first.

Does this PR mean that all models under the diffusion folder can be deployed using YAML?

Through some offline discussions, we decided that this version will not require providing a YAML file for the diffusion model. Instead, the system will automatically generate a YAML file for the current diffusion model.
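A hypothetical sketch of what such an auto-generated single-stage config might look like (the field names below are assumptions, not the PR's actual schema; the PR does ship a `QwenImagePipeline.yaml`):

```yaml
# Illustrative only: plausible shape of an auto-generated one-stage
# diffusion config; real field names may differ.
stages:
  - name: diffusion
    stage_type: diffusion
    model: Qwen/Qwen-Image
    devices: "0"
```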
Which group are you discussing this in? Can you add me?

Fixes #340
if "dtype" in kwargs:
    kwargs["dtype"] = str(kwargs["dtype"])
# TODO: hack, calculate devices based on parallel config.
devices = "0"
This may cause problems, but we can address it later.
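One way the hardcoded `devices = "0"` could later be replaced, as the TODO suggests, is by deriving the device list from the parallel config. A minimal sketch (the function and parameter names are assumptions):

```python
def devices_from_parallel_config(first_device: int, world_size: int) -> str:
    # Build a comma-separated device string from a starting device index
    # and the parallel world size, e.g. (0, 4) -> "0,1,2,3".
    return ",".join(str(first_device + i) for i in range(world_size))
```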
Each stage will create appropriate instances (AsyncOmniLLM or AsyncOmniDiffusion)
based on stage_type in YAML config.
"""
init_sleep_seconds = kwargs.get("init_sleep_seconds", 20)
@tzhouam init_sleep_seconds needs to be fixed in your PR
Will unify the sleep-related args in my PR later.
self.stage_configs = load_stage_configs_from_model(model, base_engine_args)
self.stage_configs = load_stage_configs_from_model(model)
if not self.stage_configs:
    default_stage_cfg = [
Do we have a mechanism to prevent errors if default_stage_cfg is not suitable, e.g., OOM?
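One possible guard for the concern above (a sketch; the function name and the idea of comparing a memory estimate against free device memory are assumptions, not part of this PR): fail fast with an actionable message instead of OOMing mid-initialization.

```python
def check_stage_memory(required_bytes: int, free_bytes: int) -> None:
    # Reject the fallback default_stage_cfg up front if its estimated
    # footprint exceeds the free memory on the target device.
    if required_bytes > free_bytes:
        raise RuntimeError(
            f"default_stage_cfg needs ~{required_bytes} bytes but only "
            f"{free_bytes} are free; provide an explicit stage YAML instead"
        )
```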
init_timeout = kwargs.get("init_timeout", 300)
worker_backend = kwargs.get("worker_backend", "multi_process")
ray_address = kwargs.get("ray_address", None)
batch_timeout = kwargs.get("batch_timeout", 10)
Why do we set 10 here? Is it in ms or seconds?
"""
init_sleep_seconds = kwargs.get("init_sleep_seconds", 20)
shm_threshold_bytes = kwargs.get("shm_threshold_bytes", 65536)
init_timeout = kwargs.get("init_timeout", 300)
Is it in seconds? What's the relationship between init_timeout and init_sleep_seconds? Do we have to check init_sleep_seconds * num_stages < init_timeout?
idx, cfg = idx_cfg
return idx, OmniStage(cfg)

with ThreadPoolExecutor(max_workers=min(len(self.stage_configs), max(1, os.cpu_count() or 1))) as executor:
Will the deployment strategy affect the way we build stages? For example, if we deploy one stage per device, how do we choose the CPU workers?
Each stage will create appropriate instance (OmniLLM or OmniDiffusion)
based on stage_type in YAML config (handled in omni_stage.py).
"""
init_sleep_seconds = kwargs.get("init_sleep_seconds", 20)
Same question: we need to take care of init_sleep_seconds, init_timeout, and batch_timeout.
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
Signed-off-by: yinpeiqi <yinpeiqi809@gmail.com>
…t#391) Signed-off-by: Chenguang ZHENG <645327136@qq.com> Signed-off-by: yinpeiqi <yinpeiqi809@gmail.com> Co-authored-by: yinpeiqi <yinpeiqi809@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
To align with our initial vision and to enable future end-to-end optimization, we are gradually introducing Stage abstractions for Diffusion.
[Core] Add Stage Abstraction Support for Diffusion Models
Overview
This PR adds stage abstraction support for the diffusion model component of vLLM-Omni, achieving a consistent architectural design with LLM models. It also includes code refactoring to unify the sampling parameter interface, improving code maintainability and extensibility.
Major Changes
1. Code Refactoring
- Merged omni_llm.py into omni.py, unifying entry point management
- Updated omni_stage.py to support stage configuration and management for diffusion models

2. Diffusion Stage Abstraction Support

- Added omni_sampling_params.py with an OmniSamplingParams class to uniformly manage sampling parameters for both LLM and diffusion models, building on SamplingParams
- Updated diffusion_engine.py to support stage abstraction
- Updated gpu_worker.py
- Added QwenImagePipeline.yaml

3. Example Updates

- Updated the text_to_image.py example to demonstrate how to use the new stage abstraction interface

4. Outputs
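To illustrate the idea of a unified sampling-params object, here is a hypothetical sketch; the real OmniSamplingParams in this PR wraps vLLM's SamplingParams and its fields may differ. The point is one object that carries both LLM-side and diffusion-side knobs, so each stage picks the subset it understands.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OmniSamplingParams:
    # LLM-side knobs (names chosen for illustration)
    temperature: float = 1.0
    max_tokens: Optional[int] = None
    # Diffusion-side knobs
    num_inference_steps: int = 50
    guidance_scale: float = 7.5

    def for_llm(self) -> dict:
        # Subset consumed by an LLM stage.
        return {"temperature": self.temperature, "max_tokens": self.max_tokens}

    def for_diffusion(self) -> dict:
        # Subset consumed by a diffusion stage.
        return {"num_inference_steps": self.num_inference_steps,
                "guidance_scale": self.guidance_scale}
```

A caller would build one params object per request and let each stage project out its own view, which keeps the entrypoint signature identical across stage types.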