
Commit 58fa1d9

Bounty-hunter, hsliuustc0106, and SamitHuang authored and committed

[Doc] Adding diffusion model (vllm-project#524)

Signed-off-by: dengyunyang <584797741@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com>
Co-authored-by: Samit <285365963@qq.com>
1 parent 19f349d commit 58fa1d9

File tree

4 files changed

+159
-2
lines changed


docs/.nav.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -41,6 +41,7 @@ nav:
   - Model Implementation:
     - contributing/model/README.md
     - contributing/model/adding_multi_stage_model.md
+    - contributing/model/adding_diffusion_model.md
   - CI: contributing/ci
   - Tests: contributing/tests
   - Design Documents:
```

docs/contributing/model/README.md

Lines changed: 11 additions & 2 deletions
```diff
@@ -4,7 +4,7 @@ This section provides comprehensive guidance on how to add a new model to vLLM-O
 
 ## Documentation
 
-- **[Adding a New Model Guide](adding_multi_stage_model.md)**: Complete step-by-step guide using Qwen3-Omni as an example
+- **[Adding a New Omni Model Guide](adding_multi_stage_model.md)**: Complete step-by-step guide using Qwen3-Omni as an example
 
 The guide covers:
 - Directory structure and organization
@@ -15,6 +15,15 @@ The guide covers:
 - Stage input processors
 - Testing strategies
 
+- **[Adding a New Diffusion Model Guide](adding_diffusion_model.md)**: Complete step-by-step guide using Qwen/Qwen-Image-Edit as an example
+  The guide covers:
+  - Overview
+  - Directory Structure
+  - Step-by-Step Implementation
+  - Testing
+
+
 ## Quick Start
 
-For a quick reference, see the [Adding a New Model Guide](adding_multi_stage_model.md) which walks through the complete implementation of Qwen3-Omni, a multi-stage omni-modality model.
+For a quick reference, see the [Adding a New Omni Model Guide](adding_multi_stage_model.md) and [Adding a New Diffusion Model Guide](adding_diffusion_model.md).
```
docs/contributing/model/adding_diffusion_model.md

Lines changed: 147 additions & 0 deletions
# Adding a New Diffusion Model to vLLM-Omni

This guide walks through the process of adding a new diffusion model to vLLM-Omni, using Qwen/Qwen-Image-Edit as a comprehensive example.

# Table of Contents

1. [Overview](#overview)
2. [Directory Structure](#directory-structure)
3. [Step-by-Step Implementation](#step-by-step-implementation)
4. [Testing](#testing)
5. [Adding a Model Recipe](#adding-a-model-recipe)

# Overview

When adding a new diffusion model to vLLM-Omni, additional adaptation work is required for the following reasons:

+ The new model must follow the framework's parameter-passing mechanisms and inference flow.

+ The model's default implementations must be replaced with optimized modules to achieve better performance.

The diffusion execution flow is as follows:

<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/vllm-omni-diffusion-flow.png">
    <img alt="Diffusion Flow" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/vllm-omni-diffusion-flow.png" width="55%">
  </picture>
</p>
# Directory Structure

File structure for adding a new diffusion model:

```
vllm_omni/
├── examples/
│   ├── offline_inference/
│   │   └── example script            # reuse existing if possible (e.g., image_edit.py)
│   └── online_serving/
│       └── example script
└── diffusion/
    ├── registry.py                   # registry entries
    ├── request.py                    # request info (OmniDiffusionRequest)
    └── models/your_model_name/       # model directory (e.g., qwen_image)
        └── pipeline_xxx.py           # model implementation (e.g., pipeline_qwen_image_edit.py)
```
# Step-by-Step Implementation

## Step 1: Model Implementation

The diffusion pipeline's implementation follows **HuggingFace Diffusers**; components that do not need modification can be imported directly.

### 1.1 Define the Pipeline Class

Define the pipeline class, e.g., `QwenImageEditPipeline`, and initialize all required submodules, either from HuggingFace `diffusers` or from custom implementations. In `QwenImageEditPipeline`, only `QwenImageTransformer2DModel` is re-implemented, to support optimizations such as Ulysses-SP. When adding new models in the future, you can either reuse this re-implemented `QwenImageTransformer2DModel` or extend it as needed.
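As a rough sketch of this pattern (class and component names below are illustrative stand-ins, not the actual vLLM-Omni code): keep stock `diffusers` components as-is and swap in only the optimized module.

```python
class OptimizedTransformer:
    """Stand-in for a re-implemented transformer (in the real code, the
    optimized QwenImageTransformer2DModel with Ulysses-SP support)."""

    def __call__(self, latents):
        return latents  # identity; a real model would predict noise here


class MyDiffusionPipeline:
    """Hypothetical pipeline skeleton: unmodified components (VAE, text
    encoder, scheduler) can come straight from HuggingFace diffusers;
    only the transformer is replaced."""

    def __init__(self, vae, text_encoder, scheduler, transformer):
        self.vae = vae
        self.text_encoder = text_encoder
        self.scheduler = scheduler
        self.transformer = transformer  # the swapped-in optimized module


# Construction with stubbed components, just to show the wiring:
pipe = MyDiffusionPipeline(vae=None, text_encoder=None, scheduler=None,
                           transformer=OptimizedTransformer())
```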
### 1.2 Pre-Processing and Post-Processing Extraction

Extract the pre-processing and post-processing logic from the pipeline class so that it follows vLLM-Omni's execution flow. For Qwen-Image-Edit:

```python
def get_qwen_image_edit_pre_process_func(
    od_config: OmniDiffusionConfig,
):
    """
    Define a pre-processing function that resizes input images and
    prepares them for subsequent inference.
    """
```

```python
def get_qwen_image_edit_post_process_func(
    od_config: OmniDiffusionConfig,
):
    """
    Define a post-processing function that post-processes the
    generated images.
    """
```
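These getters follow a closure pattern: they bind the config once and return the callable that the framework invokes per request. A self-contained sketch of that pattern (the function name and config fields below are hypothetical, not the real vLLM-Omni API):

```python
def get_resize_pre_process_func(od_config: dict):
    """Hypothetical getter: binds the config and returns the
    pre-processing callable."""
    max_side = od_config.get("max_side", 1024)

    def pre_process(width: int, height: int) -> tuple[int, int]:
        # Scale the longer side down to max_side, preserving aspect ratio.
        scale = min(1.0, max_side / max(width, height))
        return round(width * scale), round(height * scale)

    return pre_process


pre_process = get_resize_pre_process_func({"max_side": 512})
```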
### 1.3 Define the Forward Function

The forward function of `QwenImageEditPipeline` follows the HuggingFace `diffusers` design for the most part. The key differences are:

+ As described in the overview, arguments are passed through `OmniDiffusionRequest`, so user parameters must be read from it accordingly.

```python
prompt = req.prompt if req.prompt is not None else prompt
```

+ Pre- and post-processing are handled elsewhere by the framework, so they are skipped here.
## Step 2: Extend OmniDiffusionRequest Fields

User-provided inputs are ultimately passed to the model's forward method through `OmniDiffusionRequest`, so add the fields required by the new model here.

```python
prompt: str | list[str] | None = None
negative_prompt: str | list[str] | None = None
...
```
## Step 3: Registry

+ Register the diffusion model in `registry.py`:

```python
_DIFFUSION_MODELS = {
    # arch: (mod_folder, mod_relname, cls_name)
    ...
    "QwenImageEditPipeline": (
        "qwen_image",
        "pipeline_qwen_image_edit",
        "QwenImageEditPipeline",
    ),
    ...
}
```

+ Register the pre-processing getter function:

```python
_DIFFUSION_PRE_PROCESS_FUNCS = {
    # arch: pre_process_func
    ...
    "QwenImageEditPipeline": "get_qwen_image_edit_pre_process_func",
    ...
}
```

+ Register the post-processing getter function:

```python
_DIFFUSION_POST_PROCESS_FUNCS = {
    # arch: post_process_func
    ...
    "QwenImageEditPipeline": "get_qwen_image_edit_post_process_func",
    ...
}
```
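For intuition, a registry like this typically maps an architecture name to an import path that is resolved lazily at load time. A self-contained sketch (the dotted module-path layout below is an assumption for illustration, not the actual vLLM-Omni resolution code):

```python
_DIFFUSION_MODELS = {
    # arch: (mod_folder, mod_relname, cls_name)
    "QwenImageEditPipeline": (
        "qwen_image",
        "pipeline_qwen_image_edit",
        "QwenImageEditPipeline",
    ),
}


def model_import_path(arch: str) -> str:
    """Build the dotted path a loader could hand to importlib
    (hypothetical package layout)."""
    folder, relname, cls_name = _DIFFUSION_MODELS[arch]
    return f"vllm_omni.diffusion.models.{folder}.{relname}.{cls_name}"
```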
## Step 4: Add an Example Script

For each newly integrated model, provide an example script under `examples/` that demonstrates how to initialize the pipeline with Omni, pass in user inputs, and generate outputs.

Key points for writing the example:

+ Use the Omni entrypoint to load the model and construct the pipeline.

+ Show how to format user inputs and pass them via `omni.generate(...)`.

+ Demonstrate the common runtime arguments, such as:

  + model path or model name

  + input image(s) or prompt text

  + key diffusion parameters (e.g., inference steps, guidance scale)

  + optional acceleration backends (e.g., Cache-DiT, TeaCache)

+ Save or display the generated results so users can validate the integration.
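To make the runtime arguments concrete, here is a minimal CLI skeleton an example script might expose (flag names and defaults are illustrative, not those of the actual vLLM-Omni examples):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical argument set mirroring the bullets above.
    p = argparse.ArgumentParser(description="Image-edit example (sketch)")
    p.add_argument("--model", default="Qwen/Qwen-Image-Edit",
                   help="model path or model name")
    p.add_argument("--prompt", required=True,
                   help="edit instruction / prompt text")
    p.add_argument("--image", default=None,
                   help="path to the input image to edit")
    p.add_argument("--num-inference-steps", type=int, default=50)
    p.add_argument("--guidance-scale", type=float, default=4.0)
    p.add_argument("--output", default="output.png",
                   help="where to save the generated result")
    return p


args = build_parser().parse_args(["--prompt", "make the sky purple"])
```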
# Testing

For comprehensive testing guidelines, please refer to the [Test File Structure and Style Guide](../tests/tests_style.md).

# Adding a Model Recipe

After implementing and testing your model, please add a model recipe to the [vllm-project/recipes](https://github.com/vllm-project/recipes) repository. This helps other users understand how to use your model with vLLM-Omni.
docs/source/architecture/vllm-omni-diffusion-flow.png

316 KB (binary image file)
