# Adding a New Diffusion Model to vLLM-Omni
This guide walks through the process of adding a new diffusion model to vLLM-Omni, using Qwen/Qwen-Image-Edit as a comprehensive example.

# Table of Contents
1. [Overview](#overview)
2. [Directory Structure](#directory-structure)
3. [Step-by-Step Implementation](#step-by-step-implementation)
4. [Testing](#testing)
5. [Adding a Model Recipe](#adding-a-model-recipe)


# Overview
When adding a new diffusion model to vLLM-Omni, additional adaptation work is required for the following reasons:

+ The new model must follow the framework’s parameter-passing mechanisms and inference flow.

+ The model’s default implementations must be replaced with optimized modules to achieve better performance.

The diffusion execution flow is as follows:
<p align="center">
  <picture>
    <source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/vllm-omni-diffusion-flow.png">
    <img alt="Diffusion Flow" src="https://raw.githubusercontent.com/vllm-project/vllm-omni/refs/heads/main/docs/source/architecture/vllm-omni-diffusion-flow.png" width="55%">
  </picture>
</p>


# Directory Structure
File structure for adding a new diffusion model:

```
vllm_omni/
├── examples/
│   ├── offline_inference/
│   │   └── example script           # reuse existing if possible (e.g., image_edit.py)
│   └── online_serving/
│       └── example script
└── diffusion/
    ├── registry.py                  # registry work
    ├── request.py                   # request info
    └── models/your_model_name/      # model directory (e.g., qwen_image)
        └── pipeline_xxx.py          # model implementation (e.g., pipeline_qwen_image_edit.py)
```

# Step-by-Step Implementation
## Step 1: Model Implementation
The diffusion pipeline’s implementation follows **HuggingFace Diffusers**, and components that do not need modification can be imported directly.
### 1.1 Define the Pipeline Class
Define the pipeline class, e.g., `QwenImageEditPipeline`, and initialize all required submodules, either from HuggingFace `diffusers` or custom implementations. In `QwenImageEditPipeline`, only `QwenImageTransformer2DModel` is re-implemented to support optimizations such as Ulysses-SP. When adding new models in the future, you can either reuse this re-implemented `QwenImageTransformer2DModel` or extend it as needed.
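
The shape of such a pipeline class can be sketched as follows. This is a hedged illustration only: the class name suffix, base class, and component names are stand-ins, not the actual vLLM-Omni API.

```python
# Hypothetical sketch of a diffusion pipeline class skeleton.
# Component objects would normally come from HuggingFace diffusers or from
# custom re-implementations (e.g., an SP-enabled transformer); here they are
# opaque placeholders.
class QwenImageEditPipelineSketch:
    def __init__(self, scheduler, vae, text_encoder, transformer):
        # Store all required submodules on the pipeline instance.
        self.scheduler = scheduler
        self.vae = vae
        self.text_encoder = text_encoder
        self.transformer = transformer

    def components(self) -> dict:
        # Expose submodules by name, mirroring the diffusers convention.
        return {
            "scheduler": self.scheduler,
            "vae": self.vae,
            "text_encoder": self.text_encoder,
            "transformer": self.transformer,
        }
```

In a real integration, the transformer slot is where the re-implemented `QwenImageTransformer2DModel` would be plugged in.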

### 1.2 Pre-Processing and Post-Processing Extraction
Extract the pre-processing and post-processing logic from the pipeline class to follow vLLM-Omni’s execution flow. For Qwen-Image-Edit:
```python
def get_qwen_image_edit_pre_process_func(
    od_config: OmniDiffusionConfig,
):
    """
    Return a pre-processing function that resizes input images and
    prepares them for subsequent inference.
    """
```

```python
def get_qwen_image_edit_post_process_func(
    od_config: OmniDiffusionConfig,
):
    """
    Return a post-processing function that post-processes generated images.
    """
```
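
The getter pattern above returns a closure that captures the config. A minimal runnable sketch, assuming a simplified stand-in for `OmniDiffusionConfig` (the field name `target_size` and the resize rule are illustrative, not the real implementation):

```python
from dataclasses import dataclass


@dataclass
class OmniDiffusionConfig:
    # Hypothetical config field used only for this illustration.
    target_size: tuple[int, int] = (1024, 1024)


def get_example_pre_process_func(od_config: OmniDiffusionConfig):
    """Return a pre-processing closure bound to the given config."""

    def pre_process(image_size: tuple[int, int]) -> tuple[int, int]:
        # Illustrative rule: clamp input dimensions to the configured target.
        tw, th = od_config.target_size
        return (min(image_size[0], tw), min(image_size[1], th))

    return pre_process
```

The framework can then call the getter once at setup time and invoke the returned closure per request, without re-reading the config each step.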

### 1.3 Define the Forward Function
The forward function of `QwenImageEditPipeline` follows the HuggingFace `diffusers` design for the most part. The key differences are:
+ As described in the overview, arguments are passed through `OmniDiffusionRequest`, so user parameters must be read from it accordingly.
```python
prompt = req.prompt if req.prompt is not None else prompt
```
+ Pre- and post-processing are handled elsewhere by the framework, so the forward function skips them.

## Step 2: Extend OmniDiffusionRequest Fields
User-provided inputs are ultimately passed to the model’s forward method through `OmniDiffusionRequest`, so add the required fields here to support the new model:
```python
prompt: str | list[str] | None = None
negative_prompt: str | list[str] | None = None
...
```

## Step 3: Registry
+ Register the diffusion model in `registry.py`:
```python
_DIFFUSION_MODELS = {
    # arch: (mod_folder, mod_relname, cls_name)
    ...
    "QwenImageEditPipeline": (
        "qwen_image",
        "pipeline_qwen_image_edit",
        "QwenImageEditPipeline",
    ),
    ...
}
```
+ Register the pre-processing getter function:
```python
_DIFFUSION_PRE_PROCESS_FUNCS = {
    # arch: pre_process_func
    ...
    "QwenImageEditPipeline": "get_qwen_image_edit_pre_process_func",
    ...
}
```

+ Register the post-processing getter function:
```python
_DIFFUSION_POST_PROCESS_FUNCS = {
    # arch: post_process_func
    ...
    "QwenImageEditPipeline": "get_qwen_image_edit_post_process_func",
    ...
}
```
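
To see why the tuple is `(mod_folder, mod_relname, cls_name)`, consider how such an entry could be turned into an import path. The resolver below is an assumption about how a loader might work, not the actual vllm-omni code; the base package path is illustrative:

```python
# Sketch: mapping a registry tuple to a dotted module path for dynamic import.
_DIFFUSION_MODELS = {
    "QwenImageEditPipeline": (
        "qwen_image",
        "pipeline_qwen_image_edit",
        "QwenImageEditPipeline",
    ),
}


def module_path(arch: str, base: str = "vllm_omni.diffusion.models") -> str:
    """Build the dotted module path for a registered architecture."""
    folder, relname, _cls_name = _DIFFUSION_MODELS[arch]
    return f"{base}.{folder}.{relname}"
```

A loader would then pass this path to something like `importlib.import_module` and fetch `cls_name` from the resulting module, which is why all three tuple members are needed.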

## Step 4: Add an Example Script
For each newly integrated model, provide an example script under `examples/` to demonstrate how to initialize the pipeline with Omni, pass in user inputs, and generate outputs.
Key points for writing the example:

+ Use the Omni entrypoint to load the model and construct the pipeline.

+ Show how to format user inputs and pass them via `omni.generate(...)`.

+ Demonstrate the common runtime arguments, such as:

  + model path or model name

  + input image(s) or prompt text

  + key diffusion parameters (e.g., inference steps, guidance scale)

  + optional acceleration backends (e.g., Cache-DiT, TeaCache)

+ Save or display the generated results so users can validate the integration.
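
The points above can be sketched as an offline example outline. Note the heavy hedging: `Omni` and `generate` are the names this guide mentions, but their exact constructor arguments and signatures are assumptions here, so a tiny stub stands in for the real entrypoint to keep the outline runnable:

```python
class OmniStub:
    """Stand-in for the real Omni entrypoint; its interface is assumed."""

    def __init__(self, model: str):
        self.model = model

    def generate(self, prompt: str, num_inference_steps: int = 50) -> dict:
        # The real entrypoint would run the diffusion pipeline; the stub
        # just echoes the resolved arguments.
        return {
            "model": self.model,
            "prompt": prompt,
            "steps": num_inference_steps,
        }


def main() -> dict:
    # 1. Load the model and construct the pipeline via the entrypoint.
    omni = OmniStub(model="Qwen/Qwen-Image-Edit")
    # 2. Pass user inputs and key diffusion parameters.
    out = omni.generate("make the sky purple", num_inference_steps=30)
    # 3. A real script would save or display `out` for validation.
    return out
```

For a real example, mirror an existing script such as `image_edit.py` rather than this outline.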

# Testing
For comprehensive testing guidelines, please refer to the [Test File Structure and Style Guide](../tests/tests_style.md).


# Adding a Model Recipe
After implementing and testing your model, please add a model recipe to the [vllm-project/recipes](https://github.com/vllm-project/recipes) repository. This helps other users understand how to use your model with vLLM-Omni.