[Core] Supports stage abstraction in the diffusion model #391
hsliuustc0106 merged 19 commits into vllm-project:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
@@ -33,25 +69,465 @@ def __init__(self, *args, **kwargs):
    args[0] = model
Avoid assigning to immutable args tuple in Omni init
When Omni is instantiated with the model passed positionally (e.g. Omni("Qwen/Qwen-Image")), the constructor assigns to args[0], but args is a tuple, so the assignment raises TypeError: 'tuple' object does not support item assignment before any initialization occurs. This makes the new entrypoint unusable for positional calls that previously worked with OmniLLM; callers must now pass model as a keyword or hit a hard crash.
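A minimal sketch of one possible fix (the class body and `_resolve_model` helper here are hypothetical; only the tuple-reassignment pattern is the point): since `args` is an immutable tuple, rebuild it instead of assigning into it.

```python
def _resolve_model(name: str) -> str:
    # Placeholder for whatever normalization the real constructor performs.
    return name

class Omni:
    def __init__(self, *args, **kwargs):
        if args:
            # `args[0] = model` would raise TypeError on a tuple;
            # construct a new tuple with the resolved model instead.
            args = (_resolve_model(args[0]),) + args[1:]
        elif "model" in kwargs:
            kwargs["model"] = _resolve_model(kwargs["model"])
        self.model = args[0] if args else kwargs.get("model")
```

With this pattern, both `Omni("Qwen/Qwen-Image")` and `Omni(model="Qwen/Qwen-Image")` initialize without error.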
Looking forward to it.

Let's get the initial version done before the 1230 release.

Does this PR support reusing vLLM as the text-encoding stage for diffusion models?

Not yet. This PR only encapsulates the entire diffusion model into a single stage first.

Does this PR mean that all models under the diffusion folder can be deployed using YAML?

Through some offline discussions, we decided that this version will not require providing a YAML file for the diffusion model. Instead, the system will automatically generate a YAML file for the current diffusion model.
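A hypothetical sketch of what such an auto-generated single-stage config might look like (the field names below are assumptions, not the PR's actual schema; the PR does ship a `QwenImagePipeline.yaml`):

```yaml
# Illustrative only: plausible shape of an auto-generated one-stage
# diffusion config; real field names may differ.
stages:
  - name: diffusion
    stage_type: diffusion
    model: Qwen/Qwen-Image
    devices: "0"
```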
Which group are you discussing this in? Can you add me?

Fixes #340
if "dtype" in kwargs:
    kwargs["dtype"] = str(kwargs["dtype"])
# TODO: hack, calculate devices based on parallel config.
devices = "0"
This may cause problems, but we can address it later.
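One way the hardcoded `devices = "0"` could later be replaced, as the TODO suggests, is by deriving the device list from the parallel config. A minimal sketch (the function and parameter names are assumptions):

```python
def devices_from_parallel_config(first_device: int, world_size: int) -> str:
    # Build a comma-separated device string from a starting device index
    # and the parallel world size, e.g. (0, 4) -> "0,1,2,3".
    return ",".join(str(first_device + i) for i in range(world_size))
```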
Each stage will create appropriate instances (AsyncOmniLLM or AsyncOmniDiffusion)
based on stage_type in YAML config.
"""
init_sleep_seconds = kwargs.get("init_sleep_seconds", 20)
@tzhouam init_sleep_seconds needs to be fixed in your PR
Will unify the sleep-related args in my PR later.
self.stage_configs = load_stage_configs_from_model(model, base_engine_args)
self.stage_configs = load_stage_configs_from_model(model)
if not self.stage_configs:
    default_stage_cfg = [
Do we have a mechanism to prevent errors if default_stage_cfg is not suitable, e.g., OOM?
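One possible guard for the concern above (a sketch; the function name and the idea of comparing a memory estimate against free device memory are assumptions, not part of this PR): fail fast with an actionable message instead of OOMing mid-initialization.

```python
def check_stage_memory(required_bytes: int, free_bytes: int) -> None:
    # Reject the fallback default_stage_cfg up front if its estimated
    # footprint exceeds the free memory on the target device.
    if required_bytes > free_bytes:
        raise RuntimeError(
            f"default_stage_cfg needs ~{required_bytes} bytes but only "
            f"{free_bytes} are free; provide an explicit stage YAML instead"
        )
```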
init_timeout = kwargs.get("init_timeout", 300)
worker_backend = kwargs.get("worker_backend", "multi_process")
ray_address = kwargs.get("ray_address", None)
batch_timeout = kwargs.get("batch_timeout", 10)
Why do we set 10 here? Is it in ms or seconds?
"""
init_sleep_seconds = kwargs.get("init_sleep_seconds", 20)
shm_threshold_bytes = kwargs.get("shm_threshold_bytes", 65536)
init_timeout = kwargs.get("init_timeout", 300)
Is it in seconds? What's the relationship between init_timeout and init_sleep_seconds? Do we have to check init_sleep_seconds * num_stages < init_timeout?
idx, cfg = idx_cfg
return idx, OmniStage(cfg)

with ThreadPoolExecutor(max_workers=min(len(self.stage_configs), max(1, os.cpu_count() or 1))) as executor:
Will the deployment strategy affect the way we build stages? For example, if we deploy one stage per device, how do we choose the CPU workers?
Each stage will create appropriate instance (OmniLLM or OmniDiffusion)
based on stage_type in YAML config (handled in omni_stage.py).
"""
init_sleep_seconds = kwargs.get("init_sleep_seconds", 20)
Same question: we need to take care of init_sleep_seconds, init_timeout, and batch_timeout.
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
Signed-off-by: yinpeiqi <yinpeiqi809@gmail.com>
…t#391) Signed-off-by: Chenguang ZHENG <645327136@qq.com> Signed-off-by: yinpeiqi <yinpeiqi809@gmail.com> Co-authored-by: yinpeiqi <yinpeiqi809@gmail.com> Co-authored-by: Hongsheng Liu <liuhongsheng4@huawei.com> Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
To align with our initial vision and to enable future end-to-end optimization, we are gradually introducing Stage abstractions for Diffusion.
[Core] Add Stage Abstraction Support for Diffusion Models
Overview
This PR adds stage abstraction support for the diffusion model component of vLLM-Omni, achieving a consistent architectural design with LLM models. It also includes code refactoring to unify the sampling parameter interface, improving code maintainability and extensibility.
Major Changes
1. Code Refactoring
- Merged omni_llm.py into omni.py, unifying entry point management
- Updated omni_stage.py to support stage configuration and management for diffusion models

2. Diffusion Stage Abstraction Support

- Added omni_sampling_params.py with an OmniSamplingParams class to uniformly manage sampling parameters for both LLM and diffusion models, building on SamplingParams
- Updated diffusion_engine.py to support stage abstraction
- Updated gpu_worker.py
- Added QwenImagePipeline.yaml

3. Example Updates

- Updated the text_to_image.py example to demonstrate how to use the new stage abstraction interface

4. Outputs
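To illustrate the idea of a unified sampling-params object, here is a hypothetical sketch; the real OmniSamplingParams in this PR wraps vLLM's SamplingParams and its fields may differ. The point is one object that carries both LLM-side and diffusion-side knobs, so each stage picks the subset it understands.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OmniSamplingParams:
    # LLM-side knobs (names chosen for illustration)
    temperature: float = 1.0
    max_tokens: Optional[int] = None
    # Diffusion-side knobs
    num_inference_steps: int = 50
    guidance_scale: float = 7.5

    def for_llm(self) -> dict:
        # Subset consumed by an LLM stage.
        return {"temperature": self.temperature, "max_tokens": self.max_tokens}

    def for_diffusion(self) -> dict:
        # Subset consumed by a diffusion stage.
        return {"num_inference_steps": self.num_inference_steps,
                "guidance_scale": self.guidance_scale}
```

A caller would build one params object per request and let each stage project out its own view, which keeps the entrypoint signature identical across stage types.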