[Feature] TeaCache integration #179
Conversation
Could you display the PNGs without TeaCache?
vllm_omni/diffusion/models/qwen_image/qwen_image_transformer.py
```python
    QwenImageTransformer2DModel,
)
from vllm_omni.diffusion.request import OmniDiffusionRequest
from vllm_omni.diffusion.cache.teacache import TeaCacheConfig, apply_teacache
```
Instead of using a fixed TeaCache, I think we should allow users to select different cache methods either via `omni_diffusion_config` or an environment variable. That would align with the user behavior for selecting the attention backend in #115.
My initial idea is:
- The user selects the cache method via `export DIFFUSION_CACHE_ADAPTER=TEA_CACHE` (default: no cache); the customized cache metadata (like `max_warmup_steps`) can be passed via `omni_diffusion_config.cache_config`.
- Each cache method inherits from a base cache class named `CacheAdapter`, which supports feature retrieval, state management, skip-compute judgement, etc.
- Model developers can easily integrate caching through an interface like `maybe_apply_cache(self.transformer, cache_config)`.
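To make the proposal concrete, a minimal sketch of such an interface could look like the following. The names `CacheAdapter`, `maybe_apply_cache`, and `DIFFUSION_CACHE_ADAPTER` come from this comment; everything else (method signatures, the adapter registry, the warm-up logic) is illustrative, not the PR's actual code:

```python
import os
from abc import ABC, abstractmethod


class CacheAdapter(ABC):
    """Base class for cache methods: feature retrieval, state
    management, and skip-compute judgement (sketch only)."""

    def __init__(self, cache_config=None):
        self.cache_config = cache_config or {}
        self.state = {}

    @abstractmethod
    def should_skip(self, step, features):
        """Decide whether the expensive compute can be skipped."""

    def reset(self):
        self.state.clear()


class TeaCacheAdapter(CacheAdapter):
    def should_skip(self, step, features):
        # Always compute during warm-up; cache_config carries the
        # customized metadata (e.g. max_warmup_steps).
        return step >= self.cache_config.get("max_warmup_steps", 3)


_ADAPTERS = {"TEA_CACHE": TeaCacheAdapter}


def maybe_apply_cache(transformer, cache_config=None):
    """Attach the adapter selected via DIFFUSION_CACHE_ADAPTER
    (default: no cache)."""
    adapter_cls = _ADAPTERS.get(os.environ.get("DIFFUSION_CACHE_ADAPTER", ""))
    if adapter_cls is not None:
        transformer.cache_adapter = adapter_cls(cache_config)
    return transformer
```

With this shape, adding a new cache method is just a new `CacheAdapter` subclass plus a registry entry, and model code only ever calls `maybe_apply_cache`.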
Yeah, I am actually working on something similar now, taking some inspiration from the hooks in https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/faster_cache.py from Hugging Face diffusers. I am testing the changes and will push them soon.
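For readers unfamiliar with the hook-based approach mentioned here: the idea is to wrap a module's `forward` so cached output can be reused without editing the model file. The sketch below mirrors that spirit only; it is not the diffusers API, and `CachedForwardHook` and `should_skip` are made-up names:

```python
class CachedForwardHook:
    """Illustrative hook that wraps a module's forward and returns a
    cached output when the skip predicate says the input changed
    too little to recompute."""

    def __init__(self, should_skip):
        self.should_skip = should_skip  # callable deciding reuse
        self.cached_output = None

    def attach(self, module):
        original_forward = module.forward

        def wrapped(*args, **kwargs):
            if self.cached_output is not None and self.should_skip(*args, **kwargs):
                return self.cached_output  # reuse, no compute
            self.cached_output = original_forward(*args, **kwargs)
            return self.cached_output

        module.forward = wrapped
        return module
```

The appeal is exactly what the later comments confirm: the model file stays untouched, and the cache policy lives entirely in the hook.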
So I have abstracted it to work something like this, with separate extractors for each model, so the model file is not touched. We can deprecate the `enable_teacache` flag. This is extensible to other models:
```python
teacache_config = OmniDiffusionConfig(
    model="Qwen/Qwen-Image",
    cache_adapter="tea_cache",
    cache_config={"rel_l1_thresh": 0.2, "model_type": "QwenImagePipeline"},
)
omni_cached = OmniDiffusion(od_config=teacache_config)
```
I think it would be best to provide data such as LPIPS, and Pareto curves would be even better.
ZJY0516 left a comment:

The performance improvement is exciting.
```python
# Registry for model-specific extractors
# Key: pipeline/model architecture name
EXTRACTOR_REGISTRY: dict[str, Callable] = {
```
We should use `OmniDiffusionConfig.model_class_name` (meaning `QwenImagePipeline`) as the key here.
| """ | ||
| Get extractor function for given model or model type. | ||
|
|
||
| This function auto-detects the appropriate extractor based on: |
```python
import torch.nn as nn


def extract_qwen_modulated_input(
```
I think we should put it in the model files. You can refer to `get_qwen_image_post_process_func` in `pipeline_qwen_image.py` and how we import it in `vllm_omni/diffusion/registry.py`.
| f"Please add a handler method for this model." | ||
| ) | ||
|
|
||
| def _handle_qwen_forward( |
This means we need to write a model-specific forward, right?
LGTM. I'll try to run it locally.
```python
    cache_adapter="tea_cache",
    cache_config={"rel_l1_thresh": 0.2}
)
omni = OmniDiffusion(od_config=config)
```
Suggested change:

```diff
- omni = OmniDiffusion(od_config=config)
+ omni = Omni(od_config=config)
```
Any progress on this PR? Let's get it done ASAP. @LawJarp-A
Yes, but there are a few issues I'm currently fixing. The target is to have it ready tomorrow.
Tested on H20.
…agnostic Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
One last problem:

This is unrelated to the PR; it's a local environment issue on my end.
…llm-omni into feature/teacache-integration
Thanks for the nice work. Some interfaces will be altered after the merge to improve compatibility with cache-dit in #250 (e.g., the `DiffusionCacheConfig` dataclass, passing the pipeline instead of the transformer to the adapter, and removing `cache_config.model_cls_name`, which can be obtained from `pipeline.__class__.__name__`).
Signed-off-by: Samit <285365963@qq.com>
@SamitHuang @ZJY0516 thanks for the patience and feedback!
Signed-off-by: Prajwal A <prajwalanagani@gmail.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Signed-off-by: Samit <285365963@qq.com>
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: wangyu31577 <wangyu31577@hundsun.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Samit <285365963@qq.com>
For #175
Purpose
Integrate TeaCache (Timestep Embedding Aware Cache) into vllm-omni to speed up diffusion inference (~1.5–2x) with minimal quality loss by reusing transformer block computations when consecutive timestep embeddings are similar.

Design
Architecture
How it works:
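The mechanism described in the Purpose section (reuse transformer block output while consecutive timestep embeddings stay similar) can be sketched roughly as follows. Only `rel_l1_thresh` appears in the PR's `cache_config`; the helper names and the exact accumulation rule are illustrative assumptions:

```python
def rel_l1(prev, curr):
    """Relative L1 distance between two feature vectors (lists of floats)."""
    num = sum(abs(c - p) for p, c in zip(prev, curr))
    den = sum(abs(p) for p in prev) + 1e-8
    return num / den


class TeaCacheState:
    """Sketch of the skip-compute judgement: accumulate the relative
    change of the timestep-modulated input across steps and skip the
    transformer blocks while it stays below rel_l1_thresh."""

    def __init__(self, rel_l1_thresh=0.2):
        self.rel_l1_thresh = rel_l1_thresh
        self.prev_input = None
        self.accumulated = 0.0

    def should_skip(self, modulated_input):
        if self.prev_input is None:
            skip = False  # first step always computes
        else:
            self.accumulated += rel_l1(self.prev_input, modulated_input)
            skip = self.accumulated < self.rel_l1_thresh
            if not skip:
                self.accumulated = 0.0  # change too large: recompute
        self.prev_input = modulated_input
        return skip
```

Raising `rel_l1_thresh` trades quality for speed: more steps fall under the threshold and reuse the cached residual instead of running the blocks.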
Adding New Models
Only about five lines are needed to support a new model:
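Based on the extractor registry quoted in the review (`EXTRACTOR_REGISTRY`, keyed by pipeline/model class name), registering a new model might look like this sketch; the decorator and the extractor body are hypothetical, not the PR's exact code:

```python
from typing import Callable

# Registry mapping pipeline/model class name -> extractor function,
# echoing the quoted EXTRACTOR_REGISTRY snippet.
EXTRACTOR_REGISTRY: dict[str, Callable] = {}


def register_extractor(model_class_name: str):
    """Decorator so a new model only needs a small extractor function."""
    def wrap(fn: Callable) -> Callable:
        EXTRACTOR_REGISTRY[model_class_name] = fn
        return fn
    return wrap


@register_extractor("QwenImagePipeline")
def extract_qwen_modulated_input(block, hidden_states, temb):
    # Return the timestep-modulated features that TeaCache compares
    # across consecutive steps (body illustrative).
    return hidden_states, temb
```

Because the model file itself is never touched, supporting another architecture is just one extractor function plus its registry entry.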
Test Plan
Test Results
Performance (CUDA, Qwen/Qwen-Image, 50 steps, 512×512)
Key Findings:
Usage
With prompt: "An apple and a princess"
Essential Elements of an Effective PR Description Checklist
The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model.

(Optional) Release notes update. If your change is user facing, please update the release notes draft.