
[Feature] teacache integration#179

Merged
ZJY0516 merged 44 commits into vllm-project:main from LawJarp-A:feature/teacache-integration
Dec 12, 2025

Conversation

@LawJarp-A
Contributor

@LawJarp-A LawJarp-A commented Dec 3, 2025

For #175

Purpose

Integrate TeaCache (Timestep Embedding Aware Cache) into vllm-omni to speed up diffusion inference (~1.5–2x) with minimal quality loss by reusing transformer block computations when consecutive timestep embeddings are similar.

Design

Architecture

vllm_omni/diffusion/
├── cache/teacache/
│   ├── config.py         # TeaCacheConfig (thresholds, coefficients)
│   ├── state.py          # Cache state management
│   ├── extractors.py     # Model-specific extractor registry
│   ├── hook.py           # Forward pass interception
│   └── adapter.py        # CacheAdapter implementation
├── hooks.py              # Hook infrastructure
└── models/qwen_image/
    └── pipeline_qwen_image.py  # Cache setup & reset

How it works:

  1. Hook intercepts transformer forward pass (no model changes needed)
  2. Extract modulated input from first transformer block
  3. Compute L1 distance between consecutive timesteps
  4. Decision: Below threshold → reuse cache; Above → compute & update cache
  5. CFG-aware: Separate states for positive/negative prompts
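
The threshold decision in steps 3–4 can be sketched roughly as follows. This is an illustrative helper, not the PR's actual hook code: in the real hook the inputs are tensors, and TeaCache typically rescales the relative L1 distance with fitted polynomial coefficients (the coefficients in config.py), which are omitted here for clarity.

```python
def should_reuse_cache(prev, curr, accumulated, rel_l1_thresh=0.2):
    """Decide whether to reuse the cached transformer output.

    prev/curr are the modulated inputs of the first transformer block at
    consecutive timesteps (flattened to plain floats here for clarity).
    Returns (reuse, new_accumulated_distance).
    """
    if prev is None:  # first step: nothing cached yet, must compute
        return False, 0.0
    # Relative L1 distance between consecutive modulated inputs
    l1 = sum(abs(c - p) for c, p in zip(curr, prev)) / len(curr)
    scale = sum(abs(p) for p in prev) / len(prev)
    accumulated += l1 / scale
    if accumulated < rel_l1_thresh:
        return True, accumulated  # below threshold: reuse cached residual
    return False, 0.0  # above threshold: recompute and reset accumulator
```

For the CFG-aware part (step 5), the pipeline would hold one accumulated distance per branch (positive and negative prompt) so the two streams never share state.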

Adding New Models

Only about five lines are needed to support a new model:

# vllm_omni/diffusion/cache/teacache/extractors.py

def extract_flux_modulated_input(module, hidden_states, temb):
    """Extract modulated input for FLUX models."""
    return module.transformer_blocks[0].norm1(hidden_states, emb=temb)[0]

# Register it
EXTRACTOR_REGISTRY["FluxPipeline"] = extract_flux_modulated_input
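
As a rough sketch of how such a registry could be consumed at hook time (illustrative only — register_extractor and get_extractor are assumed names, not the PR's actual extractors.py API):

```python
from typing import Callable

# Illustrative mirror of the registry; keys are pipeline class names.
EXTRACTOR_REGISTRY: dict[str, Callable] = {}

def register_extractor(pipeline_cls_name: str):
    """Decorator that registers an extractor under a pipeline class name."""
    def decorator(fn: Callable) -> Callable:
        EXTRACTOR_REGISTRY[pipeline_cls_name] = fn
        return fn
    return decorator

def get_extractor(pipeline_cls_name: str) -> Callable:
    """Exact-match lookup; fail loudly when a model has no extractor."""
    try:
        return EXTRACTOR_REGISTRY[pipeline_cls_name]
    except KeyError:
        raise ValueError(
            f"No TeaCache extractor registered for {pipeline_cls_name!r}; "
            f"known: {sorted(EXTRACTOR_REGISTRY)}"
        )

@register_extractor("FluxPipeline")
def extract_flux_modulated_input(module, hidden_states, temb):
    return module.transformer_blocks[0].norm1(hidden_states, emb=temb)[0]
```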

Test Plan

  • Functional: Verify correctness with/without cache
  • Performance: Benchmark across thresholds (0.2, 0.4, 0.6)
  • Quality: Visual comparison of generated images

Test Results

Performance (CUDA, Qwen/Qwen-Image, 50 steps, 512×512)

| Configuration           | Time          | Speedup | Quality        |
|-------------------------|---------------|---------|----------------|
| Baseline (no cache)     | 6.52s ± 0.01s | 1.00x   | Reference      |
| thresh=0.2 (balanced)   | 5.16s ± 0.01s | 1.26x   | ✓ Minimal loss |
| thresh=0.4 (aggressive) | 3.23s ± 0.00s | 2.02x   | △ Slight loss  |

Key Findings:

  • 2.0x speedup achieved (50% time reduction)
  • 1.3x speedup with minimal quality impact

Usage

from vllm_omni.diffusion.data import OmniDiffusionConfig
from vllm_omni.entrypoints.omni import Omni

config = OmniDiffusionConfig(
    model="Qwen/Qwen-Image",
    cache_adapter="tea_cache",
    cache_config={"rel_l1_thresh": 0.2}  # 0.2=balanced, 0.4=fast
)

omni = Omni(od_config=config)
images = omni.generate("a cat", num_inference_steps=50)  # 1.3-2x faster!

With prompt: "An apple and a princess"

[Generated sample image attached in the original PR]

@hsliuustc0106
Collaborator

could you display the pngs w/o teacache?

QwenImageTransformer2DModel,
)
from vllm_omni.diffusion.request import OmniDiffusionRequest
from vllm_omni.diffusion.cache.teacache import TeaCacheConfig, apply_teacache
Collaborator

@SamitHuang SamitHuang Dec 3, 2025

Instead of using a fixed TeaCache, I think we should allow users to select different cache methods either via omni_diffusion_config or an environment variable. This would align user behavior with how the attention backend is selected in #115

My initial idea is:

  1. user select the cache method by export DIFFUSION_CACHE_ADAPTER=TEA_CACHE (default no cache), the customized metadata for cache (like max_warmup_steps) can be parsed via omni_diffusion_config.cache_config
  2. each cache method inherits a base cache class named as CacheAdapter, which supports feature retrieval, state management, skip-compute judgement, etc.
  3. model developer can easily integrate cache ability by some interface like:
    maybe_apply_cache(self.transformer, cache_config)
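
The proposal above could look roughly like this as code. This is a hedged sketch only: CacheAdapter, maybe_apply_cache, and DIFFUSION_CACHE_ADAPTER follow the comment's wording, while the method names (should_skip, update) and everything else are assumptions.

```python
from __future__ import annotations

import os
from abc import ABC, abstractmethod

class CacheAdapter(ABC):
    """Base class each cache method (e.g. TeaCache) would inherit."""

    def __init__(self, cache_config: dict | None = None):
        # Customized metadata (like max_warmup_steps) parsed from
        # omni_diffusion_config.cache_config.
        self.cache_config = cache_config or {}

    @abstractmethod
    def should_skip(self, features) -> bool:
        """Skip-compute judgement for the current step."""

    @abstractmethod
    def update(self, features, output) -> None:
        """State management: record features/output after a real compute."""

# Registry mapping the env-var value to an adapter class (illustrative).
ADAPTERS: dict[str, type[CacheAdapter]] = {}

def maybe_apply_cache(transformer, cache_config: dict | None):
    """Attach the adapter selected via DIFFUSION_CACHE_ADAPTER (default: none)."""
    name = os.environ.get("DIFFUSION_CACHE_ADAPTER", "").strip()
    if not name:
        return None  # default: no cache
    return ADAPTERS[name](cache_config)
```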

Contributor Author

Yeah, I am working on something similar now actually. Taking some inspiration from https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/faster_cache.py hooks from huggingface diffusers. I am testing the changes, I'll push it soon.

Contributor Author

@LawJarp-A LawJarp-A Dec 3, 2025

So I have abstracted it to work something like this, with separate extractors for each model; the model file is not touched. We can deprecate the enable_teacache flag. This is extensible to other models.

teacache_config = OmniDiffusionConfig(
    model="Qwen/Qwen-Image",
    cache_adapter="tea_cache",
    cache_config={"rel_l1_thresh": 0.2, "model_type": "QwenImagePipeline"},
)
omni_cached = OmniDiffusion(od_config=teacache_config)

@david6666666
Collaborator

could you display the pngs w/o teacache?

I think it would be best to provide data such as LPIPS, and Pareto curves would be even better.

Member

@ZJY0516 ZJY0516 left a comment

The performance improvement is exciting


# Registry for model-specific extractors
# Key: pipeline/model architecture name
EXTRACTOR_REGISTRY: dict[str, Callable] = {
Member

We should use OmniDiffusionConfig.model_class_name (i.e. QwenImagePipeline) as the key here.

"""
Get extractor function for given model or model type.

This function auto-detects the appropriate extractor based on:
Member

exact match is enough

import torch.nn as nn


def extract_qwen_modulated_input(
Member

I think we should put it in model files.

You can refer to get_qwen_image_post_process_func in pipeline_qwen_image.py and how we import it in vllm_omni/diffusion/registry.py

f"Please add a handler method for this model."
)

def _handle_qwen_forward(
Member

This means we need to write model specific forward, right?

@ZJY0516 ZJY0516 requested a review from SamitHuang December 5, 2025 10:18
@ZJY0516
Member

ZJY0516 commented Dec 5, 2025

LGTM. I'll try to run it locally

cache_adapter="tea_cache",
cache_config={"rel_l1_thresh": 0.2}
)
omni = OmniDiffusion(od_config=config)
Collaborator

Suggested change:
- omni = OmniDiffusion(od_config=config)
+ omni = Omni(od_config=config)

@hsliuustc0106
Collaborator

any progress on this PR? let's get it done asap. @LawJarp-A
@ZJY0516 have you tried locally?

@ZJY0516
Member

ZJY0516 commented Dec 9, 2025

any progress on this PR? let's get it done asap. @LawJarp-A @ZJY0516 have you tried locally?

Yes, but there are a few issues I'm currently fixing. The target is to have it ready tomorrow.

@ZJY0516
Member

ZJY0516 commented Dec 9, 2025

from vllm_omni.diffusion.data import OmniDiffusionConfig
from vllm_omni.entrypoints.omni import Omni

if __name__ == "__main__":
    omni = Omni(
        model="Qwen/Qwen-Image",
        cache_adapter="tea_cache",
        cache_config={"rel_l1_thresh": 0.2},
    )
    import time
    start = time.perf_counter()
    images = omni.generate("a cat", num_inference_steps=50) 
    end = time.perf_counter()
    print(f"Generation took {end - start:.2f} seconds")
    images[0].save("qwen_image_teacache_example.png")

Tested on H20
66.35s -> 31.26s

@ZJY0516
Member

ZJY0516 commented Dec 11, 2025

One last problem:

/home/zjy/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/home/zjy/.local/share/uv/python/cpython-3.12.11-linux-x86_64-gnu/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 2 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
WARNING:vllm_omni.diffusion.diffusion_engine:Failed to send shutdown signal: 'NoneType' object has no attribute 'dumps'

@ZJY0516
Member

ZJY0516 commented Dec 11, 2025

> One last problem: (resource_tracker / shutdown warnings quoted above)

This is unrelated to the PR; it's a local environment issue on my end.

Collaborator

@SamitHuang SamitHuang left a comment

Thanks for the nice work. Some interfaces will be altered after merging to improve compatibility with cache-dit in #250 (e.g. a DiffusionCacheConfig dataclass, passing the pipeline instead of the transformer to the adapter, and removing cache_config.model_cls_name, which can be obtained from pipeline.__class__.__name__).

@ZJY0516 ZJY0516 changed the title Feature/teacache integration [Feature] teacache integration Dec 12, 2025
@ZJY0516 ZJY0516 enabled auto-merge (squash) December 12, 2025 02:52
@ZJY0516 ZJY0516 merged commit 65ca131 into vllm-project:main Dec 12, 2025
4 checks passed
@LawJarp-A
Contributor Author

@SamitHuang @ZJY0516 thanks for the patience and feedback!

congw729 pushed a commit to congw729/vllm-omni that referenced this pull request Dec 12, 2025
LawJarp-A added a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025
LawJarp-A added a commit to LawJarp-A/vllm-omni that referenced this pull request Dec 12, 2025
faaany pushed a commit to faaany/vllm-omni that referenced this pull request Dec 19, 2025
yenuo26 pushed a commit to yenuo26/vllm-omni that referenced this pull request Dec 29, 2025
princepride pushed a commit to princepride/vllm-omni that referenced this pull request Jan 10, 2026