Implement TeaCache #12652
Conversation
Work done
Waiting for feedback and review :)

Hi @sayakpaul @dhruvrnaik any updates?

@LawJarp-A sorry about the delay on our end. @DN6 will review it soon.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Hi @LawJarp-A I think we would need TeaCache to be implemented in a model-agnostic way in order to merge the PR. The First Block Cache implementation is a good reference for this.

Yep @DN6, I agree. I wanted to first implement it just for a single model and get feedback on that before I work on a model-agnostic full implementation. I'm sort of working on it, just haven't pushed it yet. I'll take a look at First Block Cache for reference as well.

@DN6 updated it in a more model-agnostic way.
…th auto-detection
Added multi-model support; still testing it thoroughly.

Hi @DN6 @sayakpaul
In the meantime, any feedback would be appreciated.
Thanks @LawJarp-A!
You can refer to #12569 for testing.
Yes, I think that is informative for users.
sayakpaul left a comment
Some initial feedback. The most important question: it seems like we need to craft different logic depending on the model? Can we not keep it model-agnostic?
```python
_TEACACHE_HOOK = "teacache"
```

```python
# Model-specific polynomial coefficients from TeaCache paper/reference implementations
```
Do we know if these depend only on the model, or are there other dependencies as well (for example num_inference_steps, guidance_scale, etc.)?
Also, can we add a calibration step similar to #12648 so that users can log these coefficients for other models?
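For reference, a minimal sketch of what such a calibration pass could look like, assuming hypothetical helper names (`log_calibration_sample`, `fit_coefficients`) rather than anything from this PR or #12648: record the raw relative L1 distance of the modulated input alongside the observed relative change of the block output during an uncached reference run, then fit a degree-4 polynomial offline.

```python
# Hypothetical calibration sketch (not part of this PR): collect pairs of
# (raw relative L1 distance of the modulated input, actual relative change of
# the block output), then fit the degree-4 rescaling polynomial offline.
from typing import List

import numpy as np
import torch

raw_distances: List[float] = []
actual_changes: List[float] = []


def log_calibration_sample(
    prev_mod: torch.Tensor, curr_mod: torch.Tensor, prev_out: torch.Tensor, curr_out: torch.Tensor
) -> None:
    raw = ((curr_mod - prev_mod).abs().mean() / prev_mod.abs().mean()).item()
    actual = ((curr_out - prev_out).abs().mean() / prev_out.abs().mean()).item()
    raw_distances.append(raw)
    actual_changes.append(actual)


def fit_coefficients() -> List[float]:
    # np.polyfit returns the highest-order coefficient first, matching the
    # ordering of the coefficient lists shown elsewhere in this PR.
    return np.polyfit(np.asarray(raw_distances), np.asarray(actual_changes), deg=4).tolist()
```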
I am trying to think of ways we can avoid having a forward for each model now. Initially that seemed like the best option; it was fine when I wrote it for Flux, but Lumina needed multi-stage preprocessing.
…torch.compile support, and clean up coefficient flow Signed-off-by: Prajwal A <[email protected]>
@sayakpaul
@sayakpaul @DN6 I got the core logic working, and tested it for the models my GPU can handle. The current implementation puts all model handlers in a single file.

Potential refactor: Registry + Handler pattern. Each handler self-registers and encapsulates its logic:

```python
# handlers/flux.py
from .base import BaseTeaCacheHandler
from ..registry import register_handler


@register_handler("Flux", "FluxKontext")
class FluxHandler(BaseTeaCacheHandler):
    coefficients = [4.98651651e02, -2.83781631e02, ...]

    def extract_modulated_input(self, module, hidden_states, temb):
        return module.transformer_blocks[0].norm1(hidden_states, emb=temb)[0]

    def handle_forward(self, module, *args, **kwargs):
        # FLUX-specific forward with ControlNet, LoRA, etc.
        ...
```

```python
# registry.py
_HANDLER_REGISTRY = {}


def register_handler(*model_names):
    def decorator(cls):
        for name in model_names:
            _HANDLER_REGISTRY[name] = cls
        return cls
    return decorator


def get_handler(module) -> BaseTeaCacheHandler:
    for name, handler_cls in _HANDLER_REGISTRY.items():
        if name in module.__class__.__name__:
            return handler_cls()
    raise ValueError(f"No TeaCache handler for {module.__class__.__name__}")
```

This is similar to how attention processors and schedulers are organized. Happy to refactor if you think it's worth it, or we can keep it simple like it is now. Since this has proven a bit more of a challenge to integrate than I thought xD, I would be happy to know if you have some ideas.
Hey @DN6 @sayakpaul, any updates? :)

@sayakpaul @DN6 checking in again :)
DN6 left a comment
Some high level feedback on the design. The control flow is hard to follow as it switches between the hook object and the adapter. The adapters themselves are thin wrappers around a modified forward function, so it would be better to just define them as standalone functions, e.g.

```python
def _flux_forward(
    state: "TeaCacheState",  # pass the state to the function, not the hook object
    coefficients: List[float],
    rel_l1_thresh: float,
    module: torch.nn.Module,
    hidden_states: torch.Tensor,
    timestep: torch.Tensor,
    pooled_projections: torch.Tensor,
    encoder_hidden_states: torch.Tensor,
    txt_ids: torch.Tensor,
    img_ids: torch.Tensor,
    return_dict: bool = True,
    **kwargs,
):
    # (computation of modulated_inp / original_hidden_states elided in this sketch)
    if _should_use_cache(state, modulated_inp, coefficients, rel_l1_thresh):
        hidden_states = _apply_cached_residual(state, hidden_states, modulated_inp)
    else:
        # run compute
        _update_cache(state, hidden_states, original_hidden_states, modulated_inp)
```

Since we're hooking the top level forward of the model, we can map this forward function using the class name during hook initialization.
```python
def initialize_hook(self, module):
    """Initialize hook with model-specific configuration."""
    model_config = _MODEL_CONFIG.get(module.__class__.__name__)
    if model_config is None:
        raise ValueError
    if self.config.coefficients is not None:
        self.coefficients = self.config.coefficients
    else:
        self.coefficients = model_config["coefficients"]
    # Initialize state
    self.state_manager = StateManager(TeaCacheState)
    self.forward_fn = model_config["forward_func"]
    return module
```

Where `_MODEL_CONFIG` is just a mapping for the forward functions and coefficients:
```python
_MODEL_CONFIG = {
    "FluxTransformer2DModel": {
        "forward_func": _flux_forward,
        "coefficients": [4.98651651e02, -2.83781631e02, 5.58554382e01, -3.82021401e00, 2.64230861e-01],
    },
}
```

Similarly, the methods defined in the hook object could also be turned into utility functions.
```python
def _compute_rescaled_distance(rel_distance: float, coefficients: List[float]) -> float:
    return (
        coefficients[0] * rel_distance**4
        + coefficients[1] * rel_distance**3
        + coefficients[2] * rel_distance**2
        + coefficients[3] * rel_distance
        + coefficients[4]
    )


def _should_use_cache(state: "TeaCacheState", ...):
    # Return True or False based on whether to use the cache.
    return


def _update_cache(state: "TeaCacheState", ...):
    return


def _apply_cached_residual(
    state: "TeaCacheState", input_base: torch.Tensor, modulated_inp: torch.Tensor
) -> torch.Tensor:
    """
    Apply cached residual to input (fast path).
    """
    output = input_base + state.previous_residual
    state.previous_modulated_input = modulated_inp
    state.cnt += 1
    return output
```

Let's remove passing `cache_fn` and `compute_fn` between the hook and the adapter. Use operations directly on the cache state + globally available utility methods. We can also remove the modulation extractors and move that logic into the model specific forward functions.
src/diffusers/hooks/teacache.py (Outdated)
```python
)
if self.rel_l1_thresh < 0.05:
    import warnings
    warnings.warn(
```
Use logger.warning
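For example, using the module-level logger that diffusers files typically define (the `_check_threshold` wrapper and the warning text below are illustrative, not from this PR):

```python
# Sketch: replace warnings.warn with the diffusers module-level logger.
from diffusers.utils import logging

logger = logging.get_logger(__name__)


def _check_threshold(rel_l1_thresh: float) -> None:
    if rel_l1_thresh < 0.05:
        # Placeholder message; the actual wording lives in the PR.
        logger.warning("rel_l1_thresh is very low; TeaCache will rarely reuse cached residuals.")
```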
```python
registry._set_context(None)
```

```python
def enable_teacache(self, rel_l1_thresh: float = 0.2, num_inference_steps: int = None, **kwargs):
```
Caching should only be enabled through `enable_cache`, passing the relevant config. Cache-specific enabling is not supported.
```python
pipe.to("cuda")

# Enable TeaCache with auto-detection (1.5x speedup)
pipe.transformer.enable_teacache(rel_l1_thresh=0.2)
```
Should be

```python
pipe.transformer.enable_cache(...)
```

We don't enable specific caching methods directly.
```python
logger.info(f"TeaCache: Using {state.num_steps} inference steps")
```

```python
def initialize_hook(self, module):
    self.state_manager.set_context("teacache")
```
Cache context is typically set in the denoising loop? I think in this case, both conditional and unconditional branches would write to the same cache state when using CFG.
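A rough illustration of the concern (the names below are illustrative, not this PR's or the diffusers API): keeping one state per context, with the context switched from the denoising loop, prevents the conditional and unconditional passes from overwriting each other's cached residuals.

```python
# Illustrative sketch: one TeaCacheState per cache context so the cond and
# uncond branches of classifier-free guidance keep separate caches.
from dataclasses import dataclass
from typing import Dict, Optional

import torch


@dataclass
class TeaCacheState:
    previous_modulated_input: Optional[torch.Tensor] = None
    previous_residual: Optional[torch.Tensor] = None
    accumulated_distance: float = 0.0
    cnt: int = 0


_states: Dict[str, TeaCacheState] = {}
_current_context: str = "default"


def set_cache_context(name: str) -> None:
    # Called from the denoising loop, e.g. "cond" before the conditional pass
    # and "uncond" before the unconditional pass.
    global _current_context
    _current_context = name


def get_state() -> TeaCacheState:
    return _states.setdefault(_current_context, TeaCacheState())
```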
```python
def _flux_modulated_input_extractor(module, hidden_states, timestep_emb):
    """Extract modulated input for FLUX models."""
    return module.transformer_blocks[0].norm1(hidden_states, emb=timestep_emb)[0]
```
I think these extractor functions can be folded into the adapter functions of each model. They're thin wrappers around a single line of code.
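That is, folded directly into the model-specific adapter; an abbreviated sketch (signature simplified, temb computation and the rest of the forward elided):

```python
# Sketch: the FLUX extractor inlined into the adapter instead of a separate helper.
import torch


def _flux_forward(state, coefficients, rel_l1_thresh, module, hidden_states: torch.Tensor,
                  temb: torch.Tensor, *args, **kwargs):
    # Previously _flux_modulated_input_extractor, now just an inline expression.
    modulated_inp = module.transformer_blocks[0].norm1(hidden_states, emb=temb)[0]
    ...
```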
```python
self.model_type = None
```

```python
@staticmethod
def _create_rescale_func(coefficients):
```
Why do we need to create a rescale func? If we have coefficients set, we should be able to just call the function directly?
```python
def rescale_fn(self, x):
    return (
        self.coefficients[0] * x**4
        + self.coefficients[1] * x**3
        + self.coefficients[2] * x**2
        + self.coefficients[3] * x
        + self.coefficients[4]
    )
```
Thanks for the feedback @DN6
What does this PR do?
What is TeaCache?
TeaCache (Timestep Embedding Aware Cache) is a training-free caching technique that speeds up diffusion model inference by 1.5x-2.6x by reusing transformer block computations when consecutive timestep embeddings are similar.
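A minimal sketch of the core decision rule, assuming the general TeaCache recipe rather than this PR's exact code: compute the relative L1 change of the timestep-modulated input between consecutive steps, rescale it with a model-specific degree-4 polynomial, accumulate it, and reuse the cached residual while the accumulated value stays below the threshold.

```python
# Sketch of the TeaCache decision rule (general recipe, not this PR's exact code).
from typing import List, Tuple

import torch


def teacache_should_skip(
    prev_modulated: torch.Tensor,
    curr_modulated: torch.Tensor,
    accumulated: float,
    coefficients: List[float],  # highest power first, as listed in the PR
    rel_l1_thresh: float,
) -> Tuple[bool, float]:
    """Return (skip_blocks, new_accumulated_distance) for one denoising step."""
    # Relative L1 change of the timestep-modulated input between steps.
    rel_distance = (
        (curr_modulated - prev_modulated).abs().mean() / prev_modulated.abs().mean()
    ).item()
    # Rescale with the model-specific degree-4 polynomial and accumulate.
    accumulated += sum(c * rel_distance ** (4 - i) for i, c in enumerate(coefficients))
    if accumulated < rel_l1_thresh:
        return True, accumulated  # reuse the cached residual, skip the blocks
    return False, 0.0  # run the blocks and reset the accumulator
```

In the reference TeaCache implementation the first and last steps are always computed in full, so there is always a fresh residual to add back when a later step is skipped.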
Architecture
Integrates with existing `HookRegistry` and `CacheMixin` patterns in diffusers.

Supported Models
Benchmark Results (FLUX.1-schnell, 20 steps, 512x512)
Benchmark Results (Lumina2, 28 steps, 512x512)
Benchmark Results (CogVideoX-2b, 50 steps, 720x720, 49 frames)
Test Hardware: NVIDIA A100-SXM4-40GB
Framework: Diffusers with TeaCache hooks
All tests: Same seed (42) for reproducibility
Pros & Cons
Pros:
`enable_teacache()`
Cons:
Usage
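(A sketch filled in from the diff excerpt above; the prompt and step count are illustrative, and per the review the final API is expected to route through `enable_cache` rather than a TeaCache-specific method.)

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Enable TeaCache with auto-detection (as currently exposed by this PR; the
# review asks for this to go through pipe.transformer.enable_cache(...)).
pipe.transformer.enable_teacache(rel_l1_thresh=0.2)

image = pipe("An astronaut riding a horse", num_inference_steps=20).images[0]
```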
Files Changed
`src/diffusers/hooks/teacache.py` - Core implementation
`src/diffusers/models/cache_utils.py` - CacheMixin integration
`tests/hooks/test_teacache.py` - Unit tests
Fixes # (issue)
#12589
#12635
Who can review?
@sayakpaul @yiyixuxu