
Conversation

@gante (Contributor) commented Feb 17, 2025

What does this PR do?

Reviewers: this PR applies the same pattern to all models. In essence, you only need to review one model carefully (the main models are T5 and Whisper).


This PR removes the option to use the legacy cache format in encoder-decoder models, where its removal had been scheduled for v4.48 or earlier.

The pattern originally introduced in Whisper also got updated: if the model is used in decoder-only mode, a Cache instance is now accepted directly. The previous pattern was wasteful and (IMO) unintuitive: we were initializing a new cache, only to then discard the cross-attention part 🤔 The updated pattern uses the provided cache as-is. When the model is used as an encoder-decoder, on the other hand, only EncoderDecoderCache instances are accepted (a deprecation cycle was added for the old behavior).
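
For readers less familiar with the new cache classes, here is a minimal sketch of the caller-side usage this implies, assuming the `EncoderDecoderCache` and `DynamicCache` classes exported by `transformers`; the checkpoint and prompt are illustrative, not taken from this PR:

```python
# Minimal sketch: handing an explicit EncoderDecoderCache to an encoder-decoder model.
# The self-attention cache comes first, the cross-attention cache second.
from transformers import AutoTokenizer, T5ForConditionalGeneration, DynamicCache, EncoderDecoderCache

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: How are you?", return_tensors="pt")
past_key_values = EncoderDecoderCache(DynamicCache(), DynamicCache())

outputs = model.generate(**inputs, past_key_values=past_key_values, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```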

The PR also:

  • Adds support for assisted generation with EncoderDecoderCache
  • Updates the docstrings for past_key_values in the touched models
  • Adds more comments so we can immediately understand what's going on
  • Standardizes a few minor related differences (mostly in Whisper, to match the more readable T5-based models)

✅ slow T5 tests are all green

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@gante gante force-pushed the v4_48_deprecations branch from 2addcb6 to 515d266 Compare February 17, 2025 16:27
@gante gante changed the title from "[generate] remove legacy cache in encoder-decoder models (deprecated in v4.48)" to "[generate] remove legacy cache in t5-based encoder-decoder models (deprecated in v4.48)" on Feb 17, 2025
@gante gante force-pushed the v4_48_deprecations branch from 515d266 to fd365f0 Compare February 17, 2025 16:54
Comment on lines +261 to +262
# set flag that curr layer for cross-attn is already updated so we can re-use in subsequent calls
past_key_value.is_updated[self.layer_idx] = True
@gante (Contributor, Author) commented Feb 18, 2025

I took this pattern from T5 and applied it everywhere: it is much clearer than the one originally in Whisper, since the flag is set AFTER the update is done.
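
For context, a simplified sketch of the pattern in question, assuming the `is_updated` flag and `cross_attention_cache` attribute on `EncoderDecoderCache`; the function and projection names are illustrative placeholders rather than the actual modeling code:

```python
# Simplified sketch of the cross-attention caching pattern described above.
# `k_proj`/`v_proj` and the function name are hypothetical placeholders.
def get_cross_attention_kv(past_key_value, layer_idx, encoder_hidden_states, k_proj, v_proj):
    is_updated = past_key_value.is_updated.get(layer_idx, False) if past_key_value is not None else False
    if past_key_value is not None and is_updated:
        # cross-attention K/V were already computed in a previous decoding step: reuse them
        key_states = past_key_value.cross_attention_cache.key_cache[layer_idx]
        value_states = past_key_value.cross_attention_cache.value_cache[layer_idx]
    else:
        key_states = k_proj(encoder_hidden_states)
        value_states = v_proj(encoder_hidden_states)
        if past_key_value is not None:
            key_states, value_states = past_key_value.cross_attention_cache.update(
                key_states, value_states, layer_idx
            )
            # set the flag AFTER the update is done, so subsequent calls can safely reuse the entry
            past_key_value.is_updated[layer_idx] = True
    return key_states, value_states
```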

@gante gante marked this pull request as ready for review February 18, 2025 15:41
@gante gante requested review from eustlb and zucchini-nlp February 18, 2025 15:46
@gante gante changed the title from "[generate] remove legacy cache in t5-based encoder-decoder models (deprecated in v4.48)" to "[generate] remove legacy cache in t5 and whisper-based encoder-decoder models (deprecated in v4.48)" on Feb 18, 2025
@gante gante changed the title from "[generate] remove legacy cache in t5 and whisper-based encoder-decoder models (deprecated in v4.48)" to "[generate] remove legacy cache in t5 and whisper-based models (deprecated in v4.48)" on Feb 18, 2025
@zucchini-nlp (Member) left a comment

Thanks a lot for cleaning our repo, those warnings are indeed a bit annoying! 🧼

I looked only at T5 and Qwen2Audio and left a few questions to be sure I understand it correctly. The logic for dispatching the cache when it is None seems a bit involved.

Comment on lines +1447 to +1449
"You are passing a decoder-only cache to a model that is used as an encoder-decoder model. "
"This behavior is deprecated and will be removed in v4.52. To avoid this warning, please pass an "
"`EncoderDecoderCache` (e.g. `EncoderDecoderCache(past_key_values, DynamicCache())`)."
Member

in which situations can this happen? I believe if we are using generate(), this should never happen. If users are passing cache objects, they are supposed to use EncoderDecoderCache (as of prev deprecation warning msg)

Contributor Author

100% agreed it should never happen. However, it was supported in the refactor (see here), copy-pasted from Whisper. Hence the deprecation cycle, as a good practice.

For reference, back when it was added to Whisper, the Whisper team was experimenting with new architectures -- a decoder-only model that could also receive the inputs from another encoder. In hindsight, using EncoderDecoderCache there would have been better, but I didn't see the issue when reviewing the original PR :)
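
To make the deprecation path concrete, a hedged sketch of what such a shim could look like; `_ensure_encoder_decoder_cache` is a hypothetical helper name, and the warning text follows the snippet quoted above:

```python
# Hypothetical helper illustrating the deprecation shim discussed above: a plain
# decoder-only cache handed to an encoder-decoder model is wrapped, with a warning,
# instead of being rejected outright.
from transformers import DynamicCache, EncoderDecoderCache
from transformers.utils import logging

logger = logging.get_logger(__name__)


def _ensure_encoder_decoder_cache(past_key_values):
    if past_key_values is None or isinstance(past_key_values, EncoderDecoderCache):
        return past_key_values
    logger.warning_once(
        "You are passing a decoder-only cache to a model that is used as an encoder-decoder model. "
        "This behavior is deprecated and will be removed in v4.52. To avoid this warning, please pass an "
        "`EncoderDecoderCache` (e.g. `EncoderDecoderCache(past_key_values, DynamicCache())`)."
    )
    return EncoderDecoderCache(past_key_values, DynamicCache())
```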

Comment on lines +1457 to +1460
elif past_key_values is not None:
logger.warning_once(
"`use_cache` is set to `False` but `past_key_values` is passed. `past_key_values` will be ignored."
)
Member

Nice! We don't set the cache to None in LLMs, so it would be cool to propagate this change.

Contributor Author

Will do!

I've noticed the cache docstrings in LLMs are also outdated, so I'll open a PR with both :)

@gante (Contributor, Author) commented Feb 19, 2025

(@zucchini-nlp @eustlb -- moving the Qwen2 Audio changes into a separate PR that should be merged before this one, as the nature of those changes is different)

@gante (Contributor, Author) commented May 27, 2025

(superseded by the PRs @vasqu is working on)

@gante gante closed this May 27, 2025