[generate] remove legacy cache in t5 and whisper-based models (deprecated in v4.48)
#36238
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Force-pushed from 2addcb6 to 515d266 (commit: t5-based encoder-decoder models (deprecated in v4.48))
Force-pushed from 515d266 to fd365f0
```python
# set flag that curr layer for cross-attn is already updated so we can re-use in subsequent calls
past_key_value.is_updated[self.layer_idx] = True
```
I took this pattern from T5 and applied it everywhere: it is much clearer than the one originally in Whisper, since the flag is set AFTER the update is done.
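For context, here is a minimal sketch of the pattern being discussed, simplified from the post-refactor cross-attention code (projection and reshaping details are omitted; treat it as illustrative rather than the exact implementation):

```python
# Sketch: cross-attention K/V caching gated by the `is_updated` flag.
# `past_key_value` is an EncoderDecoderCache; its cross-attention entries depend
# only on the encoder output, so they can be computed once and reused afterwards.
is_updated = past_key_value.is_updated.get(self.layer_idx)
if is_updated:
    # already computed in a previous forward pass: reuse the cached states
    key_states = past_key_value.cross_attention_cache.key_cache[self.layer_idx]
    value_states = past_key_value.cross_attention_cache.value_cache[self.layer_idx]
else:
    key_states = self.k_proj(encoder_hidden_states)
    value_states = self.v_proj(encoder_hidden_states)
    key_states, value_states = past_key_value.cross_attention_cache.update(
        key_states, value_states, self.layer_idx
    )
    # the flag is set AFTER the update is done, so subsequent calls can re-use it
    past_key_value.is_updated[self.layer_idx] = True
```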
Title changed: "t5-based encoder-decoder models (deprecated in v4.48)" → "t5 and whisper-based encoder-decoder models (deprecated in v4.48)"
Title changed: "t5 and whisper-based encoder-decoder models (deprecated in v4.48)" → "t5 and whisper-based models (deprecated in v4.48)"
zucchini-nlp left a comment
Thanks a lot for cleaning our repo, those warnings are indeed a bit annoying! 🧼
I looked only at T5 and Qwen2Audio and left a few questions to be sure I understand it correctly. The logic for dispatching the cache when it is None seems to be quite involved.
| "You are passing a decoder-only cache to a model that is used as an encoder-decoder model. " | ||
| "This behavior is deprecated and will be removed in v4.52. To avoid this warning, please pass an " | ||
| "`EncoderDecoderCache` (e.g. `EncoderDecoderCache(past_key_values, DynamicCache())`)." |
In which situations can this happen? I believe that if we are using generate(), this should never happen. If users are passing cache objects, they are supposed to use EncoderDecoderCache (as per the previous deprecation warning message).
100% agreed, it should never happen. However, it was supported in the refactor (see here), copy-pasted from Whisper. Hence the deprecation cycle, as a good practice.
For reference, back when it was added to Whisper, the Whisper team was experimenting with new architectures: a decoder-only model that could also receive inputs from another encoder. In hindsight, using EncoderDecoderCache there would have been better, but I didn't see the issue when reviewing the original PR :)
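For anyone hitting the warning above, a minimal sketch of the migration it asks for (`model`, `input_ids`, and `decoder_input_ids` are placeholders):

```python
from transformers import DynamicCache, EncoderDecoderCache

# Deprecated: passing a decoder-only cache to an encoder-decoder model
past_key_values = DynamicCache()

# Preferred: wrap it, pairing it with a fresh cross-attention cache
past_key_values = EncoderDecoderCache(past_key_values, DynamicCache())

outputs = model(
    input_ids=input_ids,
    decoder_input_ids=decoder_input_ids,
    past_key_values=past_key_values,
    use_cache=True,
)
```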
```python
elif past_key_values is not None:
    logger.warning_once(
        "`use_cache` is set to `False` but `past_key_values` is passed. `past_key_values` will be ignored."
    )
```
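For illustration, a small sketch of when this new warning would fire (`model` and `input_ids` are placeholders):

```python
from transformers import DynamicCache

# `use_cache=False` disables caching, so a user-provided cache would be
# silently unusable; the warning above makes that explicit.
outputs = model(
    input_ids=input_ids,
    past_key_values=DynamicCache(),
    use_cache=False,
)
# -> logs once: "`use_cache` is set to `False` but `past_key_values` is passed. ..."
```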
Nice! We don't set the cache to None in LLMs, so it would be cool to propagate this change.
Will do!
I've noticed the cache docstrings in LLMs are also outdated, so I'll open a PR with both :)
(@zucchini-nlp @eustlb -- moving the Qwen2 audio changes into a separate PR that should be merged before this one, as the nature of the changes is different)

(superseded by the PRs @vasqu is working on)
What does this PR do?
Reviewers: this PR applies the same pattern to all models. In essence, you only need to review one model closely (main models: T5 and Whisper).
This PR removes the option to use the legacy cache format in encoder-decoder models, where it had been scheduled for removal in v4.48 or earlier.
The previous pattern, introduced in Whisper, also got updated: if the model is used in decoder-only mode, a `Cache` class is now accepted. The previous pattern was wasteful and (IMO) unintuitive: we were initializing a new cache, to then discard the cross-attention part 🤔 The updated pattern uses the cache as-is. On the other hand, when the models are encoder-decoder, only `EncoderDecoderCache` instances are accepted (a deprecation cycle was added for the old behavior).

The PR also:
- updates the type hints and docstrings of `past_key_values` to `EncoderDecoderCache` in the touched models

✅ slow T5 tests are all green
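For illustration, a sketch of the user-facing change (a T5 checkpoint is used as an example; `input_ids` and `decoder_input_ids` are placeholders):

```python
from transformers import AutoModelForSeq2SeqLM, DynamicCache, EncoderDecoderCache

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Legacy format (removed by this PR): a tuple of per-layer tuples of tensors
# model(..., past_key_values=legacy_tuple)  # no longer accepted

# Encoder-decoder usage: only EncoderDecoderCache instances are accepted
cache = EncoderDecoderCache(DynamicCache(), DynamicCache())
outputs = model(
    input_ids=input_ids,
    decoder_input_ids=decoder_input_ids,
    past_key_values=cache,
    use_cache=True,
)
```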