[Generation] remove leftover code from end-to-end compilation #36685
zucchini-nlp
left a comment
Thanks, much cleaner! I'm not sure whether any overridden copies of this code are left in other models, so I left a comment below.
    inputs_embeds = inputs_embeds[:, -cache_position.shape[0] :]
elif (
    inputs_embeds is not None  # Exception 1
    or (is_torchdynamo_compiling() or cache_position[-1] >= input_ids.shape[1])  # Exception 3
I remember this code being overridden in some models. If that's still the case, we'll need to clean it up there as well.
Good point, I'll do a scan and replace the pattern in the other places!
Please also update the docs accordingly: https://huggingface.co/docs/transformers/llm_optims?static-kv=3.+compile+entire+generate+function#static-kv-cache-and-torchcompile
@vfdev-5 thank you for noticing it and reporting it back to us 🙈 I'll update the docs.

What does this PR do?
We dropped support for the experimental end-to-end compilation of `generate` a while back, but a few traces of the related code remained. This PR removes them.
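
For illustration, here is a rough sketch of how the branch quoted in the review might read once the `is_torchdynamo_compiling()` check is gone. It is not the actual diff: it uses plain Python lists instead of tensors so it runs on its own, and the function name `slice_inputs_for_step` is hypothetical. Only the first branch body and the `elif` condition come from the quoted snippet; the first condition, the second branch body, and the final fallback are assumptions.

```python
def slice_inputs_for_step(input_ids, cache_position, inputs_embeds=None):
    """List-based stand-in for the tensor slicing; each row is one batch entry.

    Hypothetical helper, not the library's API.
    """
    seq_len = len(input_ids[0])      # stands in for input_ids.shape[1]
    num_new = len(cache_position)    # stands in for cache_position.shape[0]

    if inputs_embeds is not None and seq_len == 0:  # assumed condition
        # Body taken from the quoted snippet: keep only the newest embeddings.
        inputs_embeds = [row[-num_new:] for row in inputs_embeds]
    elif (
        inputs_embeds is not None             # Exception 1
        or cache_position[-1] >= seq_len      # Exception 3, without the compile check
    ):
        # Assumed body: keep only the newest token ids.
        input_ids = [row[-num_new:] for row in input_ids]
    elif seq_len != num_new:
        # Assumed fallback: pick the positions that are not cached yet.
        input_ids = [[row[i] for i in cache_position] for row in input_ids]
    return input_ids, inputs_embeds


# Prefill: every position is new, so the ids pass through unchanged.
print(slice_inputs_for_step([[10, 11, 12, 13]], [0, 1, 2, 3])[0])  # [[10, 11, 12, 13]]
# Decoding step: only position 3 is new.
print(slice_inputs_for_step([[10, 11, 12, 13]], [3])[0])           # [[13]]
```

The point of the sketch is that the slicing decision depends only on the inputs and `cache_position`, with no compile-time special case.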