Add __iter__ to DynamicCache #41569
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
SunMarc
left a comment
I remember fixing it (for similar reasons as in this PR), but I don't remember the following: can we instead disable cache at training time? I know there are some less-common fine-tuning strategies that use caches (prefix tuning in PEFT), but it may be wiser to simply disallow cache+DDP. At inference time, if we want scale, we won't be using these classes anyway (we want continuous batching).
Indeed, I think it is reasonable to make this change for v5, and most users won't be impacted! cc @BenjaminBossan
What exactly would that entail? Would it mean that any PEFT method that uses |
Most likely, I will add an arg in trainer to allow users to change |
Let me know what you think of this: #41585
Yes, I strongly agree with @gante here - I don't really see the point of supporting dp/ddp on the cache, as it's inference-only anyway. If we can avoid having those somewhat awkward |
Oh, I just noticed that this PR breaks this line in PEFT: There, we initialize a |
Hey @BenjaminBossan , I am adding support for |
* Add __iter__ to DynamicCache
* Fix tests that use ddp init
Currently, DDP is broken when there is a DynamicCache because it has no __iter__ method, so it cannot be concatenated after the distributed forward. This PR adds back an __iter__ and adapts the way DDP data is consumed to properly initialize sliding windows.
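
To illustrate why an __iter__ method matters here, the following is a minimal sketch and not the actual transformers implementation. ToyCache and merge_replicas are hypothetical names, and plain nested lists stand in for tensors (first axis = batch). It only shows the general idea: once a cache can be iterated layer by layer, per-replica caches can be zipped together and concatenated along the batch axis.

```python
from typing import Iterator, List, Tuple

# Hypothetical stand-in for a per-layer key/value cache; this is NOT the
# transformers DynamicCache API. "Tensors" are nested lists, batch axis first.
Batch = List[List[float]]


class ToyCache:
    def __init__(self) -> None:
        self.layers: List[Tuple[Batch, Batch]] = []

    def update(self, keys: Batch, values: Batch) -> None:
        """Append one layer's keys and values."""
        self.layers.append((keys, values))

    def __iter__(self) -> Iterator[Tuple[Batch, Batch]]:
        # Yielding one (keys, values) pair per layer lets generic gather
        # code walk the cache as if it were a tuple of tuples.
        yield from self.layers

    def __len__(self) -> int:
        return len(self.layers)


def merge_replicas(replicas: List[ToyCache]) -> ToyCache:
    """Concatenate per-replica caches along the batch axis, layer by layer."""
    merged = ToyCache()
    # zip(*replicas) relies on __iter__: it groups layer i of every replica.
    for per_layer in zip(*replicas):
        keys = [row for k, _ in per_layer for row in k]
        values = [row for _, v in per_layer for row in v]
        merged.update(keys, values)
    return merged


# Two replicas, each holding a batch of one sequence for a single layer.
replica_a, replica_b = ToyCache(), ToyCache()
replica_a.update([[1.0, 2.0]], [[3.0, 4.0]])
replica_b.update([[5.0, 6.0]], [[7.0, 8.0]])
merged = merge_replicas([replica_a, replica_b])
```

Without __iter__, the zip(*replicas) call in this sketch would raise a TypeError, which is the same kind of failure the description above refers to when outputs are gathered after the distributed forward.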