
Conversation

@gante
Contributor

@gante gante commented Feb 24, 2025

What does this PR do?

DynamicCache was not compatible with torch.distributed before, and #36212 exposed the issue. See the discussion starting here for more details. The added code contains comments explaining what's going on.

Fixes Trainer + DP when use_cache=True, such as in prefix tuning training runs.


@gante gante marked this pull request as ready for review February 24, 2025 17:33
@gante
Contributor Author

gante commented Feb 24, 2025

@S1ro1 it should be okay now 👌 torch.distributed is clever and rebuilds the tensors themselves correctly; all we need to do is place them in the right variables.
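
For illustration, a minimal sketch (not the PR's actual diff) of what placing the rebuilt tensors in the right variables can look like, assuming the gathered structure arrives as a per-layer list of (key, value) pairs and using only DynamicCache's public update() API:

```python
import torch
from transformers import DynamicCache

def rebuild_cache(gathered_layers):
    """Repopulate a DynamicCache from already-rebuilt per-layer tensors.

    gathered_layers: iterable of (key_states, value_states) pairs, one per layer
    (assumed layout; roughly the (num_layers, kv, ...) nesting discussed below).
    """
    cache = DynamicCache()
    for layer_idx, (key_states, value_states) in enumerate(gathered_layers):
        # update() stores the tensors in the cache's per-layer key/value slots
        cache.update(key_states, value_states, layer_idx)
    return cache

# Toy usage: 2 layers, batch=1, 4 heads, seq_len=3, head_dim=8
layers = [(torch.randn(1, 4, 3, 8), torch.randn(1, 4, 3, 8)) for _ in range(2)]
cache = rebuild_cache(layers)
print(cache.get_seq_length())  # 3
```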

@S1ro1
Contributor

S1ro1 commented Feb 24, 2025

@gante The comments look good to me. One small nit: maybe write down the shapes, i.e. (num_layers, kv, (...)). Otherwise this looks good to me, the test as well.

Member

@BenjaminBossan BenjaminBossan left a comment


Thanks for fixing this, LGTM.

@gante
Contributor Author

gante commented Feb 27, 2025

(merging to unblock PEFT)

@gante gante merged commit 8aed019 into huggingface:main Feb 27, 2025
23 checks passed
@gante gante deleted the fix_36212_post_merge branch February 28, 2025 10:24