
Conversation

@gante
Contributor

@gante gante commented Feb 24, 2025

What does this PR do?

DynamicCache was not compatible with torch.distributed before, and #36212 exposed the issue. See the discussion starting here for more details. The added code contains comments explaining what's going on.

Fixes Trainer + DP when use_cache=True, such as in prefix tuning training runs.


@gante gante marked this pull request as ready for review February 24, 2025 17:33
@gante
Contributor Author

gante commented Feb 24, 2025

@S1ro1 it should be okay now 👌 torch.distributed is clever and rebuilds the tensors themselves correctly; all we need to do is place them in the right variables.
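
For illustration, a minimal sketch (not the PR's actual diff) of what placing the rebuilt tensors in the right variables can look like, assuming the gathered structure arrives as a per-layer list of (key, value) pairs and using only DynamicCache's public update() API:

```python
import torch
from transformers import DynamicCache

def rebuild_cache(gathered_layers):
    """Repopulate a DynamicCache from already-rebuilt per-layer tensors.

    gathered_layers: iterable of (key_states, value_states) pairs, one per layer
    (assumed layout; roughly the (num_layers, kv, ...) nesting discussed below).
    """
    cache = DynamicCache()
    for layer_idx, (key_states, value_states) in enumerate(gathered_layers):
        # update() stores the tensors in the cache's per-layer key/value slots
        cache.update(key_states, value_states, layer_idx)
    return cache

# Toy usage: 2 layers, batch=1, 4 heads, seq_len=3, head_dim=8
layers = [(torch.randn(1, 4, 3, 8), torch.randn(1, 4, 3, 8)) for _ in range(2)]
cache = rebuild_cache(layers)
print(cache.get_seq_length())  # 3
```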

@S1ro1
Contributor

S1ro1 commented Feb 24, 2025

@gante The comments look good to me. One small nit: maybe write down the shapes, i.e. (num_layers, kv, (...)). Otherwise this looks good to me, the test as well.

Member

@BenjaminBossan BenjaminBossan left a comment


Thanks for fixing this, LGTM.

@gante
Contributor Author

gante commented Feb 27, 2025

(merging to unblock PEFT)

@gante gante merged commit 8aed019 into huggingface:main Feb 27, 2025
23 checks passed
@gante gante deleted the fix_36212_post_merge branch February 28, 2025 10:24