
add a synchronize call for xpu in _gpu_gather #3563

Merged
SunMarc merged 1 commit into huggingface:main from faaany:xpu-barrier
May 14, 2025

Conversation

Contributor

@faaany faaany commented May 13, 2025

What does this PR do?

There is a bug related to INT64 collectives in oneCCL on XPU. It will be fixed in PyTorch 2.9, but until then it affects many code examples when run in distributed mode. For example, below are the predicted labels produced by nlp_examples.py:

 =========tensor([                   1,                    0,                    0,
                           1,                    0,                    1,
                           1,                    1,                    1,
                           1,                    1,                    0,
                           0,                    1,                    1,
                           0,                    1,                    0,
                           1,                    0,                    0,
                           1,                    0,                    1,
                           1,                    0,                    1,
                           1,                    1,                    1,
                           0,                    1, -4763119552675512320,
        -4766215777419132928, -4852910070239199232, -4828703222243852288,
        -4622663539326255104, -4603523240919826432, -4711609631944015872,
        -4868672668934144000, -4615063714957819904, -4820821922896740352,
        -4640114987875631104, -4620411739513421824, -4675017884978315264,
        -4857132194889465856, -4833769771824316416, -4612530440168210432,
        -4821666347826741248, -4823918147640229888, -4743134829331218432,
        -4807311124015677440, -4822229297780097024, -4838273371451359232,
        -4861072844563218432, -4824762572570230784, -4676988209814700032,
        -4864169069306904576, -4862761694423416832, -4796052124948037632,
        -4819977497966673920, -4719490931290537984, -4860791369586507776,
        -4709357832130723840], device='xpu:0')
=========tensor([ 4590785695020072807,  4613304161316028365, -4640889495023173838,
         4615837775396453961,  4613304056071733148, -4682266389616181702,
         4616400575022382831, -4652711660932972544,  4612178231343103911,
         4593600977363451611,  4609082204167389103,  4613304109775241221,
         4618089472120864287,  4580934693613453281,  4572489078510043055,
         4562073997567901665,                    1,                    0,
                           1,                    0,                    0,
                           1,                    0,                    1,
                           1,                    0,                    1,
                           1,                    1,                    1,
                           0,                    1,                    1,
                           1,                    1,                    1,
                           0,                    0,                    1,
                           1,                    0,                    1,
                           0,                    0,                    1,
                           1,                    1,                    0,
                           1,                    1,                    1,
                           1,                    1,                    1,
                           1,                    1,                    1,
                           1,                    1,                    1,
                           1,                    1,                    1,
                           1], device='xpu:1')

Since the release of PyTorch 2.9 is at least five months away, we would like to add a workaround for XPU for now. Once PyTorch 2.9 is released, we will remove it.
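
For reference, below is a minimal sketch of the kind of workaround this PR describes. The actual change lives inside accelerate's `_gpu_gather`; the standalone helper name and gather logic here are simplified assumptions for illustration only.

```python
# Sketch only: synchronize the XPU device before gathering, which is the
# workaround described above. The real change is inside accelerate's
# _gpu_gather; this helper is a simplified stand-in.
import torch
import torch.distributed as dist

def gather_with_xpu_workaround(tensor: torch.Tensor) -> torch.Tensor:
    # oneCCL on XPU can corrupt INT64 collectives (fixed in PyTorch 2.9);
    # an explicit device synchronize before the all_gather avoids the
    # garbage values shown in the outputs above.
    if tensor.device.type == "xpu":
        torch.xpu.synchronize()
    output = [torch.empty_like(tensor) for _ in range(dist.get_world_size())]
    dist.all_gather(output, tensor.contiguous())
    return torch.cat(output, dim=0)
```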

@faaany faaany changed the title from "add a work-around in _gpu_gather for xpu" to "add a synchronize call for xpu in _gpu_gather" on May 13, 2025
Member

@SunMarc SunMarc left a comment

SGTM !

@SunMarc SunMarc marked this pull request as ready for review May 13, 2025 14:20
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc merged commit 764eee4 into huggingface:main May 14, 2025
25 checks passed