Set ccl and KMP param in simple launch#3575

Merged
SunMarc merged 6 commits intohuggingface:mainfrom
jiqing-feng:tp
May 26, 2025

Conversation

@jiqing-feng
Contributor

@jiqing-feng jiqing-feng commented May 16, 2025

It is unclear why `CCL_WORKER_COUNT` is only set when the number of machines is > 1: a single CPU machine can also run distributed training or tensor parallelism (TP) with multiple processes.
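A minimal sketch of the condition change described above. The helper name `ccl_env` and its parameters are hypothetical (this is not Accelerate's actual code); the worker count would presumably come from `mpirun_ccl` in the config below. The point is only that the CCL setting keys off the number of processes rather than the number of machines.

```python
def ccl_env(num_processes: int, ccl_worker_count: int) -> dict[str, str]:
    """Hypothetical helper: extra environment for a multi-process CPU launch.

    Previous behavior (as described in the PR): set CCL_WORKER_COUNT only when
    num_machines > 1. Here any multi-process run gets it, including several
    processes on a single machine (e.g. TP on one Xeon).
    """
    env: dict[str, str] = {}
    if num_processes > 1:
        env["CCL_WORKER_COUNT"] = str(ccl_worker_count)
    return env


# One machine, two processes: the variable is now set.
print(ccl_env(num_processes=2, ccl_worker_count=1))
```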

I also added KMP (Intel OpenMP runtime) parameters to improve performance on CPU.
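The PR description does not list which KMP variables are set. As an assumption, the sketch below uses two Intel OpenMP runtime settings that are commonly recommended for CPU inference, `KMP_AFFINITY` and `KMP_BLOCKTIME`; the values are illustrative and not taken from the PR, and `apply_kmp_defaults` is a hypothetical helper. Using `setdefault` keeps any value the user has already exported.

```python
import os


def apply_kmp_defaults(env: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: add Intel OpenMP (KMP_*) tuning defaults to a launch env.

    The variable names are Intel OpenMP runtime settings; the values are an
    assumption for illustration. Values already present in `env` are kept.
    """
    out = dict(env)
    # Bind OpenMP threads to hardware threads, packing them compactly.
    out.setdefault("KMP_AFFINITY", "granularity=fine,compact,1,0")
    # Threads wait 1 ms after a parallel region before going to sleep.
    out.setdefault("KMP_BLOCKTIME", "1")
    return out


child_env = apply_kmp_defaults(dict(os.environ))
```

These variables only affect programs linked against the Intel OpenMP runtime (libiomp5); the GNU OpenMP runtime ignores them.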

With this PR, I ran a Transformers TP model and got a 40% speed-up on a 4th Gen Intel Xeon, measured with the script below.
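The PR does not state how the 40% figure was computed. One common convention, assumed here, is the ratio of the mean baseline latency to the mean TP latency after discarding warm-up runs (in the script below, the first timed run after `torch.compile` also includes compilation). The function names and latency numbers are hypothetical.

```python
from statistics import mean


def mean_latency_ms(times_ms: list[float], warmup: int = 1) -> float:
    """Average latency after dropping the first `warmup` runs (e.g. compile warm-up)."""
    if len(times_ms) <= warmup:
        raise ValueError("need more runs than warm-up iterations")
    return mean(times_ms[warmup:])


def speedup(baseline_ms: list[float], tp_ms: list[float], warmup: int = 1) -> float:
    """Speed-up of TP over the baseline; > 1.0 means TP is faster.

    Under this (assumed) definition, 1.4 corresponds to a "40% speed-up".
    """
    return mean_latency_ms(baseline_ms, warmup) / mean_latency_ms(tp_ms, warmup)


# Made-up timings (ms) for four compiled runs each; the first run includes compilation.
baseline = [9000.0, 7000.0, 7000.0, 7000.0]
tp = [8000.0, 5000.0, 5000.0, 5000.0]
print(f"speed-up: {speedup(baseline, tp):.2f}x")  # speed-up: 1.40x
```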
The accelerate config:

```yaml
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_CPU
downcast_bf16: 'no'
enable_cpu_affinity: false
ipex_config:
  ipex: false
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
mpirun_config:
  mpirun_ccl: '1'
  mpirun_hostfile: /home/jiqingfe/hostfile
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: true
```
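With `num_processes: 2` on one machine, the physical cores have to be shared between the two processes; the script prints `torch.get_num_threads()` and `OMP_NUM_THREADS` to check this. A minimal sketch of an even split, assuming one OpenMP thread per physical core and processes spread evenly over machines. The helper name and the core count are hypothetical, and this is not necessarily the exact logic Accelerate uses.

```python
def threads_per_process(physical_cores: int, num_processes: int, num_machines: int = 1) -> int:
    """Hypothetical helper: split one machine's physical cores among its local processes.

    Assumes num_processes // num_machines processes per machine and one
    OpenMP thread per physical core.
    """
    if num_processes < 1 or num_machines < 1:
        raise ValueError("num_processes and num_machines must be >= 1")
    local_processes = max(1, num_processes // num_machines)
    return max(1, physical_cores // local_processes)


# Config above: num_machines: 1, num_processes: 2. The 56-core count is a made-up example.
omp_num_threads = threads_per_process(physical_cores=56, num_processes=2, num_machines=1)
print(f"OMP_NUM_THREADS={omp_num_threads}")  # OMP_NUM_THREADS=28
```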

The script is launched with:

```bash
accelerate launch script.py
```

`script.py`:

```python
import os
import time

import torch
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # noqa: F401  registers the "ccl" backend with torch.distributed
from transformers import AutoModelForCausalLM, AutoTokenizer

print(f"Using {torch.get_num_threads()} threads (PyTorch)")
print(f"OMP_NUM_THREADS={os.getenv('OMP_NUM_THREADS')}")

model_id = "meta-llama/Llama-3.1-8B-Instruct"

# Map MPI (PMI_*) rank info to the torch.distributed variables, unless already set
os.environ.setdefault("RANK", os.environ.get("PMI_RANK", "0"))
os.environ.setdefault("WORLD_SIZE", os.environ.get("PMI_SIZE", "1"))


def main(is_tp, rank, world_size) -> None:
    print("is_tp, rank, world_size: ", is_tp, rank, world_size)
    model_kwargs = dict(torch_dtype=torch.bfloat16)
    if is_tp:
        model_kwargs["tp_plan"] = "auto"
    else:
        model_kwargs["device_map"] = "cpu"

    # Load the model; with tp_plan="auto" it is sharded across ranks (tensor parallel)
    model = AutoModelForCausalLM.from_pretrained(model_id, **model_kwargs)
    if dist.is_initialized():
        print("Backend:", dist.get_backend())
    else:
        print("Distributed process group is not initialized.")

    # Prepare input tokens
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    prompt = "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
    print(f"input shape is {inputs.input_ids.shape}")

    model.generation_config.cache_implementation = "static"

    if is_tp:
        # Workaround: use per-rank sizes so the static KV cache matches the TP-sharded attention
        model.config.hidden_size = model.config.hidden_size // world_size
        model.config.num_key_value_heads = model.config.num_key_value_heads // world_size

    # Warm-up: one eager generation
    with torch.no_grad():
        start = time.time()
        outputs = model.generate(**inputs, do_sample=False, max_new_tokens=128, min_new_tokens=128)
        end = time.time()
        print(f"time cost {(end - start) * 1000} ms")

    # Synchronize ranks before printing
    if is_tp:
        dist.barrier()

    if rank == 0:
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))

    model.forward = torch.compile(model.forward)
    # Synchronize ranks before the timed runs
    if is_tp:
        dist.barrier()

    # Timed runs; the first one also triggers torch.compile compilation
    for _ in range(4):
        with torch.no_grad():
            start = time.time()
            outputs = model.generate(**inputs, do_sample=False, max_new_tokens=128, min_new_tokens=128)
            if is_tp:
                dist.barrier()

            end = time.time()
            print(f"time cost {(end - start) * 1000} ms")

    if rank == 0:
        print(tokenizer.batch_decode(outputs, skip_special_tokens=True))


if __name__ == "__main__":
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    is_tp = world_size > 1
    main(is_tp, rank, world_size)
```
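The script maps the MPI-style `PMI_RANK`/`PMI_SIZE` variables (set, for example, by Intel MPI's `mpirun`, which is presumably used here because `mpirun_hostfile` is configured) to `RANK`/`WORLD_SIZE`. A slightly more defensive sketch is below: it prefers `RANK`/`WORLD_SIZE` when a launcher such as `torchrun` has already set them, falls back to the PMI values otherwise, and validates the result. The function name is hypothetical.

```python
import os
from typing import Mapping


def resolve_rank_and_world_size(env: Mapping[str, str] = os.environ) -> tuple[int, int]:
    """Return (rank, world_size), preferring RANK/WORLD_SIZE over PMI_RANK/PMI_SIZE.

    Falls back to a single process (0, 1) when neither set of variables exists.
    """
    rank = int(env.get("RANK", env.get("PMI_RANK", "0")))
    world_size = int(env.get("WORLD_SIZE", env.get("PMI_SIZE", "1")))
    if not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    return rank, world_size


rank, world_size = resolve_rank_and_world_size()
is_tp = world_size > 1
```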

@jiqing-feng
Contributor Author

@sywangyi @yao-matrix Please review this PR, thanks!

@jiqing-feng jiqing-feng marked this pull request as draft May 16, 2025 08:18
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@yao-matrix
Contributor

yao-matrix commented May 19, 2025

  1. Does `ipex: true` work?
  2. If there is no accelerate config, what is the behavior?

@jiqing-feng
Contributor Author

Hi @yao-matrix. I have verified that both `ipex: true` and running without an accelerate config work as before.

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
@jiqing-feng jiqing-feng marked this pull request as ready for review May 21, 2025 01:42
@jiqing-feng
Contributor Author

Hi @SunMarc. Could you please review this PR? Thanks!

Member

@SunMarc SunMarc left a comment

Thanks, left a couple of comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Member

@SunMarc SunMarc left a comment

LGTM, just a small nit

@jiqing-feng
Contributor Author

> LGTM, just a small nit

Fixed!

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Member

@SunMarc SunMarc left a comment

Thanks! LGTM!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc merged commit 4f3abb7 into huggingface:main May 26, 2025
24 of 25 checks passed
S1ro1 added a commit that referenced this pull request Jun 10, 2025
commit 2f8fd72
Author: Simon <80467011+sorgfresser@users.noreply.github.com>
Date:   Tue Jun 10 13:50:34 2025 +0100

    Remove device_count (#3587)

commit d2e6b03
Author: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Date:   Tue Jun 10 05:26:48 2025 -0700

    [FSDP2] Refactor + FP8 (#3585)

    * Fix double wrap

    * Clocking off, ~equal to torch baseline

    * works?

    * Working version

    * Partial rewrite

    * FSDP2 path works

    * Fix back prepare

    * Almost done, proper AC left

    * Feat: should work, cleanup + test more benchmarks left

    * Style+quality

    * Feat: fp8 example

    * Feat: better example

    * Feat: add readme

    * Docs + should be done

    * Fix: typos

    * Fix: protect imports

    * Feat: address comments

    * Feat: add flops image

commit b9fee48
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Tue Jun 10 13:24:43 2025 +0100

    better handle FP8 with and without deepspeed (#3611)

    * use the state mixed precision which has undergone all preprocessing

    * Update src/accelerate/accelerator.py

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

    * Update src/accelerate/accelerator.py

    * accelerator state sets the mixed precision for deepspeed and fp8_enabled

    * fix

    * fix

    ---------

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

commit 3a82b05
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Tue Jun 10 11:29:59 2025 +0200

    Fix bf16 training with TP  (#3610)

    * fix

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 6b61a37
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Fri Jun 6 13:48:43 2025 +0100

    fix deepspeed regional compilation (#3609)

commit 682691d
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Tue Jun 3 12:36:56 2025 +0200

    Update Gaudi Runners (#3593)

    * test

    * fix

    * push

    * in the morning

    * fix backend

    * run first

    * set habana modules

    * dynamo backend

    * trigger

    * remove on pr

    * remove on file change

commit 791055b
Author: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Date:   Tue Jun 3 12:24:20 2025 +0200

    Fix: list object has no attribute keys (#3603)

commit 16bf1d8
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Fri May 30 23:36:34 2025 +0800

    enable torchao and pippy test cases on XPU (#3599)

    * enable torchao and pippy test cases on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit ab3c604
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Fri May 30 23:23:26 2025 +0800

    enable big_model_inference on xpu (#3595)

    * enable big_model_inference on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix quality

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit 273799c
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 20:08:59 2025 +0800

    enable fsdp2 benchmark on XPU (#3590)

    * enable fsdp2 benchmark on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * add deterministic

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit 43526c5
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 17:44:50 2025 +0800

    add device-agnostic GradScaler (#3588)

    * add device-agnostic GradScaler

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix bug

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix review comments

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * format

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 07f2392
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 17:17:18 2025 +0800

    change to use torch.device (#3594)

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit ee2f48c
Author: Fanli Lin <fanli.lin@intel.com>
Date:   Tue May 27 17:16:42 2025 +0800

    [docs] no hard-coded cuda in the ddp documentation (#3589)

    * make device-agnostic

    * refactor

commit 4f3abb7
Author: jiqing-feng <jiqing.feng@intel.com>
Date:   Mon May 26 21:55:10 2025 +0800

    Set ccl and KMP param in simple launch (#3575)

    * Even 1 CPU machine can also run multi process

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix ccl and kml param setting

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * set master addr only when processes > 1

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix num process check

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    * fix ccl args check

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

    ---------

    Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

commit db536cb
Author: Yuanzhou Cai <80858000+yuanjua@users.noreply.github.com>
Date:   Mon May 26 21:08:13 2025 +0800

    Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Fix tracker initialize distributed before InitProcessGroupKwargs

    * Add test for bug #3550

    * Improve test for #3550

    * Remove redundant code

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

    * fix style

    ---------

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

commit 4e9d0de
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Mon May 26 21:05:42 2025 +0800

    enable regional_compilation benchmark on xpu (#3592)

    * enable regional_compilation benchmark on xpu

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 8cb3ace
Author: Luiz F. G. dos Santos <luiz.fernando0992@gmail.com>
Date:   Thu May 22 10:21:54 2025 -0500

    Add kwargs to optimizer, scheduler and dataloader using function `accelerator().load_state()` (#3540)

    * Added artifacts and figure tracking at MLFlow tracker

    * Added `log_artifact` to the MLFlowTracker

    * Remove changes

    * Added kwargs when loading state.

    * added doc string

    * Adjusted correct default types of kwargs

    * Changed the load kwargs to a single one

    * removed None value from kwargs

    * fix kwargs for loading the model

    * removed load_kwargs from optimizer state dict

    * make load_kwargs a dictionary

    * revert last changes

    * reverted load_kwargs

    * fix docstring

    * added dict initiation

    * Fix quality error during PR

commit b6d97cb
Author: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Date:   Thu May 22 17:26:31 2025 +0300

    Resolve logger warnings (#3582)

    Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

commit 33967d4
Author: Francesco Laiti <25352428+laitifranz@users.noreply.github.com>
Date:   Tue May 20 12:29:53 2025 +0200

    Add support for standalone mode when default port is occupied on single node (#3576)

    * add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection

    * address review feedback: warn on port conflict only for single-node; raise error for multi-node

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 5b1fcda
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 20 18:04:24 2025 +0800

    enable test_cli & test_example cases on XPU (#3578)

    * enable test_cli & test_example cases on XPU

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * remove print

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix ci issue

    Signed-off-by: YAO Matrix <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>
    Signed-off-by: YAO Matrix <matrix.yao@intel.com>

commit f55f053
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 20 18:02:14 2025 +0800

    goodbye torch_ccl (#3580)

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

commit 1ec99f0
Author: Yao Matrix <yaoweifeng0301@126.com>
Date:   Mon May 19 17:27:40 2025 +0800

    enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)

    * enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * Update test_load_checkpoint_and_dispatch_with_broadcast.py

    ---------

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>
S1ro1 added a commit that referenced this pull request Jun 10, 2025
S1ro1 added a commit that referenced this pull request Jul 9, 2025

    * remove print

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix ci issue

    Signed-off-by: YAO Matrix <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>
    Signed-off-by: YAO Matrix <matrix.yao@intel.com>

commit f55f053
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 20 18:02:14 2025 +0800

    goodbye torch_ccl (#3580)

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

commit 1ec99f0
Author: Yao Matrix <yaoweifeng0301@126.com>
Date:   Mon May 19 17:27:40 2025 +0800

    enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)

    * enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>

    * Update test_load_checkpoint_and_dispatch_with_broadcast.py

    ---------

    Signed-off-by: Matrix Yao <matrix.yao@intel.com>
S1ro1 added a commit that referenced this pull request Jul 9, 2025
commit 2f8fd72
Author: Simon <80467011+sorgfresser@users.noreply.github.com>
Date:   Tue Jun 10 13:50:34 2025 +0100

    Remove device_count (#3587)

commit d2e6b03
Author: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Date:   Tue Jun 10 05:26:48 2025 -0700

    [FSDP2] Refactor + FP8 (#3585)

    * Fix double wrap

    * Clocking off, ~equal to torch baseline

    * works?

    * Working version

    * Partial rewrite

    * FSDP2 path works

    * Fix back prepare

    * Almost done, proper AC left

    * Feat: should work, cleanup + test more benchmarks left

    * Style+quality

    * Feat: fp8 example

    * Feat: better example

    * Feat: add readme

    * Docs + should be done

    * Fix: typos

    * Fix: protect imports

    * Feat: address comments

    * Feat: add flops image

commit b9fee48
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Tue Jun 10 13:24:43 2025 +0100

    better handle FP8 with and without deepspeed (#3611)

    * use the state mixed precision which has undergone all preprocessing

    * Update src/accelerate/accelerator.py

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

    * Update src/accelerate/accelerator.py

    * accelerator state sets the mixed precision for deepspeed and fp8_enabled

    * fix

    * fix

    ---------

    Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

commit 3a82b05
Author: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Date:   Tue Jun 10 11:29:59 2025 +0200

    Fix bf16 training with TP  (#3610)

    * fix

    * Apply style fixes

    ---------

    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 6b61a37
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Fri Jun 6 13:48:43 2025 +0100

    fix deepspeed regional compilation (#3609)

commit 682691d
Author: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
Date:   Tue Jun 3 12:36:56 2025 +0200

    Update Gaudi Runners (#3593)

    * test

    * fix

    * push

    * in the morning

    * fix backend

    * run first

    * set habana modules

    * dynamo backend

    * trigger

    * remove on pr

    * remove on file change

commit 791055b
Author: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Date:   Tue Jun 3 12:24:20 2025 +0200

    Fix: list object has no attribute keys (#3603)

commit 16bf1d8
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Fri May 30 23:36:34 2025 +0800

    enable torchao and pippy test cases on XPU (#3599)

    * enable torchao and pippy test cases on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit ab3c604
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Fri May 30 23:23:26 2025 +0800

    enable big_model_inference on xpu (#3595)

    * enable big_model_inference on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix style

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix quality

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit 273799c
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 20:08:59 2025 +0800

    enable fsdp2 benchmark on XPU (#3590)

    * enable fsdp2 benchmark on XPU

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * add deterministic

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

commit 43526c5
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 17:44:50 2025 +0800

    add device-agnostic GradScaler (#3588)

    * add device-agnostic GradScaler

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix bug

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix review comments

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * fix

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * format

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>

    * Apply style fixes

    ---------

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>
    Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

commit 07f2392
Author: Yao Matrix <matrix.yao@intel.com>
Date:   Tue May 27 17:17:18 2025 +0800

    change to use torch.device (#3594)

    Signed-off-by: Matrix YAO <matrix.yao@intel.com>
