Conversation

@yao-matrix
Contributor

With this PR:

  1. autoawq cases: 9 pass, 1 skipped (expected, since XPU doesn't support exllama; all optimized ops go through the ipex backend), 2 fail
  2. peft_integration cases: 2 pass, 1 fail (a bnb issue)
  3. test_sdpa_can_dispatch_on_flash: 19 pass, 1 fail

Failing cases:
tests/models/diffllama/test_modeling_diffllama.py::DiffLlamaModelTest::test_sdpa_can_dispatch_on_flash
tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_from_pretrained_kwargs
tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_bf16
tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_multi_gpu

We will follow up on the 4 failing cases and submit separate PRs with fixes.
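
For reference, a minimal sketch (an assumed local invocation, not part of this PR) of how the four failing cases could be re-run on an XPU machine once the follow-up fixes land:

```python
import pytest

# The four cases listed above, passed straight to pytest.
failing_cases = [
    "tests/models/diffllama/test_modeling_diffllama.py::DiffLlamaModelTest::test_sdpa_can_dispatch_on_flash",
    "tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_from_pretrained_kwargs",
    "tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_bf16",
    "tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_multi_gpu",
]

raise SystemExit(pytest.main(["-rA", *failing_cases]))
```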

@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@github-actions github-actions bot marked this pull request as draft April 15, 2025 07:35
@Rocketknight1
Member

cc @ydshieh!

output = quantized_model.generate(**input_ids, max_new_tokens=40)
self.assertEqual(self.tokenizer.decode(output[0], skip_special_tokens=True), self.EXPECTED_OUTPUT_BF16)

@require_torch_gpu

Collaborator

Does this really need a GPU, or could it work on CPU too?

Contributor Author

This is an exllama-backend-specific case: exllama is an optimized kernel library for the CUDA ecosystem only (I paste some description from its GitHub README below). As of now, Intel's strategy is to implement and expose all optimized ops through ipex (we already integrated and upstreamed this into autoawq), to avoid extra maintenance and development effort outside that path. So, for autoawq, users can use the ipex backend to access all the optimized ops for Intel CPU and XPU.

RTX 4090 and an RTX 3090-Ti. 30-series and later NVIDIA GPUs should be well supported, but anything Pascal or older with poor FP16 support isn't going to perform well. AutoGPTQ or GPTQ-for-LLaMa are better options at the moment for older GPUs. ROCm is also theoretically supported (via HIP) though I currently have no AMD devices to test or optimize on.
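
For context, a minimal sketch of the ipex path described above, assuming AwqConfig(version="ipex") selects the ipex AWQ backend and using an example AWQ checkpoint (not one from this test file):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ"  # example checkpoint (assumption)

# Route the quantized ops through ipex instead of the CUDA-only exllama kernels.
quantization_config = AwqConfig(version="ipex")

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map=device,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```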

Collaborator

cc @SunMarc and/or @MekkCyber to see WDYT

Contributor

Yes, if exllama doesn't support XPU devices, then this change overrides the @require_torch_accelerator used for the class with @require_torch_gpu. It might be useful to add a comment to explain why, though.
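
For illustration, a minimal sketch (hypothetical test class and test names) of the decorator pattern being discussed; require_torch_accelerator and require_torch_gpu are the existing transformers.testing_utils decorators:

```python
import unittest

from transformers.testing_utils import require_torch_accelerator, require_torch_gpu


@require_torch_accelerator
class AwqLikeTester(unittest.TestCase):
    # Class-level decorator: the suite runs on any supported accelerator (CUDA, XPU, ...).
    def test_quantized_model(self):
        ...

    # Method-level decorator narrows this case to CUDA only, so it is skipped on XPU,
    # where the optimized ops go through the ipex backend instead of exllama.
    @require_torch_gpu
    def test_quantized_model_exllama(self):
        ...
```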

Collaborator

Previously no decorator was applied, and this PR adds @require_torch_gpu. I think we are good; I will merge. Thank you!

@yao-matrix yao-matrix marked this pull request as ready for review April 15, 2025 23:12
Collaborator

@ydshieh ydshieh left a comment


Thank you. I will try to wait for a response from one of the other 2 team members before I merge.

Contributor

@MekkCyber MekkCyber left a comment


LGTM, thanks @yao-matrix; left some questions.

Comment on lines +521 to 524
@require_torch_multi_accelerator
def test_training_kernel(self):
model_id = "tiiuae/falcon-mamba-7b"


Contributor

Why falcon specifically?

Collaborator

Searching for the name def test_training_kernel shows there is only one test_training_kernel in the whole codebase 😃

Comment on lines +245 to 248
@require_torch_multi_accelerator
def test_quantized_model_multi_gpu(self):
"""
Simple test that checks if the quantized model is working properly with multiple GPUs

Contributor

Do the tests pass when using XPU?

Collaborator

It's failing (as mentioned in the PR description); they said:

We will follow up on the 4 failing cases and submit separate PRs with fixes.

So it's OK.

@ydshieh ydshieh merged commit 33f6c5a into huggingface:main Apr 16, 2025
16 of 18 checks passed
@yao-matrix yao-matrix deleted the xpu-ut branch April 16, 2025 22:39
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
* enable several cases on XPU

Signed-off-by: YAO Matrix <[email protected]>

* Update tests/test_modeling_common.py

Co-authored-by: Yih-Dar <[email protected]>

* fix style

Signed-off-by: YAO Matrix <[email protected]>

---------

Signed-off-by: YAO Matrix <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025