Conversation

@yao-matrix
Contributor

With this PR:

  1. autoawq cases: 9 pass, 1 skipped (expected, since XPU doesn't support exllama; all optimized ops go through the ipex backend), 2 fail
  2. peft_integration cases: 2 pass, 1 fail (a bnb issue)
  3. test_sdpa_can_dispatch_on_flash: 19 pass, 1 fail

Failing cases:
tests/models/diffllama/test_modeling_diffllama.py::DiffLlamaModelTest::test_sdpa_can_dispatch_on_flash
tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_from_pretrained_kwargs
tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_bf16
tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_multi_gpu

We will follow up on the 4 failing cases and submit separate PRs with fixes.
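
For reference, a minimal sketch (an assumed local invocation, not part of this PR) of how the four failing cases could be re-run on an XPU machine once the follow-up fixes land:

```python
import pytest

# The four cases listed above, passed straight to pytest.
failing_cases = [
    "tests/models/diffllama/test_modeling_diffllama.py::DiffLlamaModelTest::test_sdpa_can_dispatch_on_flash",
    "tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_from_pretrained_kwargs",
    "tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_bf16",
    "tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_multi_gpu",
]

raise SystemExit(pytest.main(["-rA", *failing_cases]))
```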

@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@github-actions github-actions bot marked this pull request as draft April 15, 2025 07:35
@Rocketknight1
Member

cc @ydshieh!

output = quantized_model.generate(**input_ids, max_new_tokens=40)
self.assertEqual(self.tokenizer.decode(output[0], skip_special_tokens=True), self.EXPECTED_OUTPUT_BF16)

@require_torch_gpu

Collaborator

Does this really need a GPU, or could it work on CPU too?

Contributor Author

This is an exllama-backend-specific case: exllama is an optimized kernel library for the CUDA ecosystem only (I paste some description from its GitHub README below). As of now, Intel's strategy is to implement and expose all optimized ops through ipex (we already integrated and upstreamed this into autoawq), to avoid extra maintenance and development effort outside that path. So, for autoawq, users can use the ipex backend to access all the optimized ops for Intel CPU and XPU.

RTX 4090 and an RTX 3090-Ti. 30-series and later NVIDIA GPUs should be well supported, but anything Pascal or older with poor FP16 support isn't going to perform well. AutoGPTQ or GPTQ-for-LLaMa are better options at the moment for older GPUs. ROCm is also theoretically supported (via HIP) though I currently have no AMD devices to test or optimize on.
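
For context, a minimal sketch of the ipex path described above, assuming AwqConfig(version="ipex") selects the ipex AWQ backend and using an example AWQ checkpoint (not one from this test file):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ"  # example checkpoint (assumption)

# Route the quantized ops through ipex instead of the CUDA-only exllama kernels.
quantization_config = AwqConfig(version="ipex")

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map=device,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```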

Collaborator

cc @SunMarc and/or @MekkCyber to see WDYT

Contributor

Yes, if exllama doesn't support XPU devices, then this change overrides the @require_torch_accelerator used for the class with @require_torch_gpu. It might be useful to add a comment to explain why, though.
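
For illustration, a minimal sketch (hypothetical test class and test names) of the decorator pattern being discussed; require_torch_accelerator and require_torch_gpu are the existing transformers.testing_utils decorators:

```python
import unittest

from transformers.testing_utils import require_torch_accelerator, require_torch_gpu


@require_torch_accelerator
class AwqLikeTester(unittest.TestCase):
    # Class-level decorator: the suite runs on any supported accelerator (CUDA, XPU, ...).
    def test_quantized_model(self):
        ...

    # Method-level decorator narrows this case to CUDA only, so it is skipped on XPU,
    # where the optimized ops go through the ipex backend instead of exllama.
    @require_torch_gpu
    def test_quantized_model_exllama(self):
        ...
```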

Collaborator

Previously no decorator was applied, and this PR adds @require_torch_gpu. I think we are good; I will merge. Thank you!

@yao-matrix yao-matrix marked this pull request as ready for review April 15, 2025 23:12
Collaborator

@ydshieh ydshieh left a comment


Thank you. I will try to wait for a response from one of the other 2 team members before I merge.

Contributor

@MekkCyber MekkCyber left a comment


LGTM, thanks @yao-matrix; left some questions.

Comment on lines +521 to 524
@require_torch_multi_accelerator
def test_training_kernel(self):
model_id = "tiiuae/falcon-mamba-7b"


Contributor

Why falcon specifically?

Collaborator

Searching for the name def test_training_kernel shows there is only one test_training_kernel in the whole codebase 😃

Comment on lines +245 to 248
@require_torch_multi_accelerator
def test_quantized_model_multi_gpu(self):
"""
Simple test that checks if the quantized model is working properly with multiple GPUs

Contributor

Do the tests pass when using XPU?

Collaborator

It's failing (as mentioned in the PR description); they said:

We will follow up on the 4 failing cases and submit separate PRs with fixes.

So it's OK.

@ydshieh ydshieh merged commit 33f6c5a into huggingface:main Apr 16, 2025
16 of 18 checks passed
@yao-matrix yao-matrix deleted the xpu-ut branch April 16, 2025 22:39
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
* enable several cases on XPU

Signed-off-by: YAO Matrix <[email protected]>

* Update tests/test_modeling_common.py

Co-authored-by: Yih-Dar <[email protected]>

* fix style

Signed-off-by: YAO Matrix <[email protected]>

---------

Signed-off-by: YAO Matrix <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025