enable several cases on XPU #37516
Conversation
Signed-off-by: YAO Matrix <[email protected]>
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the "Ready for review" button.
cc @ydshieh!
output = quantized_model.generate(**input_ids, max_new_tokens=40)
self.assertEqual(self.tokenizer.decode(output[0], skip_special_tokens=True), self.EXPECTED_OUTPUT_BF16)

@require_torch_gpu
Does this really need a GPU, or could it work on CPU too?
This is an exllama-backend-specific case. exllama is an optimized kernel library for the CUDA ecosystem only; I've pasted a description from its GitHub README below. As of now, Intel's strategy is to implement and expose all optimized ops through ipex (we already integrated and upstreamed that support to autoawq), to avoid duplicated maintenance and development effort. So for autoawq, users can use the ipex backend to access all the optimized ops for Intel CPU and XPU (see the sketch after the quoted excerpt below).
RTX 4090 and an RTX 3090-Ti. 30-series and later NVIDIA GPUs should be well supported, but anything Pascal or older with poor FP16 support isn't going to perform well. AutoGPTQ or GPTQ-for-LLaMa are better options at the moment for older GPUs. ROCm is also theoretically supported (via HIP) though I currently have no AMD devices to test or optimize on.
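For illustration only, a minimal sketch of the ipex path mentioned above, not taken from this PR. It assumes a transformers build with AWQ + ipex support, autoawq and intel_extension_for_pytorch installed, and uses TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ purely as an example AWQ checkpoint:

# Hedged sketch: run an AWQ checkpoint through the ipex backend on Intel hardware.
# Model id, device choice, and prompt are illustrative assumptions, not from this PR.
from transformers import AutoModelForCausalLM, AutoTokenizer, AwqConfig

model_id = "TheBloke/TinyLlama-1.1B-Chat-v0.3-AWQ"

# Select the ipex backend so the optimized kernels run on Intel CPU or XPU
# instead of the CUDA-only exllama kernels.
quantization_config = AwqConfig(version="ipex")

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="xpu",  # use "cpu" on machines without an Intel GPU
)

inputs = tokenizer("How to make a cake?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))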
cc @SunMarc and/or @MekkCyber to see WDYT
Yes. If exllama doesn't support XPU devices, then this change overrides the @require_torch_accelerator applied to the class with @require_torch_gpu on this test. It might be useful to add a comment explaining why, though.
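For context, a minimal sketch (not from this PR) of how the class-level and method-level decorators interact, assuming the decorators from transformers.testing_utils; the class and test names are placeholders:

# Illustrative sketch of class-level vs method-level test requirements.
import unittest

from transformers.testing_utils import require_torch_accelerator, require_torch_gpu


@require_torch_accelerator  # class-level: run on any accelerator (CUDA, XPU, ...)
class ExampleQuantizationTest(unittest.TestCase):
    def test_quantized_model(self):
        # inherits the class-level requirement, so it also runs on XPU
        ...

    @require_torch_gpu  # exllama kernels are CUDA-only, so narrow this test to CUDA GPUs
    def test_quantized_model_exllama(self):
        ...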
Previously there was no decorator applied, and this PR adds @require_torch_gpu, so I think we are good. I will merge. Thank you!
Co-authored-by: Yih-Dar <[email protected]>
Signed-off-by: YAO Matrix <[email protected]>
ydshieh
left a comment
Thank you. I will try to wait for a response from one of the other two team members before merging.
MekkCyber
left a comment
LGTM, thanks @yao-matrix. I left some questions.
@require_torch_multi_accelerator
def test_training_kernel(self):
    model_id = "tiiuae/falcon-mamba-7b"
Why falcon specifically?
Searching for def test_training_kernel shows there is only one test_training_kernel in the whole codebase 😃
@require_torch_multi_accelerator
def test_quantized_model_multi_gpu(self):
    """
    Simple test that checks if the quantized model is working properly with multiple GPUs
Do the tests pass when using XPU?
It's failing (mentioned in the PR description); they said: "We will follow up on the 4 failure cases and submit fixes in separate PRs." So OK.
* enable several cases on XPU
  Signed-off-by: YAO Matrix <[email protected]>
* Update tests/test_modeling_common.py
  Co-authored-by: Yih-Dar <[email protected]>
* fix style
  Signed-off-by: YAO Matrix <[email protected]>
---------
Signed-off-by: YAO Matrix <[email protected]>
Co-authored-by: Yih-Dar <[email protected]>
With this PR, the failing cases are:
tests/models/diffllama/test_modeling_diffllama.py::DiffLlamaModelTest::test_sdpa_can_dispatch_on_flash
tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_from_pretrained_kwargs
tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_bf16
tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_multi_gpu
We will follow up on these 4 failure cases and submit fixes in separate PRs.
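For reference, a minimal sketch (not part of this PR) of re-running the four cases above from a Python session, assuming pytest and the transformers test suite are available locally on an XPU-enabled PyTorch build:

# Hedged sketch: re-run the four failing cases listed above.
import pytest

failing_cases = [
    "tests/models/diffllama/test_modeling_diffllama.py::DiffLlamaModelTest::test_sdpa_can_dispatch_on_flash",
    "tests/peft_integration/test_peft_integration.py::PeftIntegrationTester::test_peft_from_pretrained_kwargs",
    "tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_bf16",
    "tests/quantization/autoawq/test_awq.py::AwqTest::test_quantized_model_multi_gpu",
]

# -rA prints a short summary line for every test at the end of the run.
pytest.main(["-rA", *failing_cases])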