Add device workaround for int4 weight only quantization after API update #36980
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default, and the CI will be paused while the PR is in draft mode. When it is ready for review, please mark it as ready.
SunMarc left a comment:
Thanks! Just a nit.
```diff
  module._parameters[tensor_name] = torch.nn.Parameter(param_value).to(device=target_device)
- quantize_(module, self.quantization_config.get_apply_tensor_subclass(), set_inductor_config=False)
+ quantize_(module, self.quantization_config.get_apply_tensor_subclass())
```
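The device constraint behind this PR's workaround can be sketched as follows. This is an illustrative assumption, not the actual transformers code: `check_int4_device` is a hypothetical helper name, but it mirrors the idea that int4 weight-only quantization only works on CUDA, so unsupported target devices should fail fast before `quantize_()` is reached.

```python
# Illustrative sketch only: `check_int4_device` is an assumed helper name,
# not part of transformers or torchao. It captures the idea of the workaround:
# int4 weight-only quantization is CUDA-only, so fail fast on other devices.
def check_int4_device(quant_type: str, target_device: str) -> None:
    if quant_type == "int4_weight_only" and not str(target_device).startswith("cuda"):
        raise ValueError(
            f"int4 weight-only quantization requires a CUDA device, "
            f"got {target_device!r}"
        )
```

With a guard like this, CPU or MPS placements produce a clear error up front instead of a cryptic kernel failure deeper inside the quantization call.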
Why remove set_inductor_config? This was added because of #36608.
We'll have the 0.10.0 branch cut next week; we can always ask people to use the latest torchao, right?
Yeah, I feel like the next torchao release is quite breaking, so we will force the user to use the latest torchao. When do you plan to remove support for the string (for specifying the quant scheme)?
We can remove the string support after the 0.10 release, once we enforce the torchao version.
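The deprecation path discussed here, accepting both the legacy string and the new config object until string support is dropped, could be sketched like this. The normalizer function and the stand-in config class are assumptions for illustration; only the class name is borrowed from torchao's `Int4WeightOnlyConfig`, and the real class has different fields.

```python
from dataclasses import dataclass


@dataclass
class Int4WeightOnlyConfig:
    # Stand-in for torchao.quantization.Int4WeightOnlyConfig (name only; the
    # real class has more fields). The group_size default is an assumption.
    group_size: int = 128


def normalize_quant_scheme(scheme):
    """Map a legacy string scheme to a config object; pass configs through."""
    if isinstance(scheme, str):
        if scheme == "int4_weight_only":
            return Int4WeightOnlyConfig()
        raise ValueError(f"unknown quant scheme string: {scheme!r}")
    return scheme
```

Once the enforced torchao version is past 0.10, the string branch can simply be deleted.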
SunMarc left a comment:
Thanks! Please fix the style with `make style` and we are good to merge.
There is still a failing test on the CI: check_repository_consistency failed.
```python
from torch.utils.checkpoint import checkpoint
from torchao.quantization import Int4WeightOnlyConfig
```
Some tests are failing because of this.
We recently updated the ruff version, which is probably why the code_quality test is not passing. Can you please update your version of ruff, @jerryzh168?
…ate (huggingface#36980)

* merge
* fix import
* format
* reformat
* reformat

Co-authored-by: Mohamed Mekkouri <[email protected]>
Summary:
We added the new torchao API support in hf transformers in #36526. One thing that's missing is that it does not account for the int4 weight-only quant config only working on CUDA; this PR adds back the workaround.
Also updated the version requirement to > 0.9 temporarily so that we can use the torchao nightly before 0.10 is released; we should change this back before landing.
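A temporary version gate like the one described (require torchao strictly newer than 0.9 so the pre-0.10 nightly API is usable) could be sketched as follows. The function names below are assumptions for illustration, not transformers utilities.

```python
def parse_version(v: str) -> tuple:
    """Parse the leading numeric components of a version string,
    stopping at suffixes like 'dev20250320' found in nightly builds."""
    parts = []
    for piece in v.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)


def torchao_newer_than(installed: str, minimum: tuple = (0, 9)) -> bool:
    """True if the installed torchao version is strictly greater than `minimum`."""
    return parse_version(installed)[:2] > minimum
```

Tuple comparison handles the minor-version rollover (0.10 > 0.9) that a naive string comparison would get wrong, which matters here because the gate must admit 0.10 nightlies.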
Test Plan:
Local test: https://gist.github.com/jerryzh168/0e749d0dab40e2a62a7f2e48639f77b5 (we can set up a deserialization test later, once we can quantize a small model and host it in a stable place like TinyLlama/TinyLlama-1.1B-Chat-v1.0).