Add device workaround for int4 weight only quantization after API update #36980
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default, and the CI will be paused while the PR is in draft mode. When it is ready for review, please mark it as ready.
SunMarc left a comment:
Thanks! Just a nit.
```diff
  module._parameters[tensor_name] = torch.nn.Parameter(param_value).to(device=target_device)
- quantize_(module, self.quantization_config.get_apply_tensor_subclass(), set_inductor_config=False)
+ quantize_(module, self.quantization_config.get_apply_tensor_subclass())
```
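The device constraint behind this PR's workaround can be sketched as follows. This is an illustrative assumption, not the actual transformers code: `check_int4_device` is a hypothetical helper name, but it mirrors the idea that int4 weight-only quantization only works on CUDA, so unsupported target devices should fail fast before `quantize_()` is reached.

```python
# Illustrative sketch only: `check_int4_device` is an assumed helper name,
# not part of transformers or torchao. It captures the idea of the workaround:
# int4 weight-only quantization is CUDA-only, so fail fast on other devices.
def check_int4_device(quant_type: str, target_device: str) -> None:
    if quant_type == "int4_weight_only" and not str(target_device).startswith("cuda"):
        raise ValueError(
            f"int4 weight-only quantization requires a CUDA device, "
            f"got {target_device!r}"
        )
```

With a guard like this, CPU or MPS placements produce a clear error up front instead of a cryptic kernel failure deeper inside the quantization call.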
Why remove set_inductor_config? This was added because of #36608.
We'll have the 0.10.0 branch cut next week; we can always ask people to use the latest torchao, right?
Yeah, I feel like the next torchao release is quite breaking, so we will force the user to use the latest torchao. When do you plan to remove support for the string (for specifying the quant scheme)?
We can remove the string support after the 0.10 release, once we enforce the torchao version.
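The deprecation path discussed here, accepting both the legacy string and the new config object until string support is dropped, could be sketched like this. The normalizer function and the stand-in config class are assumptions for illustration; only the class name is borrowed from torchao's `Int4WeightOnlyConfig`, and the real class has different fields.

```python
from dataclasses import dataclass


@dataclass
class Int4WeightOnlyConfig:
    # Stand-in for torchao.quantization.Int4WeightOnlyConfig (name only; the
    # real class has more fields). The group_size default is an assumption.
    group_size: int = 128


def normalize_quant_scheme(scheme):
    """Map a legacy string scheme to a config object; pass configs through."""
    if isinstance(scheme, str):
        if scheme == "int4_weight_only":
            return Int4WeightOnlyConfig()
        raise ValueError(f"unknown quant scheme string: {scheme!r}")
    return scheme
```

Once the enforced torchao version is past 0.10, the string branch can simply be deleted.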
SunMarc left a comment:
Thanks! Please fix the style with `make style` and we are good to merge.
There is still a failing test on the CI: check_repository_consistency failed.
```python
from torch.utils.checkpoint import checkpoint
from torchao.quantization import Int4WeightOnlyConfig
```
Some tests are failing because of this.
We recently updated the ruff version, which is probably why the code_quality test is not passing. Can you please update your version of ruff, @jerryzh168?
…ate (huggingface#36980)

* merge
* fix import
* format
* reformat
* reformat

Co-authored-by: Mohamed Mekkouri <[email protected]>
Summary:
We added the new torchao API support in hf transformers in #36526. One thing that's missing is that it does not account for the int4 weight-only quant config only working on CUDA; this PR adds back the workaround.
Also updated the version requirement to > 0.9 temporarily so that we can use the torchao nightly before 0.10 is released; we should change this back before landing.
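A temporary version gate like the one described (require torchao strictly newer than 0.9 so the pre-0.10 nightly API is usable) could be sketched as follows. The function names below are assumptions for illustration, not transformers utilities.

```python
def parse_version(v: str) -> tuple:
    """Parse the leading numeric components of a version string,
    stopping at suffixes like 'dev20250320' found in nightly builds."""
    parts = []
    for piece in v.split("."):
        if piece.isdigit():
            parts.append(int(piece))
        else:
            break
    return tuple(parts)


def torchao_newer_than(installed: str, minimum: tuple = (0, 9)) -> bool:
    """True if the installed torchao version is strictly greater than `minimum`."""
    return parse_version(installed)[:2] > minimum
```

Tuple comparison handles the minor-version rollover (0.10 > 0.9) that a naive string comparison would get wrong, which matters here because the gate must admit 0.10 nightlies.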
Test Plan:
Local test: https://gist.github.com/jerryzh168/0e749d0dab40e2a62a7f2e48639f77b5 (we can set up a deserialization test later, once we can quantize a small model and host it in a stable place like TinyLlama/TinyLlama-1.1B-Chat-v1.0).