Skip to content

Conversation

@jiqing-feng
Copy link
Contributor

Torchao quantization is ready for CPU

@jiqing-feng
Copy link
Contributor Author

The error is TypeError: Object of type Int4CPULayout is not JSON serializable when I want to save the int4 quantized model. It seems that the torchao api is not friendly. We'd like to figure out how to make it works.

@Rocketknight1
Copy link
Member

cc @SunMarc @MekkCyber

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, left a few comments

@SunMarc
Copy link
Member

SunMarc commented Feb 12, 2025

Also do you know if it is possible to serialize on device (cpu) and load/infer the model on cuda with int4wo ?

Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng
Copy link
Contributor Author

jiqing-feng commented Feb 13, 2025

Also do you know if it is possible to serialize on device (cpu) and load/infer the model on cuda with int4wo ?

I am afraid not, because cuda and CPU have different data layout on int4_weight_only. But it works on int8, I will enable this test on int8.

@jiqing-feng
Copy link
Contributor Author

jiqing-feng commented Feb 13, 2025

Fixed all test issues, only left #36147 . Do you have any idea? @SunMarc

@jiqing-feng
Copy link
Contributor Author

jiqing-feng commented Feb 13, 2025

New update, I fixed the save error from json by this commit

All tests passed now : pytest tests/quantization/torchao_integration/test_torchao.py

Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
@SunMarc
Copy link
Member

SunMarc commented Feb 13, 2025

It would be nice if we could repack the tensors depending on the hardware we are loading them cc @jerryzh168

@jiqing-feng
Copy link
Contributor Author

It would be nice if we could repack the tensors depending on the hardware we are loading them cc @jerryzh168

Hi @SunMarc . Yes, it could be our next plan. We need to sync it with torchao developers. Before that, do I need any changes to merge this PR?

@MekkCyber
Copy link
Contributor

Thanks for the PR @jiqing-feng, I left some nits

@jiqing-feng
Copy link
Contributor Author

Hi @MekkCyber . Thanks for your review. I have fixed these comments.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jiqing-feng
Copy link
Contributor Author

I will rebase this PR after #36206 is merged.

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this. Just a few nits and we are good to merge

@jerryzh168
Copy link
Contributor

It would be nice if we could repack the tensors depending on the hardware we are loading them cc @jerryzh168

we could build some functionality for to probably, that when we change the device we also change the layout, but that would need to be an inplace change, this may not be easily doable. we can also have a util to change between both device and layout I think, which would require some custom code in huggingface side to call it explicitly

@SunMarc
Copy link
Member

SunMarc commented Feb 20, 2025

I'm fine with the modification you did but feel like this could make everything easier to read if we just require torchao 0.8 for the tests : #36146 (comment)

@SunMarc
Copy link
Member

SunMarc commented Feb 20, 2025

we can also have a util to change between both device and layout I think, which would require some custom code in huggingface side to call it explicitly

Having this could be useful for a temporary solution

@MekkCyber
Copy link
Contributor

@jerryzh168, just FMI, what is the technical reason why the layout is different between CPU and GPU ? thanks !

@jerryzh168
Copy link
Contributor

@jerryzh168, just FMI, what is the technical reason why the layout is different between CPU and GPU ? thanks !

weights are packed in different format for performance optimizations, different hardwares (CPU v.s. GPU tensor core) prefers different formats for the kernel to work efficiently I think

@jiqing-feng
Copy link
Contributor Author

Hi @SunMarc . I have updated the PR by your comments, but the failed CI is strange. Is it because no torchao available in the tests?

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! Just a nit

@SunMarc
Copy link
Member

SunMarc commented Feb 21, 2025

Failing tests are not related. I reran the tests

@jiqing-feng
Copy link
Contributor Author

Failing tests are not related. I reran the tests

Hi @SunMarc . The failed test still exists: 2 errored out because of NameError -> name 'Int4CPULayout' is not defined

@SunMarc
Copy link
Member

SunMarc commented Feb 24, 2025

Hi @SunMarc . The failed test still exists: 2 errored out because of NameError -> name 'Int4CPULayout' is not defined

You can put the code that trigger this error in a setup method (will be trigger before each test). That should fix the issue.

def setUp(self):
...

@jiqing-feng
Copy link
Contributor Author

jiqing-feng commented Feb 25, 2025

You can put the code that trigger this error in a setup method (will be trigger before each test). That should fix the issue.

def setUp(self):
...

Use setUp will change too much code and make the tests complicated. Just check the quant_scheme_kwargs in the init state will be fine. We can reformat it when torchao is fully ready.

@jiqing-feng
Copy link
Contributor Author

Hi @SunMarc . I used a more straightforward method to fix the tests and also avoid changing too much code. Please review the new changes and let me know if any change is needed before merging. Thanks!

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

@SunMarc SunMarc merged commit 9d6abf9 into huggingface:main Feb 25, 2025
21 checks passed
Signed-off-by: jiqing-feng <[email protected]>
Signed-off-by: jiqing-feng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants