
Switch to new ao quant api for 8da4w #8501


Merged

merged 3 commits into from Feb 25, 2025

Conversation

Contributor

@jackzhxng jackzhxng commented Feb 14, 2025

Summary

Closes #8422

Test plan

[PLEASE REMOVE] How did you test this PR? Please write down any manual commands you used and note down tests that you have written if applicable.


pytorch-bot bot commented Feb 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8501

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 285d20b with merge base 6cb5c1a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 14, 2025
Contributor

@jerryzh168 jerryzh168 left a comment

Thanks, please make sure the lowering still works as well.

Contributor

mcr229 commented Feb 18, 2025

So I believe the ao quant API introduces quant_affine nodes, which get decomposed. We got around this by using the to_edge_transform_and_lower path to preserve them from decomposition, but I think that would also require us to modify the llama_export path to use to_edge_transform_and_lower as well.
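
For illustration, a minimal sketch of that lowering path (the toy module, shapes, and variable names are assumptions, not the actual llama_export code). The point is that to_edge_transform_and_lower lets the partitioner claim ops, such as the quant_affine nodes that torchao's quantize_ inserts, before they would be decomposed on the way to the edge dialect:

import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(64, 64)

    def forward(self, x):
        return self.linear(x)

model = TinyLinear().eval()
example_inputs = (torch.randn(1, 64),)

# Export, then transform and lower in one step. In the real flow the model
# would already have been through torchao's quantize_, so the exported graph
# contains quantize_affine/dequantize_affine nodes that the XNNPACK
# partitioner needs to see before decomposition.
exported = torch.export.export(model, example_inputs)
edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])
executorch_program = edge.to_executorch()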

@jackzhxng
Contributor Author

@mcr229 taking a stab at that here #8624

@jackzhxng jackzhxng added the release notes: examples Changes to any of our example LLMs integrations, such as Llama3 and Llava label Feb 24, 2025
@jackzhxng jackzhxng requested a review from lucylq as a code owner February 25, 2025 16:40
@jackzhxng
Contributor Author

^The above has been closed, and XNNPACK is now using to_edge_transform_and_lower, giving CI another crack at this.

Contributor

mcr229 commented Feb 25, 2025

Looks like this is running into issues with loading the quantized checkpoints because aten._copy is unimplemented? Seems suspicious; let me know if you need help from me with any part of enabling this.

@jackzhxng
Contributor Author

Yeah, this is weird. Our one test for quantized embeddings is erroring, presumably because the embedding quantization happens after the linear quantization, which now uses the new ao quant API.

Contributor

jerryzh168 commented Feb 25, 2025

Please check out https://pytorch.org/ao/stable/serialization.html to see how serialization works in torchao. copy_-ing a torchao quantized tensor into a normal tensor is indeed not supported; we typically use load_state_dict(..., assign=True).

Also, embedding quantization is not yet fully supported in torchao I think, but we may add it in H1.
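
A minimal sketch of that serialization pattern, following the torchao docs (the toy model, group size, and checkpoint path here are placeholders, not the executorch code):

import torch
from torchao.quantization import int8_dynamic_activation_int4_weight, quantize_

model = torch.nn.Sequential(torch.nn.Linear(128, 128))

# Quantize the linear weights in place, then save the state dict, which now
# holds torchao tensor subclasses instead of plain float tensors.
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))
torch.save(model.state_dict(), "/tmp/quantized_checkpoint.pt")

# Loading that state dict into a fresh float model must use assign=True:
# it swaps the quantized tensors in wholesale, whereas the default path
# would try to copy_ them into the existing float parameters, which is
# not supported for torchao quantized tensors.
fresh = torch.nn.Sequential(torch.nn.Linear(128, 128)).eval()
state_dict = torch.load("/tmp/quantized_checkpoint.pt", weights_only=False)
fresh.load_state_dict(state_dict, assign=True)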

@jackzhxng
Contributor Author

@jerryzh168 this error is being thrown during the load_state_dict we are already doing here, which loads the quantized embedding weights - https://github.com/pytorch/executorch/blob/main/examples/models/llama/source_transformation/quantize.py#L666. Any idea where this copy_ op is coming from?

Contributor Author

jackzhxng commented Feb 25, 2025

Update - this should be resolved. The error was happening because the embedding quantization code was loading the model's state dict, which contains ao-quantized tensors from the earlier linear transform that now uses quantize_. We need assign=True for the deserialization to work properly. Thanks @andrewor14 @jerryzh168 @mcr229

@jackzhxng jackzhxng merged commit f3fc096 into main Feb 25, 2025
119 checks passed
@jackzhxng jackzhxng deleted the jz/new-quantize-api branch February 25, 2025 21:41
@jackzhxng
Contributor Author

Actually, I should have imported and run internal tests before merging. If the diff train gets blocked, I'll forward-fix.

swolchok added a commit that referenced this pull request Feb 26, 2025
@swolchok swolchok mentioned this pull request Feb 26, 2025
swolchok added a commit that referenced this pull request Feb 26, 2025
* Revert "Switch to new ao quant api for 8da4w (#8501)"

This reverts commit f3fc096.

* Revert "Use to_edge_lower_and_transform for XNNPack (#8624)"

This reverts commit b5344c1.

#8624 caused concerning test failure internally -- out of bounds array access. #8501 depends on it per author
jackzhxng added a commit that referenced this pull request Feb 27, 2025
-).quantize(model)
+from torchao.quantization import int8_dynamic_activation_int4_weight, quantize_
+
+quantize_(model, int8_dynamic_activation_int4_weight(group_size=group_size))
Contributor

@jerryzh168 jerryzh168 Feb 28, 2025

Is torch_dtype not applied here? Should it be applied to model?

Contributor Author

@jackzhxng jackzhxng Mar 3, 2025

Yeah, I think it would be good to ensure the dtype here by applying it to the model in general, but the model is already in fp32 both when the test passes before this PR and when it fails after this PR.
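
If the dtype were made explicit, a minimal sketch of what that could look like (the helper name and signature are illustrative, not the actual quantize.py code):

import torch
from torchao.quantization import int8_dynamic_activation_int4_weight, quantize_

def quantize_linears(model: torch.nn.Module, group_size: int, torch_dtype: torch.dtype = torch.float32):
    # Cast the model to the requested dtype first rather than assuming the
    # checkpoint is already fp32, then quantize the linear layers in place.
    model = model.to(torch_dtype)
    quantize_(model, int8_dynamic_activation_int4_weight(group_size=group_size))
    return model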

mcr229 pushed a commit to mcr229/executorch that referenced this pull request Mar 5, 2025
iseeyuan pushed a commit that referenced this pull request Mar 14, 2025
* Revert "Switch to new ao quant api for 8da4w (#8501)"

This reverts commit f3fc096.

* Revert "Use to_edge_lower_and_transform for XNNPack (#8624)"

This reverts commit b5344c1.
jackzhxng added a commit that referenced this pull request Mar 24, 2025
facebook-github-bot pushed a commit that referenced this pull request Mar 25, 2025
Differential Revision: D70329890

Pull Request resolved: #8772
kirklandsign pushed a commit that referenced this pull request Apr 11, 2025
Differential Revision: D70329890

Pull Request resolved: #8772
Labels
ciflow/trunk
CLA Signed - This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
release notes: examples - Changes to any of our example LLMs integrations, such as Llama3 and Llava
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate export_llama to new ao quantize API
4 participants