Switch to new ao quant api for 8da4w #8501
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8501
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 285d20b with merge base 6cb5c1a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
thanks, please make sure the lowering still works as well
So I believe the ao quant API introduces quant_affine nodes, which get decomposed. We got around this by using the to_edge_transform_and_lower path to preserve them from decomposition, but I think that would also require us to modify the llama_export path to use to_edge_transform_and_lower as well.
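For context, a minimal sketch of what routing export through `to_edge_transform_and_lower` looks like; `model` and `example_inputs` are placeholders, and the exact llama_export wiring may differ:

```python
# Sketch only: lowering via to_edge_transform_and_lower so that ops claimed
# by the partitioner are preserved from decomposition, which is how the
# quant_affine nodes can survive down to XNNPACK.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# `model` and `example_inputs` stand in for the quantized module and its
# sample inputs.
exported = torch.export.export(model, example_inputs)

edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])
et_program = edge.to_executorch()
```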
^Above has been closed and xnnpack is now using `to_edge_transform_and_lower`.
Looks like this is running into issues with loading the quantized checkpoints because aten._copy is unimplemented? Seems sus; let me know if you need help from me for any part of enabling this.
Yeah, this is weird. Our one test for quantized embeddings is erroring, presumably because the embedding quantization happens after the linear quantization, which now uses the new `quantize_` API.
Please check out https://pytorch.org/ao/stable/serialization.html to see how serialization works in torchao. copy_ of a torchao quantized tensor into a normal tensor is indeed not supported; we typically use `load_state_dict(..., assign=True)`. Also, embedding quantization is not yet fully supported in torchao I think, but we may add this in H1.
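To illustrate the pattern from those docs, a minimal sketch of loading a checkpoint that contains torchao tensor subclasses; the checkpoint path and `model` are placeholders:

```python
# Sketch, following the torchao serialization docs: since copy_ into an
# existing module's parameters is unsupported for torchao quantized tensors,
# the loaded tensors are assigned instead of copied.
import torch

# Placeholder path: a checkpoint saved from a quantize_()-transformed model.
state_dict = torch.load("quantized_checkpoint.pt", weights_only=False)

# assign=True swaps the loaded (quantized) tensors in directly rather than
# calling copy_ on the existing parameters.
model.load_state_dict(state_dict, assign=True)
```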
@jerryzh168 this error is being thrown during the embedding quantization step.
Update: should be resolved. This error was happening because the quantize-embedding code was loading the model's state dict, which contains ao-quantized tensors from the linear transform that ran before it, which now uses `quantize_`.
Actually, I should have imported and run internal tests before merging; if the diff train gets blocked, I'll forward fix.
```diff
-).quantize(model)
+from torchao.quantization import int8_dynamic_activation_int4_weight, quantize_
+
+quantize_(model, int8_dynamic_activation_int4_weight(group_size=group_size))
```
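For contrast, the older quantizer-object API that this PR moves away from looked roughly like the following; the import path and argument names (`precision`, `groupsize`) are from memory and may not match the removed code exactly:

```python
# Rough sketch of the pre-PR 8da4w path using the quantizer-object API,
# which is what the removed `).quantize(model)` line belonged to.
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

model = Int8DynActInt4WeightQuantizer(
    precision=torch_dtype,  # the dtype the review comment below asks about
    groupsize=group_size,
).quantize(model)
```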
Is `torch_dtype` not applied here? Should it be applied to `model`?
Yeah, I think it would be good to ensure the dtype here by applying it to the model in general, but the model is already in fp32 both when the test passes before this PR and when it fails after it.
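A sketch of what "applying it to the model" could look like here, reusing `torch_dtype`, `group_size`, and the `quantize_` call from the diff above:

```python
# Hypothetical ordering: cast the eager model before quantize_ so the weight
# dtype is explicit rather than relying on the checkpoint already being fp32.
model = model.to(dtype=torch_dtype)
quantize_(model, int8_dynamic_activation_int4_weight(group_size=group_size))
```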
Differential Revision: D70329890 Pull Request resolved: #8772
Summary
Closes #8422