Switch to new ao quant api for 8da4w #8501
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8501
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 285d20b with merge base 6cb5c1a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
thanks, please make sure the lowering still works as well
So I believe the ao quant API introduces quant_affine nodes, which get decomposed. We got around this by using the to_edge_transform_and_lower path to preserve them from decomposition, but I think that would also require us to modify the llama_export path to use to_edge_transform_and_lower as well.
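For context, a minimal sketch of what routing export through `to_edge_transform_and_lower` looks like; `model` and `example_inputs` are placeholders, and the exact llama_export wiring may differ:

```python
# Sketch only: lowering via to_edge_transform_and_lower so that ops claimed
# by the partitioner are preserved from decomposition, which is how the
# quant_affine nodes can survive down to XNNPACK.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# `model` and `example_inputs` stand in for the quantized module and its
# sample inputs.
exported = torch.export.export(model, example_inputs)

edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])
et_program = edge.to_executorch()
```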
^Above has been closed and xnnpack is now using `to_edge_transform_and_lower`.
Looks like this is running into issues with loading the quantized checkpoints because aten._copy is unimplemented? Seems sus; let me know if you need help from me for any part of enabling this.
Yeah, this is weird. Our one test for quantized embeddings is erroring, presumably because the embedding quantization happens after the linear quantization, which now uses the new `quantize_` API.
Please check out https://pytorch.org/ao/stable/serialization.html to see how serialization works in torchao. copy_ of a torchao quantized tensor into a normal tensor is indeed not supported; we typically use `load_state_dict(..., assign=True)`. Also, embedding quantization is not yet fully supported in torchao I think, but we may add this in H1.
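To illustrate the pattern from those docs, a minimal sketch of loading a checkpoint that contains torchao tensor subclasses; the checkpoint path and `model` are placeholders:

```python
# Sketch, following the torchao serialization docs: since copy_ into an
# existing module's parameters is unsupported for torchao quantized tensors,
# the loaded tensors are assigned instead of copied.
import torch

# Placeholder path: a checkpoint saved from a quantize_()-transformed model.
state_dict = torch.load("quantized_checkpoint.pt", weights_only=False)

# assign=True swaps the loaded (quantized) tensors in directly rather than
# calling copy_ on the existing parameters.
model.load_state_dict(state_dict, assign=True)
```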
@jerryzh168 this error is being thrown during the embedding quantization step.
Update: should be resolved. This error was happening because the quantize-embedding code was loading the model's state dict, which contains ao-quantized tensors from the linear transform that ran before it, which now uses `quantize_`.
Actually, I should have imported and run internal tests before merging; if the diff train gets blocked, I'll forward fix.
```diff
-).quantize(model)
+from torchao.quantization import int8_dynamic_activation_int4_weight, quantize_
+
+quantize_(model, int8_dynamic_activation_int4_weight(group_size=group_size))
```
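For contrast, the older quantizer-object API that this PR moves away from looked roughly like the following; the import path and argument names (`precision`, `groupsize`) are from memory and may not match the removed code exactly:

```python
# Rough sketch of the pre-PR 8da4w path using the quantizer-object API,
# which is what the removed `).quantize(model)` line belonged to.
from torchao.quantization.quant_api import Int8DynActInt4WeightQuantizer

model = Int8DynActInt4WeightQuantizer(
    precision=torch_dtype,  # the dtype the review comment below asks about
    groupsize=group_size,
).quantize(model)
```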
Is `torch_dtype` not applied here? Should it be applied to `model`?
Yeah, I think it would be good to ensure the dtype here by applying it to the model in general, but the model is already in fp32 both when the test passes before this PR and when it fails after it.
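A sketch of what "applying it to the model" could look like here, reusing `torch_dtype`, `group_size`, and the `quantize_` call from the diff above:

```python
# Hypothetical ordering: cast the eager model before quantize_ so the weight
# dtype is explicit rather than relying on the checkpoint already being fp32.
model = model.to(dtype=torch_dtype)
quantize_(model, int8_dynamic_activation_int4_weight(group_size=group_size))
```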
Differential Revision: D70329890 Pull Request resolved: #8772
Summary
Closes #8422