[ET-VK][ez] Allow logit linear layer to be lowered to Vulkan #9951

SS-JIA · 2025-04-07T22:51:40Z

This PR was created by the merge bot to help merge the original PR into the main branch.
ghstack PR number: #9918 by @SS-JIA
^ Please use this as the source of truth for the PR details, comments, and reviews
ghstack PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/208/base
ghstack PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/208/head
Merge bot PR base: https://github.com/pytorch/executorch/tree/gh/SS-JIA/207/orig
Merge bot PR head: https://github.com/pytorch/executorch/tree/gh/SS-JIA/208/orig
@diff-train-skip-merge

cc @manuelcandales @cbilgin

pytorch-bot · 2025-04-07T22:51:45Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9951

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Pull Request resolved: #9918

## Context

Due to the poor performance of Vulkan's int4 linear operator, the final logit layer of the transformer model was not being delegated to Vulkan; it was instead quantized and executed with the XNNPACK delegate. However, with D72412950 / #9883, decent performance can now be achieved with Vulkan's int4 linear op. Therefore, the final logit layer can be lowered to Vulkan.

## Changes

* Remove the limit from `VkInt4WeightOnlyQuantizer` that was causing it to ignore the logit layer of the transformer
* Do not apply the XNNPACK partitioner and quantizer when lowering with Vulkan

ghstack-source-id: 276566114
Differential Revision: [D72480177](https://our.internmc.facebook.com/intern/diff/D72480177/)
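The two changes described in this PR can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the helper names (`layers_to_quantize`, `partitioners_for`), the layer names, and the `"output"` name for the logit layer are all hypothetical and exist only to show the before/after behavior.

```python
from typing import List

# Hypothetical name for the final logit projection; the real model may differ.
LOGIT_LAYER = "output"


def layers_to_quantize(linear_layers: List[str], skip_logit: bool) -> List[str]:
    """Return the linear layers an int4 weight-only quantizer would process.

    skip_logit=True models the old limit that ignored the logit layer;
    skip_logit=False models the behavior after the limit is removed.
    """
    if skip_logit:
        return [name for name in linear_layers if name != LOGIT_LAYER]
    return list(linear_layers)


def partitioners_for(vulkan: bool, xnnpack: bool) -> List[str]:
    """Pick delegate partitioners (illustrative string labels only).

    When Vulkan lowering is requested, XNNPACK is not added alongside it.
    """
    if vulkan:
        return ["vulkan"]
    return ["xnnpack"] if xnnpack else []


# Hypothetical layer names for a small transformer.
layers = ["layers.0.attention.wq", "layers.0.feed_forward.w1", "output"]
before = layers_to_quantize(layers, skip_logit=True)
after = layers_to_quantize(layers, skip_logit=False)
```

In this sketch, `before` omits the logit layer, so it would have to be handled by another delegate, while `after` includes it, so the whole model can be lowered through a single backend.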


SS-JIA requested review from jackzhxng and lucylq as code owners April 7, 2025 22:51

pytorch-bot bot added the module: vulkan Issues related to the Vulkan delegate and code under backends/vulkan/ label Apr 7, 2025

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 7, 2025

kirklandsign approved these changes Apr 7, 2025


kirklandsign force-pushed the gh/SS-JIA/207/orig branch from 75f61e4 to e6e6880 Compare April 7, 2025 22:54

Base automatically changed from gh/SS-JIA/207/orig to main April 7, 2025 22:54

kirklandsign force-pushed the gh/SS-JIA/208/orig branch from 0eb073e to 3fdd8ca Compare April 7, 2025 22:55

kirklandsign added the release notes: vulkan Changes to the Vulkan backend delegate label Apr 7, 2025

kirklandsign merged commit 2cce2db into main Apr 7, 2025
3 checks passed

kirklandsign deleted the gh/SS-JIA/208/orig branch April 7, 2025 22:55

This was referenced Apr 14, 2025

Weekly pr metrics report - 2025-04-01..2025-04-07 wdvr/pytorch#28

Open

Weekly pr metrics report - 2025-04-01..2025-04-07 wdvr/pytorch#30

Open

github-actions bot mentioned this pull request May 5, 2025

Weekly pr metrics report - 2025-04-01..2025-04-07 wdvr/pytorch#35

Open
