[ET-VK][int4] Wrap int4 linear calls with view_copy nodes to squeeze/unsqueeze inputs #8226
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8226
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEVs: There is 1 currently active SEV.
✅ No Failures: As of commit 30e966c with merge base 7805229.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D69065866
…unsqueeze inputs

This is done automatically for full-precision linear/mm nodes in the graph at torch.export graph tracing time, but is not done for the int4 op. The new pass adds view_copy nodes, as there are subsequent passes which can fuse view_copy nodes if redundant, and convert view_copy nodes to squeeze/unsqueeze nodes.

Differential Revision: [D69065866](https://our.internmc.facebook.com/intern/diff/D69065866/)
ghstack-source-id: 264874724
Pull Request resolved: #8226
Merged commit 2ba4ab2 into gh/nathanaelsee/3/base
…unsqueeze inputs

Pull Request resolved: #8226

This is done automatically for full-precision linear/mm nodes in the graph at torch.export graph tracing time, but is not done for the int4 op. The new pass adds view_copy nodes, as there are subsequent passes which can fuse view_copy nodes if redundant, and convert view_copy nodes to squeeze/unsqueeze nodes.

ghstack-source-id: 264952606
@exported-using-ghexport
Differential Revision: [D69065866](https://our.internmc.facebook.com/intern/diff/D69065866/)

---------

Co-authored-by: Nathanael See <[email protected]>
See T214560872

#8226 added the pass to the partition preprocess pass list, so now it runs on all exports. This uncovered a bug in the squeeze dims finding function in the mobilenet test case.

Differential Revision: [D69254910](https://our.internmc.facebook.com/intern/diff/D69254910/)
ghstack-source-id: 265078517
Pull Request resolved: #8281
Stack from ghstack (oldest at bottom):
For full-precision linear/mm nodes, this input squeeze/unsqueeze is done automatically at torch.export graph tracing time, but it is not done for the int4 op.
The new pass adds view_copy nodes because subsequent passes can fuse redundant view_copy nodes and convert view_copy nodes to squeeze/unsqueeze nodes.
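To make the shape bookkeeping concrete, here is a minimal pure-Python sketch of the view shapes such a wrapper might use. It is not the pass from this PR: the helper names `wrapped_view_shapes` and `is_pure_squeeze` are hypothetical, and it assumes the int4 op wants a 2D input, so leading dims are flattened into one row dim (which is a plain squeeze when those dims are all size 1).

```python
from math import prod
from typing import List, Tuple


def wrapped_view_shapes(
    input_shape: List[int], out_features: int
) -> Tuple[List[int], List[int]]:
    """Return (shape fed to the 2D op, shape restored after the op).

    Hypothetical helper, not the real pass: leading dims are flattened
    into a single row dim before the op and restored afterwards.
    """
    if len(input_shape) < 2:
        raise ValueError("expected at least a 2D input")
    k = input_shape[-1]
    rows = prod(input_shape[:-1])
    squeezed = [rows, k]  # target shape of the view_copy before the op
    restored = input_shape[:-1] + [out_features]  # view_copy after the op
    return squeezed, restored


def is_pure_squeeze(original: List[int], viewed: List[int]) -> bool:
    """True if the view only drops or adds size-1 dims, i.e. a later
    pass could express it as a squeeze/unsqueeze (assumed criterion)."""
    strip = lambda shape: [d for d in shape if d != 1]
    return strip(original) == strip(viewed)


# Example: a [1, 8, 64] activation into an op with 32 output features.
before, after = wrapped_view_shapes([1, 8, 64], 32)
```

In this example `before` is `[8, 64]` and `after` is `[1, 8, 32]`; because only a size-1 dim changes, `is_pure_squeeze([1, 8, 64], before)` holds, which is the case where a view_copy could be rewritten as a squeeze.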
Differential Revision: D69065866