Change weight to channel-packing in Conv1d #7057
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7057

Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 6b5a494 with merge base a35cb73. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D66417572
Summary: In a model we evaluate, the Conv1d weight tensor has shape (256, 1, 7), where 256 is the out-channel count and 7 is the kernel size. Under weight-packing this tensor is mapped to extents of `(7 / 4, 1, 256)`, i.e. (2, 1, 256) after rounding up, which uses memory poorly: on the test device each (x, y) plane occupies 4096 bytes (under both 'OPTIMAL' and 'LINEAR' tiling) even though only 2 texels per plane are used, so the tensor consumes 1 MB. A temporary workaround is to use channel-packing instead. The new extents are `(7, 1, 64)`, 75% fewer planes in depth and hence far less memory, at the cost of fetching 4 times as often. Lab tests show that our model has no perf regression.

## Future work

A more optimal solution is to map the weight tensor `(out-channel, in-channel, kernel)` into extents `(x=out-channel, y=kernel, z=in-channel)`. In our case this yields a close-to-optimal layout.

Reviewed By: nathanaelsee, jorgep31415

Differential Revision: D66417572
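The memory arithmetic above can be sanity-checked with a small sketch. This is a back-of-envelope model only: the 4096-byte per-plane allocation and the extent mappings are taken from the numbers quoted in this summary as observed on the test device, not from the ExecuTorch Vulkan backend code.

```python
import math

# Per-(x, y)-plane allocation observed on the test device
# (same for 'OPTIMAL' and 'LINEAR' tiling) -- an assumption of this sketch.
PLANE_BYTES = 4096

def texture_footprint(extents):
    """Bytes used by a 3D texture when every (x, y) plane
    is padded to PLANE_BYTES, regardless of texels actually used."""
    _, _, depth = extents
    return depth * PLANE_BYTES

# Conv1d weight (out_channels=256, in_channels=1, kernel=7)
weight_packed  = (math.ceil(7 / 4), 1, 256)    # pack 4 texels along the kernel axis
channel_packed = (7, 1, math.ceil(256 / 4))    # pack 4 texels along the out-channel axis

wp = texture_footprint(weight_packed)    # 256 planes -> 1 MiB
cp = texture_footprint(channel_packed)   # 64 planes  -> 256 KiB
savings = 1 - cp / wp                    # 75% less memory
```

The 4x-more-fetches trade-off follows from the same numbers: channel-packing reads one texel per kernel tap instead of one texel per four taps.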