Change weight to channel-packing in Conv1d #7057


Merged
merged 1 commit into from
Nov 26, 2024

Conversation

yipjustin
Contributor

Summary:
In a model we evaluate, the Conv1d weight tensor has shape (256, 1, 7): 256 is the out-channel count, 1 the in-channel count, and 7 the kernel size.

Under weight-packing this tensor is mapped to extents of `(7 / 4, 1, 256)`, which uses memory poorly: each (x, y) plane occupies 4096 bytes on the test device (for both 'OPTIMAL' and 'LINEAR' tiling) even though only 2 texels per plane are used, so the tensor consumes 1 MB.

A temporary workaround is to use channel-packing instead. The tensor then maps to extents `(7, 1, 64)`, which is 75% less deep and hence consumes far less memory, at the cost of 4 times as many texel fetches. Lab tests show no performance regression for our model.
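The arithmetic above can be sketched with a toy cost model. The 8-byte texel size (4-channel fp16) and the 4096-byte per-plane padding are assumptions of this sketch, chosen to be consistent with the numbers reported above; they are not values queried from the device:

```python
# Toy cost model for a 3D texture: every (x, y) plane is padded up to
# a multiple of PAGE bytes, and the texture stores one plane per z slice.
PAGE = 4096         # assumed per-plane alignment on the test device
TEXEL_BYTES = 8     # assumed 4-channel fp16 texels

def texture_bytes(x, y, z):
    """Bytes used by an (x, y, z) texture under the padded-plane model."""
    plane = -(-(x * y * TEXEL_BYTES) // PAGE) * PAGE  # round up to PAGE
    return z * plane

# Conv1d weight (out=256, in=1, kernel=7)
weight_packed = texture_bytes(2, 1, 256)   # extents (ceil(7/4), 1, 256)
channel_packed = texture_bytes(7, 1, 64)   # extents (7, 1, ceil(256/4))

print(weight_packed)   # 1048576 -> 1 MiB
print(channel_packed)  # 262144  -> 256 KiB, 75% less
```

Because both layouts fit their planes well under one page, the cost is driven entirely by the depth (z), which is why shrinking the depth from 256 to 64 cuts memory by exactly 75%.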

Future work:

A more optimal solution is to map the weight tensor `(out-channel, in-channel, kernel)` into extents `(x=out-channel, y=kernel, z=in-channel)`. In our case this yields a close-to-optimal layout.
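Under a hypothetical padded-plane cost model (8-byte texels, planes padded to 4096 bytes — assumptions for illustration, not measured values), the proposed mapping collapses the depth to the in-channel count and lands within one page of the raw data size:

```python
# Hypothetical cost model: 8-byte texels, planes padded to 4096 bytes.
PAGE = 4096
TEXEL_BYTES = 8

def texture_bytes(x, y, z):
    plane = -(-(x * y * TEXEL_BYTES) // PAGE) * PAGE  # round up to PAGE
    return z * plane

# Proposed mapping for weight (out=256, in=1, kernel=7):
# (out-channel, in-channel, kernel) -> (x=out-channel, y=kernel, z=in-channel)
proposed = texture_bytes(256, 7, 1)      # a single z-plane
raw = 256 * 1 * 7 * TEXEL_BYTES          # unpadded data size

print(proposed, raw)  # 16384 14336 -> within one page of the raw size
```

With in-channel = 1 the texture is only one plane deep, so padding waste is bounded by a single page rather than multiplied across 64 or 256 planes.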

Reviewed By: nathanaelsee, jorgep31415

Differential Revision: D66417572

pytorch-bot bot commented Nov 25, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7057

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6b5a494 with merge base a35cb73:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 25, 2024
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D66417572

yipjustin added a commit to yipjustin/executorch that referenced this pull request Nov 25, 2024

yipjustin added a commit to yipjustin/executorch that referenced this pull request Nov 25, 2024

@yipjustin yipjustin added the release notes: backends [DO NOT USE] Changes to any of the backend delegates label Nov 25, 2024
yipjustin added a commit to yipjustin/executorch that referenced this pull request Nov 26, 2024

@facebook-github-bot facebook-github-bot merged commit 2967302 into pytorch:main Nov 26, 2024
42 checks passed