
Support per-op partitioning in XNNPACK delegate for NHWC ops #8265


Closed
GregoryComer opened this issue Feb 6, 2025 · 1 comment

Labels
module: xnnpack Issues related to xnnpack delegation and the code under backends/xnnpack/
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@GregoryComer
Member

GregoryComer commented Feb 6, 2025

🚀 The feature, motivation and pitch

We currently support per-op partitioning in the XNNPACK delegate, which allows all activation tensor memory to be owned by ExecuTorch and thus overlapped with other ExecuTorch-owned activation memory. However, this isn't currently practical for ops that run in channels-last (NHWC) dim order, because the delegate assumes that tensors entering or leaving a partition are always channels-first (NCHW) and thus inserts dim order conversions around every op. This is a performance issue, but more importantly, it means that XNNPACK ends up owning some of the activation memory.
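For reference, a minimal sketch of how per-op partitioning is invoked when lowering a small conv model. The `per_op_mode` flag name is assumed from my recollection of the current partitioner API, and the `ConvBlock` module is purely illustrative:

```python
import torch
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

class ConvBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return torch.relu(self.conv(x))

model = ConvBlock().eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# Per-op mode creates one delegate partition per supported op. With the
# current behavior described above, every partition boundary is assumed
# to be NCHW, so each NHWC op (conv2d here) is wrapped in its own pair of
# NCHW <-> NHWC conversions inside the delegate, and those converted
# activations live in XNNPACK-owned memory instead of the memory plan.
edge = to_edge(torch.export.export(model, example_inputs))
lowered = edge.to_backend(XnnpackPartitioner(per_op_mode=True))
```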

Ideally, we can leverage the recent dim order support in the core runtime to let the framework manage the dim order conversions, at least in single-op mode. How this interacts with partitioning is not entirely clear, since the conversion handling would have to happen after partitioning. Initially, it's likely fine to leave the dim order conversions un-delegated. This needs a bit more design discussion, but it is a high-ROI feature and may be necessary for memory parity with LI in some cases, even with workspace sharing.
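To make the memory-ownership point concrete, a comment-only sketch of where the conversions would move. This assumes the edge dialect's `_to_dim_order_copy` operator from the dim order work is what the framework inserts; exact op names and whether adjacent conversions can be canceled are part of the open design discussion above:

```python
# Today, in per-op mode, the conversions live inside each delegate
# partition and their intermediates are XNNPACK-owned:
#
#   x (NCHW) -> [delegate: NCHW->NHWC, conv2d, NHWC->NCHW] -> y (NCHW)
#
# Proposed: hoist the conversions out as framework-level dim order ops,
# so ExecuTorch plans the NHWC activations:
#
#   x (NCHW) -> _to_dim_order_copy(NHWC) -> [delegate: conv2d]
#            -> _to_dim_order_copy(NCHW) -> y (NCHW)
#
# Consecutive NHWC partitions could then compose with no conversions
# in between:
#
#   ... -> [delegate: conv2d] -> [delegate: relu] -> ...   (all NHWC)
```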

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

cc @digantdesai @mcr229

@GregoryComer GregoryComer added feature module: xnnpack Issues related to xnnpack delegation and the code under backends/xnnpack/ labels Feb 6, 2025
@GregoryComer GregoryComer changed the title Support per-op partitioning in XNNPACK delegate for NHWC oprt Support per-op partitioning in XNNPACK delegate for NHWC ops Feb 6, 2025
@digantdesai digantdesai added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Feb 6, 2025
@mcr229
Contributor

mcr229 commented Feb 13, 2025

let's convert this to a discussion

@pytorch pytorch locked and limited conversation to collaborators Feb 13, 2025
@mcr229 mcr229 converted this issue into discussion #8476 Feb 13, 2025

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
