Use to_edge_lower_and_transform for XNNPack #8624

jackzhxng · 2025-02-21T19:49:44Z

Summary

Use `to_edge_transform_and_lower` in `export_llama` for XNNPack. A consequence of this change is that you can no longer specify multiple backends in the `export_llama` args, although I'm not sure whether anyone is doing that at the moment.
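The single-backend restriction described above can be sketched as an argument check. This is only an illustration, not the code in `export_llama_lib.py`: the flag names in `BACKEND_FLAGS` and the helpers `build_parser` and `selected_backend` are hypothetical, and the real parser may name or validate its options differently.

```python
import argparse
from typing import Optional

# Hypothetical backend flag names, chosen for illustration only; the real
# export_llama argument parser may differ.
BACKEND_FLAGS = ("xnnpack", "qnn", "coreml", "mps", "vulkan")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one boolean flag per (assumed) backend."""
    parser = argparse.ArgumentParser(description="backend selection sketch")
    for name in BACKEND_FLAGS:
        parser.add_argument(f"--{name}", action="store_true")
    return parser


def selected_backend(args: argparse.Namespace) -> Optional[str]:
    """Return the single requested backend, or None if no flag was set.

    Raises ValueError when more than one backend flag is set, mirroring
    the "one backend per export" restriction.
    """
    chosen = [name for name in BACKEND_FLAGS if getattr(args, name)]
    if len(chosen) > 1:
        raise ValueError(
            f"only one backend may be specified, got: {', '.join(chosen)}"
        )
    return chosen[0] if chosen else None
```

With a check like this, passing `--xnnpack` alone selects that backend, while passing `--xnnpack --qnn` together fails early instead of silently lowering to only one of them.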

Closes #8621

Performance regression benchmarking for xnnpack (on android) vs. past 3 days:

These benchmark numbers normally fluctuate a bit across runs, and the differences seen here are within the usual fluctuation range.
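One way to make "within the usual fluctuation range" concrete is a simple spread check. The sketch below is an assumption about how such a check could be written, not the method used for the benchmark above: the function name `within_usual_fluctuation` and the threshold `k` are hypothetical, and the choice of "mean plus or minus `k` sample standard deviations" is just one reasonable definition.

```python
from statistics import mean, stdev
from typing import Sequence


def within_usual_fluctuation(
    baseline: Sequence[float], candidate: float, k: float = 2.0
) -> bool:
    """Return True if candidate lies within k sample standard deviations
    of the mean of the baseline runs.

    baseline: metric values from earlier runs (for example, the past few days).
    candidate: the metric value measured with the change applied.
    k: width of the accepted band; 2.0 is an arbitrary illustrative default.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline runs to estimate spread")
    center = mean(baseline)
    spread = stdev(baseline)
    return abs(candidate - center) <= k * spread
```

Feeding it the earlier runs as `baseline` and the new measurement as `candidate` gives a yes/no answer that is easier to review than an eyeballed graph, at the cost of assuming the baseline runs are representative.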

Test plan

See if CI passes

pytorch-bot · 2025-02-21T19:49:48Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8624

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 951d91e with merge base 77589c6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

iseeyuan · 2025-02-24T17:35:01Z

Have you tested the performance of a model (like llama 3B) before and after? Asking because export_llama is used by different users to prepare .pte files for CPU. Please make sure there's no perf regression.

jackzhxng · 2025-02-24T18:21:15Z

Running an on-demand perf benchmark here: https://github.com/pytorch/executorch/actions/runs/13505211116

tarun292 · 2025-02-24T18:27:49Z

examples/models/llama/export_llama_lib.py

+    )
+    if args.verbose:
+        print_delegation_info(builder.edge_manager.exported_program().graph_module)
+    if args.num_sharding > 0 and args.qnn:


This code shouldn't be here, right?

jackzhxng · 2025-02-24

Oh yeah, technically this should be removed since it's for QNN. Will remove after benchmarking finishes.

jackzhxng · 2025-02-24T20:04:10Z

@iseeyuan please see the performance benchmark graph I posted in the PR description.

mergennachin · 2025-02-24T21:24:20Z

Can you run internal CI tests before merging? Otherwise looks good.

facebook-github-bot · 2025-02-24T22:00:10Z

@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: Use `to_edge_transform_and_lower` in `export_llama` for XNNPack. As part of these changes, this also means that you cannot specify multiple backends in `export_llama` in the args, although I'm not sure if that is happening anywhere at the moment.

Closes #8621

Performance regression benchmarking for xnnpack (on android) vs. past 3 days:

[Image: Screenshot 2025-02-24 at 11 39 52 AM, https://github.com/user-attachments/assets/1640cf2c-a579-491f-8940-7ccfbe464903]

These benchmark numbers also normally fluctuate a bit across runs and these differences are within the usual fluctuation ranges.

Test Plan: See if CI passes

Differential Revision: D70124742

Pulled By: jackzhxng

facebook-github-bot · 2025-02-25T00:16:11Z

This pull request was exported from Phabricator. Differential Revision: D70124742

facebook-github-bot · 2025-02-25T00:33:02Z

@jackzhxng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


facebook-github-bot · 2025-02-25T02:15:01Z

This pull request was exported from Phabricator. Differential Revision: D70124742

This reverts commit b5344c1.

* Revert "Switch to new ao quant api for 8da4w (#8501)"

  This reverts commit f3fc096.

* Revert "Use to_edge_lower_and_transform for XNNPack (#8624)"

  This reverts commit b5344c1.

#8624 caused a concerning test failure internally: an out-of-bounds array access. #8501 depends on it, per the author.

Summary: Trying to bring back #8624 after it got reverted due to an internal test failing.

Differential Revision: D70221944

Pulled By: jackzhxng

Differential Revision: D70221944 Pull Request resolved: #8717


jackzhxng requested review from iseeyuan, larryliu0820 and lucylq as code owners February 21, 2025 19:49

facebook-github-bot added the CLA Signed label Feb 21, 2025

jackzhxng changed the title ~~Use to_edge_lower_and_transform for xnnpack~~ Use to_edge_lower_and_transform for XNNPack Feb 21, 2025

jackzhxng added the topic: not user facing and release notes: xnnpack labels and removed the topic: not user facing label Feb 21, 2025

jackzhxng mentioned this pull request Feb 21, 2025

Switch to new ao quant api for 8da4w #8501

Merged

jackzhxng added the ciflow/trunk label Feb 21, 2025

mergennachin self-requested a review February 21, 2025 20:43

tarun292 reviewed Feb 24, 2025


tarun292 reviewed Feb 24, 2025


jackzhxng temporarily deployed to upload-benchmark-results February 24, 2025 19:19 — with GitHub Actions Inactive

jackzhxng temporarily deployed to upload-benchmark-results February 24, 2025 19:48 — with GitHub Actions Inactive

mergennachin approved these changes Feb 24, 2025


facebook-github-bot force-pushed the jz/export_llama_new_api branch from b30733d to 1b0c5e4 Compare February 25, 2025 00:16

facebook-github-bot added the fb-exported label Feb 25, 2025

facebook-github-bot force-pushed the jz/export_llama_new_api branch from 930ec0e to 951d91e Compare February 25, 2025 02:14

jackzhxng merged commit b5344c1 into main Feb 25, 2025
119 of 121 checks passed

jackzhxng deleted the jz/export_llama_new_api branch February 25, 2025 09:58

jackzhxng restored the jz/export_llama_new_api branch February 25, 2025 09:59

jackzhxng temporarily deployed to upload-benchmark-results February 25, 2025 10:45 — with GitHub Actions Inactive

jackzhxng mentioned this pull request Feb 25, 2025

[DRAFT] Export llama uses to_edge_lower_and_transform #7524

Closed

swolchok added a commit that referenced this pull request Feb 26, 2025

Revert "Use to_edge_lower_and_transform for XNNPack (#8624)"

8a95288

This reverts commit b5344c1.

swolchok mentioned this pull request Feb 26, 2025

Revert #8501 and #8624 #8716

Merged

jackzhxng added a commit that referenced this pull request Feb 26, 2025

Use to_edge_lower_and_transform for XNNPack (#8624)

f737078

jackzhxng mentioned this pull request Feb 26, 2025

Use to_edge_lower_and_transform for XNNPack (#8624) #8717

Merged

facebook-github-bot pushed a commit that referenced this pull request Feb 27, 2025

Use to_edge_lower_and_transform for XNNPack (#8624)

30d4cc8

Differential Revision: D70221944 Pull Request resolved: #8717

jackzhxng mentioned this pull request Mar 11, 2025

Fix pre-autograd transforms not getting persisted during xnnpack export #9118

Merged

iseeyuan pushed a commit that referenced this pull request Mar 14, 2025

Revert #8501 and #8624 (#8716)

5f32355

* Revert "Switch to new ao quant api for 8da4w (#8501)" This reverts commit f3fc096. * Revert "Use to_edge_lower_and_transform for XNNPack (#8624)" This reverts commit b5344c1.
