Qualcomm AI Engine Direct - Add llama sha transforming pass #6211
Conversation
chunit-quic commented on Oct 15, 2024
- Add SHA pass
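For context, a minimal sketch of what a single-head-attention (SHA) split does to a fused attention projection, assuming a fused weight layout of `(n_heads * head_dim, dim)`. The helper name below is hypothetical and illustrative only, not the actual pass added in this PR:

```python
import torch
import torch.nn as nn

def split_projection_per_head(fused: nn.Linear, n_heads: int, head_dim: int) -> nn.ModuleList:
    """Split a fused (n_heads * head_dim, dim) projection into one Linear per head,
    so each attention head becomes an independent matmul."""
    dim = fused.in_features
    heads = nn.ModuleList(nn.Linear(dim, head_dim, bias=False) for _ in range(n_heads))
    with torch.no_grad():
        for i, head in enumerate(heads):
            head.weight.copy_(fused.weight[i * head_dim : (i + 1) * head_dim, :])
    return heads
```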
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6211
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 607bc6c with merge base 5b51bb8.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
```python
import torch

from .export_llama_lib import build_args_parser, export_llama

sys.setrecursionlimit(4096)
```
what is this for?
We hit the maximum recursion depth during `model = prepare(model, node_name_to_scope, is_qat=False)` in builder.py, so we raise the limit here.
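A minimal sketch of the workaround, assuming the standard PT2E quantization flow; the wrapper function below is hypothetical, not the actual builder.py code:

```python
import sys

from torch.ao.quantization.quantize_pt2e import prepare_pt2e

def prepare_with_higher_recursion_limit(exported_module, quantizer):
    # The graphs produced by the SHA split are deep enough that prepare_pt2e
    # exceeds Python's default recursion limit (1000), so raise it first.
    sys.setrecursionlimit(4096)
    return prepare_pt2e(exported_module, quantizer)
```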
```diff
@@ -260,21 +260,22 @@ class Attention(nn.Module):
     def __init__(self, args: ModelArgs, layer_id: int):
         super().__init__()
         self.use_kv_cache = args.use_kv_cache
         self.n_kv_heads = args.n_heads if args.n_kv_heads is None else args.n_kv_heads
         assert args.n_heads % self.n_kv_heads == 0
         self.n_heads = args.n_heads
```
I guess it's still a draft, but it would be good to have a separate PR if we need to change llama_transformer.py.
Sure. We can separate this part into another PR.
Hi @cccclai,
Just pushed a separate PR for it. If PR 6376 looks fine to you and is merged, I will rebase this PR. Thank you. :)
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Force-pushed from 4cd737a to 60aeec8
Force-pushed from 60aeec8 to a2a97a7
@cccclai
Sure yeah, let me merge it
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hi, this PR breaks some internal tests. I need to get you a patch to land this PR safely.
Looks good
```python
import torch

from .export_llama_lib import build_args_parser, export_llama

sys.setrecursionlimit(4096)
```
Is it still required in the latest commit?
Yes, it is needed when use_qnn_sha is enabled. Otherwise it will hit the maximum recursion depth in the prepare_pt2e function.
Can you add a comment explaining the reason? Also, how feasible is it to guard it to args.qnn only?
Sorry for the late reply. I was running an experiment related to this comment.
I believe we can move this line, together with the `import sys`, under a Qualcomm-specific condition:

```diff
@@ -557,6 +557,8 @@ def get_quantizer_and_quant_params(args):
     quantizers = get_pt2e_quantizers(pt2e_quant_params, args.so_library)
     quant_dtype = None
     if args.qnn and args.pt2e_quantize:
+        import sys
+        sys.setrecursionlimit(4096)
```

That guards it to args.qnn only. If this looks better, I will raise a PR to move it and add a comment.
Can you apply this patch to fix the internal test?
Force-pushed from a2a97a7 to 607bc6c
Sure, just rebased and added it.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.