local change to export llama to qnn #2985


Closed · wants to merge 1 commit

Conversation

@cccclai (Contributor) commented Apr 11, 2024

1. AOT: generate the QNN-delegated model: python -m examples.models.llama2.export_llama --qnn --use_kv_cache -p /home/chenlai/models/stories110M/params.json -c /home/chenlai/models/stories110M/stories110M.pt

2. Runtime: follow build_llama_android.sh with the QNN config enabled, then run: ./llama_main --model_path=./stories_qnn_SM8450.pte --tokenizer_path=./tokenizer.bin --prompt="Once"

@pytorch-bot bot commented Apr 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2985

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 796ae1c with merge base d3326a2:

NEW FAILURES: the following jobs have failed (job list not captured here).

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Apr 11, 2024
@cccclai marked this pull request as draft Apr 11, 2024 04:34
@shewu-quic (Collaborator) commented Apr 12, 2024
Hi Chen,
Thanks for sharing. I tried to reproduce this, but I hit the error below. May I ask what I'm missing?

cmake-android-out/examples/models/llama2/llama_main: 1 file pushed. 36.5 MB/s (542730752 bytes in 14.174s)
llama2.pte: 1 file pushed. 66.5 MB/s (196377840 bytes in 2.816s)
tokenizer.bin: 1 file pushed. 17.4 MB/s (433869 bytes in 0.024s)
cmake-android-out/lib/libqnn_executorch_backend.so: 1 file pushed. 25.2 MB/s (1025160 bytes in 0.039s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnHtp.so: 1 file pushed. 24.8 MB/s (1573896 bytes in 0.061s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnHtpV75Stub.so: 1 file pushed. 20.3 MB/s (291992 bytes in 0.014s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnSystem.so: 1 file pushed. 24.0 MB/s (230864 bytes in 0.009s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so: 1 file pushed. 53.0 MB/s (12046348 bytes in 0.217s)
2024-04-12T11:13:36+08:00  - Running...
2024-04-12T11:13:36+08:00  - export LD_LIBRARY_PATH=/data/local/tmp/llama2_cc:/opt/qcom/aistack/qnn/2.21.0.240326/lib/x86_64-linux-clang && export ADSP_LIBRARY_PATH=/data/local/tmp/llama2_cc && cd /data/local/tmp/llama2_cc && ./llama_main --model_path=./llama2.pte --tokenizer_path=./tokenizer.bin --prompt='Once'
E 00:00:00.000208 executorch:operator_registry.cpp:75] Re-registering aten::sym_size.int, from NOT_SUPPORTED
E 00:00:00.000392 executorch:operator_registry.cpp:76] key: (null), is_fallback: true
F 00:00:00.000432 executorch:operator_registry.cpp:33] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
Aborted

@cccclai (Contributor, Author) commented Apr 12, 2024

> sym_size

Oh, you may need this change: #2934

In the meantime, this line probably needs to be updated, because there is a bug in the constant prop pass:

m = convert_pt2e(m, fold_quantize=False)

I've submitted a change, pytorch/pytorch#123909, to fix the constant prop pass.
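For context, here is a minimal sketch of where that call sits in the PT2E quantization flow. This is not code from this PR: the QnnQuantizer import path and the capture step are assumptions, and fold_quantize=False is the workaround discussed above.

```python
# Hedged sketch of a PT2E quantize flow for the QNN backend; the
# QnnQuantizer import path and capture API are assumed, not from this PR.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer  # assumed path

def quantize_for_qnn(model: torch.nn.Module, example_inputs: tuple):
    m = torch.export.export(model, example_inputs).module()  # capture the graph
    m = prepare_pt2e(m, QnnQuantizer())  # insert observers
    m(*example_inputs)                   # one calibration pass
    # fold_quantize=False sidesteps the constant-prop bug mentioned above
    return convert_pt2e(m, fold_quantize=False)
```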

@cccclai (Contributor, Author) commented Apr 12, 2024

Also, ideally qnn_executorch_backend doesn't need to depend on the whole executorch library, just these targets: https://github.com/pytorch/executorch/blob/main/runtime/backend/targets.bzl#L13-L32

@shewu-quic (Collaborator) commented Apr 12, 2024
> > sym_size
>
> Oh, you may need this change: #2934
>
> In the meantime, this line probably needs to be updated, because there is a bug in the constant prop pass:
>
> m = convert_pt2e(m, fold_quantize=False)
>
> I've submitted a change, pytorch/pytorch#123909, to fix the constant prop pass.

Thanks for your reply. I will try it.

@shewu-quic (Collaborator) commented Apr 12, 2024

> Also, ideally qnn_executorch_backend doesn't need to depend on the whole executorch library, just these targets: https://github.com/pytorch/executorch/blob/main/runtime/backend/targets.bzl#L13-L32

That's great. We will try to refine our dependencies. For now, qnn_executorch_backend depends on the executorch_no_prim_ops target:

target_link_libraries(qnn_executorch_backend

May I know which target you would recommend?

@cccclai (Contributor, Author) commented Apr 12, 2024

> > Also, ideally qnn_executorch_backend doesn't need to depend on the whole executorch library, just these targets: https://github.com/pytorch/executorch/blob/main/runtime/backend/targets.bzl#L13-L32
>
> That's great. We will try to refine our dependencies. For now, qnn_executorch_backend depends on the executorch_no_prim_ops target:
>
> target_link_libraries(qnn_executorch_backend
>
> May I know which target you would recommend?

You'd probably need to check the corresponding CMake target. In Buck it's runtime/backend:interface, which should already include "//runtime/core:core", "//runtime/core:evalue", "//runtime/core:event_tracer", and "//runtime/core:memory_allocator".

@shewu-quic (Collaborator) commented Apr 12, 2024

I can run it now. May I check the results with you?

I get 37 partitions, and accuracy is not good; the output looks like "Once nieíoVA аas blablabla".

We have investigated this on our side. The cause seems to be related to RMS norm: we observe a large quantization scale (about 10~30) for the mul op in RMS norm. When I fall back RMS norm (25 partitions), I get better results, such as "Once upon a time, there was a mommy and a daddy blablalalb". But as you can see, there is still a gap from the expected output, "Once upon a time, there was a little girl named Lily. She loved to play outside". We are trying to fix it.
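To make the failure mode concrete, here is a minimal RMSNorm sketch (the standard llama2-style formulation, not code from this PR). The final elementwise mul is the op with the large observed scale: with per-tensor quantization, a scale around 10~30 leaves few quantization steps for small activations.

```python
# Minimal RMSNorm sketch (standard llama2-style formulation, not from this
# PR) illustrating the mul whose quantization scale is discussed above.
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    # This mul mixes activations with 1/rms; outliers in x widen the
    # observed range, inflating a per-tensor quantization scale.
    return x * rms * weight

x = torch.randn(1, 8, 64)
x[0, 0, 0] = 25.0  # a single outlier stretches the quantization range
print(rms_norm(x, torch.ones(64)).shape)  # torch.Size([1, 8, 64])
```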

@cccclai (Contributor, Author) commented Mar 17, 2025

No longer needed.

@cccclai closed this Mar 17, 2025
Labels: CLA Signed
3 participants