local change to export llama to qnn #2985


Closed · wants to merge 1 commit

Conversation

@cccclai (Contributor) commented Apr 11, 2024

1. AOT: generate the QNN-delegated model: python -m examples.models.llama2.export_llama --qnn --use_kv_cache -p /home/chenlai/models/stories110M/params.json -c /home/chenlai/models/stories110M/stories110M.pt

2. Runtime: follow build_llama_android.sh with the QNN config enabled, then run: ./llama_main --model_path=./stories_qnn_SM8450.pte --tokenizer_path=./tokenizer.bin --prompt="Once"

@pytorch-bot bot commented Apr 11, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2985

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures

As of commit 796ae1c with merge base d3326a2:

NEW FAILURES: the following jobs have failed (job list not captured here).

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Apr 11, 2024
@cccclai marked this pull request as draft Apr 11, 2024 04:34
@shewu-quic (Collaborator) commented Apr 12, 2024
Hi Chen,
Thanks for sharing. I tried to reproduce this, but I hit the error below. May I ask what I'm missing?

cmake-android-out/examples/models/llama2/llama_main: 1 file pushed. 36.5 MB/s (542730752 bytes in 14.174s)
llama2.pte: 1 file pushed. 66.5 MB/s (196377840 bytes in 2.816s)
tokenizer.bin: 1 file pushed. 17.4 MB/s (433869 bytes in 0.024s)
cmake-android-out/lib/libqnn_executorch_backend.so: 1 file pushed. 25.2 MB/s (1025160 bytes in 0.039s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnHtp.so: 1 file pushed. 24.8 MB/s (1573896 bytes in 0.061s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnHtpV75Stub.so: 1 file pushed. 20.3 MB/s (291992 bytes in 0.014s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/aarch64-android/libQnnSystem.so: 1 file pushed. 24.0 MB/s (230864 bytes in 0.009s)
/opt/qcom/aistack/qnn/2.21.0.240326/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so: 1 file pushed. 53.0 MB/s (12046348 bytes in 0.217s)
2024-04-12T11:13:36+08:00  - Running...
2024-04-12T11:13:36+08:00  - export LD_LIBRARY_PATH=/data/local/tmp/llama2_cc:/opt/qcom/aistack/qnn/2.21.0.240326/lib/x86_64-linux-clang && export ADSP_LIBRARY_PATH=/data/local/tmp/llama2_cc && cd /data/local/tmp/llama2_cc && ./llama_main --model_path=./llama2.pte --tokenizer_path=./tokenizer.bin --prompt='Once'
E 00:00:00.000208 executorch:operator_registry.cpp:75] Re-registering aten::sym_size.int, from NOT_SUPPORTED
E 00:00:00.000392 executorch:operator_registry.cpp:76] key: (null), is_fallback: true
F 00:00:00.000432 executorch:operator_registry.cpp:33] In function register_kernels(), assert failed (false): Kernel registration failed with error 18, see error log for details.
Aborted

@cccclai (Contributor, Author) commented Apr 12, 2024

> sym_size

Oh, you may need this change: #2934

In the meantime, this line probably needs to be updated, because there is a bug in the constant prop pass:

m = convert_pt2e(m, fold_quantize=False)

I've submitted a change, pytorch/pytorch#123909, to fix the constant prop pass.
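For context, here is a minimal sketch of where that call sits in the PT2E quantization flow. This is not code from this PR: the QnnQuantizer import path and the capture step are assumptions, and fold_quantize=False is the workaround discussed above.

```python
# Hedged sketch of a PT2E quantize flow for the QNN backend; the
# QnnQuantizer import path and capture API are assumed, not from this PR.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer  # assumed path

def quantize_for_qnn(model: torch.nn.Module, example_inputs: tuple):
    m = torch.export.export(model, example_inputs).module()  # capture the graph
    m = prepare_pt2e(m, QnnQuantizer())  # insert observers
    m(*example_inputs)                   # one calibration pass
    # fold_quantize=False sidesteps the constant-prop bug mentioned above
    return convert_pt2e(m, fold_quantize=False)
```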

@cccclai (Contributor, Author) commented Apr 12, 2024

Also, ideally qnn_executorch_backend doesn't need to depend on the whole executorch library, just these targets: https://github.com/pytorch/executorch/blob/main/runtime/backend/targets.bzl#L13-L32

@shewu-quic (Collaborator) commented Apr 12, 2024
> > sym_size
>
> Oh, you may need this change: #2934
>
> In the meantime, this line probably needs to be updated, because there is a bug in the constant prop pass:
>
> m = convert_pt2e(m, fold_quantize=False)
>
> I've submitted a change, pytorch/pytorch#123909, to fix the constant prop pass.

Thanks for your reply. I will try it.

@shewu-quic (Collaborator) commented Apr 12, 2024

> Also, ideally qnn_executorch_backend doesn't need to depend on the whole executorch library, just these targets: https://github.com/pytorch/executorch/blob/main/runtime/backend/targets.bzl#L13-L32

That's great. We will try to refine our dependencies. For now, qnn_executorch_backend depends on the executorch_no_prim_ops target:

target_link_libraries(qnn_executorch_backend

May I know which target you would recommend?

@cccclai (Contributor, Author) commented Apr 12, 2024

> > Also, ideally qnn_executorch_backend doesn't need to depend on the whole executorch library, just these targets: https://github.com/pytorch/executorch/blob/main/runtime/backend/targets.bzl#L13-L32
>
> That's great. We will try to refine our dependencies. For now, qnn_executorch_backend depends on the executorch_no_prim_ops target:
>
> target_link_libraries(qnn_executorch_backend
>
> May I know which target you would recommend?

You'd probably need to check the corresponding CMake target. In Buck it's runtime/backend:interface, which should already include "//runtime/core:core", "//runtime/core:evalue", "//runtime/core:event_tracer", and "//runtime/core:memory_allocator".

@shewu-quic (Collaborator) commented Apr 12, 2024

I can run it now. May I check the results with you?

I get 37 partitions, and accuracy is not good; the output looks like "Once nieíoVA аas blablabla".

We have investigated this on our side. The cause seems to be related to RMS norm: we observe a large quantization scale (about 10~30) for the mul op in RMS norm. When I fall back RMS norm (25 partitions), I get better results, such as "Once upon a time, there was a mommy and a daddy blablalalb". But as you can see, there is still a gap from the expected output, "Once upon a time, there was a little girl named Lily. She loved to play outside". We are trying to fix it.
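To make the failure mode concrete, here is a minimal RMSNorm sketch (the standard llama2-style formulation, not code from this PR). The final elementwise mul is the op with the large observed scale: with per-tensor quantization, a scale around 10~30 leaves few quantization steps for small activations.

```python
# Minimal RMSNorm sketch (standard llama2-style formulation, not from this
# PR) illustrating the mul whose quantization scale is discussed above.
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    # This mul mixes activations with 1/rms; outliers in x widen the
    # observed range, inflating a per-tensor quantization scale.
    return x * rms * weight

x = torch.randn(1, 8, 64)
x[0, 0, 0] = 25.0  # a single outlier stretches the quantization range
print(rms_norm(x, torch.ones(64)).shape)  # torch.Size([1, 8, 64])
```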

@cccclai (Contributor, Author) commented Mar 17, 2025

No longer needed.

@cccclai closed this Mar 17, 2025
Labels: CLA Signed
3 participants