
Qualcomm AI Engine Direct - Support Qnn IR backend in online preparation #8876


Conversation

@haowhsu-quic (Collaborator) commented Mar 3, 2025

Summary

  • Support Qnn IR backend
  • Replace QCir with Dlc in the online prepare flow
  • Add config for the Saver backend
  • Block online preparation if the QNN version is below 2.30 (a minimal sketch follows this list)
  • Fix SDK version checking
  • Fix quant/dequant op breakage
  • Upgrade ANDROID_NATIVE_API_LEVEL from 23 to 30
  • Add comments for qat_training_data/passes_job
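
The 2.30 gate can be pictured with a minimal C++ sketch like the one below. This is an illustration only: the struct and function names are made-up placeholders, not the PR's actual implementation, which reads the version reported by the QNN core API.

// Sketch only: gate online prepare on the reported QNN SDK version.
// QnnCoreVersion and IsOnlinePrepareSupported are hypothetical names.
#include <cstdint>

struct QnnCoreVersion {
  std::uint32_t major;
  std::uint32_t minor;
};

// Online preparation relies on the IR backend, available from QNN 2.30.
constexpr QnnCoreVersion kMinQnnVersionForOnlinePrepare{2, 30};

inline bool IsOnlinePrepareSupported(const QnnCoreVersion& sdk) {
  if (sdk.major != kMinQnnVersionForOnlinePrepare.major) {
    return sdk.major > kMinQnnVersionForOnlinePrepare.major;
  }
  return sdk.minor >= kMinQnnVersionForOnlinePrepare.minor;
}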

Test plan

python -m backends.qualcomm.tests.test_qnn_delegate \
        TestQNNQuantizedOperator.test_qnn_backend_linear \
        -s $DEVICE -H $HOST -m SM8550 \
        -b build-android \
        --online_prepare

@haowhsu-quic haowhsu-quic requested a review from cccclai as a code owner March 3, 2025 09:46

pytorch-bot bot commented Mar 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8876

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8ffca64 with merge base 12ed924:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 3, 2025
@haowhsu-quic (Collaborator, Author)

On behalf of @DannyYuyang-quic.

@haowhsu-quic (Collaborator, Author)

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Mar 3, 2025
@cccclai (Contributor) commented Mar 3, 2025

Looks like there is a CI failure.

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from 50a3f7f to b0f66f8 Compare March 4, 2025 03:23
@DannyYuyang-quic (Collaborator)

Hi @cccclai

The QnnIR-related header files in this PR, such as IR/QnnIrCommon.h, are compatible only with QNN version 2.30 and above.
I've updated the QNN version from 2.28 to 2.31 in the CI to align with QnnIR.

Sorry for not mentioning this earlier, and for any inconvenience caused.

@DannyYuyang-quic (Collaborator)

Hi @cccclai,
As discussed previously, versions 2.30 and 2.31 have a regression in Llama, so please stick with QNN 2.28 for the internal Llama CI until we sort this out. We'll let you know as soon as it's fixed.

@cccclai (Contributor) commented Mar 4, 2025

> Hi @cccclai, as discussed previously, versions 2.30 and 2.31 have a regression in Llama, so please stick with QNN 2.28 for the internal Llama CI until we sort this out. We'll let you know as soon as it's fixed.

Can you share a bit about what kind of regression 2.30/2.31 has?

@DannyYuyang-quic DannyYuyang-quic had a problem deploying to upload-benchmark-results March 4, 2025 05:05 — with GitHub Actions Failure
@DannyYuyang-quic (Collaborator)

> Hi @cccclai, as discussed previously, versions 2.30 and 2.31 have a regression in Llama, so please stick with QNN 2.28 for the internal Llama CI until we sort this out. We'll let you know as soon as it's fixed.

> Can you share a bit about what kind of regression 2.30/2.31 has?

We have tested Llama 1B on Lanai using QNN 2.28, 2.30, and 2.31.
With 2.28 we achieved 67 tok/sec, with 2.30 we observed 61 tok/sec, and 2.31 also shows 61 tok/sec.

@cccclai (Contributor) commented Mar 5, 2025

> Hi @cccclai, as discussed previously, versions 2.30 and 2.31 have a regression in Llama, so please stick with QNN 2.28 for the internal Llama CI until we sort this out. We'll let you know as soon as it's fixed.

> Can you share a bit about what kind of regression 2.30/2.31 has?

> We have tested Llama 1B on Lanai using QNN 2.28, 2.30, and 2.31. With 2.28 we achieved 67 tok/sec, with 2.30 we observed 61 tok/sec, and 2.31 also shows 61 tok/sec.

I see. This PR bumps the QNN version in general, and we probably need to figure out how to manage these QNN versions. This PR will probably break our internal flow. Should we start an email thread to discuss the versioning?

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from b0f66f8 to dfa286b Compare March 6, 2025 08:59
@haowhsu-quic haowhsu-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from dfa286b to 73eefc6 Compare March 6, 2025 09:04
@haowhsu-quic haowhsu-quic had a problem deploying to upload-benchmark-results March 6, 2025 09:21 — with GitHub Actions Failure
@winskuo-quic winskuo-quic had a problem deploying to upload-benchmark-results March 6, 2025 10:09 — with GitHub Actions Failure
}
return Error::Internal;
}
// std::vector<char> buffer(size);

can this be removed?


Yes, thanks for pointing that out!
@haowhsu-quic, could you please help me remove this line to trigger CI?

@cccclai (Contributor) commented Mar 18, 2025

As discussed in the meetings, let's only bump the version in open source and error out when users try to run online prepare with versions older than 2.30.

@DannyYuyang-quic (Collaborator)

Hi @cccclai, I've pushed a new commit with the fixes, and it seems like everything is green.
Please have a look.
Thanks!

@@ -37,6 +37,7 @@ build_android_native_library() {
cmake . -DCMAKE_INSTALL_PREFIX="${CMAKE_OUT}" \
-DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \
-DANDROID_ABI="${ANDROID_ABI}" \
-DANDROID_NATIVE_API_LEVEL=30 \

is this needed?

@DannyYuyang-quic (Collaborator) replied Apr 24, 2025

Yes, thank you for your suggestion.

@facebook-github-bot

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai (Contributor) commented Apr 22, 2025

Can you add this change?

--- a/fbcode/executorch/backends/qualcomm/runtime/targets.bzl
+++ b/fbcode/executorch/backends/qualcomm/runtime/targets.bzl
@@ -43,14 +43,18 @@
                 [
                     "*.cpp",
                     "backends/*.cpp",
+                    "backends/irbackend/*.cpp",
                     "backends/htpbackend/*.cpp",
-                ] + (["backends/htpbackend/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/htpbackend/aarch64/*.cpp"]),
+                ] + (["backends/htpbackend/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/htpbackend/aarch64/*.cpp"]) + (
+                    ["backends/irbackend/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/irbackend/aarch64/*.cpp"]
+                ),
                 exclude = ["Logging.cpp"],
             ),
             exported_headers = glob(
                 [
                     "*.h",
                     "backends/*.h",
+                    "backends/irbackend/*.h",
                     "backends/htpbackend/*.h",
                 ],

Also, I'm getting this error:

In file included from fbcode/executorch/backends/qualcomm/runtime/backends/QnnContextCommon.cpp:10:
buck-out/v2/gen/fbcode/5d832762563ef7a9/executorch/backends/qualcomm/runtime/__runtime__/buck-headers/executorch/backends/qualcomm/runtime/backends/QnnDlcManager.h:15:10: fatal error: 'QnnWrapperUtils.hpp' file not found
   15 | #include "QnnWrapperUtils.hpp"
      |          ^~~~~~~~~~~~~~~~~~~~~
1 error generated.

Where does this file come from?

@@ -70,6 +70,7 @@ endif()

include_directories(
BEFORE ${_common_include_directories} ${QNN_SDK_ROOT}/include/QNN
${QNN_SDK_ROOT}/share/QNN/converter/jni

> Also, I'm getting this error:
>
> In file included from fbcode/executorch/backends/qualcomm/runtime/backends/QnnContextCommon.cpp:10:
> buck-out/v2/gen/fbcode/5d832762563ef7a9/executorch/backends/qualcomm/runtime/__runtime__/buck-headers/executorch/backends/qualcomm/runtime/backends/QnnDlcManager.h:15:10: fatal error: 'QnnWrapperUtils.hpp' file not found
>    15 | #include "QnnWrapperUtils.hpp"
>       |          ^~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
>
> Where does this file come from?

The QnnWrapperUtils.hpp file is located under ${QNN_SDK_ROOT}/share/QNN/converter/jni, which the include_directories change above adds to the header search path.


Found it; looks like we're missing some files for the internal build. I added a target for the files inside /share/QNN/converter/jni, and now I run into this error:

ld.lld: error: undefined symbol: qnn_wrapper_api::strnDup(char const*, unsigned long)
>>> referenced by QnnWrapperUtils.cpp:75 (./third-party/qualcomm/qnn/qnn-2.28/share/QNN/converter/jni/QnnWrapperUtils.cpp:75)
>>>               buck-out/v2/gen/fbsource/7d5d1c564400faae/third-party/qualcomm/qnn/qnn-2.28/__app_sources__/__objects__/share/QNN/converter/jni/QnnWrapperUtils.cpp.pic.o:(qnn_wrapper_api::deepCopyQnnTensors(Qnn_Tensor_t&, Qnn_Tensor_t&))
>>> referenced by QnnModel.cpp:403 (./third-party/qualcomm/qnn/qnn-2.28/share/QNN/converter/jni/QnnModel.cpp:403)
>>>               buck-out/v2/gen/fbsource/7d5d1c564400faae/third-party/qualcomm/qnn/qnn-2.28/__app_sources__/__objects__/share/QNN/converter/jni/QnnModel.cpp.pic.o:(qnn_wrapper_api::getGraphInfoFromModels(qnn_wrapper_api::QnnModel*, unsigned int, qnn_wrapper_api::GraphInfo***))

Looks like

char *strnDup(const char *source, size_t maxlen);

is declared inside QnnModelPal.hpp; where is its implementation?


Ah, found it, never mind.

Do you know how much of a size increase it will add on Android? Also, is it for x86 only or both?


> ld.lld: error: undefined symbol: qnn_wrapper_api::strnDup(char const*, unsigned long)
> >>> referenced by QnnWrapperUtils.cpp:75 (./third-party/qualcomm/qnn/qnn-2.28/share/QNN/converter/jni/QnnWrapperUtils.cpp:75)
> >>>               buck-out/v2/gen/fbsource/7d5d1c564400faae/third-party/qualcomm/qnn/qnn-2.28/__app_sources__/__objects__/share/QNN/converter/jni/QnnWrapperUtils.cpp.pic.o:(qnn_wrapper_api::deepCopyQnnTensors(Qnn_Tensor_t&, Qnn_Tensor_t&))
> >>> referenced by QnnModel.cpp:403 (./third-party/qualcomm/qnn/qnn-2.28/share/QNN/converter/jni/QnnModel.cpp:403)
> >>>               buck-out/v2/gen/fbsource/7d5d1c564400faae/third-party/qualcomm/qnn/qnn-2.28/__app_sources__/__objects__/share/QNN/converter/jni/QnnModel.cpp.pic.o:(qnn_wrapper_api::getGraphInfoFromModels(qnn_wrapper_api::QnnModel*, unsigned int, qnn_wrapper_api::GraphInfo***))

We don't need to include the QnnWrapperUtils.cpp file; we only use the macro inside QnnWrapperUtils.hpp.

> Do you know how much of a size increase it will add on Android? Also, is it for x86 only or both?

Regarding the size increase, libqnn_executorch_backend.so will grow from 11.79 MB to 12.19 MB in total on Android, based on a comparison between mainline and this PR.
This is required for both x86 and Android.


oh, hmm, do you mean I just need to add some files? I currently added a dependency for the buck target like this:

cxx_library(
    name = "app_sources",
    srcs = glob([
        "share/QNN/converter/jni/*.cpp",
    ]) + select({
        "DEFAULT": glob([
            "share/QNN/converter/jni/linux/*.cpp",
        ]),
        "ovr_config//os:linux": glob([
            "share/QNN/converter/jni/linux/*.cpp",
        ]),
        "ovr_config//os:windows": glob([
            "share/QNN/converter/jni/windows/*.cpp",
        ]),
    }),
    headers = glob([
        "share/QNN/converter/jni/*.hpp",
    ]),
    header_namespace = "",
    exported_headers = subdir_glob([
        ("share/QNN/converter/jni", "*.hpp"),
    ]),
    visibility = [
        "PUBLIC",
    ],
    deps = [
        ":api",
    ],
)

Can you help me understand what is required and what is not? If you have a better name for the target, even better.


Can we consider making it optional in the future? For production, the runtime size budget can be limited sometimes.

@DannyYuyang-quic (Collaborator) replied Apr 23, 2025

> oh, hmm, do you mean I just need to add some files? I currently added a dependency for the buck target like this:

cxx_library(
    name = "qnn_converter_sources",
    exported_headers = subdir_glob([
        ("share/QNN/converter/jni", "QnnWrapperUtils.hpp"),
    ]),
    visibility = [
        "PUBLIC",
    ],
    deps = [
        ":api",
    ],
)

Yes, we only need QnnWrapperUtils.hpp, so I think our dependency can just look like this.

> Can we consider making it optional in the future? For production, the runtime size budget can be limited sometimes.

I see. Ideally it would be great to make it optional; I will follow up with a corresponding PR for this.

@cccclai (Contributor) commented Apr 22, 2025

Can you also update this:

--- a/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
+++ b/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
@@ -73,7 +73,13 @@
       cache->GetQnnContextBlob();
 
   // memfd_create on android api level 30 and above
-  int fd = memfd_create("tmp.dlc", 0);
+  // int fd = memfd_create("tmp.dlc", 0);
+  int fd = -1;
+  #ifdef __ANDROID__
+    #if __ANDROID_API__ >= 30
+      fd = memfd_create("tmp.dlc", 0);
+    #endif
+  #endif
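
For context, here is a hedged sketch of how that guarded path might be completed with a fallback when memfd_create is unavailable (non-Android builds or API level below 30). CreateDlcFd and the unlinked-temp-file fallback are illustrative assumptions, not necessarily what the PR does.

// Illustrative sketch only: obtain an anonymous fd for the DLC blob,
// preferring memfd_create (Android API 30+) and falling back to an
// unlinked temporary file elsewhere. CreateDlcFd is a hypothetical helper.
#include <cstdio>
#include <unistd.h>
#if defined(__ANDROID__) && __ANDROID_API__ >= 30
#include <sys/mman.h>
#endif

int CreateDlcFd() {
  int fd = -1;
#if defined(__ANDROID__) && __ANDROID_API__ >= 30
  fd = memfd_create("tmp.dlc", 0);
#endif
  if (fd == -1) {
    // Fallback: an unlinked temp file behaves like an anonymous fd.
    std::FILE* tmp = std::tmpfile();
    if (tmp != nullptr) {
      fd = dup(fileno(tmp));
      std::fclose(tmp);  // The dup'd fd stays valid after fclose().
    }
  }
  return fd;
}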

cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 22, 2025
Summary: pytorch#8876 add dependency on the QnnWrapperUtils.hpp, add the buck file here.

Differential Revision: D73452937
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 22, 2025
Summary:

pytorch#8876 add dependency on the QnnWrapperUtils.hpp, add the buck file here.

Reviewed By: kirklandsign

Differential Revision: D73452937
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 22, 2025
Summary:
Pull Request resolved: pytorch#10370

pytorch#8876 add dependency on the QnnWrapperUtils.hpp, add the buck file here.

Reviewed By: kirklandsign

Differential Revision: D73452937
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from 5d224d7 to eaf22c9 Compare April 23, 2025 04:54
@facebook-github-bot

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai (Contributor) commented Apr 23, 2025

Hmm, seems like there is a merge conflict, can you rebase?

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch 2 times, most recently from eac8893 to 8611a4b Compare April 28, 2025 06:35
@DannyYuyang-quic (Collaborator) commented Apr 28, 2025

> Hmm, seems like there is a merge conflict, can you rebase?

Hi @cccclai,
I rebased it, but there are some errors in the CI.
I'm not sure if they're caused by this PR. Could you take a look?
Thanks!

@cccclai (Contributor) commented Apr 29, 2025

I'm out of office and don't have access for now. @kirklandsign can you help a bit?

@facebook-github-bot

@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kirklandsign (Contributor)

Hi @haowhsu-quic, it seems that you still need to rebase.

@facebook-github-bot

@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kirklandsign (Contributor)

Seems that we are very close. Just the Android CI is left.

2025-04-29T17:48:52.5972976Z ld.lld: error: undefined symbol: __tls_get_addr
2025-04-29T17:48:52.5974053Z >>> referenced by thread_parallel.cpp:32 (/pytorch/executorch/extension/threadpool/thread_parallel.cpp:32)
2025-04-29T17:48:52.5975826Z >>>               thread_parallel.cpp.o:(executorch::extension::get_thread_num()) in archive /pytorch/executorch/cmake-out-android-x86_64/lib/libextension_threadpool.a
2025-04-29T17:48:52.5977441Z >>> referenced by thread_parallel.cpp:36 (/pytorch/executorch/extension/threadpool/thread_parallel.cpp:36)
2025-04-29T17:48:52.5978719Z >>>               thread_parallel.cpp.o:(executorch::extension::set_thread_num(long)) in archive /pytorch/executorch/cmake-out-android-x86_64/lib/libextension_threadpool.a
2025-04-29T17:48:52.5979742Z >>> referenced by thread_parallel.cpp:36 (/pytorch/executorch/extension/threadpool/thread_parallel.cpp:36)
2025-04-29T17:48:52.5982206Z >>>               thread_parallel.cpp.o:(std::__ndk1::__function::__func<executorch::extension::parallel_for(long, long, long, executorch::runtime::FunctionRef<void (long, long)>)::$_0, std::__ndk1::allocator<executorch::extension::parallel_for(long, long, long, executorch::runtime::FunctionRef<void (long, long)>)::$_0>, void (unsigned long)>::operator()(unsigned long&&)) in archive /pytorch/executorch/cmake-out-android-x86_64/lib/libextension_threadpool.a

Is that related to bumping the Android SDK version?

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from 5334d52 to 4b4f6ce Compare April 30, 2025 07:15
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from 4b4f6ce to 8ffca64 Compare April 30, 2025 08:40
@DannyYuyang-quic (Collaborator)

Hi @kirklandsign,
I think we've passed the Android CI. Could you please take a look? Thanks!

@facebook-github-bot

@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kirklandsign (Contributor)

Thank you @DannyYuyang-quic !!

Let's import and do another round of internal CI!

@facebook-github-bot facebook-github-bot merged commit 48ad9f6 into pytorch:main May 1, 2025
260 of 263 checks passed