
Qualcomm AI Engine Direct - Support Qnn IR backend in online preparation #8876


Conversation

@haowhsu-quic (Collaborator) commented Mar 3, 2025

Summary

  • Support Qnn IR backend
  • Replace QCir with Dlc in the online prepare flow
  • Add config for the Saver backend
  • Block online preparation if the QNN version is below 2.30 (a minimal sketch follows this list)
  • Fix SDK version checking
  • Fix quant/dequant op breakage
  • Upgrade ANDROID_NATIVE_API_LEVEL from 23 to 30
  • Add comments for qat_training_data/passes_job
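
The 2.30 gate can be pictured with a minimal C++ sketch like the one below. This is an illustration only: the struct and function names are made-up placeholders, not the PR's actual implementation, which reads the version reported by the QNN core API.

// Sketch only: gate online prepare on the reported QNN SDK version.
// QnnCoreVersion and IsOnlinePrepareSupported are hypothetical names.
#include <cstdint>

struct QnnCoreVersion {
  std::uint32_t major;
  std::uint32_t minor;
};

// Online preparation relies on the IR backend, available from QNN 2.30.
constexpr QnnCoreVersion kMinQnnVersionForOnlinePrepare{2, 30};

inline bool IsOnlinePrepareSupported(const QnnCoreVersion& sdk) {
  if (sdk.major != kMinQnnVersionForOnlinePrepare.major) {
    return sdk.major > kMinQnnVersionForOnlinePrepare.major;
  }
  return sdk.minor >= kMinQnnVersionForOnlinePrepare.minor;
}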

Test plan

python -m backends.qualcomm.tests.test_qnn_delegate \
        TestQNNQuantizedOperator.test_qnn_backend_linear \
        -s $DEVICE -H $HOST -m SM8550 \
        -b build-android \
        --online_prepare

@haowhsu-quic haowhsu-quic requested a review from cccclai as a code owner March 3, 2025 09:46

pytorch-bot bot commented Mar 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8876

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8ffca64 with merge base 12ed924:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 3, 2025
@haowhsu-quic (Collaborator, Author)

On behalf of @DannyYuyang-quic.

@haowhsu-quic (Collaborator, Author)

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the release notes: qualcomm Changes to the Qualcomm backend delegate label Mar 3, 2025
@cccclai (Contributor) commented Mar 3, 2025

Looks like there is a CI failure.

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from 50a3f7f to b0f66f8 Compare March 4, 2025 03:23
@DannyYuyang-quic (Collaborator)

Hi @cccclai

The QnnIR-related header files in this PR, such as IR/QnnIrCommon.h, are compatible only with QNN version 2.30 and above.
I've updated the QNN version from 2.28 to 2.31 in the CI to align with QnnIR.

Sorry for not mentioning this earlier, and for any inconvenience caused.

@DannyYuyang-quic (Collaborator)

Hi @cccclai,
As discussed previously, versions 2.30 and 2.31 have a regression in Llama, so please stick with QNN 2.28 for the internal Llama CI until we sort this out. We'll let you know as soon as it's fixed.

@cccclai (Contributor) commented Mar 4, 2025

> Hi @cccclai, as discussed previously, versions 2.30 and 2.31 have a regression in Llama, so please stick with QNN 2.28 for the internal Llama CI until we sort this out. We'll let you know as soon as it's fixed.

Can you share a bit about what kind of regression 2.30/2.31 has?

@DannyYuyang-quic DannyYuyang-quic had a problem deploying to upload-benchmark-results March 4, 2025 05:05 — with GitHub Actions Failure
@DannyYuyang-quic (Collaborator)

> Hi @cccclai, as discussed previously, versions 2.30 and 2.31 have a regression in Llama, so please stick with QNN 2.28 for the internal Llama CI until we sort this out. We'll let you know as soon as it's fixed.

> Can you share a bit about what kind of regression 2.30/2.31 has?

We have tested Llama 1B on Lanai using QNN 2.28, 2.30, and 2.31.
With 2.28 we achieved 67 tok/sec, with 2.30 we observed 61 tok/sec, and 2.31 also shows 61 tok/sec.

@cccclai (Contributor) commented Mar 5, 2025

> Hi @cccclai, as discussed previously, versions 2.30 and 2.31 have a regression in Llama, so please stick with QNN 2.28 for the internal Llama CI until we sort this out. We'll let you know as soon as it's fixed.

> Can you share a bit about what kind of regression 2.30/2.31 has?

> We have tested Llama 1B on Lanai using QNN 2.28, 2.30, and 2.31. With 2.28 we achieved 67 tok/sec, with 2.30 we observed 61 tok/sec, and 2.31 also shows 61 tok/sec.

I see. This PR bumps the QNN version in general, and we probably need to figure out how to manage these QNN versions. This PR will probably break our internal flow. Should we start an email thread to discuss the versioning?

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from b0f66f8 to dfa286b Compare March 6, 2025 08:59
@haowhsu-quic haowhsu-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from dfa286b to 73eefc6 Compare March 6, 2025 09:04
@haowhsu-quic haowhsu-quic had a problem deploying to upload-benchmark-results March 6, 2025 09:21 — with GitHub Actions Failure
@winskuo-quic winskuo-quic had a problem deploying to upload-benchmark-results March 6, 2025 10:09 — with GitHub Actions Failure
}
return Error::Internal;
}
// std::vector<char> buffer(size);

can this be removed?


Yes, thanks for pointing that out!
@haowhsu-quic, could you please help me remove this line to trigger CI?

@cccclai (Contributor) commented Mar 18, 2025

As discussed in the meetings, let's only bump the version in open source and error out when users try to run online prepare with versions older than 2.30.

@DannyYuyang-quic (Collaborator)

Hi @cccclai, I've pushed a new commit with the fixes, and it seems like everything is green.
Please have a look.
Thanks!

@@ -37,6 +37,7 @@ build_android_native_library() {
cmake . -DCMAKE_INSTALL_PREFIX="${CMAKE_OUT}" \
-DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \
-DANDROID_ABI="${ANDROID_ABI}" \
-DANDROID_NATIVE_API_LEVEL=30 \

is this needed?

@DannyYuyang-quic (Collaborator) replied Apr 24, 2025

Yes, thank you for your suggestion.

@facebook-github-bot

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai (Contributor) commented Apr 22, 2025

Can you add this change?

--- a/fbcode/executorch/backends/qualcomm/runtime/targets.bzl
+++ b/fbcode/executorch/backends/qualcomm/runtime/targets.bzl
@@ -43,14 +43,18 @@
                 [
                     "*.cpp",
                     "backends/*.cpp",
+                    "backends/irbackend/*.cpp",
                     "backends/htpbackend/*.cpp",
-                ] + (["backends/htpbackend/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/htpbackend/aarch64/*.cpp"]),
+                ] + (["backends/htpbackend/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/htpbackend/aarch64/*.cpp"]) + (
+                    ["backends/irbackend/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/irbackend/aarch64/*.cpp"]
+                ),
                 exclude = ["Logging.cpp"],
             ),
             exported_headers = glob(
                 [
                     "*.h",
                     "backends/*.h",
+                    "backends/irbackend/*.h",
                     "backends/htpbackend/*.h",
                 ],

Also, I'm getting this error:

In file included from fbcode/executorch/backends/qualcomm/runtime/backends/QnnContextCommon.cpp:10:
buck-out/v2/gen/fbcode/5d832762563ef7a9/executorch/backends/qualcomm/runtime/__runtime__/buck-headers/executorch/backends/qualcomm/runtime/backends/QnnDlcManager.h:15:10: fatal error: 'QnnWrapperUtils.hpp' file not found
   15 | #include "QnnWrapperUtils.hpp"
      |          ^~~~~~~~~~~~~~~~~~~~~
1 error generated.

Where does this file come from?

@@ -70,6 +70,7 @@ endif()

include_directories(
BEFORE ${_common_include_directories} ${QNN_SDK_ROOT}/include/QNN
${QNN_SDK_ROOT}/share/QNN/converter/jni

> Also, I'm getting this error:
>
> In file included from fbcode/executorch/backends/qualcomm/runtime/backends/QnnContextCommon.cpp:10:
> buck-out/v2/gen/fbcode/5d832762563ef7a9/executorch/backends/qualcomm/runtime/__runtime__/buck-headers/executorch/backends/qualcomm/runtime/backends/QnnDlcManager.h:15:10: fatal error: 'QnnWrapperUtils.hpp' file not found
>    15 | #include "QnnWrapperUtils.hpp"
>       |          ^~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
>
> Where does this file come from?

The QnnWrapperUtils.hpp file is located under ${QNN_SDK_ROOT}/share/QNN/converter/jni, which the include_directories change above adds to the header search path.


Found it; looks like we're missing some files for the internal build. I added a target for the files inside /share/QNN/converter/jni, and now I run into this error:

ld.lld: error: undefined symbol: qnn_wrapper_api::strnDup(char const*, unsigned long)
>>> referenced by QnnWrapperUtils.cpp:75 (./third-party/qualcomm/qnn/qnn-2.28/share/QNN/converter/jni/QnnWrapperUtils.cpp:75)
>>>               buck-out/v2/gen/fbsource/7d5d1c564400faae/third-party/qualcomm/qnn/qnn-2.28/__app_sources__/__objects__/share/QNN/converter/jni/QnnWrapperUtils.cpp.pic.o:(qnn_wrapper_api::deepCopyQnnTensors(Qnn_Tensor_t&, Qnn_Tensor_t&))
>>> referenced by QnnModel.cpp:403 (./third-party/qualcomm/qnn/qnn-2.28/share/QNN/converter/jni/QnnModel.cpp:403)
>>>               buck-out/v2/gen/fbsource/7d5d1c564400faae/third-party/qualcomm/qnn/qnn-2.28/__app_sources__/__objects__/share/QNN/converter/jni/QnnModel.cpp.pic.o:(qnn_wrapper_api::getGraphInfoFromModels(qnn_wrapper_api::QnnModel*, unsigned int, qnn_wrapper_api::GraphInfo***))

Looks like

char *strnDup(const char *source, size_t maxlen);

is declared inside QnnModelPal.hpp; where is its implementation?


Ah, found it, never mind.

Do you know how much of a size increase it will add on Android? Also, is it for x86 only or both?


> ld.lld: error: undefined symbol: qnn_wrapper_api::strnDup(char const*, unsigned long)
> >>> referenced by QnnWrapperUtils.cpp:75 (./third-party/qualcomm/qnn/qnn-2.28/share/QNN/converter/jni/QnnWrapperUtils.cpp:75)
> >>>               buck-out/v2/gen/fbsource/7d5d1c564400faae/third-party/qualcomm/qnn/qnn-2.28/__app_sources__/__objects__/share/QNN/converter/jni/QnnWrapperUtils.cpp.pic.o:(qnn_wrapper_api::deepCopyQnnTensors(Qnn_Tensor_t&, Qnn_Tensor_t&))
> >>> referenced by QnnModel.cpp:403 (./third-party/qualcomm/qnn/qnn-2.28/share/QNN/converter/jni/QnnModel.cpp:403)
> >>>               buck-out/v2/gen/fbsource/7d5d1c564400faae/third-party/qualcomm/qnn/qnn-2.28/__app_sources__/__objects__/share/QNN/converter/jni/QnnModel.cpp.pic.o:(qnn_wrapper_api::getGraphInfoFromModels(qnn_wrapper_api::QnnModel*, unsigned int, qnn_wrapper_api::GraphInfo***))

We don't need to include the QnnWrapperUtils.cpp file; we only use the macro inside QnnWrapperUtils.hpp.

> Do you know how much of a size increase it will add on Android? Also, is it for x86 only or both?

Regarding the size increase, libqnn_executorch_backend.so will grow from 11.79 MB to 12.19 MB in total on Android, based on a comparison between mainline and this PR.
This is required for both x86 and Android.


oh, hmm, do you mean I just need to add some files? I currently added a dependency for the buck target like this:

cxx_library(
    name = "app_sources",
    srcs = glob([
        "share/QNN/converter/jni/*.cpp",
    ]) + select({
        "DEFAULT": glob([
            "share/QNN/converter/jni/linux/*.cpp",
        ]),
        "ovr_config//os:linux": glob([
            "share/QNN/converter/jni/linux/*.cpp",
        ]),
        "ovr_config//os:windows": glob([
            "share/QNN/converter/jni/windows/*.cpp",
        ]),
    }),
    headers = glob([
        "share/QNN/converter/jni/*.hpp",
    ]),
    header_namespace = "",
    exported_headers = subdir_glob([
        ("share/QNN/converter/jni", "*.hpp"),
    ]),
    visibility = [
        "PUBLIC",
    ],
    deps = [
        ":api",
    ],
)

Can you help me understand what is required and what is not? If you have a better name for the target, even better.


Can we consider making it optional in the future? For production, the runtime size budget can be limited sometimes.

@DannyYuyang-quic (Collaborator) replied Apr 23, 2025

> oh, hmm, do you mean I just need to add some files? I currently added a dependency for the buck target like this:

cxx_library(
    name = "qnn_converter_sources",
    exported_headers = subdir_glob([
        ("share/QNN/converter/jni", "QnnWrapperUtils.hpp"),
    ]),
    visibility = [
        "PUBLIC",
    ],
    deps = [
        ":api",
    ],
)

Yes, we only need QnnWrapperUtils.hpp, so I think our dependency can just look like this.

> Can we consider making it optional in the future? For production, the runtime size budget can be limited sometimes.

I see. Ideally it would be great to make it optional; I will follow up with a corresponding PR for this.

@cccclai (Contributor) commented Apr 22, 2025

Can you also update this:

--- a/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
+++ b/fbcode/executorch/backends/qualcomm/runtime/backends/irbackend/aarch64/QnnDlcManager.cpp
@@ -73,7 +73,13 @@
       cache->GetQnnContextBlob();
 
   // memfd_create on android api level 30 and above
-  int fd = memfd_create("tmp.dlc", 0);
+  // int fd = memfd_create("tmp.dlc", 0);
+  int fd = -1;
+  #ifdef __ANDROID__
+    #if __ANDROID_API__ >= 30
+      fd = memfd_create("tmp.dlc", 0);
+    #endif
+  #endif
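
For context, here is a hedged sketch of how that guarded path might be completed with a fallback when memfd_create is unavailable (non-Android builds or API level below 30). CreateDlcFd and the unlinked-temp-file fallback are illustrative assumptions, not necessarily what the PR does.

// Illustrative sketch only: obtain an anonymous fd for the DLC blob,
// preferring memfd_create (Android API 30+) and falling back to an
// unlinked temporary file elsewhere. CreateDlcFd is a hypothetical helper.
#include <cstdio>
#include <unistd.h>
#if defined(__ANDROID__) && __ANDROID_API__ >= 30
#include <sys/mman.h>
#endif

int CreateDlcFd() {
  int fd = -1;
#if defined(__ANDROID__) && __ANDROID_API__ >= 30
  fd = memfd_create("tmp.dlc", 0);
#endif
  if (fd == -1) {
    // Fallback: an unlinked temp file behaves like an anonymous fd.
    std::FILE* tmp = std::tmpfile();
    if (tmp != nullptr) {
      fd = dup(fileno(tmp));
      std::fclose(tmp);  // The dup'd fd stays valid after fclose().
    }
  }
  return fd;
}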

cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 22, 2025
Summary: pytorch#8876 add dependency on the QnnWrapperUtils.hpp, add the buck file here.

Differential Revision: D73452937
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 22, 2025
Summary:

pytorch#8876 add dependency on the QnnWrapperUtils.hpp, add the buck file here.

Reviewed By: kirklandsign

Differential Revision: D73452937
cccclai added a commit to cccclai/executorch-1 that referenced this pull request Apr 22, 2025
Summary:
Pull Request resolved: pytorch#10370

pytorch#8876 add dependency on the QnnWrapperUtils.hpp, add the buck file here.

Reviewed By: kirklandsign

Differential Revision: D73452937
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from 5d224d7 to eaf22c9 Compare April 23, 2025 04:54
@facebook-github-bot

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai (Contributor) commented Apr 23, 2025

Hmm, seems like there is a merge conflict, can you rebase?

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch 2 times, most recently from eac8893 to 8611a4b Compare April 28, 2025 06:35
@DannyYuyang-quic (Collaborator) commented Apr 28, 2025

> Hmm, seems like there is a merge conflict, can you rebase?

Hi @cccclai,
I rebased it, but there are some errors in the CI.
I'm not sure if they're caused by this PR. Could you take a look?
Thanks!

@cccclai (Contributor) commented Apr 29, 2025

I'm out of office and don't have access for now. @kirklandsign can you help a bit?

@facebook-github-bot

@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kirklandsign (Contributor)

Hi @haowhsu-quic, it seems that you still need to rebase.

@facebook-github-bot

@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kirklandsign (Contributor)

Seems that we are very close. Just the Android CI is left.

2025-04-29T17:48:52.5972976Z ld.lld: error: undefined symbol: __tls_get_addr
2025-04-29T17:48:52.5974053Z >>> referenced by thread_parallel.cpp:32 (/pytorch/executorch/extension/threadpool/thread_parallel.cpp:32)
2025-04-29T17:48:52.5975826Z >>>               thread_parallel.cpp.o:(executorch::extension::get_thread_num()) in archive /pytorch/executorch/cmake-out-android-x86_64/lib/libextension_threadpool.a
2025-04-29T17:48:52.5977441Z >>> referenced by thread_parallel.cpp:36 (/pytorch/executorch/extension/threadpool/thread_parallel.cpp:36)
2025-04-29T17:48:52.5978719Z >>>               thread_parallel.cpp.o:(executorch::extension::set_thread_num(long)) in archive /pytorch/executorch/cmake-out-android-x86_64/lib/libextension_threadpool.a
2025-04-29T17:48:52.5979742Z >>> referenced by thread_parallel.cpp:36 (/pytorch/executorch/extension/threadpool/thread_parallel.cpp:36)
2025-04-29T17:48:52.5982206Z >>>               thread_parallel.cpp.o:(std::__ndk1::__function::__func<executorch::extension::parallel_for(long, long, long, executorch::runtime::FunctionRef<void (long, long)>)::$_0, std::__ndk1::allocator<executorch::extension::parallel_for(long, long, long, executorch::runtime::FunctionRef<void (long, long)>)::$_0>, void (unsigned long)>::operator()(unsigned long&&)) in archive /pytorch/executorch/cmake-out-android-x86_64/lib/libextension_threadpool.a

Is that related to bumping the Android SDK version?

@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from 5334d52 to 4b4f6ce Compare April 30, 2025 07:15
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/support_qnn_ir_backend branch from 4b4f6ce to 8ffca64 Compare April 30, 2025 08:40
@DannyYuyang-quic (Collaborator)

Hi @kirklandsign,
I think we've passed the Android CI. Could you please take a look? Thanks!

@facebook-github-bot

@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@kirklandsign (Contributor)

Thank you @DannyYuyang-quic !!

Let's import and do another round of internal CI!

@facebook-github-bot facebook-github-bot merged commit 48ad9f6 into pytorch:main May 1, 2025
260 of 263 checks passed