
Qualcomm AI Engine Direct - oss model enablement (EfficientSAM) #9266


Merged

merged 1 commit into pytorch:main from dev1/danny/EfficientSAM_enablement on Apr 10, 2025

Conversation

DannyYuyang-quic
Collaborator

@DannyYuyang-quic DannyYuyang-quic commented Mar 14, 2025

Summary

 - e2e script for https://github.com/yformer/EfficientSAM
 - Fastvit breakage fix
 - Add support for cum_sum
 - Add bicubic interpolate transform pass
 - Fix stack op

Test plan

```bash
python ./examples/qualcomm/oss_scripts/efficientSAM/efficientSAM.py -m ${soc} -b build-android -H ${host_id} -s ${device_id} --oss_repo ${Path_to_oss_repo} --pretrained_weight ${Path_to_pretrained_weight} -d ${Path_to_dataset_dir}
```
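For reference, a purely illustrative invocation with the placeholders filled in; SM8650 is the unit-test target mentioned later in this thread, while the host/device IDs and paths below are made up:

```bash
# Hypothetical example values; substitute your own SoC model, device serial, and paths.
python ./examples/qualcomm/oss_scripts/efficientSAM/efficientSAM.py \
  -m SM8650 \
  -b build-android \
  -H my-linux-host \
  -s 1234abcd \
  --oss_repo /path/to/EfficientSAM \
  --pretrained_weight /path/to/efficient_sam_checkpoint.pt \
  -d /path/to/dataset
```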


pytorch-bot bot commented Mar 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9266

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 42412d7 with merge base c9c5481:

NEW FAILURE - The following job has failed:

  • pull / unittest-arm / linux-job (gh)
    RuntimeError: Command docker exec -t c50233835032a0388f502a805413875acb5526b953584d9a5d02ffbe166713f5 /exec failed with exit code 1

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Mar 14, 2025
@DannyYuyang-quic
Collaborator Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot bot added the release notes: qualcomm label Mar 14, 2025
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/EfficientSAM_enablement branch from f36af71 to 1f614ea Compare March 14, 2025 08:56
@DannyYuyang-quic
Collaborator Author

DannyYuyang-quic commented Mar 14, 2025

@cccclai
This PR is one of the model enablement requests. Please have a look.
BTW, I can't trigger CI. Could you please help with this?

Thanks!

Contributor

@cccclai cccclai left a comment


Thank you for enabling EfficientSAM! As we enable more models, let's add it as part of the CI, similar to #8616.

@DannyYuyang-quic
Collaborator Author

Thank you for enabling EfficientSAM! As we enable more models, let's add it as part of the CI, similar to #8616.

Regarding the CI, would you prefer I add it in this PR or in the next one?

@cccclai
Contributor

cccclai commented Mar 17, 2025

Thank you for enabling EfficientSAM! As we enable more models, let's add it as part of the CI, similar to #8616.

Next PR is fine

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

```python
    def test_qnn_backend_cumsum(self):
        module = CumSum()  # noqa: F405
        sample_input = (torch.randn(4),)
        self.lower_module_and_test_output(module, sample_input)
```
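For context, a minimal sketch of what the CumSum test module might look like; the real definition lives in the backend's test models, so treat this as an assumption rather than the actual code:

```python
import torch


class CumSum(torch.nn.Module):
    """Hypothetical test module: a thin wrapper around torch.cumsum so the
    exported graph contains a single aten.cumsum.default node."""

    def forward(self, x):
        return torch.cumsum(x, dim=0)
```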
Contributor


It seems like the fp test is failing; can you double-check? The quantized one is passing.

Collaborator Author


Could you please share the log? Both the fp and quantized tests are passing on my side.

@cccclai
Contributor

cccclai commented Mar 24, 2025

The error message is:

```
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Initializing HtpProvider

[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in SAVE MODE.
[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Performance Estimates unsupported

[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Arch 68 set by custom config is different from arch associated with SoC 57, will overwrite it to 75

[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend parameters
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
[INFO] [Qnn ExecuTorch]: create QNN Logger with log_level 2
[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in SAVE MODE.
[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Performance Estimates unsupported

[WARNING] [Qnn ExecuTorch]: QnnDsp <W> Arch 68 set by custom config is different from arch associated with SoC 57, will overwrite it to 75

[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
/data/sandcastle/boxes/eden-trunk-hg-full-fbsource/buck-out/v2/gen/fbcode/ec7059d5161b31ff/executorch/backends/qualcomm/tests/fb/__test_qnn_delegate_simulator__/test_qnn_delegate_simulator#link-tree/executorch/backends/qualcomm/qnn_preprocess.py:69: Visiting: aten_cumsum_default, aten.cumsum.default
[ERROR] [Qnn ExecuTorch]: tcm_migration.cc:1863:ERROR:no properties registered for q::QNN_CumulativeSum

[ERROR] [Qnn ExecuTorch]: graph_prepare.cc:210:ERROR:could not create op: q::QNN_CumulativeSum

[ERROR] [Qnn ExecuTorch]: graph_prepare.cc:1403:ERROR:Op 0x10 preparation failed with err:-1

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> "aten_cumsum_default" generated: could not create op

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> RouterX86 graph prepare failed 12

[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Failed to finalize graph (id: 1) with err 1002

[ERROR] [Qnn ExecuTorch]: Failed to finalize Qnn Graph with error: 1002
[ERROR] [Qnn ExecuTorch]: Fail to compile QNN graph
FAIL
[INFO] [Qnn ExecuTorch]: Destroy Qnn context
[INFO] [Qnn ExecuTorch]: Destroy Qnn device
[INFO] [Qnn ExecuTorch]: Destroy Qnn backend
```

Collaborator Author

@DannyYuyang-quic DannyYuyang-quic left a comment


The error message is: (quoting the full log from the previous comment)

I tried QNN versions 2.27, 2.28, 2.31, and 2.32 with android-ndk-r26c on this PR but still can't reproduce this error. I think I need more details to tackle this issue. Could you please check which QNN version you used? Also, please check the QNN version in $LD_LIBRARY_PATH. Thanks!

@cccclai
Contributor

cccclai commented Mar 25, 2025

I'm currently using QNN 2.28, but I'm not sure of the Android NDK version. It seems like quite a few fp tests are failing consistently while most are passing. We can merge this PR for now, but we will need some help identifying the root cause. The failing fp tests are listed below; the fp tests not listed are passing.

```python
    # Overwrite because unit test is failing, once passing, remove this function
    def test_qnn_backend_interpolate_nearest_2d(self):
        pass

    # Overwrite because unit test is failing, once passing, remove this function
    def test_qnn_backend_interpolate_bilinear_2d(self):
        pass

    def test_qnn_backend_element_wise_ceil(self):
        pass

    def test_qnn_backend_embedding(self):
        pass

    def test_qnn_backend_gelu(self):
        pass

    def test_qnn_backend_group_norm(self):
        pass

    def test_qnn_backend_hardsigmoid(self):
        pass

    def test_qnn_backend_index(self):
        pass

    def test_qnn_backend_index_put(self):
        pass

    def test_qnn_backend_instance_norm_2d(self):
        pass

    def test_qnn_backend_rms_norm(self):
        pass

    def test_qnn_backend_stack(self):
        pass

    def test_qnn_backend_where(self):
        pass

    def test_qnn_backend_conv_transpose2d(self):
        pass
```

We target SM8650 for the unit test btw.

@DannyYuyang-quic
Collaborator Author

DannyYuyang-quic commented Mar 25, 2025

I think you can check the backend_options = generate_htp_compiler_spec(use_fp16=True) setting. The use_fp16 option should be set to True in all fp tests, as shown in the figure below. I'm not sure whether you have modified this setting.

[screenshot: fp test setup with use_fp16=True]
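For illustration, a minimal sketch of how the floating-point and quantized operator suites differ in their setup. generate_htp_compiler_spec(use_fp16=...) and the two class names come from this thread; the TestQNN base class, the import path, and the way the spec is stored are assumptions:

```python
from executorch.backends.qualcomm.utils.utils import (  # import path assumed
    generate_htp_compiler_spec,
)


class TestQNNFloatingPointOperator(TestQNN):  # TestQNN base class assumed
    def setUp(self):
        # Floating-point suite: ask the HTP backend to run in fp16.
        self.backend_options = generate_htp_compiler_spec(use_fp16=True)


class TestQNNQuantizedOperator(TestQNN):
    def setUp(self):
        # Quantized suite: fp16 off; quantization handles precision instead.
        self.backend_options = generate_htp_compiler_spec(use_fp16=False)
```

As the next comment notes, the failing fp tests had been wired to the quantized setup by mistake, which explains the mismatch.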

@cccclai
Contributor

cccclai commented Mar 25, 2025

I think you can check the backend_options = generate_htp_compiler_spec(use_fp16=True) setting. The use_fp16 option should be set to True in all fp tests, as shown in the figure below. I'm not sure whether you have modified this setting.

[screenshot: fp test setup with use_fp16=True]

Ah, good catch. I inherit from the base class TestQNNFloatingPointOperator but accidentally used TestQNNQuantizedOperator for the setup instead of TestQNNFloatingPointOperator. Let me update.

@DannyYuyang-quic
Collaborator Author

I think you can check the backend_options = generate_htp_compiler_spec(use_fp16=True) setting. The use_fp16 option should be set to True in all fp tests, as shown in the figure below. I'm not sure whether you have modified this setting.
[screenshot: fp test setup with use_fp16=True]

Ah, good catch. I inherit from the base class TestQNNFloatingPointOperator but accidentally used TestQNNQuantizedOperator for the setup instead of TestQNNFloatingPointOperator. Let me update.

Cool! Let me know if there are any issues.

@cccclai
Contributor

cccclai commented Apr 1, 2025

It should be good now. Mind rebasing?

@DannyYuyang-quic
Collaborator Author

DannyYuyang-quic commented Apr 2, 2025

I've rebased the branch. Thanks!

The previous commit was on an older base; it is now on the latest one.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@cccclai cccclai left a comment


Thank you!

@cccclai
Contributor

cccclai commented Apr 7, 2025

Everything is green, but it looks like it still needs a rebase... mind rebasing again?

 - e2e script for https://github.com/yformer/EfficientSAM
 - Fastvit breakage fix
 - Add support for cum_sum
 - Add bicubic interpolate transform pass
 - Fix stack op
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/EfficientSAM_enablement branch from 4bb1800 to 42412d7 Compare April 10, 2025 06:30
@DannyYuyang-quic
Collaborator Author

Everything is green, but it looks like it still needs a rebase... mind rebasing again?

Done!
Please have a look.
Thanks!

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai cccclai merged commit 5e4f045 into pytorch:main Apr 10, 2025
88 of 89 checks passed
kirklandsign pushed a commit that referenced this pull request Apr 11, 2025
### Summary
 - e2e script for https://github.com/yformer/EfficientSAM
 - Fastvit breakage fix
 - Add support for cum_sum
 - Add bicubic interpolate transform pass
 - Fix stack op

### Test plan
``` bash
python ./examples/qualcomm/oss_scripts/efficientSAM/efficientSAM.py -m ${soc} -b build-android -H ${host_id} -s ${device_id} --oss_repo ${Path_to_oss_repo} --pretrained_weight ${Path_to_pretrained_weight} -d ${Path_to_dataset_dir}
```
keyprocedure pushed a commit to keyprocedure/executorch that referenced this pull request Apr 21, 2025
Qualcomm AI Engine Direct - oss model enablement (EfficientSAM) (pytorch#9266)

### Summary
 - e2e script for https://github.com/yformer/EfficientSAM
 - Fastvit breakage fix
 - Add support for cum_sum
 - Add bicubic interpolate transform pass
 - Fix stack op

### Test plan
``` bash
python ./examples/qualcomm/oss_scripts/efficientSAM/efficientSAM.py -m ${soc} -b build-android -H ${host_id} -s ${device_id} --oss_repo ${Path_to_oss_repo} --pretrained_weight ${Path_to_pretrained_weight} -d ${Path_to_dataset_dir}
```
Labels: CLA Signed, release notes: qualcomm