
QNN: mobilebert failed to generate Qnn context binary #7946


Closed · guangy10 opened this issue Jan 24, 2025 · 9 comments
Labels
module: examples - Issues related to demos under examples/
module: qnn - Issues related to Qualcomm's QNN delegate and code under backends/qualcomm/
partner: qualcomm - For backend delegation, kernels, demo, etc. from the 3rd-party partner, Qualcomm
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

guangy10 (Contributor) commented Jan 24, 2025

🐛 Describe the bug

python -m examples.qualcomm.scripts.mobilebert_fine_tune -b cmake-out -m SM8450 --compile_only --use_fp16

stacktrace:


[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Context binary size calculation failed
[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Failed to get serialized binary
[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Failed to get context binary size with err 0x7532
[ERROR] [Qnn ExecuTorch]: Can't determine the size of graph binary to be saved to cache. Error 30002
E 00:01:45.674883 executorch:QnnManager.cpp:484] Fail to get context binary.
Traceback (most recent call last):
  File "/data/users/guangyang/executorch/examples/qualcomm/scripts/mobilebert_fine_tune.py", line 399, in <module>
    main(args)
  File "/data/users/guangyang/executorch/examples/qualcomm/scripts/mobilebert_fine_tune.py", line 270, in main
    build_executorch_binary(
  File "/home/guangyang/executorch/examples/qualcomm/utils.py", line 329, in build_executorch_binary
    exported_program = to_backend(edge_prog.exported_program, qnn_partitioner)
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/guangyang/executorch/exir/backend/backend_api.py", line 396, in _
    tagged_graph_module = _partition_and_lower(
  File "/home/guangyang/executorch/exir/backend/backend_api.py", line 319, in _partition_and_lower
    partitioned_module = _partition_and_lower_one_graph_module(
  File "/home/guangyang/executorch/exir/backend/backend_api.py", line 249, in _partition_and_lower_one_graph_module
    lowered_submodule = to_backend(
  File "/home/guangyang/.conda/envs/executorch/lib/python3.10/functools.py", line 878, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/guangyang/executorch/exir/backend/backend_api.py", line 113, in _
    preprocess_result: PreprocessResult = cls.preprocess(
  File "/home/guangyang/executorch/backends/qualcomm/qnn_preprocess.py", line 110, in preprocess
    assert len(qnn_context_binary) != 0, "Failed to generate Qnn context binary."
AssertionError: Failed to generate Qnn context binary.

Versions

trunk

cc @cccclai @winskuo-quic @shewu-quic

winskuo-quic (Collaborator) commented:

Hi @guangy10,
Could you please share which QNN SDK version you are currently using?
I tried it with QNN 2.28.0 on ExecuTorch mainline (commit: 433e30b) and everything works fine.
Below is the command I used to compile the model, which should be almost the same as yours:
python -m examples.qualcomm.scripts.mobilebert_fine_tune -b build-android -m SM8650 --compile_only --use_fp16 --pretrained_weight ../artifacts/mobilebert_fine_tune/finetuned_mobilebert_epoch_5.model
Thanks
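
A quick way to confirm which SDK a build picks up is to check QNN_SDK_ROOT, the environment variable the ExecuTorch Qualcomm backend setup expects to point at the installed SDK; the version number is usually embedded in the install path. A minimal sketch, assuming that variable is set (the libQnnHtp.so path below is illustrative and may differ across SDK layouts):

# Minimal check of which QNN SDK the environment points at.
# QNN_SDK_ROOT is the variable used by the ExecuTorch Qualcomm backend setup;
# its path usually carries the SDK version (e.g. .../qairt/2.28.0.xxxxxx).
import os

sdk_root = os.environ.get("QNN_SDK_ROOT")
if sdk_root is None:
    print("QNN_SDK_ROOT is not set")
else:
    print("QNN_SDK_ROOT =", sdk_root)
    # Illustrative sanity check for the HTP backend library; the exact
    # subdirectory can vary between SDK releases.
    htp_lib = os.path.join(sdk_root, "lib", "x86_64-linux-clang", "libQnnHtp.so")
    print("libQnnHtp.so present:", os.path.exists(htp_lib))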

guangy10 (Author) commented:

Mine is 2.26. Let me try upgrading to 2.28.

guangy10 (Author) commented:

@winskuo-quic I hit another issue after updating the QNN version to 2.28. What numpy version are you using? I'm getting the following error with numpy 2.2.2 in my setup.

RuntimeError: Failed to import transformers.generation.utils because of the following error (look up to see its traceback):
numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

winskuo-quic (Collaborator) commented:

I believe the error above is not directly related to the QNN version; it is more likely caused by a Python library version mismatch. On my side I have numpy 2.2.3 and transformers 4.47.1. Could you share the library installation process you are currently using? We use python install_requirements.py.
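
One quick way to narrow this down is to compare the installed versions against a known-good setup and re-trigger the failing import in isolation, so the traceback points at the compiled extension that was built against a different numpy ABI. A minimal sketch (the packages listed are just the ones mentioned above):

# Print the installed versions of the two packages discussed above and
# re-run the import that failed, so its traceback shows which extension
# module was compiled against an older numpy ABI.
import importlib.metadata as md

for pkg in ("numpy", "transformers"):
    print(pkg, md.version(pkg))

# This is the import that raised "numpy.dtype size changed ..." earlier;
# rerunning it in isolation localizes the stale compiled extension.
import transformers.generation.utils  # noqa: F401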

guangy10 (Author) commented:

Yeah, I'm running the same installation script. The transformers version is the same, and the numpy version differs only by a patch release, so I assume that doesn't matter. Here is my detailed env:

Collecting environment information...
PyTorch version: 2.7.0.dev20250131+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: CentOS Stream 9 (x86_64)
GCC version: (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2)
Clang version: Could not collect
CMake version: version 3.29.0
Libc version: glibc-2.34

Python version: 3.10.0 | packaged by conda-forge | (default, Nov 20 2021, 02:24:10) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.19.0-0_fbk21_hardened_12633_g4db063a1bcb5-x86_64-with-glibc2.34
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   40 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          72
On-line CPU(s) list:             0-71
Vendor ID:                       GenuineIntel
Model name:                      Intel Core Processor (Broadwell)
CPU family:                      6
Model:                           61
Thread(s) per core:              1
Core(s) per socket:              36
Socket(s):                       2
Stepping:                        2
BogoMIPS:                        3990.61
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat
Virtualization:                  VT-x
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       2.3 MiB (72 instances)
L1i cache:                       2.3 MiB (72 instances)
L2 cache:                        288 MiB (72 instances)
L3 cache:                        32 MiB (2 instances)
NUMA node(s):                    1
NUMA node0 CPU(s):               0-71
Vulnerability Itlb multihit:     KVM: Mitigation: VMX disabled
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX vulnerable, SMT disabled
Vulnerability Mds:               Vulnerable; SMT Host state unknown
Vulnerability Meltdown:          Vulnerable
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
Vulnerability Spectre v2:        Vulnerable, STIBP: disabled
Vulnerability Srbds:             Unknown: Dependent on hypervisor status
Vulnerability Tsx async abort:   Vulnerable

Versions of relevant libraries:
[pip3] executorch==0.6.0a0+f2720fa
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==24.4.26
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-executable==2.1.3
[pip3] flake8-logging-format==0.9.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] flake8-simplify==0.19.3
[pip3] mypy==1.14.1
[pip3] mypy-extensions==1.0.0
[pip3] numpy==2.2.2
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==8.9.2.26
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-nccl-cu12==2.18.1
[pip3] nvidia-nvjitlink-cu12==12.5.82
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] optree==0.10.0
[pip3] torch==2.7.0.dev20250131+cpu
[pip3] torchao==0.8.0+git11333ba2
[pip3] torchaudio==2.6.0.dev20250131+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.22.0.dev20250131+cpu
[pip3] triton==3.0.0
[conda] executorch                0.6.0a0+f2720fa          pypi_0    pypi
[conda] numpy                     2.2.2                    pypi_0    pypi
[conda] nvidia-cublas-cu12        12.1.3.1                 pypi_0    pypi
[conda] nvidia-cuda-cupti-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-nvrtc-cu12    12.1.105                 pypi_0    pypi
[conda] nvidia-cuda-runtime-cu12  12.1.105                 pypi_0    pypi
[conda] nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
[conda] nvidia-cufft-cu12         11.0.2.54                pypi_0    pypi
[conda] nvidia-curand-cu12        10.3.2.106               pypi_0    pypi
[conda] nvidia-cusolver-cu12      11.4.5.107               pypi_0    pypi
[conda] nvidia-cusparse-cu12      12.1.0.106               pypi_0    pypi
[conda] nvidia-nccl-cu12          2.18.1                   pypi_0    pypi
[conda] nvidia-nvjitlink-cu12     12.5.82                  pypi_0    pypi
[conda] nvidia-nvtx-cu12          12.1.105                 pypi_0    pypi
[conda] optree                    0.10.0                   pypi_0    pypi
[conda] torch                     2.7.0.dev20250131+cpu          pypi_0    pypi
[conda] torchao                   0.8.0+git11333ba2          pypi_0    pypi
[conda] torchaudio                2.6.0.dev20250131+cpu          pypi_0    pypi
[conda] torchfix                  0.6.0                    pypi_0    pypi
[conda] torchsr                   1.0.4                    pypi_0    pypi
[conda] torchvision               0.22.0.dev20250131+cpu          pypi_0    pypi
[conda] triton                    3.0.0                    pypi_0    pypi

guangy10 (Author) commented:

@winskuo-quic It's hard to debug a local dev env. If the model works fine on your side, should we just enable it in CI? The setup there can serve as the source of truth for future reference. Here are the QNN models we are currently running in CI: https://github.com/pytorch/executorch/blob/main/.github/workflows/trunk.yml#L305-L329

Can you add mobilebert to it? The CI only needs to test it in --compile_only mode, to ensure there is no compilation issue.

winskuo-quic (Collaborator) commented Feb 21, 2025

@guangy10,
Yes, I think we can try to enable both mobilebert and wav2letter mentioned in #7634 (comment).

guangy10 (Author) commented:

The same comment applies here as well: #7634 (comment)

cccclai (Contributor) commented Mar 25, 2025

#8616 is merged. Can we close it now?
