prefill model #5807

Draft: cccclai wants to merge 1 commit into main
Conversation

@cccclai (Contributor) commented Oct 2, 2024

Summary:

Repro command:

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

Passes with QNN 2.25 but fails with QNN 2.26.

Segfault error stacktrace:

[INFO] [Qnn ExecuTorch]: Initialize Qnn backend parameters for Qnn executorch backend type 2
[INFO] [Qnn ExecuTorch]: Caching: Caching is in SAVE MODE.
[WARNING] [Qnn ExecuTorch]: Qnn API version 2.19.0 is used. The version is tested against 2.18.0.
[INFO] [Qnn ExecuTorch]: Running level=3 optimization.
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1523599==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000020 (pc 0x7f1585ee38e2 bp 0x7f16d5ab8800 sp 0x7ffed19ab8b0 T0)
==1523599==The signal is caused by a READ memory access.
==1523599==Hint: address points to the zero page.
SCARINESS: 10 (null-deref)
    #0 0x7f1585ee38e2  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x2ce38e2) (BuildId: bc3ab8ddc89a0e65)
    #1 0x7f1585dd8926  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x2bd8926) (BuildId: bc3ab8ddc89a0e65)
    #2 0x7f15844d1161  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x12d1161) (BuildId: bc3ab8ddc89a0e65)
    #3 0x7f15844dcac6  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x12dcac6) (BuildId: bc3ab8ddc89a0e65)
    #4 0x7f15844d245b  (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x12d245b) (BuildId: bc3ab8ddc89a0e65)
    #5 0x7f15b9bc7b21 in auto torch::executor::qnn::QnnInterface::qnn_backend_validate_op_config<void*, Qnn_OpConfig_t>(void*, Qnn_OpConfig_t) const fbcode/executorch/backends/qualcomm/runtime/backends/QnnFunctionInterface.h:39
    #6 0x7f15b9bc7682 in torch::executor::qnn::QnnBackend::BackendValidateOpConfig(Qnn_OpConfig_t const&) fbcode/executorch/backends/qualcomm/runtime/backends/QnnBackendCommon.h:41
    #7 0x7f15b9bc7115 in torch::executor::qnn::QnnManager::IsNodeSupportedByBackend(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&) fbcode/executorch/backends/qualcomm/runtime/QnnManager.cpp:450
    #8 0x7f15b9dd44ee in torch::executor::qnn::PyQnnManager::IsNodeSupportedByBackend(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&) fbcode/executorch/backends/qualcomm/aot/python/PyQnnManagerAdaptor.h:57
    #9 0x7f15b9e5b986 in pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&)::operator()(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&) const fbsource/pybind11/pybind11.h:84
    #10 0x7f15b9e5b8b5 in bool pybind11::detail::argument_loader<torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&>::call_impl<bool, pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&)&, 0ul, 1ul, pybind11::detail::void_type>(torch::executor::qnn::PyQnnManager&&, std::integer_sequence<unsigned long, 0ul, 1ul>, pybind11::detail::void_type&&) && fbsource/pybind11/cast.h:2042
    #11 0x7f15b9e53831 in std::enable_if<!std::is_void<bool>::value, bool>::type pybind11::detail::argument_loader<torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&>::call<bool, pybind11::detail::void_type, pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&)&>(pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&)&) && fbsource/pybind11/cast.h:2014
    #12 0x7f15b9e53454 in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), bool, torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool&&, torch::executor::qnn::PyQnnManager (*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const fbsource/pybind11/pybind11.h:193
    #13 0x7f15b9e530d3 in void pybind11::cpp_function::initialize<pybind11::cpp_function::cpp_function<bool, torch::executor::qnn::PyQnnManager, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool (torch::executor::qnn::PyQnnManager::*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), bool, torch::executor::qnn::PyQnnManager*, std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&, pybind11::name, pybind11::is_method, pybind11::sibling>(bool&&, torch::executor::qnn::PyQnnManager (*)(std::vector<std::shared_ptr<torch::executor::qnn::OpWrapper>, std::allocator<std::shared_ptr<torch::executor::qnn::OpWrapper>>>&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) fbsource/pybind11/pybind11.h:170
    #14 0x7f15b9d8f707 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) fbsource/pybind11/pybind11.h:767
    #15 0x327141 in cfunction_call(_object*, _object*, _object*) (.__uniq.281047882695835599676768160755749362799) (/usr/local/fbcode/platform010/bin/python3.10+0x327141) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #16 0x349630 in _PyObject_MakeTpCall (/usr/local/fbcode/platform010/bin/python3.10+0x349630) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #17 0x5897d4 in method_vectorcall(_object*, _object* const*, unsigned long, _object*) (.__uniq.243338978568352371442406765225626566013.llvm.6236606370933165261) (/usr/local/fbcode/platform010/bin/python3.10+0x5897d4) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #18 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #19 0x331421 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x331421) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #20 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #21 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #22 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #23 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #24 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #25 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #26 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #27 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #28 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #29 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #30 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #31 0x331577 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x331577) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #32 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #33 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #34 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #35 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #36 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #37 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #38 0x39b8ca in _PyEval_Vector (/usr/local/fbcode/platform010/bin/python3.10+0x39b8ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #39 0x39ad7d in _PyObject_FastCallDictTstate (/usr/local/fbcode/platform010/bin/python3.10+0x39ad7d) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #40 0x3c8b72 in slot_tp_call(_object*, _object*, _object*) (.__uniq.235726554139783955843240177532338160225) (/usr/local/fbcode/platform010/bin/python3.10+0x3c8b72) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #41 0x392ca8 in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x392ca8) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #42 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #43 0x39b8ca in _PyEval_Vector (/usr/local/fbcode/platform010/bin/python3.10+0x39b8ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #44 0x331b18 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x331b18) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #45 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #46 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #47 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #48 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #49 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #50 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #51 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #52 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #53 0x3313f2 in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3313f2) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #54 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #55 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #56 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #57 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #58 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #59 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #60 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #61 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #62 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #63 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #64 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #65 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #66 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #67 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #68 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #69 0x327547 in _PyFunction_Vectorcall (/usr/local/fbcode/platform010/bin/python3.10+0x327547) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #70 0x3928df in call_function(_ts*, PyTraceInfo*, _object***, long, _object*) (.__uniq.79849310599369217189729546442812793949) (/usr/local/fbcode/platform010/bin/python3.10+0x3928df) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #71 0x3314ca in _PyEval_EvalFrameDefault (/usr/local/fbcode/platform010/bin/python3.10+0x3314ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #72 0x39b8ca in _PyEval_Vector (/usr/local/fbcode/platform010/bin/python3.10+0x39b8ca) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #73 0x431565 in PyEval_EvalCode (/usr/local/fbcode/platform010/bin/python3.10+0x431565) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #74 0x431447 in run_mod(_mod*, _object*, _object*, _object*, PyCompilerFlags*, _arena*) (.__uniq.251861886623903963524397139660542440724.llvm.17622910512627074885) (/usr/local/fbcode/platform010/bin/python3.10+0x431447) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #75 0x4e3054 in pyrun_file(_IO_FILE*, _object*, int, _object*, _object*, int, PyCompilerFlags*) (.__uniq.251861886623903963524397139660542440724) (/usr/local/fbcode/platform010/bin/python3.10+0x4e3054) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #76 0x4e2b54 in _PyRun_SimpleFileObject (/usr/local/fbcode/platform010/bin/python3.10+0x4e2b54) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #77 0x4e28f1 in _PyRun_AnyFileObject (/usr/local/fbcode/platform010/bin/python3.10+0x4e28f1) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #78 0x4d4a54 in Py_RunMain (/usr/local/fbcode/platform010/bin/python3.10+0x4d4a54) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #79 0x4d286b in pymain_main(_PyArgv*) (.__uniq.297908980262787110426434251325078884054) (/usr/local/fbcode/platform010/bin/python3.10+0x4d286b) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #80 0x4d2759 in Py_BytesMain (/usr/local/fbcode/platform010/bin/python3.10+0x4d2759) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
    #81 0x7f19e282c656 in __libc_start_call_main (/usr/local/fbcode/platform010/lib/libc.so.6+0x2c656) (BuildId: 93cdceeb8322234c38e1f2c93ad0ff10c7632fa6)
    #82 0x7f19e282c717 in __libc_start_main@GLIBC_2.2.5 (/usr/local/fbcode/platform010/lib/libc.so.6+0x2c717) (BuildId: 93cdceeb8322234c38e1f2c93ad0ff10c7632fa6)
    #83 0x553d90 in _start (/usr/local/fbcode/platform010/bin/python3.10+0x553d90) (BuildId: a620038add613fd8585eb50983ca8e455d54738e)
AddressSanitizer can not provide additional info.
AddressSanitizer: SEGV (/home/chenlai/fbsource/third-party/qualcomm/qnn/qnn-2.26/lib/x86_64-linux-clang/libQnnHtp.so+0x2ce38e2) (BuildId: bc3ab8ddc89a0e65)
==1523599==ABORTING

Differential Revision: D63736779

pytorch-bot (bot) commented Oct 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/5807

Note: Links to docs will display an error until the docs builds have been completed.

❌ 13 New Failures

As of commit 35387f6 with merge base 13408b9.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Oct 2, 2024

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D63736779

@cccclai marked this pull request as draft October 2, 2024 01:52
@cccclai requested a review from chiwwang October 2, 2024 01:58
@chiwwang (Contributor) commented Oct 2, 2024

We will try to reproduce this on our side.

@shewu-quic (Collaborator) commented Oct 2, 2024

Hi @cccclai,
Thanks for this PR.
I could also reproduce the error.
Regarding the segmentation fault for per-channel 16a4w linear in QNN 2.26: we also see it in our unit tests.
We will investigate this issue further.
If possible, could you use the convert_linear_to_conv pass to work around this issue?
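
A linear layer is algebraically a 1x1 conv2d, which is why such a rewrite can sidestep validation of the crashing linear op. Below is a minimal standalone sketch of the underlying math; it is an illustration only, not the ExecuTorch pass itself, and the function name is made up:

```python
# Minimal sketch: y = x @ W.T + b expressed as a 1x1 conv2d, the same math a
# linear-to-conv pass relies on. Names and shapes here are illustrative.
import torch

def linear_as_conv2d(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    # x: [batch, in_features]; weight: [out_features, in_features]
    x4d = x.reshape(x.shape[0], x.shape[1], 1, 1)                 # NCHW, H = W = 1
    w4d = weight.reshape(weight.shape[0], weight.shape[1], 1, 1)  # 1x1 kernel
    y = torch.nn.functional.conv2d(x4d, w4d, bias)                # [batch, out_features, 1, 1]
    return y.reshape(x.shape[0], weight.shape[0])

# Sanity check against aten.linear
x, w, b = torch.randn(2, 8), torch.randn(4, 8), torch.randn(4)
assert torch.allclose(linear_as_conv2d(x, w, b), torch.nn.functional.linear(x, w, b), atol=1e-5)
```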

@chiwwang (Contributor) commented Oct 2, 2024

We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?

@chiwwang (Contributor) commented Oct 2, 2024

Unfortunately, the segmentation fault in linear was detected in the 2.26~2.27 timeframe. The fix is not released yet; the ETA is QNN 2.28, at the end of October.

@shewu-quic (Collaborator) commented:

> We also need to check why the matmul is quantized to an unsupported schema. Maybe something is wrong in our QnnQuantizer?

Hi @cccclai, @chiwwang,

It seems that I could not reproduce the op validation failure for the matmul op on my end when using QNN 2.26 and adding the convert_linear_to_conv pass.
Below is my call sequence:

./install_requirements.sh
cp schema/*.fbs exir/_serialize/
export PYTHONPATH=/local/mnt/workspace/test_cc/
export ANDROID_NDK=/local/mnt/workspace/shewu/android-ndk-r26c
export QNN_SDK_ROOT=/local/mnt/workspace/shewu/qairt/2.26.0.240828
export LD_LIBRARY_PATH=$QNN_SDK_ROOT/lib/x86_64-linux-clang
./backends/qualcomm/scripts/build.sh
python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

cccclai added a commit to cccclai/executorch-1 that referenced this pull request Oct 2, 2024
@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D63736779

@cccclai (Contributor, Author) commented Oct 2, 2024

I updated the PR to use the linear-to-conv pass, now that the segfault can be reproduced. Here is the latest log:
prefill_qnn.log

I can see matmul failing to lower:

...
[QNN Partitioner Op Support]: aten.convolution.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.unsqueeze_copy.default | True
[QNN Partitioner Op Support]: aten.view_copy.default | True
[QNN Partitioner Op Support]: aten.permute_copy.default | True
[QNN Partitioner Op Support]: aten.matmul.default | False
[QNN Partitioner Op Support]: aten._softmax.default | True
[QNN Partitioner Op Support]: aten.add.Tensor | True
[QNN Partitioner Op Support]: aten.slice_copy.Tensor | True
[QNN Partitioner Op Support]: aten.slice_copy.Tensor | True
[QNN Partitioner Op Support]: aten.div.Tensor | True
[QNN Partitioner Op Support]: aten.matmul.default | False
...

@chiwwang (Contributor) commented Oct 3, 2024

I suddenly realized this is in the AOT stage, so the mismatch between the QNN libraries and ExecuTorch (maybe QnnPyXXXXX.so) should be caused by a mismatch between QNN_SDK_ROOT and LD_LIBRARY_PATH... not on the device yet 😨
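
As a quick way to check for that kind of mismatch, here is a small diagnostic sketch; it is only an illustration, using the environment variable names from the call sequence above, and nothing in it is ExecuTorch or QNN API:

```python
# Diagnostic sketch: flag when LD_LIBRARY_PATH does not point into QNN_SDK_ROOT,
# which could make the AOT flow load libQnnHtp.so from a different QNN release.
import os

sdk = os.environ.get("QNN_SDK_ROOT", "")
ld_entries = os.environ.get("LD_LIBRARY_PATH", "").split(":")
under_sdk = [p for p in ld_entries if sdk and p.startswith(sdk)]

print(f"QNN_SDK_ROOT = {sdk or '<unset>'}")
print(f"LD_LIBRARY_PATH entries under the SDK: {under_sdk or 'none (possible mismatch)'}")
```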

@cccclai (Contributor, Author) commented Oct 3, 2024

> I suddenly realized this is in the AOT stage, so the mismatch between the QNN libraries and ExecuTorch (maybe QnnPyXXXXX.so) should be caused by a mismatch between QNN_SDK_ROOT and LD_LIBRARY_PATH... not on the device yet 😨

Yeah... it is still AOT and not on device yet.

@cccclai (Contributor, Author) commented Oct 3, 2024

I double-checked, and it looks like I can lower matmul in the OSS flow but not in the internal Buck flow. I guess I can work around it for now...

@chiwwang (Contributor) commented Oct 3, 2024

> I double-checked, and it looks like I can lower matmul in the OSS flow but not in the internal Buck flow. I guess I can work around it for now...

I'm also stuck in the Buck build flow; let me submit a comment below.
I aim to add the SoC information to the QC backend. I remember we moved these things to Python.

@chiwwang (Contributor) left a review comment on the hunk below:

Just FYI, I can bypass the error by building the runner with CMake.

@@ -29,6 +29,7 @@ def define_common_targets():
        ],
        # qnn_executorch_backend can be added below //executorch/backends/qualcomm:qnn_executorch_backend
        exported_deps = [
            "//executorch/backends/qualcomm:qnn_executorch_backend",
(Contributor) commented on the same hunk:

Just FYI, I encountered a build error due to this line, but I think it's possibly an environment setup issue on my side... I don't know how to install "ANDROID" for buck2.

  Caused by:
      0: Error looking up configured node root//backends/qualcomm:qnn_executorch_backend (prelude//platforms:default#904931f735703749)
      1: looking up unconfigured target node `root//backends/qualcomm:qnn_executorch_backend`
      2: Error loading targets in package `root//backends/qualcomm` for target `root//backends/qualcomm:qnn_executorch_backend`
      3: From load at backends/qualcomm/TARGETS:2
      4: Error evaluating module: `root//backends/qualcomm/targets.bzl`
      5: error: Module has no symbol `ANDROID`
          --> backends/qualcomm/targets.bzl:3:5
           |
         3 |     "ANDROID",
           |     ^^^^^^^^^
           |



  CMake Error at build/Utils.cmake:216 (message):
    executorch: source list generation failed
  Call Stack (most recent call first):
    CMakeLists.txt:340 (extract_sources)

(Collaborator) commented:

I also encountered this error in this PR. I bypassed it by commenting out this line.

(Contributor, Author) commented:

I removed it in the latest commit. Sorry for the inconvenience.

(Contributor) commented:

No worries at all. I'm wondering whether we should set up a buck2 environment internally. Is the buck2 flow intended for the open-source project?

(Contributor, Author) commented:

Ah no… we want to remove the Buck dependency in the OSS flow…

@chiwwang (Contributor) commented Oct 3, 2024

> It seems that I could not reproduce the op validation failure for the matmul op on my end when using QNN 2.26 and adding the convert_linear_to_conv pass. Below is my call sequence.
> ...
> export ANDROID_NDK=/local/mnt/workspace/shewu/android-ndk-r26c
> ...

It should be ANDROID_NDK_ROOT instead of ANDROID_NDK.

@chiwwang (Contributor) commented Oct 3, 2024

Hi @cccclai,
I added the SoC here: cccclai#1
I ran a silly model with soc_model=SSG2115P on an SM8550 and it seems OK.
I will test the command shared here.

[update]

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w --soc_model SSG2115P

seems to work 😮

@shewu-quic (Collaborator) commented Oct 3, 2024

Hi @cccclai,
I added a PR to quantize the embedding op and 16x8 matmul.
I ran this model, and it could be fully delegated.
If you have any problems, please let me know.

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

@chiwwang (Contributor) commented Oct 3, 2024

> Hi @cccclai, I added a PR to quantize the embedding op and 16x8 matmul. I ran this model, and it could be fully delegated. If you have any problems, please let me know.

Hey @shewu-quic,
Is there a PR link? And if you could add more description to help us understand how you achieved this, that would be great 😄

@shewu-quic (Collaborator) commented Oct 3, 2024

Oh, sure, let me add more description to the PR.
The 16x8 matmul op can be handled in two ways, depending on whether the KV cache is used:
- If the KV cache is used, we can annotate the past KV (input) with 8 bits along the second input of matmul, to reduce the size of the input tensor.
- If the KV cache is not used, I think we can just annotate matmul as 16x8 to improve performance.

By default, we annotate matmul as 16x16 in 16-bit quantization, and we can override that with add_custom_quant_annotations (see the sketch below).
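
For illustration, here is a minimal sketch of what such a custom annotation could look like, written against the torch.ao PT2E quantizer types. The helper name, the exact QuantizationSpec parameters, and the idea of passing it through an add_custom_quant_annotations-style hook are assumptions for this example, not the exact code from the PR:

```python
# Hedged sketch: annotate aten.matmul as 16a8w (16-bit specs on the first
# input and output, 8-bit on the second input, e.g. past-KV). The function
# name and the exact spec parameters are illustrative assumptions.
import torch
from torch.ao.quantization.observer import MinMaxObserver
from torch.ao.quantization.quantizer import QuantizationAnnotation, QuantizationSpec

def annotate_matmul_16a8w(gm: torch.fx.GraphModule) -> None:
    act16 = QuantizationSpec(
        dtype=torch.int32,  # 16-bit range carried in an int32 container
        quant_min=torch.iinfo(torch.int16).min,
        quant_max=torch.iinfo(torch.int16).max,
        qscheme=torch.per_tensor_affine,
        observer_or_fake_quant_ctr=MinMaxObserver,
    )
    act8 = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-128,
        quant_max=127,
        qscheme=torch.per_tensor_affine,
        observer_or_fake_quant_ctr=MinMaxObserver,
    )
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target == torch.ops.aten.matmul.default:
            lhs, rhs = node.args
            node.meta["quantization_annotation"] = QuantizationAnnotation(
                input_qspec_map={lhs: act16, rhs: act8},
                output_qspec=act16,
                _annotated=True,
            )
```

A quantizer exposing such a hook would run this function over the captured graph before prepare_pt2e, so the pre-annotated matmul nodes keep their 16x8 specs instead of the 16x16 default.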

@chiwwang (Contributor) commented Oct 3, 2024

So it's "custom annotation", based mostly on the topology of the graph, right?
We look into the graph and choose the nodes to annotate, which gets us the 16x8 matmul. Do I understand correctly?

@shewu-quic (Collaborator) commented:

> So it's "custom annotation", based mostly on the topology of the graph, right? We look into the graph and choose the nodes to annotate, which gets us the 16x8 matmul. Do I understand correctly?

Yes, that's right.
After applying the custom annotation, you get the graph below:

                                                                  q (16 bits) -> dq (16 bits)--\
                                                                                                 matmul -> q (16 bits) -> dq (16 bits)
q (16 bits) -> dq (16 bits) -> op -> q (16 bits) -> dq (16 bits) -> q (8 bits) -> dq (8 bits)--/

For the q (16 bits) -> dq (16 bits) -> q (8 bits) -> dq (8 bits) pattern, we tag a requantize on the op and insert a to_copy (QNN Convert or Cast) after the op.
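
To make that pattern concrete, here is a small illustrative matcher for the q(16)->dq(16)->q(8)->dq(8) chain on a PT2E-quantized FX graph. It is a sketch, not the actual backend pass, and it assumes the dtype conventions from the annotation sketch above (int32 container for 16-bit, int8 for 8-bit):

```python
# Sketch: detect the q(16b) -> dq(16b) -> q(8b) -> dq(8b) requantize chain in an
# FX graph produced by PT2E quantization. Dtype conventions are assumptions.
import torch

QD = torch.ops.quantized_decomposed

def ends_requant_chain(node: torch.fx.Node) -> bool:
    """True if `node` is the final dequantize of a 16-bit -> 8-bit requantize."""
    expected = (
        QD.dequantize_per_tensor.default,  # dq (8 bits)  <- `node`
        QD.quantize_per_tensor.default,    # q  (8 bits)
        QD.dequantize_per_tensor.default,  # dq (16 bits)
        QD.quantize_per_tensor.default,    # q  (16 bits)
    )
    chain = []
    cur = node
    for target in expected:
        if not (isinstance(cur, torch.fx.Node) and cur.target == target):
            return False
        chain.append(cur)
        cur = cur.args[0]
    # dtype is the last positional argument of these q/dq ops
    return chain[0].args[-1] == torch.int8 and chain[2].args[-1] == torch.int32
```

A lowering pass could then collapse the matched inner q/dq pair into a single Convert/Cast-style op after the producer, which is what the comment above describes.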

@chiwwang (Contributor) commented Oct 3, 2024

Got it, thanks.
Note that the command should contain --soc_model SSG2115P for the correct VTCM size (it needs PR cccclai#1, though):
python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w --soc_model SSG2115P

@cccclai
Copy link
Contributor Author

cccclai commented Oct 3, 2024

Thanks folks! I was able to get the model running with embedding/matmul lowered with these changes. Maybe we can extend the SoC table? The change looks reasonable to me.

@cccclai
Copy link
Contributor Author

cccclai commented Oct 3, 2024

layer norm op lowering:

We have a different model that uses layernorm instead of rmsnorm. Because the runtime only recently bumped to 2.25 and the current model still uses layernorm, I'll update this PR with the PRs you folks sent, to test both layernorm and rmsnorm.

[edit]:
Made some progress on it. The bias node's quant node looks suspicious; it's

    %dequantize_per_tensor_2 : [num_users=1] = call_function[target=torch.ops.quantized_decomposed.dequantize_per_tensor.default](args = (%b__frozen_param2, 9.5367431640625e-07, 0, -2147483648, 2147483647, torch.int32), kwargs = {})
...
    %layer_norm : [num_users=1] = call_function[target=torch.ops.aten.layer_norm.default](args = (%dequantize_per_tensor_11, [64], %dequantize_per_tensor_1, %dequantize_per_tensor_2), kwargs = {})

...

@cccclai
Copy link
Contributor Author

cccclai commented Oct 3, 2024

In the meantime, we're tracking latency (both model loading time and inference time), memory, power, and accuracy for production. Latency and accuracy are easier to measure; how about memory and power?

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D63736779

@cccclai
Copy link
Contributor Author

cccclai commented Oct 3, 2024

Hi @cccclai I added a PR to quantize the embedding op and 16x8 matmul. I ran this model, and it could be fully delegated. If you have any problems, please let me know.

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

Thanks! I was able to lower the embedding; however, the latency seems very close to the CPU version. Maybe we'll get better memory usage by using the QNN embedding? I feel the alternative solutions include:

  0. use CPU fp embedding
  1. use 16x8 QNN embedding
  2. use CPU 4-bit embedding https://github.com/pytorch/executorch/blob/13408b9848b1a776f03ff0fbac5f18b6347ff64a/kernels/quantized/cpu/op_embedding4b.cpp

and then maybe we'll have a better understanding of the latency/memory for these options.

@cccclai
Copy link
Contributor Author

cccclai commented Oct 3, 2024

Oh~ sure, let me add more description for this PR. About the 16x8 matmul op, I think it can be divided into two cases according to whether KV cache is used: if KV cache is used, we could annotate the past KV (the second input of matmul) with 8 bits to reduce the input tensor size; if KV cache is not used, I think we could just annotate matmul as 16x8 to improve performance.

By default, we annotate matmul as 16x16 in 16-bit quantization, and we can override that with add_custom_quant_annotations.

This is working well. Thanks! Also wondering if you know how latency/memory compare between 16x8 and 8x8?

# generate_full_logits=self.generate_full_logits,
# output_prune_map=output_prune_map,
# enable_dynamic_shape=self.enable_dynamic_shape,
use_layer_norm_op=True,
Copy link
Contributor Author


@chiwwang @shewu-quic here is the place to switch between layer norm and rms norm

@shewu-quic
Copy link
Collaborator

[edit]: Made some progress on it. The bias node's quant node looks suspicious; it's

    %dequantize_per_tensor_2 : [num_users=1] = call_function[target=torch.ops.quantized_decomposed.dequantize_per_tensor.default](args = (%b__frozen_param2, 9.5367431640625e-07, 0, -2147483648, 2147483647, torch.int32), kwargs = {})
...
    %layer_norm : [num_users=1] = call_function[target=torch.ops.aten.layer_norm.default](args = (%dequantize_per_tensor_11, [64], %dequantize_per_tensor_1, %dequantize_per_tensor_2), kwargs = {})

...

We quantize the bias node to int32 by default.
https://github.com/cccclai/executorch-1/blob/35387f6de7c731e8d3f52ce504c2abd912c6f096/backends/qualcomm/quantizer/utils.py#L1074
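For context, here is a hedged sketch of how such a derived int32 bias spec can be expressed with the pt2e quantizer API. The node arguments are hypothetical placeholders, and the convention bias_scale = input_scale * weight_scale is the usual one, which also explains the tiny scale and the full int32 range in the graph dump above:

```python
import torch
from torch.ao.quantization.quantizer import DerivedQuantizationSpec

def _derive_bias_qparams(obs_or_fqs):
    # Bias qparams are derived from the activation and weight observers.
    act_obs, weight_obs = obs_or_fqs
    act_scale, _ = act_obs.calculate_qparams()
    weight_scale, _ = weight_obs.calculate_qparams()
    bias_scale = act_scale * weight_scale
    bias_zero_point = torch.zeros_like(bias_scale, dtype=torch.int64)
    return bias_scale, bias_zero_point

def make_bias_spec(act_node, weight_node, node):
    # act_node/weight_node/node are the FX nodes feeding and owning the bias.
    return DerivedQuantizationSpec(
        derived_from=[(act_node, node), (weight_node, node)],
        derive_qparams_fn=_derive_bias_qparams,
        dtype=torch.int32,
        quant_min=torch.iinfo(torch.int32).min,
        quant_max=torch.iinfo(torch.int32).max,
        qscheme=torch.per_tensor_symmetric,
    )
```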

@shewu-quic
Copy link
Collaborator

This is working well. Thanks! Also wondering if you know how latency/memory compare between 16x8 and 8x8?

Do you mean quantizing the model in 8x8?
Maybe we could give it a try, but I doubt we'd get reasonable accuracy from it.

@cccclai
Copy link
Contributor Author

cccclai commented Oct 4, 2024

This is working well. Thanks! Also wondering if you know how latency/memory compare between 16x8 and 8x8?

Do you mean quantizing the model in 8x8? Maybe we could give it a try, but I doubt we'd get reasonable accuracy from it.

It's a narrow task: we just give a prompt and generate one deterministic result for that prompt, so it's easier to quantize. Also, we'll add QAT for this model later, which helps recover the accuracy a lot.

@chiwwang
Copy link
Contributor

chiwwang commented Oct 4, 2024

One possibility might be that the layernorm is decomposed and not built as a QNN_LayerNorm.
@shewu-quic do we have any doc about running QNN profiling? It's easier to find the performance bottleneck with profiling data.

@chiwwang
Copy link
Contributor

chiwwang commented Oct 4, 2024

This is working well. Thanks! Also wondering if you know how latency/memory compare between 16x8 and 8x8?

8x8 is usually faster. However, I recommend checking the QNN per-op profiling data first... though I'm wondering, since the model is llama-like, whether our optimizations can help here. The related PRs are in internal review and not submitted yet. (@chunit-quic)

@cccclai
Copy link
Contributor Author

cccclai commented Oct 4, 2024

One possibility might be that the layernorm is decomposed and not built as a QNN_LayerNorm. @shewu-quic do we have any doc about running QNN profiling? It's easier to find the performance bottleneck with profiling data.

Hmm, I think I saw layer norm in the graph, but it fails to lower because of a validation error. Here is the log:


[INFO] [Qnn ExecuTorch]: QnnDsp <V> key found in multimap: qti.aisw	 => 0x7f7acbbd88a0
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Native Op aten_native_layer_norm_default_1:qti.aisw:LayerNorm
[INFO] [Qnn ExecuTorch]: Validating Op Config aten_native_layer_norm_default_1.
[INFO] [Qnn ExecuTorch]: Validating Op Type LayerNorm == LayerNorm.
[INFO] [Qnn ExecuTorch]: Validating Inputs.
[INFO] [Qnn ExecuTorch]: Validating Input[0] of ID 0.
[INFO] [Qnn ExecuTorch]: Validating Params.
[INFO] [Qnn ExecuTorch]: Validating Param[0]: axes.
[INFO] [Qnn ExecuTorch]: Validating Param[1]: epsilon.
[INFO] [Qnn ExecuTorch]: Validating Inputs.
[INFO] [Qnn ExecuTorch]: Validating Input[0] of ID 0.
[INFO] [Qnn ExecuTorch]: Validating Input[1] of ID 0.
[INFO] [Qnn ExecuTorch]: Validating Outputs.
[INFO] [Qnn ExecuTorch]: Validating Output[0] of ID 0.
[INFO] [Qnn ExecuTorch]: Validating tensor 0 and 0 have the same Datatype.
[INFO] [Qnn ExecuTorch]: Validating tensor 0 and 0 have the same Shape.
[INFO] [Qnn ExecuTorch]: Validating tensor 0 and 0 have the same Rank.
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op Config aten_native_layer_norm_default_1.
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check for mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check for mandatory output
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op aten_native_layer_norm_default_1 with precision INT16
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op aten_native_layer_norm_default_1 with precision INT16
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op aten_native_layer_norm_default_1 with precision INT16
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op aten_native_layer_norm_default_1 with precision INT16
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op aten_native_layer_norm_default_1 with precision INT16
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op aten_native_layer_norm_default_1 with precision INT16
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op aten_native_layer_norm_default_1 with precision INT16
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Validating Op aten_native_layer_norm_default_1 with precision INT16
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory input
[INFO] [Qnn ExecuTorch]: QnnDsp <V> check non-mandatory output
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Checking input and output constraints for LayerNorm with input datayptes = [[<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>], [<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>], [<QnnDatatype.QNN_DATATYPE_SFIXED_POINT_32: 14>]] and output datayptes = [[<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>]]
[WARNING] [Qnn ExecuTorch]: QnnDsp <W> [4294967295] has incorrect Value 0, expected equal to -32768.
[INFO] [Qnn ExecuTorch]: QnnDsp <V> validateNativeOps aten_native_layer_norm_default_1:qti.aisw:LayerNorm htp op validator failed 3110
[INFO] [Qnn ExecuTorch]: QnnDsp <V> registered validator failed => 3110
[ERROR] [Qnn ExecuTorch]: QnnDsp <E> QnnBackend_validateOpConfig failed 3110
[INFO] [Qnn ExecuTorch]: QnnDsp <V> Wake up free backend (id: 1)'s thread(s)
[ERROR] [Qnn ExecuTorch]: QnnDsp <E> Failed to validate op aten_native_layer_norm_default_1 with error 0xc26
[WARNING] [Qnn ExecuTorch]: Qnn Backend op validation failed with error: 3110
[INFO] [Qnn ExecuTorch]: QnnDsp <I> QnnBackend_validateOpConfig started, backend = 0x1

@cccclai
Copy link
Contributor Author

cccclai commented Oct 4, 2024

One possibility might be that the layernorm is decomposed and not built as a QNN_LayerNorm. @shewu-quic do we have any doc about running QNN profiling? It's easier to find the performance bottleneck with profiling data.

QNN profiling sounds good. Also, we need to optimize two things:

  1. model loading time. I'm getting about 2+ seconds of loading time given these params: {"dim": 576, "hidden_dim": 4096, "n_layers": 2, "n_heads": 9, "n_kv_heads": 3, "vocab_size": 128256, "norm_eps": 1e-05}
  2. memory consumption. We'd like to save more memory. It seems there are two places to track: the process memory on the CPU side, and any additional HTP memory allocated inside QNN.

@shewu-quic
Copy link
Collaborator

One possibility might be that the layernorm is decomposed and not built as a QNN_LayerNorm. @shewu-quic do we have any doc about running QNN profiling? It's easier to find the performance bottleneck with profiling data.

I think we are missing documentation for profiling. Let me add it ASAP.

@chiwwang
Copy link
Contributor

chiwwang commented Oct 4, 2024

And this seems meaningful!

[INFO] [Qnn ExecuTorch]: QnnDsp Checking input and output constraints for LayerNorm with input datayptes = [[<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>], [<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>], [<QnnDatatype.QNN_DATATYPE_SFIXED_POINT_32: 14>]] and output datayptes = [[<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>]]
[WARNING] [Qnn ExecuTorch]: QnnDsp [4294967295] has incorrect Value 0, expected equal to -32768.

So, the graph is partitioned into many parts? That might be a cause of the high latency and slow load time.
The error is saying we need symmetric quant for LayerNorm somehow, so we need to fix it in QnnQuantizer.

@shewu-quic
Copy link
Collaborator

shewu-quic commented Oct 4, 2024

Hmm, I think I saw layer norm in the graph, but it fails to lower because of a validation error. Here is the log:

I added a PR to fix it.
It seems something is wrong with layer_norm in our quantizer.
Based on the constraint in the QNN docs, a 16-bit gamma (weight) must have a 16-bit input and be symmetrically quantized.

Let me file a PR to fix it in mainline.
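For reference, a minimal sketch of what a symmetric 16-bit weight spec for layer_norm's gamma could look like with the pt2e quantizer types; this is a hedged illustration of the constraint above, not the exact fix in the PR:

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver
from torch.ao.quantization.quantizer import QuantizationSpec

# Symmetric 16-bit weight spec: zero point pinned at 0, range [-32767, 32767].
weight_16bit_symmetric = QuantizationSpec(
    dtype=torch.int16,
    quant_min=-32767,
    quant_max=32767,
    qscheme=torch.per_tensor_symmetric,
    observer_or_fake_quant_ctr=MinMaxObserver.with_args(
        qscheme=torch.per_tensor_symmetric
    ),
)
```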

@cccclai
Copy link
Contributor Author

cccclai commented Oct 4, 2024

And this seems meaningful!

[INFO] [Qnn ExecuTorch]: QnnDsp Checking input and output constraints for LayerNorm with input datayptes = [[<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>], [<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>], [<QnnDatatype.QNN_DATATYPE_SFIXED_POINT_32: 14>]] and output datayptes = [[<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>]]
[WARNING] [Qnn ExecuTorch]: QnnDsp [4294967295] has incorrect Value 0, expected equal to -32768.

So, the graph is partitioned into many parts? That might be a cause of the high latency and slow load time. The error is saying we need symmetric quant for LayerNorm somehow, so we need to fix it in QnnQuantizer.

The load time is still slow with the rms norm model, where we have only one graph. The graph break only happens for the layer norm model.

@cccclai
Copy link
Contributor Author

cccclai commented Oct 4, 2024

Btw just for my knowledge, which line in the log says I’m not using symmetric quantization?

@chiwwang
Copy link
Contributor

chiwwang commented Oct 4, 2024

[WARNING] [Qnn ExecuTorch]: QnnDsp [4294967295] has incorrect Value 0, expected equal to -32768

This line: [WARNING] [Qnn ExecuTorch]: QnnDsp [4294967295] has incorrect Value 0, expected equal to -32768

UFIXED_16 with an offset of -32768 means symmetric quant.
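To make that concrete, a quick numeric check, assuming QNN's usual scale/offset convention real = scale * (q + offset):

```python
# Unsigned 16-bit storage with offset -32768 behaves symmetrically around 0.
scale, offset = 1.0 / 32768.0, -32768
q_min, q_max = 0, 65535  # uint16 range

print(scale * (q_min + offset))  # -1.0
print(scale * (q_max + offset))  # 0.99997 (~ +1.0)
```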

@chiwwang
Copy link
Contributor

chiwwang commented Oct 4, 2024

And this seems meaningful!

[INFO] [Qnn ExecuTorch]: QnnDsp Checking input and output constraints for LayerNorm with input datayptes = [[<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>], [<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>], [<QnnDatatype.QNN_DATATYPE_SFIXED_POINT_32: 14>]] and output datayptes = [[<QnnDatatype.QNN_DATATYPE_UFIXED_POINT_16: 16>]]
[WARNING] [Qnn ExecuTorch]: QnnDsp [4294967295] has incorrect Value 0, expected equal to -32768.

So, the graph is partitioned into many parts? That might be a cause of the high latency and slow load time. The error is saying we need symmetric quant for LayerNorm somehow, so we need to fix it in QnnQuantizer.

The load time is still slow with the rms norm model, where we have only one graph. The graph break only happens for the layer norm model.

Got it... then we need to look into this.
The runtime is supposed to load HTP context binaries, which should be fairly fast. If it takes up to 2 seconds, something is wrong.

@shewu-quic
Copy link
Collaborator

shewu-quic commented Oct 4, 2024

One possibility might be that the layernorm is decomposed and not built as a QNN_LayerNorm. @shewu-quic do we have any doc about running QNN profiling? It's easier to find the performance bottleneck with profiling data.

QNN profiling sounds good. Also, we need to optimize two things:

  1. model loading time. I'm getting about 2+ seconds of loading time given these params: {"dim": 576, "hidden_dim": 4096, "n_layers": 2, "n_heads": 9, "n_kv_heads": 3, "vocab_size": 128256, "norm_eps": 1e-05}

I exported the llama model with rms_norm using your settings and ran it on our SM8650 device with llama_main,
but it seems model loading takes about 205 ms on average.
On SM8550, approximately 211 ms.
May I know how you measure the loading time?
Or did I do something wrong?

python -m executorch.examples.models.llama2.export_llama --disable_dynamic_shape --qnn --pt2e_quantize qnn_16a4w

cmake \
    -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \
    -DANDROID_ABI="arm64-v8a" \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_QNN=ON \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DQNN_SDK_ROOT=$QNN_SDK_ROOT \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DXNNPACK_ENABLE_ARM_BF16=OFF \
    -Bcmake-android-out .

cmake --build cmake-android-out -j4 --target install --config Release
cmake \
    -DCMAKE_TOOLCHAIN_FILE="$ANDROID_NDK"/build/cmake/android.toolchain.cmake  \
    -DANDROID_ABI="arm64-v8a" \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-android-out \
    -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_QNN=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-android-out/examples/models/llama2 examples/models/llama2

cmake --build cmake-android-out/examples/models/llama2 -j4 --config Release

${adb_cmd} shell rm -rf ${DEVICE_DIR}
${adb_cmd} shell mkdir -p ${DEVICE_DIR}
${adb_cmd} push ./llama2.pte ${DEVICE_DIR}
${adb_cmd} push cmake-android-out/examples/models/llama2/llama_main  ${DEVICE_DIR}
${adb_cmd} push cmake-android-out/lib/libqnn_executorch_backend.so ${DEVICE_DIR}
${adb_cmd} push ./tokenizer.model  ${DEVICE_DIR}
${adb_cmd} push $QNN_LIBRARY/lib/aarch64-android/libQnnHtp.so ${DEVICE_DIR}
${adb_cmd} push $QNN_LIBRARY/lib/aarch64-android/libQnnSystem.so ${DEVICE_DIR}
${adb_cmd} push $QNN_LIBRARY/lib/aarch64-android/libQnnHtpV75Stub.so ${DEVICE_DIR}
${adb_cmd} push $QNN_LIBRARY/lib/hexagon-v75/unsigned/libQnnHtpV75Skel.so ${DEVICE_DIR}
$LIB_ENV && $ADSP_ENV && cd $DEVICE_DIR && ./llama_main --model_path=./llama2.pte --tokenizer_path=./tokenizer.model --prompt=\"$PROMPT\" --seq_len 128 --temperature 0

@cccclai
Copy link
Contributor Author

cccclai commented Oct 5, 2024

One possibility might be that the layernorm is decomposed and not built as a QNN_LayerNorm. @shewu-quic do we have any doc about running QNN profiling? It's easier to find the performance bottleneck with profiling data.

QNN profiling sounds good. Also, we need to optimize two things:

  1. model loading time. I'm getting about 2+ seconds of loading time given these params: {"dim": 576, "hidden_dim": 4096, "n_layers": 2, "n_heads": 9, "n_kv_heads": 3, "vocab_size": 128256, "norm_eps": 1e-05}

I exported the llama model with rms_norm using your settings and ran it on our SM8650 device with llama_main, but it seems model loading takes about 205 ms on average. On SM8550, approximately 211 ms. May I know how you measure the loading time? Or did I do something wrong?


I think this is the init time we're measuring; maybe it's related to the actual SoC. Let me try the layer norm model on device with your fix, as the current model we're tracking uses layer norm.

@shewu-quic
Copy link
Collaborator

Hi @cccclai,
I would like to reproduce the layer norm model on my end.
May I know if it's enough to just modify the following?

https://github.com/cccclai/executorch-1/blob/35387f6de7c731e8d3f52ce504c2abd912c6f096/examples/models/llama2/llama_transformer.py#L252

# self.tok_embeddings = torch.nn.Embedding(params.vocab_size, params.dim)
self.tok_embeddings = torch.nn.Embedding(params.input_vocab_size, params.dim)

and https://github.com/cccclai/executorch-1/blob/35387f6de7c731e8d3f52ce504c2abd912c6f096/examples/models/llama2/params/demo_config.json#L1

How about here?
