Dev weight sharing #6657
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6657
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV; if your PR is affected, please view it below.
✅ No Failures as of commit 9032c34 with merge base e95f171.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Looks fairly clean, thanks! I just asked some questions so I can understand it better.
// QnnTensor
// ToTensor(flatbuffers::Vector<::flatbuffers::Offset<qcir::Tensor>>
// tensor), flatbuffers::FlatBufferBuilder* builder);
tensors.emplace_back(ToTensor(ToTensor(tensor), &builder_));
What does `ToTensor(ToTensor(tensor), &builder_)` mean?
I'm guessing we'll deduplicate the tensors here?
The inner `ToTensor` converts a serialized tensor in `qcir` to a `QnnTensor` as defined in the QNN SDK header. The outer `ToTensor` converts that `QnnTensor` into a FlatBuffers-API-compatible tensor for building `qcir`.
It looks like FlatBuffers has no mechanism for merging binaries, so this is the detour I could come up with so far.
I will rephrase the comment for better understanding.
QNN_EXECUTORCH_LOG_ERROR("Fail to verify qcir format");
return;
}
auto context = qcir::GetContext(info.ptr);
Does `context` mean `context_binary` here? Is each qcir flatbuffer combined with a context binary?
Yes, the concept is similar in `qcir`. The flatbuffer does not contain the context binary, only the graph architecture and tensor data.
std::vector<std::shared_ptr<OpWrapper>>& op_wrappers) {
QnnExecuTorchContextBinary context_binary;
flatbuffers::FlatBufferBuilder builder;

if (qnn_manager_->IsOnlinePrepare()) {
if (qnn_manager_->IsOnlinePrepare() || qnn_manager_->IsMultipleGraphs()) {
Does it mean the QNN manager can support both online prepare and multiple graphs?
This `Compile` method is invoked in `qnn_preprocess.py`. Once one of these two compiler specs is recognized, `qcir` will be returned instead of generating a context binary.
In `online_prepare` mode, users could directly ship the generated `pte` and let `QnnManager` compose the graph on the device side.
Although `multiple_graphs` produces the same binary format and could be used in the same scenario as `online_prepare`, we would expect users to follow the example in our test cases, because the HTP optimization level will differ (the host side uses a higher level and generates a more computation-efficient context binary).
if node.target in allow_list_operator:
if (
    node.target in allow_list_operator
    # bypass if custom op appears
what is the custom op namespace?
`qaisw` for now.
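For illustration, here is a hedged sketch of how such a check might identify those ops; the `namespace` attribute access on `node.target` is an assumption for this sketch, not necessarily the exact code in this PR.

# Hypothetical sketch: detect whether a node targets a custom op registered
# under the Qualcomm namespace ("qaisw") so the pass can bypass it.
QAISW_NAMESPACE = "qaisw"

def is_qualcomm_custom_op(node) -> bool:
    # torch custom ops expose their namespace on the OpOverload target
    return getattr(node.target, "namespace", None) == QAISW_NAMESPACE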
@@ -104,7 +104,8 @@ def preprocess(
else:
    raise RuntimeError(f"{node.op} is not supported in Qnn")
qnn_context_binary = qnn_manager.Compile(
    [py_op_wrapper.GetOpWrapper() for py_op_wrapper in py_op_wrapper_list]
    qnn_manager.GetGraphNames()[0],
what does this line mean?
Currently we can have multiple graphs inside one `QnnManager`, so the exposed APIs need a graph name as an identifier to manipulate them. At this stage there will be only one graph to process, and its name comes from the compiler specs.
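Putting the truncated hunk above back together, the updated single-graph call in `qnn_preprocess.py` presumably reads as follows (only the closing parenthesis is not shown in the diff):

qnn_context_binary = qnn_manager.Compile(
    qnn_manager.GetGraphNames()[0],
    [py_op_wrapper.GetOpWrapper() for py_op_wrapper in py_op_wrapper_list],
)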
flatbuffers::Verifier verifier(
    static_cast<const uint8_t* const>(qnn_context_blob_.buffer),
    qnn_context_blob_.nbytes);

if (qcir::VerifyGraphBuffer(verifier)) {
if (qcir::VerifyContextBuffer(verifier)) {
Trying to follow - is this logic for AOT or runtime?
For `online_prepare`, this happens at runtime.
For `multiple_graphs`, this happens AoT, since we compile the merged `qcir` again on the host side with a higher HTP optimization level.
@@ -23,10 +23,9 @@ class HtpGraph : public QnnGraph {
QnnBackend* backend,
QnnContext* context,
const QnnExecuTorchProfileLevel& profile_level,
const std::string& graph_name,
Interesting - what did we use this `graph_name` for before?
Before, we used it to store the `graph_name` coming from the compiler specs in AoT, or from the context binary at runtime.
@@ -177,7 +181,10 @@ table QnnExecuTorchOptions {
shared_buffer:bool;

/// Is model from qnn context binary
is_from_context_binary: bool;
is_from_context_binary:bool;
oh is it only for the custom op solution?
Yes, we need this flag to guarantee the order of graph IOs.
)
for graph_name in graph_names
]
exported_programs = [
Great! Did we observe a smaller model size compared with the no-weight-sharing option?
Yes.
@@ -140,7 +140,7 @@ def push(self, inputs=None, input_list=None, files=None):
for file_name in files:
    self._adb(["push", file_name, self.workspace])

def execute(self, custom_runner_cmd=None):
def execute(self, custom_runner_cmd=None, method_index=0):
Is it because we don't have the method name but just method index?
I think that's related to the interface we use in `qnn_executor_runner`:
const char* method_name = nullptr;
{
const auto method_name_result =
program->get_method_name(FLAGS_method_index);
ET_CHECK_MSG(method_name_result.ok(), "Program has no methods");
method_name = *method_name_result;
}
ET_LOG(Info, "Using method %s", method_name);
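On the Python utility side, selecting a method then presumably comes down to forwarding an index to that flag. A usage sketch, assuming `adb` is the helper whose `push`/`execute` signatures appear in the diff above and that the value maps to the runner's `FLAGS_method_index`:

# Hypothetical usage: run the second method (index 1) of a multi-method pte
# on device through qnn_executor_runner.
adb.push(inputs=sample_inputs, input_list=input_list)
adb.execute(method_index=1)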
It just came to my mind that the memory footprint could still be high at runtime, since we will create one `QnnManager` per method. My idea would be something like:

// in user app side
struct QnnBackendRuntimeOption {
  std::string graph_name;
};
auto option = QnnBackendRuntimeOption({"forward"});
method->execute(&option);
// in runtime/executor/method.cpp
Error Method::execute(void* backend_runtime_option) {
...
auto status = execute_instruction(backend_runtime_option);
}
Error Method::execute_instruction(void* backend_runtime_option) {
...
BackendExecutionContext backend_execution_context(
/*event_tracer*/ event_tracer_,
/*temp_allocator*/ temp_allocator_,
/*backend_runtime_option*/ backend_runtime_option);
err = delegates_[delegate_idx].Execute(
backend_execution_context,
chain.argument_lists_[step_state_.instr_idx].data());
}
Hmm, it looks like this needs to inject runtime info, which needs more internal discussion, and I feel like we may not be able to ship it on time... Also, just trying to follow: what does init look like in your case? If we can add the method name in `init`, would that work? If you need to iterate all methods during init, I feel like we may need to hack it via compile specs, like passing method names as part of the compile specs...
We will parse the context binary and create graphs inside `init`.
😢 I totally agree with you, and have been pushing for the framework change, but got lots of pushback... I'm just trying to see how to unblock us with minimum change. I promise I'll continue pushing for the framework change to support this feature. The runtime config injection is something we've been discussing for a while and we all agree on; landing a proper solution may still take time though.
In the meantime, as a short-term workaround, would the following work? Something like `compile_spec = {"method_name": ["prefill", "decode"]}`; then during init we would have access to all the method names, as in the sketch below.
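Concretely, the AoT side could stash every method name in a `CompileSpec` entry; a hedged sketch (the spec key and the encoding are assumptions, not an agreed-upon API):

# Hypothetical sketch: carry all method names of the program in a compile spec
# so the backend can see the full list during init.
from executorch.exir.backend.compile_spec_schema import CompileSpec

method_names = ["prefill", "decode"]
method_name_spec = CompileSpec("method_names", ",".join(method_names).encode("utf-8"))
compile_specs = [method_name_spec]  # plus the usual QNN compile specs
# At init time the backend would decode the value, split on ",", and create
# one graph per method name before any execute() call.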
Is it possible to have it as part of the cache, or is it not a clean solution?
I have some thoughts, if they are not too hacky for you:

// a map to maintain hash values of context binaries with corresponding initialized delegate handles
class QnnExecuTorchBackend final : public ::executorch::runtime::BackendInterface {
private:
mutable std::string method_name_; // PR6622
static std::unordered_map<uint64_t, executorch::runtime::DelegateHandle*> delegate_map_;
};
Result<DelegateHandle*> QnnExecuTorchBackend::init(
BackendInitContext& context,
FreeableBuffer* processed,
ArrayRef<CompileSpec> compile_specs) const {
...
method_name_ = context.get_method_name();
// check if the processed bytes have already been initialized
uint64_t hash_val = calculate_hash(processed);
auto iter = delegate_map_.find(hash_val);
if (iter != delegate_map_.end()) {
return iter->second;
}
...
return delegate_map_[hash_val] = qnn_manager;
}

With this approach, the current implementation and #6622 might still be leveraged.
Oh, I think it still works! Maybe we can add a TODO to remove the hack. In the meantime, do you need the method name during execute too?
Will do, and I think
I realized that we only have one
I think it should be fine to have the method name for execute as well.
Hey, I just added the method name for execute in #6622, can you try again? Sorry, I was a bit late on this.
Thank you for the support! The cache reuse part has been done, but somehow I could not get the correct method name during execute.
Hmm, let me check that too…
Force-pushed from 94fb1ea to 7bbbde1.
What is your command to repro? I wanted to iterate my change with your PR.
After applying 6622.patch, I use the new `get_method_name()` like this:

new (qnn_manager) QnnManager(qnn_executorch_options, qnn_context_blob);
// TODO: this is a temporal solution for multi-graph support, will be
// removed once framework starts to accept runtime configuration
// ---
// check if current context binary has already been initialized
// return cached one for reducing memory footprint
std::string binary_hash = qnn_manager->GetBinaryHash();
auto iter = delegate_map_.find(binary_hash);
if (iter != delegate_map_.end()) {
  QNN_EXECUTORCH_LOG_INFO(
      "Use cached delegate handle for current method: %s",
      context.get_method_name());
  return iter->second;
}

No context binary with the same md5sum will be initialized by HTP again, but there is one thing I need your advice on: is there a macro to roll back the allocated memory? To check the effect of weight sharing, you can run the `test_qnn_backend_multi_graphs` case listed in the test plan.
Does the latest commit in #6622 work for you? The CI seems to be passing, and I updated the unit test to check the method name during execute.
Yes, thank you for the change. It looks good on my side.
Btw, could you keep the
I'm landing #6622 and the method name is available in both init and execute. Let me know if there is any issue.
Force-pushed from 7bbbde1 to 3b76022.
Hi @cccclai, this is the final version, fully verified internally. Sorry for the huge change, but it's kind of inevitable for everything to keep working as usual.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Is this PR good for review? I think the CI is failing:
Sorry for the mistake, I just submitted the fix and will add it to our internal CI.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
There are quite a few internal errors. Can you apply the following patches, and I'll import again to see whether the internal errors are gone? Thanks!
Summary:
- support multiple graphs in a single QNN context at runtime
- helper function in AoT for generating a multi-method pte
- enable the weight sharing mechanism on HTP
- support file signature for cache reuse
- changes making sure everything works as usual
- test cases
Force-pushed from d3af80f to 9032c34.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Thanks! Looks good
Hmm, more tests start failing now, because the
@@ -160,10 +160,8 @@ def get_qnn_partitioner(
    QnnPartitioner,
)

# pyre-ignore: Undefined import [21]: Could not find a module corresponding to import `executorch.backends.qualcomm.serialization.qnn_compile_spec_schema`
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import (
Actually the error is
ModuleNotFoundError: No module named 'executorch.backends.qualcomm.serialization.qnn_compile_spec_schema'
Seems like it's removed somewhere.
Many thanks for helping with this!
Summary
Test plan
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_multi_graphs -s $DEVICE_SN -m SM8650 -b build-android/ -a $ARTIFACTS