
Dev weight sharing #6657


Merged

merged 3 commits into pytorch:main on Nov 18, 2024

Conversation

haowhsu-quic
Collaborator

Summary

  • support multiple graphs in single qnn context in runtime
  • helper function in AoT for generating multi-method pte (a rough sketch follows this list)
  • enable weight sharing mechanism on HTP
  • test cases
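For orientation, here is a rough sketch of exporting two methods into a single pte with the QNN backend. This is only the generic multi-method export flow under assumed toy module names; it does not use the weight-sharing helper this PR adds, and the exact compiler-spec knobs for multiple graphs should be taken from the test case referenced in the test plan.

import torch
from executorch.exir import to_edge
from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import QcomChipset
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)

# Toy module standing in for the real graphs; names are illustrative only.
class SingleConv(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)

    def forward(self, x):
        return self.conv(x)

sample = (torch.randn(1, 3, 32, 32),)
# to_edge accepts a dict of method name -> ExportedProgram, yielding a multi-method program.
methods = {
    "single_conv": torch.export.export(SingleConv(), sample),
    "seq_conv": torch.export.export(
        torch.nn.Sequential(SingleConv(), torch.nn.ReLU()), sample
    ),
}
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8650,
    backend_options=generate_htp_compiler_spec(use_fp16=True),
)
edge = to_edge(methods).to_backend(QnnPartitioner(compiler_specs))
with open("multi_method.pte", "wb") as f:
    f.write(edge.to_executorch().buffer)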

Test plan

python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedUtils.test_qnn_backend_multi_graphs -s $DEVICE_SN -m SM8650 -b build-android/ -a $ARTIFACTS


pytorch-bot bot commented Nov 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6657

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 9032c34 with merge base e95f171:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 5, 2024
@haowhsu-quic haowhsu-quic marked this pull request as draft November 5, 2024 09:31
Contributor

@cccclai cccclai left a comment

Looks fairly clean, thanks! I just asked some questions so I can understand it better.

// QnnTensor
// ToTensor(flatbuffers::Vector<::flatbuffers::Offset<qcir::Tensor>>
// tensor), flatbuffers::FlatBufferBuilder* builder);
tensors.emplace_back(ToTensor(ToTensor(tensor), &builder_));
Contributor

What does `ToTensor(ToTensor(tensor), &builder_)` mean?

Contributor

I'm guessing we'll deduplicate tensors here?

Collaborator Author

The inner ToTensor converts a serialized tensor in qcir to the QnnTensor defined in the QNN SDK headers. The outer ToTensor converts that QnnTensor into a flatbuffer-API-compatible tensor for building the merged qcir.

It looks like flatbuffers has no mechanism for merging serialized binaries, so this is the detour I could come up with so far.
Will rephrase the comment for better understanding.
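For readers puzzled by the nested call, here is a minimal, self-contained sketch of the decode-then-re-encode pattern described above. The struct and function names below are illustrative stand-ins, not the actual qcir/QNN/flatbuffers APIs.

#include <string>
#include <vector>

// Illustrative stand-ins for the real types (NOT the actual qcir/QNN/flatbuffers APIs).
struct SerializedTensor { std::string name; std::vector<int> dims; }; // like a qcir::Tensor read from the old buffer
struct NativeTensor { std::string name; std::vector<int> dims; };     // like a Qnn_Tensor_t from the QNN SDK
struct Builder { std::vector<SerializedTensor> tensors; };            // like the FlatBufferBuilder for the merged qcir

// Inner conversion: serialized qcir tensor -> native QNN tensor.
NativeTensor ToTensor(const SerializedTensor& src) { return {src.name, src.dims}; }

// Outer conversion: native QNN tensor -> tensor re-encoded into the new builder.
SerializedTensor ToTensor(const NativeTensor& src, Builder* builder) {
  SerializedTensor out{src.name, src.dims};
  builder->tensors.push_back(out);
  return out;
}

int main() {
  Builder builder;
  std::vector<SerializedTensor> old_graph = {{"conv_weight", {3, 3, 16, 16}}};
  for (const auto& tensor : old_graph) {
    // Mirrors tensors.emplace_back(ToTensor(ToTensor(tensor), &builder_)):
    // decode each serialized tensor, then re-encode it into the merged buffer.
    ToTensor(ToTensor(tensor), &builder);
  }
  return 0;
}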

QNN_EXECUTORCH_LOG_ERROR("Fail to verify qcir format");
return;
}
auto context = qcir::GetContext(info.ptr);
Contributor

Does context mean context_binary here? Is each qcir flatbuffer combined with a context binary?

Collaborator Author

Yes, the concept is similar in qcir, but the flatbuffer does not contain the context binary, only the graph architecture and tensor data.

std::vector<std::shared_ptr<OpWrapper>>& op_wrappers) {
QnnExecuTorchContextBinary context_binary;
flatbuffers::FlatBufferBuilder builder;

if (qnn_manager_->IsOnlinePrepare()) {
if (qnn_manager_->IsOnlinePrepare() || qnn_manager_->IsMultipleGraphs()) {
Contributor

Does it mean the QNN manager can support both online prepare and multiple graphs?

Collaborator Author

This Compile method is invoked in qnn_preprocess.py. Once either of these two compiler specs is recognized, qcir will be returned instead of generating a context binary.
In online_prepare mode, the user can directly ship the generated pte and let QnnManager compose the graph on the device side.
Although multiple_graphs produces the same binary format and could be used in the same scenario as online_prepare, we would expect users to follow the example in our test cases, because the HTP optimization level differs (it is higher on the host side, generating a more computationally efficient context binary).

if node.target in allow_list_operator:
if (
node.target in allow_list_operator
# bypass if custom op appears
Contributor

what is the custom op namespace?

Collaborator Author

qaisw for now.

@@ -104,7 +104,8 @@ def preprocess(
else:
raise RuntimeError(f"{node.op} is not supported in Qnn")
qnn_context_binary = qnn_manager.Compile(
[py_op_wrapper.GetOpWrapper() for py_op_wrapper in py_op_wrapper_list]
qnn_manager.GetGraphNames()[0],
Contributor

what does this line mean?

Collaborator Author

Currently we can have multiple graphs inside one QnnManager, so the exposed APIs need a graph name as an identifier. At this stage, however, only one graph is processed, and its graph name comes from the compiler specs.

flatbuffers::Verifier verifier(
static_cast<const uint8_t* const>(qnn_context_blob_.buffer),
qnn_context_blob_.nbytes);

if (qcir::VerifyGraphBuffer(verifier)) {
if (qcir::VerifyContextBuffer(verifier)) {
Contributor

Trying to follow - is this logic for AOT or runtime?

Collaborator Author

For online_prepare, this happens at runtime.
For multiple_graphs, this happens AoT, since we compile the merged qcir again on the host side with a higher HTP optimization level.

@@ -23,10 +23,9 @@ class HtpGraph : public QnnGraph {
QnnBackend* backend,
QnnContext* context,
const QnnExecuTorchProfileLevel& profile_level,
const std::string& graph_name,
Contributor

interesting - what did we use this graph_name for before?

Collaborator Author

Previously we used it to store the graph_name from compiler specs AoT, or from the context binary at runtime.

@@ -177,7 +181,10 @@ table QnnExecuTorchOptions {
shared_buffer:bool;

/// Is model from qnn context binary
is_from_context_binary: bool;
is_from_context_binary:bool;
Contributor

oh is it only for the custom op solution?

Collaborator Author

Yes, we need this flag to guarantee the order of graph IOs.

)
for graph_name in graph_names
]
exported_programs = [
Contributor

Great! Did we observe a smaller model size compared with the no-weight-sharing option?

Collaborator Author

Yes.

@@ -140,7 +140,7 @@ def push(self, inputs=None, input_list=None, files=None):
for file_name in files:
self._adb(["push", file_name, self.workspace])

def execute(self, custom_runner_cmd=None):
def execute(self, custom_runner_cmd=None, method_index=0):
Contributor

Is it because we don't have the method name, just the method index?

Collaborator Author

I think that's related to the interface we used in qnn_executor_runner:

  const char* method_name = nullptr;
  {
    const auto method_name_result =
        program->get_method_name(FLAGS_method_index);
    ET_CHECK_MSG(method_name_result.ok(), "Program has no methods");
    method_name = *method_name_result;
  }
  ET_LOG(Info, "Using method %s", method_name);

@haowhsu-quic
Collaborator Author

haowhsu-quic commented Nov 6, 2024

It just came to my mind that the memory footprint could still be high at runtime, since we will create one QnnManager per method.
Would it be possible for the method_name proposed in #6622 to be derived via BackendExecutionContext (or some data structure passed as a void* that the user could ship into the delegates as a runtime option when invoking method->execute())?

My idea would be like:

// in user app side
struct QnnBackendRuntimeOption {
  std::string graph_name;
};
auto option = QnnBackendRuntimeOption({"forward"});
method->execute(&option);

// in runtime/executor/method.cpp
Error Method::execute(void* backend_runtime_option) {
  ...
  auto status = execute_instruction(backend_runtime_option);
}
Error Method::execute_instruction(void* backend_runtime_option) {
  ...
  BackendExecutionContext backend_execution_context(
      /*event_tracer*/ event_tracer_,
      /*temp_allocator*/ temp_allocator_,
      /*backend_runtime_option*/ backend_runtime_option);
  err = delegates_[delegate_idx].Execute(
      backend_execution_context,
      chain.argument_lists_[step_state_.instr_idx].data());
}

@cccclai
Contributor

cccclai commented Nov 6, 2024

Hmm, looks like it needs to inject runtime info, and it needs more internal discussion; it feels like we may not be able to ship it on time...

Also just trying to follow, what does init look like in your case? If we can add the method name in BackendExecutionContext, will it work for you?

If you need to iterate all methods during init, it feels like we may need to hack it via compile specs, like passing method names as part of the compile specs...

@haowhsu-quic
Collaborator Author

We will parse the context binary and create the graphs inside the init function; all the graphs are well prepared at that moment.
What I hope is to reuse that created context and let the user select which graph to execute, but this seems to be constrained by the framework. We have to re-initialize the same context again when the user loads another method.

@cccclai
Contributor

cccclai commented Nov 6, 2024

We will parse the context binary and create the graphs inside the init function; all the graphs are well prepared at that moment. What I hope is to reuse that created context and let the user select which graph to execute, but this seems to be constrained by the framework. We have to re-initialize the same context again when the user loads another method.

😢 I totally agree with you, and have been pushing for the framework change, but got lots of pushback... I'm just trying to see how to unblock us with minimal change. I promise I'll continue pushing for the framework change to support this feature. The runtime config injection is something we've been discussing for a while and we all agree on; landing a proper solution may still take time though.

In the meanwhile, as a short-term workaround, will the following snippet work? Something like compile_spec = {"method_name": ["prefill", "decode"]}; then during init we have access to all method names, like

init(preprocessed, compilation_spec){
    # iterate method_name in compilation_spec["method_name"] to construct the graph
}

execute(context, handle){
    method_name = context.get_method_name() # like "forward"
    handle.execute(method_name)
}

@cccclai
Contributor

cccclai commented Nov 6, 2024

We have to re-initialize the same context again when the user loads another method.

Is it possible to have it as part of the cache, or is it not a clean solution?

@haowhsu-quic
Collaborator Author

haowhsu-quic commented Nov 6, 2024

We have to re-initialize the same context again when the user loads another method.

Is it possible to have it as part of the cache, or is it not a clean solution?

I have some thoughts, if they are not too hacky for you:

// a map to maintain hash value of context binaries with corresponding initialized delegate handle
class QnnExecuTorchBackend final : public ::executorch::runtime::BackendInterface {
 private:
  mutable std::string method_name_; // PR6622
  static std::unordered_map<uint64_t, executorch::runtime::DelegateHandle*> delegate_map_;
};
Result<DelegateHandle*> QnnExecuTorchBackend::init(
    BackendInitContext& context,
    FreeableBuffer* processed,
    ArrayRef<CompileSpec> compile_specs) const {
  ...
  method_name_ = context.get_method_name();
  // check if the processed bytes have already been initialized
  uint64_t hash_val = calculate_hash(processed);
  auto iter = delegate_map_.find(hash_val);
  if (iter != delegate_map_.end()) {
    return iter->second;
  }
  ...
  return delegate_map_[hash_val] = qnn_manager;
}

With this approach, the current implementation and #6622 might still be leveraged.
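As an aside, the calculate_hash above is only pseudocode; a minimal sketch of one possible content hash over the processed bytes is shown below. FNV-1a is just one cheap choice, and the eventual change keys the cache on a hash carried in the binary instead (see GetBinaryHash later in this thread).

#include <cstddef>
#include <cstdint>

// Sketch only: FNV-1a over the processed delegate payload bytes.
// The landed change keys the cache on a hash embedded in the context binary
// (GetBinaryHash) rather than hashing the whole buffer like this.
uint64_t calculate_hash(const void* data, size_t nbytes) {
  const uint8_t* bytes = static_cast<const uint8_t*>(data);
  uint64_t hash = 1469598103934665603ull;  // FNV offset basis
  for (size_t i = 0; i < nbytes; ++i) {
    hash ^= bytes[i];
    hash *= 1099511628211ull;  // FNV prime
  }
  return hash;
}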

@cccclai
Contributor

cccclai commented Nov 6, 2024

Oh, I think it still works! Maybe we can add a TODO to remove the hack. In the meanwhile, do you need the method name during execute too?

@haowhsu-quic
Collaborator Author

Will do, and I think method_name in BackendInitContext is enough for the current approach.

@haowhsu-quic
Collaborator Author

haowhsu-quic commented Nov 6, 2024

Oh, I think it still works! Maybe we can add a TODO to remove the hack. In the meanwhile, do you need the method name during execute too?

I realized that we only have one QnnExecuTorchBackend instance during runtime, so the method_name should be shipped via BackendExecutionContext instead, or the stored name will be overwritten (I used a simple app with multiple methods loaded to trigger that issue).
Could you help make that happen? Thank you!

@cccclai
Contributor

cccclai commented Nov 6, 2024

Oh, I think it still works! Maybe we can add a TODO to remove the hack. In the meanwhile, do you need the method name during execute too?

I realized that we only have one QnnExecuTorchBackend instance during runtime, so the method_name should be shipped via BackendExecutionContext instead, or the stored name will be overwritten (I used a simple app with multiple methods loaded to trigger that issue). Could you help make that happen? Thank you!

I think it should be fine to have the method name in both BackendInitContext and BackendExecutionContext. Will make the change.

@cccclai
Contributor

cccclai commented Nov 7, 2024

Hey, I just added the method name for execute in #6622; can you try again? Sorry, was a bit late on this.

@haowhsu-quic
Collaborator Author

Thank you for the support! The cache reuse part has been done, but somehow I could not get the correct method_name from BackendExecutionContext (it is always an empty string on my side).
Might need some time to investigate; will submit the updated version once the problem is identified.

@cccclai
Contributor

cccclai commented Nov 7, 2024

Hmm let me check that too…

@cccclai
Contributor

cccclai commented Nov 8, 2024

What is your command to repro? I want to iterate on my change with your PR.

@haowhsu-quic
Collaborator Author

After applying 6622.patch, I use the pte generated from TestQNNQuantizedUtils.test_qnn_backend_multi_graphs and change qnn_executor_runner to load both methods, seq_conv and single_conv, and execute them.
To check the context binary reuse effect, the logs should contain "Use cached delegate handle for current method: ..." from the following snippet in backends/qualcomm/runtime/QnnExecuTorchBackend.cpp:

  new (qnn_manager) QnnManager(qnn_executorch_options, qnn_context_blob);

  // TODO: this is a temporal solution for multi-graph support, will be
  //       removed once framework starts to accept runtime configuration
  // ---
  // check if current context binary has already been initialized
  // return cached one for reducing memory footprint
  std::string binary_hash = qnn_manager->GetBinaryHash();
  auto iter = delegate_map_.find(binary_hash);
  if (iter != delegate_map_.end()) {
    QNN_EXECUTORCH_LOG_INFO(
        "Use cached delegate handle for current method: %s",
        context.get_method_name());
    return iter->second;
  }

No context binary with the same md5sum would be initialized by HTP again, but I guess there is one thing that needs your advice: is there a macro to roll back the allocated qnn_manager in the allocator, since it is only used for extracting the context binary hash here?

To check the effect of weight sharing, you could comment out the function body inside qualcomm/runtime/backends/htpbackends/x86_64/HtpContextCustomConfig.cpp to see the difference in context binary size (55640 bytes without vs. 51552 bytes with weight sharing).
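For readers unfamiliar with that file: the x86_64 HtpContextCustomConfig is where the weight-sharing flag is attached to the HTP context custom config. A rough sketch of the idea is below; the enum and field names are recalled from the QNN SDK's QnnHtpContext.h, so treat them as assumptions and verify against the actual header and the file above.

#include "HTP/QnnHtpContext.h"  // QNN SDK header; include path depends on the SDK layout

// Sketch only: build an HTP context custom config that turns on weight sharing.
// Names here are recalled from QnnHtpContext.h and may not match exactly.
QnnHtpContext_CustomConfig_t MakeWeightSharingConfig() {
  QnnHtpContext_CustomConfig_t config = QNN_HTP_CONTEXT_CUSTOM_CONFIG_INIT;
  config.option = QNN_HTP_CONTEXT_CONFIG_OPTION_WEIGHT_SHARING_ENABLED;
  config.weightSharingEnabled = true;
  return config;
}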

@cccclai
Contributor

cccclai commented Nov 11, 2024

Does the latest commit in #6622 work for you? The CI seems to be passing, and I updated the unit test to check the method name during execute.

@haowhsu-quic
Collaborator Author

Yes, thank you for the change. It looks good on my side.
Will resubmit the final version once #6622 lands and all internal verification passes.

@haowhsu-quic
Collaborator Author

Btw, could you keep the method_name shipping part at init time? It seems the BackendInitContext-related implementation has been stripped.

@cccclai
Contributor

cccclai commented Nov 12, 2024

I'm landing #6622, and the method name is available during both init and execute. Let me know if there is any issue.

@haowhsu-quic haowhsu-quic marked this pull request as ready for review November 15, 2024 06:02
@haowhsu-quic
Collaborator Author

haowhsu-quic commented Nov 15, 2024

Hi @cccclai, this is the final version, fully verified internally. Sorry for the huge change, but it's kind of inevitable to keep everything working as usual.
I did not rebase onto the latest commit because #6489 introduces an incompatible transformers module that breaks the MobileBert example and our test cases.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Nov 16, 2024

Is this PR good for review? I think the CI is failing: test-llama-runner-qnn-linux, Lint / lintrunner / linux-job

@haowhsu-quic
Collaborator Author

Is this PR good for review? I think the CI is failing: test-llama-runner-qnn-linux, Lint / lintrunner / linux-job

Sorry for the mistake; I just submitted the fix and will add it to our internal CI.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Contributor

cccclai commented Nov 18, 2024

There are quite a few internal errors; can you apply the following patches, and I'll import and see whether the internal errors are gone? Thanks!

diff --git a/executorch/backends/qualcomm/aot/python/targets.bzl b/executorch/backends/qualcomm/aot/python/targets.bzl
--- a/executorch/backends/qualcomm/aot/python/targets.bzl
+++ b/executorch/backends/qualcomm/aot/python/targets.bzl
@@ -31,6 +31,7 @@
             "//executorch/backends/qualcomm/aot/wrappers:wrappers",
             "//executorch/backends/qualcomm/runtime:logging",
             "//executorch/backends/qualcomm:schema",
+            "//executorch/backends/qualcomm:qc_binary_info_schema",
             "//executorch/backends/qualcomm/aot/ir:qcir_utils",
             "//executorch/backends/qualcomm/runtime:runtime",
             "fbsource//third-party/qualcomm/qnn/qnn-{0}:api".format(get_qnn_library_verision()),
diff --git a/executorch/backends/qualcomm/runtime/targets.bzl b/executorch/backends/qualcomm/runtime/targets.bzl
--- a/executorch/backends/qualcomm/runtime/targets.bzl
+++ b/executorch/backends/qualcomm/runtime/targets.bzl
@@ -29,6 +29,7 @@
         ],
         exported_deps = [
             "//executorch/backends/qualcomm:schema",
+            "//executorch/backends/qualcomm:qc_binary_info_schema",
             "//executorch/runtime/core:core",
         ],
     )
@@ -63,6 +64,7 @@
             "fbsource//third-party/qualcomm/qnn/qnn-{0}:api".format(get_qnn_library_verision()),
             ":logging",
             "//executorch/backends/qualcomm:schema",
+            "//executorch/backends/qualcomm:qc_binary_info_schema",
             "//executorch/backends/qualcomm/aot/ir:qcir_utils",
             "//executorch/backends/qualcomm/aot/wrappers:wrappers",
             "//executorch/runtime/backend:interface",
diff --git a/executorch/backends/qualcomm/targets.bzl b/executorch/backends/qualcomm/targets.bzl
--- a/executorch/backends/qualcomm/targets.bzl
+++ b/executorch/backends/qualcomm/targets.bzl
@@ -16,6 +16,12 @@
 
 SCHEMA_LIRRARY_NAME = SCHEMA_NAME
 
+QC_BINARY_INFO_SCHEMA = "qc_binary_info"
+QC_BINARY_INFO_INPUT_SCHEMA = "serialization/" + QC_BINARY_INFO_SCHEMA + ".fbs"
+QC_BINARY_INFO_SCHEMA_GEN_RULE_NAME = QC_BINARY_INFO_SCHEMA + "_generated"
+QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER = QC_BINARY_INFO_SCHEMA_GEN_RULE_NAME + ".h"
+QC_BINARY_INFO_SCHEMA_LIRRARY_NAME = QC_BINARY_INFO_SCHEMA
+
 def generate_schema_header(rule_name, srcs, headers, default_header):
     """Generate header file given flatbuffer schema
     """
@@ -77,6 +83,33 @@
         platforms = [ANDROID],
     )
 
+    generate_schema_header(
+        QC_BINARY_INFO_SCHEMA_GEN_RULE_NAME,
+        [QC_BINARY_INFO_INPUT_SCHEMA],
+        [QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER],
+        QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER,
+    )
+
+    runtime.cxx_library(
+        name = "qc_binary_info_schema",
+        srcs = [],
+        visibility = [
+            # Lock this down as tightly as possible to ensure that flatbuffers
+            # are an implementation detail. Ideally this list would only include
+            # //executorch/runtime/executor/...
+            "//executorch/codegen/tools/...",
+            "//executorch/runtime/executor/...",
+            "//executorch/backends/qualcomm/...",
+            "//executorch/backends/qualcomm/runtime/...",
+        ],
+        exported_headers = {
+             QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER: ":{}[{}]".format( QC_BINARY_INFO_SCHEMA_GEN_RULE_NAME,  QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER),
+        },
+        exported_external_deps = ["flatbuffers-api"],
+        define_static_target = True,
+        platforms = [ANDROID],
+    )
+
     runtime.cxx_library(
         name = "qnn_executorch_backend",
         srcs = [],
diff --git a/xplat/executorch/backends/qualcomm/aot/python/targets.bzl b/xplat/executorch/backends/qualcomm/aot/python/targets.bzl
--- a/xplat/executorch/backends/qualcomm/aot/python/targets.bzl
+++ b/xplat/executorch/backends/qualcomm/aot/python/targets.bzl
@@ -31,6 +31,7 @@
             "//executorch/backends/qualcomm/aot/wrappers:wrappers",
             "//executorch/backends/qualcomm/runtime:logging",
             "//executorch/backends/qualcomm:schema",
+            "//executorch/backends/qualcomm:qc_binary_info_schema",
             "//executorch/backends/qualcomm/aot/ir:qcir_utils",
             "//executorch/backends/qualcomm/runtime:runtime",
             "fbsource//third-party/qualcomm/qnn/qnn-{0}:api".format(get_qnn_library_verision()),
diff --git a/xplat/executorch/backends/qualcomm/runtime/targets.bzl b/xplat/executorch/backends/qualcomm/runtime/targets.bzl
--- a/xplat/executorch/backends/qualcomm/runtime/targets.bzl
+++ b/xplat/executorch/backends/qualcomm/runtime/targets.bzl
@@ -29,6 +29,7 @@
         ],
         exported_deps = [
             "//executorch/backends/qualcomm:schema",
+            "//executorch/backends/qualcomm:qc_binary_info_schema",
             "//executorch/runtime/core:core",
         ],
     )
@@ -63,6 +64,7 @@
             "fbsource//third-party/qualcomm/qnn/qnn-{0}:api".format(get_qnn_library_verision()),
             ":logging",
             "//executorch/backends/qualcomm:schema",
+            "//executorch/backends/qualcomm:qc_binary_info_schema",
             "//executorch/backends/qualcomm/aot/ir:qcir_utils",
             "//executorch/backends/qualcomm/aot/wrappers:wrappers",
             "//executorch/runtime/backend:interface",
diff --git a/xplat/executorch/backends/qualcomm/targets.bzl b/xplat/executorch/backends/qualcomm/targets.bzl
--- a/xplat/executorch/backends/qualcomm/targets.bzl
+++ b/xplat/executorch/backends/qualcomm/targets.bzl
@@ -16,6 +16,12 @@
 
 SCHEMA_LIRRARY_NAME = SCHEMA_NAME
 
+QC_BINARY_INFO_SCHEMA = "qc_binary_info"
+QC_BINARY_INFO_INPUT_SCHEMA = "serialization/" + QC_BINARY_INFO_SCHEMA + ".fbs"
+QC_BINARY_INFO_SCHEMA_GEN_RULE_NAME = QC_BINARY_INFO_SCHEMA + "_generated"
+QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER = QC_BINARY_INFO_SCHEMA_GEN_RULE_NAME + ".h"
+QC_BINARY_INFO_SCHEMA_LIRRARY_NAME = QC_BINARY_INFO_SCHEMA
+
 def generate_schema_header(rule_name, srcs, headers, default_header):
     """Generate header file given flatbuffer schema
     """
@@ -77,6 +83,33 @@
         platforms = [ANDROID],
     )
 
+    generate_schema_header(
+        QC_BINARY_INFO_SCHEMA_GEN_RULE_NAME,
+        [QC_BINARY_INFO_INPUT_SCHEMA],
+        [QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER],
+        QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER,
+    )
+
+    runtime.cxx_library(
+        name = "qc_binary_info_schema",
+        srcs = [],
+        visibility = [
+            # Lock this down as tightly as possible to ensure that flatbuffers
+            # are an implementation detail. Ideally this list would only include
+            # //executorch/runtime/executor/...
+            "//executorch/codegen/tools/...",
+            "//executorch/runtime/executor/...",
+            "//executorch/backends/qualcomm/...",
+            "//executorch/backends/qualcomm/runtime/...",
+        ],
+        exported_headers = {
+             QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER: ":{}[{}]".format( QC_BINARY_INFO_SCHEMA_GEN_RULE_NAME,  QC_BINARY_INFO_OUTPUT_SCHEMA_HEADER),
+        },
+        exported_external_deps = ["flatbuffers-api"],
+        define_static_target = True,
+        platforms = [ANDROID],
+    )
+
     runtime.cxx_library(
         name = "qnn_executorch_backend",
         srcs = [],

Summary:
- support multiple graphs in single qnn context in runtime
- helper function in aot for generating multi-method pte
- enable weight sharing mechanism on HTP
- support file signature for cache reuse
- changes that make sure everything works as usual
- test cases
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@cccclai cccclai left a comment

Thanks! Looks good

@cccclai
Contributor

cccclai commented Nov 18, 2024

Hmm, more tests started failing now, because the schema.fbs and the bzl rule were renamed...

@@ -160,10 +160,8 @@ def get_qnn_partitioner(
QnnPartitioner,
)

# pyre-ignore: Undefined import [21]: Could not find a module corresponding to import `executorch.backends.qualcomm.serialization.qnn_compile_spec_schema`
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import (
Contributor

Actually the error is

ModuleNotFoundError: No module named 'executorch.backends.qualcomm.serialization.qnn_compile_spec_schema'

Seems like it's removed somewhere

Collaborator Author

Many thanks for helping with this!

@facebook-github-bot facebook-github-bot merged commit 4086509 into pytorch:main Nov 18, 2024
39 of 41 checks passed
@haowhsu-quic haowhsu-quic deleted the dev_weight_sharing branch February 7, 2025 09:21