Skip to content

Qualcomm AI Engine Direct - Enable zero copy feature #2531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

shewu-quic
Copy link
Collaborator

Summary:

  • Add argument "shared_buffer" into compiler_spec, qnn_executor_runner and test scripts
    • Actually, shared_buffer should be a runtime option since user are responsible to allocate memory for tensors on device. But it seems to have no way to set the runtime option to QnnBackend. Therefore, we put it to compile_spec for now.
  • Implement SharedBuffer to allocate and free RPC memory
  • Add QnnMemManger to register shared buffer for tensor
    • During exection time, we will register memory of tensor data for QNN. And we will deregister them during destruction time of QnnBackend
  • Add two API void* QnnExecuTorchAllocCustomMem(size_t bytes, size_t alignment) and void QnnExecuTorchFreeCustomMem(void* buffer_ptr) to allocate RPC memory with SharedBuffer
    • Users are responsible to allocate "enough" tensor bytes, and set alignment as MemoryAllocator::kDefaultAlignment. See runtime/core/memory_allocator.h.

Copy link

pytorch-bot bot commented Mar 20, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2531

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure

As of commit e541fb4 with merge base 7f96f5a (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 20, 2024
@shewu-quic shewu-quic force-pushed the dev/hutton/enable_zero_copy branch from 250005d to 55c504d Compare March 20, 2024 09:28
@facebook-github-bot
Copy link
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cccclai
Copy link
Contributor

cccclai commented Mar 21, 2024

Hey do you mind rebasing again? As #2506 was landed and there is a merge conflict

@shewu-quic shewu-quic force-pushed the dev/hutton/enable_zero_copy branch from 55c504d to a0b82f3 Compare March 21, 2024 03:51
@shewu-quic
Copy link
Collaborator Author

Thanks for your reminder.
I have rebased my PR.
But I find some errors for quantized test, it seems fail to get_fake_program in backend_api.py.
Do you have any ideas about it or I miss something?

Test in pytorch/main branch.

python3 backends/qualcomm/tests/test_qnn_delegate.py TestQNNQuantizedOperator.test_qnn_backend_16a4w_conv2d -b build_android -s $DEVICE -H $HOST -m $MODEL -r executorch -a unit_test 

image
image

@cccclai
Copy link
Contributor

cccclai commented Mar 21, 2024

Oh that might be related to a recent change in #2502. cc: @lucylq

@lucylq
Copy link
Contributor

lucylq commented Mar 21, 2024

Oh that might be related to a recent change in #2502. cc: @lucylq

Thanks - taking a look!

facebook-github-bot pushed a commit that referenced this pull request Mar 21, 2024
Summary:
Catch the general Exception case, not only AssertionError.

D55047794 caused an error with QC: #2531

Reviewed By: cccclai

Differential Revision: D55204315

fbshipit-source-id: 35df9d74bd55d329cbcab03e5c70e289b45796db
@lucylq
Copy link
Contributor

lucylq commented Mar 22, 2024

Hey @shewu-quic, do you mind rebasing and trying again? Added a fallback here: #2564

@shewu-quic shewu-quic force-pushed the dev/hutton/enable_zero_copy branch from a0b82f3 to 460f564 Compare March 25, 2024 05:52
@shewu-quic
Copy link
Collaborator Author

Done. Thanks for your help. :)

@facebook-github-bot
Copy link
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:
- Add argument "shared_buffer" into compiler_spec, qnn_executor_runner and test scripts
  -  Actually, shared_buffer should be a runtime option since user are responsible to allocate memory for tensors on device. But it seems to have no way to set the runtime option to QnnBackend. Therefore, we put it to compile_spec for now.
- Implement SharedBuffer to allocate and free RPC memory
- Add QnnMemManger to register shared buffer for tensor
  - During exection time, we will register memory of tensor data for QNN. And we will deregister them during destruction time of QnnBackend
- Add two API `void* QnnExecuTorchAllocCustomMem(size_t bytes, size_t alignment)` and `void QnnExecuTorchFreeCustomMem(void* buffer_ptr)` to allocate RPC memory with SharedBuffer
  -  Users are responsible to allocate "enough" tensor bytes, and set alignment as MemoryAllocator::kDefaultAlignment. See runtime/core/memory_allocator.h.
@shewu-quic shewu-quic force-pushed the dev/hutton/enable_zero_copy branch from 460f564 to e541fb4 Compare March 25, 2024 06:22
@facebook-github-bot
Copy link
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@cccclai merged this pull request in a531ca5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants