Qualcomm AI Engine Direct - Enable zero copy feature #2531
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/2531
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures, 1 Unrelated Failure as of commit e541fb4 with merge base 7f96f5a.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was also present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 250005d to 55c504d.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Hey, do you mind rebasing again? #2506 was landed and there is a merge conflict.
Force-pushed from 55c504d to a0b82f3.
Thanks for the reminder. Tested on the pytorch main branch.
Summary: Catch the general Exception case, not only AssertionError. D55047794 caused an error with QC: #2531

Reviewed By: cccclai
Differential Revision: D55204315
fbshipit-source-id: 35df9d74bd55d329cbcab03e5c70e289b45796db
Hey @shewu-quic, do you mind rebasing and trying again? I added a fallback here: #2564
Force-pushed from a0b82f3 to 460f564.
Done. Thanks for your help. :)
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
- Add a "shared_buffer" argument to compiler_spec, qnn_executor_runner, and the test scripts.
- Ideally, shared_buffer would be a runtime option, since users are responsible for allocating memory for tensors on the device, but there is currently no way to pass a runtime option to QnnBackend, so we put it in compile_spec for now.
- Implement SharedBuffer to allocate and free RPC memory.
- Add QnnMemManger to register the shared buffer for a tensor.
- At execution time, we register the memory of the tensor data with QNN, and deregister it when QnnBackend is destructed.
- Add two APIs, `void* QnnExecuTorchAllocCustomMem(size_t bytes, size_t alignment)` and `void QnnExecuTorchFreeCustomMem(void* buffer_ptr)`, to allocate RPC memory with SharedBuffer (see the sketch below).
- Users are responsible for allocating "enough" tensor bytes and for setting the alignment to MemoryAllocator::kDefaultAlignment. See runtime/core/memory_allocator.h.
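To make the intended client-side flow concrete, here is a minimal sketch of allocating and releasing an RPC-backed buffer with the two new APIs. This is an illustration only: the public include path for the two functions is not given in this summary, so they are declared locally; the `torch::executor` namespace for MemoryAllocator is an assumption; and wiring the buffer into a Method's input tensor is elided.

```cpp
// Minimal usage sketch for the zero-copy APIs described above.
#include <cstddef>
#include <cstring>

#include <executorch/runtime/core/memory_allocator.h>

// Declared locally for the sketch; in the backend these are exported from a
// Qualcomm runtime header (exact path not shown in this summary).
extern "C" {
void* QnnExecuTorchAllocCustomMem(size_t bytes, size_t alignment);
void QnnExecuTorchFreeCustomMem(void* buffer_ptr);
}

int main() {
  // Users must allocate "enough" bytes for the tensor and use the default
  // alignment, as required by the summary above.
  constexpr size_t kInputBytes = 1 * 3 * 224 * 224 * sizeof(float);
  void* input_buf = QnnExecuTorchAllocCustomMem(
      kInputBytes, torch::executor::MemoryAllocator::kDefaultAlignment);
  if (input_buf == nullptr) {
    return 1;  // RPC memory allocation failed
  }

  // Fill the shared buffer with input data. In a real runner the input
  // tensor's data pointer would be set to input_buf before execution, so the
  // backend can register this memory with QNN and skip the extra copy.
  std::memset(input_buf, 0, kInputBytes);

  // ... run inference here (elided) ...

  // Release the RPC memory once it is no longer needed.
  QnnExecuTorchFreeCustomMem(input_buf);
  return 0;
}
```

In qnn_executor_runner the same pattern would presumably back both input and output tensors with RPC memory, which is what lets the accelerator read and write them directly (the "zero copy" in the PR title).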
Force-pushed from 460f564 to e541fb4.
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary: Add two APIs, `void* QnnExecuTorchAllocCustomMem(size_t bytes, size_t alignment)` and `void QnnExecuTorchFreeCustomMem(void* buffer_ptr)`, to allocate RPC memory with SharedBuffer.