Qualcomm AI Engine Direct - [DO NOT MERGE] PTE size and Inference Speed Verification #7569
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7569
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures as of commit f228d74 with merge base e00eaea.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Hi @cccclai, for the runner I have commented out the EOT condition so it generates all the tokens, which makes it easier for us to track inference speed. I have also sent you the PTE I used via email.
Please let me know if you cannot reproduce it or run into any other issues.
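For context, the idea of bypassing the EOT check can be sketched roughly as below. This is a hypothetical, self-contained illustration, not the actual executorch/Qualcomm runner code; the names `generate`, `ignore_eot`, and `next_token` are invented for the sketch. Skipping the EOT early-exit makes every benchmark run emit the same number of tokens, so tokens/sec is comparable across runs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical token-generation loop. When ignore_eot is true, the
// EOT early-exit is skipped so the loop always produces max_tokens
// tokens, giving a fixed workload for timing inference speed.
std::vector<int32_t> generate(int32_t eot_token, int max_tokens,
                              bool ignore_eot,
                              int32_t (*next_token)(int step)) {
  std::vector<int32_t> out;
  for (int step = 0; step < max_tokens; ++step) {
    int32_t tok = next_token(step);
    out.push_back(tok);
    // Normal decoding stops at EOT; for benchmarking this branch is
    // disabled (the "commented out" condition in the runner).
    if (!ignore_eot && tok == eot_token) {
      break;
    }
  }
  return out;
}
```

With the branch disabled, run-to-run token counts are identical regardless of what the model emits, which is what makes the speed numbers easy to compare.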
@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
I'm getting the following perf number with this commit and the .pte shared from you...
Force-pushed from ffd7e8b to a6aee94.
Summary
This is a draft to verify that hybrid-mode models: