[MPS] Add index.Tensor and aten.logical_not #3221

Closed

DenisVieriu97
Collaborator

@DenisVieriu97 DenisVieriu97 commented Apr 22, 2024

Add missing llama ops for MPS delegate:

  • index.Tensor
  • logical_not
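
For readers unfamiliar with the two ops, here is a minimal plain-Python sketch of their reference semantics. It assumes the simplest case of index.Tensor (a single 1-D index applied along dim 0), and the helper names `index_tensor` and `logical_not` are illustrative only. It is not the MPS delegate code from this PR.

```python
# Plain-Python reference semantics only; not the MPS delegate implementation.

def index_tensor(rows, indices):
    """Gather along dim 0, like index.Tensor with a single 1-D index tensor."""
    n = len(rows)
    out = []
    for i in indices:
        if not -n <= i < n:
            raise IndexError(f"index {i} is out of range for dim 0 of size {n}")
        out.append(rows[i])  # negative indices wrap around, as in PyTorch
    return out


def logical_not(values):
    """Elementwise NOT: zero maps to True, any non-zero value maps to False."""
    return [v == 0 for v in values]


embedding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(index_tensor(embedding, [2, 0, 2]))
print(logical_not([0, 1, 2, 0]))
```

In a llama-style model, a gather of this shape typically shows up when rows are looked up by position, which is likely why the op is needed here.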

index.put works correctly when generating the first token but gives incorrect results on the second token, so it remains disabled.

Summary of changes:

  • Adds missing llama2 ops
  • Adds support for launching Metal kernels instead of MPSGraph ops (for ops that MPSGraph does not support)
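
The fallback described in the second bullet can be pictured as a two-level lookup. The sketch below is a generic illustration under stated assumptions: the registries, the op names, and `select_backend` are all hypothetical placeholders, and it does not reflect which ops this PR actually routes to Metal kernels or how the delegate is structured.

```python
# Hypothetical registries; the op names are placeholders, not the PR's actual split.
GRAPH_OPS = {"op_with_graph_support": lambda x: x + 1}
CUSTOM_KERNELS = {"op_without_graph_support": lambda x: x * 2}


def select_backend(op_name):
    """Prefer the graph op; fall back to a custom kernel only when none exists."""
    if op_name in GRAPH_OPS:
        return "graph", GRAPH_OPS[op_name]
    if op_name in CUSTOM_KERNELS:
        return "kernel", CUSTOM_KERNELS[op_name]
    raise NotImplementedError(f"no lowering registered for {op_name}")


for name in ("op_with_graph_support", "op_without_graph_support"):
    path, fn = select_backend(name)
    print(name, "->", path, fn(3))
```

Checking the graph registry first keeps the custom-kernel path limited to ops that have no graph equivalent.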

cc @cccclai , @shoumikhin


pytorch-bot bot commented Apr 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/3221

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 2b833bd with merge base 1eaed2b:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 22, 2024
@DenisVieriu97 DenisVieriu97 changed the title from "[MPS] Add index.Tensor and logical_not" to "[MPS] Add index.Tensor and aten.logical_not" Apr 22, 2024
@cccclai
Contributor

cccclai commented Apr 23, 2024

Thank you! The logical op actually comes from decomposing SDPA. If SDPA is not ready yet, maybe you can use the simpler version in #3165, which has fewer ops than decomposing F.scaled_dot_product_attention.
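
To make the link between SDPA and logical_not concrete, here is a plain-Python sketch of the causal-mask step. It is modeled on the reference implementation shown in the PyTorch documentation for F.scaled_dot_product_attention with is_causal=True, which builds a lower-triangular boolean mask, inverts it, and fills the inverted positions with negative infinity. The function name `causal_attn_bias` is hypothetical, and the exact graph the export produces may differ.

```python
import math


def causal_attn_bias(seq_len_q, seq_len_k):
    """Build the additive attention bias for a causal mask."""
    # Lower-triangular mask: query position i may attend to key positions j <= i.
    keep = [[j <= i for j in range(seq_len_k)] for i in range(seq_len_q)]
    # The logical_not step: True now marks the positions that must be hidden.
    hide = [[not k for k in row] for row in keep]
    # masked_fill-style step: hidden positions get -inf, visible ones get 0.
    return [[-math.inf if h else 0.0 for h in row] for row in hide]


for row in causal_attn_bias(3, 3):
    print(row)
```

The bias is added to the attention scores before the softmax, so the hidden positions receive zero weight.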

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@DenisVieriu97 DenisVieriu97 force-pushed the dev/denis/missing_llama_ops branch from d13bb4e to 1c9fda5 Compare April 23, 2024 01:07
@DenisVieriu97
Collaborator Author

Apple / test-demo-ios / macos-job (pull_request) is failing because it uses the prebuilt mpsdelegate static lib stored in AWS (https://ossci-ios.s3.amazonaws.com/executorch/, "mps_backend": ["sha256": "97db0fd2b458ff4dae3f4e927d417b4ce88ef3bd4114759abe8372a05bac84ad"]), which no longer matches the runtime changes made in this PR. The job runs the new AOT changes against the older runtime, and the two are no longer compatible. I've checked locally, and Apple / test-demo-ios / macos-job passes with the new runtime.
Any ideas on how to fix this? (cc @cccclai, @shoumikhin)
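
One way to confirm that a locally rebuilt library differs from the pinned prebuilt artifact is to compare SHA-256 digests. The sketch below uses only the Python standard library. The helper names `sha256_of` and `matches_pin` and the file name are hypothetical, the file contents are a stand-in for a real build output, and only the pinned digest is taken from the comment above.

```python
import hashlib
import tempfile
from pathlib import Path

# Digest quoted in the comment above for the prebuilt "mps_backend" artifact.
PINNED_SHA256 = "97db0fd2b458ff4dae3f4e927d417b4ce88ef3bd4114759abe8372a05bac84ad"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path, pinned_sha256):
    """True when the file on disk is byte-identical to the pinned artifact."""
    return sha256_of(path) == pinned_sha256.lower()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a freshly rebuilt static library.
    rebuilt = Path(tmp) / "rebuilt_mps_backend.bin"
    rebuilt.write_bytes(b"placeholder bytes for a rebuilt runtime")
    pin_is_stale = not matches_pin(rebuilt, PINNED_SHA256)

print("pin is stale:", pin_is_stale)
```

A mismatch only shows that the binaries differ. Whether the pinned artifact needs to be re-uploaded is a separate decision for the maintainers.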

@DenisVieriu97
Collaborator Author

> Thank you! The logical op is actually from decomposing sdpa - if the sdpa is not ready yet, maybe can use the simpler version #3165 which has fewer ops than the decomposing F.scaled_dot_product_attention

Thank you @cccclai . I'll take a look and create a new PR for those changes

@cccclai
Contributor

cccclai commented Apr 23, 2024

> > Thank you! The logical op is actually from decomposing sdpa - if the sdpa is not ready yet, maybe can use the simpler version #3165 which has fewer ops than the decomposing F.scaled_dot_product_attention
>
> Thank you @cccclai . I'll take a look and create a new PR for those changes

Oh sorry, I pointed to the wrong PR. It should be this one: #3037. Regarding the test failure, @shoumikhin probably knows more. I will ping him tomorrow.

@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@DenisVieriu97 DenisVieriu97 force-pushed the dev/denis/missing_llama_ops branch from 1c9fda5 to 2b833bd Compare April 23, 2024 22:54
@facebook-github-bot
Contributor

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@cccclai merged this pull request in 02a6b66.

@cccclai
Contributor

cccclai commented Apr 24, 2024

@pytorchbot cherry-pick --onto release/0.2 -c regression

pytorchbot pushed a commit that referenced this pull request Apr 24, 2024
Summary:
Add missing llama ops for MPS delegate:
- `index.Tensor`
- `logical_not`

`index.put` works correctly for generating 1 token, but gives incorrect results on 2nd token. This remains disabled.

Summary of changes:
- Adds missing llama2 ops
- Adds support for launching Metal kernels instead of MPSGraph ops (if MPSGraph doesn't have the support)

cc cccclai , shoumikhin

Pull Request resolved: #3221

Reviewed By: shoumikhin

Differential Revision: D56447710

Pulled By: cccclai

fbshipit-source-id: 778a485df5e67d1afd006b42f07b69c8a3961223
(cherry picked from commit 02a6b66)
@pytorchbot
Collaborator

Cherry picking #3221

The cherry pick PR is at #3267 and it is recommended to link a regression cherry pick PR with an issue


shoumikhin pushed a commit that referenced this pull request Apr 24, 2024
Summary:
Add missing llama ops for MPS delegate:
- `index.Tensor`
- `logical_not`

`index.put` works correctly for generating 1 token, but gives incorrect results on 2nd token. This remains disabled.

Summary of changes:
- Adds missing llama2 ops
- Adds support for launching Metal kernels instead of MPSGraph ops (if MPSGraph doesn't have the support)

cc cccclai , shoumikhin

Pull Request resolved: #3221

Reviewed By: shoumikhin

Differential Revision: D56447710

Pulled By: cccclai

fbshipit-source-id: 778a485df5e67d1afd006b42f07b69c8a3961223
(cherry picked from commit 02a6b66)

Co-authored-by: Denis Vieriu <[email protected]>
This was referenced Apr 25, 2024