Fix triton group gemm for tp4 #3762
Conversation
✅ Deploy Preview for pytorch-fbgemm-docs ready!
This pull request was exported from Phabricator. Differential Revision: D70568729
Summary:
X-link: facebookresearch/FBGEMM#843

For whatever reason, there appears to be an integer overflow in this kernel on AMD, causing it to core dump with TP4 sharding for 17bx128E. On H100, there is no such problem.

Differential Revision: D70568729
Summary:
X-link: facebookresearch/FBGEMM#843
Pull Request resolved: pytorch#3762

For whatever reason, there appears to be an integer overflow in this kernel on AMD, causing it to core dump with TP4 sharding for 17bx128E. On H100, there is no such problem.

Reviewed By: levendlee

Differential Revision: D70568729
This pull request has been merged in c146720.
Summary:
Pull Request resolved: facebookresearch/FBGEMM#843
X-link: pytorch#3762

For whatever reason, there appears to be an integer overflow in this kernel on AMD, causing it to core dump with TP4 sharding for 17bx128E. On H100, there is no such problem.

Reviewed By: levendlee

Differential Revision: D70568729

fbshipit-source-id: 7c35536d597e6801b8c1fcbceb3a0132b41b8305
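The kernel patch itself is not shown in this thread, but the failure mode it describes — 32-bit index arithmetic wrapping on large grouped-GEMM offsets — can be sketched in a few lines of NumPy. The shapes and strides below are made up for illustration; they are not the actual 17bx128E / TP4 tensor dimensions.

```python
import numpy as np

# Hypothetical grouped-GEMM dimensions. Once row * stride exceeds
# 2**31 - 1, a flat element offset computed in int32 wraps negative,
# producing an out-of-bounds pointer; widening to int64 first avoids it.
rows = np.array([70_000], dtype=np.int32)
stride = 40_000

off32 = (rows * stride)[0]                    # int32 arithmetic wraps negative
off64 = (rows.astype(np.int64) * stride)[0]   # widen first: correct offset

print(off32)  # -1494967296 (wrapped)
print(off64)  # 2800000000
```

This mirrors the usual fix in Triton kernels: casting the offset operands to a 64-bit type (e.g. with `.to(tl.int64)`) before multiplying by the stride, so the address computation cannot wrap.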