-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[NVPTX] Improve lowering of v4i8 #67866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Make it a legal type and plumb through lowering of relevant instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually trying out the patch in triton it causes some invalid ptx to get emitted (see comment)
Verified that NVPTX tests pass with ptxas being able to compiler PTX produced by llc tests.
To make things work consisstently for v4i8, we need to implement other vector ops.
✅ With the latest revision this PR passed the C/C++ code formatter. |
I still need to test it on live code, though we do not have much code that would end up using v4i8. The generated PTX checked in llvm/test/CodeGen/NVPTX/i8x4-instructions.ll could use extra scrutiny. |
I ran the patch on our triton kernels and I don't see any functional problems left. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it required quite a lot of cases to be handled :(
Thanks for doing this, it solves some of the problems triton had with latest LLVM. Changes look good to me.
I see one suspicious failure in tensorflow tests. I suspect I've messed something up in v4i8 comparison. |
Yup, there is a problem:
|
Resolved by 9821e90 |
Found another issue. We merge four independent byte loads with
|
clang-format failure on GitHub is weird -- it just silently exits with an error. The buildkite failure somewhere in RISC-V appears to be unrelated. |
I believe this may be causing failures for u/srem. See #69124 Edit: Also causing failures when you sign-extend the result of a <4 x i8> comparison. |
In [[NVPTX] Improve lowering of v4i8](cbafb6f) @Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: ([source](#67866 (comment))) > Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
In [[NVPTX] Improve lowering of v4i8](llvm@cbafb6f) @Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: ([source](llvm#67866 (comment))) > Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
In [[NVPTX] Improve lowering of v4i8](llvm@cbafb6f) @Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: ([source](llvm#67866 (comment))) > Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
Fix [failure](#110766 (comment)) identified by @akuegel. --- In [[NVPTX] Improve lowering of v4i8](cbafb6f) @Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: (#67866 (comment)) Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
Fix [failure](llvm#110766 (comment)) identified by @akuegel. --- In [[NVPTX] Improve lowering of v4i8](llvm@cbafb6f) @Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: (llvm#67866 (comment)) Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
Fix [failure](llvm#110766 (comment)) identified by @akuegel. --- In [[NVPTX] Improve lowering of v4i8](llvm@cbafb6f) @Artem-B add the ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B did this because: (llvm#67866 (comment)) Under the hood byte extraction/insertion ends up as BFI/BFE instructions, so we may as well do that in PTX, too. https://godbolt.org/z/Tb3zWbj9b However, the example that @Artem-B linked was targeting sm_52. On modern architectures, ptxas uses prmt.b32. [Example](https://godbolt.org/z/Ye4W1n84o). Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
Make it a legal type and plumb through lowering of relevant instructions.