-
Notifications
You must be signed in to change notification settings - Fork 14.3k
AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" #116115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It is not correct to lower x&(-1>>(32-y) to "bfe x, 0, y". When y equals to 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe x, 0, 32" is 0.
@llvm/pr-subscribers-backend-amdgpu Author: Changpeng Fang (changpeng) ChangesIt is not correct to lower x&(-1>>(32-y) to "bfe x, 0, y". When y equals 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe x, 0, 32" is 0. Full diff: https://github.com/llvm/llvm-project/pull/116115.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 5f4cca0645b0ef..2831d0339df6b9 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -3550,12 +3550,6 @@ def : AMDGPUPat <
(V_BFE_U32_e64 $src, (i32 0), $width)
>;
-// x & (-1 >> (bitwidth - y))
-def : AMDGPUPat <
- (DivergentBinFrag<and> i32:$src, (srl_oneuse -1, (sub 32, i32:$width))),
- (V_BFE_U32_e64 $src, (i32 0), $width)
->;
-
// SHA-256 Ma patterns
// ((x & z) | (y & (x | z))) -> BFI (XOR x, y), z, y
diff --git a/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll b/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
index 7f1f7133d69919..cc2ef0c48f1152 100644
--- a/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
+++ b/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
@@ -99,11 +99,21 @@ define i32 @bzhi32_b4_commutative(i32 %val, i32 %numlowbits) nounwind {
; ---------------------------------------------------------------------------- ;
define i32 @bzhi32_c0(i32 %val, i32 %numlowbits) nounwind {
-; GCN-LABEL: bzhi32_c0:
-; GCN: ; %bb.0:
-; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT: v_bfe_u32 v0, v0, 0, v1
-; GCN-NEXT: s_setpc_b64 s[30:31]
+; SI-LABEL: bzhi32_c0:
+; SI: ; %bb.0:
+; SI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SI-NEXT: v_sub_i32_e32 v1, vcc, 32, v1
+; SI-NEXT: v_lshr_b32_e32 v1, -1, v1
+; SI-NEXT: v_and_b32_e32 v0, v1, v0
+; SI-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: bzhi32_c0:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_sub_u32_e32 v1, vcc, 32, v1
+; VI-NEXT: v_lshrrev_b32_e64 v1, v1, -1
+; VI-NEXT: v_and_b32_e32 v0, v1, v0
+; VI-NEXT: s_setpc_b64 s[30:31]
%numhighbits = sub i32 32, %numlowbits
%mask = lshr i32 -1, %numhighbits
%masked = and i32 %mask, %val
@@ -134,11 +144,21 @@ define i32 @bzhi32_c1_indexzext(i32 %val, i8 %numlowbits) nounwind {
}
define i32 @bzhi32_c4_commutative(i32 %val, i32 %numlowbits) nounwind {
-; GCN-LABEL: bzhi32_c4_commutative:
-; GCN: ; %bb.0:
-; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT: v_bfe_u32 v0, v0, 0, v1
-; GCN-NEXT: s_setpc_b64 s[30:31]
+; SI-LABEL: bzhi32_c4_commutative:
+; SI: ; %bb.0:
+; SI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SI-NEXT: v_sub_i32_e32 v1, vcc, 32, v1
+; SI-NEXT: v_lshr_b32_e32 v1, -1, v1
+; SI-NEXT: v_and_b32_e32 v0, v0, v1
+; SI-NEXT: s_setpc_b64 s[30:31]
+;
+; VI-LABEL: bzhi32_c4_commutative:
+; VI: ; %bb.0:
+; VI-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT: v_sub_u32_e32 v1, vcc, 32, v1
+; VI-NEXT: v_lshrrev_b32_e64 v1, v1, -1
+; VI-NEXT: v_and_b32_e32 v0, v0, v1
+; VI-NEXT: s_setpc_b64 s[30:31]
%numhighbits = sub i32 32, %numlowbits
%mask = lshr i32 -1, %numhighbits
%masked = and i32 %val, %mask ; swapped order
|
It is fine to match the pattern if we know y has at most five active bits (< 32).
Mismatched parens in both summary and description: x&(-1>>(32-y) |
|
Hi, I like your patch but I have a couple of questions:
|
This says that it only looks at bits 4:0 of S2, so 32 would be treated the same as 0. |
Right! Thanks. |
[AMD Official Use Only - AMD Internal Distribution Only]
Right. So bfe x, 0, 32 is the same as bfe x, 0, 0 and the result is 0. But x & (-1 >> 0) equals x.
Get Outlook for iOS<https://aka.ms/o0ukef>
…________________________________
From: Jay Foad ***@***.***>
Sent: Thursday, November 14, 2024 2:58:44 AM
To: llvm/llvm-project ***@***.***>
Cc: Fang, Changpeng ***@***.***>; Author ***@***.***>
Subject: Re: [llvm/llvm-project] AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" (PR #116115)
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
The description in the RDNA3 ISA says D0.u = ((S0.u >> S1.u[4 : 0].u) & 32'U((1 << S2.u[4 : 0].u) - 1))
This says that it only looks at bits 4:0 of S2, so 32 would be treated the same as 0.
—
Reply to this email directly, view it on GitHub<#116115 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABALCLIIE2I7JBLDOKAH7PT2AR66JAVCNFSM6AAAAABRXR3SD2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINZWGAZTQNZQGY>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
It is not correct to lower "x&(-1>>(32-y))" to "bfe x, 0, y". When y equals 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe x, 0, 32" is 0. However, if we know y is at most of 5 bits (< 32), we can still do the pattern matching.