AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" #116115

changpeng · 2024-11-13T22:38:38Z

It is not correct to lower "x&(-1>>(32-y))" to "bfe x, 0, y". When y equals 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe x, 0, 32" is 0. However, if we know y is at most of 5 bits (< 32), we can still do the pattern matching.

It is not correct to lower x&(-1>>(32-y) to "bfe x, 0, y". When y equals to 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe x, 0, 32" is 0.

llvmbot · 2024-11-13T22:39:12Z

@llvm/pr-subscribers-backend-amdgpu

Author: Changpeng Fang (changpeng)

Changes

It is not correct to lower x&(-1>>(32-y) to "bfe x, 0, y". When y equals 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe x, 0, 32" is 0.

Full diff: https://github.com/llvm/llvm-project/pull/116115.diff

2 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInstructions.td (-6)
(modified) llvm/test/CodeGen/AMDGPU/extract-lowbits.ll (+30-10)

diff --git a/llvm/lib/Target/AMDGPU/SIInstructions.td b/llvm/lib/Target/AMDGPU/SIInstructions.td
index 5f4cca0645b0ef..2831d0339df6b9 100644
--- a/llvm/lib/Target/AMDGPU/SIInstructions.td
+++ b/llvm/lib/Target/AMDGPU/SIInstructions.td
@@ -3550,12 +3550,6 @@ def : AMDGPUPat <
   (V_BFE_U32_e64 $src, (i32 0), $width)
 >;
 
-// x & (-1 >> (bitwidth - y))
-def : AMDGPUPat <
-  (DivergentBinFrag<and> i32:$src, (srl_oneuse -1, (sub 32, i32:$width))),
-  (V_BFE_U32_e64 $src, (i32 0), $width)
->;
-
 // SHA-256 Ma patterns
 
 // ((x & z) | (y & (x | z))) -> BFI (XOR x, y), z, y
diff --git a/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll b/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
index 7f1f7133d69919..cc2ef0c48f1152 100644
--- a/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
+++ b/llvm/test/CodeGen/AMDGPU/extract-lowbits.ll
@@ -99,11 +99,21 @@ define i32 @bzhi32_b4_commutative(i32 %val, i32 %numlowbits) nounwind {
 ; ---------------------------------------------------------------------------- ;
 
 define i32 @bzhi32_c0(i32 %val, i32 %numlowbits) nounwind {
-; GCN-LABEL: bzhi32_c0:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_bfe_u32 v0, v0, 0, v1
-; GCN-NEXT:    s_setpc_b64 s[30:31]
+; SI-LABEL: bzhi32_c0:
+; SI:       ; %bb.0:
+; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SI-NEXT:    v_sub_i32_e32 v1, vcc, 32, v1
+; SI-NEXT:    v_lshr_b32_e32 v1, -1, v1
+; SI-NEXT:    v_and_b32_e32 v0, v1, v0
+; SI-NEXT:    s_setpc_b64 s[30:31]
+;
+; VI-LABEL: bzhi32_c0:
+; VI:       ; %bb.0:
+; VI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT:    v_sub_u32_e32 v1, vcc, 32, v1
+; VI-NEXT:    v_lshrrev_b32_e64 v1, v1, -1
+; VI-NEXT:    v_and_b32_e32 v0, v1, v0
+; VI-NEXT:    s_setpc_b64 s[30:31]
   %numhighbits = sub i32 32, %numlowbits
   %mask = lshr i32 -1, %numhighbits
   %masked = and i32 %mask, %val
@@ -134,11 +144,21 @@ define i32 @bzhi32_c1_indexzext(i32 %val, i8 %numlowbits) nounwind {
 }
 
 define i32 @bzhi32_c4_commutative(i32 %val, i32 %numlowbits) nounwind {
-; GCN-LABEL: bzhi32_c4_commutative:
-; GCN:       ; %bb.0:
-; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GCN-NEXT:    v_bfe_u32 v0, v0, 0, v1
-; GCN-NEXT:    s_setpc_b64 s[30:31]
+; SI-LABEL: bzhi32_c4_commutative:
+; SI:       ; %bb.0:
+; SI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; SI-NEXT:    v_sub_i32_e32 v1, vcc, 32, v1
+; SI-NEXT:    v_lshr_b32_e32 v1, -1, v1
+; SI-NEXT:    v_and_b32_e32 v0, v0, v1
+; SI-NEXT:    s_setpc_b64 s[30:31]
+;
+; VI-LABEL: bzhi32_c4_commutative:
+; VI:       ; %bb.0:
+; VI-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; VI-NEXT:    v_sub_u32_e32 v1, vcc, 32, v1
+; VI-NEXT:    v_lshrrev_b32_e64 v1, v1, -1
+; VI-NEXT:    v_and_b32_e32 v0, v0, v1
+; VI-NEXT:    s_setpc_b64 s[30:31]
   %numhighbits = sub i32 32, %numlowbits
   %mask = lshr i32 -1, %numhighbits
   %masked = and i32 %val, %mask ; swapped order

llvm/test/CodeGen/AMDGPU/extract-lowbits.ll

It is fine to match the pattern if we know y has at most five active bits (< 32).

jayfoad · 2024-11-14T07:56:36Z

Mismatched parens in both summary and description: x&(-1>>(32-y)

changpeng · 2024-11-14T08:46:36Z

Mismatched parens in both summary and description: x&(-1>>(32-y)
Thanks!

rovka · 2024-11-14T10:53:21Z

Hi, I like your patch but I have a couple of questions:

Where does it say that v_bfe_u32 x, 0, 32 is 0? The description in the RDNA3 ISA says D0.u = ((S0.u >> S1.u[4 : 0].u) & 32'U((1 << S2.u[4 : 0].u) - 1)), which is unfortunately not very specific about edge cases. I would interpret it as large values cause the 1 to be shifted out, then when we subtract 1 we get 0xFFF...F, so when we do & with it we essentially get x. If the hardware actually returns 0, we should probably talk to folks to make that clear in the ISA.
While you're at it, what do v_bfe_i32 and s_bfe_* do and do we have proper tests for them?

jayfoad · 2024-11-14T10:58:21Z

The description in the RDNA3 ISA says D0.u = ((S0.u >> S1.u[4 : 0].u) & 32'U((1 << S2.u[4 : 0].u) - 1))

This says that it only looks at bits 4:0 of S2, so 32 would be treated the same as 0.

rovka · 2024-11-14T12:04:58Z

The description in the RDNA3 ISA says D0.u = ((S0.u >> S1.u[4 : 0].u) & 32'U((1 << S2.u[4 : 0].u) - 1))

This says that it only looks at bits 4:0 of S2, so 32 would be treated the same as 0.

Right! Thanks.

changpeng · 2024-11-14T15:36:36Z

[AMD Official Use Only - AMD Internal Distribution Only] Right. So bfe x, 0, 32 is the same as bfe x, 0, 0 and the result is 0. But x & (-1 >> 0) equals x. Get Outlook for iOS<https://aka.ms/o0ukef>

…

________________________________ From: Jay Foad ***@***.***> Sent: Thursday, November 14, 2024 2:58:44 AM To: llvm/llvm-project ***@***.***> Cc: Fang, Changpeng ***@***.***>; Author ***@***.***> Subject: Re: [llvm/llvm-project] AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" (PR #116115) Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. The description in the RDNA3 ISA says D0.u = ((S0.u >> S1.u[4 : 0].u) & 32'U((1 << S2.u[4 : 0].u) - 1)) This says that it only looks at bits 4:0 of S2, so 32 would be treated the same as 0. — Reply to this email directly, view it on GitHub<#116115 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABALCLIIE2I7JBLDOKAH7PT2AR66JAVCNFSM6AAAAABRXR3SD2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINZWGAZTQNZQGY>. You are receiving this because you authored the thread.Message ID: ***@***.***>

AMDGPU: Disable pattern matching x&(-1>>(32-y) to "bfe x, 0, y"

24a19f1

It is not correct to lower x&(-1>>(32-y) to "bfe x, 0, y". When y equals to 32, "-1" is not shifted, so x&(-1>>(32-32) is still x, but "bfe x, 0, 32" is 0.

llvmbot added the backend:AMDGPU label Nov 13, 2024

changpeng requested a review from arsenm November 13, 2024 22:38

arsenm reviewed Nov 13, 2024

View reviewed changes

llvm/test/CodeGen/AMDGPU/extract-lowbits.ll Show resolved Hide resolved

AMDGPU: Allow clamp for pattern x&(-1>>(32-y) to "bfe x, 0, y"

fa21d88

It is fine to match the pattern if we know y has at most five active bits (< 32).

changpeng changed the title ~~AMDGPU: Disable pattern matching from x&(-1>>(32-y) to "bfe x, 0, y"~~ AMDGPU: Update pattern matching from x&(-1>>(32-y) to "bfe x, 0, y" Nov 14, 2024

changpeng changed the title ~~AMDGPU: Update pattern matching from x&(-1>>(32-y) to "bfe x, 0, y"~~ AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" Nov 14, 2024

arsenm approved these changes Nov 14, 2024

View reviewed changes

changpeng merged commit e3e7c75 into llvm:main Nov 14, 2024
6 of 8 checks passed

changpeng deleted the bfe branch November 14, 2024 20:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" #116115

AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" #116115

Uh oh!

changpeng commented Nov 13, 2024 •

edited

Loading

Uh oh!

llvmbot commented Nov 13, 2024

Uh oh!

Uh oh!

jayfoad commented Nov 14, 2024

Uh oh!

changpeng commented Nov 14, 2024

Uh oh!

rovka commented Nov 14, 2024

Uh oh!

jayfoad commented Nov 14, 2024

Uh oh!

rovka commented Nov 14, 2024

Uh oh!

changpeng commented Nov 14, 2024 via email

Uh oh!

Uh oh!

Uh oh!

AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" #116115

AMDGPU: Update pattern matching from "x&(-1>>(32-y))" to "bfe x, 0, y" #116115

Uh oh!

Conversation

changpeng commented Nov 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Nov 13, 2024

Uh oh!

Uh oh!

jayfoad commented Nov 14, 2024

Uh oh!

changpeng commented Nov 14, 2024

Uh oh!

rovka commented Nov 14, 2024

Uh oh!

jayfoad commented Nov 14, 2024

Uh oh!

rovka commented Nov 14, 2024

Uh oh!

changpeng commented Nov 14, 2024 via email

Uh oh!

Uh oh!

Uh oh!

changpeng commented Nov 13, 2024 •

edited

Loading