[AArch64] All bits of an exact right shift are demanded #97448

momchil-velikov · 2024-07-02T17:29:16Z

When building a vector which contains zero elements, the AArch64 ISel replaces those elements with undef, if they are right shifted out.

However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.

Should allow #92528 to be recommitted.

When building a vector which contains zero elements, the AArch64 ISel replaces those elements with `undef`, if they are right shifted out. However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior. Change-Id: I8d7eb1964ebbe88a90568be7805a29a72edd89e1

llvmbot · 2024-07-02T17:29:49Z

@llvm/pr-subscribers-backend-aarch64

Author: Momchil Velikov (momchil-velikov)

Changes

When building a vector which contains zero elements, the AArch64 ISel replaces those elements with undef, if they are right shifted out.

However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.

Should allow #92528 to be recommitted.

Full diff: https://github.com/llvm/llvm-project/pull/97448.diff

2 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+4)
(added) llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll (+20)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e0c3cc5eddb82..c7b2a21e6ed58 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -22142,6 +22142,10 @@ static SDValue performVectorShiftCombine(SDNode *N,
     if (DCI.DAG.ComputeNumSignBits(Op.getOperand(0)) > ShiftImm)
       return Op.getOperand(0);
 
+  // If the right shift is exact, the shifted out bits matter.
+  if (N->getOpcode() == AArch64ISD::VASHR && N->getFlags().hasExact())
+    return SDValue();
+
   APInt ShiftedOutBits = APInt::getLowBitsSet(OpScalarSize, ShiftImm);
   APInt DemandedMask = ~ShiftedOutBits;
 
diff --git a/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll b/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll
new file mode 100644
index 0000000000000..9f877dffe1ab5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+target triple = "aarch64-linux"
+
+define <2 x float> @f(i8  %0, i8  %1) {
+; CHECK-LABEL: f:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    mov v0.b[3], w0
+; CHECK-NEXT:    mov v0.b[7], w1
+; CHECK-NEXT:    scvtf v0.2s, v0.2s, #24
+; CHECK-NEXT:    ret
+  %3 = insertelement <2 x i8> poison, i8 %0, i64 0
+  %4 = insertelement <2 x i8> %3, i8 %1, i64 1
+  %5 = shufflevector <2 x i8> %4, <2 x i8> <i8 0, i8 poison>, <8 x i32> <i32 2, i32 2, i32 2, i32 0, i32 2, i32 2, i32 2, i32 1>
+  %6 = bitcast <8 x i8> %5 to <2 x i32>
+  %7 = ashr exact <2 x i32> %6, <i32 24, i32 24>
+  %8 = sitofp <2 x i32> %7 to <2 x float>
+  ret <2 x float> %8
+}

davemgreen

I see.. because we need the bits to be 0's. This sounds good, but should we also apply this to VLSHR too? The code for SRA looks like this (but we are not using the input DemandBits here), and SRL has the hasExact part.

      // If the shift is exact, then it does demand the low bits (and knows that
      // they are zero).
      if (Op->getFlags().hasExact())
        InDemandedMask.setLowBits(ShAmt);

      // If any of the demanded bits are produced by the sign extension, we also
      // demand the input sign bit.
      if (DemandedBits.countl_zero() < ShAmt)
        InDemandedMask.setSignBit();

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Change-Id: I5ac39bf06434cb4029ec8482bd54b2f756bb6af8

davemgreen

Thanks. LGTM

When building a vector which contains zero elements, the AArch64 ISel replaces those elements with `undef`, if they are right shifted out. However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior. Should allow llvm#92528 to be recommitted.

This re-commits d1a4f0c after a issue was fixed in f92bfca ("[AArch64] All bits of an exact right shift are demanded (#97448)").

scweng · 2024-07-02T17:57:58Z

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ -22142,6 +22142,10 @@ static SDValue performVectorShiftCombine(SDNode *N,
    if (DCI.DAG.ComputeNumSignBits(Op.getOperand(0)) > ShiftImm)
      return Op.getOperand(0);

+  // If the right shift is exact, the shifted out bits matter.


nit: perhaps elaborate a bit on why the shifted out bits matter. E.g. "because we may fold the right shift with other instructions and thus require the lower bits to be set."

Now that gets me thinking. Wouldn't this condition than make code that does not fold the right shift with other instructions less efficient? Shouldn't the onus be on wherever folds right shift with sitofp (scvtf) to make sure to clear the lower bits?

(Somehow didn't send this out before -- posting for posterity.)

I wrote that in the commit message

However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.

momchil-velikov requested review from scweng and davemgreen July 2, 2024 17:29

llvmbot added the backend:AArch64 label Jul 2, 2024

momchil-velikov requested review from Lukacma and CarolineConcatto July 2, 2024 17:29

davemgreen reviewed Jul 2, 2024

View reviewed changes

CarolineConcatto reviewed Jul 3, 2024

View reviewed changes

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Show resolved Hide resolved

[fixup] Do the same for logical shift right too

647c4c1

Change-Id: I5ac39bf06434cb4029ec8482bd54b2f756bb6af8

davemgreen approved these changes Jul 3, 2024

View reviewed changes

momchil-velikov merged commit f92bfca into llvm:main Jul 5, 2024
7 checks passed

momchil-velikov mentioned this pull request Jul 5, 2024

[AArch64] Lower extending sitofp using tbl #92528

Merged

momchil-velikov added a commit that referenced this pull request Jul 8, 2024

Reapply "[AArch64] Lower extending sitofp using tbl (#92528)"

a497e98

This re-commits d1a4f0c after a issue was fixed in f92bfca ("[AArch64] All bits of an exact right shift are demanded (#97448)").

scweng reviewed Jul 8, 2024

View reviewed changes

momchil-velikov deleted the ashr-exact-demanded-bits branch November 13, 2024 09:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64] All bits of an exact right shift are demanded #97448

[AArch64] All bits of an exact right shift are demanded #97448

momchil-velikov commented Jul 2, 2024

llvmbot commented Jul 2, 2024

davemgreen left a comment

davemgreen left a comment

scweng Jul 2, 2024

scweng Jul 8, 2024

momchil-velikov Jul 8, 2024

[AArch64] All bits of an exact right shift are demanded #97448

[AArch64] All bits of an exact right shift are demanded #97448

Conversation

momchil-velikov commented Jul 2, 2024

llvmbot commented Jul 2, 2024

davemgreen left a comment

Choose a reason for hiding this comment

davemgreen left a comment

Choose a reason for hiding this comment

scweng Jul 2, 2024

Choose a reason for hiding this comment

scweng Jul 8, 2024

Choose a reason for hiding this comment

momchil-velikov Jul 8, 2024

Choose a reason for hiding this comment