Skip to content

[AArch64] All bits of an exact right shift are demanded #97448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 5, 2024

Conversation

momchil-velikov
Copy link
Collaborator

When building a vector which contains zero elements, the AArch64 ISel replaces those elements with undef, if they are right shifted out.

However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.

Should allow #92528 to be recommitted.

When building a vector which contains zero elements, the AArch64 ISel
replaces those elements with `undef`, if they are right shifted out.

However, these elements need to stay zero if the right shift is exact,
or otherwise we will be introducing undefined behavior.

Change-Id: I8d7eb1964ebbe88a90568be7805a29a72edd89e1
@llvmbot
Copy link
Member

llvmbot commented Jul 2, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Momchil Velikov (momchil-velikov)

Changes

When building a vector which contains zero elements, the AArch64 ISel replaces those elements with undef, if they are right shifted out.

However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.

Should allow #92528 to be recommitted.


Full diff: https://github.com/llvm/llvm-project/pull/97448.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+4)
  • (added) llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll (+20)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e0c3cc5eddb82..c7b2a21e6ed58 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -22142,6 +22142,10 @@ static SDValue performVectorShiftCombine(SDNode *N,
     if (DCI.DAG.ComputeNumSignBits(Op.getOperand(0)) > ShiftImm)
       return Op.getOperand(0);
 
+  // If the right shift is exact, the shifted out bits matter.
+  if (N->getOpcode() == AArch64ISD::VASHR && N->getFlags().hasExact())
+    return SDValue();
+
   APInt ShiftedOutBits = APInt::getLowBitsSet(OpScalarSize, ShiftImm);
   APInt DemandedMask = ~ShiftedOutBits;
 
diff --git a/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll b/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll
new file mode 100644
index 0000000000000..9f877dffe1ab5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+target triple = "aarch64-linux"
+
+define <2 x float> @f(i8  %0, i8  %1) {
+; CHECK-LABEL: f:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    movi v0.2d, #0000000000000000
+; CHECK-NEXT:    mov v0.b[3], w0
+; CHECK-NEXT:    mov v0.b[7], w1
+; CHECK-NEXT:    scvtf v0.2s, v0.2s, #24
+; CHECK-NEXT:    ret
+  %3 = insertelement <2 x i8> poison, i8 %0, i64 0
+  %4 = insertelement <2 x i8> %3, i8 %1, i64 1
+  %5 = shufflevector <2 x i8> %4, <2 x i8> <i8 0, i8 poison>, <8 x i32> <i32 2, i32 2, i32 2, i32 0, i32 2, i32 2, i32 2, i32 1>
+  %6 = bitcast <8 x i8> %5 to <2 x i32>
+  %7 = ashr exact <2 x i32> %6, <i32 24, i32 24>
+  %8 = sitofp <2 x i32> %7 to <2 x float>
+  ret <2 x float> %8
+}

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.. because we need the bits to be 0's. This sounds good, but should we also apply this to VLSHR too? The code for SRA looks like this (but we are not using the input DemandBits here), and SRL has the hasExact part.

      // If the shift is exact, then it does demand the low bits (and knows that
      // they are zero).
      if (Op->getFlags().hasExact())
        InDemandedMask.setLowBits(ShAmt);

      // If any of the demanded bits are produced by the sign extension, we also
      // demand the input sign bit.
      if (DemandedBits.countl_zero() < ShAmt)
        InDemandedMask.setSignBit();

Change-Id: I5ac39bf06434cb4029ec8482bd54b2f756bb6af8
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. LGTM

@momchil-velikov momchil-velikov merged commit f92bfca into llvm:main Jul 5, 2024
7 checks passed
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
When building a vector which contains zero elements, the AArch64 ISel
replaces those elements with `undef`, if they are right shifted out.

However, these elements need to stay zero if the right shift is exact,
or otherwise we will be introducing undefined behavior.

Should allow llvm#92528 to be
recommitted.
momchil-velikov added a commit that referenced this pull request Jul 8, 2024
This re-commits d1a4f0c after
a issue was fixed in f92bfca
("[AArch64] All bits of an exact right shift are demanded (#97448)").
@@ -22142,6 +22142,10 @@ static SDValue performVectorShiftCombine(SDNode *N,
if (DCI.DAG.ComputeNumSignBits(Op.getOperand(0)) > ShiftImm)
return Op.getOperand(0);

// If the right shift is exact, the shifted out bits matter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps elaborate a bit on why the shifted out bits matter. E.g. "because we may fold the right shift with other instructions and thus require the lower bits to be set."

Now that gets me thinking. Wouldn't this condition than make code that does not fold the right shift with other instructions less efficient? Shouldn't the onus be on wherever folds right shift with sitofp (scvtf) to make sure to clear the lower bits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Somehow didn't send this out before -- posting for posterity.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote that in the commit message

However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.

@momchil-velikov momchil-velikov deleted the ashr-exact-demanded-bits branch November 13, 2024 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants