-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[AArch64] All bits of an exact right shift are demanded #97448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AArch64] All bits of an exact right shift are demanded #97448
Conversation
When building a vector which contains zero elements, the AArch64 ISel replaces those elements with `undef`, if they are right shifted out. However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior. Change-Id: I8d7eb1964ebbe88a90568be7805a29a72edd89e1
@llvm/pr-subscribers-backend-aarch64 Author: Momchil Velikov (momchil-velikov) ChangesWhen building a vector which contains zero elements, the AArch64 ISel replaces those elements with However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior. Should allow #92528 to be recommitted. Full diff: https://github.com/llvm/llvm-project/pull/97448.diff 2 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index e0c3cc5eddb82..c7b2a21e6ed58 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -22142,6 +22142,10 @@ static SDValue performVectorShiftCombine(SDNode *N,
if (DCI.DAG.ComputeNumSignBits(Op.getOperand(0)) > ShiftImm)
return Op.getOperand(0);
+ // If the right shift is exact, the shifted out bits matter.
+ if (N->getOpcode() == AArch64ISD::VASHR && N->getFlags().hasExact())
+ return SDValue();
+
APInt ShiftedOutBits = APInt::getLowBitsSet(OpScalarSize, ShiftImm);
APInt DemandedMask = ~ShiftedOutBits;
diff --git a/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll b/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll
new file mode 100644
index 0000000000000..9f877dffe1ab5
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/ashr-exact-demanded-bits.ll
@@ -0,0 +1,20 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s | FileCheck %s
+target triple = "aarch64-linux"
+
+define <2 x float> @f(i8 %0, i8 %1) {
+; CHECK-LABEL: f:
+; CHECK: // %bb.0:
+; CHECK-NEXT: movi v0.2d, #0000000000000000
+; CHECK-NEXT: mov v0.b[3], w0
+; CHECK-NEXT: mov v0.b[7], w1
+; CHECK-NEXT: scvtf v0.2s, v0.2s, #24
+; CHECK-NEXT: ret
+ %3 = insertelement <2 x i8> poison, i8 %0, i64 0
+ %4 = insertelement <2 x i8> %3, i8 %1, i64 1
+ %5 = shufflevector <2 x i8> %4, <2 x i8> <i8 0, i8 poison>, <8 x i32> <i32 2, i32 2, i32 2, i32 0, i32 2, i32 2, i32 2, i32 1>
+ %6 = bitcast <8 x i8> %5 to <2 x i32>
+ %7 = ashr exact <2 x i32> %6, <i32 24, i32 24>
+ %8 = sitofp <2 x i32> %7 to <2 x float>
+ ret <2 x float> %8
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.. because we need the bits to be 0's. This sounds good, but should we also apply this to VLSHR too? The code for SRA looks like this (but we are not using the input DemandBits here), and SRL has the hasExact part.
// If the shift is exact, then it does demand the low bits (and knows that
// they are zero).
if (Op->getFlags().hasExact())
InDemandedMask.setLowBits(ShAmt);
// If any of the demanded bits are produced by the sign extension, we also
// demand the input sign bit.
if (DemandedBits.countl_zero() < ShAmt)
InDemandedMask.setSignBit();
Change-Id: I5ac39bf06434cb4029ec8482bd54b2f756bb6af8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM
When building a vector which contains zero elements, the AArch64 ISel replaces those elements with `undef`, if they are right shifted out. However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior. Should allow llvm#92528 to be recommitted.
@@ -22142,6 +22142,10 @@ static SDValue performVectorShiftCombine(SDNode *N, | |||
if (DCI.DAG.ComputeNumSignBits(Op.getOperand(0)) > ShiftImm) | |||
return Op.getOperand(0); | |||
|
|||
// If the right shift is exact, the shifted out bits matter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps elaborate a bit on why the shifted out bits matter. E.g. "because we may fold the right shift with other instructions and thus require the lower bits to be set."
Now that gets me thinking. Wouldn't this condition than make code that does not fold the right shift with other instructions less efficient? Shouldn't the onus be on wherever folds right shift with sitofp (scvtf) to make sure to clear the lower bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Somehow didn't send this out before -- posting for posterity.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote that in the commit message
However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.
When building a vector which contains zero elements, the AArch64 ISel replaces those elements with
undef
, if they are right shifted out.However, these elements need to stay zero if the right shift is exact, or otherwise we will be introducing undefined behavior.
Should allow #92528 to be recommitted.