-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[InstCombine] Replace an integer comparison of a phi
node with multiple ucmp
/scmp
operands and a constant with phi
of individual comparisons of original intrinsic's arguments
#107769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Volodymyr Vasylkun (Poseydon42) ChangesWhen we have a This should help with one of the regressions that's blocking #107314 There's a high chance that other similar patterns can be identified, in which case they can be easily handled by the same code by moving the check for "simplifiable" instructions into a lambda. Full diff: https://github.com/llvm/llvm-project/pull/107769.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
index 8195e0539305cc..4c27359752002c 100644
--- a/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
@@ -1809,6 +1809,7 @@ Instruction *InstCombinerImpl::foldOpIntoPhi(Instruction &I, PHINode *PN) {
// Check to see whether the instruction can be folded into each phi operand.
// If there is one operand that does not fold, remember the BB it is in.
SmallVector<Value *> NewPhiValues;
+ SmallVector<unsigned int> OpsToMoveUseTo;
BasicBlock *NonSimplifiedBB = nullptr;
Value *NonSimplifiedInVal = nullptr;
for (unsigned i = 0; i != NumPHIValues; ++i) {
@@ -1820,6 +1821,16 @@ Instruction *InstCombinerImpl::foldOpIntoPhi(Instruction &I, PHINode *PN) {
continue;
}
+ // If the only use of phi is comparing it with a constant then we can
+ // put this comparison in the incoming BB directly after a ucmp/scmp call
+ // because we know that it will simplify to a single icmp.
+ if (isa<CmpIntrinsic>(InVal) &&
+ match(&I, m_c_ICmp(m_Specific(PN), m_Constant()))) {
+ OpsToMoveUseTo.push_back(i);
+ NewPhiValues.push_back(nullptr);
+ continue;
+ }
+
if (NonSimplifiedBB) return nullptr; // More than one non-simplified value.
NonSimplifiedBB = InBB;
@@ -1851,6 +1862,22 @@ Instruction *InstCombinerImpl::foldOpIntoPhi(Instruction &I, PHINode *PN) {
return nullptr;
}
+ // Clone the instruction that uses the phi node and move it into the incoming
+ // BB because we know that the next iteration of InstCombine will simplify it.
+ for (auto OpIndex : OpsToMoveUseTo) {
+ Instruction *Clone = I.clone();
+ Value *Op = PN->getIncomingValue(OpIndex);
+ BasicBlock *OpBB = PN->getIncomingBlock(OpIndex);
+ for (Use &U : Clone->operands()) {
+ if (U == PN)
+ U = Op;
+ else
+ U = U->DoPHITranslation(PN->getParent(), OpBB);
+ }
+ Clone = InsertNewInstBefore(Clone, OpBB->getTerminator()->getIterator());
+ NewPhiValues[OpIndex] = Clone;
+ }
+
// Okay, we can do the transformation: create the new PHI node.
PHINode *NewPN = PHINode::Create(I.getType(), PN->getNumIncomingValues());
InsertNewInstBefore(NewPN, PN->getIterator());
diff --git a/llvm/test/Transforms/InstCombine/phi-with-multiple-unsimplifiable-values.ll b/llvm/test/Transforms/InstCombine/phi-with-multiple-unsimplifiable-values.ll
new file mode 100644
index 00000000000000..b91d8fc6e05705
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/phi-with-multiple-unsimplifiable-values.ll
@@ -0,0 +1,32 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instcombine -S | FileCheck %s
+
+define i1 @icmp_of_phi_of_scmp_with_constant(i1 %c, i16 %x, i16 %y)
+; CHECK-LABEL: define i1 @icmp_of_phi_of_scmp_with_constant(
+; CHECK-SAME: i1 [[C:%.*]], i16 [[X:%.*]], i16 [[Y:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: br i1 [[C]], label %[[TRUE:.*]], label %[[FALSE:.*]]
+; CHECK: [[TRUE]]:
+; CHECK-NEXT: [[TMP0:%.*]] = icmp slt i16 [[X]], [[Y]]
+; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK: [[FALSE]]:
+; CHECK-NEXT: [[TMP1:%.*]] = icmp slt i16 [[Y]], [[X]]
+; CHECK-NEXT: br label %[[EXIT]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: [[R:%.*]] = phi i1 [ [[TMP0]], %[[TRUE]] ], [ [[TMP1]], %[[FALSE]] ]
+; CHECK-NEXT: ret i1 [[R]]
+;
+{
+entry:
+ br i1 %c, label %true, label %false
+true:
+ %cmp1 = call i8 @llvm.scmp(i16 %x, i16 %y)
+ br label %exit
+false:
+ %cmp2 = call i8 @llvm.scmp(i16 %y, i16 %x)
+ br label %exit
+exit:
+ %phi = phi i8 [%cmp1, %true], [%cmp2, %false]
+ %r = icmp slt i8 %phi, 0
+ ret i1 %r
+}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add negative tests where one scmp is not one-use or the icmp operand is not constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case you're proposing would still actually enable this optimization since we can still move the icmp
into every incoming BB, and only one of the scmp
s will not be removed due to it being used more than once (in this case the BB with scmp
becomes what was previously NonSimplifiedBB
). I think it's better to add a negative test where both scmp
s are not one-use, since in that case the optimization wouldn't be trivially profitable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
Please wait for additional approval from other reviewers :)
exit: | ||
%phi = phi i8 [%cmp1, %true], [%cmp2, %false] | ||
%r = icmp slt i8 %phi, 0 | ||
ret i1 %r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any tests where only 1 of the incoming values is an {s,u}cmp
intrin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Worth noting that as implemented, the one-use check is also load-bearing for correctness, not just profitability. Otherwise we'd have to make sure we handle multi-edges correctly.
I don't think I quite understand the problem, can you give an example of a test case where removing the one-use check would cause issues? |
Something like this: https://llvm.godbolt.org/z/zEfdK4b3W The tricky part is that a predecessors occurs two times in the phi, and the value must be the same for both. So if we clone the instruction two times, we'd get a verifier error. To support this case we would need to make sure that we only clone once per duplicate predecessor. With the one-use limitation, this is a problem we can ignore for now. |
Thank you. I'll add a comment mentioning that so there's no confusion in the future. Apart from that the patch is OK to merge, right? |
Yes, the patch looks good to me! |
…y allowing the phi node to use the result of `ucmp`/`scmp` more than once (#109593) This extends the optimisation implemented in #107769 by relaxing the condtions to make it happen. Now, the value produced by `ucmp`/`scmp` doesn't need to be one-use, but only one-user, meaning it can be present in a single phi node more than once.
When we have a
phi
instruction with more than one of its incoming values being a call toucmp
orscmp
, which is then compared with an integer constant, we can move the comparison through thephi
into the incoming basic blocks because we know that a comparison ofucmp
/scmp
with a constant will be simplified by the next iteration of InstCombine.This should help with one of the regressions that's blocking #107314
There's a high chance that other similar patterns can be identified, in which case they can be easily handled by the same code by moving the check for "simplifiable" instructions into a lambda.