-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[SimplifyCFG] Add optimization for switches of powers of two #70977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-llvm-transforms Author: Daniil (DKay7) ChangesOptimization reduces range for switches which cases are positive powers of two by replacing each case with count_trailing_zero(case). Resolves #70756 Full diff: https://github.com/llvm/llvm-project/pull/70977.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
index 68b5b1a78a3460e..5e44015e1e93595 100644
--- a/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
+++ b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp
@@ -50,6 +50,7 @@
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Metadata.h"
@@ -6792,9 +6793,6 @@ static bool ReduceSwitchRange(SwitchInst *SI, IRBuilder<> &Builder,
// This transform can be done speculatively because it is so cheap - it
// results in a single rotate operation being inserted.
- // FIXME: It's possible that optimizing a switch on powers of two might also
- // be beneficial - flag values are often powers of two and we could use a CLZ
- // as the key function.
// countTrailingZeros(0) returns 64. As Values is guaranteed to have more than
// one element and LLVM disallows duplicate cases, Shift is guaranteed to be
@@ -6839,6 +6837,75 @@ static bool ReduceSwitchRange(SwitchInst *SI, IRBuilder<> &Builder,
return true;
}
+static bool isSwitchOfPowersOfTwo(ArrayRef<int64_t> Values) {
+ for (auto &Value : Values) {
+ if (Value <= 0 || (Value & (Value - 1)) != 0)
+ return false;
+ }
+
+ return true;
+}
+
+static bool simplifySwitchOfPowersOfTwo(SwitchInst *SI, IRBuilder<> &Builder,
+ const DataLayout &DL) {
+
+ auto *CondTy = cast<IntegerType>(SI->getCondition()->getType());
+
+ if (CondTy->getIntegerBitWidth() > 64 ||
+ !DL.fitsInLegalInteger(CondTy->getIntegerBitWidth()))
+ return false;
+
+ // Only bother with this optimization if there are more than 3 switch cases;
+ // SDAG will only bother creating jump tables for 4 or more cases.
+ if (SI->getNumCases() < 4)
+ return false;
+
+ SmallVector<int64_t, 4> Values;
+ for (const auto &Case : SI->cases())
+ Values.push_back(Case.getCaseValue()->getValue().getSExtValue());
+
+ if (!isSwitchOfPowersOfTwo(Values))
+ return false;
+
+ Builder.SetInsertPoint(SI);
+
+ auto *Condition = SI->getCondition();
+ auto &Context = SI->getContext();
+ Function *Cttz =
+ Intrinsic::getDeclaration(SI->getModule(), Intrinsic::cttz, {CondTy});
+
+ // FIXME we can also perform this optimization only for switches with
+ // unreachable default case.
+ // This assumtion will save us from checking if `Condition` is a power of two
+
+ // checking if switch condition is a power of two. If not, just set its number
+ // of trailing zeros to 0 since `case 0` is forbidden it will lead to jumping
+ // to default case
+ // condition & (condition - 1) result in zero if condition is a power of two
+ auto *Sub = Builder.CreateSub(Condition, ConstantInt::get(CondTy, 1));
+ auto *And = Builder.CreateAnd(Condition, Sub);
+ auto *Cmp = Builder.CreateICmpNE(And, ConstantInt::get(CondTy, 0));
+
+ // if condition is not a power of two, we add 1 to it to make set its number
+ // of trailing zeros to 0. and case 0 is forbidden in this optimization.
+ auto *Add = Builder.CreateOr(Condition, Builder.CreateZExt(Cmp, CondTy));
+
+ // FIXME maybe we should check if cttz intrinsic is cheap on the target
+ // architecture
+ auto *ResultTrailingZeros = Builder.CreateCall(
+ Cttz, {Add, ConstantInt::get(Type::getInt1Ty(Context), 0)});
+ SI->replaceUsesOfWith(Condition, ResultTrailingZeros);
+
+ // Replace each case with its trailing zeros number
+ for (auto &Case : SI->cases()) {
+ auto *OrigValue = Case.getCaseValue();
+ Case.setValue(cast<ConstantInt>(ConstantInt::get(
+ OrigValue->getType(), OrigValue->getValue().countr_zero())));
+ }
+
+ return true;
+}
+
bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
BasicBlock *BB = SI->getParent();
@@ -6886,6 +6953,9 @@ bool SimplifyCFGOpt::simplifySwitch(SwitchInst *SI, IRBuilder<> &Builder) {
SwitchToLookupTable(SI, Builder, DTU, DL, TTI))
return requestResimplify();
+ if (simplifySwitchOfPowersOfTwo(SI, Builder, DL))
+ return requestResimplify();
+
if (ReduceSwitchRange(SI, Builder, DL, TTI))
return requestResimplify();
|
Please add some regression tests (including negative tests, e.g., switch with non-unreachable default case). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add one positive test case first.
Failed Tests (1):
--
| LLVM :: CodeGen/AArch64/switch-unreachable-default.ll
I added all your suggestions and also wrote a test with both positive and negative cases. But what should I do with a failed test in the pipeline? Thanks. |
Possibly you could change |
Well, it doesn't help for
And proposed optimization is able to lower the entire switch to one jump table. There's one way to fix it: I can replace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this transform be part of SwitchToLookupTable()? I'm not sure it makes sense to do this independently, if we can't form a lookup table.
I decided to create separated function to allow another optimizations such as |
|
||
// Only bother with this optimization if there are more than 3 switch cases; | ||
// SDAG will only bother creating jump tables for 4 or more cases. | ||
if (SI->getNumCases() < 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that AArch64 started using a larger limit very recently (#71166). But I think they can add a TTI hook for this if it becomes a problem for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will post a patch later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation looks good, but I think some paths aren't covered by tests yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you!
Please wait for additional approval from other reviewers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as well.
Optimization reduces range for switches which cases are positive powers of two by replacing each case with count_trailing_zero(case). Also, this optimization is performed only for switches with default case unreachable Resolves #70756
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I will merge this PR after it passes the CI.
) Optimization reduces the range for switches whose cases are positive powers of two by replacing each case with count_trailing_zero(case). Resolves llvm#70756
) Optimization reduces the range for switches whose cases are positive powers of two by replacing each case with count_trailing_zero(case). Resolves llvm#70756
Optimization reduces range for switches which cases are positive powers of two by replacing each case with count_trailing_zero(case).
Resolves #70756