
[SimplifyIndVar] Push more users to worklist for simplifyUsers #93598


Open
wants to merge 8 commits into
base: main
11 changes: 5 additions & 6 deletions llvm/include/llvm/Transforms/Utils/SimplifyIndVar.h
@@ -52,12 +52,11 @@ class IVVisitor {
 /// where the first entry indicates that the function makes changes and the
 /// second entry indicates that it introduced new opportunities for loop
 /// unswitching.
-std::pair<bool, bool> simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE,
-                                        DominatorTree *DT, LoopInfo *LI,
-                                        const TargetTransformInfo *TTI,
-                                        SmallVectorImpl<WeakTrackingVH> &Dead,
-                                        SCEVExpander &Rewriter,
-                                        IVVisitor *V = nullptr);
+std::pair<bool, bool>
+simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE, DominatorTree *DT,
+                  LoopInfo *LI, const TargetTransformInfo *TTI,
+                  SmallVectorImpl<WeakTrackingVH> &Dead, SCEVExpander &Rewriter,
+                  unsigned MaxDepthOutOfLoop = 1, IVVisitor *V = nullptr);
Contributor

Do we need the MaxDepthOutOfLoop parameter? Can you move the cl::opt into SimplifyIndVar and directly use it instead?

Contributor Author

Transforms/Utils/LoopUnroll.cpp also uses SimplifyIndVar, via simplifyLoopIVs. I didn't want to add the option here, as it could cause regressions in passes I'm not targeting.

Contributor

Okay, that makes sense. But as implemented, aren't you still calling this with the default of 1 from LoopUnroll? Shouldn't the default be 0 instead?

Could you please rebase over 6e3725d so we at least don't have to pass through this parameter everywhere?

Contributor Author @v01dXYZ, Jun 26, 2024

The question you raise is actually quite important, as this code is disabled with the current default config. I disabled it because I wanted to make it optimisation-level dependent, but I needed some input to calibrate it for each level.

It is set to 1, which means the affected BBs are those with dist(Loop, BB) < 1, i.e. BB in Loop (the strict bound is on purpose). The value 0 is a special value for a +infinity bound.

I don't know if this encoding is satisfactory. Another option is to use -1, but I don't know if cl::opt<unsigned> supports it.

+1 for making pushIVUsers a method instead of a function.

I think I'll set it to 0 now, as that will expose performance regressions and let us discuss them. I'll add some tests first, though.
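
The encoding described in this comment (a strict bound, with 0 as the "no limit" sentinel) can be sketched as a small predicate. This is only an illustration and not code from the patch: `withinOutOfLoopLimit` and `checkEncoding` are hypothetical names, and the predicate simply mirrors the `!MaxDepthOutOfLoop || Counter < MaxDepthOutOfLoop` check in pushIVUsers.

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): returns true if a user whose
// def-use chain has crossed `Counter` out-of-loop blocks may still be pushed
// to the worklist. MaxDepth == 0 is the sentinel for "no limit"; any other
// value is a strict upper bound.
constexpr bool withinOutOfLoopLimit(unsigned Counter, unsigned MaxDepth) {
  return MaxDepth == 0 || Counter < MaxDepth;
}

// Illustrative checks for the two cases discussed in the comment.
inline void checkEncoding() {
  // MaxDepth == 1: only Counter == 0 (users inside the loop) passes.
  assert(withinOutOfLoopLimit(0, 1));
  assert(!withinOutOfLoopLimit(1, 1));
  // MaxDepth == 0: the traversal is unbounded.
  assert(withinOutOfLoopLimit(0, 0));
  assert(withinOutOfLoopLimit(7, 0));
}
```

Under this encoding, a value of 1 reproduces the old in-loop-only behaviour, which is why the choice of default decides whether the new traversal is active at all.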


/// SimplifyLoopIVs - Simplify users of induction variables within this
/// loop. This does not actually change or add IVs.
11 changes: 9 additions & 2 deletions llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -124,6 +124,12 @@ static cl::opt<bool>
 AllowIVWidening("indvars-widen-indvars", cl::Hidden, cl::init(true),
                 cl::desc("Allow widening of indvars to eliminate s/zext"));

+static cl::opt<unsigned> MaxDepthOutOfLoop(
+    "indvars-max-depth-out-of-loop", cl::Hidden, cl::init(0),
+    cl::desc(
+        "Strict upper bound for the number of successive out-of-loop blocks "
+        "when traversing use-def chains. 0 enables full traversal"));
+
 namespace {

 class IndVarSimplify {
@@ -624,8 +630,9 @@ bool IndVarSimplify::simplifyAndExtend(Loop *L,
       // Information about sign/zero extensions of CurrIV.
       IndVarSimplifyVisitor Visitor(CurrIV, SE, TTI, DT);

-      const auto &[C, U] = simplifyUsersOfIV(CurrIV, SE, DT, LI, TTI, DeadInsts,
-                                             Rewriter, &Visitor);
+      const auto &[C, U] =
+          simplifyUsersOfIV(CurrIV, SE, DT, LI, TTI, DeadInsts, Rewriter,
+                            MaxDepthOutOfLoop, &Visitor);

       Changed |= C;
       RunUnswitching |= U;
77 changes: 46 additions & 31 deletions llvm/lib/Transforms/Utils/SimplifyIndVar.cpp
@@ -62,13 +62,18 @@ namespace {
     bool Changed = false;
     bool RunUnswitching = false;

+    // When following the def-use chains, it can go outside the loop.
+    // Strict upper bound on number of traversed out-of-loop blocks.
+    unsigned MaxDepthOutOfLoop;
+
   public:
     SimplifyIndvar(Loop *Loop, ScalarEvolution *SE, DominatorTree *DT,
                    LoopInfo *LI, const TargetTransformInfo *TTI,
                    SCEVExpander &Rewriter,
-                   SmallVectorImpl<WeakTrackingVH> &Dead)
+                   SmallVectorImpl<WeakTrackingVH> &Dead,
+                   unsigned MaxDepthOutOfLoop = 1)
         : L(Loop), LI(LI), SE(SE), DT(DT), TTI(TTI), Rewriter(Rewriter),
-          DeadInsts(Dead) {
+          DeadInsts(Dead), MaxDepthOutOfLoop(MaxDepthOutOfLoop) {
       assert(LI && "IV simplification requires LoopInfo");
     }

@@ -80,10 +85,11 @@ namespace {
     /// all simplifications to users of an IV.
     void simplifyUsers(PHINode *CurrIV, IVVisitor *V = nullptr);

-    void pushIVUsers(Instruction *Def,
-                     SmallPtrSet<Instruction *, 16> &Simplified,
-                     SmallVectorImpl<std::pair<Instruction *, Instruction *>>
-                         &SimpleIVUsers);
+    void pushIVUsers(
+        Instruction *Def, SmallPtrSet<Instruction *, 16> &Simplified,
+        SmallVectorImpl<std::tuple<Instruction *, Instruction *, unsigned>>
+            &SimpleIVUsers,
+        unsigned OutOfLoopChainCounter);

     Value *foldIVUser(Instruction *UseInst, Instruction *IVOperand);

@@ -514,8 +520,8 @@ bool SimplifyIndvar::eliminateTrunc(TruncInst *TI) {
         !DT->isReachableFromEntry(cast<Instruction>(U)->getParent()))
       continue;
     ICmpInst *ICI = dyn_cast<ICmpInst>(U);
-    if (!ICI) return false;
-    assert(L->contains(ICI->getParent()) && "LCSSA form broken?");
+    if (!ICI)
+      return false;
     if (!(ICI->getOperand(0) == TI && L->isLoopInvariant(ICI->getOperand(1))) &&
         !(ICI->getOperand(1) == TI && L->isLoopInvariant(ICI->getOperand(0))))
       return false;
@@ -548,7 +554,7 @@ bool SimplifyIndvar::eliminateTrunc(TruncInst *TI) {
   };
   // Replace all comparisons against trunc with comparisons against IV.
   for (auto *ICI : ICmpUsers) {
-    bool IsSwapped = L->isLoopInvariant(ICI->getOperand(0));
+    bool IsSwapped = ICI->getOperand(0) != TI;
     auto *Op1 = IsSwapped ? ICI->getOperand(0) : ICI->getOperand(1);
     IRBuilder<> Builder(ICI);
     Value *Ext = nullptr;
@@ -846,7 +852,9 @@ bool SimplifyIndvar::strengthenRightShift(BinaryOperator *BO,
 /// Add all uses of Def to the current IV's worklist.
 void SimplifyIndvar::pushIVUsers(
     Instruction *Def, SmallPtrSet<Instruction *, 16> &Simplified,
-    SmallVectorImpl<std::pair<Instruction *, Instruction *>> &SimpleIVUsers) {
+    SmallVectorImpl<std::tuple<Instruction *, Instruction *, unsigned>>
+        &SimpleIVUsers,
+    unsigned OutOfLoopChainCounter) {
   for (User *U : Def->users()) {
     Instruction *UI = cast<Instruction>(U);

@@ -857,16 +865,22 @@ void SimplifyIndvar::pushIVUsers(
     if (UI == Def)
       continue;

-    // Only change the current Loop, do not change the other parts (e.g. other
-    // Loops).
-    if (!L->contains(UI))
+    // Avoid adding Defs that SCEV expand to themselves, e.g. the LoopPhis
+    // of the outer loops.
+    if (!DT->dominates(L->getHeader(), UI->getParent()))
       continue;

     // Do not push the same instruction more than once.
     if (!Simplified.insert(UI).second)
       continue;

-    SimpleIVUsers.push_back(std::make_pair(UI, Def));
+    unsigned Counter =
+        L->contains(UI)
+            ? 0 // reset depth if we go back inside the loop.
Contributor

Do we need to do that?

Contributor Author

No.

If we want to compute an overapproximation of the CFG distance dist(Loop, BB), this helps tighten the approximation towards the real distance.

+            : OutOfLoopChainCounter + (UI->getParent() != Def->getParent());
Contributor

Why is this a limit on the depth of the blocks rather than instructions? You can have a very long chain of instructions within a single block, or a very short one across blocks. It seems like that would be the more relevant quantity.

Contributor Author

At first I implemented an instruction-wise depth, but I changed that because I visualise the algorithm at the CFG level: it accepts optimising the basic blocks that lie within a given distance of the loop, dist(Loop, BB) < limit.

Also, I didn't like the idea of having partially optimised basic blocks.
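
The difference between the two ways of measuring depth raised in this thread can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not LLVM code: `ChainUser`, `blockWiseDepth`, `instructionWiseDepth` and `compareOnSingleBlockChain` are hypothetical names, blocks are plain integer ids, and the instruction-wise variant is an assumed formulation of the alternative the reviewer suggests. The block-wise update follows the `Counter` expression in pushIVUsers.

```cpp
#include <cassert>
#include <vector>

// One user on a def-use chain (toy model, not an LLVM type).
struct ChainUser {
  int Block;   // id of the basic block holding the user
  bool InLoop; // whether that block belongs to the loop
};

// Block-wise counter, following the patch: reset to 0 for a user inside the
// loop, otherwise add 1 only when the user sits in a different block than
// its def.
inline unsigned blockWiseDepth(const std::vector<ChainUser> &Chain,
                               int DefBlock) {
  unsigned Counter = 0;
  int PrevBlock = DefBlock;
  for (const ChainUser &U : Chain) {
    Counter = U.InLoop ? 0u : Counter + (U.Block != PrevBlock ? 1u : 0u);
    PrevBlock = U.Block;
  }
  return Counter;
}

// Instruction-wise counter (assumed formulation): every out-of-loop user
// adds 1, regardless of which block it is in.
inline unsigned instructionWiseDepth(const std::vector<ChainUser> &Chain) {
  unsigned Counter = 0;
  for (const ChainUser &U : Chain)
    Counter = U.InLoop ? 0u : Counter + 1u;
  return Counter;
}

// A chain of three users that all live in one exit block (id 1), starting
// from a def in the loop (block id 0).
inline void compareOnSingleBlockChain() {
  std::vector<ChainUser> Chain = {{1, false}, {1, false}, {1, false}};
  assert(blockWiseDepth(Chain, /*DefBlock=*/0) == 1);
  assert(instructionWiseDepth(Chain) == 3);
}
```

With a block-wise limit, a basic block is either traversed entirely or not at all, which matches the author's stated preference; an instruction-wise limit could stop partway through the single exit block in the example.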


+    if (!MaxDepthOutOfLoop || Counter < MaxDepthOutOfLoop)
+      SimpleIVUsers.push_back(std::make_tuple(UI, Def, Counter));
   }
 }

@@ -911,17 +925,17 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
   SmallPtrSet<Instruction*,16> Simplified;

   // Use-def pairs if IV users waiting to be processed for CurrIV.
-  SmallVector<std::pair<Instruction*, Instruction*>, 8> SimpleIVUsers;
+  SmallVector<std::tuple<Instruction *, Instruction *, unsigned>, 8>
+      SimpleIVUsers;

   // Push users of the current LoopPhi. In rare cases, pushIVUsers may be
   // called multiple times for the same LoopPhi. This is the proper thing to
   // do for loop header phis that use each other.
-  pushIVUsers(CurrIV, Simplified, SimpleIVUsers);
+  pushIVUsers(CurrIV, Simplified, SimpleIVUsers, 0);

   while (!SimpleIVUsers.empty()) {
-    std::pair<Instruction*, Instruction*> UseOper =
-      SimpleIVUsers.pop_back_val();
-    Instruction *UseInst = UseOper.first;
+    auto [UseInst, IVOperand, OutOfLoopChainCounter] =
+        SimpleIVUsers.pop_back_val();

     // If a user of the IndVar is trivially dead, we prefer just to mark it dead
     // rather than try to do some complex analysis or transformation (such as
@@ -945,11 +959,11 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
     if ((isa<PtrToIntInst>(UseInst)) || (isa<TruncInst>(UseInst)))
       for (Use &U : UseInst->uses()) {
         Instruction *User = cast<Instruction>(U.getUser());
-        if (replaceIVUserWithLoopInvariant(User))
+        if (DT->dominates(L->getHeader(), User->getParent()) &&
+            replaceIVUserWithLoopInvariant(User))
           break; // done replacing
       }

-    Instruction *IVOperand = UseOper.second;
     for (unsigned N = 0; IVOperand; ++N) {
       assert(N <= Simplified.size() && "runaway iteration");
       (void) N;
@@ -963,22 +977,23 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
       continue;

     if (eliminateIVUser(UseInst, IVOperand)) {
-      pushIVUsers(IVOperand, Simplified, SimpleIVUsers);
+      pushIVUsers(IVOperand, Simplified, SimpleIVUsers, OutOfLoopChainCounter);
       continue;
     }

     if (BinaryOperator *BO = dyn_cast<BinaryOperator>(UseInst)) {
       if (strengthenBinaryOp(BO, IVOperand)) {
         // re-queue uses of the now modified binary operator and fall
         // through to the checks that remain.
-        pushIVUsers(IVOperand, Simplified, SimpleIVUsers);
+        pushIVUsers(IVOperand, Simplified, SimpleIVUsers,
+                    OutOfLoopChainCounter);
       }
     }

     // Try to use integer induction for FPToSI of float induction directly.
     if (replaceFloatIVWithIntegerIV(UseInst)) {
       // Re-queue the potentially new direct uses of IVOperand.
-      pushIVUsers(IVOperand, Simplified, SimpleIVUsers);
+      pushIVUsers(IVOperand, Simplified, SimpleIVUsers, OutOfLoopChainCounter);
       continue;
     }

@@ -988,7 +1003,7 @@ void SimplifyIndvar::simplifyUsers(PHINode *CurrIV, IVVisitor *V) {
       continue;
     }
     if (isSimpleIVUser(UseInst, L, SE)) {
-      pushIVUsers(UseInst, Simplified, SimpleIVUsers);
+      pushIVUsers(UseInst, Simplified, SimpleIVUsers, OutOfLoopChainCounter);
     }
   }
 }
@@ -1002,13 +1017,13 @@ void IVVisitor::anchor() { }
 /// Returns a pair where the first entry indicates that the function makes
 /// changes and the second entry indicates that it introduced new opportunities
 /// for loop unswitching.
-std::pair<bool, bool> simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE,
-                                        DominatorTree *DT, LoopInfo *LI,
-                                        const TargetTransformInfo *TTI,
-                                        SmallVectorImpl<WeakTrackingVH> &Dead,
-                                        SCEVExpander &Rewriter, IVVisitor *V) {
+std::pair<bool, bool>
+simplifyUsersOfIV(PHINode *CurrIV, ScalarEvolution *SE, DominatorTree *DT,
+                  LoopInfo *LI, const TargetTransformInfo *TTI,
+                  SmallVectorImpl<WeakTrackingVH> &Dead, SCEVExpander &Rewriter,
+                  unsigned MaxDepthOutOfLoop, IVVisitor *V) {
   SimplifyIndvar SIV(LI->getLoopFor(CurrIV->getParent()), SE, DT, LI, TTI,
-                     Rewriter, Dead);
+                     Rewriter, Dead, MaxDepthOutOfLoop);
   SIV.simplifyUsers(CurrIV, V);
   return {SIV.hasChanged(), SIV.runUnswitching()};
 }
7 changes: 1 addition & 6 deletions llvm/test/Transforms/IndVarSimplify/X86/pr57187.ll
@@ -9,23 +9,18 @@ define void @test(i32 %start) {
 ; CHECK-LABEL: @test(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[START:%.*]], -1
-; CHECK-NEXT: [[TMP1:%.*]] = zext i32 [[START]] to i64
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: backedge:
 ; CHECK-NEXT: br label [[LOOP]]
 ; CHECK: loop:
-; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[BACKEDGE:%.*]] ], [ [[TMP1]], [[ENTRY:%.*]] ]
-; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], -1
-; CHECK-NEXT: [[INDVARS:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
 ; CHECK-NEXT: [[LOOP_EXIT_COND:%.*]] = icmp slt i32 [[TMP0]], 11
 ; CHECK-NEXT: br i1 [[LOOP_EXIT_COND]], label [[EXIT:%.*]], label [[STUCK_PREHEADER:%.*]]
 ; CHECK: stuck.preheader:
 ; CHECK-NEXT: br label [[STUCK:%.*]]
 ; CHECK: exit:
-; CHECK-NEXT: [[IV_NEXT_LCSSA:%.*]] = phi i32 [ [[INDVARS]], [[LOOP]] ]
 ; CHECK-NEXT: ret void
 ; CHECK: stuck:
-; CHECK-NEXT: br i1 false, label [[BACKEDGE]], label [[STUCK]]
+; CHECK-NEXT: br i1 false, label [[BACKEDGE:%.*]], label [[STUCK]]
 ;
 entry:
   br label %loop
2 changes: 0 additions & 2 deletions llvm/test/Transforms/IndVarSimplify/lcssa-preservation.ll
@@ -17,12 +17,10 @@ define void @PR18642(i32 %x) {
 ; CHECK: outer.latch:
 ; CHECK-NEXT: br i1 false, label [[OUTER_HEADER]], label [[EXIT_LOOPEXIT1:%.*]]
 ; CHECK: exit.loopexit:
-; CHECK-NEXT: [[INC_LCSSA:%.*]] = phi i32 [ -2147483648, [[INNER_LATCH]] ]
 ; CHECK-NEXT: br label [[EXIT:%.*]]
 ; CHECK: exit.loopexit1:
 ; CHECK-NEXT: br label [[EXIT]]
 ; CHECK: exit:
-; CHECK-NEXT: [[EXIT_PHI:%.*]] = phi i32 [ [[INC_LCSSA]], [[EXIT_LOOPEXIT]] ], [ undef, [[EXIT_LOOPEXIT1]] ]
 ; CHECK-NEXT: ret void
 ;
 entry:
5 changes: 0 additions & 5 deletions llvm/test/Transforms/IndVarSimplify/no-iv-rewrite.ll
@@ -380,18 +380,13 @@ define i32 @isomorphic(i32 %init, i32 %step, i32 %lim) nounwind {
 ; CHECK-NEXT: [[J:%.*]] = phi i32 [ [[INIT]], [[ENTRY]] ], [ [[J_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[II_NEXT]] = add i32 [[II]], [[STEP1]]
 ; CHECK-NEXT: [[J_NEXT]] = add i32 [[J]], [[STEP1]]
-; CHECK-NEXT: [[L_STEP:%.*]] = add i32 [[J]], [[STEP]]
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ne i32 [[II_NEXT]], [[LIM:%.*]]
 ; CHECK-NEXT: br i1 [[CMP]], label [[LOOP]], label [[RETURN:%.*]]
 ; CHECK: return:
 ; CHECK-NEXT: [[I_LCSSA:%.*]] = phi i32 [ [[J]], [[LOOP]] ]
 ; CHECK-NEXT: [[J_NEXT_LCSSA:%.*]] = phi i32 [ [[J_NEXT]], [[LOOP]] ]
-; CHECK-NEXT: [[K_NEXT_LCSSA:%.*]] = phi i32 [ [[II_NEXT]], [[LOOP]] ]
-; CHECK-NEXT: [[L_STEP_LCSSA:%.*]] = phi i32 [ [[L_STEP]], [[LOOP]] ]
 ; CHECK-NEXT: [[L_NEXT_LCSSA:%.*]] = phi i32 [ [[J_NEXT]], [[LOOP]] ]
 ; CHECK-NEXT: [[SUM1:%.*]] = add i32 [[I_LCSSA]], [[J_NEXT_LCSSA]]
-; CHECK-NEXT: [[SUM2:%.*]] = add i32 [[SUM1]], [[K_NEXT_LCSSA]]
-; CHECK-NEXT: [[SUM3:%.*]] = add i32 [[SUM1]], [[L_STEP_LCSSA]]
 ; CHECK-NEXT: [[SUM4:%.*]] = add i32 [[SUM1]], [[L_NEXT_LCSSA]]
 ; CHECK-NEXT: ret i32 [[SUM4]]
 ;