Commit 955a681

Generalize loop pre-header creation and loop hoisting (#62560)
* Generalize loop pre-header creation and loop hoisting

A loop pre-header is a block that flows directly (and only) to the loop entry block. The loop pre-header is the only non-loop predecessor of the entry block. Loop-invariant code can be hoisted to the loop pre-header, where it is guaranteed to be executed just once (per loop entry).

Currently, a loop pre-header has a number of restrictions:
- it is only created for a "do-while" (top entry) loop, not for a mid-loop entry.
- it isn't created if the current loop head and the loop entry block are in different EH try regions.

Additionally, partially due to those restrictions, loop hoisting has restrictions:
- it requires a "do-while" (top entry) loop
- it requires the existing `head` block to dominate the loop entry block
- it requires the existing `head` block to be in the same EH region as the entry block
- it won't hoist if the `entry` block is the first block of a handler

This change removes all these restrictions.

Previously, even if we did create a pre-header, the definition of a pre-header was a little weaker: an entry predecessor could be a non-loop block and also not the pre-header, if the predecessor was dominated by the entry block. This is more complicated to reason about, so I change the pre-header creation to force entry block non-loop predecessors to branch to the pre-header instead. This case occurs only rarely, when we have what looks like an outer loop back edge but the natural loop recognition package doesn't recognize it as an outer loop.

I added a "stress mode" to always create a loop pre-header immediately after loop recognition. This is currently disabled because loop cloning doesn't respect the special status and invariants of a pre-header, and so inserts all the cloning conditions and copied blocks after the pre-header, triggering new loop structure asserts. This should be improved in the future.

A lot more checking of the loop table and loop annotations on blocks has been added. This revealed a number of problems with loop unrolling leaving things in a bad state for downstream phases. Loop unrolling has been updated to fix this; in particular, the loop table is rebuilt if we detect that we unrolled a loop that contains nested loops, since there will now be multiple copies of those nested loops. This is the first case where we might rebuild the loop table, so it lays the groundwork for potentially rebuilding the loop table in other cases, such as after loop cloning, where we don't add the "slow path" loops to the table.

There is some code refactoring to simplify the "find loops" code as well.

Some change details:
- `optSetBlockWeights` is elevated to a "phase" that runs prior to loop recognition.
- LoopFlags is simplified:
  - LPFLG_DO_WHILE is removed; call `lpIsTopEntry` instead
  - LPFLG_ONE_EXIT is removed; check `lpExitCnt == 1` instead
  - LPFLG_HOISTABLE is removed (there are no restrictions anymore)
  - LPFLG_CONST is removed; check `(lpFlags & (LPFLG_CONST_INIT | LPFLG_CONST_LIMIT)) == (LPFLG_CONST_INIT | LPFLG_CONST_LIMIT)` instead (only used in one place)
  - bool lpContainsCall is removed and replaced by LPFLG_CONTAINS_CALL
- Added a `lpInitBlock` field to the loop table. For constant and variable initialization loops, code assumed that these expressions existed in the `head` block. This isn't true anymore if we insert a pre-header block. So, capture the block where these expressions actually exist when we determine that they do exist, and explicitly use this block pointer where needed.
- Added `fgComputeReturnBlocks()` to extract this code out of `fgComputeReachability` into its own function.
- Added `optFindAndScaleGeneralLoopBlocks()` to extract this code out of loop recognition into its own function.
- Added `optResetLoopInfo()` to reset the loop table and block annotations related to loops.
- Added `fgDebugCheckBBNumIncreasing()` to allow asserting that the bbNum order of blocks is increasing. This should be used in phases that depend on this order to do bbNum comparisons.
- Added a lot more loop table validation in `fgDebugCheckLoopTable()`.

* Inline fgBuildBlockNumMap to allow using _alloca

* Fix BBJ_SWITCH output

1. Change `dspSuccs()` to not call code that will call `GetDescriptorForSwitch()`.
2. Change `GetDescriptorForSwitch()` to use the correct max block number while inlining. We probably don't (or shouldn't) call `GetDescriptorForSwitch()` while inlining, especially after (1), but this change doesn't hurt.

* Remove incorrect assertion

There was an assertion, when considering an existing `head` block as a potential pre-header, that the `entry` block has the `head` block as a predecessor. However, we exit early if we find a non-head, non-loop edge. This could happen before we encounter the `head` block, making the assert incorrect. We don't want to run the entire loop just for the purpose of the assert (at least not here), so just remove the assert.

* Formatting

* Use `_alloca` instead of `alloca` name

* Convert fgBBNumMax usage

Change:

```
compIsForInlining() ? impInlineInfo->InlinerCompiler->fgBBNumMax : fgBBNumMax
```

to:

```
impInlineRoot()->fgBBNumMax
```

* Code review feedback

1. Added a loop epoch concept. Currently set but never checked.
2. Added a disabled assert about return blocks always being moved out of line of the loop body.
3. Fixed a bug where `totalIter` was checked after it had been decremented to zero as a loop variable.
4. Added more comments on incremental loop block `bbNatLoopNum` setting.

* Add EH condition when converting head to pre-header

When considering converting an existing loop `head` block to a pre-header, verify that the block has the same EH try region that we would create for a new pre-header. If not, we go ahead and create the new pre-header.
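As a quick illustration of the hoisting concept described above, here is a minimal source-level sketch in standalone C++ (not JIT code; the function and names such as `sumScaled` and `scale` are invented for this example) of what moving a loop-invariant expression into a pre-header-like position achieves: the invariant computation runs once per loop entry rather than once per iteration.

```cpp
#include <cstdio>

// Illustrative only: the conceptual "pre-header" is the code that runs exactly
// once before the loop is entered; loop-invariant work can be moved there.
int sumScaled(const int* data, int count, int a, int b)
{
    // Hoisted: 'a * b' does not depend on 'i', so computing it here corresponds
    // to placing it in the loop pre-header -- it executes once per loop entry.
    const int scale = a * b;

    int sum = 0;
    for (int i = 0; i < count; i++)
    {
        sum += data[i] * scale; // the loop body reuses the hoisted value
    }
    return sum;
}

int main()
{
    int data[] = {1, 2, 3, 4};
    printf("%d\n", sumScaled(data, 4, 2, 5)); // prints 100
    return 0;
}
```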
1 parent c3683b0 · commit 955a681

20 files changed: +1147 −460 lines

src/coreclr/jit/block.cpp

Lines changed: 39 additions & 8 deletions
@@ -575,18 +575,49 @@ unsigned BasicBlock::dspCheapPreds()
     return count;
 }
 
-/*****************************************************************************
- *
- *  Display the basic block successors.
- */
-
+//------------------------------------------------------------------------
+// dspSuccs: Display the basic block successors.
+//
+// Arguments:
+//    compiler - compiler instance; passed to NumSucc(Compiler*) -- see that function for implications.
+//
 void BasicBlock::dspSuccs(Compiler* compiler)
 {
     bool first = true;
-    for (BasicBlock* const succ : Succs(compiler))
+
+    // If this is a switch, we don't want to call `Succs(Compiler*)` because it will eventually call
+    // `GetSwitchDescMap()`, and that will have the side-effect of allocating the unique switch descriptor map
+    // and/or compute this switch block's unique succ set if it is not present. Debug output functions should
+    // never have an effect on codegen. We also don't want to assume the unique succ set is accurate, so we
+    // compute it ourselves here.
+    if (bbJumpKind == BBJ_SWITCH)
+    {
+        // Create a set with all the successors. Don't use BlockSet, so we don't need to worry
+        // about the BlockSet epoch.
+        unsigned     bbNumMax = compiler->impInlineRoot()->fgBBNumMax;
+        BitVecTraits bitVecTraits(bbNumMax + 1, compiler);
+        BitVec       uniqueSuccBlocks(BitVecOps::MakeEmpty(&bitVecTraits));
+        for (BasicBlock* const bTarget : SwitchTargets())
+        {
+            BitVecOps::AddElemD(&bitVecTraits, uniqueSuccBlocks, bTarget->bbNum);
+        }
+        BitVecOps::Iter iter(&bitVecTraits, uniqueSuccBlocks);
+        unsigned        bbNum = 0;
+        while (iter.NextElem(&bbNum))
+        {
+            // Note that we will output switch successors in increasing numerical bbNum order, which is
+            // not related to their order in the bbJumpSwt->bbsDstTab table.
+            printf("%s" FMT_BB, first ? "" : ",", bbNum);
+            first = false;
+        }
+    }
+    else
     {
-        printf("%s" FMT_BB, first ? "" : ",", succ->bbNum);
-        first = false;
+        for (BasicBlock* const succ : Succs(compiler))
+        {
+            printf("%s" FMT_BB, first ? "" : ",", succ->bbNum);
+            first = false;
+        }
     }
 }
 
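The new BBJ_SWITCH path in `dspSuccs` above dedups successors through the JIT's BitVec utilities and prints them in increasing bbNum order. As a rough, self-contained analogue of that pattern (illustrative only; `std::vector<bool>` stands in for `BitVec`, plain integers stand in for block numbers, and `printUniqueTargets` is an invented name), here is a sketch of collecting possibly-duplicated targets into a bit set and emitting the unique ones in order:

```cpp
#include <cstdio>
#include <vector>

// Collect possibly-duplicated jump targets into a bit set keyed by number,
// then print each unique target once, in increasing numeric order.
void printUniqueTargets(const std::vector<unsigned>& targets, unsigned maxNum)
{
    std::vector<bool> seen(maxNum + 1, false); // stand-in for the JIT's BitVec
    for (unsigned target : targets)
    {
        seen[target] = true;
    }

    bool first = true;
    for (unsigned num = 0; num <= maxNum; num++)
    {
        if (seen[num])
        {
            printf("%sBB%02u", first ? "" : ",", num);
            first = false;
        }
    }
    printf("\n");
}

int main()
{
    // Duplicate, out-of-order entries, as a switch's destination table might have.
    printUniqueTargets({7, 3, 7, 12, 3}, 20); // prints "BB03,BB07,BB12"
    return 0;
}
```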
src/coreclr/jit/block.h

Lines changed: 11 additions & 1 deletion
@@ -559,7 +559,7 @@ enum BasicBlockFlags : unsigned __int64
 
     // Flags that relate blocks to loop structure.
 
-    BBF_LOOP_FLAGS = BBF_LOOP_PREHEADER | BBF_LOOP_HEAD | BBF_LOOP_CALL0 | BBF_LOOP_CALL1,
+    BBF_LOOP_FLAGS = BBF_LOOP_PREHEADER | BBF_LOOP_HEAD | BBF_LOOP_CALL0 | BBF_LOOP_CALL1 | BBF_LOOP_ALIGN,
 
     // Flags to update when two blocks are compacted
 
@@ -1001,6 +1001,16 @@ struct BasicBlock : private LIR::Range
         bbHndIndex = from->bbHndIndex;
     }
 
+    void copyTryIndex(const BasicBlock* from)
+    {
+        bbTryIndex = from->bbTryIndex;
+    }
+
+    void copyHndIndex(const BasicBlock* from)
+    {
+        bbHndIndex = from->bbHndIndex;
+    }
+
     static bool sameTryRegion(const BasicBlock* blk1, const BasicBlock* blk2)
     {
         return blk1->bbTryIndex == blk2->bbTryIndex;
src/coreclr/jit/clrjit.natvis

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ Documentation for VS debugger format specifiers: https://docs.microsoft.com/en-u
 
   <Type Name="BasicBlock">
     <DisplayString Condition="bbJumpKind==BBJ_COND || bbJumpKind==BBJ_ALWAYS || bbJumpKind==BBJ_LEAVE || bbJumpKind==BBJ_EHCATCHRET || bbJumpKind==BBJ_CALLFINALLY">BB{bbNum,d}->BB{bbJumpDest->bbNum,d}; {bbJumpKind,en}</DisplayString>
+    <DisplayString Condition="bbJumpKind==BBJ_SWITCH">BB{bbNum,d}; {bbJumpKind,en}; {bbJumpSwt->bbsCount} cases</DisplayString>
     <DisplayString>BB{bbNum,d}; {bbJumpKind,en}</DisplayString>
   </Type>
 
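For context on what the added entry renders: with the `,d` decimal block number, the `,en` enum-name specifier, and the case count, a switch block would show up in the Visual Studio debugger roughly as `BB7; BBJ_SWITCH; 5 cases` (the block number and case count here are made up for illustration), instead of falling back to the generic `BB{bbNum,d}; {bbJumpKind,en}` form that omits the case count.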
src/coreclr/jit/compiler.cpp

Lines changed: 8 additions & 8 deletions
@@ -4898,10 +4898,14 @@ void Compiler::compCompile(void** methodCodePtr, uint32_t* methodCodeSize, JitFl
     //
     DoPhase(this, PHASE_COMPUTE_REACHABILITY, &Compiler::fgComputeReachability);
 
+    // Scale block weights and mark run rarely blocks.
+    //
+    DoPhase(this, PHASE_SET_BLOCK_WEIGHTS, &Compiler::optSetBlockWeights);
+
     // Discover and classify natural loops (e.g. mark iterative loops as such). Also marks loop blocks
     // and sets bbWeight to the loop nesting levels.
     //
-    DoPhase(this, PHASE_FIND_LOOPS, &Compiler::optFindLoops);
+    DoPhase(this, PHASE_FIND_LOOPS, &Compiler::optFindLoopsPhase);
 
     // Clone loops with optimization opportunities, and choose one based on dynamic condition evaluation.
     //
@@ -5444,20 +5448,16 @@ void Compiler::ResetOptAnnotations()
 // The intent of this method is to update loop structure annotations, and those
 // they depend on; these annotations may have become stale during optimization,
 // and need to be up-to-date before running another iteration of optimizations.
-
+//
 void Compiler::RecomputeLoopInfo()
 {
     assert(opts.optRepeat);
     assert(JitConfig.JitOptRepeatCount() > 0);
     // Recompute reachability sets, dominators, and loops.
-    optLoopCount = 0;
+    optResetLoopInfo();
     fgDomsComputed = false;
-    for (BasicBlock* const block : Blocks())
-    {
-        block->bbFlags &= ~BBF_LOOP_FLAGS;
-        block->bbNatLoopNum = BasicBlock::NOT_IN_LOOP;
-    }
     fgComputeReachability();
+    optSetBlockWeights();
     // Rebuild the loop tree annotations themselves
     optFindLoops();
 }
