Description
Continuation of #93020. During the .NET 9 development cycle, we removed much of the JIT flowgraph implementation's implicit fall-through invariants, and introduced a new block layout strategy based on a reverse post-order traversal of the graph. For .NET 10, we'd like to push this work further in both directions, with the ultimate goals of zero dependence on lexical block ordering in the JIT's frontend, and a global cost-optimizing layout algorithm in the JIT's backend. Below is an early estimate of what each item entails:
Flowgraph Modernization
- Move block layout to the backend, after lowering/LSRA. At this point, we know the JIT won't introduce new blocks, so one reordering pass should be sufficient. Note that the flowgraph transformation steps in `fgUpdateFlowGraph` that we usually run in conjunction with layout aren't designed to run after lowering, so we will likely need to decouple flow opts from layout to make this work.
- Ensure backend phases aren't sensitive to block ordering before layout. In particular, LSRA uses its own traversal logic for visiting blocks. Modifying this traversal logic to be agnostic to lexical ordering may facilitate moving layout to after LSRA.
- Remove premature ordering logic during basic block creation. Block creation helpers like `fgNewBBinRegion` may search the block list for insertion points that won't break up existing fall-through. In the JIT frontend, it should make no difference to optimization potential if we just insert new blocks at the end of the list, or at the end of an EH region. Doing this work early should help expose frontend phases that are still sensitive to lexical block ordering (see next task).
- Refactor frontend phases to not depend on lexical block ordering. As of writing, we know a few phases that should be graph-based:
- Loop inversion
- Switch recognition (Switch recognition should not be lexical #107076)
- Flowgraph simplification
- `optSetBlockWeights` (see Profile Maintenance section)
- Consider strategies for reducing the burden of flow edge predecessor iteration when transforming the flowgraph. For example, compacting chains of blocks duplicates the effort of redirecting each block's predecessors in each compaction (see comment for pathological scenario).
- Continue to remove premature checks for fall-through behavior. The removal of the `BBJ_NONE` block type left behind breadcrumbs in various phases that we ought to clean up, now that we can model flow explicitly.
- Consider enforcing stronger flowgraph invariants (such as no uncompacted blocks) between phases to reduce the burden of work on `fgUpdateFlowGraph`.
- Continue deferred .NET 9 items
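As a rough illustration of what "graph-based rather than lexical" means for the items above, here is a minimal Python sketch of a reverse post-order traversal driven only by successor edges. The block names, the `succs` map, and `reverse_post_order` are hypothetical and are not the JIT's data structures; the point is only that the visit order comes from the edges, not from where a block sits in the block list.

```python
def reverse_post_order(succs, entry):
    """Return the blocks reachable from `entry` in reverse post-order."""
    visited = {entry}
    post_order = []
    # Iterative DFS; each stack entry is (block, iterator over its successors).
    stack = [(entry, iter(succs.get(entry, ())))]
    while stack:
        block, successors = stack[-1]
        for succ in successors:
            if succ not in visited:
                visited.add(succ)
                stack.append((succ, iter(succs.get(succ, ()))))
                break
        else:
            # All successors handled: the block is finished.
            post_order.append(block)
            stack.pop()
    return post_order[::-1]


# Hypothetical flowgraph: B1 -> B2 -> {B3, B4}, with a back edge B3 -> B2.
succs = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B2"], "B4": []}
rpo = reverse_post_order(succs, "B1")

# Same edges, but the "block list" (dict order) is shuffled.
shuffled = {"B4": [], "B3": ["B2"], "B2": ["B3", "B4"], "B1": ["B2"]}
rpo_shuffled = reverse_post_order(shuffled, "B1")
```

Both calls yield the same order, because the traversal never consults the position of a block in the list.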
Block Layout
Ideally, the items below get us to a state where block layout produces the "best" ordering it can, given the profile data it has on hand. If the layout is subpar due to missing/inconsistent profile data, we can at least eliminate the layout strategy as the culprit.
- Implement 3-opt pass on top of the RPO-based layout, modeling layout cost with edge weights
- Consider modeling cost of (un)conditional and forward/backward branches in layout cost for 3-opt
- Consider how 3-opt's layout decisions may affect hot/cold splitting
- Consider how we can achieve acceptable throughput, while running for enough iterations to achieve near-optimal layout
- Continue deferred .NET 9 items
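To make the 3-opt item more concrete, here is a minimal Python sketch under stated assumptions; it is not the JIT's implementation. It assumes the layout cost is the total weight of edges that are not fall-through, and it uses one common 3-opt move: pick three cut points and swap the two middle segments without reversing them. The names `layout_cost` and `three_opt` and the example weights are hypothetical.

```python
from itertools import combinations


def layout_cost(layout, edges):
    """Sum the weights of edges that are NOT fall-through in `layout`."""
    position = {block: i for i, block in enumerate(layout)}
    return sum(weight for (src, dst), weight in edges.items()
               if position[dst] != position[src] + 1)


def three_opt(layout, edges):
    """Greedily apply improving segment swaps until none remain."""
    best, best_cost = list(layout), layout_cost(layout, edges)
    improved = True
    while improved:
        improved = False
        # Cut points 1 <= i < j < k <= n, so the entry block stays first.
        for i, j, k in combinations(range(1, len(best) + 1), 3):
            # Swap segments best[i:j] and best[j:k].
            candidate = best[:i] + best[j:k] + best[i:j] + best[k:]
            cost = layout_cost(candidate, edges)
            if cost < best_cost:
                best, best_cost, improved = candidate, cost, True
                break  # restart the search from the improved layout
    return best, best_cost


# Hypothetical profile: the "hot" path is taken 99% of the time.
edges = {
    ("entry", "cold"): 1,
    ("entry", "hot"): 99,
    ("cold", "exit"): 1,
    ("hot", "exit"): 99,
}
initial = ["entry", "cold", "hot", "exit"]
initial_cost = layout_cost(initial, edges)
layout, cost = three_opt(initial, edges)
```

This exhaustive search over cut points is cubic in the number of blocks per pass, which is the throughput concern raised above; a real implementation would need to bound the candidates it considers.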
Profile Maintenance
- Continue expanding profile consistency checks through the JIT's frontend. Currently, we bail after inlining.
- JIT: Continue profile consistency checks until after finally cloning #109792
- JIT: Enable profile consistency checking up to morph #111047
- JIT: Move profile consistency checks to after morph #111253
- JIT: Move profile consistency checks to after loop opts #111285
- JIT: Enable profile consistency checks throughout JIT frontend #111498
- JIT: Check for profile consistency throughout JIT backend #111684
- Consider replacing `optSetBlockWeights` with the new profile synthesis implementation. The former frequently produces nonsensical weights for loops, as it relies on a lexical traversal of the block list to identify loops. Fixing this may improve `JitOptRepeat` performance.
- Consider running profile synthesis right before layout.
- Allow profile data to override the JIT's heuristics more explicitly. For example, if profile data suggests a `BBJ_THROW` block is hot, then order it as such (this particular example is not as perf-sensitive, though).
  - Enforcing profile consistency checks seems to have fixed this. The `BBJ_THROW` example in particular was largely handled by JIT: Move profile consistency checks to after morph #111253.
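For readers unfamiliar with what a profile consistency check looks for, here is a simplified Python sketch. It assumes a flow-conservation definition: a block's weight should match the sum of its incoming edge weights (except for the entry block) and the sum of its outgoing edge weights (when it has successors). The function name, tolerance, and data are hypothetical, and the JIT's actual checks may differ.

```python
import math


def find_inconsistencies(block_weights, edge_weights, entry, rel_tol=1e-6):
    """Return (block, direction, edge_sum, block_weight) for each mismatch."""
    incoming = dict.fromkeys(block_weights, 0.0)
    outgoing = dict.fromkeys(block_weights, 0.0)
    has_successor = set()
    for (src, dst), weight in edge_weights.items():
        outgoing[src] += weight
        incoming[dst] += weight
        has_successor.add(src)

    problems = []
    for block, weight in block_weights.items():
        # The entry block receives flow from outside the method; skip it.
        if block != entry and not math.isclose(incoming[block], weight, rel_tol=rel_tol):
            problems.append((block, "in", incoming[block], weight))
        if block in has_successor and not math.isclose(outgoing[block], weight, rel_tol=rel_tol):
            problems.append((block, "out", outgoing[block], weight))
    return problems


block_weights = {"entry": 100, "hot": 99, "cold": 1, "exit": 100}
edge_weights = {
    ("entry", "hot"): 99,
    ("entry", "cold"): 1,
    ("hot", "exit"): 99,
    ("cold", "exit"): 1,
}
consistent = find_inconsistencies(block_weights, edge_weights, "entry")

# Hypothetical transformation that removes an edge without repairing weights.
stale_edges = {e: w for e, w in edge_weights.items() if e != ("entry", "cold")}
broken = find_inconsistencies(block_weights, stale_edges, "entry")
```

In the second call, `entry` no longer sends out as much weight as it holds and `cold` no longer receives any, which is the kind of drift a phase can introduce when it edits flow without updating the profile.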
cc @dotnet/jit-contrib, @AndyAyersMS