-
Notifications
You must be signed in to change notification settings - Fork 793
[Branch Hints] Add a utility to compare with metadata, and use it in merging opt passes #7733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
;; CHECK-NEXT: (f32.const 0) | ||
;; CHECK-NEXT: ) | ||
(func $different (param $x i32) (param $y i32) (result f32) | ||
;; The branch hints differ, so we do not optimize. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to merge and drop the hint in this case. If the only benefit of branch hints is that cold code can be placed far away, then surely just deduplicating the cold code into some warmer code is at least as good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the issue may be cold code inside each of the arms. Imagine that under some condition, matching$x
, the first arm runs 30% faster (because the internal if is almost never entered, depending on $y
), and that in the reverse condition, the second arm runs 30% faster (because the internal if is almost always entered). Merging the arms and removing the branch hint would make us 30% slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But wouldn't the hypothetical speedups of 30% be achieved via better code cache locality when the cold blocks are split out? Then merging the code would achieve the same speedups.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Maybe the simple testcase here is misleading - imagine there are loops in each of the arms here, so we spend a lot of time in one arm, and the hints matter.
-
The problem is that which blocks are cold depends on what data we are running on:
In more detail, imagine we have a function calc_mostly_0
which processes some huge buffer of data, and assumes most of the code is 0. We would want a hint there that whenever it checks if an item is 0, that is likely.
And imagine we also have calc_mostly_1
which processes similar data, but assumes most of the data is 1. The hint there would say items of 0 are unlikely, the reverse of before.
Now assume those two functions are identical in all but the branch hints. Then they get compiled to very different machine code: in the first, blocks that handle 1 are cold and moved out, while in the latter, blocks that handle 0 are cold and moved out.
And, imagine we can tell what the data properties are, at runtime: perhaps we know that data from certain sources is mostly 0s, and from others, mostly 1s, so we might do
if (probably_mostly_0) {
calc_mostly_0();
} else {
calc_mostly_1();
}
Merging code here, or even just removing the branch hints, would be slower.
This is a bit unlikely to happen in reality - likely there would be other differences somewhere - but it is realistic, I think, to collect separate PGO profiles and compile multiple versions of a function, if you have a way to know which profile is more relevant. Like imagine a physics engine can be tuned for many small objects or a few big and complicated ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, if you merge the functions in that situation, then the merged function has more co-located code than either of the unmerged functions would have. But there would be less code overall, so potentially less code cache pressure and better performance, although that would depend on the placement of the functions and the behavior of the cache. I just don't think there's as clear a benefit to keeping the functions separate as you are arguing, and allowing optimization hints to inhibit optimizations seems like a priority inversion to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is more code overall without merging, but the hot-loop parts would remain hot loops regardless of how much code is around them. We have to assume some effect of locality like that, I think? Otherwise, if I follow your logic, we'd need to consider erasing hints after inlining (as the extra code might render them ineffective), but nothing I have read suggests that.
Also, not merging code here is necessary to preserve the fuzzing invariant that optimizations do not cause bad branch hints: If we merge, we must pick one hint and it might be wrong (causing large slowdowns). Or, if we merge and erase the hints, we may also cause a slowdown.
(I do think maybe when optimizing for size, that we may want to always merge, though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After talking offline, it sounds like we just had different expectations about how useful the hints would be and therefore how hard we should try to preserve them. I hadn't been considering register allocation effects in particular. This approach sgtm, especially if it turns out that LLVM makes similar decisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you raised a very good point with following LLVM. I had trouble finding a clear answer in the source, but here is a testcase with a select:
define i32 @simple_hoist(i32 %n, i1 %cond, i1 %cond2) {
entry:
br i1 %cond, label %then.block, label %else.block
then.block:
%temp1 = select i1 %cond2, i32 10, i32 -1, !prof !1
%val_then = add i32 %n, %temp1
br label %merge.block
else.block:
%temp2 = select i1 %cond2, i32 10, i32 -1, !prof !2
%val_else = add i32 %n, %temp2
br label %merge.block
merge.block:
%res = phi i32 [ %val_then, %then.block ], [ %val_else, %else.block ]
ret i32 %res
}
!1 = !{!"branch_weights", i32 2000, i32 1}
!2 = !{!"branch_weights", i32 1, i32 2000}
and here is one with an if:
define i32 @simple_hoist(i32 %n, i1 %cond, i1 %cond2) {
entry:
br i1 %cond, label %then.block, label %else.block
then.block:
%val_then = add i32 %n, 20
%cmp.then = icmp sgt i32 %val_then, 100
br i1 %cmp.then, label %ret.a, label %ret.b, !prof !1
else.block:
%val_else = add i32 %n, 20
%cmp.else = icmp sgt i32 %val_else, 100
br i1 %cmp.else, label %ret.a, label %ret.b, !prof !2
ret.a:
%res1 = phi i32 [ 42, %then.block ], [ 1337, %else.block ]
ret i32 %res1
ret.b:
%res2 = phi i32 [ 43, %then.block ], [ 1338, %else.block ]
ret i32 %res2
}
!1 = !{!"branch_weights", i32 2000, i32 1}
!2 = !{!"branch_weights", i32 1, i32 2000}
In both cases LLVM is perfectly happy to pick one of the two arbitrarily. It makes no effort to preserve the hint.
I am surprised by this but I think we can trust LLVM on it, so let's not land this, and I'll also revert #7715 . I will also need to figure out some other mechanism for how to fuzz this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to drop the hint on merging? If so, I wouldn't expect that to need special fuzzer support. Or is the plan to pick one arbitrarily and keep its hint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm considering a few possible plans for the fuzzer, either skipping relevant passes, or a flag.
Yes, I think picking one hint arbitrarily, as LLVM does, is good enough - it's simplest and presumably LLVM found it ok to do (maybe even optimal).
Our comparisons are only used in optimization atm, and we decided not to consider metadata when merging functions, as that is what LLVM does: it will fold code together without carefully checking if profiling info matches, and without even trying to fix up that profiling info later. For more on LLVM, see discussion in #7733, but basically LLVM will just merge two instructions with different branch_weights and keep the first of the two, no matter what.
Several passes merge code, and it is dangerous to merge identical code
with different branch hints (or other code annotations): the different
parts of code may intentionally have different hints, e.g. an if that checks
a condition and then runs the same code, knowing that if the condition
was true the hints go one way, and vice versa.