Skip to content

[Branch Hints] Add a utility to compare with metadata, and use it in merging opt passes #7733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

kripken
Copy link
Member

@kripken kripken commented Jul 16, 2025

Several passes merge code, and it is dangerous to merge identical code
with different branch hints (or other code annotations): the different
parts of code may intentionally have different hints, e.g. an if that checks
a condition and then runs the same code, knowing that if the condition
was true the hints go one way, and vice versa.

@kripken kripken requested a review from tlively July 16, 2025 22:38
;; CHECK-NEXT: (f32.const 0)
;; CHECK-NEXT: )
(func $different (param $x i32) (param $y i32) (result f32)
;; The branch hints differ, so we do not optimize.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to merge and drop the hint in this case. If the only benefit of branch hints is that cold code can be placed far away, then surely just deduplicating the cold code into some warmer code is at least as good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the issue may be cold code inside each of the arms. Imagine that under some condition, matching$x, the first arm runs 30% faster (because the internal if is almost never entered, depending on $y), and that in the reverse condition, the second arm runs 30% faster (because the internal if is almost always entered). Merging the arms and removing the branch hint would make us 30% slower.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wouldn't the hypothetical speedups of 30% be achieved via better code cache locality when the cold blocks are split out? Then merging the code would achieve the same speedups.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Maybe the simple testcase here is misleading - imagine there are loops in each of the arms here, so we spend a lot of time in one arm, and the hints matter.

  2. The problem is that which blocks are cold depends on what data we are running on:

In more detail, imagine we have a function calc_mostly_0 which processes some huge buffer of data, and assumes most of the code is 0. We would want a hint there that whenever it checks if an item is 0, that is likely.

And imagine we also have calc_mostly_1 which processes similar data, but assumes most of the data is 1. The hint there would say items of 0 are unlikely, the reverse of before.

Now assume those two functions are identical in all but the branch hints. Then they get compiled to very different machine code: in the first, blocks that handle 1 are cold and moved out, while in the latter, blocks that handle 0 are cold and moved out.

And, imagine we can tell what the data properties are, at runtime: perhaps we know that data from certain sources is mostly 0s, and from others, mostly 1s, so we might do

if (probably_mostly_0) {
  calc_mostly_0();
} else {
  calc_mostly_1();
}

Merging code here, or even just removing the branch hints, would be slower.

This is a bit unlikely to happen in reality - likely there would be other differences somewhere - but it is realistic, I think, to collect separate PGO profiles and compile multiple versions of a function, if you have a way to know which profile is more relevant. Like imagine a physics engine can be tuned for many small objects or a few big and complicated ones.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, if you merge the functions in that situation, then the merged function has more co-located code than either of the unmerged functions would have. But there would be less code overall, so potentially less code cache pressure and better performance, although that would depend on the placement of the functions and the behavior of the cache. I just don't think there's as clear a benefit to keeping the functions separate as you are arguing, and allowing optimization hints to inhibit optimizations seems like a priority inversion to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is more code overall without merging, but the hot-loop parts would remain hot loops regardless of how much code is around them. We have to assume some effect of locality like that, I think? Otherwise, if I follow your logic, we'd need to consider erasing hints after inlining (as the extra code might render them ineffective), but nothing I have read suggests that.

Also, not merging code here is necessary to preserve the fuzzing invariant that optimizations do not cause bad branch hints: If we merge, we must pick one hint and it might be wrong (causing large slowdowns). Or, if we merge and erase the hints, we may also cause a slowdown.

(I do think maybe when optimizing for size, that we may want to always merge, though.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After talking offline, it sounds like we just had different expectations about how useful the hints would be and therefore how hard we should try to preserve them. I hadn't been considering register allocation effects in particular. This approach sgtm, especially if it turns out that LLVM makes similar decisions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you raised a very good point with following LLVM. I had trouble finding a clear answer in the source, but here is a testcase with a select:

define i32 @simple_hoist(i32 %n, i1 %cond, i1 %cond2) {
entry:
  br i1 %cond, label %then.block, label %else.block

then.block:
  %temp1 = select i1 %cond2, i32 10, i32 -1, !prof !1
  %val_then = add i32 %n, %temp1
  br label %merge.block

else.block:
  %temp2 = select i1 %cond2, i32 10, i32 -1, !prof !2
  %val_else = add i32 %n, %temp2
  br label %merge.block

merge.block:
  %res = phi i32 [ %val_then, %then.block ], [ %val_else, %else.block ]
  ret i32 %res
}

!1 = !{!"branch_weights", i32 2000, i32 1}
!2 = !{!"branch_weights", i32 1, i32 2000}

and here is one with an if:

define i32 @simple_hoist(i32 %n, i1 %cond, i1 %cond2) {
entry:
  br i1 %cond, label %then.block, label %else.block

then.block:
  %val_then = add i32 %n, 20
  %cmp.then = icmp sgt i32 %val_then, 100
  br i1 %cmp.then, label %ret.a, label %ret.b, !prof !1

else.block:
  %val_else = add i32 %n, 20
  %cmp.else = icmp sgt i32 %val_else, 100
  br i1 %cmp.else, label %ret.a, label %ret.b, !prof !2

ret.a:
  %res1 = phi i32 [ 42, %then.block ], [ 1337, %else.block ]
  ret i32 %res1

ret.b:
  %res2 = phi i32 [ 43, %then.block ], [ 1338, %else.block ]
  ret i32 %res2
}

!1 = !{!"branch_weights", i32 2000, i32 1}
!2 = !{!"branch_weights", i32 1, i32 2000}

In both cases LLVM is perfectly happy to pick one of the two arbitrarily. It makes no effort to preserve the hint.

I am surprised by this but I think we can trust LLVM on it, so let's not land this, and I'll also revert #7715 . I will also need to figure out some other mechanism for how to fuzz this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan to drop the hint on merging? If so, I wouldn't expect that to need special fuzzer support. Or is the plan to pick one arbitrarily and keep its hint?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm considering a few possible plans for the fuzzer, either skipping relevant passes, or a flag.

Yes, I think picking one hint arbitrarily, as LLVM does, is good enough - it's simplest and presumably LLVM found it ok to do (maybe even optimal).

@kripken kripken closed this Jul 18, 2025
@kripken kripken deleted the eq.meta branch July 18, 2025 00:31
kripken added a commit that referenced this pull request Jul 21, 2025
Our comparisons are only used in optimization atm, and we decided not to
consider metadata when merging functions, as that is what LLVM does: it
will fold code together without carefully checking if profiling info matches,
and without even trying to fix up that profiling info later.

For more on LLVM, see discussion in #7733, but basically LLVM will just merge
two instructions with different branch_weights and keep the first of the two,
no matter what.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants