reassociate multiplies with fast-math #22142
Comments
We apparently don't do this for integers either:

define i32 @foo(i32 %f0, i32 %f1, i32 %f2, i32 %f3) #0 {
  %mul = mul i32 %f1, %f0
  %mul1 = mul i32 %mul, %f2
  %mul2 = mul i32 %mul1, %f3
  ret i32 %mul2
}

Also, see related bug 17305: it looks like we're not reassociating any math ops like this.
We don't do this optimization in IR because it can increase register pressure, but fast-math FP multiplies on x86 are now reassociated by the machine combiner pass: http://llvm.org/viewvc/llvm-project?view=revision&revision=239486

As with bug 17305, we don't produce the optimal binary tree for larger cases because that could make compile time explode, but this is probably as good as it gets. I'll leave this open until integer multiplies are implemented too.
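
As background for why the FP case is tied to fast-math while the integer case is not, here is a small Python sketch. It is only an illustration: Python floats stand in for IEEE double `fmul`, and the helper `mul32` is a hypothetical stand-in for a wrapping `mul i32`. Regrouping FP multiplies can change the result, whereas regrouping wrapping integer multiplies cannot.

```python
import math

# FP multiplication is not associative: regrouping changes where overflow happens.
a, b, c = 1e200, 1e200, 1e-200
left = (a * b) * c    # a * b overflows to inf first, so the result is inf
right = a * (b * c)   # b * c is close to 1.0, so the result stays finite


def mul32(x, y):
    """Hypothetical model of a wrapping 32-bit multiply (like `mul i32`)."""
    return (x * y) & 0xFFFFFFFF


# Wrapping integer multiplication is associative, so both groupings agree.
w, x, y, z = 0x12345678, 0x9ABCDEF0, 0x0F0F0F0F, 0x7FFFFFFF
chained = mul32(mul32(mul32(w, x), y), z)
regrouped = mul32(mul32(w, x), mul32(y, z))
```

Here `left` and `right` differ, which is why the FP transform needs the relaxed semantics of fast-math, while `chained` and `regrouped` are always equal.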
x86 reg-reg integer multiplies can now be reassociated. There's no reason the reassociation logic can't be hoisted or extended to other architectures, so I hope this optimization will eventually make it to other targets. There's a 'TODO' comment about this in the code.
Mentioned in issue #34959.
Extended Description
This should be optimized to:
I.e., instead of 3 dependent multiplies, we should have 2 independent fmuls followed by an fmul of those two results.
This can be generalized for N fmuls to form the optimal binary tree of independent ops.
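
To make the dependency argument concrete, here is a minimal Python sketch (not LLVM code) that builds both groupings as expression strings and tracks the length of the longest dependent chain. The function names `chain` and `balanced` are hypothetical, and depth is only a rough proxy for latency that assumes independent multiplies can issue in parallel.

```python
def chain(operands):
    """Left-to-right grouping: each multiply depends on the previous one.

    Returns (expression, depth), where depth is the longest dependent chain.
    """
    expr, depth = operands[0], 0
    for op in operands[1:]:
        expr = f"({expr} * {op})"
        depth += 1
    return expr, depth


def balanced(operands):
    """Pairwise grouping: multiply neighbours level by level.

    Returns (expression, depth). An odd leftover operand is carried
    to the next level unchanged.
    """
    level = [(op, 0) for op in operands]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (lhs, dl), (rhs, dr) = level[i], level[i + 1]
            nxt.append((f"({lhs} * {rhs})", max(dl, dr) + 1))
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


ops = ["f0", "f1", "f2", "f3"]
print(chain(ops))     # ('(((f0 * f1) * f2) * f3)', 3)
print(balanced(ops))  # ('((f0 * f1) * (f2 * f3))', 2)
```

For N operands the chained form has depth N - 1, while this pairwise scheme has depth ceil(log2 N), for example 3 instead of 7 for eight operands. The price is more simultaneously live intermediate values, which matches the register-pressure concern raised in the comments.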