Skip to content

reassociate multiplies with fast-math #22142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rotateright opened this issue Dec 5, 2014 · 4 comments
Closed

reassociate multiplies with fast-math #22142

rotateright opened this issue Dec 5, 2014 · 4 comments
Labels
bugzilla Issues migrated from bugzilla

Comments

@rotateright
Copy link
Contributor

rotateright commented Dec 5, 2014

Bugzilla Link 21768
Resolution FIXED
Resolved on Dec 10, 2017 09:38
Version trunk
OS All
Blocks #34959
CC @hfinkel,@RKSimon,@sanjoy

Extended Description

define float @foo(float %f0, float %f1, float %f2, float %f3) {
  %mul0 = fmul fast float %f1, %f0
  %mul1 = fmul fast float %mul0, %f2
  %mul2 = fmul fast float %mul1, %f3
  ret float %mul2
}

This should be optimized to:

define float @foo(float %f0, float %f1, float %f2, float %f3) {
  %mul0 = fmul fast float %f0, %f1
  %mul1 = fmul fast float %f2, %f3
  %mul2 = fmul fast float %mul0, %mul1
  ret float %mul2
}

Ie, instead of 3 dependent multiplies, we should have 2 independent fmuls followed by the fmul of those results.

This can be generalized for N fmuls to form the optimal binary tree of independent ops.

@rotateright
Copy link
Contributor Author

rotateright commented Dec 5, 2014

We apparently don't do this for integers either:

define i32 @foo(i32 %f0, i32 %f1, i32 %f2, i32 %f3) #0 {
  %mul = mul i32 %f1, %f0
  %mul1 = mul i32 %mul, %f2
  %mul2 = mul i32 %mul1, %f3
  ret i32 %mul2
}

Also, related bug 17305 - looks like we're not reassociating any math ops like this.

@rotateright
Copy link
Contributor Author

We don't do this optimization in IR because it can increase register pressure, but fast-math FP multiplies on x86 are now reassociated by the machine combiner pass:

http://llvm.org/viewvc/llvm-project?view=revision&revision=239486
http://llvm.org/viewvc/llvm-project?view=revision&revision=240361
http://llvm.org/viewvc/llvm-project?view=revision&revision=241752
http://llvm.org/viewvc/llvm-project?view=revision&revision=241873

As with bug 17305, we don't produce the optimal binary tree for larger cases, because it could make compile time explode, but this is probably as good as it gets.

I'll leave this open until integer multiplies are implemented too.

@rotateright
Copy link
Contributor Author

x86 reg-reg integer multiplies can now be reassociated:
http://llvm.org/viewvc/llvm-project?view=revision&revision=243756

There's no reason that the reassociation logic can't be hoisted/extended to other architectures, so I hope this optimization will eventually make it to other targets. There's a 'TODO' comment about this in the code.

@RKSimon
Copy link
Collaborator

RKSimon commented Nov 27, 2021

mentioned in issue #34959

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 9, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

2 participants