failure to convert FP addition loop into multiplication with fast-math #28268
Comments
Do you suggest converting this sequence to an fmul? Is that the right thing to do under fast-math?
I think so (although I don't know which pass is responsible for the transform yet). For integers, we have to guard against a negative value for 'n':

```llvm
define i32 @multiply_the_hard_way(i32 %n) {
  %cmp = icmp sgt i32 %n, 0
  %mul = mul i32 %n, 7
  %sel = select i1 %cmp, i32 %mul, i32 0
  ret i32 %sel
}
```

So for floats, it should be similar?

```llvm
define float @multiply_the_hard_way(i32 %n) {
  %cmp = icmp sgt i32 %n, 0
  %n_as_float = sitofp i32 %n to float
  %mul = fmul float %n_as_float, 7.0
  %sel = select i1 %cmp, float %mul, float 0.0
  ret i32 %sel
}
```
Typo in the second example: the final `ret i32 %sel` should be `ret float %sel`.
Note that if 'n' is over ~2^23, we wouldn't have an exact result. I haven't looked to see how the add loop would diverge from an fmul in that case. The difference could be judged too large even for fast-math... although at first glance, I would think the multiply would be the more accurate answer!
Induction variable simplification will effectively remove this loop if SCEV expressions are formed for the floating point operations. The question is whether we should do that: it's not clear to me that fast-math means floating point operations can be treated as scaled, infinite-precision integer operations, which is effectively what you're doing here. This is way beyond reassociation.
I think patterns like what we have in bug 27881 are common, so IMO, yes.
Agreed. This definitely pushes the (unspecified) bounds of fast-math. I wonder if the fact that the SCEV form likely produces a more accurate answer than what the program would compute unoptimized should be taken into consideration.
So this is tricky: first, because fast-math already does this sometimes (just because of reassociation), and second, because there's no sequence of local transformations the user could do to make the unoptimized program produce the same result (unlike reassociation). In the end, I'm okay with performing this transformation under fast-math, but it depends on how we define fast-math, and we probably should try to define it. Maybe something like this: "-ffast-math enables the compiler to perform transformations on floating-point calculations that are valid when treating the floating-point values as mathematical real numbers, but are not semantics-preserving when considering the floating-point values' machine representations. It also enables the compiler to perform transformations resulting in floating-point calculations computing fewer correct bits than they would otherwise."
mentioned in issue #34959 |
Extended Description
Another example in the loop vectorizer code explosion series (bug 27881, bug 27826) can be seen with:
This may look contrived in its simplified form, but the problem could easily exist in real code, or in only slightly more realistic code like bug 27881.
This may be considered a bug before it ever gets to the vectorizer as discussed recently here:
http://lists.llvm.org/pipermail/llvm-dev/2016-May/099724.html
I.e., if these were integers, we'd convert this loop into an imul.
$ ./clang -O2 multiply_the_hard_way.c -S -ffast-math -emit-llvm -o -