-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[IVDescriptors] Don't require nsz/nnan for (min|max)num. #137003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We do not need to require nsz or nnan for FP reductions with minnum and maxnum. NaNs and nsz are handled consistently with the vector versions of minnum and maxnum. vector.reduce.(fmin|fmax) also matches the minnum/maxnum semantics for comparisons. IIUC this was also the conclusion when support for minimum/maximum was added (https://reviews.llvm.org/D151482?id=531021#inline-1481555). Alive2 agrees that maxnum/minnum can be re-ordered (scalar version: https://alive2.llvm.org/ce/z/GVmgBX) and also verifies the vectorized code end-to-end, with a tweak to replace llvm.reduce.fmax with a scalar maxnum, as Alive2 doesn't support llvm.reduce.fmax: https://alive2.llvm.org/ce/z/EwJKeJ . Note that verification requires a higher timeout than available in the online version.
@llvm/pr-subscribers-llvm-analysis Author: Florian Hahn (fhahn) ChangesWe do not need to require nsz or nnan for FP reductions with minnum and maxnum. NaNs and nsz are handled consistently with the vector versions of minnum and maxnum. vector.reduce.(fmin|fmax) also matches the minnum/maxnum semantics for comparisons. IIUC this was also the conclusion when support for minimum/maximum was added (https://reviews.llvm.org/D151482?id=531021#inline-1481555). Alive2 agrees that maxnum/minnum can be re-ordered (scalar version: https://alive2.llvm.org/ce/z/GVmgBX) and also verifies the vectorized code end-to-end, with a tweak to replace llvm.reduce.fmax with a scalar maxnum, as Alive2 doesn't support llvm.reduce.fmax: https://alive2.llvm.org/ce/z/EwJKeJ . Note that verification requires a higher timeout than available in the online version. Full diff: https://github.com/llvm/llvm-project/pull/137003.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/IVDescriptors.cpp b/llvm/lib/Analysis/IVDescriptors.cpp
index 94c347b01bbfb..4a6efd0ca821a 100644
--- a/llvm/lib/Analysis/IVDescriptors.cpp
+++ b/llvm/lib/Analysis/IVDescriptors.cpp
@@ -892,10 +892,13 @@ RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
return true;
if (isa<FPMathOperator>(I) && I->hasNoNaNs() && I->hasNoSignedZeros())
return true;
- // minimum and maximum intrinsics do not require nsz and nnan flags since
- // NaN and signed zeroes are propagated in the intrinsic implementation.
+ // minimum/minnum and maximum/maxnum intrinsics do not require nsz and nnan
+ // flags since NaN and signed zeroes are propagated in the intrinsic
+ // implementation.
return match(I, m_Intrinsic<Intrinsic::minimum>(m_Value(), m_Value())) ||
- match(I, m_Intrinsic<Intrinsic::maximum>(m_Value(), m_Value()));
+ match(I, m_Intrinsic<Intrinsic::maximum>(m_Value(), m_Value())) ||
+ match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())) ||
+ match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value()));
};
if (isIntMinMaxRecurrenceKind(Kind) ||
(HasRequiredFMF() && isFPMinMaxRecurrenceKind(Kind)))
diff --git a/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll b/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
index 85a90f2e04c5e..97b65e3435b5a 100644
--- a/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
@@ -1002,7 +1002,8 @@ for.body: ; preds = %entry, %for.body
}
; CHECK-LABEL: @fmin_intrinsic_nofast(
-; CHECK-NOT: <2 x float> @llvm.minnum.v2f32
+; CHECK: call <2 x float> @llvm.minnum.v2f32
+; CHECK: call float @llvm.vector.reduce.fmin.v2f32
define float @fmin_intrinsic_nofast(ptr nocapture readonly %x) {
entry:
br label %for.body
@@ -1022,7 +1023,8 @@ for.body: ; preds = %entry, %for.body
}
; CHECK-LABEL: @fmax_intrinsic_nofast(
-; CHECK-NOT: <2 x float> @llvm.maxnum.v2f32
+; CHECK: call <2 x float> @llvm.maxnum.v2f32
+; CHECK: call float @llvm.vector.reduce.fmax.v2f32
define float @fmax_intrinsic_nofast(ptr nocapture readonly %x) {
entry:
br label %for.body
|
@llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesWe do not need to require nsz or nnan for FP reductions with minnum and maxnum. NaNs and nsz are handled consistently with the vector versions of minnum and maxnum. vector.reduce.(fmin|fmax) also matches the minnum/maxnum semantics for comparisons. IIUC this was also the conclusion when support for minimum/maximum was added (https://reviews.llvm.org/D151482?id=531021#inline-1481555). Alive2 agrees that maxnum/minnum can be re-ordered (scalar version: https://alive2.llvm.org/ce/z/GVmgBX) and also verifies the vectorized code end-to-end, with a tweak to replace llvm.reduce.fmax with a scalar maxnum, as Alive2 doesn't support llvm.reduce.fmax: https://alive2.llvm.org/ce/z/EwJKeJ . Note that verification requires a higher timeout than available in the online version. Full diff: https://github.com/llvm/llvm-project/pull/137003.diff 2 Files Affected:
diff --git a/llvm/lib/Analysis/IVDescriptors.cpp b/llvm/lib/Analysis/IVDescriptors.cpp
index 94c347b01bbfb..4a6efd0ca821a 100644
--- a/llvm/lib/Analysis/IVDescriptors.cpp
+++ b/llvm/lib/Analysis/IVDescriptors.cpp
@@ -892,10 +892,13 @@ RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
return true;
if (isa<FPMathOperator>(I) && I->hasNoNaNs() && I->hasNoSignedZeros())
return true;
- // minimum and maximum intrinsics do not require nsz and nnan flags since
- // NaN and signed zeroes are propagated in the intrinsic implementation.
+ // minimum/minnum and maximum/maxnum intrinsics do not require nsz and nnan
+ // flags since NaN and signed zeroes are propagated in the intrinsic
+ // implementation.
return match(I, m_Intrinsic<Intrinsic::minimum>(m_Value(), m_Value())) ||
- match(I, m_Intrinsic<Intrinsic::maximum>(m_Value(), m_Value()));
+ match(I, m_Intrinsic<Intrinsic::maximum>(m_Value(), m_Value())) ||
+ match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())) ||
+ match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value()));
};
if (isIntMinMaxRecurrenceKind(Kind) ||
(HasRequiredFMF() && isFPMinMaxRecurrenceKind(Kind)))
diff --git a/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll b/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
index 85a90f2e04c5e..97b65e3435b5a 100644
--- a/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
@@ -1002,7 +1002,8 @@ for.body: ; preds = %entry, %for.body
}
; CHECK-LABEL: @fmin_intrinsic_nofast(
-; CHECK-NOT: <2 x float> @llvm.minnum.v2f32
+; CHECK: call <2 x float> @llvm.minnum.v2f32
+; CHECK: call float @llvm.vector.reduce.fmin.v2f32
define float @fmin_intrinsic_nofast(ptr nocapture readonly %x) {
entry:
br label %for.body
@@ -1022,7 +1023,8 @@ for.body: ; preds = %entry, %for.body
}
; CHECK-LABEL: @fmax_intrinsic_nofast(
-; CHECK-NOT: <2 x float> @llvm.maxnum.v2f32
+; CHECK: call <2 x float> @llvm.maxnum.v2f32
+; CHECK: call float @llvm.vector.reduce.fmax.v2f32
define float @fmax_intrinsic_nofast(ptr nocapture readonly %x) {
entry:
br label %for.body
|
I'm not sure how Alive2 came to this conclusion, but I don't think it's correct. According to the LangRef definition of %max = call float @llvm.maxnum.f32(float %a, float %b ) will return %max = call float @llvm.maxnum.f32(float %b, float %c) will return qNaN. I'm not sure how the LangRef defintion of |
@andykaylor This is the result of a recent (and imho misguided) change to LangRef, see #112852. Probably alive2 implements the old semantics. |
That change is, at the very least, troubling. We've added more rules about the handling of NaNs over time, but this is the only instance I'm aware of where sNaN and qNaN result in such different value results. @jcranmer-intel seemed to be against the change, while @arsenm was in favor, so they may want to comment here. If that change to the LangRef stands, then this PR has a problem. I am sympathetic to the point @arsenm makes that if we're going to take the name of an IEEE-754 operation then we should do what IEEE-754 says the operation does, but the current definition still has an exception to that. |
Correct, IEEE really botched this operation definition. It fundamentally has the opposite behavior with a signaling and quiet nans and I don't see any other way around it. It is always going to be a special case and cannot abide by the "treat as-if it were quiet" handling. llvm.minimumnum is a much more sensible operation where it does work for the weird ignored-nan handling (although it still feels like a legacy concern, llvm.minimum just has completely normal nan handling). |
I mentioned this to Florian in another channel, but:
|
Multiply by 1.0 is likely to be optimized away. I think you'd need |
Sorry, I’m talking about what we need at the machine level in the example that Florian gave me. At the IR level, you’re right that’s a canonicalize. |
I don't think we should be inserting canonicalize in the IR for this. You can switch to llvm.minimumnum + nsz to get ignored-signalingness behavior plus the fuzzy signed 0 handling of C fmin. In terms of usability, we mostly shouldn't use llvm.minnum other than it's handy in the lowering path for the common case of targets with the old style operations |
Works for me, as long as we can codegen it well (armv8 at least seems to, from a quick experiment). |
Ok, so it sounds like the new intrinsics are the way to go. I'll add support for maximumnum/minimumnum reductions and update Clang to emit use One thing to note is that GCC uses fmaxnm on AArch64 for fmax, which appears incorrect as per the discussion above: https://godbolt.org/z/W3z1G1z3c |
No, not incorrect. Or depends on your reading of the spec and at what point in time. glibc had the snan-as-qnan behavior until https://sourceware.org/bugzilla/show_bug.cgi?id=20947. The C99 wording was less clear than the C23 wording, which made it more clear that fmin/fmax should follow the IEEE-754 2008 behavior that llvm.maxnum is now supposed to have |
Or maybe it was just clarified that signaling nans will be quieted, I'm not sure without rereading all that |
Yeah - My understanding was that the behaviour of fmax in glibc is equivalent to maxnum, which is equivalent to fmaxnm. So the patch in #112852 brought the documentation in line how it always worked for AArch64/Arm, and what fmax should always have produced. It would certainly be nice for min/max reductions to vectorize without nonan, but generating 3 instructions for every fmin/fmax would be a shame too. (Hopefully some of them can detect no-snan from the generating instruction and not hit the issue, but it likely won't help everywhere). |
I guess the question is if there's anything we could do to still match GCC's behavior? It would indeed be a shame if we would have to generate more instructions than gcc ;) For the vector-reduction case, we will need to generate one extra op to silence the NaN, as the reduction value is guaranteed no-snan, if the start value isnt' snan. |
I guess with the new intrinsics definitions, generating |
My understanding is that it is now correct. With the new definition glibc fmax() == llvm.maxnum == aarch64 fmaxnm. GCC might have an equally troubled past with snan as clang, and treats it more like nan under fast-math. (But not quite like poison, that goes too far some times). They have a I would be happy if snan were relegated to require strict-fp and 'unsupported' otherwise, but I'm not sure people would really like that response. It would allow all architectures to be happier though, with those supporting 2008 fmax behaviour and those supporting the newer ieee 2019 to both produce good code generation. |
Do you have a motivating case? Does it look like |
We should vectorize both, my motivating case is the first one. But even when using |
Good point, yeah. |
We should be able to avoid the canonicalizes in the minimumnum codegen in most cases. In AMDGPU we have a DAG optimization to try to prune out redundant canonicalizes during selection. It's theoretically broken if an incoming FP operation that should canonicalizes ends up dropping the canonicalize later. If we strengthened the CodeGen rules to disallow the canonicalize dropping we permit in the IR, we could do a better job cleaning these out in the combiner. |
We do not need to require nsz or nnan for FP reductions with minnum and maxnum. NaNs and nsz are handled consistently with the vector versions of minnum and maxnum. vector.reduce.(fmin|fmax) also matches the minnum/maxnum semantics for comparisons.
IIUC this was also the conclusion when support for minimum/maximum was added (https://reviews.llvm.org/D151482?id=531021#inline-1481555).
Alive2 agrees that maxnum/minnum can be re-ordered (scalar version: https://alive2.llvm.org/ce/z/GVmgBX) and also verifies the vectorized code end-to-end, with a tweak to replace llvm.reduce.fmax with a scalar maxnum, as Alive2 doesn't support llvm.reduce.fmax: https://alive2.llvm.org/ce/z/EwJKeJ . Note that verification requires a higher timeout than available in the online version.