[IVDescriptors] Don't require nsz/nnan for (min|max)num. #137003

Closed

Conversation

fhahn
Contributor

@fhahn fhahn commented Apr 23, 2025

We do not need to require nsz or nnan for FP reductions with minnum and maxnum: NaNs and signed zeros are handled consistently by the scalar and vector versions of minnum and maxnum, and vector.reduce.(fmin|fmax) also matches the minnum/maxnum comparison semantics.

IIUC this was also the conclusion when support for minimum/maximum was added (https://reviews.llvm.org/D151482?id=531021#inline-1481555).

Alive2 agrees that maxnum/minnum can be re-ordered (scalar version: https://alive2.llvm.org/ce/z/GVmgBX) and also verifies the vectorized code end-to-end, with a tweak to replace llvm.vector.reduce.fmax with a scalar maxnum, as Alive2 doesn't support llvm.vector.reduce.fmax: https://alive2.llvm.org/ce/z/EwJKeJ. Note that verification requires a higher timeout than is available in the online version.
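For illustration, the kind of re-ordering being verified looks roughly like this (a minimal sketch with made-up value names; the actual inputs are in the Alive2 links above):

; chain in original order: maxnum(maxnum(%acc, %x0), %x1)
%m0 = call float @llvm.maxnum.f32(float %acc, float %x0)
%m1 = call float @llvm.maxnum.f32(float %m0, float %x1)

; re-ordered chain: maxnum(%acc, maxnum(%x0, %x1))
%n0 = call float @llvm.maxnum.f32(float %x0, float %x1)
%n1 = call float @llvm.maxnum.f32(float %acc, float %n0)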

@llvmbot
Member

llvmbot commented Apr 23, 2025

@llvm/pr-subscribers-llvm-analysis

Author: Florian Hahn (fhahn)


Full diff: https://github.com/llvm/llvm-project/pull/137003.diff

2 Files Affected:

  • (modified) llvm/lib/Analysis/IVDescriptors.cpp (+6-3)
  • (modified) llvm/test/Transforms/LoopVectorize/minmax_reduction.ll (+4-2)
diff --git a/llvm/lib/Analysis/IVDescriptors.cpp b/llvm/lib/Analysis/IVDescriptors.cpp
index 94c347b01bbfb..4a6efd0ca821a 100644
--- a/llvm/lib/Analysis/IVDescriptors.cpp
+++ b/llvm/lib/Analysis/IVDescriptors.cpp
@@ -892,10 +892,13 @@ RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
        return true;
      if (isa<FPMathOperator>(I) && I->hasNoNaNs() && I->hasNoSignedZeros())
        return true;
-     // minimum and maximum intrinsics do not require nsz and nnan flags since
-     // NaN and signed zeroes are propagated in the intrinsic implementation.
+     // minimum/minnum and maximum/maxnum intrinsics do not require nsz and nnan
+     // flags since NaN and signed zeroes are propagated in the intrinsic
+     // implementation.
      return match(I, m_Intrinsic<Intrinsic::minimum>(m_Value(), m_Value())) ||
-            match(I, m_Intrinsic<Intrinsic::maximum>(m_Value(), m_Value()));
+            match(I, m_Intrinsic<Intrinsic::maximum>(m_Value(), m_Value())) ||
+            match(I, m_Intrinsic<Intrinsic::minnum>(m_Value(), m_Value())) ||
+            match(I, m_Intrinsic<Intrinsic::maxnum>(m_Value(), m_Value()));
     };
     if (isIntMinMaxRecurrenceKind(Kind) ||
         (HasRequiredFMF() && isFPMinMaxRecurrenceKind(Kind)))
diff --git a/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll b/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
index 85a90f2e04c5e..97b65e3435b5a 100644
--- a/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
+++ b/llvm/test/Transforms/LoopVectorize/minmax_reduction.ll
@@ -1002,7 +1002,8 @@ for.body:                                         ; preds = %entry, %for.body
 }
 
 ; CHECK-LABEL: @fmin_intrinsic_nofast(
-; CHECK-NOT: <2 x float> @llvm.minnum.v2f32
+; CHECK: call <2 x float> @llvm.minnum.v2f32
+; CHECK: call float @llvm.vector.reduce.fmin.v2f32
 define float @fmin_intrinsic_nofast(ptr nocapture readonly %x) {
 entry:
   br label %for.body
@@ -1022,7 +1023,8 @@ for.body:                                         ; preds = %entry, %for.body
 }
 
 ; CHECK-LABEL: @fmax_intrinsic_nofast(
-; CHECK-NOT: <2 x float> @llvm.maxnum.v2f32
+; CHECK: call <2 x float> @llvm.maxnum.v2f32
+; CHECK: call float @llvm.vector.reduce.fmax.v2f32
 define float @fmax_intrinsic_nofast(ptr nocapture readonly %x) {
 entry:
   br label %for.body

@llvmbot
Member

llvmbot commented Apr 23, 2025

@llvm/pr-subscribers-llvm-transforms

@andykaylor
Contributor

Alive2 agrees that maxnum/minnum can be re-ordered (scalar version: https://alive2.llvm.org/ce/z/GVmgBX)

I'm not sure how Alive2 came to this conclusion, but I don't think it's correct. According to the LangRef definition of llvm.maxnum, "if either operand is sNaN, the result is qNaN" but "If either operand is a qNaN, returns the other non-NaN operand." So, if %a is sNaN and neither %b nor %c is NaN,

%max = call float @llvm.maxnum.f32(float %a, float %b )
%max2 = call float @llvm.maxnum.f32(float %c, float %max)

will return %c whereas

%max = call float @llvm.maxnum.f32(float %b, float %c)
%max2 = call float @llvm.maxnum.f32(float %a, float %max)

will return qNaN.

I'm not sure how the LangRef definition of llvm.maxnum squares with the general claim that "Floating-point math operations are allowed to treat all NaNs as if they were quiet NaNs."

@nikic
Contributor

nikic commented Apr 23, 2025

@andykaylor This is the result of a recent (and imho misguided) change to LangRef, see #112852. Probably alive2 implements the old semantics.

@andykaylor
Contributor

@andykaylor This is the result of a recent (and imho misguided) change to LangRef, see #112852. Probably alive2 implements the old semantics.

That change is, at the very least, troubling. We've added more rules about the handling of NaNs over time, but this is the only instance I'm aware of where sNaN and qNaN produce such different result values. @jcranmer-intel seemed to be against the change, while @arsenm was in favor, so they may want to comment here. If that change to the LangRef stands, then this PR has a problem.

I am sympathetic to the point @arsenm makes that if we're going to take the name of an IEEE-754 operation then we should do what IEEE-754 says the operation does, but the current definition still has an exception to that.

@arsenm
Contributor

arsenm commented Apr 23, 2025

We've added more rules about the handling of NaNs over time, but this is the only instance I'm aware of where sNaN and qNaN produce such different result values.

Correct, IEEE really botched this operation definition. It fundamentally has opposite behavior for signaling and quiet NaNs, and I don't see any other way around it. It is always going to be a special case and cannot abide by the "treat as-if it were quiet" handling. llvm.minimumnum is a much more sensible operation, where the weird ignored-NaN handling does work (although it still feels like a legacy concern; llvm.minimum just has completely normal NaN handling).

@stephentyrone
Contributor

I mentioned this to Florian in another channel, but:

  1. yes, fmaxnm/fminnm have always been fundamentally busted
  2. we can avoid the problem for the purposes of lowering C's fmax/fmin without no-honor-nans by multiplying by 1.0 before doing fmaxnm (the multiplication sends sNaN to qNaN, which will then be dropped when the other operand is a number).

@andykaylor
Contributor

2. we can avoid the problem for the purposes of lowering C's fmax/fmin without no-honor-nans by multiplying by 1.0 before doing fmaxnm (the multiplication sends sNaN to qNaN, which will then be dropped when the other operand is a number).

Multiply by 1.0 is likely to be optimized away. I think you'd need llvm.canonicalize instead.
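At the IR level, the idea would then be roughly the following (a hedged sketch with illustrative value names, not the actual lowering): quiet the potentially-signaling input with llvm.canonicalize so that only quiet NaNs reach the maxnum, which then drops them as intended.

; quiet a potential sNaN first; maxnum then drops the (now quiet) NaN
%v.q = call float @llvm.canonicalize.f32(float %v)
%max = call float @llvm.maxnum.f32(float %acc, float %v.q)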

@stephentyrone
Contributor

stephentyrone commented Apr 24, 2025

Sorry, I’m talking about what we need at the machine level in the example that Florian gave me. At the IR level, you’re right that’s a canonicalize.

@arsenm
Contributor

arsenm commented Apr 24, 2025

  2. we can avoid the problem for the purposes of lowering C's fmax/fmin without no-honor-nans by multiplying by 1.0 before doing fmaxnm (the multiplication sends sNaN to qNaN, which will then be dropped when the other operand is a number).

  At the IR level, you’re right that’s a canonicalize.

I don't think we should be inserting canonicalize in the IR for this. You can switch to llvm.minimumnum + nsz to get the ignored-signaling behavior plus the fuzzy signed-zero handling of C fmin. In terms of usability, we mostly shouldn't use llvm.minnum, other than that it's handy in the lowering path for the common case of targets with the old-style operations.
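For reference, that alternative looks something like this (a hedged sketch; operand names are illustrative):

; IEEE-754 2019 minimumNumber semantics: a NaN operand, signaling or quiet, is ignored
; when the other operand is a number; nsz relaxes the -0.0/+0.0 ordering, matching the
; fuzzy signed-zero handling of C fmin
%min = call nsz float @llvm.minimumnum.f32(float %a, float %b)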

@stephentyrone
Contributor

Works for me, as long as we can codegen it well (armv8 at least seems to, from a quick experiment).

@fhahn
Contributor Author

fhahn commented Apr 25, 2025

Ok, so it sounds like the new intrinsics are the way to go. I'll add support for maximumnum/minimumnum reductions and update Clang to emit maximumnum/minimumnum for fmax/fmin.

One thing to note is that GCC uses fmaxnm on AArch64 for fmax, which appears incorrect as per the discussion above: https://godbolt.org/z/W3z1G1z3c

@fhahn fhahn closed this Apr 25, 2025
@fhahn fhahn deleted the lv-vectorize-maxnum-reduction branch April 25, 2025 14:50
@arsenm
Contributor

arsenm commented Apr 25, 2025

One thing to note is that GCC uses fmaxnm on AArch64 for fmax, which appears incorrect as per the discussion above: https://godbolt.org/z/W3z1G1z3c

No, not incorrect. Or it depends on your reading of the spec and at what point in time. glibc had the sNaN-as-qNaN behavior until https://sourceware.org/bugzilla/show_bug.cgi?id=20947. The C99 wording was less clear than the C23 wording, which makes it clearer that fmin/fmax should follow the IEEE-754 2008 behavior that llvm.maxnum is now supposed to have.

@arsenm
Contributor

arsenm commented Apr 25, 2025


Or maybe it was just clarified that signaling NaNs will be quieted; I'm not sure without rereading all of that.

@davemgreen
Collaborator

Yeah - My understanding was that the behaviour of fmax in glibc is equivalent to maxnum, which is equivalent to fmaxnm. So the patch in #112852 brought the documentation in line with how it always worked for AArch64/Arm, and with what fmax should always have produced. It would certainly be nice for min/max reductions to vectorize without nnan, but generating 3 instructions for every fmin/fmax would be a shame too. (Hopefully some of them can detect no-sNaN from the generating instruction and not hit the issue, but it likely won't help everywhere.)

@fhahn
Contributor Author

fhahn commented Apr 25, 2025

Yeah - My understanding was that the behaviour of fmax in glibc is equivalent to maxnum, which is equivalent to fmaxnm. So the patch in #112852 brought the documentation in line with how it always worked for AArch64/Arm, and with what fmax should always have produced. It would certainly be nice for min/max reductions to vectorize without nnan, but generating 3 instructions for every fmin/fmax would be a shame too. (Hopefully some of them can detect no-sNaN from the generating instruction and not hit the issue, but it likely won't help everywhere.)

I guess the question is whether there's anything we could do to still match GCC's behavior? It would indeed be a shame if we had to generate more instructions than GCC ;)

For the vector-reduction case, we will need to generate one extra op to silence the NaN, as the reduction value is guaranteed not to be an sNaN if the start value isn't an sNaN.
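A rough sketch of one way to read "one extra op" (purely illustrative, not the eventual implementation): quiet the start value once outside the loop, e.g. with llvm.canonicalize, so that no sNaN can enter the reduction through the phi.

; before the loop: quiet a potentially-signaling start value once
%start.q = call float @llvm.canonicalize.f32(float %start)
; in the loop, the reduction phi then starts from the quieted value
%rdx = phi float [ %start.q, %entry ], [ %rdx.next, %loop ]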

@fhahn
Contributor Author

fhahn commented Apr 25, 2025

It would certainly be nice for min/max reductions to vectorize without nnan, but generating 3 instructions for every fmin/fmax would be a shame too.

I guess with the new intrinsic definitions, generating maxnum for fmax is technically incorrect and would have to be fixed regardless? Or, if we accept continuing to ignore sNaNs, we should also ignore them for reductions of maxnum, like GCC does.

@davemgreen
Collaborator

davemgreen commented Apr 25, 2025

I guess with the new intrinsic definitions, generating maxnum for fmax is technically incorrect and would have to be fixed regardless?

My understanding is that it is now correct. With the new definition, glibc fmax() == llvm.maxnum == AArch64 fmaxnm.

GCC might have an equally troubled past with sNaN as Clang, and treats it more like NaN under fast-math (but not quite like poison; that goes too far sometimes). They have a -fsignaling-nans option (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fsignaling-nans), which is still described as experimental. And https://gcc.gnu.org/wiki/FloatingPointMath lists what is and isn't supported under the default options. It "does not care about signalling NaNs" under the default FP model (which could be interpreted as a little different from treating them as numbers vs. NaNs in max).

I would be happy if sNaN were relegated to requiring strict-fp and were 'unsupported' otherwise, but I'm not sure people would really like that response. It would allow all architectures to be happier though, with those supporting the 2008 fmax behaviour and those supporting the newer IEEE 2019 behaviour both producing good code generation.

@davemgreen
Collaborator

For the vector-reduction case, we will need to generate one extra op to silence the NaN, as the reduction value is guaranteed not to be an sNaN if the start value isn't an sNaN.

Do you have a motivating case? Does it look like maxnum(reduct_iv, load) or something more complex like maxnum(reduct_iv, fp_op(..))? With the second, if you know it is not an sNaN, then I was wondering whether generating the reduction could still be valid.

@fhahn
Contributor Author

fhahn commented Apr 25, 2025

For the vector-reduction case, we will need to generate one extra op to silence the NaN, as the reduction value is guaranteed not to be an sNaN if the start value isn't an sNaN.

Do you have a motivating case? Does it look like maxnum(reduct_iv, load) or something more complex like maxnum(reduct_iv, fp_op(..))? With the second, if you know it is not an sNaN, then I was wondering whether generating the reduction could still be valid.

We should vectorize both; my motivating case is the first one. But even when using maximumnum, we should be able to lower maximumnum(rdx_phi, fp_op(..)) to fmaxnm without anything else. We could also do that with maxnum(reduct_iv, fp_op(..)), but it gets a bit fuzzy and relies on nothing removing the fp_op later, which would make things incorrect again.
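For reference, the two shapes being contrasted look roughly like this (a hedged sketch with illustrative names):

; shape 1: the non-reduction operand comes straight from memory and may be an sNaN
%v = load float, ptr %gep
%rdx.next = call float @llvm.maxnum.f32(float %rdx, float %v)

; shape 2: the non-reduction operand is produced by an FP op, which (per the discussion
; above) is not expected to hand an sNaN to the maxnum
%v2 = fadd float %a, %b
%rdx.next2 = call float @llvm.maxnum.f32(float %rdx, float %v2)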

@davemgreen
Collaborator

For the vector-reduction case, we will need to generate one extra op to silence the NaN, as the reduction value is guaranteed not to be an sNaN if the start value isn't an sNaN.

Do you have a motivating case? Does it look like maxnum(reduct_iv, load) or something more complex like maxnum(reduct_iv, fp_op(..))? With the second, if you know it is not an sNaN, then I was wondering whether generating the reduction could still be valid.

We should vectorize both; my motivating case is the first one. But even when using maximumnum, we should be able to lower maximumnum(rdx_phi, fp_op(..)) to fmaxnm without anything else. We could also do that with maxnum(reduct_iv, fp_op(..)), but it gets a bit fuzzy and relies on nothing removing the fp_op later, which would make things incorrect again.

Good point, yeah.

@arsenm
Contributor

arsenm commented Apr 25, 2025

For the vector-reduction case, we will need to generate one extra op to silence the NaN, as the reduction value is guaranteed not to be an sNaN if the start value isn't an sNaN.

We should be able to avoid the canonicalizes in the minimumnum codegen in most cases. In AMDGPU we have a DAG optimization to try to prune out redundant canonicalizes during selection. It's theoretically broken if an incoming FP operation that should canonicalize ends up dropping the canonicalize later. If we strengthened the CodeGen rules to disallow the canonicalize dropping we permit in the IR, we could do a better job cleaning these out in the combiner.
