Allow the vector and scalar code paths for min, max, and fma APIs to share logic #116804
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Force-pushed from dca4623 to 4eb60ff
Force-pushed from 4eb60ff to 2ef03f8
Force-pushed from 682ef24 to 0ac1e3f
Pull Request Overview
This PR consolidates the vector and scalar code paths for math intrinsics (min, max, and FMA) by sharing common logic across different platforms and updating the intrinsic lowering functions accordingly. Key changes include:
- Removal of duplicate intrinsic case blocks in ValueNumStore and selective compilation using TARGET_RISCV64.
- Renaming and refactoring of the FMA lowering method from LowerFusedMultiplyAdd to LowerFusedMultiplyOp.
- Updates to intrinsic lists and helper APIs to support unified handling for min/max variants (the scalar semantics these variants must agree on are sketched below).
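As background for the variants this list refers to, here is a minimal self-contained C++ sketch (my own reference code, not from the PR; function names are hypothetical) of the scalar semantics the shared paths must preserve, namely NaN propagation vs. NaN filtering and the ordering of signed zeros:

```cpp
#include <cmath>
#include <cstdio>
#include <limits>

// IEEE 754 "maximum": NaN-propagating, and +0.0 beats -0.0.
double MaxIeee(double x, double y)
{
    if (std::isnan(x) || std::isnan(y))
        return std::numeric_limits<double>::quiet_NaN();
    if (x == y)
        return std::signbit(x) ? y : x; // distinguishes -0.0 from +0.0
    return (x > y) ? x : y;
}

// IEEE 754 "maximumNumber": NaN-filtering (a lone NaN input is ignored).
double MaxNumber(double x, double y)
{
    if (std::isnan(x))
        return y;
    if (std::isnan(y))
        return x;
    return MaxIeee(x, y);
}

int main()
{
    std::printf("%f\n", MaxIeee(-0.0, +0.0));          // 0.000000 (+0.0 wins)
    std::printf("%f\n", MaxNumber(std::nan(""), 1.0)); // 1.000000 (NaN filtered)
}
```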
Reviewed Changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated no comments.
| File | Description |
| --- | --- |
| src/coreclr/jit/valuenum.cpp | Removed redundant intrinsic cases for max/min and encapsulated platform-specific handling. |
| src/coreclr/jit/lowerxarch.cpp | Renamed and refactored the FMA lowering routine to share logic among different intrinsics. |
| src/coreclr/jit/namedintrinsiclist.h | Added new intrinsic entries (e.g. NI_System_Math_MaxNative/MinNative) for unified API support; the assumed "native" semantics are sketched after this table. |
| src/coreclr/jit/hwintrinsiclistxarch.h | Updated intrinsic declarations to include new min/max variants. |
| src/coreclr/jit/compiler.h | Consolidated helper API signatures for SIMD min/max nodes into unified functions. |
| Other files | Adjusted intrinsic processing and lowering logic across xarch, arm64, and codegen modules. |
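For the NI_System_Math_MaxNative/MinNative entries above, here is a hypothetical scalar illustration of what a "native" variant can legitimately return. The exact contract is an assumption on my part, but x86's MAXSD instruction computes `x > y ? x : y`, so NaN and signed-zero handling follow the compare rather than IEEE 754 rules:

```cpp
#include <cmath>
#include <cstdio>

// Assumed model of a "native" max: whatever the target's plain max
// instruction does. Mirrors x86 MAXSD, where the compare is false for
// NaN inputs and for -0.0 vs +0.0, so the second operand wins.
double MaxNativeX86Like(double x, double y)
{
    return (x > y) ? x : y;
}

int main()
{
    std::printf("%f\n", MaxNativeX86Like(-0.0, +0.0));        // +0.0 (second operand)
    std::printf("%f\n", MaxNativeX86Like(1.0, std::nan(""))); // nan  (second operand)
}
```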
Comments suppressed due to low confidence (3)
src/coreclr/jit/lowerxarch.cpp:1386

- Renaming the intrinsic lowering method to LowerFusedMultiplyOp improves clarity; please ensure that all call sites and related documentation are updated accordingly.

```cpp
void Lowering::LowerFusedMultiplyOp(GenTreeHWIntrinsic* node)
```
src/coreclr/jit/compiler.h:3338

- With the introduction of unified APIs for min/max intrinsics, please confirm that all consumers of the previous gtNewSimdMaxNode/gtNewSimdMinNativeNode functions have been updated to use the new unified APIs.

```cpp
bool isNumber);
```
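As a side note on the LowerFusedMultiplyOp rename flagged above: one plausible reading is that a single routine can serve the whole fused-multiply family, since each variant is an FMA with some operands negated. A scalar C++ model of that identity (my sketch, not the JIT code):

```cpp
#include <cmath>
#include <cstdio>

// One routine covers the family by tracking two negation bits:
//   negateProduct=false, negateAddend=false:  a * b + c  (fmadd)
//   negateProduct=false, negateAddend=true:   a * b - c  (fmsub)
//   negateProduct=true,  negateAddend=false: -a * b + c  (fnmadd)
//   negateProduct=true,  negateAddend=true:  -a * b - c  (fnmsub)
double FusedMultiplyOp(double a, double b, double c, bool negateProduct, bool negateAddend)
{
    return std::fma(negateProduct ? -a : a, b, negateAddend ? -c : c);
}

int main()
{
    std::printf("%f\n", FusedMultiplyOp(2.0, 3.0, 1.0, false, true)); // 5.000000
}
```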
CC @dotnet/jit-contrib, @jakobbotsch, @kunalspathak for review. Diffs are very positive. We see -77.3k bytes for Windows Arm64 in full opts due to the significantly more compact codegen compared to the naive inlined version. There is a +11.6k regression in minopts, which comes from expanding the intrinsics instead of emitting calls. The diffs for x64 are much smaller since we were already accelerating most scalar scenarios. There are some opportunities to improve the codegen further, but that would involve more work than this change, which mostly centralizes the existing logic. Throughput ranges from a slight improvement (
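For context, this is roughly the shape of the "naive inlined version" referenced above (an assumed pattern, for illustration; the helper name is mine). It is correct but costs four branches per call site, which is what the centralized branchless expansion replaces:

```cpp
#include <cmath>

// Assumed shape of the branchy scalar expansion: NaN propagation,
// signed-zero handling, and ordering each take a branch.
double MaxBranchy(double x, double y)
{
    if (std::isnan(x))
        return x; // propagate NaN
    if (std::isnan(y))
        return y;
    if (x == y)
        return std::signbit(x) ? y : x; // prefer +0.0 over -0.0
    return (x > y) ? x : y;
}
```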
Looks like it generated code for AVX2 rather than AVX512 (likely just a difference in Azure machine allocation):

```diff
; Windows x64 - System.MathBenchmarks.Single:MaxTest() (Instrumented Tier0)
  vaddsd      xmm0, xmm0, xmm2
+ vmovaps     xmm5, xmm3
+ vmovaps     xmm16, xmm0
+ vrangesd    xmm17, xmm5, xmm16, 5
+ vfixupimmsd xmm5, xmm16, xmm4, 0
+ vfixupimmsd xmm17, xmm5, xmm4, 0
  vaddsd      xmm6, xmm17, xmm6
```

```diff
; Linux x64 - System.MathBenchmarks.Single:MinTest() (FullOpts)
+ vsubss    xmm1, xmm1, xmm2
+ vmovaps   xmm4, xmm3
+ vmovaps   xmm5, xmm1
+ vcmpps    xmm6, xmm4, xmm5, 0
+ vxorps    xmm7, xmm7, xmm7
+ vpcmpgtd  xmm7, xmm7, xmm4
+ vandps    xmm6, xmm7, xmm6
+ vcmpps    xmm7, xmm4, xmm4, 4
+ vorps     xmm6, xmm7, xmm6
+ vcmpps    xmm7, xmm4, xmm5, 1
+ vorps     xmm6, xmm7, xmm6
+ vblendvps xmm4, xmm5, xmm4, xmm6
+ vaddss    xmm0, xmm4, xmm0
```

The code is still much better than the previous pattern; it's just larger than the naive code with 4+ branches. This also means more call sites are inlined, whereas previously they were calls.
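To make the Linux x64 sequence easier to follow, here is my reading of it as scalar C++ (a sketch, not generated from the JIT): the three "take x" conditions are folded into one mask and the final select maps to vblendvps, so there are no branches:

```cpp
#include <cmath>

// Branch-free min matching the compare/blend sequence above.
double MinBranchless(double x, double y)
{
    bool takeX = (x < y)                        // ordinary ordering    (vcmpps ..., 1)
              || std::isnan(x)                  // propagate NaN from x (vcmpps ..., 4)
              || ((x == y) && std::signbit(x)); // min(-0.0, +0.0) is -0.0 (vcmpps ..., 0 + vpcmpgtd sign test)
    return takeX ? x : y; // emitted as vblendvps, not a jump; a NaN in y flows through the else side
}
```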
/azp run Antigen, Fuzzlyn
Azure Pipelines successfully started running 2 pipeline(s).
LGTM as long as Antigen/Fuzzlyn is green.
/azp run Antigen, Fuzzlyn
Azure Pipelines successfully started running 2 pipeline(s).
This fixes #116803
This fixes #115381