
Carry ExtractMostSignificantBits through to LIR and add constant folding support #117673


Merged
merged 4 commits into from
Jul 16, 2025


tannergooding
Member

@tannergooding tannergooding commented Jul 15, 2025

This doesn't update the IR to take advantage of any special patterns yet.

It does, however, simplify the Arm64 codegen for Vector128<byte>.ExtractMostSignificantBits.

The logic was previously:

    op1 = op1 & Vector128.Create<ulong>(0x8080808080808080).AsByte();
    op1 = AdvSimd.ShiftLogical(op1, Vector128.Create<ulong>(0x00FFFEFDFCFBFAF9).AsSByte());

    return (uint)((Vector64.Sum(op1.GetUpper()) << 8) | Vector64.Sum(op1.GetLower()));
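
As a cross-check of the lane arithmetic, the sequence above can be modeled with scalar C++ (the JIT's own implementation language). This is an illustrative sketch, not JIT code: referenceMsb, originalSequence, and the V128 alias are hypothetical names, and a 16-byte std::array with lane 0 first stands in for Vector128<byte>. Each masked lane is either 0x80 or 0, and the shift counts encoded in 0x00FFFEFDFCFBFAF9 are -7 through 0 (a negative ushl count shifts right), so lane i ends up holding only bit i % 8 and each 8-lane half sums to its byte of the mask with no carries.

```cpp
#include <array>
#include <cstdint>

using V128 = std::array<uint8_t, 16>;  // hypothetical stand-in: lane 0 first, like Vector128<byte>

// Reference definition: bit i of the result is the top bit of lane i.
uint32_t referenceMsb(const V128& v) {
    uint32_t result = 0;
    for (int i = 0; i < 16; i++) {
        result |= static_cast<uint32_t>(v[i] >> 7) << i;
    }
    return result;
}

// Scalar model of the original sequence: and, ushl, two 8-lane byte sums, orr.
uint32_t originalSequence(const V128& v) {
    V128 t{};
    for (int i = 0; i < 16; i++) {
        uint8_t masked = v[i] & 0x80;        // and with 0x80 in every byte
        int shiftRight = 7 - (i % 8);        // ushl count for this lane is -(7 - i % 8)
        t[i] = static_cast<uint8_t>(masked >> shiftRight);
    }
    uint8_t lowerSum = 0;                    // addv b17, v17.8b (byte-wide, wraps)
    uint8_t upperSum = 0;                    // ext + addv b16, v16.8b
    for (int i = 0; i < 8; i++) {
        lowerSum = static_cast<uint8_t>(lowerSum + t[i]);
        upperSum = static_cast<uint8_t>(upperSum + t[i + 8]);
    }
    return (static_cast<uint32_t>(upperSum) << 8) | lowerSum;  // orr w0, w0, w1, LSL #8
}
```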

The updated logic is:

    op1 = op1 & Vector128.Create<ulong>(0x8080808080808080).AsByte();
    op1 = AdvSimd.ShiftLogical(op1, Vector128.Create<ulong>(0x00FFFEFDFCFBFAF9).AsSByte());

    var tmp = AdvSimd.ZeroExtendWideningUpper(op1);
    tmp = AdvSimd.ShiftLeftLogical(tmp, 8);
    tmp = AdvSimd.AddWideningLower(tmp, op1.GetLower());

    return Vector128.Sum(tmp);
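
A scalar C++ model of the updated sequence shows why it still produces the full 16-bit mask. Again this is a sketch under stated assumptions, not JIT code: updatedSequence, referenceMsb, and V128 are hypothetical names, with a 16-byte std::array standing in for Vector128<byte>. After the and/ushl step, lane i + 8 holds only bit i; zero-extending it to 16 bits and shifting left by 8 moves that to bit i + 8, and adding lower lane i (which holds bit i) fills in the low byte. Every 16-bit lane contributes distinct bits, so one horizontal add over eight 16-bit lanes yields the whole mask.

```cpp
#include <array>
#include <cstdint>

using V128 = std::array<uint8_t, 16>;  // hypothetical stand-in: lane 0 first, like Vector128<byte>

// Reference definition: bit i of the result is the top bit of lane i.
uint32_t referenceMsb(const V128& v) {
    uint32_t result = 0;
    for (int i = 0; i < 16; i++) {
        result |= static_cast<uint32_t>(v[i] >> 7) << i;
    }
    return result;
}

// Scalar model of the updated sequence: and, ushl, uxtl2, shl, uaddw, one addv.
uint32_t updatedSequence(const V128& v) {
    V128 t{};
    for (int i = 0; i < 16; i++) {
        t[i] = static_cast<uint8_t>((v[i] & 0x80) >> (7 - (i % 8)));  // and + ushl, unchanged
    }
    std::array<uint16_t, 8> wide{};
    for (int i = 0; i < 8; i++) {
        uint16_t upper = t[i + 8];                      // ZeroExtendWideningUpper (uxtl2)
        upper = static_cast<uint16_t>(upper << 8);      // ShiftLeftLogical(tmp, 8) (shl #8)
        wide[i] = static_cast<uint16_t>(upper + t[i]);  // AddWideningLower (uaddw)
    }
    uint16_t sum = 0;                                   // Vector128.Sum (addv h16, v16.8h)
    for (uint16_t lane : wide) {
        sum = static_cast<uint16_t>(sum + lane);
    }
    return sum;
}
```

This is what lets a single 16-bit addv replace the two byte-wise addv reductions and the final orr of the original code.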

The original logic would generate:

            movi    v16.16b, #0x80
            and     v16.16b, v0.16b, v16.16b
            ldr     q17, [@RWD00]
            ushl    v16.16b, v16.16b, v17.16b
            mov     v17.16b, v16.16b
            addv    b17, v17.8b
            umov    w0, v17.b[0]
            ext     v16.16b, v16.16b, v16.16b, #8
            addv    b16, v16.8b
            umov    w1, v16.b[0]
            orr     w0, w0, w1,  LSL #8

The new logic generates smaller code (9 instructions instead of 11) and avoids the second, more expensive addv instruction:

            movi    v16.16b, #0x80
            and     v16.16b, v0.16b, v16.16b
            ldr     q17, [@RWD00]
            ushl    v16.16b, v16.16b, v17.16b
            uxtl2   v17.8h, v16.16b
            shl     v17.8h, v17.8h, #8
            uaddw   v16.8h, v17.8h, v16.8b
            addv    h16, v16.8h
            umov    w0, v16.h[0]

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 15, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding
Member Author

Diffs look positive and it's a nice throughput improvement as well.

The few regressions come from places where there are multiple ExtractMostSignificantBits() calls and the relevant constants can no longer be CSE'd, which is just another variant of #70182.

We should get even bigger wins by adding optimizations for particular x.ExtractMostSignificantBits() patterns (e.g. cases like x.ExtractMSB() == 0).
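
The x.ExtractMostSignificantBits() == 0 case is a natural candidate because the comparison only asks whether any lane has its top bit set, so placing each lane's bit into the mask is wasted work. The following is a hedged scalar C++ sketch of that equivalence for byte lanes; it is not something this PR implements, viaMask, msbMaskIsZero, and V128 are hypothetical names, and how the JIT might eventually lower such a pattern is not specified here.

```cpp
#include <array>
#include <cstdint>

using V128 = std::array<uint8_t, 16>;  // hypothetical stand-in for Vector128<byte>

// Literal form: build the full mask, then compare it with zero.
bool viaMask(const V128& v) {
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        mask |= static_cast<uint32_t>(v[i] >> 7) << i;
    }
    return mask == 0;
}

// Hypothetical simplified form: OR all lanes together and test bit 7 once,
// with no per-lane shifting or bit placement.
bool msbMaskIsZero(const V128& v) {
    uint8_t acc = 0;
    for (uint8_t lane : v) {
        acc |= lane;
    }
    return (acc & 0x80) == 0;
}
```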

@tannergooding
Member Author

/azp run Fuzzlyn

@tannergooding tannergooding marked this pull request as ready for review July 15, 2025 18:40
@tannergooding tannergooding requested review from Copilot and EgorBo July 15, 2025 18:40

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR carries the ExtractMostSignificantBits intrinsic through to LIR (the JIT's linear IR) and adds constant folding support. The main goal is to enable better code generation for SIMD operations that extract the most significant bit of each vector element.

  • Removes early expansion of ExtractMostSignificantBits intrinsics during the import phase
  • Adds constant folding capabilities for ExtractMostSignificantBits in both value numbering and expression folding
  • Implements LIR-level rewriting for ExtractMostSignificantBits with platform-specific optimizations
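
Constant folding works because, once the operand is a known constant, the result is just the top bit of each lane packed into an integer. The C++ below is a minimal sketch of that computation, assuming integer lanes held in a std::array; evaluateExtractMsbSketch is a hypothetical name and is not the EvaluateExtractMSB helper added to simd.h, and floating-point lanes (where the sign bit is used the same way) are omitted for brevity.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch of folding ExtractMostSignificantBits over a constant vector.
// TElem is the integer lane type; N is the number of lanes.
template <typename TElem, std::size_t N>
uint64_t evaluateExtractMsbSketch(const std::array<TElem, N>& lanes) {
    static_assert(std::is_integral<TElem>::value, "integer lanes only in this sketch");
    static_assert(N <= 64, "the packed mask must fit in 64 bits");
    using U = typename std::make_unsigned<TElem>::type;
    constexpr unsigned topBit = sizeof(TElem) * 8 - 1;

    uint64_t result = 0;
    for (std::size_t i = 0; i < N; i++) {
        U bits = static_cast<U>(lanes[i]);                       // reinterpret lane as unsigned
        result |= static_cast<uint64_t>((bits >> topBit) & 1u) << i;  // top bit -> bit i
    }
    return result;
}
```

Returning uint64_t leaves room for up to 64 lanes; a caller would narrow the result to the intrinsic's actual return type.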

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

  • src/coreclr/jit/valuenum.cpp: Adds constant folding support for ExtractMostSignificantBits in value numbering
  • src/coreclr/jit/simd.h: Implements EvaluateExtractMSB template functions for constant evaluation
  • src/coreclr/jit/rationalize.h: Declares RewriteHWIntrinsicExtractMsb method for LIR rewriting
  • src/coreclr/jit/rationalize.cpp: Implements platform-specific LIR rewriting for ExtractMostSignificantBits
  • src/coreclr/jit/hwintrinsicxarch.cpp: Removes early expansion logic for x86/x64 short/ushort cases
  • src/coreclr/jit/hwintrinsiclistxarch.h: Updates intrinsic flags to enable special import and disable early codegen
  • src/coreclr/jit/hwintrinsiclistarm64.h: Updates intrinsic flags to enable special import and disable early codegen
  • src/coreclr/jit/hwintrinsicarm64.cpp: Removes early expansion logic for ARM64 ExtractMostSignificantBits
  • src/coreclr/jit/gentree.cpp: Adds constant folding support for ExtractMostSignificantBits in expression folding

Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@tannergooding
Member Author

/ba-g unrelated android timeout and image acquisition failure that passed on last run.

@tannergooding tannergooding merged commit c0d4efe into dotnet:main Jul 16, 2025
105 of 111 checks passed
@tannergooding tannergooding deleted the arm64-extractmsb branch July 16, 2025 14:38
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
2 participants