
JIT: Allow more containment opts in Tier0 #117622


Merged: 4 commits, merged Jul 22, 2025


@saucecontrol (Member) commented Jul 14, 2025

This enables embedded broadcast of non-const values in Tier0.

Diffs are a net improvement, although there are a few regressions where an extra temp ends up being introduced due to arg swapping.

There are also a few 1- or 2-byte regressions where we switched from containing a full vector load arg to containing a broadcast arg, which then forces EVEX encoding. It would be interesting to look at optimizing around that (separately, since it would impact FullOpts as well).

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 14, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 14, 2025
@saucecontrol (Member, Author)

cc @tannergooding

@tannergooding (Member)

> There are also a few 1- or 2-byte regressions where we swapped from containing a full vector load arg to containing a broadcast arg

We view this as an explicit improvement; the real "issue" is that SPMI (SuperPMI) doesn't surface the savings in data section size. That is, while the codegen is 1-2 bytes bigger, we save 8-60 bytes of data section and improve cache locality.
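The size trade-off described here can be checked with a little arithmetic. The sketch below is a hypothetical size model, not JIT code: the function and constant names are illustrative, and it assumes a RIP-relative memory operand (so the displacement has the same size under VEX and EVEX) and ignores data-section alignment padding. A VEX prefix is 2 or 3 bytes while an EVEX prefix is always 4, which accounts for the 1-2 byte code growth; replacing a 16- to 64-byte vector constant with a single 4- or 8-byte broadcast element accounts for the 8-60 byte data saving.

```cpp
// Hypothetical size model; names are illustrative, not JIT APIs.

// x86 prefix sizes: VEX is 2 or 3 bytes, EVEX is always 4 bytes.
constexpr int kVex2PrefixBytes = 2;
constexpr int kVex3PrefixBytes = 3;
constexpr int kEvexPrefixBytes = 4;

// Extra code bytes when an instruction that could use a VEX prefix must
// switch to EVEX (e.g. because its memory operand is an embedded broadcast).
// Assumes the rest of the encoding (opcode, ModRM, disp32) is unchanged.
constexpr int evexCodeGrowth(int vexPrefixBytes) {
    return kEvexPrefixBytes - vexPrefixBytes;
}

// Data-section bytes saved by storing one broadcast element instead of a
// full-width vector constant (alignment padding ignored).
constexpr int dataSectionSaving(int vectorBytes, int elementBytes) {
    return vectorBytes - elementBytes;
}

// The ranges quoted in the discussion: 1-2 code bytes vs 8-60 data bytes.
static_assert(evexCodeGrowth(kVex3PrefixBytes) == 1, "3-byte VEX -> EVEX");
static_assert(evexCodeGrowth(kVex2PrefixBytes) == 2, "2-byte VEX -> EVEX");
static_assert(dataSectionSaving(16, 8) == 8, "Vector128 of 64-bit elements");
static_assert(dataSectionSaving(64, 4) == 60, "Vector512 of 32-bit elements");
```

The code growth only applies to 128- and 256-bit forms; 512-bit instructions require EVEX anyway, so those cases get the data saving with no extra code bytes.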

@saucecontrol (Member, Author)

> We view this as an explicit improvement and the real "issue" is more that SPMI doesn't surface any size savings in the data section size. -- That is, while the codegen is 1-2 bytes bigger, we save 8-60 bytes of data section size and improve cache locality.

The cases I'm referring to look like this:
[image]

Here it's a broadcast either way, and we can contain either the broadcast or the full vector. It's always 2 instructions because both can't be contained. Switching from containing the full vector to containing the broadcast forces EVEX encoding, so it's a net increase in size.

This particular regression only applies to instructions where we swap operands in order to be able to contain one, so I think we could simply give lower preference to CnsVec operands that might be turned into broadcasts. Or something like that?

@tannergooding (Member)

> This particular regression only applies to instructions where we swap operands in order to be able to contain one, so I think we could simply give lower preference to CnsVec operands that might be turned into broadcast. Or something like that?

Ah, I see.

Yeah, in general we want to prefer loads from arbitrary memory, then broadcastable constants, then regular constants.
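This preference order could be expressed as a ranking consulted when deciding whether to swap the operands of a commutative node. This is a minimal sketch under stated assumptions: `OperandKind`, `containmentRank`, and `shouldSwapForContainment` are hypothetical names, not the JIT's actual API, and it assumes that, as in x86 VEX/EVEX three-operand forms, only the last source operand can be a memory (contained) operand.

```cpp
// Hypothetical operand classification; not the JIT's actual GenTree types.
enum class OperandKind {
    Register,           // already in a register; nothing to contain
    RegularConstant,    // full-width vector constant (CnsVec) from the data section
    BroadcastConstant,  // constant with all elements equal; could use embedded broadcast
    MemoryLoad,         // load from arbitrary memory
};

// Higher rank = more preferred for containment, following the stated order:
// loads from arbitrary memory, then broadcastable constants, then regular constants.
constexpr int containmentRank(OperandKind kind) {
    switch (kind) {
        case OperandKind::MemoryLoad:        return 3;
        case OperandKind::BroadcastConstant: return 2;
        case OperandKind::RegularConstant:   return 1;
        case OperandKind::Register:          return 0;
    }
    return 0;
}

// For a commutative binary op (op1, op2), return true if the operands should be
// swapped so the more preferred candidate lands in the containable (second) slot.
// Ties keep the original order.
constexpr bool shouldSwapForContainment(OperandKind op1, OperandKind op2) {
    return containmentRank(op1) > containmentRank(op2);
}

static_assert(shouldSwapForContainment(OperandKind::MemoryLoad, OperandKind::RegularConstant),
              "a memory load beats a regular constant");
static_assert(!shouldSwapForContainment(OperandKind::RegularConstant, OperandKind::RegularConstant),
              "ties do not swap");
```

Because ties leave the operands in place, an existing order is only changed when the swap moves a strictly better candidate into the containable slot.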

@saucecontrol (Member, Author)

Disabled the aligned load containment. Diffs are smaller but still a net improvement.

@saucecontrol (Member, Author)

I've split the TryFoldCnsVecForEmbeddedBroadcast changes out into #117700.

@tannergooding (Member) left a comment


LGTM. CC @dotnet/jit-contrib for secondary review.

@tannergooding (Member)

/ba-g unrelated arm64 timeouts

@tannergooding merged commit 0b2f272 into dotnet:main on Jul 22, 2025
102 of 110 checks passed
@saucecontrol deleted the more-t0-opts branch on July 22, 2025 04:09
3 participants