JIT: Allow more containment opts in Tier0 #117622
We view this as an explicit improvement; the real "issue" is that SPMI doesn't surface any savings in data section size. That is, while the codegen is 1-2 bytes bigger, we save 8-60 bytes of data section and improve cache locality.
Ah, I see. Yeah, in general we want to prefer loads from arbitrary memory, then broadcastable constants, then regular constants.
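To make the preference order above concrete, here is a minimal sketch in Python rather than the JIT's own C++. The names `OperandKind` and `pick_contained` are hypothetical and do not correspond to anything in the runtime; the sketch only models the ranking described in the comment, not the actual lowering logic.

```python
from enum import IntEnum


class OperandKind(IntEnum):
    # Hypothetical ranking: a lower value is a more preferred containment candidate.
    MEMORY_LOAD = 0          # load from arbitrary memory
    BROADCAST_CONSTANT = 1   # constant that can be expressed as a scalar broadcast
    REGULAR_CONSTANT = 2     # full-width vector constant
    REGISTER = 3             # value already in a register; nothing to contain


def pick_contained(kinds):
    """Return the index of the operand to contain, or None if none qualifies.

    Ties are broken in favor of the earlier operand.
    """
    candidates = [(kind, index) for index, kind in enumerate(kinds)
                  if kind != OperandKind.REGISTER]
    if not candidates:
        return None
    return min(candidates)[1]


# A memory load wins over a regular constant even when it is the second operand.
choice = pick_contained([OperandKind.REGULAR_CONSTANT, OperandKind.MEMORY_LOAD])
```

In this model, picking an operand that is not in the position the instruction can encode as memory would require swapping the arguments, which is only valid for commutative operations; that detail is left out of the sketch.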
Disabled the aligned load containment. Diffs are smaller but still a net improvement.
I've split the |
LGTM. CC @dotnet/jit-contrib for secondary review.
/ba-g unrelated arm64 timeouts
This enables embedded broadcast of non-const values in Tier0
Diffs are a net improvement, although there are a few regressions where an extra temp ends up being introduced due to arg swapping.
There are also a few 1- or 2-byte regressions where we swapped from containing a full vector load arg to containing a broadcast arg, which then forces EVEX encoding. It would be interesting to look at optimizing around that separately, since it would impact FullOpts as well.
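The 1- or 2-byte figure is consistent with the x86 prefix sizes: a VEX prefix is 2 or 3 bytes, while an EVEX prefix (required for embedded broadcast) is always 4 bytes. The sketch below, in Python with the hypothetical helper `evex_prefix_growth`, shows only that prefix arithmetic. It assumes the rest of the instruction (opcode, ModRM, displacement) is unchanged, which may not hold in real diffs.

```python
VEX2_PREFIX_BYTES = 2  # C5 + 1 payload byte
VEX3_PREFIX_BYTES = 3  # C4 + 2 payload bytes
EVEX_PREFIX_BYTES = 4  # 62 + 3 payload bytes


def evex_prefix_growth(vex_prefix_bytes):
    """Return how many bytes the prefix grows when a VEX form becomes EVEX."""
    if vex_prefix_bytes not in (VEX2_PREFIX_BYTES, VEX3_PREFIX_BYTES):
        raise ValueError("a VEX prefix is either 2 or 3 bytes long")
    return EVEX_PREFIX_BYTES - vex_prefix_bytes


growth_from_vex2 = evex_prefix_growth(VEX2_PREFIX_BYTES)
growth_from_vex3 = evex_prefix_growth(VEX3_PREFIX_BYTES)
```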