Skip to content

Refactoring the xarch perfScore logic to use a lookup table where trivial #117123

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 29, 2025

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Jun 29, 2025

This does some basic cleanup of the xarch perfScore logic to use a lookup table for instructions where it's trivial to do. This makes it more clear that new instructions need to consider this field and makes it easier to see where custom handling does exist.

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 29, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding marked this pull request as ready for review June 29, 2025 15:30
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the x86/x64 instruction performance scoring by replacing many explicit getInsExecutionCharacteristics cases with two lookup tables (insLatencyInfos and insThroughputInfos) and updating the instruction macros to include latency (lat) and throughput (tp) parameters.

  • Updated INST0INST5 macros in instr.h and related uses in instr.cpp and emitxarch.cpp to accept lat and tp before tt and flags.
  • Introduced insLatencyInfos and insThroughputInfos tables in emitxarch.cpp, driving default latency/throughput.
  • Simplified the large switch in getInsExecutionCharacteristics to use the lookup tables in a unified default case, removing many individual case blocks.

Reviewed Changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated no comments.

File Description
src/coreclr/jit/instr.h Adjusted INST* macros to include new lat and tp fields
src/coreclr/jit/instr.cpp Updated name table macros to match the new macro signatures
src/coreclr/jit/emitxarch.cpp Added latency/throughput tables and replaced many cases with lookup in getInsExecutionCharacteristics
Comments suppressed due to low confidence (1)

@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib, @AndyAyersMS for review.

Just some cleanup/simplification of the perfScore logic for xarch to allow it being table driven where trivial to do.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect this to change perf scores anywhere?

@tannergooding
Copy link
Member Author

Do you expect this to change perf scores anywhere?

No, I explicitly kept them "as is" for now. I do think there's some changes we could make to normalize values a bit more, but that should be separate I think.

In particular I think the places we'd want to look at in the future would be resolving the mix between skylake-x (pre-AVX512) and icelake (post-AVX512) we currently have, which leads to some unintended noise. We also don't have any recognition of the AVX512 embedded mask/broadcast "costs" and have this nuance where contained loads/stores show up as a perf score regression in many cases due to it changing the throughput score

@tannergooding tannergooding merged commit b97f98b into dotnet:main Jun 29, 2025
111 of 113 checks passed
@tannergooding tannergooding deleted the perfscore branch June 29, 2025 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants