Adding push2/pop2 #116035
@dotnet/samsung Could you please take a look? These changes may be related to riscv64.
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
RISC-V Release-FX-QEMU: 283571 / 284631 (99.63%)
RISC-V Release-FX-VF2: 305223 / 306944 (99.44%)
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
RISC-V Release-FX-QEMU: 283315 / 284395 (99.62%)
RISC-V Release-FX-VF2: 299368 / 301100 (99.42%)
RISC-V Release-CLR-QEMU: 9084 / 9114 (99.67%)
RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
RISC-V Release-FX-QEMU: 260891 / 261953 (99.59%)
RISC-V Release-FX-VF2: 299817 / 301551 (99.42%)
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
RISC-V Release-FX-QEMU: 283658 / 284736 (99.62%)
RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
@kunalspathak @EgorBo This PR is ready for review.
RISC-V Release-FX-QEMU: 261199 / 262260 (99.60%)
RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
RISC-V Release-FX-QEMU: 283661 / 284755 (99.62%)
RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
RISC-V Release-FX-VF2: 307376 / 309101 (99.44%)
src/coreclr/jit/emitxarch.h
Outdated
}
else
{
    assert((instOptions & INS_OPTS_APX_ppx_MASK) == 0);
This is always true because it is inside the else of the opposite condition.
Removed else
src/coreclr/jit/emitxarch.cpp
Outdated
// Setting EVEX.W = 1 bit indicates a push-pop acceleration (PPX) hint
// The current recommendation is to use PUSH2/POP2 only with PPX hint
// So, it is used only in Epilog/Prolog code generation
if (id->idIsApxPpxContextSet())
Isn't it always true for push2 and pop2?
push2/pop2 are new instructions. PPX is a feature which push/pop/pop2/push2 can use. So theoretically, we might have a push2/pop2 without PPX enabled. We currently only use push2/pop2 with PPX since that's the guidance (the guidance is for performance reasons), but that doesn't mean push2/pop2 cannot be used without PPX.
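The distinction drawn in this reply can be sketched in a few lines of C++. This is only an illustrative model, not the JIT's code: `Ins`, `InsDesc`, `EvexWForPush2Pop2`, and `FollowsCurrentGuidance` are hypothetical names, and the only fact taken from the PR is that EVEX.W = 1 carries the PPX hint for push2/pop2.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for the JIT's instruction descriptor.
enum class Ins { Push2, Pop2 };

struct InsDesc
{
    Ins  ins;     // which instruction is being emitted
    bool ppxHint; // PPX is a separate property, not implied by the instruction
};

// EVEX.W value for a push2/pop2 descriptor: 1 requests the PPX hint, 0 does not.
// Both values describe a legal instruction; only the hint differs.
inline int EvexWForPush2Pop2(const InsDesc& id)
{
    return id.ppxHint ? 1 : 0;
}

// The guidance quoted in this thread: push2/pop2 should currently be paired with PPX.
// This is a policy check, separate from what the encoding permits.
inline bool FollowsCurrentGuidance(const InsDesc& id)
{
    return id.ppxHint;
}
```

In this model `{Ins::Push2, false}` still encodes (with W = 0) but fails the guidance check, which is the case the reviewer goes on to suggest guarding with an assert.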
Can we convert it to assert(id->idIsApxPpxContextSet()) then, and update it in the future when there is a need?
Done! Works for me. This means I have to remove the unit tests without PPX that I included to verify the encoding, since they would fail this assert.
Added some comments.
Superpmi result with/without PPX feature
Did you get a chance to check whether the entire perfscore change comes from the prolog and epilog? Also, is it because we have not set up the perf latency/throughput numbers accurately (I see a TODO there and added a comment about it), so that the perfscore change is just due to the reduced number of instructions?
I did check and it's entirely coming from The guidance I got was |
src/coreclr/jit/codegenxarch.cpp
Outdated
@@ -9404,6 +9404,15 @@ void CodeGen::genAmd64EmitterUnitTestsApx()
theEmitter->emitIns_R_R_R(INS_pext, EA_4BYTE, REG_R16, REG_R18, REG_R17);
theEmitter->emitIns_R_R_R(INS_pext, EA_8BYTE, REG_R16, REG_R18, REG_R17);

theEmitter->emitIns_R_R(INS_push2, EA_PTRSIZE, REG_R16, REG_R17, INS_OPTS_EVEX_nd);
I assume we have verified that this encoding works?
Yes, it's verified. Running these tests with some environment variables set encodes the instructions with the JIT and decodes them with coredistools (LLVM toolchain), which helps verify the encoding. I additionally verified it using xed.
I did remove the tests without PPX from here due to this - #116035 (comment)
Added some minor comments. Also, please fix the formatting.
/azp run runtime-coreclr superpmi-replay-apx
No pipelines are associated with this pull request.
Done!
LGTM. Thanks!
PR Overview
This PR does the following
APX and PPX
As part of Intel APX (Intel Advanced Performance Extensions), a couple of new features are available for working with the stack.
This write-up will focus on push2/pop2.
PUSH2/POP2
PUSH2 and POP2 are two new instructions for (respectively) pushing/popping 2 GPRs at a time to/from the stack. These instructions use eEVEX encoding. The data being pushed/popped by PUSH2/POP2 must be 16B-aligned on the stack.
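To make the alignment requirement concrete, here is a toy C++ model of a downward-growing stack with 8-byte slots. It is a sketch under stated assumptions, not a description of hardware behaviour: `ToyStack` and its members are hypothetical, the order in which the two values are stored is chosen arbitrarily, and a misaligned request is simply reported as `false` rather than modelling what a real CPU does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy stack: 8-byte slots, stack pointer decreases on push.
struct ToyStack
{
    uint64_t              sp;    // simulated stack pointer (byte address)
    std::vector<uint64_t> slots; // stored values, most recently pushed last

    explicit ToyStack(uint64_t initialSp) : sp(initialSp) {}

    void push(uint64_t v)
    {
        sp -= 8;
        slots.push_back(v);
    }

    uint64_t pop()
    {
        assert(!slots.empty());
        uint64_t v = slots.back();
        slots.pop_back();
        sp += 8;
        return v;
    }

    // push2 writes 16 bytes at [sp - 16, sp). That block is 16B-aligned
    // exactly when sp itself is a multiple of 16.
    bool push2(uint64_t a, uint64_t b)
    {
        if (sp % 16 != 0)
            return false; // misaligned: rejected in this model
        push(a);
        push(b);
        return true;
    }

    // pop2 reads 16 bytes at [sp, sp + 16), so the same check applies.
    bool pop2(uint64_t& first, uint64_t& second)
    {
        if (sp % 16 != 0 || slots.size() < 2)
            return false;
        first  = pop();
        second = pop();
        return true;
    }
};
```

A single 8-byte push between two push2 operations leaves `sp` at an odd multiple of 8, so the second push2 is rejected by this model; this is why a prolog that mixes push and push2 has to keep track of alignment.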
Guidance from Intel
It’s not part of the spec but in current implementations, push2/pop2 should really only be used with PPX hints and thus should only be used in matching “pairs”. i.e.
Unwind code
Windows does not currently support unwind for push2. After discussion with Kunal, I decided to use two unwind_push() calls to simulate push2. This will need to be updated later once we have support.
Testing done
Superpmi result with/without PPX feature
Diffs are based on 2,602,472 contexts (1,012,864 MinOpts, 1,589,608 FullOpts).
MISSED contexts: 17 (0.00%)
Base JIT options: JitBypassApxCheck=1
Diff JIT options: EnableApxPPX=1;JitBypassApxCheck=1
Overall (+15,581,843 bytes)
MinOpts (+527,206 bytes)
FullOpts (+15,054,637 bytes)
Sample diff
-6 (-17.65%) : 17249.dasm - System.Globalization.CalendarData:LoadCalendarDataFromSystemCore(System.String,ushort):bool:this (FullOpts)
-6 (-16.67%) : 7885.dasm - Microsoft.Extensions.Logging.LoggerMessage+<>c__DisplayClass12_0`2[int,System.__Canon]:b__1(Microsoft.Extensions.Logging.ILogger,int,System.__Canon,System.Exception):this (FullOpts)
-9 (-16.67%) : 13852.dasm - System.Linq.Enumerable+ArrayWhereSelectIterator`2[System.__Canon,System.__Canon]:GetCount(bool,System.ReadOnlySpan`1[System.__Canon],System.Func`2[System.__Canon,bool],System.Func`2[System.__Canon,System.__Canon]):int (FullOpts)
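Each sample line pairs an absolute delta with a relative one, so the approximate baseline value of the method can be recovered by dividing one by the other. The helper below is a small sketch of that arithmetic; `ImpliedBase` is a hypothetical name, and the unit of the delta is whatever metric the diff tool reported for these lines.

```cpp
#include <cassert>
#include <cmath>

// Recovers the approximate baseline value from an absolute delta and the
// percentage change reported next to it: base = delta / (percent / 100).
// The percentage is rounded in the report, so the result is rounded too.
inline long ImpliedBase(long delta, double percent)
{
    assert(percent != 0.0);
    return std::lround(static_cast<double>(delta) / (percent / 100.0));
}
```

For the three lines above this gives baselines of roughly 34, 36, and 54, meaning each of these methods was small to begin with, which is why a saving of 6 to 9 shows up as a change of around 17%.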