Adding push2/pop2 #116035

Merged
merged 6 commits into from
Jun 24, 2025

Conversation

DeepakRajendrakumaran
Contributor

@DeepakRajendrakumaran DeepakRajendrakumaran commented May 27, 2025

PR Overview

This PR does the following

  1. Enable Push2 and Pop2 instructions
  2. Enable PPX features for Push/Pop/Push2/Pop2
  3. Modify function epilog/prolog to use push2/pop2 and PPX

APX and PPX

As part of Intel APX (Intel Advanced Performance Extensions), a couple of new features are available for working with the stack:

  • PUSH2/POP2 instructions that transfer two register values within a single memory operation.
  • PPX (push-pop acceleration): a hint that helps the processor track these new instructions internally and fast-forward register data between matching PUSH2 and POP2 instructions without going through memory. It is also applicable to PUSH/POP with REX2 encoding.

This write-up will focus on push2/pop2.

PUSH2/POP2

PUSH2 and POP2 are two new instructions for pushing and popping, respectively, two GPRs at a time to/from the stack. These instructions use extended EVEX encoding. The data being pushed/popped by PUSH2/POP2 must be 16B-aligned on the stack.
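
The alignment rule shapes the prolog: a push2 is only usable when the stack pointer is 16-byte aligned at that point. The sketch below is a hypothetical planner (`planPushes` is an illustrative name, not a JIT function) that walks a list of callee-saved registers and picks push2 only when the simulated stack pointer is aligned, falling back to a single push otherwise. It assumes the usual x64 situation in which the stack pointer sits 8 bytes off a 16-byte boundary at function entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not the JIT's code: plan the register saves so that
// every push2 starts with a 16-byte-aligned stack pointer.
// `sp` is the stack pointer at function entry; only its alignment matters.
std::vector<std::string> planPushes(std::uint64_t sp, const std::vector<std::string>& regs)
{
    assert(sp % 8 == 0); // the stack pointer is always at least 8-byte aligned

    std::vector<std::string> plan;
    std::size_t i = 0;
    while (i < regs.size())
    {
        const bool aligned = (sp % 16) == 0;
        if (aligned && (i + 1) < regs.size())
        {
            // Two registers go out as one 16-byte transfer to an aligned slot.
            plan.push_back("push2 " + regs[i] + ", " + regs[i + 1]);
            sp -= 16;
            i += 2;
        }
        else
        {
            // Misaligned, or only one register left: use a single push.
            plan.push_back("push " + regs[i]);
            sp -= 8;
            i += 1;
        }
    }
    return plan;
}
```

With an entry stack pointer such as 0x1008 and the registers rbp, r15, r14, r13, r12, rdi, rsi, rbx, the plan is one push, three push2, and one push, which is the same shape as the first sample diff in this PR.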

Guidance from Intel

It’s not part of the spec, but in current implementations push2/pop2 should really only be used with PPX hints, and thus only in matching “pairs”, i.e.:

push2.p
…
pop2.p
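
A minimal sketch of what "matching pairs" can mean for code generation, assuming the epilog is derived mechanically from the prolog: walk the prolog in reverse, turn each push form into its pop counterpart, and swap the two operands of each pair. `StackOp`, `popFor` and `mirrorEpilog` are hypothetical names; the mnemonics follow the JIT disassembly shown in the sample diffs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One prolog/epilog stack operation; `second` is empty for a single push/pop.
struct StackOp
{
    std::string mnemonic;
    std::string first;
    std::string second;
};

// Map a push mnemonic to the pop mnemonic that balances it.
std::string popFor(const std::string& pushMnemonic)
{
    if (pushMnemonic == "push2p") return "pop2p";
    if (pushMnemonic == "pushp") return "popp";
    return "pop";
}

// Hypothetical helper: build the epilog that pairs with the given prolog.
std::vector<StackOp> mirrorEpilog(const std::vector<StackOp>& prolog)
{
    std::vector<StackOp> epilog;
    for (auto it = prolog.rbegin(); it != prolog.rend(); ++it)
    {
        if (it->second.empty())
        {
            epilog.push_back(StackOp{popFor(it->mnemonic), it->first, std::string()});
        }
        else
        {
            // The register pushed last is popped first, so the operands swap.
            epilog.push_back(StackOp{popFor(it->mnemonic), it->second, it->first});
        }
    }
    return epilog;
}
```

Feeding it the prolog `push rbp; push2p r15, r14; push2p r13, r12; push2p rdi, rsi; pushp rbx` yields `popp rbx; pop2p rsi, rdi; pop2p r12, r13; pop2p r14, r15; pop rbp`.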

Unwind code

Windows does not currently support unwinding for push2. After discussion with Kunal, I decided to use two unwind_push() calls to simulate push2. This will need to be updated once support is available.
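
The workaround can be pictured as follows. This is a simplified model, not the JIT's unwind API: `UnwindPush`, `recordPush` and `recordPush2` are illustrative names, and the struct only keeps the two fields that appear in the unwind dumps in the sample diffs. A push2 is recorded as two ordinary register pushes that share the code offset of the single push2 instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for a UWOP_PUSH_NONVOL unwind code.
struct UnwindPush
{
    std::uint8_t codeOffset; // prolog offset just past the instruction
    std::string  reg;
};

// Record a single push (or pushp).
void recordPush(std::vector<UnwindPush>& codes, std::uint8_t endOffset, const std::string& reg)
{
    codes.push_back(UnwindPush{endOffset, reg});
}

// Record a push2 as two pushes at the same offset, in push order.
void recordPush2(std::vector<UnwindPush>& codes,
                 std::uint8_t             endOffset,
                 const std::string&       first,
                 const std::string&       second)
{
    recordPush(codes, endOffset, first);
    recordPush(codes, endOffset, second);
}
```

Recording `pushp rbp` at 0x03, `push2p r14, rdi` at 0x09 and `push2p rsi, rbx` at 0x0F gives five entries. The unwind dump lists them in reverse prolog order, so it shows rbx and rsi at 0x0F, rdi and r14 at 0x09, and rbp at 0x03, as in the second sample diff.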

Testing done

  1. Emitter unit tests added and checked to verify encoding
  2. SuperPMI run with APX enabled

Superpmi result with/without PPX feature

Diffs are based on 2,602,472 contexts (1,012,864 MinOpts, 1,589,608 FullOpts).

MISSED contexts: 17 (0.00%)

Base JIT options: JitBypassApxCheck=1

Diff JIT options: EnableApxPPX=1;JitBypassApxCheck=1

Overall (+15,581,843 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs | Base Instruction Count | Diff Instruction Count |
|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 12,265,429 | +449,098 | -1.75% | 3080529 | -86,249(-2.98%)(-3.32%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 65,974,296 | +1,060,625 | -2.64% | 15365972 | -224,793(-2.16%)(-2.36%) |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 12,617,955 | +461,447 | -1.74% | 3167977 | -88,298(-2.97%)(-3.31%) |
| coreclr_tests.run.windows.x64.checked.mch | 410,382,821 | +3,285,229 | -1.94% | 85035916 | -676,431(-2.26%)(-2.46%) |
| libraries.pmi.windows.x64.checked.mch | 57,910,005 | +2,456,153 | -2.00% | 14674111 | -420,750(-3.09%)(-3.57%) |
| libraries_tests.run.windows.x64.Release.mch | 350,075,962 | +3,750,526 | -2.46% | 76683341 | -772,586(-1.97%)(-2.06%) |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 153,338,819 | +3,692,459 | -1.64% | 35431157 | -672,354(-2.09%)(-2.31%) |
| realworld.run.windows.x64.checked.mch | 11,691,157 | +378,585 | -1.94% | 2834926 | -74,551(-2.77%)(-3.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,507,705 | +47,721 | -1.89% | 1544803 | -8,336(-3.58%)(-4.29%) |

MinOpts (+527,206 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs | Base Instruction Count | Diff Instruction Count |
|---|---|---|---|---|---|
| benchmarks.run_pgo.windows.x64.checked.mch | 18,993,001 | +33,556 | -2.73% | 4248074 | -9,369(-2.75%)(-2.75%) |
| coreclr_tests.run.windows.x64.checked.mch | 280,800,379 | +190,604 | -0.29% | 56348876 | -28,091(-0.91%)(-1.15%) |
| libraries_tests.run.windows.x64.Release.mch | 198,087,087 | +267,130 | -1.01% | 41226776 | -59,783(-1.05%)(-1.08%) |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 10,686,322 | +35,916 | -0.07% | 2487522 | -403(-0.13%)(-0.29%) |

FullOpts (+15,054,637 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs | Base Instruction Count | Diff Instruction Count |
|---|---|---|---|---|---|
| benchmarks.run.windows.x64.checked.mch | 12,264,751 | +449,098 | -1.75% | 3080340 | -86,249(-2.98%)(-3.32%) |
| benchmarks.run_pgo.windows.x64.checked.mch | 46,981,295 | +1,027,069 | -2.64% | 11117898 | -215,424(-2.14%)(-2.34%) |
| benchmarks.run_pgo_optrepeat.windows.x64.checked.mch | 12,617,253 | +461,447 | -1.74% | 3167770 | -88,298(-2.97%)(-3.31%) |
| coreclr_tests.run.windows.x64.checked.mch | 129,582,442 | +3,094,625 | -2.22% | 28687040 | -648,340(-2.41%)(-2.59%) |
| libraries.pmi.windows.x64.checked.mch | 57,797,220 | +2,456,153 | -2.00% | 14653829 | -420,750(-3.09%)(-3.57%) |
| libraries_tests.run.windows.x64.Release.mch | 151,988,875 | +3,483,396 | -2.75% | 35456565 | -712,803(-2.13%)(-2.23%) |
| libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch | 142,652,497 | +3,656,543 | -1.70% | 32943635 | -671,951(-2.11%)(-2.32%) |
| realworld.run.windows.x64.checked.mch | 11,466,278 | +378,585 | -1.94% | 2799763 | -74,551(-2.77%)(-3.00%) |
| smoke_tests.nativeaot.windows.x64.checked.mch | 5,506,552 | +47,721 | -1.89% | 1544498 | -8,336(-3.58%)(-4.29%) |

Sample diff

-6 (-17.65%) : 17249.dasm - System.Globalization.CalendarData:LoadCalendarDataFromSystemCore(System.String,ushort):bool:this (FullOpts)
@@ -42,13 +42,10 @@
 
 G_M54418_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        push     rbp
-       push     r15
-       push     r14
-       push     r13
-       push     r12
-       push     rdi
-       push     rsi
-       push     rbx
+       push2p   r15, r14
+       push2p   r13, r12
+       push2p   rdi, rsi
+       pushp    rbx
        sub      rsp, 104
        lea      rbp, [rsp+0xA0]
        mov      rbx, rcx
@@ -56,7 +53,7 @@ G_M54418_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        mov      rsi, rdx
        ; gcrRegs +[rsi]
        mov      edi, r8d
-						;; size=33 bbWeight=1 PerfScore 9.50
+						;; size=43 bbWeight=1 PerfScore 6.50
 G_M54418_IG02:        ; bbWeight=1, gcrefRegs=0048 {rbx rsi}, byrefRegs=0000 {}, byref
        lea      rcx, [rbp-0x78]
        call     CORINFO_HELP_INIT_PINVOKE_FRAME
@@ -75,18 +72,15 @@ G_M54418_IG02:        ; bbWeight=1, gcrefRegs=0048 {rbx rsi}, byrefRegs=0000 {},
 						;; size=42 bbWeight=1 PerfScore 8.00
 G_M54418_IG03:        ; bbWeight=1, epilog, nogc, extend
        add      rsp, 104
-       pop      rbx
-       pop      rsi
-       pop      rdi
-       pop      r12
-       pop      r13
-       pop      r14
-       pop      r15
+       popp     rbx
+       pop2p    rsi, rdi
+       pop2p    r12, r13
+       pop2p    r14, r15
        pop      rbp
        ret      
-						;; size=17 bbWeight=1 PerfScore 5.25
+						;; size=27 bbWeight=1 PerfScore 5.25
 
-; Total bytes of code 92, prolog size 24, PerfScore 22.75, instruction count 34, allocated bytes for code 92 (MethodHash=8e392b6d) for method System.Globalization.CalendarData:LoadCalendarDataFromSystemCore(System.String,ushort):bool:this (FullOpts)
+; Total bytes of code 112, prolog size 34, PerfScore 19.75, instruction count 28, allocated bytes for code 112 (MethodHash=8e392b6d) for method System.Globalization.CalendarData:LoadCalendarDataFromSystemCore(System.String,ushort):bool:this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -94,17 +88,17 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x10
+  SizeOfProlog      : 0x1A
   CountOfUnwindCodes: 9
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x10 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 12 * 8 + 8 = 104 = 0x68
-    CodeOffset: 0x0C UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
-    CodeOffset: 0x0B UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rsi (6)
-    CodeOffset: 0x0A UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rdi (7)
-    CodeOffset: 0x09 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r12 (12)
-    CodeOffset: 0x07 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r13 (13)
-    CodeOffset: 0x05 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r14 (14)
-    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r15 (15)
+    CodeOffset: 0x1A UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 12 * 8 + 8 = 104 = 0x68
+    CodeOffset: 0x16 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
+    CodeOffset: 0x13 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rsi (6)
+    CodeOffset: 0x13 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rdi (7)
+    CodeOffset: 0x0D UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r12 (12)
+    CodeOffset: 0x0D UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r13 (13)
+    CodeOffset: 0x07 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r14 (14)
+    CodeOffset: 0x07 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r15 (15)
     CodeOffset: 0x01 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
-6 (-16.67%) : 7885.dasm - Microsoft.Extensions.Logging.LoggerMessage+<>c__DisplayClass12_0`2[int,System.__Canon]:b__1(Microsoft.Extensions.Logging.ILogger,int,System.__Canon,System.Exception):this (FullOpts)
@@ -17,11 +17,9 @@
 ; Lcl frame size = 32
 
 G_M15345_IG01:        ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref, nogc <-- Prolog IG
-       push     r14
-       push     rdi
-       push     rsi
-       push     rbp
-       push     rbx
+       pushp    rbp
+       push2p   r14, rdi
+       push2p   rsi, rbx
        sub      rsp, 32
        mov      rbx, rcx
        ; gcrRegs +[rbx]
@@ -30,7 +28,7 @@ G_M15345_IG01:        ; bbWeight=1, gcVars=0000000000000000 {}, gcrefRegs=0000 {
        mov      ebp, r8d
        mov      rdi, r9
        ; gcrRegs +[rdi]
-						;; size=22 bbWeight=1 PerfScore 6.25
+						;; size=31 bbWeight=1 PerfScore 4.25
 G_M15345_IG02:        ; bbWeight=1, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=0000 {}, byref, isz
        mov      edx, dword ptr [rbx+0x10]
        mov      rcx, rsi
@@ -45,13 +43,11 @@ G_M15345_IG02:        ; bbWeight=1, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=0000
 G_M15345_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
        ; gcrRegs -[rbx rsi rdi]
        add      rsp, 32
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       pop      r14
+       pop2p    rbx, rsi
+       pop2p    rdi, r14
+       popp     rbp
        ret      
-						;; size=11 bbWeight=0.50 PerfScore 1.88
+						;; size=20 bbWeight=0.50 PerfScore 1.88
 G_M15345_IG04:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=00C8 {rbx rsi rdi}, byrefRegs=0000 {}, gcvars, byref, nogc
        ; gcrRegs +[rbx rsi rdi]
        mov      r14, gword ptr [rsp+0x70]
@@ -67,15 +63,13 @@ G_M15345_IG04:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=00C
 						;; size=22 bbWeight=0.50 PerfScore 1.50
 G_M15345_IG05:        ; bbWeight=0.50, epilog, nogc, extend
        add      rsp, 32
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       pop      r14
+       pop2p    rbx, rsi
+       pop2p    rdi, r14
+       popp     rbp
        tail.jmp [Microsoft.Extensions.Logging.LoggerMessage+<>c__DisplayClass12_0`2[int,System.__Canon]:<Define>g__Log|0(Microsoft.Extensions.Logging.ILogger,int,System.__Canon,System.Exception):this]
-						;; size=16 bbWeight=0.50 PerfScore 2.38
+						;; size=25 bbWeight=0.50 PerfScore 2.38
 
-; Total bytes of code 94, prolog size 10, PerfScore 18.75, instruction count 36, allocated bytes for code 94 (MethodHash=e9a4c40e) for method Microsoft.Extensions.Logging.LoggerMessage+<>c__DisplayClass12_0`2[int,System.__Canon]:<Define>b__1(Microsoft.Extensions.Logging.ILogger,int,System.__Canon,System.Exception):this (FullOpts)
+; Total bytes of code 121, prolog size 19, PerfScore 16.75, instruction count 30, allocated bytes for code 121 (MethodHash=e9a4c40e) for method Microsoft.Extensions.Logging.LoggerMessage+<>c__DisplayClass12_0`2[int,System.__Canon]:<Define>b__1(Microsoft.Extensions.Logging.ILogger,int,System.__Canon,System.Exception):this (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -83,14 +77,14 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x0A
+  SizeOfProlog      : 0x13
   CountOfUnwindCodes: 6
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x0A UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 3 * 8 + 8 = 32 = 0x20
-    CodeOffset: 0x06 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
-    CodeOffset: 0x05 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
-    CodeOffset: 0x04 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rsi (6)
-    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rdi (7)
-    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r14 (14)
+    CodeOffset: 0x13 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 3 * 8 + 8 = 32 = 0x20
+    CodeOffset: 0x0F UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
+    CodeOffset: 0x0F UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rsi (6)
+    CodeOffset: 0x09 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rdi (7)
+    CodeOffset: 0x09 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r14 (14)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
-9 (-16.67%) : 13852.dasm - System.Linq.Enumerable+ArrayWhereSelectIterator`2[System.__Canon,System.__Canon]:GetCount(bool,System.ReadOnlySpan`1[System.__Canon],System.Func`2[System.__Canon,bool],System.Func`2[System.__Canon,System.__Canon]):int (FullOpts)
@@ -28,19 +28,16 @@
 ; Lcl frame size = 32
 
 G_M19035_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
-       push     r15
-       push     r14
-       push     r13
-       push     rdi
-       push     rsi
-       push     rbp
-       push     rbx
+       pushp    rbp
+       push2p   r15, r14
+       push2p   r13, rdi
+       push2p   rsi, rbx
        sub      rsp, 32
        mov      rbx, r9
        ; gcrRegs +[rbx]
        mov      rsi, gword ptr [rsp+0x80]
        ; gcrRegs +[rsi]
-						;; size=25 bbWeight=1 PerfScore 8.50
+						;; size=36 bbWeight=1 PerfScore 5.50
 G_M19035_IG02:        ; bbWeight=1, gcrefRegs=0048 {rbx rsi}, byrefRegs=0100 {r8}, byref, isz
        ; byrRegs +[r8]
        test     dl, dl
@@ -94,36 +91,30 @@ G_M19035_IG08:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
 						;; size=2 bbWeight=0.50 PerfScore 0.12
 G_M19035_IG09:        ; bbWeight=0.50, epilog, nogc, extend
        add      rsp, 32
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       pop      r13
-       pop      r14
-       pop      r15
+       pop2p    rbx, rsi
+       pop2p    rdi, r13
+       pop2p    r14, r15
+       popp     rbp
        ret      
-						;; size=15 bbWeight=0.50 PerfScore 2.38
+						;; size=26 bbWeight=0.50 PerfScore 2.38
 G_M19035_IG10:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        mov      eax, -1
 						;; size=5 bbWeight=0.50 PerfScore 0.12
 G_M19035_IG11:        ; bbWeight=0.50, epilog, nogc, extend
        add      rsp, 32
-       pop      rbx
-       pop      rbp
-       pop      rsi
-       pop      rdi
-       pop      r13
-       pop      r14
-       pop      r15
+       pop2p    rbx, rsi
+       pop2p    rdi, r13
+       pop2p    r14, r15
+       popp     rbp
        ret      
-						;; size=15 bbWeight=0.50 PerfScore 2.38
+						;; size=26 bbWeight=0.50 PerfScore 2.38
 G_M19035_IG12:        ; bbWeight=0, gcVars=0000000000000000 {}, gcrefRegs=0000 {}, byrefRegs=0000 {}, gcvars, byref
        call     CORINFO_HELP_OVERFLOW
        ; gcr arg pop 0
        int3     
 						;; size=6 bbWeight=0 PerfScore 0.00
 
-; Total bytes of code 131, prolog size 14, PerfScore 70.03, instruction count 54, allocated bytes for code 131 (MethodHash=2c44b5a4) for method System.Linq.Enumerable+ArrayWhereSelectIterator`2[System.__Canon,System.__Canon]:GetCount(bool,System.ReadOnlySpan`1[System.__Canon],System.Func`2[System.__Canon,bool],System.Func`2[System.__Canon,System.__Canon]):int (FullOpts)
+; Total bytes of code 164, prolog size 25, PerfScore 67.03, instruction count 45, allocated bytes for code 164 (MethodHash=2c44b5a4) for method System.Linq.Enumerable+ArrayWhereSelectIterator`2[System.__Canon,System.__Canon]:GetCount(bool,System.ReadOnlySpan`1[System.__Canon],System.Func`2[System.__Canon,bool],System.Func`2[System.__Canon,System.__Canon]):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -131,16 +122,16 @@ Unwind Info:
   >>   End offset   : 0xd1ffab1e (not in unwind data)
   Version           : 1
   Flags             : 0x00
-  SizeOfProlog      : 0x0E
+  SizeOfProlog      : 0x19
   CountOfUnwindCodes: 8
   FrameRegister     : none (0)
   FrameOffset       : N/A (no FrameRegister) (Value=0)
   UnwindCodes       :
-    CodeOffset: 0x0E UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 3 * 8 + 8 = 32 = 0x20
-    CodeOffset: 0x0A UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
-    CodeOffset: 0x09 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
-    CodeOffset: 0x08 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rsi (6)
-    CodeOffset: 0x07 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rdi (7)
-    CodeOffset: 0x06 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r13 (13)
-    CodeOffset: 0x04 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r14 (14)
-    CodeOffset: 0x02 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r15 (15)
+    CodeOffset: 0x19 UnwindOp: UWOP_ALLOC_SMALL (2)     OpInfo: 3 * 8 + 8 = 32 = 0x20
+    CodeOffset: 0x15 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbx (3)
+    CodeOffset: 0x15 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rsi (6)
+    CodeOffset: 0x0F UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rdi (7)
+    CodeOffset: 0x0F UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r13 (13)
+    CodeOffset: 0x09 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r14 (14)
+    CodeOffset: 0x09 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: r15 (15)
+    CodeOffset: 0x03 UnwindOp: UWOP_PUSH_NONVOL (0)     OpInfo: rbp (5)
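
The sample diffs show fewer instructions but more bytes. The arithmetic below is a rough check of that trade-off, under the assumption that the per-instruction sizes can be read off the CodeOffset deltas in the unwind info: 6 bytes per push2p, 3 bytes per pushp, and 1 or 2 bytes for a legacy push depending on whether the register needs a REX prefix (r8 to r15). The function names are illustrative only.

```cpp
#include <cassert>

// Sizes inferred from the CodeOffset deltas in the samples (an assumption,
// not taken from an encoding table).
constexpr int kPush2pBytes = 6;
constexpr int kPushpBytes  = 3;

// Legacy push: 1 byte for rax..rdi-style registers, 2 bytes for r8..r15.
constexpr int legacyPushBytes(int lowRegs, int highRegs)
{
    return lowRegs * 1 + highRegs * 2;
}

// Bytes for the same saves expressed as push2p pairs plus leftover pushp.
constexpr int apxPushBytes(int push2pCount, int pushpCount)
{
    return push2pCount * kPush2pBytes + pushpCount * kPushpBytes;
}

// Growth of one prolog (or one epilog) when the legacy pushes are replaced.
constexpr int prologGrowth(int lowRegs, int highRegs, int push2pCount, int pushpCount)
{
    return apxPushBytes(push2pCount, pushpCount) - legacyPushBytes(lowRegs, highRegs);
}
```

For the first sample, the seven replaced pushes (rdi, rsi, rbx plus r12 to r15) cost 11 bytes and their replacement costs 21, a growth of 10 bytes, which matches the prolog size going from 33 to 43. The second and third samples work out to 9 and 11 bytes in the same way.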

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 27, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 27, 2025
@DeepakRajendrakumaran DeepakRajendrakumaran changed the title Draft : Adding puhs2/pop2 Draft : Adding push2/pop2 May 28, 2025
@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the push2 branch 4 times, most recently from 04bdc1b to 9e1a9f5 Compare June 9, 2025 17:26

risc-vv commented Jun 10, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

risc-vv commented Jun 10, 2025

RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 35h 15min 56s 106ms
   REAL time: 36min 1s 374ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
=======================
      passed: 9083
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9712
VIRTUAL time: 10h 29min 58s 721ms
   REAL time: 42min 53s 335ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 283571 / 284631 (99.63%)
=======================
      passed: 283571
      failed: 1054
     skipped: 38
      killed: 6
------------------------
 TOTAL tests: 284669
VIRTUAL time: 29h 43min 24s 559ms
   REAL time: 1h 11min 41s 861ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 305223 / 306944 (99.44%)
=======================
      passed: 305223
      failed: 1711
     skipped: 38
      killed: 10
------------------------
 TOTAL tests: 306982
VIRTUAL time: 20h 34min 21s 342ms
   REAL time: 2h 13min 5s 272ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 55928e12d732d9b877f2eb289899c3327ab54c6e
CI: 985b32219c9d1164ea1a09421e4f004672ee8c85
REPO: DeepakRajendrakumaran/runtime
BRANCH: push2
CONFIG: Release
LIB_CONFIG: Release

risc-vv commented Jun 10, 2025

RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 35h 17min 30s 631ms
   REAL time: 36min 8s 592ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
=======================
      passed: 9083
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9712
VIRTUAL time: 10h 42min 44s 749ms
   REAL time: 43min 40s 937ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 283315 / 284395 (99.62%)
=======================
      passed: 283315
      failed: 1075
     skipped: 38
      killed: 5
------------------------
 TOTAL tests: 284433
VIRTUAL time: 29h 40min 46s 127ms
   REAL time: 1h 12min 16s 450ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 299368 / 301100 (99.42%)
=======================
      passed: 299368
      failed: 1722
     skipped: 38
      killed: 10
------------------------
 TOTAL tests: 301138
VIRTUAL time: 20h 36min 26s 576ms
   REAL time: 2h 6min 6s 96ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 585402f9ce53e2b8439107d618faaa7beb7e5c83
CI: 8fec31ecd11a1e652234ca5056ed447368abd635
REPO: DeepakRajendrakumaran/runtime
BRANCH: push2
CONFIG: Release
LIB_CONFIG: Release

@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the push2 branch 3 times, most recently from 66245b1 to 68903ee Compare June 12, 2025 15:42

risc-vv commented Jun 13, 2025

RISC-V Release-CLR-QEMU: 9084 / 9114 (99.67%)
=======================
      passed: 9084
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9713
VIRTUAL time: 35h 15min 0s 697ms
   REAL time: 36min 1s 908ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 598
      killed: 28
------------------------
 TOTAL tests: 9710
VIRTUAL time: 10h 54min 26s 26ms
   REAL time: 44min 42s 847ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 68903ee1b404ee05645075c19df6b0bbd85650b6
CI: ad349d0f0dd61055dacdfc98d8ad42963a159890
REPO: DeepakRajendrakumaran/runtime
BRANCH: push2
CONFIG: Release
LIB_CONFIG: Release

risc-vv commented Jun 16, 2025

RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 35h 24min 9s 599ms
   REAL time: 36min 17s 85ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 10h 30min 43s 763ms
   REAL time: 42min 54s 850ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 260891 / 261953 (99.59%)
=======================
      passed: 260891
      failed: 1055
     skipped: 38
      killed: 7
------------------------
 TOTAL tests: 261991
VIRTUAL time: 28h 56min 56s 487ms
   REAL time: 1h 10min 16s 90ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 299817 / 301551 (99.42%)
=======================
      passed: 299817
      failed: 1725
     skipped: 38
      killed: 9
------------------------
 TOTAL tests: 301589
VIRTUAL time: 20h 34min 38s 342ms
   REAL time: 2h 13min 37s 522ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 908cedbd2ca658404f1dee3e67c9e2dd6d09bff1
CI: ff51305a244b14413d8afd782debae409d7468b8
REPO: DeepakRajendrakumaran/runtime
BRANCH: push2
CONFIG: Release
LIB_CONFIG: Release

risc-vv commented Jun 16, 2025

RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 35h 19min 32s 281ms
   REAL time: 36min 11s 159ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 283658 / 284736 (99.62%)
=======================
      passed: 283658
      failed: 1072
     skipped: 38
      killed: 6
------------------------
 TOTAL tests: 284774
VIRTUAL time: 29h 14min 48s 566ms
   REAL time: 1h 10min 39s 555ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 10h 38min 31s 107ms
   REAL time: 43min 26s 788ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: bc308aa4f12df2ae3ef31e47f06127fdbbb6c005
CI: ff51305a244b14413d8afd782debae409d7468b8
REPO: DeepakRajendrakumaran/runtime
BRANCH: push2
CONFIG: Release
LIB_CONFIG: Release

risc-vv commented Jun 16, 2025

6b2c687 is being scheduled for building and testing

GIT: 6b2c687e251543c10e560dc72a5c03721faabb32
REPO: DeepakRajendrakumaran/runtime
BRANCH: push2

risc-vv commented Jun 16, 2025

RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 35h 22min 17s 113ms
   REAL time: 36min 17s 533ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 6b2c687e251543c10e560dc72a5c03721faabb32
CI: ff51305a244b14413d8afd782debae409d7468b8
REPO: DeepakRajendrakumaran/runtime
BRANCH: push2
CONFIG: Release
LIB_CONFIG: Release

@DeepakRajendrakumaran DeepakRajendrakumaran changed the title Draft : Adding push2/pop2 Adding push2/pop2 Jun 16, 2025
@DeepakRajendrakumaran
Contributor Author

@kunalspathak @EgorBo This PR is ready for review

risc-vv commented Jun 16, 2025

6b2c687 is being scheduled for building and testing

GIT: 6b2c687e251543c10e560dc72a5c03721faabb32
REPO: dotnet/runtime
BRANCH: main

risc-vv commented Jun 16, 2025

RISC-V Release-FX-QEMU: 261199 / 262260 (99.60%)
=======================
      passed: 261199
      failed: 1052
     skipped: 39
      killed: 9
------------------------
 TOTAL tests: 262299
VIRTUAL time: 30h 35min 6s 851ms
   REAL time: 1h 10min 6s 981ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 6b2c687e251543c10e560dc72a5c03721faabb32
CI: ff51305a244b14413d8afd782debae409d7468b8
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

risc-vv commented Jun 16, 2025

6b2c687 is being scheduled for building and testing

GIT: 6b2c687e251543c10e560dc72a5c03721faabb32
REPO: dotnet/runtime
BRANCH: main

risc-vv commented Jun 16, 2025

RISC-V Release-CLR-QEMU: 9082 / 9112 (99.67%)
=======================
      passed: 9082
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9711
VIRTUAL time: 35h 19min 17s 575ms
   REAL time: 36min 9s 812ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-QEMU: 283661 / 284755 (99.62%)
=======================
      passed: 283661
      failed: 1087
     skipped: 39
      killed: 7
------------------------
 TOTAL tests: 284794
VIRTUAL time: 31h 13min 22s 447ms
   REAL time: 1h 10min 55s 87ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-CLR-VF2: 9083 / 9113 (99.67%)
=======================
      passed: 9083
      failed: 2
     skipped: 599
      killed: 28
------------------------
 TOTAL tests: 9712
VIRTUAL time: 10h 42min 10s 797ms
   REAL time: 43min 38s 61ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

RISC-V Release-FX-VF2: 307376 / 309101 (99.44%)
=======================
      passed: 307376
      failed: 1715
     skipped: 39
      killed: 10
------------------------
 TOTAL tests: 309140
VIRTUAL time: 20h 12min 29s 691ms
   REAL time: 2h 7min 2s 527ms
=======================

report.xml, report.md, failures.xml, testclr_details.tar.zst

Build information and commands

GIT: 6b2c687e251543c10e560dc72a5c03721faabb32
CI: ff51305a244b14413d8afd782debae409d7468b8
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

}
else
{
assert((instOptions & INS_OPTS_APX_ppx_MASK) == 0);
Member

this is always true because it is inside else of the opposite condition.

Contributor Author

Removed else

// Setting EVEX.W = 1 bit indicates a push-pop acceleration (PPX) hint
// The current recommendation is to use PUSH2/POP2 only with PPX hint
// So, it is used only in Epilog/Prolog code generation
if (id->idIsApxPpxContextSet())
Member

isn't it always true for push2 and pop2?

Contributor Author

push2/pop2 are new instructions. PPX is a feature that push/pop/push2/pop2 can use. So, theoretically, we might have a push2/pop2 without PPX enabled. We currently only use push2/pop2 with PPX since that's the guidance (the guidance is for performance reasons), but that doesn't mean push2/pop2 cannot be used without PPX.

Member

can we convert it to assert(id->idIsApxPpxContextSet()) then and update it in future when there is a need?

Contributor Author

Done! Works for me. This means I have to remove the unit tests without PPX that I included to verify the encoding, since they would fail this assert.
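
For readers unfamiliar with the bit in question, the code comment under review says that EVEX.W = 1 carries the PPX hint. The sketch below only illustrates that bit manipulation on a 4-byte EVEX prefix, relying on the general EVEX layout where W is the top bit of the third prefix byte. `EvexPrefix` and `withPpxHint` are hypothetical names, this is not the emitter's code, and the byte values used with it are placeholders rather than a real push2 encoding.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 4-byte EVEX prefix: 0x62 escape byte followed by payload bytes P0, P1, P2.
using EvexPrefix = std::array<std::uint8_t, 4>;

// EVEX.W is bit 7 of payload byte P1 (index 2 in the prefix).
constexpr std::uint8_t kEvexWBit = 0x80;

// Return a copy of the prefix with the W bit set (PPX hint) or cleared.
EvexPrefix withPpxHint(EvexPrefix prefix, bool ppx)
{
    assert(prefix[0] == 0x62); // must start with the EVEX escape byte
    prefix[2] = ppx ? static_cast<std::uint8_t>(prefix[2] | kEvexWBit)
                    : static_cast<std::uint8_t>(prefix[2] & ~kEvexWBit);
    return prefix;
}
```

Taking the prefix by value keeps the helper free of side effects, which makes it easy to compare the hinted and unhinted forms of the same instruction.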

Member

@kunalspathak kunalspathak left a comment

Added some comments.

Superpmi result with/without PPX feature

Did you get a chance to check whether all of the PerfScore improvement comes from the prolog and epilog? Also, is it because we have not set up the latency/throughput numbers accurately (I see a TODO there and added a comment about it), so that the PerfScore change is just due to the reduced number of instructions?

@DeepakRajendrakumaran
Contributor Author

Added some comments.

Superpmi result with/without PPX feature

Did you get a chance to check whether all of the PerfScore improvement comes from the prolog and epilog? Also, is it because we have not set up the latency/throughput numbers accurately (I see a TODO there and added a comment about it), so that the PerfScore change is just due to the reduced number of instructions?

I did check, and it's entirely coming from the epilog/prolog. See the examples here. The PerfScore improvement is due to the reduced number of instructions.

The guidance I got was that TP/latency should be the same as for regular push and pop, as long as the PPX hint is used. I'll update the TP/latency numbers per the TODO once Agner/uops adds them, if they turn out to be different.

@@ -9404,6 +9404,15 @@ void CodeGen::genAmd64EmitterUnitTestsApx()
theEmitter->emitIns_R_R_R(INS_pext, EA_4BYTE, REG_R16, REG_R18, REG_R17);
theEmitter->emitIns_R_R_R(INS_pext, EA_8BYTE, REG_R16, REG_R18, REG_R17);

theEmitter->emitIns_R_R(INS_push2, EA_PTRSIZE, REG_R16, REG_R17, INS_OPTS_EVEX_nd);
Member

i assume we have verified that this encoding works?

Contributor Author

Yes, it's verified. Running these tests with some environment variables set encodes the instructions using the JIT and decodes them using coredistools (LLVM toolchain), which helps verify the encoding. I additionally verified it using XED.

I did remove the tests without PPX from here due to this - #116035 (comment)

Member

@kunalspathak kunalspathak left a comment

added some minor comments. Also, please fix the formatting.

@kunalspathak
Member

/azp run runtime-coreclr superpmi-replay-apx

No pipelines are associated with this pull request.

@DeepakRajendrakumaran
Contributor Author

added some minor comments. Also, please fix the formatting.

Done!

Member

@kunalspathak kunalspathak left a comment

LGTM. Thanks!

@kunalspathak kunalspathak merged commit 5b8b862 into dotnet:main Jun 24, 2025
107 of 109 checks passed