-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Virtual stub indirect call profiling #116453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Virtual stub indirect call profiling #116453
Conversation
/azp run runtime-coreclr superpmi-diffs |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr pgostress |
Azure Pipelines successfully started running 1 pipeline(s). |
pgo stress failure is stack overflow test. |
On the example from #116055 G_M16291_IG01: ;; offset=0x0000
push rdi
push rsi
push rbx
sub rsp, 48
mov qword ptr [rsp+0x28], rcx
mov rdi, rdx
mov rbx, r8
mov rsi, r9
;; size=21 bbWeight=1 PerfScore 5.00
G_M16291_IG02: ;; offset=0x0015
mov rdx, qword ptr [rcx+0x40]
mov r11, qword ptr [rdx+0x10]
test r11, r11
je SHORT G_M16291_IG12
;; size=13 bbWeight=1 PerfScore 5.25
G_M16291_IG03: ;; offset=0x0022
mov r8, 0x7FFB7FE1AF00 ; System.Collections.Generic.StringEqualityComparer
cmp qword ptr [rdi], r8
jne SHORT G_M16291_IG14
;; size=15 bbWeight=1 PerfScore 4.25
G_M16291_IG04: ;; offset=0x0031
cmp rbx, rsi
je SHORT G_M16291_IG13
;; size=5 bbWeight=0.50 PerfScore 0.62
G_M16291_IG05: ;; offset=0x0036
test rbx, rbx
je SHORT G_M16291_IG08
;; size=5 bbWeight=0.40 PerfScore 0.50
G_M16291_IG06: ;; offset=0x003B
test rsi, rsi
je SHORT G_M16291_IG08
;; size=5 bbWeight=0.40 PerfScore 0.50
G_M16291_IG07: ;; offset=0x0040
mov r8d, dword ptr [rbx+0x08]
cmp r8d, dword ptr [rsi+0x08]
je SHORT G_M16291_IG09
;; size=10 bbWeight=0.40 PerfScore 2.38
G_M16291_IG08: ;; offset=0x004A
xor eax, eax
jmp SHORT G_M16291_IG10
;; size=4 bbWeight=0.24 PerfScore 0.55
G_M16291_IG09: ;; offset=0x004E
lea rcx, bword ptr [rbx+0x0C]
add r8d, r8d
lea rdx, bword ptr [rsi+0x0C]
call [System.SpanHelpers:SequenceEqual(byref,byref,nuint):bool]
;; size=17 bbWeight=0.45 PerfScore 1.93
G_M16291_IG10: ;; offset=0x005F
nop
;; size=1 bbWeight=1 PerfScore 0.25
G_M16291_IG11: ;; offset=0x0060
add rsp, 48
pop rbx
pop rsi
pop rdi
ret
;; size=8 bbWeight=1 PerfScore 2.75
G_M16291_IG12: ;; offset=0x0068
mov rdx, 0x7FFB806B4228 ; global ptr
call CORINFO_HELP_RUNTIMEHANDLE_METHOD
mov r11, rax
jmp SHORT G_M16291_IG03
;; size=20 bbWeight=0.20 PerfScore 0.70
G_M16291_IG13: ;; offset=0x007C
mov eax, 1
jmp SHORT G_M16291_IG10
;; size=7 bbWeight=0.10 PerfScore 0.23
G_M16291_IG14: ;; offset=0x0083
mov rcx, rdi
mov rdx, rbx
mov r8, rsi
call [r11]
jmp SHORT G_M16291_IG10
;; size=14 bbWeight=0 PerfScore 0.00
; Total bytes of code 145, prolog size 12, PerfScore 24.90, instruction count 47, allocated bytes for code 145 (MethodHash=455bc05c) for method Program:<<Main>$>g__Wrapper|0_0[System.__Canon](System.Collections.Generic.IEqualityComparer`1[System.__Canon],System.__Canon,System.__Canon):bool (Tier1) We can optimize this a bit more by not fetching them method handle in the successful GDV case, though it gets cached so it is already a cold path in Tier1. This doesn't yet handle the case in #108913 (comment) because arrays implement interfaces differently... I think I can address this in a follow in PR. I still need to verify that this doesn't regress existing devirtualizations. |
@davidwrighton can you look over the jit interface changes? The basic idea is that if either the base class or the object class is shared, we rely on canonical interface class matching, and if there is just one match we use the (possibly instantiated) matching interface class to find the method. We were already doing some of this already, but once we found a single matching interface class we just threw it away. This approach won't work for exact arrays object and shared base interfaces; I will fix that in a follow-up PR (the jit side needs changes for this too). |
Wondering if we should have a higher GDV threshold for shared methods. Some other ideas I considered here:
In the latter, we'd need some way to efficiently dispatch to the specialized version:
|
@EgorBot -intel -arm64 using System;
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Running;
public class StringComparisons
{
private string s1 = "1";
private string s2 = "1";
[Benchmark]
public bool Equals_inline() => string.Equals(s1, s2);
[Benchmark]
public bool Equals_specialized() => EqualityComparer<string>.Default.Equals(s1, s2);
[Benchmark]
public bool Equals_generic() => GenericEquals(s1, s2);
[MethodImpl(MethodImplOptions.NoInlining)]
private static bool GenericEquals<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);
} |
Example above from #10050 |
src/coreclr/vm/jitinterface.cpp
Outdated
|
||
pDevirtMD = pObjMT->GetMethodDescForInterfaceMethod(TypeHandle(pBaseMT), pBaseMD, FALSE /* throwOnConflict */); | ||
MethodDesc* interfaceMD = interfaceMT->GetParallelMethodDesc(pBaseMD); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the conditions which made you do the parallel methoddesc lookup for interfaceMD
instead of just using pBaseMD
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not answer your question, but this sort of lookup would fail
impDevirtualizeCall: Trying to devirtualize interface call:
class for 'this' is System.Collections.Generic.IEqualityComparer`1[System.__Canon] (attrib 20220400)
base method is System.Collections.Generic.IEqualityComparer`1[System.__Canon]::Equals
Considering guarded devirtualization at IL offset 3 (0x3)
Likely classes for call [000010] on class 00007FF819435700 (System.Collections.Generic.IEqualityComparer`1[System.__Canon])
1) 00007FF819518608 (System.Collections.Generic.StringEqualityComparer) [likelihood:100%]
Accepting type System.Collections.Generic.StringEqualityComparer with likelihood 100 as a candidate
GDV likely: resolveVirtualMethod (method 00007FF819435660 class 00007FF819518608 context 00007FF819435701)
First because the old code's else if (!pObjMT->CanCastToInterface(pBaseMT))
would report a failed cast, and bypassing that (as this PR does), the subsequent GetMethodDescForInterfaceMethod
would fail.
I don't think I tried
pDevirtMD = pObjMT->GetMethodDescForInterfaceMethod(TypeHandle(interfaceMT), pBaseMD, FALSE /* throwOnConflict */);
if that's what you are suggesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it looks like the parallel projection isn't needed. Will push a PR to remove that.
@EgorBot -intel -arm64 --envvars DOTNET_JitDisasm:GenericEquals using System;
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Running;
public class StringComparisons
{
private string s1 = "test string 1";
private string s2 = "test string 2";
[Benchmark]
public bool Equals_inline() => string.Equals(s1, s2);
[Benchmark]
public bool Equals_specialized() => EqualityComparer<string>.Default.Equals(s1, s2);
[Benchmark]
public bool Equals_generic() => GenericEquals(s1, s2);
[MethodImpl(MethodImplOptions.NoInlining)]
private static bool GenericEquals<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);
} |
Local bespoke ASP.NET SPMI with these changes replays fairly cleanly against the baseline. In spot checking I haven't seen any cases where we lose a devirtualization. I suppose when comparing differential metrics the number of GDVs should never decrease, and that might be a decent check. Let me try this. Diffs are based on 196,789 contexts (107,987 MinOpts, 88,802 FullOpts). MISSED contexts: base: 119 (0.06%), diff: 0 (0.00%) Overall (+334,166 bytes)
MinOpts (+60,520 bytes)
FullOpts (+273,646 bytes)
|
/asp run runtime-coreclr superpmi-diffs |
Ah in the #10050 example we're not making an interface call, so there is actually no diff. |
@EgorBot -intel -arm64 --envvars DOTNET_JitDisasm:Wrapper using System;
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Running;
public class StringComparisons
{
private string s1 = "test string 1";
private string s2 = "test string 2";
[Benchmark]
public bool Equals_inline() => string.Equals(s1, s2);
[Benchmark]
public bool Equals_specialized() => EqualityComparer<string>.Default.Equals(s1, s2);
[Benchmark]
public bool Equals_generic() => GenericEquals(s1, s2);
[MethodImpl(MethodImplOptions.NoInlining)]
private static bool GenericEquals<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);
[Benchmark]
public bool Equals_wrapper() => Wrapper(EqualityComparer<string>.Default, s1, s2);
[MethodImpl(MethodImplOptions.NoInlining)]
private static bool Wrapper<T>(IEqualityComparer<T> comparer, T t1, T t2)
=> comparer.Equals(t1, t2);
} |
src/coreclr/jit/compiler.h
Outdated
@@ -11945,7 +11945,7 @@ class GenTreeVisitor | |||
|
|||
if (call->gtCallType == CT_INDIRECT) | |||
{ | |||
if (call->gtCallCookie != nullptr) | |||
if ((call->gtCallCookie != nullptr) && !call->IsVirtualStub()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the active member of the union is not gtCallCookie
when call->IsVirtualStub()
, but rather either gtInlineCandidateInfo
or gtHandleHistogramProfileCandidateInfo
. If so that makes it undefined behavior to access gtCallCookie
in these cases.
I think all the call->IsVirtualStub()
checks need to be first to avoid this UB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, revised all these.
src/coreclr/jit/gentree.cpp
Outdated
assert(call->gtCallCookie == nullptr || call->gtCallCookie->OperIs(GT_CNS_INT, GT_IND) || | ||
call->IsVirtualStub()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems especially problematic, aren't we going to be treating a HandleHistogramProfileCandidateInfo*
or InlineCandidateInfo*
as a GenTree*
and calling OperIs
on it?
src/coreclr/jit/inline.cpp
Outdated
@@ -1345,7 +1345,7 @@ InlineContext* InlineStrategy::NewContext(InlineContext* parentContext, Statemen | |||
// ldarg instruction. | |||
context->m_Location = stmt->GetDebugInfo().GetLocation(); | |||
|
|||
assert(call->gtCallType == CT_USER_FUNC); | |||
// assert(call->gtCallType == CT_USER_FUNC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete this?
src/coreclr/jit/morph.cpp
Outdated
@@ -2102,6 +2102,8 @@ GenTreeCall* Compiler::fgMorphArgs(GenTreeCall* call) | |||
continue; | |||
} | |||
|
|||
assert(!argx->OperIs(GT_NONE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like an odd assert to have in this specific place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this was a leftover from tracking down a missing check for a valid cookie. Removed.
|
@EgorBot -amd --filter |
Co-authored-by: Jakob Botsch Nielsen <[email protected]>
@davidwrighton can you take another look? I removed the parallel method desc computation... |
Add support for profiling indirect virtual stub calls and subsequent guarded devirtualization.