Skip to content

Virtual stub indirect call profiling #116453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 17, 2025

Conversation

AndyAyersMS
Copy link
Member

@AndyAyersMS AndyAyersMS commented Jun 10, 2025

Add support for profiling indirect virtual stub calls and subsequent guarded devirtualization.

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 10, 2025
@jakobbotsch
Copy link
Member

/azp run runtime-coreclr superpmi-diffs

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

/azp run runtime-coreclr pgostress

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AndyAyersMS
Copy link
Member Author

pgo stress failure is stack overflow test.

@AndyAyersMS
Copy link
Member Author

On the example from #116055

G_M16291_IG01:  ;; offset=0x0000
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 48
       mov      qword ptr [rsp+0x28], rcx
       mov      rdi, rdx
       mov      rbx, r8
       mov      rsi, r9
						;; size=21 bbWeight=1 PerfScore 5.00
G_M16291_IG02:  ;; offset=0x0015
       mov      rdx, qword ptr [rcx+0x40]
       mov      r11, qword ptr [rdx+0x10]
       test     r11, r11
       je       SHORT G_M16291_IG12
						;; size=13 bbWeight=1 PerfScore 5.25
G_M16291_IG03:  ;; offset=0x0022
       mov      r8, 0x7FFB7FE1AF00      ; System.Collections.Generic.StringEqualityComparer
       cmp      qword ptr [rdi], r8
       jne      SHORT G_M16291_IG14
						;; size=15 bbWeight=1 PerfScore 4.25
G_M16291_IG04:  ;; offset=0x0031
       cmp      rbx, rsi
       je       SHORT G_M16291_IG13
						;; size=5 bbWeight=0.50 PerfScore 0.62
G_M16291_IG05:  ;; offset=0x0036
       test     rbx, rbx
       je       SHORT G_M16291_IG08
						;; size=5 bbWeight=0.40 PerfScore 0.50
G_M16291_IG06:  ;; offset=0x003B
       test     rsi, rsi
       je       SHORT G_M16291_IG08
						;; size=5 bbWeight=0.40 PerfScore 0.50
G_M16291_IG07:  ;; offset=0x0040
       mov      r8d, dword ptr [rbx+0x08]
       cmp      r8d, dword ptr [rsi+0x08]
       je       SHORT G_M16291_IG09
						;; size=10 bbWeight=0.40 PerfScore 2.38
G_M16291_IG08:  ;; offset=0x004A
       xor      eax, eax
       jmp      SHORT G_M16291_IG10
						;; size=4 bbWeight=0.24 PerfScore 0.55
G_M16291_IG09:  ;; offset=0x004E
       lea      rcx, bword ptr [rbx+0x0C]
       add      r8d, r8d
       lea      rdx, bword ptr [rsi+0x0C]
       call     [System.SpanHelpers:SequenceEqual(byref,byref,nuint):bool]
						;; size=17 bbWeight=0.45 PerfScore 1.93
G_M16291_IG10:  ;; offset=0x005F
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25
G_M16291_IG11:  ;; offset=0x0060
       add      rsp, 48
       pop      rbx
       pop      rsi
       pop      rdi
       ret      
						;; size=8 bbWeight=1 PerfScore 2.75
G_M16291_IG12:  ;; offset=0x0068
       mov      rdx, 0x7FFB806B4228      ; global ptr
       call     CORINFO_HELP_RUNTIMEHANDLE_METHOD
       mov      r11, rax
       jmp      SHORT G_M16291_IG03
						;; size=20 bbWeight=0.20 PerfScore 0.70
G_M16291_IG13:  ;; offset=0x007C
       mov      eax, 1
       jmp      SHORT G_M16291_IG10
						;; size=7 bbWeight=0.10 PerfScore 0.23
G_M16291_IG14:  ;; offset=0x0083
       mov      rcx, rdi
       mov      rdx, rbx
       mov      r8, rsi
       call     [r11]
       jmp      SHORT G_M16291_IG10
						;; size=14 bbWeight=0 PerfScore 0.00

; Total bytes of code 145, prolog size 12, PerfScore 24.90, instruction count 47, allocated bytes for code 145 (MethodHash=455bc05c) for method Program:<<Main>$>g__Wrapper|0_0[System.__Canon](System.Collections.Generic.IEqualityComparer`1[System.__Canon],System.__Canon,System.__Canon):bool (Tier1)

We can optimize this a bit more by not fetching them method handle in the successful GDV case, though it gets cached so it is already a cold path in Tier1.

This doesn't yet handle the case in #108913 (comment) because arrays implement interfaces differently... I think I can address this in a follow in PR.

I still need to verify that this doesn't regress existing devirtualizations.

@AndyAyersMS
Copy link
Member Author

@davidwrighton can you look over the jit interface changes? The basic idea is that if either the base class or the object class is shared, we rely on canonical interface class matching, and if there is just one match we use the (possibly instantiated) matching interface class to find the method.

We were already doing some of this already, but once we found a single matching interface class we just threw it away.

This approach won't work for exact arrays object and shared base interfaces; I will fix that in a follow-up PR (the jit side needs changes for this too).

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 11, 2025

Wondering if we should have a higher GDV threshold for shared methods.

Some other ideas I considered here:

  • profile (context, target object) pairs (so we don't have to devirtualize exact obj/shared interface cases specially, we always have exact for both). This makes the guard check more complex and we'd like want to hide that complexity so we can easily recognize guard checks later on.
  • profile the context, then specialize the entire method for frequently seen contexts. This would provide a benefit similar to being able to specialize a shared generic for a ref class. This profile could also be useful in R2R to find likely used instantiations.

In the latter, we'd need some way to efficiently dispatch to the specialized version:

  • The jit could create N versions and in the "main" one add suitable dispatch code at the start;
  • The jit could tell the runtime it will create N versions and let the runtime build some kind of type dispatching thing to pick which one gets invoked.

@AndyAyersMS
Copy link
Member Author

@EgorBot -intel -arm64

using System;
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Running;

public class StringComparisons
{
    private string s1 = "1";
    private string s2 = "1";

    [Benchmark]
    public bool Equals_inline() => string.Equals(s1, s2);

    [Benchmark]
    public bool Equals_specialized() => EqualityComparer<string>.Default.Equals(s1, s2);

    [Benchmark]
    public bool Equals_generic() => GenericEquals(s1, s2);

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool GenericEquals<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);
}

@AndyAyersMS
Copy link
Member Author

Example above from #10050


pDevirtMD = pObjMT->GetMethodDescForInterfaceMethod(TypeHandle(pBaseMT), pBaseMD, FALSE /* throwOnConflict */);
MethodDesc* interfaceMD = interfaceMT->GetParallelMethodDesc(pBaseMD);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the conditions which made you do the parallel methoddesc lookup for interfaceMD instead of just using pBaseMD?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not answer your question, but this sort of lookup would fail

impDevirtualizeCall: Trying to devirtualize interface call:
    class for 'this' is System.Collections.Generic.IEqualityComparer`1[System.__Canon] (attrib 20220400)
    base method is System.Collections.Generic.IEqualityComparer`1[System.__Canon]::Equals
Considering guarded devirtualization at IL offset 3 (0x3)
Likely classes for call [000010] on class 00007FF819435700 (System.Collections.Generic.IEqualityComparer`1[System.__Canon])
  1) 00007FF819518608 (System.Collections.Generic.StringEqualityComparer) [likelihood:100%]
Accepting type System.Collections.Generic.StringEqualityComparer with likelihood 100 as a candidate
GDV likely: resolveVirtualMethod (method 00007FF819435660 class 00007FF819518608 context 00007FF819435701)

First because the old code's else if (!pObjMT->CanCastToInterface(pBaseMT)) would report a failed cast, and bypassing that (as this PR does), the subsequent GetMethodDescForInterfaceMethod would fail.

I don't think I tried

pDevirtMD = pObjMT->GetMethodDescForInterfaceMethod(TypeHandle(interfaceMT), pBaseMD, FALSE /* throwOnConflict */);

if that's what you are suggesting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it looks like the parallel projection isn't needed. Will push a PR to remove that.

@AndyAyersMS
Copy link
Member Author

@EgorBot -intel -arm64 --envvars DOTNET_JitDisasm:GenericEquals

using System;
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Running;

public class StringComparisons
{
    private string s1 = "test string 1";
    private string s2 = "test string 2";

    [Benchmark]
    public bool Equals_inline() => string.Equals(s1, s2);

    [Benchmark]
    public bool Equals_specialized() => EqualityComparer<string>.Default.Equals(s1, s2);

    [Benchmark]
    public bool Equals_generic() => GenericEquals(s1, s2);

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool GenericEquals<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);
}

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Jun 11, 2025

Local bespoke ASP.NET SPMI with these changes replays fairly cleanly against the baseline.

In spot checking I haven't seen any cases where we lose a devirtualization. I suppose when comparing differential metrics the number of GDVs should never decrease, and that might be a decent check. Let me try this.

Diffs are based on 196,789 contexts (107,987 MinOpts, 88,802 FullOpts).

MISSED contexts: base: 119 (0.06%), diff: 0 (0.00%)

Overall (+334,166 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 73,143,777 +334,166 +21.54%
MinOpts (+60,520 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 32,088,596 +60,520 +7.45%
FullOpts (+273,646 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
aspnet.run.windows.x64.checked.mch 41,055,181 +273,646 +28.38%

@AndyAyersMS
Copy link
Member Author

/asp run runtime-coreclr superpmi-diffs

@AndyAyersMS
Copy link
Member Author

Ah in the #10050 example we're not making an interface call, so there is actually no diff.

@AndyAyersMS
Copy link
Member Author

@EgorBot -intel -arm64 --envvars DOTNET_JitDisasm:Wrapper

using System;
using BenchmarkDotNet.Attributes;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Running;

public class StringComparisons
{
    private string s1 = "test string 1";
    private string s2 = "test string 2";

    [Benchmark]
    public bool Equals_inline() => string.Equals(s1, s2);

    [Benchmark]
    public bool Equals_specialized() => EqualityComparer<string>.Default.Equals(s1, s2);

    [Benchmark]
    public bool Equals_generic() => GenericEquals(s1, s2);

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool GenericEquals<T>(T a, T b) => EqualityComparer<T>.Default.Equals(a, b);

    [Benchmark]
    public bool Equals_wrapper() => Wrapper(EqualityComparer<string>.Default, s1, s2);

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static bool Wrapper<T>(IEqualityComparer<T> comparer, T t1, T t2)
    => comparer.Equals(t1, t2);
}

@@ -11945,7 +11945,7 @@ class GenTreeVisitor

if (call->gtCallType == CT_INDIRECT)
{
if (call->gtCallCookie != nullptr)
if ((call->gtCallCookie != nullptr) && !call->IsVirtualStub())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the active member of the union is not gtCallCookie when call->IsVirtualStub(), but rather either gtInlineCandidateInfo or gtHandleHistogramProfileCandidateInfo. If so that makes it undefined behavior to access gtCallCookie in these cases.

I think all the call->IsVirtualStub() checks need to be first to avoid this UB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, revised all these.

Comment on lines 5981 to 5982
assert(call->gtCallCookie == nullptr || call->gtCallCookie->OperIs(GT_CNS_INT, GT_IND) ||
call->IsVirtualStub());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems especially problematic, aren't we going to be treating a HandleHistogramProfileCandidateInfo* or InlineCandidateInfo* as a GenTree* and calling OperIs on it?

@@ -1345,7 +1345,7 @@ InlineContext* InlineStrategy::NewContext(InlineContext* parentContext, Statemen
// ldarg instruction.
context->m_Location = stmt->GetDebugInfo().GetLocation();

assert(call->gtCallType == CT_USER_FUNC);
// assert(call->gtCallType == CT_USER_FUNC);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this?

@@ -2102,6 +2102,8 @@ GenTreeCall* Compiler::fgMorphArgs(GenTreeCall* call)
continue;
}

assert(!argx->OperIs(GT_NONE));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an odd assert to have in this specific place.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this was a leftover from tracking down a missing check for a valid cookie. Removed.

@AndyAyersMS
Copy link
Member Author

@egorbot --amd --filter "System.Collections.IterateForEachNonGeneric<String>.ArrayList*"

@amanasifkhalid
Copy link
Member

@EgorBot -amd --filter System.Collections.IterateForEachNonGeneric<String>.ArrayList*

Co-authored-by: Jakob Botsch Nielsen <[email protected]>
@AndyAyersMS
Copy link
Member Author

@davidwrighton can you take another look? I removed the parallel method desc computation...

@AndyAyersMS AndyAyersMS merged commit 39a31f0 into dotnet:main Jun 17, 2025
107 of 109 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants