Skip to content

Improve startup perf by avoiding JIT when invoking well-known signatures #115345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

steveharter
Copy link
Contributor

@steveharter steveharter commented May 6, 2025

This is the main contribution for tracking issue #112994.

For a R2R-enabled console "hello world" app, it reduced the jitted methods during startup by half (24 to 12) saving ~9% or 7ms (locally from 78->71ms).

For a non-R2R app there will be no jit necessary for each well-known signature after the first is jitted since they are cached. They are cacheable since the IL uses calli with no coupling to specific MemberInfo tokens. Note that it is possible to add calli and caching to the emit-based path (non-well-known cases) as was done in the prior PR for this; that would allow similar perf benefits of avoiding jit (+emit) for signatures that have already done that, but also adds the complexity and overhead of a cache that needs to key on each parameter.

Currently, the well-known method list is currently fairly short but will still cover many signatures because:

  • Reference type parameters are normalized to System.Object.
  • All primitive types and enums, along with common value types like DateTime and Guid, are included in property getter and setter signatures.

Design \ implementation notes:

  • To support efficient, clean calls to property setters and getters for the intrinsics, 2 delegates were added to InvokerEmitUtil.cs for a total of 5. This also pushed the code to add more shared methods in that file.
    • The 5 delegates were also changed to add the function pointer arg, which currently is only used for the intrinsic case. By passing in an unused IntPtr.Zero for the other cases, this avoids having additional delegates and conditionals to call a different delegate.
  • For the various files that implement invoke:
    • Added support for the 2 new delegates.
    • Methods are based more on the InvokerStrategy than arg count; this removes some conditionals. Note that when the interpreted path is removed, more more conditionals can be removed from these files.
    • The interpreted path is only used for CoreClr if the feature flag ForceInterpretedInvoke is set. This code should be removed shortly. Previously, the interpreted path is used for the first call to each method, then the method is emit'd on the second call but since we have the intrinsics now for startup\warmup, we should no longer have to do this. This also simplified the code since we no longer need to switch dynamically, and we moved from 3 fields to hold the 3 delegates to 1 field with appropriate casts to one of the 5 delegates.
    • The InvocationFlags was moved from the invoker classes to the MethodInfo, etc. classes. It was found that if reflection is used for inspection, but not invoke, it would sometimes unnecessarily create the invoker class.
    • The case for calling a constructor on an existing instance was supported previously in a separate file that used the interpreted path. Since we want to remove that path, we now track and pass in an additional bool for that case (e.g. "callCtorAsMethod").
      • This also makes the throughput performance multiple times faster in this case; throughput performance in other areas is a wash - some slightly faster, some slightly slower.
    • To allow the new intrinsics to be used for constructors -- which is important for some tested applications that use factory and DI patterns -- the alloc for the intrinsic case is done via CreateUnitializedObject and then the constructor is called as a normal method.
      • The call to CreateUnitializedObject was made fast by avoiding some layers - see the _allocator field.
    • The delegate field on each invoker class is based on 1 of the 5 different delegate types, and when called will then call one of these 3 cases (this approach avoids conditionals and allows validation and marshalling logic to be shared):
      • The new intrinsic.
      • The emit'd dynamic method.
      • A wrapper to call the interpreted \ native C++ path which for CoreClr should be removed shortly.
  • For the new intrinsic support:
    • There are static properties to cache each well-known method with lazy support.
      • There is a special return transform for Enum getters.
    • Currently, due to small number of well-known methods and logic to find the correct intrinsic, the property-based caching approach is simpler and faster than using a hashtable. Once the look-up logic finds the appropriate intrinsic, the delegate to that is set on the appropriate invoker class.

@steveharter steveharter added this to the 10.0.0 milestone May 6, 2025
@steveharter steveharter self-assigned this May 6, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-reflection
See info in area-owners.md if you want to be subscribed.

… to use interpreted mode on first call to each method
Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants