Skip to content

Micro optimizations to improve the performance of EH stackwalking, particularly in the X86 with Funclets model #114582

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 22, 2025

Conversation

davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented Apr 11, 2025

  • Implement EHEnumInitFromStackFrameIterator as a SuppressGCTransition QCall and optimize its performance
    • This allows skipping setting the InlinedCallFrame to indicate that it is an EH frame (as suppress GC transition frames are NOT generated in that situation)
    • On X86 this also allows skipping using an EH prolog for this function
    • Only Update the MethodRegionInfo if there are EH regions to walk
  • Improve the codegen and reduce the usage of UpdateRuntimeWrappedExceptions api
    • Previously we would call IsRuntimeWrappedExceptions, which would lazily compute the flag. However, since we can't actually run the lazy computation during EH, we had already forced it to be initialized, so we didn't actually need to have the full lazy computation logic in place.
    • Also, we were setting this flag as we walked the stack frame via SfiInit and SfiNext, but only checking it when parsing the EH clause data. Move the computation to the EHEnumInitFromStackFrameIterator api, and only compute the correct version of the flag IF there are clauses to walk.
    • Only Update the IsRuntimeWrappedExceptions flag and the MethodRegionInfo if there are EH regions to walk
    • For reflection emitted modules, always consider the status of this flag to be computed, but if we emit the controlling attribute, adjust the current cached computed value. NOTE: this is a minor breaking change from the previous behavior.
  • Improve the performance of AppendExceptionStackFrame slightly be using the variant of GCX_COOP which takes a Thread* instead of getting it from the TLS data.
  • Improve the performance of StackTraceInfo::AppendElement
    • It always calls EnsureStackTraceArray which ALSO needs to have a protected GC variable. Instead of doing that locally in EnsureStackTraceArray, instead make the GCPROTECT in EnsureStackTraceArray be a bit larger. This allows avoiding modifying the TLS linked list of GCFrames, as well as avoids needing an x86 EH prolog for EnsureStackTraceArray
    • Update ExceptionObject::GetStackTrace to use an out of line copy of code to clone the stack trace array in the presence of the multi-threaded scenario. This avoids the EH prolog on X86.
    • Update ExceptionObject::GetStackTrace to avoid needing to regather the current thread, instead taking it as a parameter
    • Change ExceptionObject::GetStackTraceParts to use a faster technique for checking to see if the array is an sbyte array or an object[].
  • Update NotifyFunctionEnter to check CORProfileTrackExceptions before calling the various profiler reporting functions. This allows the check to happen only once instead of 4 times, and also allowed me to outline some logic so that the function didn't need an EH prolog on X86
  • For handling of the m_emptyDebuggerExState on the ThreadExceptionState object, move that into a global static variable to improve the performance of calls of the ThreadExceptionState::GetDebuggerState api, and add a new ThreadExceptionState::SetDebuggerIndicatedFramePointer api to avoid even touching the empty debugger state
  • Refactor the EECodeInfo::DecodeGCHdrInfo function into a fast inlineable path, and a slow path that does a lot of work for better inlining behavior.

This PR improves the performance of deep stack EH throwing by micro-optimizing a number of scenarios. In particular in the Windows X86 Funclet model, it achieves about a 15% improvement to a simple benchmark.

…rticularly in the X86 with Funclets model

- Implement EHEnumInitFromStackFrameIterator as a SuppressGCTransition QCall and optimize its performance
  - This allows skipping setting hte InlinedCallFrame to indicate that it is an EH frame (as suppress GC transition frames are NOT generated in that situation)
  - On X86 this also allows skipping using an EH prolog for this function
  - Only Update the MethodRegionInfo if there are EH regions to walk
- Improve the codegen and reduce the usage of UpdateRuntimeWrappedExceptions api
  - Previously we would call IsRuntimeWrappedExceptions, which would lazily compute the flag. However, since we can't actually run the lazy computation during EH, we had already forced it to be initialized, so we didn't actually need to have the full lazy computation logic in place.
  - Also, we were setting this flag as we walked the stack frame via SfiInit and SfiNext, but only checkinging it when parsing the EH clause data. Move the computation to the EHEnumInitFromStackFrameIterator api, and only compute the correct version of the flag IF there are clauses to walk.
  - Only Update the IsRuntimeWrappedExceptions flag and the MethodRegionInfo if there are EH regions to walk
- Improve the performance of AppendExceptionStackFrame slightly be using the variant of GCX_COOP which takes a Thread* instead of getting it from the TLS data.
- Improve the performance of StackTraceInfo::AppendElement
  - It always calls EnsureStackTraceArray which ALSO needs to have a protected GC variable. Instead of doing that locally in EnsureStackTraceArray, instead make the GCPROTECT in EnsureStackTraceArray be a bit larger. This allows avoiding modifying the TLS linked list of GCFrames, as well as avoids needing an x86 EH prolog for EnsureStackTraceArray
  -  Update ExceptionObject::GetStackTrace to use an out of line copy of code to clone the stack trace array in the presence of the multi-threaded scenario. This avoids the EH prolog on X86.
  - Update ExceptionObject::GetStackTrace to avoid needing to regather the current thread, instead taking it as a parameter
  - Change ExceptionObject::GetStackTraceParts to use a faster technique for checking to see if the array is an sbyte array or an object[].
- Update NotifyFunctionEnter to check CORProfileTrackExceptions before calling the various profiler reporting functions. This allows the check to happen only once instead of 4 times, and also allowed me to outline some logic so that the function didn't need an EH prolog on X86
- For handling of the m_emptyDebuggerExState on the ThreadExceptionState object, move that into a global static variable to improve the performance of calls of the ThreadExceptionState::GetDebuggerState api, and add a new ThreadExceptionState::SetDebuggerIndicatedFramePointer api to avoid even touching the empty debugger state
- Refactor the EECodeInfo::DecodeGCHdrInfo function into a fast inlineable path, and a slow path that does a lot of work.
@Copilot Copilot AI review requested due to automatic review settings April 11, 2025 23:24
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

@@ -56,8 +56,13 @@ struct StackTraceElement

class StackTraceInfo
{
struct StackTraceArrayProtect
Copy link
Preview

Copilot AI Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding inline documentation to clarify the roles of m_pStackTraceArray and m_pStackTraceArrayNew within StackTraceArrayProtect to improve future maintainability.

Copilot uses AI. Check for mistakes.

}
CONTRACTL_END

_ASSERTE(IsRuntimeWrapExceptionsStatusComputed());
Copy link
Preview

Copilot AI Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be helpful to document the assumption that the runtime wrap exception status is computed before calling IsRuntimeWrapExceptionsDuringEH, to aid future maintainers in understanding the precondition.

Copilot uses AI. Check for mistakes.

GCPROTECT_BEGIN(newStackTrace);

size_t stackTraceCapacity = pStackTrace->Capacity();
size_t stackTraceCapacity = pStackTraceProtected->m_pStackTraceArray.Capacity();
Copy link
Preview

Copilot AI Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment to explain the use of m_pStackTraceArray within the StackTraceArrayProtect structure, to clarify how capacity and copying are managed in the new implementation.

Copilot uses AI. Check for mistakes.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -291,6 +291,8 @@ ExceptionFlags* ThreadExceptionState::GetFlags()
#if !defined(DACCESS_COMPILE)

#ifdef DEBUGGING_SUPPORTED
static DebuggerExState m_emptyDebuggerExState;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
static DebuggerExState m_emptyDebuggerExState;
static DebuggerExState s_emptyDebuggerExState;


#if defined(_MSC_VER)
#pragma warning(default : 4640)
#endif
return &m_emptyDebuggerExState;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return &m_emptyDebuggerExState;
return &s_emptyDebuggerExState;

_ASSERTE(hdrInfoSize != 0);
m_hdrInfoTable = (PTR_CBYTE)gcInfoToken.Info + hdrInfoSize;
}
GCInfoToken gcInfoToken = GetGCInfoToken();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GCInfoToken gcInfoToken = GetGCInfoToken();
_ASSERTE(m_hdrInfoTable == NULL);
GCInfoToken gcInfoToken = GetGCInfoToken();

@filipnavara
Copy link
Member

filipnavara commented Apr 13, 2025

Some of the test failures look legit:

> C:\h\w\A8C60910\w\A155092C\e\baseservices\baseservices\../callconvs/TestCallingConventions/TestCallingConventions.cmd

Assert failure(PID 3332 [0x00000d04], Thread: 1432 [0x0598]): pExInfo->m_pMDToReportFunctionLeave != NULL

CORECLR! NotifyExceptionPassStarted + 0x190 (0x00007fff`a273a730)
CORECLR! SfiInit + 0x128 (0x00007fff`a273d158)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fff`87c416d4)
SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x00007fff`87c40cc8)
CORECLR! CallDescrWorkerInternal + 0x84 (0x00007fff`a2441f94)
CORECLR! CallDescrWorkerWithHandler + 0x120 (0x00007fff`a25bc5e8)
CORECLR! DispatchCallSimple + 0xF8 (0x00007fff`a25bce28)
CORECLR! DispatchManagedException + 0x238 (0x00007fff`a27387f8)
CORECLR! IL_Throw + 0x160 (0x00007fff`a2637b30)
<no module>! <no symbol> + 0x0 (0x00007fff`42f74094)
    File: D:\a\_work\1\s\src\coreclr\vm\exceptionhandling.cpp:3680
    Image: C:\h\w\A8C60910\p\corerun.exe

I didn't try to track it down yet, but it seems related to the NotifyFunctionEnter change.

@filipnavara
Copy link
Member

Seems like the tests passed. Can we get this merged or is there still outstanding work to be done?

@davidwrighton davidwrighton merged commit 2224669 into dotnet:main Apr 22, 2025
96 checks passed
@filipnavara
Copy link
Member

Thanks a lot!

@davidwrighton
Copy link
Member Author

@filipnavara I'm sorry about the delay, I've been unavoidably away from work for the last week, and I'm only now getting back. I didn't get a chance to look into the debugger scenarios, so I'll try to see if I can get that done some time this week, but I may not be able to.

@filipnavara
Copy link
Member

No worries. I appreciate any help with this! Thanks for keeping me in the loop.

mikem8361 pushed a commit to mikem8361/runtime that referenced this pull request Apr 24, 2025
This was caused by changes in PR dotnet#114582 where
Exception::GetStackTrace called GetThread() which throws an error exception in the
DAC. Changed the enummem.cpp code in ClrDataAccess::DumpManagedExcepObject() to call
the GetStackTrace overload that allows a NULL pCurrentThread parameter to be passed.
mikem8361 pushed a commit that referenced this pull request Apr 25, 2025
)

This was caused by changes in PR #114582 where
Exception::GetStackTrace called GetThread() which throws an error exception in the
DAC. Changed the enummem.cpp code in ClrDataAccess::DumpManagedExcepObject() to call
the GetStackTrace overload that allows a NULL pCurrentThread parameter to be passed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants