-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Alternate approach for HELPER_METHOD_FRAME removal for arithmetic div… #113286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternate approach for HELPER_METHOD_FRAME removal for arithmetic div… #113286
Conversation
…ide operations - Instead of moving to managed - For non 64-bit platforms, remove the helpers - For Unix platforms, rely on [[clang::musttail]] to tail-call a helper which throws (current implementation still uses FCThrow, but that should be relatively easy to move to a call to the managed throw routines - For Windows X86, rewrite the helpers in assembly with a tail-call approach for throwing. - As part of writing the helpers in assembly, it was noted that the stack layout is swapped from the underyling CRT routines for 64bit arithmetic. Tweak the helper calls for those cases to pass in data in the order which the CRT is going to handle them efficiently. Do this for all architectures/OSs as other ones don't really care about this weirdness. - Also it was noted that the existing helpers do a fair bit of unnecessary stack manipulation, and the manual rewrite avoids all of that. Maybe this will actually be faster. TODO: Bump jit version TODO: Bump R2R version TODO: Actually use managed helper to throw instead of keeping the FCThrow in place
src/coreclr/vm/jithelpers.cpp
Outdated
// Using the musttail attribute requires that we disble our protection against using return in the wrong place | ||
#ifdef return | ||
#undef return | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related: I'm trying to remove the cases outside of HMFs that engage this infra as it continues to cause us problems in places.
src/coreclr/vm/jithelpers.cpp
Outdated
{ | ||
if (divisor == 0) | ||
{ | ||
[[clang::musttail]] return JIT_ThrowDivideByZero_RetInt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can define MUSTTAIL for this:
#if TARGET_UNIX
#ifdef __clang__
#define MUSTTAIL [[clang::musttail]]
#else // __clang__
#define MUSTTAIL [[gnu::musttail]]
#endif // __clang__
#endif // TARGET_UNIX
src/coreclr/vm/jithelpers.cpp
Outdated
{ | ||
FCALL_CONTRACT; | ||
|
||
FCThrow(kOverflowException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we directly inline them in JIT? there already are CORINFO_HELP_OVERFLOW
and CORINFO_HELP_THROWDIVZERO
exposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is just an experiment to see how musttail works, and if it can be used to help solve the HMF problem without changing the JIT/etc for this issue. Given the number of test failures, something clearly didn't go completely right with my change here, so I want to understand what I've missed.
src/coreclr/inc/jithelpers.h
Outdated
// CORINFO_HELP_DBL2INT, CORINFO_HELP_DBL2UINT, and CORINFO_HELP_DBL2LONG get | ||
// patched for CPUs that support SSE2 (P4 and above). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// CORINFO_HELP_DBL2INT, CORINFO_HELP_DBL2UINT, and CORINFO_HELP_DBL2LONG get | |
// patched for CPUs that support SSE2 (P4 and above). |
Outdated comment
…the original order (Sacrifices perf, but avoid needinga bunch of optimizer changes) Adjust the negate path for the X86 hand coded assembly for JIT_LDiv to load edx before jumping to JIT_LDiv_DoNegate
… the FCThrow machinery
…hat the musttail feature doesn't work on Linux X86 very well, but it DOES seem to work on arm, so there is at least 1 platform where it works
src/coreclr/vm/jithelpers.cpp
Outdated
|
||
#if TARGET_UNIX | ||
#ifdef __clang__ | ||
#define MUSTTAIL [[clang::musttail]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, the performance concern was only for win-x86 and performance optimizations aren't critical for linux-arm and linux-x86. Moreover, it seems like MUSTTAIL is only to accelerate exception path with no effect on the actual div/mod ops? If this is the case, then I think we can just drop this optimization even if it was for win-x86 because perf concern was about the div/mod (non-exception) happy path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MUSTTAIL logic here isn't an optimization as without HMFs we can't through an exception from an FCALL, but I agree, it doesn't look like it has worked out as a reasonable way to make this work simple. My current plan is to keep the work I've done to port the Windows X86 helpers to handcoded assembly, and merge it with @am11's work that is more general purpose, and then call it good.
…te_approach_to_removingHMF_for_arithmetic
There was also a pure managed experiment somewhere in the middle using generic math with no FCall that was passing all tests: runtime/src/libraries/System.Private.CoreLib/src/System/Math.DivModInt.cs Lines 180 to 243 in e51389e
It was quite slow EgorBot/runtime-utils#193. (was probably dreaming that it could outperform the native implementation 🥲) The FCall version you've applied was performing much better EgorBot/runtime-utils#191, which is fine for 32-bit linux. |
[MethodImpl(MethodImplOptions.InternalCall)] | ||
private static extern uint DivUInt32Internal(uint dividend, uint divisor); | ||
|
||
#if NATIVEAOT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be same between NAOT and regular CoreCLR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AOT could use InternalCall (with revert of RuntimeHelpers.cs
file). There was a slight regression with QCall on coreclr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, my point is that there is no reason for two to be different.
(I do not have a strong opinion on FCall vs. QCall here as long as it is the same.)
…te_approach_to_removingHMF_for_arithmetic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Number are looking promising EgorBot/runtime-utils#336 (comment) (0.55) 🚀
@@ -1,6 +1,7 @@ | |||
// Licensed to the .NET Foundation under one or more agreements. | |||
// The .NET Foundation licenses this file to you under the MIT license. | |||
|
|||
using System; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: undo
…te_approach_to_removingHMF_for_arithmetic
a17d34e
to
0eb3870
Compare
src/coreclr/inc/jithelpers.h
Outdated
#else // TARGET_64BIT | ||
DYNAMICJITHELPER(CORINFO_HELP_LMUL_OVF, NULL, METHOD__NIL) | ||
DYNAMICJITHELPER(CORINFO_HELP_ULMUL_OVF, NULL, METHOD__NIL) | ||
JITHELPER(CORINFO_HELP_LDIV, NULL, METHOD__NIL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Padding for the METHOD__NIL
column is off.
src/coreclr/inc/jithelpers.h
Outdated
DYNAMICJITHELPER(CORINFO_HELP_ULMOD, NULL, METHOD__MATH__MOD_UINT64) | ||
#endif // TARGET_X86 && TARGET_WINDOWS | ||
#else // TARGET_64BIT | ||
DYNAMICJITHELPER(CORINFO_HELP_LMUL_OVF, NULL, METHOD__NIL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If CORINFO_HELP_LMUL/ULMUL_OVF
are unused on 64-bit, I would expect them to be JITHELPER
.
If they are used, I would expect them to look like the 32-bit ones and be outside the #ifndef TARGET_64BIT
ifdef.
…ach_to_removingHMF_for_arithmetic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Thanks for this @davidwrighton! For the remaining alloc-like HMFs, I think we can either use managed runtime/src/coreclr/vm/jithelpers.cpp Line 622 in b3188c5
cc @AaronRobinsonMSFT |
@davidwrighton @am11 this causes an issue on the runtime sdk codeflow on win-x86: dotnet/sdk#48034
|
Ah, nvm, it is also failing in AOT outerloop: https://github.com/dotnet/runtime/runs/40052903604. Although we haven't touched these exports. 👀 |
Ok thanks, I'll prepare a revert since this is blocking the codeflow. |
This is a combination of @am11's work in PR #109087 and some work to just rewrite the Windows X86 helpers in assembly.