Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

JIT: enable inline pinvoke in more cases #8263

Merged
merged 2 commits into from
Dec 6, 2016

Conversation

AndyAyersMS
Copy link
Member

An inline pinvoke is a pinvoke where the managed/native transition
overhead is reduced by inlining parts of the transition bookkeeping
around the call site. A normal pinvoke does this bookkeeping in
a stub method that interposes between the managed caller and the
native callee.

Previously the jit would not allow pinvoke calls that came from inlines
to be optimized via inline pinvoke. This sometimes caused performance
surprises for users who wrap DLL imports with managed methods. See for
instance #2373.

Inline pinvokes were also disallowed for x64 for pinvoke calls in try
regions. This was a holdover from earlier runtimes; more general
runtime support was eventually added, but the jit continued to follow
older restrictions.

This change lifts both of those limitations. Pinvokes from inlined
method bodies are now given the same treatment as pinvokes in the
root method, and pinvokes in try regions can be optimzied via inline
pinvoke. The legality check for inline pinvokes has been streamlined
slightly to remove a redundant check. Inline pinvokes introduced
by inlining are handled by accumulating the unmanaged method count
with the value from inlinees, and deferring insertion of the special
basic blocks until after inlining, so that if the only inline pinvokes
come from inline instances they are still properly processed.

Inline pinvokes are still disallowed in filter and handler regions
(catches and finallies).

Inline pinvokes are also now suppressed in rarely executed blocks,
for instance blocks leading up to throws or similar.

This change also adds a new test case that shows the variety of
situations that can occur with pinvoke, inlining, and EH.

The inliner is now also changed to preferentially report inline
reasons as forced instead of always when both are applicable.

@AndyAyersMS
Copy link
Member Author

@JosephTremoulet PTAL
cc @dotnet/jit-contrib

@dotnet-bot test Windows_NT jitstress1
@dotnet-bot test Windows_NT jitstress2
@dotnet-bot test Windows_NT gcstress0x3
@dotnet-bot test Windows_NT gcstress0xC

@AndyAyersMS
Copy link
Member Author

Looking into the x86 legacy failure now.

Inline PInvoke size cost on x64 is roughly 60 bytes in prolog/epilog and 70 bytes per call site. It appears the inline frame is often placed far from RBP so accesses require 32 bit offsets and this contributes a fair amount to the per-site size cost. Seems a bit odd that frame layout (from what I can tell) does not take static appearance frequency into account. Or perhaps there is some runtime constraint on layout or other layout critieria I'm not aware of. At any rate the inline frame is large and only so many locals can be close to RBP, so the fact that the frame touches require larger offsets may be a benchmarky artifact and in large methods everything is far from RBP and large offsets are the norm.

PInvoke overhead is also amplified in methods with EH because any method with an inline pinvoke must do full callee save register save/restore so that any GC ref left by a caller in a callee-save can be found (the GC will not look at unmanaged frames for roots) -- and RyuJit will save/restore the same register set in each funclet that it does in the main method. So when a method has both inline pinvoke and EH the funclet prologs and epilogs are also large.

Since the inliner doesn't model the size overheads of inline pinvoke there is some risk of unexpectedly large code size increases with this change. I have considered restricting the inline case to just force inlines but am on the fence about it and working on gathering more data.

Code size diffs for x64 below. Probably not super-representative since FX dlls won't have many direct pinvokes and crossgen won't inline cross assembly. Will do similar measurements on desktop or give SPMI on CoreCLR a try...

Total bytes of diff: 5479 (0.06 % of base)
    diff is a regression.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file regressions by size (bytes):
        5479 : System.Private.CoreLib.dasm (0.17 % of base)
1 total files with size differences.
Top method regessions by size (bytes):
         378 : System.Private.CoreLib.dasm - System.IO.File:FillAttributeInfo(ref,byref,bool,bool):int
         360 : System.Private.CoreLib.dasm - System.Reflection.Emit.TypeNameBuilder:ConstructAssemblyQualifiedNameWorker(ref,int):this
         320 : System.Private.CoreLib.dasm - System.Reflection.Emit.TypeNameBuilder:AddElementType(ref):this
         288 : System.Private.CoreLib.dasm - System.Security.SecureString:MarshalToStringCore(bool,bool):long:this
         254 : System.Private.CoreLib.dasm - System.Security.SafeBSTRHandle:ClearBuffer():this
Top method improvements by size (bytes):
         -18 : System.Private.CoreLib.dasm - DomainNeutralILStubClass:IL_STUB_PInvoke(int,ref,ref,byref,ref,byref,byref):bool
34 total methods with size differences.

@AndyAyersMS
Copy link
Member Author

Can repro the x86 failure in regression\CLR-x86-JIT\V1-M11-Beta1\b36471. In this case there is a now inline pinvoke with the above changes from an inline of GC.Collect, which leads to this assert:

Assertion failed 'VarSetOps::Equal(this, lastlife, block->bbLiveOut)' 
   in 'Test.AA:Static3()' (IL size 43)
File: c:\repos\coreclr\src\jit\stackfp.cpp Line: 4145

Issue seems to be that inline pinvokes force the one return transformation, which then kicks in (despite there only being one return block to begin with) and sets the one return block to BBF_DONT_REMOVE. Because of this, even though the return block is unreachable it hangs around. In methods with inline PInvoke the return block gets special liveness treatment, so we end up in the bad situation where the local frame is marked as live into the return block but the return block is unreachable; hence the assert.

The assert comes from the x87 float analysis and so only impacts the x86 legacy backend.

Note this is likely a pre-existing issue; it should fire in test cases where there is a float used in the method and a root method inline pinvoke followed by a throw. I will try and work up a repro that triggers the failure without my changes.

Note that the current method for "keep alive" of the inline frame relies on the return block being reachable (with some consideration for tail calls), so there may be a more general problem here (again, pre-existing) where the methodology of special-casing liveness for BBJ_RETURN blocks is insufficient to actually keep the inline frame live. Also the special casing is scattered throughout the code base and ideally would be replaced by some more general mechanism.

@AndyAyersMS
Copy link
Member Author

AndyAyersMS commented Nov 24, 2016

Analysis above was a bit off -- it's not the unreachable return that was causing problems, it was the implicit use of the local frame var coupled with the fact that the var is not kept alive, so the local frame var was dying in a pinvoke block, but the stack FP liveness check did not model this, while regular liveness did. So they were out of sync.

But there was indeed a repro that did not require the changes in this PR. Added that as another test case.

The local frame var likely should not be dying at all, but that's a separate issue.

Copy link

@JosephTremoulet JosephTremoulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a few nits.

// in the inlined Frame (see CORINFO_EE_INFO::InlinedCallFrameInfo::offsetOfGSCookie),
// but this would not protect framelets/return-address of handlers.
//
// Until recently, we also disallowed inline pinvoke in try regions on x64.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this sort of comment (especially with the "recently" that will soon be stale) tends to confuse more than elucidate, IMO -- I'd drop this line and let the commit message (where you already go into more detail) cover that change.


if (block->isRunRarely())
{
return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment?

}
// For inline cases we technically should look at both the current
// block and the call site block (or just the latter if we've
// fused the EH trees. However the block-related checks pertain to

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: missing close-paren before period.

@AndyAyersMS
Copy link
Member Author

Desktop testing has uncovered two failing test cases that need further investigation. Will try and port these over to CoreCLR.

Code size impact on desktop (via SPMI) looks reasonable -- in fact a small net decrease -- mainly from not using inline pinvoke in rarely run blocks.

@AndyAyersMS
Copy link
Member Author

Updated to restore the "not within try" restriction for now for x64. Not 100% sure what goes wrong but the runtime clearly gets confused about an inline pinvoke in a try region. Left a note in the code about the problematic test case. Unfortunately this case can't easily be ported over to CoreCLR.

Rebased and squashed.

Revised comments on the test case to reflect the try restriction.

Incorporated review feedback.

Jit-format locally wants to change a totally unrelated line in importer.cpp; let's see what the CI version says.

@JosephTremoulet
Copy link

Still LGTM. FWIW. IMO. etc.

@AndyAyersMS
Copy link
Member Author

Looks like CI has the same idea about formatting in importer.cpp. @adiaaida any idea why? I didn't change this line...

22:18:15 At Line 6207 Before:
22:18:15     nextOpcode = (OPCODE)getU1LittleEndian(codeAddrOfNextOpcode);
22:18:15 After:
22:18:15     nextOpcode          = (OPCODE)getU1LittleEndian(codeAddrOfNextOpcode);

// Arguments:
// callRetTyp - return type of the call
// block - block contaning the call, or for inlinees, block
// contaiing the call being inlined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: contaiing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(multiple places)

//
// Arguments:
// callRetTyp - return type of the call
// block - block contaning the call, or for inlinees, block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: contaning

// in the inlined Frame (see CORINFO_EE_INFO::InlinedCallFrameInfo::offsetOfGSCookie),
// but this would not protect framelets/return-address of handlers.
//
// On x64, we disable pinvoke inlining inside of try regions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - I think that this comment was more appropriate where it was before in the implementation - right before the condition and inside the #ifdef _TARGET_X64_.

//
// We have to disable pinvoke inlining inside of filters
// because in case the main execution (i.e. in the try block) is inside
// unmanaged code, we cannot reuse the inlined stub (we still need the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth mentioning that this only applies for runtimes that support managed/unmanaged exception handling interop that means desktop. We do not support exception handling interop anywhere else (though, CoreCLR on Windows is gray area because of it is not explicitly disabled there today).

// On x64, we disable pinvoke inlining inside of try regions.
// Here is the comment from JIT64 explaining why:
//
// [VSWhidbey: 611015] - because the jitted code links in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also only applies for runtimes that support managed/unmanaged exception handling interop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a good idea to explicitly disable the exception handling interop for CoreCLR on Windows, so that we do need to worry about it and pessimize the codegen for it. It probably does not work well anyway because of we do not have any tests for it. @janvorli understands the CoreCLR exception handling best - he should be able to help you with it.

In any case, it should be safe to disable this restriction on Unix at least because of the exception handling interop is explicitly disabled there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #8459 for removing the interop support and jit restrictions.

An inline pinvoke is a pinvoke where the managed/native transition
overhead is reduced by inlining parts of the transition bookkeeping
around the call site. A normal pinvoke does this bookkeeping in
a stub method that interposes between the managed caller and the
native callee.

Previously the jit would not allow pinvoke calls that came from inlines
to be optimized via inline pinvoke. This sometimes caused performance
surprises for users who wrap DLL imports with managed methods. See for
instance #2373.

This change lifts this limitation. Pinvokes from inlined method bodies
are now given the same treatment as pinvokes in the root method. The
legality check for inline pinvokes has been streamlined slightly to
remove a redundant check. Inline pinvokes introduced by inlining are
handled by accumulating the unmanaged method count with the value from
inlinees, and deferring insertion of the special basic blocks until after
inlining, so that if the only inline pinvokes come from inline instances
they are still properly processed.

Inline pinvokes are still disallowed in try and handler regions
(catches, filters, and finallies).

X87 liveness tracking was updated to handle the implicit inline frame
var references. This was a pre-existing issue that now can show up more
frequently. Added a test case that fails with the stock legacy jit
(and also with the new enhancements to pinvoke). Now both the original
failing case and this case pass.

Inline pinvokes are also now suppressed in rarely executed blocks,
for instance blocks leading up to throws or similar.

The inliner is now also changed to preferentially report inline
reasons as forced instead of always when both are applicable.

This change adds a new test case that shows the variety of
situations that can occur with pinvoke, inlining, and EH.
Still honoring windows exception interop restrictions on all platforms
and runtimes. Will revisit when addressing #8459.
@AndyAyersMS
Copy link
Member Author

Updated to pull in the changes from #8437, and incorporate Jan's feedback.

Have not relaxed restrictions yet, will do so when addressing #8459.

@jkotas
Copy link
Member

jkotas commented Dec 5, 2016

Thanks!

@michellemcdaniel
Copy link

michellemcdaniel commented Dec 5, 2016

@AndyAyersMS I have narrowed the formatting weirdness down to the following repro:

void Compiler::impImportNewObjArray()
{
#ifdef COR_JIT_EE_VERSION
#endif
}

bool Compiler::impCanPInvokeInlineCallSite()
{
#ifdef _TARGET_AMD64_
#else
    const bool inX64Try = false;
#endif // _TARGET_AMD64_

    return !inX64Try;
}

bool Compiler::impIsTailCallILPattern()
{
#ifdef _TARGET_AMD64_
#else
    nextOpcode = (OPCODE)getU1LittleEndian(codeAddrOfNextOpcode);
#endif
}

Remove any of that, and it goes away. For some reason, clang-format has decided that the nextOpcode line that you had to change and the const bool inX64Try = false; line you added should have their ='s aligned. I cannot for the life of me figure out why. So I think this is one that we're going to have to live with for now.

@AndyAyersMS
Copy link
Member Author

@adiaaida thanks for investigating.

I am just going to go with the suggested changes for now.

@michellemcdaniel
Copy link

@AndyAyersMS Sounds good. I opened a bug with the clang people to look into this.

@AndyAyersMS
Copy link
Member Author

Did another round of desktop testing and all seems well. So, time to merge.

@AndyAyersMS AndyAyersMS merged commit 3de3940 into dotnet:master Dec 6, 2016
@AndyAyersMS AndyAyersMS deleted the InlineInlinePinvoke branch December 6, 2016 19:12
@karelz karelz added this to the 2.0.0 milestone Aug 28, 2017
@karelz karelz added this to the 2.0.0 milestone Aug 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…invoke

JIT: enable inline pinvoke in more cases

Commit migrated from dotnet/coreclr@3de3940
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants