This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Do not inline methods that never return #6103

Merged (2 commits, Aug 5, 2016)
38 changes: 36 additions & 2 deletions src/jit/flowgraph.cpp
@@ -5768,14 +5768,48 @@ void Compiler::fgFindBasicBlocks()

    if (compIsForInlining())
    {
        bool hasReturnBlocks           = false;
        bool hasMoreThanOneReturnBlock = false;

        for (BasicBlock* block = fgFirstBB; block != nullptr; block = block->bbNext)
        {
            if (block->bbJumpKind == BBJ_RETURN)
            {
                if (hasReturnBlocks)
                {
                    hasMoreThanOneReturnBlock = true;
                    break;
                }

                hasReturnBlocks = true;
            }
        }

        if (!hasReturnBlocks && !compInlineResult->UsesLegacyPolicy())
        {
            //
            // Mark the call node as "no return". The inliner might ignore CALLEE_DOES_NOT_RETURN and
            // fail the inline for a different reason. In that case we still want to make the "no return"
            // information available to the caller as it can impact the caller's code quality.
            //

            impInlineInfo->iciCall->gtCallMoreFlags |= GTF_CALL_M_DOES_NOT_RETURN;
        }

        compInlineResult->NoteBool(InlineObservation::CALLEE_DOES_NOT_RETURN, !hasReturnBlocks);

        if (compInlineResult->IsFailure())
        {
            return;
        }

Member:

I worry that this might be too general. I've seen cases where people have written performance-sensitive code that is effectively while (true) ... and then relied on exceptions to terminate processing, and such methods can benefit from being inlined.

Ideally we'd like to check that the inlinee does very little other than throw an exception.

Member (@jkotas):

> I've seen cases where people have written performance-sensitive code that is effectively while (true) ... and then relied on exceptions to terminate processing, and such methods can benefit from being inlined

Exception throwing has always been super slow path. I do not think there is any benefit in inlining infinite loops that are terminated by throwing exceptions.

Author:

As @jkotas already pointed out, exceptions are very slow anyway. The typical throw sequence already needs at least 3 calls - new, ctor, and throw - and the throw itself is very expensive.

Note that this does not affect methods like this one:

void ThrowIfNull(object value) {
    if (value == null)
        throw new ArgumentNullException();
}

Not inlining this would be bad since you'd pay the cost of a call even if value is not null.

Author:

> and the throw itself is very expensive

Some numbers: if in my test I pass a null to the Test method then the execution time is 50 minutes instead of 500ms! That's 6000 times slower than the no-exception case.

Member (@AndyAyersMS, Jul 7, 2016):

Yes, exceptions are slow. But the point is that "X is postdominated by a throw" does not imply "X is not performance sensitive".

This is a tricky thing to get right because there are cases where a relatively large amount of computation is done solely for the purpose of throwing a more detailed exception.

My preference would be to start off conservatively here, only matching simple methods with this heuristic, for instance methods that are loop-free and relatively small. This should catch most of the idiomatic throw helper cases.

Author:

> But the point is that X is postdominated by a throw does not imply X is not performance sensitive.

I had similar doubts myself back when I did this experiment, but I just can't find a reasonable use case that could be impacted by this. That X needs to be small enough for the method to be an inline candidate, and it has to contain a loop, otherwise the performance of X will be dwarfed by the throw. And then the loop performance needs to be better when inlined due to optimizations such as constant propagation; otherwise the only impact we could possibly measure would be the delay of the loop start by the time taken by the call.

I'd love to see an example that shows that this is indeed a problem. The current implementation already fails to reject inlining in some cases where there's no reason to inline; I'm somewhat reluctant to impose additional limitations.

> for instance methods that are loop-free

I'll have to check but I'm not sure there's any loop information available during inlining. Probably the next best thing would be to just match a single BB of type BBJ_THROW, possibly preceded by a list of BBJ_ALWAYS BBs. To make it even more conservative we could match only BBJ_THROW containing a single statement.

> and relatively small

Wouldn't this be redundant as the inliner already takes code size into account?
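For illustration, the "single BBJ_THROW, possibly preceded by a list of BBJ_ALWAYS BBs" matching described above could be modeled roughly as follows. This is a sketch with simplified stand-in types, not code from this change; `IsSimpleThrowChain` is a hypothetical name, and the real JIT's `BasicBlock` and jump kinds are far richer:

```cpp
// Simplified stand-ins for the JIT's basic block and its jump kinds.
enum BBJumpKind { BBJ_ALWAYS, BBJ_COND, BBJ_RETURN, BBJ_THROW };

struct BasicBlock
{
    BBJumpKind  bbJumpKind;
    BasicBlock* bbNext;
};

// Accept only a chain of BBJ_ALWAYS blocks ending in a single BBJ_THROW --
// the conservative shape proposed above for "no return" throw helpers.
bool IsSimpleThrowChain(const BasicBlock* block)
{
    for (; block != nullptr; block = block->bbNext)
    {
        if (block->bbJumpKind == BBJ_THROW)
        {
            return block->bbNext == nullptr; // the throw must end the method
        }

        if (block->bbJumpKind != BBJ_ALWAYS)
        {
            return false; // conditional branches and returns disqualify the method
        }
    }

    return false; // no throw block found
}
```

Under this model a lone throw block matches, as does a chain of unconditional jumps into one, while anything containing a conditional branch or a return is rejected.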

Member:

Yes, the current inliner -- via LegacyPolicy -- is conservative with respect to size of inlinees. But I'd rather not bake that assumption in here. Going forward, policies will likely be somewhat less conservative.

It would be simple enough here to just impose a limit on the number of BBs. Or turn this into an informational observation and let the policy decide whether or not to inline, and set the limits there.

        noway_assert(info.compXcptnsCount == 0);
        compHndBBtab           = impInlineInfo->InlinerCompiler->compHndBBtab;
        compHndBBtabAllocCount = impInlineInfo->InlinerCompiler->compHndBBtabAllocCount; // we probably only use the table, not add to it.
        compHndBBtabCount      = impInlineInfo->InlinerCompiler->compHndBBtabCount;
        info.compXcptnsCount   = impInlineInfo->InlinerCompiler->info.compXcptnsCount;

-       if (info.compRetNativeType != TYP_VOID &&
-           fgMoreThanOneReturnBlock())
+       if (info.compRetNativeType != TYP_VOID && hasMoreThanOneReturnBlock)
        {
            // The lifetime of this var might expand multiple BBs. So it is a long lifetime compiler temp.
            lvaInlineeReturnSpillTemp = lvaGrabTemp(false DEBUGARG("Inline candidate multiple BBJ_RETURN spill temp"));
3 changes: 3 additions & 0 deletions src/jit/gentree.h
@@ -2873,6 +2873,7 @@ struct GenTreeCall final : public GenTree
// know when these flags are set.

#define GTF_CALL_M_R2R_REL_INDIRECT 0x2000 // GT_CALL -- ready to run call is indirected through a relative address
#define GTF_CALL_M_DOES_NOT_RETURN 0x4000 // GT_CALL -- call does not return

bool IsUnmanaged() const { return (gtFlags & GTF_CALL_UNMANAGED) != 0; }
bool NeedsNullCheck() const { return (gtFlags & GTF_CALL_NULLCHECK) != 0; }
@@ -2993,6 +2994,8 @@ struct GenTreeCall final : public GenTree

bool IsVarargs() const { return (gtCallMoreFlags & GTF_CALL_M_VARARGS) != 0; }

bool IsNoReturn() const { return (gtCallMoreFlags & GTF_CALL_M_DOES_NOT_RETURN) != 0; }

unsigned short gtCallMoreFlags; // in addition to gtFlags

unsigned char gtCallType :3; // value from the gtCallTypes enumeration
Expand Down
1 change: 1 addition & 0 deletions src/jit/inline.def
@@ -75,6 +75,7 @@ INLINE_OBSERVATION(ARG_FEEDS_RANGE_CHECK, bool, "argument feeds range chec
INLINE_OBSERVATION(BEGIN_OPCODE_SCAN, bool, "prepare to look at opcodes", INFORMATION, CALLEE)
INLINE_OBSERVATION(BELOW_ALWAYS_INLINE_SIZE, bool, "below ALWAYS_INLINE size", INFORMATION, CALLEE)
INLINE_OBSERVATION(CLASS_PROMOTABLE, bool, "promotable value class", INFORMATION, CALLEE)
INLINE_OBSERVATION(DOES_NOT_RETURN, bool, "does not return", INFORMATION, CALLEE)
INLINE_OBSERVATION(END_OPCODE_SCAN, bool, "done looking at opcodes", INFORMATION, CALLEE)
INLINE_OBSERVATION(HAS_SIMD, bool, "has SIMD arg, local, or ret", INFORMATION, CALLEE)
INLINE_OBSERVATION(HAS_SWITCH, bool, "has switch", INFORMATION, CALLEE)
2 changes: 1 addition & 1 deletion src/jit/inline.h
@@ -563,7 +563,7 @@ struct InlineInfo
bool hasSIMDTypeArgLocalOrReturn;
#endif // FEATURE_SIMD

-    GenTree *     iciCall;   // The GT_CALL node to be inlined.
+    GenTreeCall * iciCall;   // The GT_CALL node to be inlined.
     GenTree *     iciStmt;   // The statement iciCall is in.
     BasicBlock *  iciBlock;  // The basic block iciStmt is in.
 };
107 changes: 100 additions & 7 deletions src/jit/inlinepolicy.cpp
@@ -77,21 +77,24 @@ InlinePolicy* InlinePolicy::GetPolicy(Compiler* compiler, bool isPrejitRoot)

#endif // defined(DEBUG) || defined(INLINE_DATA)

-    InlinePolicy* policy = nullptr;
+    // Optionally install the ModelPolicy.
     bool useModelPolicy = JitConfig.JitInlinePolicyModel() != 0;

     if (useModelPolicy)
     {
-        // Optionally install the ModelPolicy.
-        policy = new (compiler, CMK_Inlining) ModelPolicy(compiler, isPrejitRoot);
+        return new (compiler, CMK_Inlining) ModelPolicy(compiler, isPrejitRoot);
     }
-    else
+
+    // Optionally fallback to the original legacy policy
+    bool useLegacyPolicy = JitConfig.JitInlinePolicyLegacy() != 0;
+
+    if (useLegacyPolicy)
     {
-        // Use the legacy policy
-        policy = new (compiler, CMK_Inlining) LegacyPolicy(compiler, isPrejitRoot);
+        return new (compiler, CMK_Inlining) LegacyPolicy(compiler, isPrejitRoot);
     }
-
-    return policy;
+
+    // Use the enhanced legacy policy by default
+    return new (compiler, CMK_Inlining) EnhancedLegacyPolicy(compiler, isPrejitRoot);
}

//------------------------------------------------------------------------
@@ -850,6 +853,96 @@ int LegacyPolicy::CodeSizeEstimate()
}
}

//------------------------------------------------------------------------
// NoteBool: handle a boolean observation with non-fatal impact
//
// Arguments:
//    obs   - the current observation
//    value - the value of the observation

void EnhancedLegacyPolicy::NoteBool(InlineObservation obs, bool value)
{
    switch (obs)
    {
        case InlineObservation::CALLEE_DOES_NOT_RETURN:
            m_IsNoReturn      = value;
            m_IsNoReturnKnown = true;
            break;

        default:
            // Pass all other information to the legacy policy
            LegacyPolicy::NoteBool(obs, value);
            break;
    }
}

//------------------------------------------------------------------------
// NoteInt: handle an observed integer value
//
// Arguments:
//    obs   - the current observation
//    value - the value being observed

void EnhancedLegacyPolicy::NoteInt(InlineObservation obs, int value)
{
    switch (obs)
    {
        case InlineObservation::CALLEE_NUMBER_OF_BASIC_BLOCKS:
        {
            assert(value != 0);
            assert(m_IsNoReturnKnown);

            //
            // Let's be conservative for now and reject inlining of "no return" methods only
            // if the callee contains a single basic block. This covers most of the use cases
            // (typical throw helpers simply do "throw new X();" and so they have a single block)
            // without affecting more exotic cases (loops that do actual work, for example) where
            // failure to inline could negatively impact code quality.
            //

            unsigned basicBlockCount = static_cast<unsigned>(value);

            if (m_IsNoReturn && (basicBlockCount == 1))
            {
                SetNever(InlineObservation::CALLEE_DOES_NOT_RETURN);
            }
            else
            {
                LegacyPolicy::NoteInt(obs, value);
            }

            break;
        }

        default:
            // Pass all other information to the legacy policy
            LegacyPolicy::NoteInt(obs, value);
            break;
    }
}

//------------------------------------------------------------------------
// PropagateNeverToRuntime: determine if a never result should cause the
// method to be marked as un-inlinable.

bool EnhancedLegacyPolicy::PropagateNeverToRuntime() const
{
    //
    // Do not propagate the "no return" observation. If we do this then future inlining
    // attempts will fail immediately without marking the call node as "no return".
    // This can have an adverse impact on the caller's code quality as it may have to
    // preserve registers across the call.
    // TODO-Throughput: We should persist the "no return" information in the runtime
    // so we don't need to re-analyze the inlinee all the time.
    //

    bool propagate = (m_Observation != InlineObservation::CALLEE_DOES_NOT_RETURN);

    propagate &= LegacyPolicy::PropagateNeverToRuntime();

    return propagate;
}

#ifdef DEBUG

//------------------------------------------------------------------------
29 changes: 29 additions & 0 deletions src/jit/inlinepolicy.h
@@ -156,6 +156,35 @@ class LegacyPolicy : public LegalPolicy
    bool m_MethodIsMostlyLoadStore :1;
};

// EnhancedLegacyPolicy extends the legacy policy by rejecting
// inlining of methods that never return because they throw.

class EnhancedLegacyPolicy : public LegacyPolicy
{
public:
    EnhancedLegacyPolicy(Compiler* compiler, bool isPrejitRoot)
        : LegacyPolicy(compiler, isPrejitRoot)
        , m_IsNoReturn(false)
        , m_IsNoReturnKnown(false)
    {
        // empty
    }

    // Policy observations
    void NoteBool(InlineObservation obs, bool value) override;
    void NoteInt(InlineObservation obs, int value) override;

    // Policy policies
    bool PropagateNeverToRuntime() const override;
    bool IsLegacyPolicy() const override { return false; }

protected:

    // Data members
    bool m_IsNoReturn      :1;
    bool m_IsNoReturnKnown :1;
};

#ifdef DEBUG

// RandomPolicy implements a policy that inlines at random.
1 change: 1 addition & 0 deletions src/jit/jitconfigvalues.h
@@ -203,6 +203,7 @@ CONFIG_STRING(JitNoInlineRange, W("JitNoInlineRange"))
CONFIG_STRING(JitInlineReplayFile, W("JitInlineReplayFile"))
#endif // defined(DEBUG) || defined(INLINE_DATA)

CONFIG_INTEGER(JitInlinePolicyLegacy, W("JitInlinePolicyLegacy"), 0)
CONFIG_INTEGER(JitInlinePolicyModel, W("JitInlinePolicyModel"), 0)

#undef CONFIG_INTEGER
25 changes: 25 additions & 0 deletions src/jit/morph.cpp
@@ -8035,6 +8035,31 @@ GenTreePtr Compiler::fgMorphCall(GenTreeCall* call)
return result;
}

    if (call->IsNoReturn())
    {
        //
        // If we know that the call does not return then we can set fgRemoveRestOfBlock
        // to remove all subsequent statements and change the call's basic block to BBJ_THROW.
        // As a result the compiler won't need to preserve live registers across the call.
        //
        // This isn't needed for tail calls as there shouldn't be any code after the call anyway.
        // Besides, the tail call code is part of the epilog and converting the block to
        // BBJ_THROW would result in the tail call being dropped as the epilog is generated
        // only for BBJ_RETURN blocks.
        //
        // Currently this doesn't work for non-void callees. Some of the code that handles
        // fgRemoveRestOfBlock expects the tree to have the GTF_EXCEPT flag set but call nodes
        // do not have this flag by default. We could add the flag here but the proper solution
        // would be to replace the return expression with a local var node during inlining
        // so the rest of the call tree stays in a separate statement. That statement can then
        // be removed by fgRemoveRestOfBlock without needing to add GTF_EXCEPT anywhere.
        //

        if (!call->IsTailCall() && call->TypeGet() == TYP_VOID)
        {
            fgRemoveRestOfBlock = true;
        }
    }
Member:

Somebody else on the jit team can chime in, but there should be some way to indicate that the return value is not actually going to be produced, so that this restriction can be lifted.

Author:

This was primarily caused by an assert I hit in a test. It would seem that the call node needs to have GTF_EXCEPT set if fgRemoveRestOfBlock is set. I find this confusing: how come GTF_EXCEPT is not always set on calls to begin with? Anyway, I didn't look into this further because such cases should be rare.


    return call;
}
Expand Down
75 changes: 75 additions & 0 deletions tests/src/JIT/Performance/CodeQuality/Inlining/NoThrowInline.cs
@@ -0,0 +1,75 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.Xunit.Performance;
using System;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Reflection;
using System.Collections.Generic;
using Xunit;

[assembly: OptimizeForBenchmarks]
[assembly: MeasureInstructionsRetired]

public static class NoThrowInline
{
#if DEBUG
    public const int Iterations = 1;
#else
    public const int Iterations = 100000000;
#endif

    static void ThrowIfNull(string s)
    {
        if (s == null)
            ThrowArgumentNullException();
    }

    static void ThrowArgumentNullException()
    {
        throw new ArgumentNullException();
    }

    //
    // We expect ThrowArgumentNullException to not be inlined into Bench, the throw code is pretty
    // large and throws are extremely slow. However, we need to be careful not to degrade the
    // non-exception path performance by preserving registers across the call. For this the compiler
    // will have to understand that ThrowArgumentNullException never returns and omit the register
    // preservation code.
    //
    // For example, the Bench method below has 4 arguments (all passed in registers on x64) and fairly
    // typical argument validation code. If the compiler does not inline ThrowArgumentNullException
    // and does not make use of the "no return" information then all 4 register arguments will have
    // to be spilled and then reloaded. That would add 8 unnecessary memory accesses.
    //

    [MethodImpl(MethodImplOptions.NoInlining)]
    static int Bench(string a, string b, string c, string d)
    {
        ThrowIfNull(a);
        ThrowIfNull(b);
        ThrowIfNull(c);
        ThrowIfNull(d);

        return a.Length + b.Length + c.Length + d.Length;
    }

    [Benchmark]
    public static void Test()
    {
        foreach (var iteration in Benchmark.Iterations)
        {
            using (iteration.StartMeasurement())
            {
                Bench("a", "bc", "def", "ghij");
            }
        }
    }

    public static int Main()
    {
        return (Bench("a", "bc", "def", "ghij") == 10) ? 100 : -1;
    }
}