Description
When a caller's argument has an exact type and feeds a virtual or interface call in the callee, we might want to inline more aggressively.
A toy example of this can be found in this BenchmarkDotNet sample. Here Run, if inlined, would allow the interface calls to devirtualize. Run is currently pretty far from being a viable inline candidate:
Inline candidate callsite is boring. Multiplier increased to 1.3.
calleeNativeSizeEstimate=545
callsiteNativeSizeEstimate=115
benefit multiplier=1.3
threshold=149
Native estimate for function size exceeds threshold for inlining 54.5 > 14.9 (multiplier = 1.3)
Inline expansion aborted, inline not profitable
INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline' for
'Jit_InterfaceMethod:Run1():double:this' calling 'Jit_InterfaceMethod:Run(ref):double:this'
Caller knows that the argument to Run is exact:
lvaUpdateClass: Updating class for V01 (00007FF966344CD0) Foo1 to be exact
[000029] ------------ * STMT void (IL 0x00C... ???)
[000027] I-C-G------- \--* CALL double Jit_InterfaceMethod.Run
[000025] ------------ this in rcx +--* LCL_VAR ref V00 this
[000026] ------------ arg1 \--* LCL_VAR ref V01 loc0
Likely we would not give a ~3.65x boost to inlining benefit based on one argument reaching one call site. But if we also realized the call site was in a loop, perhaps the net effect would be enough to justify an inline.
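The ~3.65x figure can be recovered from the log output. Here is a minimal sketch, assuming the threshold is the call site estimate times the multiplier, truncated to an integer (this matches 115 × 1.3 → 149 in the log). The helper names are hypothetical and are not JIT APIs:

```cpp
#include <cassert>

// Hypothetical helper: threshold as implied by the log, i.e. the call site
// estimate scaled by the benefit multiplier and truncated to an integer.
int inlineThreshold(int callsiteNativeSizeEstimate, double multiplier)
{
    return static_cast<int>(callsiteNativeSizeEstimate * multiplier);
}

// Hypothetical helper: the multiplier at which the callee estimate would
// stop exceeding the threshold (ignoring integer truncation).
double requiredMultiplier(int calleeNativeSizeEstimate, int callsiteNativeSizeEstimate)
{
    assert(callsiteNativeSizeEstimate > 0);
    return static_cast<double>(calleeNativeSizeEstimate) / callsiteNativeSizeEstimate;
}
```

With the values from the log, `inlineThreshold(115, 1.3)` gives 149, and `requiredMultiplier(545, 115)` is about 4.74, which is roughly 3.65 times the current multiplier of 1.3.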
Currently we don't know, when observing arg uses, whether a given use is in a loop. But if we were to associate uses with callee IL offsets, we could circle back after finding all the branch targets, develop a crude estimator for loop depth, and then sum up the weighted observations.
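One possible shape for such an estimator is sketched below. It is only an illustration under stated assumptions: every backward branch is treated as a loop spanning the IL range from its target to its source, the depth of a use is the number of such ranges covering its offset, and each level of depth scales the observation by a fixed per-loop weight. The types, function names, and weight are hypothetical and do not come from the JIT sources:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A branch recorded while scanning the callee's IL (IL byte offsets).
struct Branch
{
    uint32_t fromOffset;   // offset of the branch instruction
    uint32_t targetOffset; // offset it jumps to
};

// Crude loop depth: count the backward branches whose [target, from] range
// covers the given IL offset. Forward branches are ignored.
unsigned estimateLoopDepth(const std::vector<Branch>& branches, uint32_t ilOffset)
{
    unsigned depth = 0;
    for (const Branch& b : branches)
    {
        const bool isBackward = b.targetOffset <= b.fromOffset;
        if (isBackward && (b.targetOffset <= ilOffset) && (ilOffset <= b.fromOffset))
        {
            depth++;
        }
    }
    return depth;
}

// Sum the arg-use observations, scaling each one by perLoopWeight^depth.
// perLoopWeight is an assumed tuning knob, not an existing JIT constant.
double weightedObservationSum(const std::vector<Branch>&   branches,
                              const std::vector<uint32_t>& useOffsets,
                              double                       perLoopWeight)
{
    assert(perLoopWeight >= 1.0);
    double sum = 0.0;
    for (uint32_t offset : useOffsets)
    {
        double weight = 1.0;
        for (unsigned d = estimateLoopDepth(branches, offset); d > 0; d--)
        {
            weight *= perLoopWeight;
        }
        sum += weight;
    }
    return sum;
}
```

Because the depth can only be computed once all branch targets are known, the uses would be recorded with their IL offsets during the scan and weighted in a second pass. Counting covering ranges overestimates depth for loops with several back edges, which may be acceptable for a heuristic of this kind.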
It would also be nice to tabulate a few more opportunities of this kind. The basic observational part of the change is simple enough to prototype that perhaps just building it is one way to make forward progress.
category:cq
theme:inlining
skill-level:intermediate
cost:medium