Skip to content

Conversation

@JohanEngelen
Copy link
Member

An "inlined" function will be defined with the LLVM attribute available_externally, so that LLVM can decide to actually inline the function or not.

pragma(inline, false) --> the function receives the noinline attribute.
pragma(inline, true) --> the function receives the alwaysinline attribute and cross-module inlining for this function is performed even at -O0.

The current heuristic for (defining available_externally functions to enable) inlining functions is: true :-)
Very aggressive indeed; good for testing :). Perhaps a simple statement count heuristic can be used .

Functions that have __FILE__ as default template parameter need to always be inlined pragma(inline, true), otherwise it results in link errors. See the forum discussion:
https://forum.dlang.org/thread/[email protected]

To inline a function, semantic analysis needs to be completed for that function (DMD does the same). Doing that however, may add new symbols to the module member list for codegenning. This means that the order of how modules are codegenned needs to be reversed (or, members[0] needs to be codegenned last, as that one receives all new symbols). This showed up while testing Weka's codebase and cost me a lot of time. Unfortunately, I have not managed to recreate this problem in a testcase.

Tested succesfully on Weka's codebase.

}

// Set inlining attribute
if (fdecl->neverInline) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is of course completely orthogonal to your PR, but we should probably make LDC_never_inline just set ::inlining appropriately instead of having ::neverInline, now that the former exists.

@dnadlinger
Copy link
Member

dnadlinger commented Jun 23, 2016

In case you are wondering why I'm being excessively picky here: From trying to get this to work previously, I'm rather anxious about things randomly breaking with weird symptoms from trying to run semantic analysis out-of-order. In particular, I would sometimes end up with __traits(compiles, …) and template specialisations giving different results, or some of the counters used to determine the symbol name for lambdas/… not matching the actual non-inline runs.

In other words, I pretty much expect that we are going to have to come back to this piece of code a number of times to debug whatever issues turn up once this is used in the wild, which is why I think it makes sense to strive to make it as easily understandable as possible.

Along the same lines, I think it would make sense – whatever heuristic we actually end up with for isInlineCandidate – to add a hidden command line switch to disable cross-module inlining altogether. This way, we'll be able to get a quick handle on incoming bug reports.

@JohanEngelen
Copy link
Member Author

Thanks for your comments David. Trying to incorporate them all.

Indeed, a specific switch to turn this off is good to have (also for testing purposes).

@JohanEngelen
Copy link
Member Author

Note for future work. Some of the functions in the testcase are not inlined yet with this:
https://issues.dlang.org/show_bug.cgi?id=9646
functions _D3std5range10primitives12__T5emptyTaZ5emptyFNaNbNdNiNfxAaZb and _D3std9algorithm9searching34__T4findVAyaa6_61203d3d2062TAhTAhZ4findFNaNbNiNfAhAhZAh.

@JohanEngelen
Copy link
Member Author

JohanEngelen commented Jun 28, 2016

The AppVeyor -O3 bug appears to be an optimization bug with LLVM3.9 (can reproduce on Mac, but not with LLVM3.8).

Edit: nope, not at all. It's a bug in Phobos: dlang/phobos#4509

@JohanEngelen
Copy link
Member Author

For future work: nested functions are not chain-inlined.

pragma(inline, true) bool addAndCheckInsideNestedFunc(int checkbefore, int increment)
{
    pragma(inline, true)
    bool addCheckNested(int checkbefore, int increment)
    {
        return true;
    }

    return addCheckNested(checkbefore, increment);
}

@JohanEngelen
Copy link
Member Author

The only remaining problem is a linking problem: for LLVM < 3.7 we don't emit symbols in COMDAT any groups. So for some symbols there are multiple definitions link errors (e.g. initializers of structs nested inside an inlined function). Now I can either complicate the LDC logic. Or simple disable cross-module inlining for LLVM < 3.7. Strongly levitating toward the latter...

ddmd/globals.d Outdated
const(char)* ldc_version;
const(char)* llvm_version;

bool gaggedForInlining; // Set for functionSemantic3 for external inlining canditates
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just caught my eye, no time for a proper review: candidates

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed :)

@JohanEngelen
Copy link
Member Author

With disabling for LLVM < 3.7, it is all green now.
Inlining heuristic + cleanup and done!

@dnadlinger
Copy link
Member

Imho just inlining always-inline functions would be enough as a first step to get this merged. Coming up with/tuning a good heuristic for striking the balance between compile time and executable performance could take some time. (One possibility to think about: Different "thresholds" for -O2 and -O3.)

@JohanEngelen
Copy link
Member Author

The compile slowdown may not be too terrible. LLVM will do its optimizations first, and non-inlined functions are discarded before machine codegen. So I think we can live with a superdumb inlining heuristic, and leave the good stuff up to LLVM? I've implemented a simple statement counting heuristic now.

@JohanEngelen JohanEngelen force-pushed the inlining branch 2 times, most recently from d54ae98 to 12b98e0 Compare July 2, 2016 15:02
@JohanEngelen JohanEngelen changed the title [WIP] Implement cross-module inlining Implement cross-module inlining Jul 2, 2016
@JohanEngelen
Copy link
Member Author

no more "WIP" ! :)

checkNestedStruct_1();
checkNestedStruct_2();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many trailing newlines...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:/ fixed!

Logger::println("Does not need codegen, skipping.");
return;
// Force codegen if this is a templated function with pragma(inline, true).
if ((decl->members->dim == 1) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you ever get around to looking into using onemember?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I haven't yet. Thanks for the reminder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having looked more closely at it now, it is not completely clear to me how to work with the onemember functions and the AST :S. It'd be good to have a testcase where something that should be inlined isn't.
In some parts of DMD I see that instead decl.aliasdecl is checked to see if it is a template with eponymous function... I'm confused. Testcase needed! ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might have misremembered how onemember works myself, then.

@dnadlinger
Copy link
Member

Looks good to merge to me.

uint oldgag = global.gag;
version (IN_LLVM)
{
if (global.gag && !spec && !global.gaggedForInlining)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be beneficial to discuss whether this is the "proper" way of doing it with upstream, although I understand if you don't want to spend the energy on it.

@dnadlinger
Copy link
Member

Random side note: It might be interesting to add diagnostic counters (like in LLVM proper) to defineAsExternallyAvailable to keep track of the different reasons for bailing out.

@JohanEngelen
Copy link
Member Author

Random side note: It might be interesting to add diagnostic counters

I've been thinking the same. Clang/LLVM's code seems to use an interface for reporting internal compiler diagnostics. Would be great to have something similar in LDC (also for e.g. PGO things).

@dnadlinger
Copy link
Member

AppVeyor actually ran is green. A unique opportunity.

@dnadlinger dnadlinger merged commit 75e1573 into ldc-developers:master Jul 4, 2016
@JohanEngelen
Copy link
Member Author

Great!

@kinke
Copy link
Member

kinke commented Jul 4, 2016

Nice work, thanks! I fast-forwarded the ldc Phobos branch accordingly.

AppVeyor actually ran is green. A unique opportunity.

;) AppVeyor is currently the only fully working CI system, the only green spark in a sea of red. ;) It does take a painful lot of time though.

@dnadlinger
Copy link
Member

AppVeyor is currently the only fully working CI system, the only green spark in a sea of red. ;) It does take a painful lot of time though.

That is true, but it's so slow that it's faster to verify that the only Travis failures are due to build timeouts than waiting for it to complete. ;)

@JohanEngelen
Copy link
Member Author

Checked just now: llvm.org/apt shows that the APT repo is now up-to-date with trunk head again, so CircleCI should be green soon (because it is at roughly the same version as AppVeyor).
At the moment it fails because a slightly more recent LLVM than AppVeyor has an API change. I already mailed the person working on it to restore the old default behavior (i.e., no need for us to make code modifications). Hopefully done in a few days, then CircleCI should be green too. Fingers crossed!

@9il
Copy link

9il commented Jul 5, 2016

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants