Skip to content

[NFC] LocalGraph: Optimize params with no sets #6046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 24, 2023
Merged

[NFC] LocalGraph: Optimize params with no sets #6046

merged 4 commits into from
Oct 24, 2023

Conversation

kripken
Copy link
Member

@kripken kripken commented Oct 23, 2023

If a local index has no sets, then all gets of that index read from the entry block
(a param, or a zero for a local). This is actually a common case, where a param
has no other set, and so it is worth optimizing, which this PR does by avoiding
any flowing operation at all for that index: we just skip and write the entry block
as the source of information for such gets.

#6042 on precompute-propagate goes from 3 minutes to 2 seconds with this (!).
But that testcase is rather special in that it is a huge function with many, many
gets in it, so the overhead we remove here is very noticeable.

@kripken kripken requested a review from tlively October 23, 2023 23:55
Copy link
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@@ -119,6 +119,10 @@ struct Flower : public CFGWalker<Flower, Visitor<Flower>, Info> {
basicToFlowMap[basicBlocks[i].get()] = &flowBlocks[i];
}

// We note which local indexes have local.sets, as that can help us
// optimize later (if there are none at all).
std::unordered_set<Index> hasSet;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by review:
Would this be better as a bit vector? Is it known how many params and locals the function has at this point in the code? It seems like even though params without sets are common, most indexes will have sets (since most will be locals rather than params, and I'd expect most of those to have sets). So maybe allocating bits up front will be cheaper than allocating enough hash table space for the used indexes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll measure that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't measure a difference with that, but it does seem better so I switched to it.

@kripken
Copy link
Member Author

kripken commented Oct 24, 2023

Unfortunately this PR is too simple, as the fuzzer noticed we are wrong in unreachable code: if there are no sets, we may still not read from the entry block, if we are unreachable and do not reach it.

Hopefully there is a way to make this work that is still efficient enough.

@tlively
Copy link
Member

tlively commented Oct 24, 2023

What's wrong with assuming that unreachable gets read from the entry block? That seems like it should be a safe overapproximation of program behavior.

@kripken
Copy link
Member Author

kripken commented Oct 24, 2023

Yeah, I suspect it is safe as you said. But it does hit some assertions atm that require us to actually be precise. I'm not sure yet if it's worth removing or relaxing those.

@kripken
Copy link
Member Author

kripken commented Oct 24, 2023

Detecting unreachability turns out to be trivial here, so it seems the best course. That way we keep the IR consistent and don't need to remove any existing assertions, and it won't cause any confusion during future debugging sessions etc.

@kripken kripken enabled auto-merge (squash) October 24, 2023 17:26
@kripken kripken merged commit 92c8a46 into main Oct 24, 2023
@kripken kripken deleted the lg.fast.2 branch October 24, 2023 17:37
kripken added a commit that referenced this pull request Oct 24, 2023
Followup to #6046 - the fuzzer found we missed handling the case of the entry itself
being unreachable, or of an unreachable loop later. Properly identifying unreachable
code requires a flow analysis, unfortunately, so this PR gives up on that and instead
allows LocalGraph to be imprecise in unreachable code. That avoids adding any
overhead, but does mean the IR may be slightly confusing when debugging. It does
not have any optimization downsides, however, as it only affects unreachable code.

Also add a dump() impl in that file which helps debugging.
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
If a local index has no sets, then all gets of that index read from the entry block
(a param, or a zero for a local). This is actually a common case, where a param
has no other set, and so it is worth optimizing, which this PR does by avoiding
any flowing operation at all for that index: we just skip and write the entry block
as the source of information for such gets.

WebAssembly#6042 on precompute-propagate goes from 3 minutes to 2 seconds with this (!).
But that testcase is rather special in that it is a huge function with many, many
gets in it, so the overhead we remove is very noticeable there.
radekdoulik pushed a commit to dotnet/binaryen that referenced this pull request Jul 12, 2024
…ssembly#6048)

Followup to WebAssembly#6046 - the fuzzer found we missed handling the case of the entry itself
being unreachable, or of an unreachable loop later. Properly identifying unreachable
code requires a flow analysis, unfortunately, so this PR gives up on that and instead
allows LocalGraph to be imprecise in unreachable code. That avoids adding any
overhead, but does mean the IR may be slightly confusing when debugging. It does
not have any optimization downsides, however, as it only affects unreachable code.

Also add a dump() impl in that file which helps debugging.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants