-
Notifications
You must be signed in to change notification settings - Fork 787
[NFC] LocalGraph: Optimize params with no sets #6046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
src/ir/LocalGraph.cpp
Outdated
@@ -119,6 +119,10 @@ struct Flower : public CFGWalker<Flower, Visitor<Flower>, Info> { | |||
basicToFlowMap[basicBlocks[i].get()] = &flowBlocks[i]; | |||
} | |||
|
|||
// We note which local indexes have local.sets, as that can help us | |||
// optimize later (if there are none at all). | |||
std::unordered_set<Index> hasSet; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by review:
Would this be better as a bit vector? Is it known how many params and locals the function has at this point in the code? It seems like even though params without sets are common, most indexes will have sets (since most will be locals rather than params, and I'd expect most of those to have sets). So maybe allocating bits up front will be cheaper than allocating enough hash table space for the used indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll measure that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't measure a difference with that, but it does seem better so I switched to it.
Unfortunately this PR is too simple, as the fuzzer noticed we are wrong in unreachable code: if there are no sets, we may still not read from the entry block, if we are unreachable and do not reach it. Hopefully there is a way to make this work that is still efficient enough. |
What's wrong with assuming that unreachable gets read from the entry block? That seems like it should be a safe overapproximation of program behavior. |
Yeah, I suspect it is safe as you said. But it does hit some assertions atm that require us to actually be precise. I'm not sure yet if it's worth removing or relaxing those. |
Detecting unreachability turns out to be trivial here, so it seems the best course. That way we keep the IR consistent and don't need to remove any existing assertions, and it won't cause any confusion during future debugging sessions etc. |
Followup to #6046 - the fuzzer found we missed handling the case of the entry itself being unreachable, or of an unreachable loop later. Properly identifying unreachable code requires a flow analysis, unfortunately, so this PR gives up on that and instead allows LocalGraph to be imprecise in unreachable code. That avoids adding any overhead, but does mean the IR may be slightly confusing when debugging. It does not have any optimization downsides, however, as it only affects unreachable code. Also add a dump() impl in that file which helps debugging.
If a local index has no sets, then all gets of that index read from the entry block (a param, or a zero for a local). This is actually a common case, where a param has no other set, and so it is worth optimizing, which this PR does by avoiding any flowing operation at all for that index: we just skip and write the entry block as the source of information for such gets. WebAssembly#6042 on precompute-propagate goes from 3 minutes to 2 seconds with this (!). But that testcase is rather special in that it is a huge function with many, many gets in it, so the overhead we remove is very noticeable there.
…ssembly#6048) Followup to WebAssembly#6046 - the fuzzer found we missed handling the case of the entry itself being unreachable, or of an unreachable loop later. Properly identifying unreachable code requires a flow analysis, unfortunately, so this PR gives up on that and instead allows LocalGraph to be imprecise in unreachable code. That avoids adding any overhead, but does mean the IR may be slightly confusing when debugging. It does not have any optimization downsides, however, as it only affects unreachable code. Also add a dump() impl in that file which helps debugging.
If a local index has no sets, then all gets of that index read from the entry block
(a param, or a zero for a local). This is actually a common case, where a param
has no other set, and so it is worth optimizing, which this PR does by avoiding
any flowing operation at all for that index: we just skip and write the entry block
as the source of information for such gets.
#6042 on
precompute-propagate
goes from 3 minutes to 2 seconds with this (!).But that testcase is rather special in that it is a huge function with many, many
gets in it, so the overhead we remove here is very noticeable.