Commit 2d99e10

[NFC] Avoid quadratic time when precomputing blocks (#6862)
When precomputing fails on a child block of a parent block, there is no point in precomputing the parent, as that will fail as well. This makes --precompute on Emscripten's test_biggerswitch go from 1.44 seconds to 0.02 seconds (not a typo, that is 72x faster). The absolute number is not that big, but we do run this pass more than once, so it saves a noticeable chunk of time.
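
The quadratic behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Binaryen's code: `Node`, `makeTower`, `subtreeSize`, and `totalWork` are hypothetical names, and the cost of one failed precompute attempt is assumed to be proportional to the size of the subtree it has to walk.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for an expression tree (hypothetical, not Binaryen's IR).
struct Node {
  bool isBlock = false;
  std::vector<std::unique_ptr<Node>> children;
};

// Build `depth` nested blocks around one non-block leaf, which stands in
// for the br_table at the bottom of the tower.
std::unique_ptr<Node> makeTower(std::size_t depth) {
  auto node = std::make_unique<Node>(); // the leaf
  for (std::size_t i = 0; i < depth; i++) {
    auto block = std::make_unique<Node>();
    block->isBlock = true;
    block->children.push_back(std::move(node));
    node = std::move(block);
  }
  return node;
}

// Assumed cost of one failed attempt: the number of nodes in the subtree.
std::size_t subtreeSize(const Node& node) {
  std::size_t size = 1;
  for (const auto& child : node.children) {
    size += subtreeSize(*child);
  }
  return size;
}

// Post-order walk that attempts every block. With `skipNestedBlocks` set,
// a block whose first child is still a block is skipped, mirroring the
// early exit added by this commit.
std::size_t totalWork(const Node& node, bool skipNestedBlocks) {
  std::size_t work = 0;
  for (const auto& child : node.children) {
    work += totalWork(*child, skipNestedBlocks);
  }
  if (!node.isBlock) {
    return work;
  }
  if (skipNestedBlocks && !node.children.empty() &&
      node.children[0]->isBlock) {
    return work;
  }
  return work + subtreeSize(node);
}
```

In this model a tower of depth n costs n(n+3)/2 units without the early exit, and a constant 2 units with it, since only the innermost block is ever attempted.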
1 parent 60bd610 commit 2d99e10

1 file changed, +67 -0 lines changed

src/passes/Precompute.cpp

Lines changed: 67 additions & 0 deletions
@@ -363,6 +363,73 @@ struct Precompute
     }
   }
 
+  void visitBlock(Block* curr) {
+    // When block precomputation fails, it can lead to quadratic slowness due to
+    // the "tower of blocks" pattern used to implement switches:
+    //
+    //  (block
+    //   (block
+    //    ...
+    //     (block
+    //      (br_table ..
+    //
+    // If we try to precompute each block here, and fail on each, then we end up
+    // doing quadratic work. This is also wasted work as once a nested block
+    // fails to precompute there is not really a chance to succeed on the
+    // parent. If we do *not* fail to precompute, however, then we do want to
+    // precompute such nested blocks, e.g.:
+    //
+    //  (block $out
+    //   (block
+    //    (br $out)
+    //   )
+    //  )
+    //
+    // Here we *can* precompute the inner block, so when we get to the outer one
+    // we see this:
+    //
+    //  (block $out
+    //   (br $out)
+    //  )
+    //
+    // And that precomputes to nothing. Therefore when we see a child of the
+    // block that is another block (it failed to precompute to something
+    // simpler) then we leave early here.
+    //
+    // Note that in theory we could still precompute here if wasm had
+    // instructions that allow such things, e.g.:
+    //
+    //  (block $out
+    //   (block
+    //    (cause side effect1)
+    //    (cause side effect2)
+    //   )
+    //   (undo those side effects exactly)
+    //  )
+    //
+    // We are forced to invent a side effect that we can precisely undo (unlike,
+    // say locals - a local.set would persist outside of the block, and even if
+    // we did another set to the original value, this pass doesn't track values
+    // that way). Only with that can we make the inner block un-precomputable
+    // (because there are side effects) but the outer one is (because those
+    // effects are undone). Note that it is critical that we have two things in
+    // the block, so that we can't precompute it to one of them (which is what
+    // we did to the br in the previous example). Note also that this is still
+    // optimizable using other passes, as merge-blocks will fold the two blocks
+    // together.
+    if (!curr->list.empty() && curr->list[0]->is<Block>()) {
+      // The first child is a block, that is, it could not be simplified, so
+      // this looks like the "tower of blocks" pattern. Avoid quadratic time
+      // here as explained above. (We could also look at other children of the
+      // block, but the only real-world pattern identified so far is on the
+      // first child, so keep things simple here.)
+      return;
+    }
+
+    // Otherwise, precompute normally like all other expressions.
+    visitExpression(curr);
+  }
+
   // If we failed to precompute a constant, perhaps we can still precompute part
   // of an expression. Specifically, consider this case:
   //
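
The two cases from the `visitBlock` comment (a nested block that does simplify, and a tower that does not) can be traced on a toy IR. This is a hedged sketch only: `Expr`, `make`, `makeBlock`, `tryPrecompute`, and `optimize` are hypothetical names, and `tryPrecompute` is a deliberately tiny stand-in that only understands a block whose sole child is a `br`. It is not how Binaryen's interpreter-based precomputation works.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy IR (hypothetical): just enough to model the cases in the comment.
struct Expr {
  enum Kind { Block, Br, Nop, Opaque };
  Kind kind = Opaque;
  std::string label;                        // block name, or br target
  std::vector<std::unique_ptr<Expr>> list;  // children of a block
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr make(Expr::Kind kind, std::string label = "") {
  auto expr = std::make_unique<Expr>();
  expr->kind = kind;
  expr->label = std::move(label);
  return expr;
}

ExprPtr makeBlock(std::string label, ExprPtr child) {
  auto block = make(Expr::Block, std::move(label));
  block->list.push_back(std::move(child));
  return block;
}

// Assumed simplification rule: a block whose only child is a br becomes a
// nop if the br targets that block, and the br itself otherwise. Returns
// nullptr when the block cannot be simplified.
ExprPtr tryPrecompute(const Expr& block) {
  if (block.list.size() != 1 || block.list[0]->kind != Expr::Br) {
    return nullptr;
  }
  if (block.list[0]->label == block.label) {
    return make(Expr::Nop);
  }
  return make(Expr::Br, block.list[0]->label);
}

// Post-order walk with the same early exit as the diff: if the first child
// is still a block after being visited, do not attempt the parent.
void optimize(ExprPtr& curr, int& attempts) {
  if (curr->kind != Expr::Block) {
    return;
  }
  for (auto& child : curr->list) {
    optimize(child, attempts);
  }
  if (!curr->list.empty() && curr->list[0]->kind == Expr::Block) {
    return;
  }
  attempts++;
  if (auto replacement = tryPrecompute(*curr)) {
    curr = std::move(replacement);
  }
}
```

Under these assumptions, `(block $out (block (br $out)))` takes two attempts and ends up as a nop, because the inner block is replaced by its `br` before the outer block is checked. A tower of blocks around an opaque leaf takes exactly one attempt, on the innermost block, and every enclosing block is skipped.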
