@@ -363,6 +363,73 @@ struct Precompute
     }
   }

+  void visitBlock(Block* curr) {
+    // When block precomputation fails, it can lead to quadratic slowness due to
+    // the "tower of blocks" pattern used to implement switches:
+    //
+    // (block
+    //   (block
+    //     ...
+    //       (block
+    //         (br_table ..
+    //
+    // If we try to precompute each block here, and fail on each, then we end up
+    // doing quadratic work. This is also wasted work, as once a nested block
+    // fails to precompute there is little chance of succeeding on the parent.
+    // If we do *not* fail to precompute, however, then we do want to
+    // precompute such nested blocks, e.g.:
+    //
+    // (block $out
+    //   (block
+    //     (br $out)
+    //   )
+    // )
+    //
+    // Here we *can* precompute the inner block, so when we get to the outer one
+    // we see this:
+    //
+    // (block $out
+    //   (br $out)
+    // )
+    //
+    // And that precomputes to nothing. Therefore, when the first child of the
+    // block is itself a block (that is, it failed to precompute to something
+    // simpler), we leave early here.
+    //
+    // Note that in theory we could still precompute here if wasm had
+    // instructions that allow such things, e.g.:
+    //
+    // (block $out
+    //   (block
+    //     (cause side effect1)
+    //     (cause side effect2)
+    //   )
+    //   (undo those side effects exactly)
+    // )
+    //
+    // We are forced to invent a side effect that we can precisely undo (unlike,
+    // say, locals: a local.set would persist outside of the block, and even if
+    // we did another set to the original value, this pass doesn't track values
+    // that way). Only with that can we make the inner block un-precomputable
+    // (because there are side effects) while the outer one is precomputable
+    // (because those effects are undone). Note that it is critical that we
+    // have two things in the inner block, so that we can't precompute it to
+    // one of them (which is what we did to the br in the previous example).
+    // Note also that this is still optimizable using other passes, as
+    // merge-blocks will fold the two blocks together.
+    if (!curr->list.empty() && curr->list[0]->is<Block>()) {
+      // The first child is a block, that is, it could not be simplified, so
+      // this looks like the "tower of blocks" pattern. Avoid quadratic time
+      // here as explained above. (We could also look at other children of the
+      // block, but the only real-world pattern identified so far is on the
+      // first child, so keep things simple here.)
+      return;
+    }
+
+    // Otherwise, precompute normally like all other expressions.
+    visitExpression(curr);
+  }
+
   // If we failed to precompute a constant, perhaps we can still precompute part
   // of an expression. Specifically, consider this case:
   //
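
The cost argument in the `visitBlock` comment can be checked with a small standalone model. The sketch below does not use Binaryen's IR or its walker: `Node`, `makeTower`, `attempt`, and `walk` are hypothetical names invented for illustration, and it assumes that a failed precompute attempt on a block costs one visit per node in that block's subtree. Under that assumption, attempting every block in a tower of depth n costs on the order of n^2 visits, while skipping any block whose first child is still a block leaves only the innermost attempt.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy expression node. This is NOT Binaryen's IR; every name is hypothetical.
struct Node {
  bool isBlock = false;
  std::vector<std::unique_ptr<Node>> children;
};

// Builds `depth` nested blocks around a single non-block leaf, which stands
// in for the br_table at the bottom of the tower.
std::unique_ptr<Node> makeTower(std::size_t depth) {
  auto node = std::make_unique<Node>(); // the leaf
  for (std::size_t i = 0; i < depth; i++) {
    auto block = std::make_unique<Node>();
    block->isBlock = true;
    block->children.push_back(std::move(node));
    node = std::move(block);
  }
  return node;
}

// Models one failed precompute attempt (an assumption of this sketch): it
// touches every node in the subtree and returns how many it touched.
std::size_t attempt(const Node& node) {
  std::size_t visited = 1;
  for (auto& child : node.children) {
    visited += attempt(*child);
  }
  return visited;
}

// Post-order walk that tries to precompute every block and returns the total
// number of node visits spent in attempts. With `skipNested` set, a block
// whose first child is still a block is skipped, mirroring the early return.
std::size_t walk(const Node& node, bool skipNested) {
  std::size_t work = 0;
  for (auto& child : node.children) {
    work += walk(*child, skipNested);
  }
  if (!node.isBlock) {
    return work;
  }
  if (skipNested && !node.children.empty() && node.children[0]->isBlock) {
    return work;
  }
  return work + attempt(node);
}
```

For a tower built with `makeTower(100)`, `walk(*tower, false)` counts 5150 visits in this model and `walk(*tower, true)` counts 2, since only the innermost block (whose first child is the leaf) is attempted.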