Share trap mode between asm2wasm and s2wasm #1168

jgravelle-google · 2017-09-06T19:53:44Z

Reimplement handling trapping float ops as a binaryen pass.

Test changes should be just:

Reordering generated functions (the timing of when they were added was changed)
Un-inlining some generated functions (inlining happens before the trap mode pass can)
Adding new functions for more general IR (e.g. f32-to-i32)

dschuff

Haven't looked at asm2wasm.h yet.

Can we add at least one minimal s2wasm test too?

dschuff · 2017-09-06T21:17:15Z

scripts/test/asm2wasm.py

+)
+
+
+def test_asm2wasm():


Any changes in this content, or just moved out from shared?

Just moved from shared

dschuff · 2017-09-06T21:27:54Z

src/passes/FloatTrap.h

+ */
+
+//
+// Pass that supports potentially-trapping float operations.


Maybe just a little more detail about what it actually does?

dschuff · 2017-09-06T21:34:50Z

src/passes/FloatTrap.cpp

+
+Expression* FloatTrap::makeTrappingUnary(Unary* curr) {
+  Name name = getUnaryFuncName(curr);
+  if (mode == Mode::Allow || !name.is()) {


Would make sense the call to getUnaryFunctionName conditional on the mode being something other than allow? Then we wouldn't run the switch statement unless we knew we actually had to do something.

Come to think of it, why do we even have the allow mode? Doesn't that make the whole pass a no-op?

Allow does make the whole pass a no-op. It seems cleaner than conditionally adding the pass if the flag is set.
Could probably do something more clever by not walking the module if Allow, by overriding one of the Walker methods. Would still introduce a sequence point by being another non-parallel pass.

dschuff · 2017-09-06T21:36:41Z

src/passes/FloatTrap.cpp

+  // is precisely like JS but in order to do that we do a slow ffi
+  // If i64, there is no "JS" way to handle this, as no i64s in JS, so always clamp if we don't allow traps
+  if (curr->type != i64 && mode == Mode::JS) {
+    // WebAssembly traps on float-to-int overflows, but asm.js wouldn't, so we must emulate that


Nit: We aren't really "emulating" it at all, we are literally actually calling JS to do the conversion.

Almost all of the comments were just grabbed as-is from asm2wasm

... which makes references to asm.js all the more misleading.

And "handle" is probably the better word here?

dschuff · 2017-09-06T21:38:02Z

src/passes/FloatTrap.cpp

+  // If i64, there is no "JS" way to handle this, as no i64s in JS, so always clamp if we don't allow traps
+  if (curr->type != i64 && mode == Mode::JS) {
+    // WebAssembly traps on float-to-int overflows, but asm.js wouldn't, so we must emulate that
+    CallImport *ret = allocator.alloc<CallImport>();


Not that I want you to necessarily rewrite this whole patch, but would using a Builder make code significantly better?

Indeed it does a853c0b , thanks

dschuff · 2017-09-06T21:41:13Z

src/passes/FloatTrap.cpp

+  func->body = builder.makeUnary(truncOp,
+    builder.makeGetLocal(0, type)
+  );
+  // too small XXX this is different than asm.js, which does frem. here we


is the 'XXX' meant for you to follow up on before commit?

dschuff · 2017-09-06T21:43:39Z

src/tools/s2wasm.cpp

@@ -81,6 +83,24 @@ int main(int argc, const char *argv[]) {
           [&allowMemoryGrowth](Options *, const std::string &) {
             allowMemoryGrowth = true;
           })
+      .add("--emit-potential-traps", "",


I get that this matches asm2wasm, but I feel like we should have made this flag something like --trap-mode=<enum>. Like, if Allow is the default, this flag is really a no-op. Maybe a followup fix?

Well we don't have this hooked up in Emscripten yet, so now seems like a great time to make an asymmetric change.

we should keep it matching asm2wasm, though? perhaps we should change both

Yeah we can change both. But I agree that since we don't have the s2wasm one hooked up yet, that we might as well put what we finally want in s2wasm, then maybe fix asm2wasm in a followup PR.

Well we'll need to make an Emscripten patch to change the plumbing for asm2wasm anyway, we can bundle that with the patch to add plumbing for s2wasm, and we can keep them the same in Binaryen in the meantime.

So:
Keep this as is for now
Follow up with a patch to Binaryen to change both sets of args to --trap-mode=<blah>
At the same time submit a patch to Emscripten to add s2wasm's invocation and change asm2wasm's.

kripken · 2017-09-06T22:12:14Z

src/passes/FloatTrap.cpp

+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+


Needs a comment explaining what the pass does

If we keep the .h and .cpp, should we duplicate the comment across both, leave it in the header, or move it to the implementation?

kripken · 2017-09-06T22:14:59Z

src/passes/FloatTrap.h

@@ -0,0 +1,70 @@
+/*


having a pass with a .h file feels wrong to me. a pass should be self-contained. how about a header file with the just the constants etc., probably under ast/?

for adding the pass to be run, can use the string name, instead of the class identifier.

The pass needs an argument though, which means we need to have the Mode enum and constructors visible to callers.

It's weird and I'm not sure where to put it. It isn't registered by name with the pass system, so it doesn't really belong in passes/, but it is a pass over the module. Originally I had it all in a header file in src/, but that seemed wrong.
ast/ seems fine to me, thanks.

How about making separate passes for the modes: one pass would do the clamp mode, and one would do the JS mode? We can define them all in a single pass cpp file (see e.g. simplify-locals pass which defines 4). Internally it would be a flag inside the pass class.

Then the users would add the right pass of the 2 (or not add any pass if we allow traps). Might add a little utility function for that in the header.

Yeah that's way cleaner, and has the benefit of letting us run this in wasm-opt as well, which means I can write some finer-grained tests for it.

kripken · 2017-09-06T22:16:50Z

src/passes/FloatTrap.h

+// Pass that supports potentially-trapping float operations.
+//
+
+#ifndef wasm_passes_FloatTrap_h


why float trap as a name - doesn't this handle int div/rem of 0 also, etc.?

That it does. How about TrapModePass?

i'm terrible at names, but maybe src/ast/trapping.h, enum is TrapMode?

kripken · 2017-09-06T22:17:34Z

test/debugInfo.fromasm

@@ -47,17 +46,11 @@
    (get_local $0)
   )
  )
+  ;;@ even-opted.cpp:3:0


why are test results changing? should this be just a refactoring without functional changes?

My PR description was pretty sparse on detail, and I forgot to write down at least one case:

Because we change the timing of when we convert things to calls, optimizations happen in different orders. This lets constant folding be more effective, as seen in big_uint_div_u (in test/unit.fromasm), because we can constant fold the i32.div_u instruction directly, and then it doesn't generate a clamping call at all.

In this case, we add clamping after the inliner has run, so i32s-rem gets generated and called, instead of generated, used once, inlined, and removed.

Essentially: semantically these should all be the same, but the timing isn't identical and that perturbs the optimizer.

Hm, interesting; it's actually not obvious whether it's better to clamp as early as possible (since it does actually seem better to just inline and remove i32s-rem if there's only one use), or later (since const-folding could eliminate calls before they get generated). Anyway we can land this one way and maybe experiment with that later.

Hmm, yeah, looks like in big_uint_div_u we manage to optimize more. But I'm not sure this is what we want. I believe what's going on is that we have an unsigned -1 / 2 operation, which precompute happily precomputes, leaving the result. But if this operation required clamping or js-ing, then that precomputing would have been wrong, wouldn't it? For example if it traps, it would be precomputed into a trap.

Instead, could we run this pass first, before everything else? Probably will need to do that in two places: when optimizing, right before passRunner.add<AutoDrop>(), and when not optimizing, right before passRunner.add<FinalizeCalls>(this).

Side note, the test result from before this PR could also be optimized with a full constant propagation pass, which we don't have yet. We should probably add that eventually, marked #1172.

I currently run it as early as possible, which is necessarily after AutoDrop because the IR isn't valid before AutoDrop runs - asm2wasm seems to rely on AutoDrop cleaning it up, and we validate after each pass. We could run this separately in a separate PassRunner, or run AutoDrop and only AutoDrop separately in a separate PassRunner.

The other complication w/r/t timing in asm2wasm is the OptimizingIncrementalModuleBuilder that we use when running optimization passes that runs passes on each function - meaning the passes added to it need to be function-parallel, which we fundamentally can't make the clamping pass be because it adds functions to the module and thus introduces a sequence point for pass scheduling. Which we can probably work around with a separate pass runner. Maybe.

The other downside to adding the functions before any optimizations, is that then in the pass we have to explicitly exclude any generated functions when running on functions, because otherwise we'll replace the i32.div_u instruction in the $i32u-div function with a call to $i32u-div.

The pass also gains a precondition, "must have called helper function addTrappingFunctions before running this pass".

These both make the pass less pass-like, but probably does a better job of maintaining asm2wasm's behavior exactly as it is today. I was operating under the assumption that that's a non-goal as long as the semantics are the same.

I agree those two are downsides. But isn't the issue that we must do it that way to maintain the right semantics? Otherwise if the optimizer kicks in before we modify these math operations, it could optimize them incorrectly, as it's not seeing the correct math operations.

Otherwise we'd need to not run the optimizer until we run that pass, which would be substantially slower (we'd need to finish translating all functions before we start to optimize).

Another option might be to have some utility code that adds the functions, and also has code to do an individual fixup. And separately we'd have a pass that adds the functions and does a traversal to do all the fixups, using that. So the pass could be 100% self-contained. And asm2wasm would not use that pass, it would call the utility code directly to do the fixups during translation (as it does now, but in a call to shared code), and also call the utility method to add the functions.

Sounds very plausible. Let me try to add a test that exposes that behavior.

I originally implemented this as a pass for s2wasm and asm2wasm just calling utility functions directly, basically changing nothing about its flow (see 8ffe1c9). I was able to use the pass for asm2wasm as well, which let me move the utility functions into the pass itself, which let me store the added functions on an instance variable on the pass object, instead of being a static variable, and generally make the interface much nicer.
Clean code does not trump working code, so I'll fix it.

Added an INT_MIN / -1 test in 691cf5e. It does not currently misoptimize into a trap, and thinking about it a bit more I'm not sure if any optimization pass should ever do so?
Do these optimizations happen other than in OptimizeInstructions.cpp? Are there any patterns you can think of that would get optimized into an unconditional trap that I should check?

Hmm you're right, currently Precompute will handle constants and branches, but not traps. But it could handle traps some day, although I guess that's low priority (and the type of trap would be lost, leaving a generic (unreachable)).

An issue currently affecting us would be that the side effects might be different (e.g. an FFI into JS should have more side effects than possibly trapping).

Anyhow, we don't need a specific new test for that I guess, seeing the current test outputs are identical should be enough.

jgravelle-google · 2017-09-12T21:51:51Z

I think this pass should be ready to merge at this point? Is there anything I'm missing?

dschuff · 2017-09-12T21:58:10Z

src/passes/TrapMode.cpp

+    func->body = builder.makeUnary(truncOp,
+      builder.makeGetLocal(0, type)
+    );
+    // too small XXX this is different than asm.js, which does frem. here we


can probably just remove the "XXX" and maybe say "float is too small"

Implements #1172: this adds a variant of precompute, "precompute-propagate", which also does constant propagation. Precompute by itself just runs the interpreter on each expression and sees if it is in fact a constant; precompute-propagate also looks at the graph of connections between get and set locals, and propagates those constant values. This helps with cases as noticed in #1168 - while in most cases LLVM will do this already, it's important when inlining, e.g. inlining of the clamping math functions. This new pass is run when inlining, and otherwise only in -O3/-Oz, as it does increase compilation time noticeably if run on everything (and for almost no benefit if LLVM has run). Most of the code here is just refactoring out from the ssa pass the get/set graph computation, so it can now be used by both the ssa pass and precompute-propagate.

kripken · 2017-09-12T22:12:53Z

src/asm2wasm.h

@@ -1440,7 +1162,7 @@ void Asm2WasmBuilder::processAsm(Ref ast) {
        if (importResult == f64) {
          // we use a JS f64 value which is the most general, and convert to it
          switch (old) {
-            case i32: replaceCurrent(parent->makeTrappingFloatToInt32(true /* signed, asm.js ffi */, curr)); break;
+            case i32: replaceCurrent(parent->builder.makeUnary(TruncSFloat64ToInt32, curr)); break;


didn't we say we'd avoid using the pass in asm2wasm, and have it call directly to utility methods to do this? to avoid all the issues with having a pass be self-contained, and run at the right time, and all that?

maybe I misuderstood what we agreed on.

Maybe I did. I thought we only cared about doing that in order to avoid potentially optimizing trapping opts into unconditional wasm traps.
As far as correctness goes, if that's not a concern then this is still fine.
This does change the visibility of the clamping to the optimizer, which I was assuming wasn't as important because of the inherent slowness of clamp/js modes as it was.

If we do care, I still think we should use a pass, with the following approach:

change the pass to be function-parallel

run it as early as possible (right after AutoDrop)

generate all the generated functions all the time, because we don't know which ones we need

let DCE remove the ones we don't need

I think we should care, yeah: if the optimizer works on incorrect code (before clamping was applied) then it doesn't see the right side effects, might precompute incorrectly, etc.

The plan 1-4 sounds ok. Hopefully with that the test outputs will be unchanged. (The only change I'd expect to see is that the order of functions might differ, since we add the helper functions themselves at a different time, that's ok of course.)

kripken · 2017-09-12T22:14:05Z

src/asm2wasm.h

@@ -1550,6 +1272,7 @@ void Asm2WasmBuilder::processAsm(Ref ast) {
  // finalizeCalls also does autoDrop, which is crucial for the non-optimizing case,
  // so that the output of the first pass is valid
  passRunner.add<FinalizeCalls>(this);
+  addTrapModePass(passRunner, trapMode);


this will run that pass after the functions have already been (mostly) optimized.

kripken · 2017-09-13T00:03:26Z

test/debugInfo.fromasm.clamp

-     (get_local $2)
-    )
-    (i32.const 0)
+   (call $i32s-rem


hmm, this used to be inlined, and now it isn't. if this is still the case after the other changes, that's worth investigating.

jgravelle-google · 2017-09-28T22:22:19Z

Ok there we go.

So: what we discussed earlier turned out to be way trickier than I was expecting due to timing issues that I still don't understand. So I bailed on that and basically did what @kripken suggested in the first place, and just directly call the makeTrapping functions inline in asm2wasm, the way they were done before I touched it at all.
I did this by introducing a new helper class, GeneratedTrappingFunctions, which either collects the generated functions to add later, or it adds them immediately to the module. This lets us avoid any global/static variables, while letting us keep the desired functionality on either end, and reuse the code to boot.
The test diffs now are totally additive (s2wasm test case, plus the added trapping_sint_div_s case in unit.asm.js), or new functionality (handling f32-to-int in asm2wasm without having to convert to
f64s first, and [not yet implemented] double->uint32 conversion), which should be much more reasonable.
Only thing I'm unhappy with is the name: GeneratedTrappingFunctions. Any suggestions?

This matches s2wasm.py, and makes iterating on asm2wasm slightly faster.

jgravelle-google · 2017-09-29T00:33:36Z

In other fun news, rebasing this on master I run into a problem:

Running bin/asm2wasm test/unit.asm.js -O --mem-init=a.mem gives different results for the function trapping_sint_div_s when the environment variable BINARYEN_PASS_DEBUG is 1. With it set, the function gets const-folded into just (i32.const 0), otherwise it calls the generated function as usual.

We only added it to test that trapping divisions would get constant-folded at the correct time. Now that we're not changing the timing of trapping modes, the test is unneeded (and problematic).

kripken · 2017-09-29T22:22:54Z

That's... very strange. Pass debug mode should never change behavior. It just forces single-threaded mode + extra validation. Perhaps this is because of single-threaded mode, does BINARYEN_CORES=1 cause the same odd result? If so maybe there's a threading bug.

jgravelle-google · 2017-09-29T23:04:35Z

Yep, happens with BINARYEN_CORES=1 too.
I figure the problem is that we're adding functions while we're iterating over the module in asm2wasm, therefore the call target isn't visible when working in parallel, therefore it can't be const-folded. But when debugging or single-threading, that's less of a problem.

In any case I only added the trapping_sint_div_s test because I was changing the order of optimizations/inlining with respect to trapping functions. So I just removed the offending test because that's not necessarily something we care about now that the timing is the same and other tests haven't changed.

kripken · 2017-09-29T23:44:42Z

Ok, sounds like it won't block this PR then, but can you make me a standalone testcase of the issue perhaps? I don't fully understand but it sounds like this could be a serious existing bug, so I'd like to investigate.

jgravelle-google · 2017-10-02T16:15:43Z

Sure, the test I added in 33cbbc8 causes the problem. Just tested it on master and it reproduces the issue.

I think it's because asm2wasm is creating new functions while running optimizations on functions in the background.

kripken · 2017-10-02T20:34:52Z

Thanks, and yeah, you're right, it's related to the optimization and adding order. Turns out we add those trapping functions while translating the functions, and they don't get in the queue t o be optimized. Except if we optimize at the end all together anyhow - if no multithreading - in which case they do.

Mostly we don't need to optimize these methods anyhow, but your testcase actually shows an example of where we do (inlining limit is hit or not hit based on whether we optimized). So seems worth fixing, fix in #1208.

jgravelle-google · 2017-10-02T20:43:43Z

Hm, can we land this before #1208? That fix is going to look very different as a result of this PR.

Anything else outstanding here or should I merge?

kripken · 2017-10-02T20:50:53Z

Yeah, merge this first. We'll figure out those followups later.

dschuff reviewed Sep 6, 2017

View reviewed changes

kripken reviewed Sep 6, 2017

View reviewed changes

jgravelle-google force-pushed the shared_trap_mode branch from 6fc31e3 to a4f58e2 Compare September 8, 2017 17:49

kripken mentioned this pull request Sep 12, 2017

precompute-propagate pass #1179

Merged

jgravelle-google force-pushed the shared_trap_mode branch from 691cf5e to 8e0c0aa Compare September 12, 2017 21:21

dschuff approved these changes Sep 12, 2017

View reviewed changes

kripken reviewed Sep 12, 2017

View reviewed changes

kripken reviewed Sep 13, 2017

View reviewed changes

kripken mentioned this pull request Sep 26, 2017

Bad double -> uint32 conversion when using clamp trap mode emscripten-core/emscripten#5603

Closed

jgravelle-google added 17 commits September 28, 2017 15:42

Extract Asm2WasmBuilder::TrapMode to shared FloatTrapMode

9fea05f

Extract makeTrappingI32Binary

2877cfd

Extract makeTrappingI64Binary

9a51e46

Extract asm2wasm test script into scripts/test/asm2wasm.py

625b49b

This matches s2wasm.py, and makes iterating on asm2wasm slightly faster.

Simplify callsites with an arg struct

3860a7b

Combine func adding across i32 and i64

1717c5a

Support f32-to-int in asm2wasm

6a8fd32

Add BinaryenTrapMode pass, run pass from s2wasm

64fcbf6

BinaryenTrapMode pass takes trap context as a parameter

f092e0c

Pass fully supports non-trapping binary ops

54c3558

Defer adding functions until after iteration (hackily)

39a24ab

Update asm2wasm to work with deferred function adding, rebuild tests

a8a93a9

Extract makeTrappingFloatToInt32

1f75d79

Extract makeTrappingFloatToInt64

ccf9652

Add unary conversions to trap pass

926d6d8

Add functions in the pass itself

8f75232

Set s2wasm trap mode with command-line arguments

1806b49

jgravelle-google added 9 commits September 28, 2017 17:08

Add s2wasm trap mode tests. Force float->int conversion to be signed

317793a

Add trapping_sint_div_s test to unit.asm.js

33cbbc8

Fix flake8 issues with test scripts

85b8dc4

Update pass description comment

0d9301f

Extract building functions methods

3d8ce62

Make generate functions into top-level functions

a63c311

Add GeneratedTrappingFunctions class to manage function/import additions

3c9bab2

Move ensure/makeTrapping functions outside class scope

da95b72

Use GeneratedTrappingFunctions to add immediately in asm2wasm mode

c6c45f0

Remove trapping_sint_div_s test

34e4f92

We only added it to test that trapping divisions would get constant-folded at the correct time. Now that we're not changing the timing of trapping modes, the test is unneeded (and problematic).

jgravelle-google force-pushed the shared_trap_mode branch from bc46589 to 34e4f92 Compare September 29, 2017 17:25

Review feedback, add validator/*.wasm to .gitignore

652ac6b

jgravelle-google added 2 commits September 29, 2017 15:44

Add support for unsigned float-to-int conversion

c9fdeca

Use opcode directly instead of bools

51ed481

Update s2wasm clamp test for unsigned ftoi

3ecd486

kripken mentioned this pull request Oct 2, 2017

optimize asm2wasm helper functions like i32-div #1208

Merged

kripken approved these changes Oct 2, 2017

View reviewed changes

jgravelle-google merged commit a9f91b9 into master Oct 2, 2017

jgravelle-google deleted the shared_trap_mode branch October 2, 2017 20:51

jgravelle-google mentioned this pull request Oct 5, 2017

Support BINARYEN_TRAP_MODEs for upstream wasm backend #1143

Closed

jgravelle-google mentioned this pull request Oct 16, 2017

Update auto_update_tests.py for trap-mode flags #1225

Merged

aheejin mentioned this pull request Nov 29, 2017

Disable binaryen trap test on Wasm backend emscripten-core/emscripten#5877

Merged

		)


		def test_asm2wasm():

Share trap mode between asm2wasm and s2wasm #1168

Share trap mode between asm2wasm and s2wasm #1168

Conversation

jgravelle-google commented Sep 6, 2017

dschuff left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kripken Sep 7, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kripken Sep 8, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jgravelle-google commented Sep 12, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jgravelle-google commented Sep 28, 2017

jgravelle-google commented Sep 29, 2017

kripken commented Sep 29, 2017

jgravelle-google commented Sep 29, 2017

kripken commented Sep 29, 2017

jgravelle-google commented Oct 2, 2017

kripken commented Oct 2, 2017

jgravelle-google commented Oct 2, 2017

kripken commented Oct 2, 2017

kripken Sep 7, 2017 •

edited

Loading

kripken Sep 8, 2017 •

edited

Loading