-
Notifications
You must be signed in to change notification settings - Fork 786
Share trap mode between asm2wasm and s2wasm #1168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't looked at asm2wasm.h yet.
Can we add at least one minimal s2wasm test too?
) | ||
|
||
|
||
def test_asm2wasm(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any changes in this content, or just moved out from shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just moved from shared
src/passes/FloatTrap.h
Outdated
*/ | ||
|
||
// | ||
// Pass that supports potentially-trapping float operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just a little more detail about what it actually does?
src/passes/FloatTrap.cpp
Outdated
|
||
Expression* FloatTrap::makeTrappingUnary(Unary* curr) { | ||
Name name = getUnaryFuncName(curr); | ||
if (mode == Mode::Allow || !name.is()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would make sense the call to getUnaryFunctionName
conditional on the mode being something other than allow? Then we wouldn't run the switch statement unless we knew we actually had to do something.
Come to think of it, why do we even have the allow mode? Doesn't that make the whole pass a no-op?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow does make the whole pass a no-op. It seems cleaner than conditionally adding the pass if the flag is set.
Could probably do something more clever by not walking the module if Allow, by overriding one of the Walker methods. Would still introduce a sequence point by being another non-parallel pass.
src/passes/FloatTrap.cpp
Outdated
// is precisely like JS but in order to do that we do a slow ffi | ||
// If i64, there is no "JS" way to handle this, as no i64s in JS, so always clamp if we don't allow traps | ||
if (curr->type != i64 && mode == Mode::JS) { | ||
// WebAssembly traps on float-to-int overflows, but asm.js wouldn't, so we must emulate that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We aren't really "emulating" it at all, we are literally actually calling JS to do the conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost all of the comments were just grabbed as-is from asm2wasm
... which makes references to asm.js all the more misleading.
And "handle" is probably the better word here?
src/passes/FloatTrap.cpp
Outdated
// If i64, there is no "JS" way to handle this, as no i64s in JS, so always clamp if we don't allow traps | ||
if (curr->type != i64 && mode == Mode::JS) { | ||
// WebAssembly traps on float-to-int overflows, but asm.js wouldn't, so we must emulate that | ||
CallImport *ret = allocator.alloc<CallImport>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I want you to necessarily rewrite this whole patch, but would using a Builder make code significantly better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it does a853c0b , thanks
src/passes/FloatTrap.cpp
Outdated
func->body = builder.makeUnary(truncOp, | ||
builder.makeGetLocal(0, type) | ||
); | ||
// too small XXX this is different than asm.js, which does frem. here we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the 'XXX' meant for you to follow up on before commit?
@@ -81,6 +83,24 @@ int main(int argc, const char *argv[]) { | |||
[&allowMemoryGrowth](Options *, const std::string &) { | |||
allowMemoryGrowth = true; | |||
}) | |||
.add("--emit-potential-traps", "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that this matches asm2wasm, but I feel like we should have made this flag something like --trap-mode=<enum>
. Like, if Allow is the default, this flag is really a no-op. Maybe a followup fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we don't have this hooked up in Emscripten yet, so now seems like a great time to make an asymmetric change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should keep it matching asm2wasm, though? perhaps we should change both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we can change both. But I agree that since we don't have the s2wasm one hooked up yet, that we might as well put what we finally want in s2wasm, then maybe fix asm2wasm in a followup PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we'll need to make an Emscripten patch to change the plumbing for asm2wasm anyway, we can bundle that with the patch to add plumbing for s2wasm, and we can keep them the same in Binaryen in the meantime.
So:
Keep this as is for now
Follow up with a patch to Binaryen to change both sets of args to --trap-mode=<blah>
At the same time submit a patch to Emscripten to add s2wasm's invocation and change asm2wasm's.
src/passes/FloatTrap.cpp
Outdated
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a comment explaining what the pass does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we keep the .h and .cpp, should we duplicate the comment across both, leave it in the header, or move it to the implementation?
src/passes/FloatTrap.h
Outdated
@@ -0,0 +1,70 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having a pass with a .h
file feels wrong to me. a pass should be self-contained. how about a header file with the just the constants etc., probably under ast/
?
for adding the pass to be run, can use the string name, instead of the class identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pass needs an argument though, which means we need to have the Mode enum and constructors visible to callers.
It's weird and I'm not sure where to put it. It isn't registered by name with the pass system, so it doesn't really belong in passes/
, but it is a pass over the module. Originally I had it all in a header file in src/
, but that seemed wrong.
ast/
seems fine to me, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about making separate passes for the modes: one pass would do the clamp mode, and one would do the JS mode? We can define them all in a single pass cpp file (see e.g. simplify-locals pass which defines 4). Internally it would be a flag inside the pass class.
Then the users would add the right pass of the 2 (or not add any pass if we allow traps). Might add a little utility function for that in the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's way cleaner, and has the benefit of letting us run this in wasm-opt
as well, which means I can write some finer-grained tests for it.
src/passes/FloatTrap.h
Outdated
// Pass that supports potentially-trapping float operations. | ||
// | ||
|
||
#ifndef wasm_passes_FloatTrap_h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why float trap as a name - doesn't this handle int div/rem of 0 also, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That it does. How about TrapModePass
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm terrible at names, but maybe src/ast/trapping.h
, enum is TrapMode
?
test/debugInfo.fromasm
Outdated
@@ -47,17 +46,11 @@ | |||
(get_local $0) | |||
) | |||
) | |||
;;@ even-opted.cpp:3:0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are test results changing? should this be just a refactoring without functional changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My PR description was pretty sparse on detail, and I forgot to write down at least one case:
- Because we change the timing of when we convert things to calls, optimizations happen in different orders. This lets constant folding be more effective, as seen in
big_uint_div_u
(in test/unit.fromasm), because we can constant fold thei32.div_u
instruction directly, and then it doesn't generate a clamping call at all.
In this case, we add clamping after the inliner has run, so i32s-rem
gets generated and called, instead of generated, used once, inlined, and removed.
Essentially: semantically these should all be the same, but the timing isn't identical and that perturbs the optimizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, interesting; it's actually not obvious whether it's better to clamp as early as possible (since it does actually seem better to just inline and remove i32s-rem
if there's only one use), or later (since const-folding could eliminate calls before they get generated). Anyway we can land this one way and maybe experiment with that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah, looks like in big_uint_div_u
we manage to optimize more. But I'm not sure this is what we want. I believe what's going on is that we have an unsigned -1 / 2
operation, which precompute
happily precomputes, leaving the result. But if this operation required clamping or js-ing, then that precomputing would have been wrong, wouldn't it? For example if it traps, it would be precomputed into a trap.
Instead, could we run this pass first, before everything else? Probably will need to do that in two places: when optimizing, right before passRunner.add<AutoDrop>()
, and when not optimizing, right before passRunner.add<FinalizeCalls>(this)
.
Side note, the test result from before this PR could also be optimized with a full constant propagation pass, which we don't have yet. We should probably add that eventually, marked #1172.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently run it as early as possible, which is necessarily after AutoDrop
because the IR isn't valid before AutoDrop
runs - asm2wasm seems to rely on AutoDrop cleaning it up, and we validate after each pass. We could run this separately in a separate PassRunner, or run AutoDrop and only AutoDrop separately in a separate PassRunner.
The other complication w/r/t timing in asm2wasm is the OptimizingIncrementalModuleBuilder
that we use when running optimization passes that runs passes on each function - meaning the passes added to it need to be function-parallel, which we fundamentally can't make the clamping pass be because it adds functions to the module and thus introduces a sequence point for pass scheduling. Which we can probably work around with a separate pass runner. Maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other downside to adding the functions before any optimizations, is that then in the pass we have to explicitly exclude any generated functions when running on functions, because otherwise we'll replace the i32.div_u
instruction in the $i32u-div
function with a call to $i32u-div
.
The pass also gains a precondition, "must have called helper function addTrappingFunctions
before running this pass".
These both make the pass less pass-like, but probably does a better job of maintaining asm2wasm's behavior exactly as it is today. I was operating under the assumption that that's a non-goal as long as the semantics are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree those two are downsides. But isn't the issue that we must do it that way to maintain the right semantics? Otherwise if the optimizer kicks in before we modify these math operations, it could optimize them incorrectly, as it's not seeing the correct math operations.
Otherwise we'd need to not run the optimizer until we run that pass, which would be substantially slower (we'd need to finish translating all functions before we start to optimize).
Another option might be to have some utility code that adds the functions, and also has code to do an individual fixup. And separately we'd have a pass that adds the functions and does a traversal to do all the fixups, using that. So the pass could be 100% self-contained. And asm2wasm would not use that pass, it would call the utility code directly to do the fixups during translation (as it does now, but in a call to shared code), and also call the utility method to add the functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds very plausible. Let me try to add a test that exposes that behavior.
I originally implemented this as a pass for s2wasm and asm2wasm just calling utility functions directly, basically changing nothing about its flow (see 8ffe1c9). I was able to use the pass for asm2wasm as well, which let me move the utility functions into the pass itself, which let me store the added functions on an instance variable on the pass object, instead of being a static variable, and generally make the interface much nicer.
Clean code does not trump working code, so I'll fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an INT_MIN / -1
test in 691cf5e. It does not currently misoptimize into a trap, and thinking about it a bit more I'm not sure if any optimization pass should ever do so?
Do these optimizations happen other than in OptimizeInstructions.cpp
? Are there any patterns you can think of that would get optimized into an unconditional trap that I should check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm you're right, currently Precompute will handle constants and branches, but not traps. But it could handle traps some day, although I guess that's low priority (and the type of trap would be lost, leaving a generic (unreachable)
).
An issue currently affecting us would be that the side effects might be different (e.g. an FFI into JS should have more side effects than possibly trapping).
Anyhow, we don't need a specific new test for that I guess, seeing the current test outputs are identical should be enough.
6fc31e3
to
a4f58e2
Compare
691cf5e
to
8e0c0aa
Compare
I think this pass should be ready to merge at this point? Is there anything I'm missing? |
src/passes/TrapMode.cpp
Outdated
func->body = builder.makeUnary(truncOp, | ||
builder.makeGetLocal(0, type) | ||
); | ||
// too small XXX this is different than asm.js, which does frem. here we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can probably just remove the "XXX" and maybe say "float is too small"
Implements #1172: this adds a variant of precompute, "precompute-propagate", which also does constant propagation. Precompute by itself just runs the interpreter on each expression and sees if it is in fact a constant; precompute-propagate also looks at the graph of connections between get and set locals, and propagates those constant values. This helps with cases as noticed in #1168 - while in most cases LLVM will do this already, it's important when inlining, e.g. inlining of the clamping math functions. This new pass is run when inlining, and otherwise only in -O3/-Oz, as it does increase compilation time noticeably if run on everything (and for almost no benefit if LLVM has run). Most of the code here is just refactoring out from the ssa pass the get/set graph computation, so it can now be used by both the ssa pass and precompute-propagate.
src/asm2wasm.h
Outdated
@@ -1440,7 +1162,7 @@ void Asm2WasmBuilder::processAsm(Ref ast) { | |||
if (importResult == f64) { | |||
// we use a JS f64 value which is the most general, and convert to it | |||
switch (old) { | |||
case i32: replaceCurrent(parent->makeTrappingFloatToInt32(true /* signed, asm.js ffi */, curr)); break; | |||
case i32: replaceCurrent(parent->builder.makeUnary(TruncSFloat64ToInt32, curr)); break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't we say we'd avoid using the pass in asm2wasm, and have it call directly to utility methods to do this? to avoid all the issues with having a pass be self-contained, and run at the right time, and all that?
maybe I misuderstood what we agreed on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I did. I thought we only cared about doing that in order to avoid potentially optimizing trapping opts into unconditional wasm traps.
As far as correctness goes, if that's not a concern then this is still fine.
This does change the visibility of the clamping to the optimizer, which I was assuming wasn't as important because of the inherent slowness of clamp/js modes as it was.
If we do care, I still think we should use a pass, with the following approach:
- change the pass to be function-parallel
- run it as early as possible (right after AutoDrop)
- generate all the generated functions all the time, because we don't know which ones we need
- let DCE remove the ones we don't need
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should care, yeah: if the optimizer works on incorrect code (before clamping was applied) then it doesn't see the right side effects, might precompute incorrectly, etc.
The plan 1-4 sounds ok. Hopefully with that the test outputs will be unchanged. (The only change I'd expect to see is that the order of functions might differ, since we add the helper functions themselves at a different time, that's ok of course.)
src/asm2wasm.h
Outdated
@@ -1550,6 +1272,7 @@ void Asm2WasmBuilder::processAsm(Ref ast) { | |||
// finalizeCalls also does autoDrop, which is crucial for the non-optimizing case, | |||
// so that the output of the first pass is valid | |||
passRunner.add<FinalizeCalls>(this); | |||
addTrapModePass(passRunner, trapMode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will run that pass after the functions have already been (mostly) optimized.
test/debugInfo.fromasm.clamp
Outdated
(get_local $2) | ||
) | ||
(i32.const 0) | ||
(call $i32s-rem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this used to be inlined, and now it isn't. if this is still the case after the other changes, that's worth investigating.
Ok there we go. So: what we discussed earlier turned out to be way trickier than I was expecting due to timing issues that I still don't understand. So I bailed on that and basically did what @kripken suggested in the first place, and just directly call the |
This matches s2wasm.py, and makes iterating on asm2wasm slightly faster.
In other fun news, rebasing this on master I run into a problem: Running |
We only added it to test that trapping divisions would get constant-folded at the correct time. Now that we're not changing the timing of trapping modes, the test is unneeded (and problematic).
bc46589
to
34e4f92
Compare
That's... very strange. Pass debug mode should never change behavior. It just forces single-threaded mode + extra validation. Perhaps this is because of single-threaded mode, does |
Yep, happens with In any case I only added the |
Ok, sounds like it won't block this PR then, but can you make me a standalone testcase of the issue perhaps? I don't fully understand but it sounds like this could be a serious existing bug, so I'd like to investigate. |
Sure, the test I added in 33cbbc8 causes the problem. Just tested it on master and it reproduces the issue. I think it's because asm2wasm is creating new functions while running optimizations on functions in the background. |
Thanks, and yeah, you're right, it's related to the optimization and adding order. Turns out we add those trapping functions while translating the functions, and they don't get in the queue t o be optimized. Except if we optimize at the end all together anyhow - if no multithreading - in which case they do. Mostly we don't need to optimize these methods anyhow, but your testcase actually shows an example of where we do (inlining limit is hit or not hit based on whether we optimized). So seems worth fixing, fix in #1208. |
Hm, can we land this before #1208? That fix is going to look very different as a result of this PR. Anything else outstanding here or should I merge? |
Yeah, merge this first. We'll figure out those followups later. |
Reimplement handling trapping float ops as a binaryen pass.
Test changes should be just:
f32-to-i32
)