Non-nullable locals, option 1a #4824
I would prefer not to validate: it gives optimizations more flexibility and less to think about, and it does not have a significant performance cost. For the fixup pass, I would prefer to run it as a normal pass inserted into the default pass pipeline, so we can run OptimizeInstructions as a follow-up. We should also run it as an unconditional preliminary stage of module writing, to cover situations where the default pass pipeline is not used. We currently assume that every pass that modifies Binaryen IR must necessarily invalidate Stack IR, but we can add a way for passes to declare otherwise.

I think we should have our text format match our binary output to the extent possible, but I guess we already print stacky code as-is there.
Running on both normal IR and Stack IR is a good idea. In Stack IR, though, it would need to insert new instructions (casts), which is awkward there but doable. And I guess we could start by just invalidating Stack IR, which might be ok as a small temporary regression. The other downsides of that approach worry me more, though: we would not actually be validating 1a on our output (only VMs would catch any bugs of ours), and intermediate states would not validate as 1a (annoying when debugging intermediate passes, since we can't run those on a VM; mostly a burden for us and not our users, but still annoying).
In debug mode perhaps we can add some extra validation as part of module writing.
When the intermediate results are dumped during debugging, I guess they would have the same fixups applied?
If we do that then it would affect reproducibility. That is, if we run passes A, B, C and save the module to a wasm file right after B, then running C on that wasm would not necessarily be identical to the original run of all the passes in sequence. This can be quite annoying in practice.
If we don't apply fixups to text files then that would be an escape hatch for that situation, which is already useful today for stacky code.
An interesting issue the fuzzer noticed here: the SSA pass will create new locals and also rename existing ones. Even if it fixes up the non-nullability of new locals, the renaming can break 1a validation, because as an optimization it doesn't bother renaming in unreachable code. The renaming is not observable there during execution, but it is observable to 1a validation. Example:

```wat
(local.set $x (..))
(unreachable)
(local.get $x)
==> ;; ssa
(local.set $x.renamed (..)) ;; rename in reachable code
(unreachable)
(local.get $x) ;; do not rename in unreachable code
```

This isn't hard to work around (the normal workaround can be done), but it does show an interesting corner case.
Another interesting situation:

```wat
(unreachable)
(local.get $nn) ;; a get of a non-nullable local, in unreachable code
=> ;; some pass that decides to reorder those two
(local.get $nn)
(unreachable)
```

Reordering a `local.get` past an `unreachable` is valid in Binaryen IR, since the get has no effects. The problem is that such a `local.get` then appears in reachable code with no set before it, so a non-nullable local could read a null. The fix for this particular example is probably to trap in the interpreter, which means basically implementing option 6 from the spec discussion. As reordering traps is valid in Binaryen IR, that would make this valid once more. Reordering with other things might still be a problem, but only in code that began broken (it marks a local as non-nullable but it can read a null).
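Option 6's trapping semantics could be sketched as follows; this is a toy Python model, not Binaryen's interpreter, and the names `interp_local_get`, `locals_vals`, and `nn_locals` are made up for illustration:

```python
def interp_local_get(locals_vals, index, nn_locals):
    """Toy model of option 6: local.get traps when a non-nullable
    local still holds its null default value."""
    val = locals_vals.get(index)  # None models the null default
    if val is None and index in nn_locals:
        raise RuntimeError("trap: read of null from non-nullable local")
    return val
```

With that semantics, the reordered code above traps either way, so the reordering remains a valid transform.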
Trapping isn't a full solution either, I realize. E.g. RemoveUnusedBrs wants to do this:

```wat
(if
  (condition)
  (call $noticeable)
  (local.get $nn)
)
=>
(select
  (call $noticeable)
  (local.get $nn)
  (condition)
)
```

That would execute the get unconditionally, which would then start to trap. If the condition was guarding against exactly that, then we are in trouble. I'm not sure we have a good way out of this. We could just make EffectAnalyzer mark all gets of non-nullable locals as possibly trapping, but that would inhibit optimizations.

The most reasonable path might be to enforce 1a after each pass ("1a validation in Binaryen IR" in the first post). That's cheaper than doing it in each EffectAnalyzer. As mentioned above it adds 7% overall, but maybe that's the cost we need to pay here? (A "heavier" IR, more like those in optimizing VMs, would just constantly track the 1a-edness of each local.)
Another downside I noticed while investigating the options here using fuzzing: fixing things up during binary writing makes fuzzing very slow. If there is a testcase that hits a bug, and fixing up the non-nullable locals would avoid the bug, then each iteration in the fuzzer will avoid the bug when it is written out. We'd maybe want to add an option to avoid fixups during binary writing, which adds more complexity.
I'm not sure trapping makes sense. Working backwards: the semantics of the program after fixups are that the `local.get` succeeds, and the fixups should be semantic no-ops (they just map the internal language, with its more relaxed validation, to the real language), so the semantics before fixups should also be that the `local.get` succeeds.
If we don't trap, then we don't run into that problem, and the normal fixups can make the code valid as necessary.
Well, we could just hack the assertion that checks that an instruction emits a type that is a subtype of its declared type (just for non-nullable locals). But a similar reordering causes trouble there too:

```wat
(unreachable)
(struct.get $A 0
  (local.get $nn)
)
=> ;; some pass that decides to reorder those two
(struct.get $A 0
  (local.get $nn)
)
(unreachable)
```

Now the `struct.get` of that local executes in reachable code, with no set before the get. (But this seems smaller than the problem in my post after it.)
The interpreter should still have the trapping behavior there.
But I don't think there should be any special handling in the effects analyzer since we already prevent gets from moving back before their corresponding sets. |
After the fixups, yes. But before the fixups we run into the problem I mentioned. So this is an issue only for doing the fixups late in the pipeline and in binary writing (and it is not a problem for doing fixups after each pass).
I guess my high-level point is that the interpreter should provide the semantics before and after the fixups. That's independent of when and how often fixups are applied.
I'm not sure what you mean by "provide the semantics"? Are you saying the semantics should be different before and after the fixups? After the fixups I see no problem. But before them, I don't see a good option either way.
Sorry, I meant to write "provide the same semantics" but missed the key word 😅 Concretely, when executing a `local.get` of a non-nullable local, the interpreter should trap if it would read a null.
We don't need to make EffectAnalyzer aware of the traps, since no optimization would move a get before the first set in reachable code anyway. So there should be no effect on optimization quality.
I don't think this can work, as it hits the problems I mentioned before. Over the weekend I realized I may not have been explaining it well, so let me try again.

```
var x = non null Foo;   // declare non-nullable local
x = (return, getFoo()); // x is set to a sequence of a return + getting a non-null value
x                       // get x
=> // 1. simplify the set: "getFoo()" and "x =" do not execute, so just return
var x = non null Foo;
return;
x // this get no longer has a set
=> // 2. reorder the return with the get. the get has no effects, so it's ok
var x = non null Foo;
x // non-nullable local getting a null
return;
=> // 3. fix up non-nullable locals
var x = null Foo; // now nullable
if (!x) trap;     // this traps
return;
```

This program now traps, but it shouldn't. (This isn't hypothetical - credit for finding this goes to the fuzzer.) This happens because we didn't tell EffectAnalyzer about an effect. One solution is for EffectAnalyzer to mark such gets as possibly trapping, with the optimization cost that implies.
The other option is to fix up 1a after each pass. That avoids the above problem, because between steps 1 and 2 we'd make the local nullable and mark the get as possibly trapping, after which it is marked properly and won't be moved into reachable code. In theory a pass that does all of steps 1-3 at once could hit such a problem, so future passes would need to look out for that in principle, but writing modular passes as we do tends to avoid that complexity (writing large composite passes always comes with the requirement that the pass not emit something invalid at the end).
I would say we should just go with the fixup-after-every-pass option for now, but a 7% optimization time hit seems pretty bad. It might be worth investigating the no-reordering-with-unreachable approach. Maybe we should land the expensive, simple fix now but keep working on improving it?
I've meanwhile done some optimizations on the fixup-after-each-pass approach, and it's less than half of the original 7%. I think I can get it even lower.
I believe I've gotten this down to 2-4% overhead (2% on Dart, 4% on J2Wasm). However, that is still somewhat annoying... Before I open a PR I think I'll tinker with the parallel approach of doing more DCE. I suspect that very few passes may actually end up creating DCE opportunities, so manually annotating them may lead to good results.
An overview of this is in the README in the diff here (conveniently, it is near the top of the diff). Basically, we fix up nn locals after each pass, by default. This keeps things easy to reason about - what validates is what is valid wasm - but there are some minor nuances as mentioned there; in particular, we ignore nameless blocks (which are commonly added by various passes; ignoring them means we can keep more locals non-nullable).

The key addition here is LocalStructuralDominance, which checks which local indexes have the "structural dominance" property of 1a, that is, that each get has a set in its block or an outer block that precedes it. I optimized that function quite a lot to reduce the overhead of running that logic after each pass. The overhead is something like 2% on J2Wasm and 0% on Dart (0% because in this mode we shrink code size, so there is less work actually, and it balances out).

Since we run fixups after each pass, this PR removes logic to manually call the fixup code from various places we used to call it (like eh-utils and various passes). Various passes are now marked as requiresNonNullableLocalFixups => false. That lets us skip running the fixups after them, which we normally do automatically. This helps avoid overhead. Most passes still need the fixups, though - any pass that adds a local, or a named block, or moves code around, likely does.

This removes a hack in SimplifyLocals that is no longer needed. Before, we worked to avoid moving a set into a try, as it might not validate. Now we just do it and let fixups happen automatically if they need to: in the common case they probably don't, so the extra complexity seems not worth it.

This also removes a hack from StackIR. That hack tried to avoid the roundtrip adding a nondefaultable local. But we have the logic to fix that up now, and opts will likely keep it non-nullable as well.

Various tests end up updated here because now a local can be non-nullable - previous fixups are no longer needed.
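To illustrate the property LocalStructuralDominance checks, here is a simplified Python model over a toy nested-block IR (not the actual C++ implementation; the function and encoding are made up): a get validates under 1a only if a set of the same index precedes it in its own block or an enclosing one.

```python
def find_non_dominated_gets(body):
    """body is a nested list: ("set", i), ("get", i), or a list (a block).
    Returns the set of local indexes that have some get not structurally
    dominated by a set."""
    bad = set()

    def walk(block, live):
        # 'live' holds indexes already set in this block or an enclosing
        # one; sets inside a block do not escape it, hence the copy.
        live = set(live)
        for item in block:
            if isinstance(item, list):
                walk(item, live)
            else:
                op, index = item
                if op == "set":
                    live.add(index)
                elif op == "get" and index not in live:
                    bad.add(index)

    walk(body, set())
    return bad
```

For example, a set inside an inner block does not cover a get after that block ends, while a set in an outer block covers gets in inner ones.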
Note that this doesn't remove the gc-nn-locals feature. That has been useful for testing, and may still be useful in the future - it basically just allows nn locals in all positions (that can't read the null default value at the entry). We can consider removing it separately. Fixes #4824
Now that "1a" is the final implementation of non-nullable locals in the spec, we should implement it. That approach does have the downside of extra work for tools like Binaryen, though, so it's not trivial to do. The problem is that a set allows gets only until the end of the current block, which means many natural code transforms "break" 1a - for example, a pass that wraps a set in a new inner block leaves later gets outside that block without a dominating set.
However we handle this, we need to add fixup logic for 1a, as well as specific workarounds in relevant passes (we don't want to add a local that will end up fixed up later anyhow, if that is worse, which some passes know). Aside from that, some possible approaches:
No 1a validation in Binaryen IR
In this approach Binaryen IR allows more things than normal wasm; in particular, non-nullable locals can be used arbitrarily and not just in the 1a form. To emit valid wasm binaries we can either:

- Run a fixup pass at the end that makes each offending local nullable and adds a `ref.as_non_null` on its gets (to keep the types of the gets identical). In some cases the `ref.as_non_null` may be removed, but this is too late to run the pass that does so.
- During binary writing, when a get of a non-nullable local does not validate, emit a `ref.as_non_null` after it.

A downside is that we never validate that we are emitting valid wasm (again, since we don't validate 1a). Also, intermediate states during the pass pipeline - before we get to the fixups - may not validate as 1a, which can be annoying during debugging.
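The fixup itself can be sketched like this (a minimal Python model with a made-up instruction encoding, not Binaryen's C++ IR): the offending local becomes nullable, and each of its gets is followed by a `ref.as_non_null` so parent expressions see the same non-nullable type as before.

```python
def fixup_non_nullable_locals(local_types, body, bad_indexes):
    """local_types: {index: (heap_type, nullable)}.
    body: a flat list of instructions like ("local.get", 0).
    bad_indexes: locals whose gets fail 1a validation."""
    for i in bad_indexes:
        heap, _ = local_types[i]
        local_types[i] = (heap, True)  # make the local nullable
    fixed = []
    for instr in body:
        fixed.append(instr)
        if instr[0] == "local.get" and instr[1] in bad_indexes:
            # restore the non-nullable type on the stack (traps on null)
            fixed.append(("ref.as_non_null",))
    return fixed
```

This is semantically a no-op for code that never actually reads a null, which is exactly why the cast is safe to insert mechanically.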
We'd also need to decide what to do in the text format. Perhaps we wouldn't do fixups there, which would keep it 1:1 with Binaryen IR for testing.
1a validation in Binaryen IR
Here our validator adds 1a checks. Each pass must therefore emit valid 1a, for example by running the fixups after each pass.
The extra work makes compilation something like 7% slower overall. It's not a huge cost, since it's basically 1a validation, which is simple and fast, and the data is in cache already since we just worked on it. Still, it's a noticeable slowdown. (Just on GC builds, though: when GC is off we can avoid any extra work.)
A downside here is that forcing 1a limits what we can do in intermediate steps. In theory allowing more there might help out even if we end up undoing it later (the other effects might remain). On J2Wasm and Dart at least this seems to be negligible, though it does happen.
Thoughts?