Skip to content
This repository was archived by the owner on Apr 25, 2025. It is now read-only.

br_on_null vs br_on_non_null? #45

Closed
kripken opened this issue Jan 29, 2021 · 16 comments
Closed

br_on_null vs br_on_non_null? #45

kripken opened this issue Jan 29, 2021 · 16 comments

Comments

@kripken
Copy link
Member

kripken commented Jan 29, 2021

When starting to work on this in binaryen, there was some confusion. I had some questions on the design that I don't see discussed, so opening this issue.

The spec says:

br_on_null $l : [t* (ref null ht)] -> [t* (ref ht)]
  iff $l : [t*]
branches to $l on null, otherwise returns operand as non-null

That is, if the input is null we branch, sending values to the target as necessary. If it is not null we return it as not null, and leave the other values on the stack.

I can see how this makes sense, but I wonder if we've considered the opposite, br_on_non_null? That could be

br_on_non_null $l : [(ref null ht)] -> [(ref null ht)]
  iff $l : [(ref ht)]
branches to $l on non-null, otherwise returns operand

Note how this would match br_on_func, br_on_data, br_on_i31 in the GC proposal: they all get an input, check if it can be converted to something more specific, and if so send it on the branch, and otherwise leave it on the stack.

I think matching the GC proposal makes sense to do for consistency and uniformity - making all br_on_* behave as similarly as possible.

Alternatively, we can achieve consistency by flipping the GC proposal ones, so we'd have br_on_non_func etc. That also makes sense in a way, as it would be like ref.as_func: return the refined value. So ref.as_* traps if it can't be refined, and br_on_non_* would branch in that case. (Though personally I think the smaller change, in the previous paragraph, is nicer.)

Are there code size or other concerns here that motivated the current design?

@RossTate
Copy link

Heh, I was actually going to recommend the opposite change for br_on_func and such. The reason is that I expect many functions will have the same code to execute for the "refinement-fails" case across multiple uses of these instructions (such as "throw NullPointerException" or "throw ClassCastException"), whereas what should happen in the "refinement-succeeds" cases is more likely to differ per instruction. Thus if the "refinement-fails" case is the one that's specified by a label, multiple br_on_func instructions can use one label to handle the failure case.

Regardless, I agree that we should be consistent across these instructions.

@lars-t-hansen
Copy link
Contributor

There's not much of a cost to having both variants for all of them, I expect.

@kripken
Copy link
Member Author

kripken commented Feb 1, 2021

@lars-t-hansen Maybe not a big cost, I guess, yes. But it is more work, and it feels a little inconsistent with existing things, like we have br_if but not br_if_not, etc.

Personally I prefer minimalism myself...

kripken added a commit to WebAssembly/binaryen that referenced this issue Feb 1, 2021
This is only partial support, as br_on_null also has an extra optional
value in the spec. Implementing that is cumbersome in binaryen, and
there is ongoing spec discussions about it (see
WebAssembly/function-references#45 ), so
for now we only support the simple case without the default value.

Also fix prefixed opcodes to be LEBs in RefAs, which was noticed here
as the change here made it noticeable whether the values were int8 or
LEBs.
@lars-t-hansen
Copy link
Contributor

@kripken, not sure i agree that there's an equivalence there. 'not' is a general operator we already have, and br_if just branches on a value while consuming the value. The new operators have different semantics and do not consume, but transform (minimally a type change) the values.

If we're only going to have one variant I think we need a lot more use cases from actual compilers. Expressivity in the generated code seems important, and since br has only one arm it's not obvious what the best semantics are. Based on some experience of my own I suspect @RossTate is right, in many cases we want branch-if-value-does-not-meet-criterion to handle failure in a common way. But as you're arguing the opposite case, it seems clear that a single instruction may not meet all needs adequately.

@kripken
Copy link
Member Author

kripken commented Feb 2, 2021

@lars-t-hansen Fair points, I agree.

I don't have strong arguments for the side I prefer, just an intuition, so if others feel otherwise that's fine. Also sounds good to wait on data.

@rossberg
Copy link
Member

rossberg commented Feb 3, 2021

The motivation for having them the way they are now is that this allows expressing simple type dispatch compactly:

(get ref)
(br_on_null $handle-null-case) 
(br_on_i31 $handle-i31-case) 
(br_on_func $handle-func-case) 
...

In that sense, br_on_null is the choice consistent with the other br_on_ instructions in the GC proposal.

@kripken, how would you use the inverse form?

@RossTate
Copy link

RossTate commented Feb 3, 2021

@rossberg In WebAssembly/gc#118, when discussing the performance issues of switching on RTTs, you said "RTTs are intended for checking downcasts against known target representations, not for switching on representations". Given that, the type-dispatch example you gave above does not seem to be the primary use case that the instructions should be optimized for (in terms of space).

@kripken
Copy link
Member Author

kripken commented Feb 3, 2021

@rossberg I see what you mean now, thanks. Yes, for that pattern it seems nice. I don't have an intuition for how common it is (is it common in Java/C#/etc. to use "any" where a ref can be almost anything?).

So I'm not opposed to the current approach. However, perhaps we can remove some of the inconsistency at least. Right now the specs say

br_on_null $l : [t* (ref null ht)] -> [t* (ref ht)]
iff $l : [t*]

br_on_func $l : [t] -> [t]
  iff $l : [t']

For instructions with almost the same name, that feels like a big difference to me. I think it would be more consistent to have this:

br_on_null $l : [(ref null ht)] -> [(ref ht)]
iff $l : []

That is, br_on_null would be like the other br_on_* instructions in that it sends the thing it looks for, and nothing else, on the branch. It happens that the thing it looks for is null, so there is nothing to send.

There is still some inconsistency in that the other br_on_* return the same value as the input, while br_on_null refines it. That does annoy me a little aesthetically, but if it's just me I can live with it...

Otherwise, my general idea is that br_on_non_null would be useful in places that have a null check, so code like

if (instance->optionalThing) {
  doSomething(instance->optionalThing);
}

But, br_on_null could be used for them as well, I think, with similar efficiency.

@rossberg
Copy link
Member

rossberg commented Feb 4, 2021

@kripken, ah yes, you are right, there is an inconsistency between this instructions wrt to allowing additional arguments. That's an oversight. We could add the args on the others or remove them on br_on_null. I created a PR for the former: WebAssembly/gc#192, PTAL.

As for the type refinement in the result, that's of course important to track the knowledge that these can no longer be null, and you don't need to repeat the check. The other instructions could have similar refinements if the type system was able to express them. That goes back to the question of having more general union types, see e.g. the discussion in WebAssembly/gc#130. It's possible that we might be able to strengthen the instruction types in that regard eventually, but I wouldn't hold my breath.

@kripken
Copy link
Member Author

kripken commented Feb 4, 2021

@rossberg

I see, makes sense.

Another possible option is to rename br_on_null as ref.as_non_null_or_br - since it is basically ref.as_non_null (returns the refined type) except instead of trapping it branches on null. But that is a clumsy name... anyhow, these are just minor aesthetic issues.

@rossberg
Copy link
Member

rossberg commented Feb 9, 2021

@kripken, with the discussion above and WebAssembly/gc#192 landed, do you think this issue has been addressed?

@kripken
Copy link
Member Author

kripken commented Feb 9, 2021

@rossberg I think it's certainly improved, and there is nothing urgent from my perspective. But I hope we can improve it further before the final spec, if we have time, based on real-world code usage patterns. There are also comments from @RossTate and @lars-t-hansen above. So perhaps we can leave this open for now? (But not opposed to closing, and opening another issue later if relevant.)

@askeksa-google
Copy link

In dart2wasm, I have found that a very common pattern is "use a value if it is non-null, otherwise compute it". This arises both explicitly from the ?? operator: valueThatMayBeNull() ?? evaluateIfValueWasNull(), and implicitly from lazily initialized globals and constants.

This pattern is expressed concisely using br_on_non_null:

(block (param (ref null $Foo)) (result (ref $Foo))
 (br_on_non_null 0)
 ... compute value ...
end)

whereas with br_on_null it gets more involved:

(block (param (ref null $Foo)) (result (ref $Foo))
 (block (param (ref null $Foo))
  (br_on_null 0)
  (br 1)
 end)
 ... compute value ...
end)

So this is a real-world data point which favors br_on_non_null, at least from a code size perspective.

@rossberg
Copy link
Member

@askeksa-google, yes, moreover, I've found that the inverted direction is often useful for other br_on_* instructions as well. I ended up wanting both directions in my compilers, so I'd actually be leaning towards providing both, i.e., all br_on_* instructions come as pairs of br_on_* and br_on_non_*. Does that make sense?

@askeksa-google
Copy link

askeksa-google commented Apr 13, 2021

Yes. Since these are not just negations, rather they thread their sharpened values differently, it makes sense to have both. When the wrong one is needed, the overhead is much bigger than just an i32.eqz to negate the condition (or folding it into the test, as is often possible for the plain conditional branches).

@rossberg
Copy link
Member

Closing via #48.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants