This repository was archived by the owner on Apr 25, 2025. It is now read-only.

1a implementation for toolchains: dummy initializers #314

Closed
carlopi opened this issue Jul 27, 2022 · 11 comments

Comments

@carlopi

carlopi commented Jul 27, 2022

I sort of lost track of the discussion on non-nullable locals. Really nice to see it has been settled and there is progress on 1a.

After reading the discussion on WebAssembly/function-references#44 and the implementation discussion for Binaryen (WebAssembly/binaryen#4824), I realized there might be a (potentially obvious) escape hatch for toolchains:

  1. have a bunch of nullable globals with the right types
  2. initialize them as part of the start() function
  3. in the first few instructions of every function, set any non-nullable local like:
global.get X
ref.as_non_null
local.set Y
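
The three steps above can be sketched as a small producer-side helper. This is only an illustration over a toy representation: the `Local` record, the `dummy_prologue` function, and the global names such as `$dummy_sig` are hypothetical and do not come from Binaryen or any other real tool. It assumes the list contains declared locals only (parameters are already initialized by the caller).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Local:
    index: int       # local index inside the function
    heap_type: str   # hypothetical heap type name, e.g. "$sig"
    nullable: bool   # False means the local has type (ref $t)


def dummy_prologue(locals_, dummy_global_for):
    """Return the instructions to place at function entry.

    dummy_global_for maps a heap type name to the (hypothetical) nullable
    global that the start function filled with a dummy value.
    """
    prologue = []
    for local in locals_:
        if local.nullable:
            continue  # nullable locals default to null, nothing to emit
        prologue += [
            f"global.get {dummy_global_for[local.heap_type]}",
            "ref.as_non_null",
            f"local.set {local.index}",
        ]
    return prologue


example_locals = [
    Local(0, "$sig", nullable=False),
    Local(1, "$sig", nullable=True),
    Local(2, "$node", nullable=False),
]
example_globals = {"$sig": "$dummy_sig", "$node": "$dummy_node"}
print("\n".join(dummy_prologue(example_locals, example_globals)))
```

Each non-nullable local costs three instructions at function entry, which is the overhead mentioned in this comment.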

This explicit initialization to a dummy value makes it possible to skip the complexities of handling the validation rules for 1a, bringing back patterns like

if (someCondition)
    local = costly_initialization();
else
    local = other_costly_initialization();

func(local);

or patterns that are even more complex to legalize, like:

const bool condition = someFunc();
if (condition)
    local = costly_initialization();

//some other code

if (condition == false)
    local = other_costly_initialization();

func(local);

without having to force additional restrictions on a given tool's IR.
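
To make the effect concrete, here is a toy checker for a simplified reading of the 1a rule: a `get` of a non-nullable local needs an earlier `set` that is still in scope, and sets made inside a block, loop, or if arm are forgotten when that construct ends. The tuple-based instruction format and the name `validates_1a` are assumptions made for this sketch, not a real validator.

```python
def validates_1a(body, initialized=frozenset()):
    """Check a toy instruction list against a simplified 1a rule.

    Instructions are tuples: ("set", name), ("get", name), or a structured
    construct ("block" | "loop" | "if", body, ...) with nested bodies.
    All locals mentioned are assumed to be non-nullable.
    """
    init = set(initialized)  # copy, so sets made here do not escape to the caller
    for instr in body:
        op = instr[0]
        if op == "set":
            init.add(instr[1])
        elif op == "get":
            if instr[1] not in init:
                return False
        else:  # "block", "loop" or "if": check each nested body in its own scope
            for nested in instr[1:]:
                if not validates_1a(nested, init):
                    return False
    return True


# First pattern: both arms set the local, then it is read after the if.
if_else = [
    ("if", [("set", "local")], [("set", "local")]),
    ("get", "local"),
]
# Second pattern: two separate conditional sets, then the read.
two_ifs = [
    ("if", [("set", "local")], []),
    ("if", [("set", "local")], []),
    ("get", "local"),
]

print(validates_1a(if_else))                       # False
print(validates_1a(two_ifs))                       # False
print(validates_1a([("set", "local")] + if_else))  # True, dummy set at entry
print(validates_1a([("set", "local")] + two_ifs))  # True, dummy set at entry
```

In this model, both patterns are rejected as written and accepted once a dummy set is placed at function entry.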

This might not be optimal (the added initialization costs 3 instructions per local, which might otherwise be avoided), but it simplifies implementation and could be used either as a backstop or as a simple way to legalize intermediate (invalid) states.

I briefly discussed this with @kripken (https://twitter.com/carlo_piovesan/status/1551834112788398080 and replies). I agree that there are potential drawbacks and that it's not optimal, but I think it might be a strategy worth considering, enough to be shared here.

One open question I have is whether it's always possible to generate such dummy initializers in advance (it's easy for function references, less obvious for arbitrary, potentially recursive types).

@tlively
Member

tlively commented Jul 27, 2022

No, any recursive type where there is a path in the infinitely unrolled type definition that contains only non-nullable references cannot be constructed by Wasm because it would require being able to refer to a value before it has been initialized. These types can still be potentially created by a host, so they can’t just be optimized entirely out, either.
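
The condition described here can be illustrated with a small least-fixpoint computation. The sketch assumes a simplified world with struct types only, where each type is summarized by the types it references through non-nullable fields, and where no host help is available. The function name and input encoding are hypothetical.

```python
def constructible_types(struct_defs):
    """Return the set of struct types that can be built without host help.

    struct_defs maps a type name to the list of type names it references
    through NON-nullable fields. Nullable and scalar fields are left out
    because they can always be filled with null or a constant.
    """
    constructible = set()
    changed = True
    while changed:  # iterate until no new type becomes constructible
        changed = False
        for name, required in struct_defs.items():
            if name in constructible:
                continue
            # A struct can be allocated once a value exists for every
            # non-nullable reference field.
            if all(r in constructible for r in required):
                constructible.add(name)
                changed = True
    return constructible


defs = {
    "$leaf": [],         # no non-nullable reference fields
    "$list": ["$leaf"],  # holds a (ref $leaf); a nullable next pointer is omitted
    "$self": ["$self"],  # has a field of type (ref $self)
    "$a": ["$b"],        # $a and $b require each other
    "$b": ["$a"],
}
print(sorted(constructible_types(defs)))  # ['$leaf', '$list']
```

Types such as `$self`, `$a`, and `$b` never enter the set, because building the first value would require a reference to a value that does not exist yet.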

@carlopi
Author

carlopi commented Jul 27, 2022

I see now, thanks, but I believe it would still be possible to have a nullable global variable with the same type and then, as part of module initialization, call an appropriate host function that returns a dummy value of the correct type and assign it to the global variable.
From there it's just a matter of taking the global, casting to non-null, and you have your dummy.

Thanks for highlighting this issue with recursive types. I knew there had to be a corner case like this, but I don't think it invalidates the reasoning.

@tlively
Member

tlively commented Jul 27, 2022

Yep, that would work fine as long as such a host function exists, but it might not in general. For example the host might only pass in the value as a function parameter, or it might only return it as a result of calling a function with side effects, or it might return a dataref value with one of many possible types depending on external factors.

@kripken
Member

kripken commented Jul 27, 2022

Looks like this was suggested by @askeksa-google recently. It seems like it could be useful sometimes.

Maybe we should document all the "workaround" options somewhere? I think the main ones that have been discussed are:

  1. A producer can simply never emit non-nullable types.
  2. A producer can never emit non-nullable locals: such a local can be replaced with a nullable one + ref.as_non_null on gets.
  3. A producer can emit dummy/dead sets in the function entry (or maybe in internal scopes) to ensure 1a validation.
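
Option 2 can be sketched as a simple rewrite over a flat list of instruction strings. This is a toy model, not Binaryen's implementation: the function name is hypothetical, and the local's declared type is assumed to be changed to nullable elsewhere. The sketch also treats `local.tee` like `local.get`, since a tee pushes a value of the local's declared type as well.

```python
def make_locals_nullable(instrs, non_nullable_locals):
    """Insert ref.as_non_null after each read of a formerly non-nullable local.

    instrs is a flat list of instruction strings such as "local.get $x".
    non_nullable_locals is the set of local names being turned nullable.
    """
    out = []
    for instr in instrs:
        out.append(instr)
        parts = instr.split()
        reads_local = len(parts) == 2 and parts[0] in ("local.get", "local.tee")
        if reads_local and parts[1] in non_nullable_locals:
            # The value now has a nullable type, so restore non-nullability
            # for whatever consumes it.
            out.append("ref.as_non_null")
    return out


body = ["call $make", "local.set $x", "local.get $x", "call $use"]
print(make_locals_nullable(body, {"$x"}))
```

The added casts cost a null check on each read, which a later optimizer may be able to remove where it can infer non-nullability.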

1 and 2 may be good enough for a compiler if an optimizer like wasm-opt runs later, since Wasm GC provides enough information for a tool to infer non-nullability. 3 might be good enough on its own, I'm not sure, but it is more work than 1 or 2.

Aside from those workarounds, the more optimal path is probably to decide what to do in each situation separately. I think that's what we have to do in Binaryen. We haven't decided how yet, but probably something along the lines of 2, applied only if we can't move code around to avoid the issue, etc. That does take more work, of course.

@dcodeIO

dcodeIO commented Jul 27, 2022

Is it only me, or is speccing something half-baked and then discussing and documenting a bunch of workarounds (where one is to not use the feature at all and others are clumsy / suboptimal / impossible) not a desirable outcome? I certainly appreciate all the effort and discussion, specifically the summary document, but when I judge just by the outcome, well...

@carlopi
Author

carlopi commented Jul 27, 2022

Kudos to @askeksa-google then. Having read his comments, the only addition here is the possibility of providing a JS-based initializer.

+1 on documenting workarounds or other relevant by-products of the discussion somewhere. This was the basic idea behind creating the issue (I'm not sure where the right place is).

On dummies, there are a few minor improvements that make sense, like moving the dummy set to an inner scope (but out of loops), or removing the dummy entirely if, after optimizations have been performed, 1a validation would be satisfied anyhow.

@dcodeIO: I get the sarcasm, but trading hurdles for producers against better runtime characteristics is part of the Wasm design process. Here, allowing useful proposals (like function references and then strings / GC) to move forward makes a lot of sense, and the compromise on 1a makes sense to me.

@titzer
Contributor

titzer commented Jul 27, 2022

Option 1a is a compromise as we couldn't reach consensus on a more general solution. It is preferable to the status quo, which was to defer and not allow non-nullable locals. It leaves the door open to either 1b (annotations) or 1c (further inference).

As Thomas points out, the fact remains that there are some types that programs can write that simply cannot be initialized without either host help or a fixpoint operator. That wouldn't have been solved by any of the alternatives that were available.

@rossberg
Member

rossberg commented Jul 27, 2022

@tlively:

> These types can still be potentially created by a host

Actually, I'd much prefer if we did not give hosts a mandate to violate the Wasm data model this way.

For one, as a general design principle, the host should not have magic powers when it comes to Wasm data structures, as that breaks the virtualisation principle.

Second, such an ability could also be abused to construct cyclic data structures for other types, which breaks guarantees that Wasm programs would otherwise get for recursive data types – e.g., that traversing a (non-mutable) recursive type always terminates (in other words, that such recursion is inductive).

@tlively
Member

tlively commented Jul 27, 2022

@rossberg, that sounds fine to me, but do we have a way of normatively constraining hosts that way? (Other than in the JS API specifically)

@rossberg
Member

rossberg commented Jul 27, 2022

Yes. We already axiomatise what a host function is allowed to do to the store when called. In particular, the resulting store must be valid and the store extension relation constrains how it can be modified, e.g., not changing immutable things. With GC, these definitions would also talk about the heap, and validity could rule out cyclic data in some suitable way (details tbd).

@tlively
Member

tlively commented Nov 1, 2022

Closing this as non-actionable for the MVP design, but I would welcome a PR adding a new document containing notes about how the proposal could be used.

6 participants