Dynamic linking requirements #228

tlively · 2021-06-30T00:34:59Z

Following up from today's subgroup meeting, this issue aims to identify the linking use cases and properties we expect to natively support in a GC MVP. The idea is that we will be ok with frontends being forced to use arbitrarily complex compilation and user-level linking schemes to correctly implement any source language linking semantics more expressive than what we agree on here.

Based on our conversation this morning, here's my understanding of where we're at:

Requirements:

Modules with DAG-shaped type dependence graphs should be able to be instantiated in topological order, just like in MVP Wasm.
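As a rough illustration of this requirement (a toy model, not part of any proposal), the sketch below treats each Wasm module as a node whose type imports point at the modules that define those types, and derives an instantiation order with Python's standard `graphlib`. The module names and the `type_deps` table are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical modules: each maps to the set of modules whose types it imports.
type_deps = {
    "runtime": set(),
    "collections": {"runtime"},
    "app": {"runtime", "collections"},
}

def instantiation_order(deps):
    """Return an order in which every module follows the modules it depends on."""
    return list(TopologicalSorter(deps).static_order())

def has_cycle(deps):
    """True if the type dependence graph is not a DAG (mutually recursive modules)."""
    try:
        instantiation_order(deps)
    except CycleError:
        return True
    return False

order = instantiation_order(type_deps)
print(order)
```

A graph such as `{"a": {"b"}, "b": {"a"}}` makes `has_cycle` return `True`, which corresponds to the mutually recursive case listed under non-goals.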

Non-goals:

We consider mutually recursive modules to be out of scope.

Unclear:

We do not have consensus on whether sharing types between modules should require explicit imports and exports.

Are there any other requirements, non-goals, or axes of consideration that I am missing?

rossberg · 2021-06-30T10:44:17Z

Requirements:

Modules with DAG-shaped dependence graphs should be able to be instantiated in topological order, just like in MVP Wasm.

It isn't clear to me whether this is meant to refer to source-level modules (compiled down to Wasm modules) or to Wasm modules per se. In the latter case, isn't it a vacuous requirement? (I.e., how would it not be satisfied?)

Are there any other requirements, non-goals, or axes of consideration that I am missing?

I would phrase it like this:

Compilation should be able to target GC types in the same way it can target linear memory, i.e., preserving instantiation order and linking logic.

Because only with this, the use of GC types will be backwards compatible with existing use cases, eco-systems, and tool chains.

tlively · 2021-06-30T14:52:33Z

Requirements:

Modules with DAG-shaped dependence graphs should be able to be instantiated in topological order, just like in MVP Wasm.

It isn't clear to me whether this is meant to refer to source-level modules (compiled down to Wasm modules) or to Wasm modules per se. In the latter case, isn't it a vacuous requirement? (I.e., how would it not be satisfied?)

Yes, this was referring to Wasm modules, and yes, I would hope that everyone already agrees on this very simple requirement. One way it would not be satisfied is, for example, with nominal types but no type imports or exports, but again, I think we already all agree on this requirement. (I'll amend the wording to "type dependence graphs".)

Are there any other requirements, non-goals, or axes of consideration that I am missing?

I would phrase it like this:

Compilation should be able to target GC types in the same way it can target linear memory, i.e., preserving instantiation order and linking logic.

Do you mean that the instantiation order of source modules should be preserved? If so, that would be scoped to a DAG of source modules and exclude mutually dependent source modules, right? Can you be more precise about the linking logic that you expect to be preserved?

Because only with this, the use of GC types will be backwards compatible with existing use cases, eco-systems, and tool chains.

Do you mean existing Wasm use cases, ecosystems, and tool chains that are using linear memory here?

rossberg · 2021-07-01T06:16:08Z

Compilation should be able to target GC types in the same way it can target linear memory, i.e., preserving instantiation order and linking logic.

Do you mean that the instantiation order of source modules should be preserved? If so, that would be scoped to a DAG of source modules and exclude mutually dependent source modules, right?

What I mean is that the same Wasm module and linking topology you can use when compiling for linear memory should also work when compiling for GC types. So that you can switch the data representation or compilation scheme without affecting the order in which modules can be plugged together. If it did, then that would be a clear sign that GC types are not modular enough.

That doesn't imply that mutually recursive source modules can be compiled to mutually recursive Wasm modules, because that's not possible with linear memory either.

Can you be more precise about the linking logic that you expect to be preserved?

In a nutshell, switching internal data representation from memory to GC should not require new custom linking infrastructure that was not necessary before and that may not even be possible to integrate into existing tool chains or eco systems.

Because only with this, the use of GC types will be backwards compatible with existing use cases, eco-systems, and tool chains.

Do you mean existing Wasm use cases, ecosystems, and tool chains that are using linear memory here?

Yes (to the extent that they even care).

Does that make sense?

tlively · 2021-07-01T16:21:09Z

Yes, thanks. One issue is that existing toolchains and ecosystems have no concept of sharing data type definitions, so I don't think it's clear what it means to match their behavior in this case. It seems reasonable to expect data type definitions to be shared in the same way as function and tag definitions, which would make nominal typing the logical choice, but it also seems reasonable to expect them to be shared in the same way as function types (i.e. not explicitly shared), which would make structural typing the logical choice. Since both designs are compatible with existing mechanisms in WebAssembly, I don't think we can make progress on this choice solely by looking at what WebAssembly has done so far.

rossberg · 2021-07-06T14:21:00Z

One issue is that existing toolchains and ecosystems have no concept of sharing data type definitions

Right, but that is my very point: if toolchains and ecosystems needed to add special support for sharing types a.k.a. declarations of GC data layout -- other than generically linking type import/exports together -- then we'd be doing it wrong. They do not need any special logic for data structures defined in linear memory or functions passing data in linear memory.

jakobkummerow · 2021-07-06T16:55:18Z

if toolchains and ecosystems needed to add special support for sharing types [...] then we'd be doing it wrong.

While I'm not trying to argue for or against any particular solution here, I'm taking issue with the generality of this statement. There are no existing toolchains or ecosystems for WasmGC. We're trying to establish a new ecosystem, based on new (or at least significantly updated) toolchains. We're obviously expecting these toolchains to add special support for everything that WasmGC introduces (or, in many cases, for having a Wasm-producing backend at all).

And we're necessarily facing constraints and tradeoffs in this design work. Generally speaking, after careful contemplation and/or experimentation, if we determine that for a given problem a certain solution is the best tradeoff, even though it implies placing certain burdens/requirements on toolchains, then so be it. It's a factor to consider, but depending on what the requirements are, not unquestionably "wrong".

rossberg · 2021-07-07T05:34:59Z

There are no existing toolchains or ecosystems for WasmGC. We're trying to establish a new ecosystem, based on new (or at least significantly updated) toolchains.

Actually, we're not! We are trying to enable new ways of compiling certain high-level language constructs. For the most part, that should be an implementation detail of codegen that does not affect other parts of tool chains, and certainly not entire eco systems! If it did, that would be a failure mode.

Toolchain = compiler, linker, package manager, packer, perhaps custom loader, etc. Of these, only compilers should be significantly affected.

Ecosystem = web, CDNs, blockchains, component systems, standalone implementations, etc., with their generic loading and linking mechanisms. Shouldn't be affected at all.

conrad-watt · 2021-07-07T09:31:38Z

Actually, we're not! We are trying to enable new ways of compiling certain high-level language constructs. For the most part, that should be an implementation detail of codegen that does not affect other parts of tool chains, and certainly not entire eco systems!

For us to violate this principle (given the current topic of discussion), there would have to be a language currently targeting linear memory (I imagine shipping its own GC), which upon switching to GC types would need to significantly alter its linking model. I believe that separately compiled modules of such a language would currently be linking with a central "runtime" module anyway, to get the definition of its GC at least, and so I don't think that mandating explicit sharing of types would require it to change its linking model.

In any case, I think it would be too strong to say that a shift to GC types shouldn't affect the other parts of any possible toolchain people could currently be using targeting Wasm. For example, with linear memory, separate modules can agree on distinguished static offsets in their shared linear memory where important state is stored, and don't need to coordinate these offsets through imports and exports. However, if that state were to be shifted inside a reference type, this would no longer be possible and some additional central coordination would be needed.
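To make the static-offset example concrete, here is a small Python model (an illustration only, not Wasm semantics). A `bytearray` stands in for the shared linear memory, and the offset `16`, the `Counter` class, and all function names are hypothetical. In the first half, both modules restate the same offset independently; in the second half, the state lives behind a reference that has to be handed to each module explicitly.

```python
import struct

memory = bytearray(64)  # stands in for a linear memory shared by both modules

# Linear memory: each module hard-codes the same offset; nothing is imported.
def module_a_bump(mem):
    OFFSET = 16  # convention agreed out of band
    (value,) = struct.unpack_from("<I", mem, OFFSET)
    struct.pack_into("<I", mem, OFFSET, value + 1)

def module_b_read(mem):
    OFFSET = 16  # same convention, restated independently
    return struct.unpack_from("<I", mem, OFFSET)[0]

# Reference-typed state: there is no address to agree on, so the reference
# itself must be supplied to each module, for example through an import.
class Counter:
    def __init__(self):
        self.value = 0

def make_module_a(counter):
    def bump():
        counter.value += 1
    return bump

def make_module_b(counter):
    return lambda: counter.value

module_a_bump(memory)

shared = Counter()  # created by whoever coordinates the two modules
bump, read = make_module_a(shared), make_module_b(shared)
bump()
```

The `shared` object plays the role of the "additional central coordination": some third party has to create it and pass it to both sides.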

rossberg · 2021-07-07T11:30:46Z

I believe that separately compiled modules of such a language would currently be linking with a central "runtime" module anyway, to get the definition of its GC at least, and so I don't think that mandating explicit sharing of types would require it to change its linking model.

Linking a fixed runtime module (or a set thereof) is totally fine. But that would not solve the problem of providing an open set of type definitions -- unless you require whole-program linking and break all other linking scenarios that are currently possible.

In any case, I think it would be too strong to say that a shift to GC types shouldn't affect the other parts of any possible toolchain

I agree, that's why I said "significantly affect", meaning that you don't need to add substantial new infrastructure, for some definition of "substantial".

manoskouk · 2021-07-07T15:26:06Z

Right, but that is my very point: if toolchains and ecosystems needed to add special support for sharing types a.k.a. declarations of GC data layout -- other than generically linking type import/exports together -- then we'd be doing it wrong. They do not need any special logic for data structures defined in linear memory or functions passing data in linear memory.

Currently, importing a function needs the importing module to replicate its signature. Also, structural types need their layout replicated in any module that uses them. It seems reasonable to me that we should allow the same for GC types if they need importing (in a nominal type system).

conrad-watt · 2021-07-08T00:02:21Z

... break all other linking scenarios that are currently possible.

I agree, that's why I said "significantly affect", meaning that you don't need to add substantial new infrastructure, for some definition of "substantial".

I was trying to argue something stronger, that I think we shouldn't overly restrict ourselves by uncompromisingly aiming to support linking scenarios that toolchains currently targeting linear memory (and wanting to migrate to GC) could be engineered to use in theory, but aren't in practice.

So I'd argue that it's not an instant dealbreaker for a hypothetical scheme to be "significantly affected". We'd want to weigh up how much we care about continuing to facilitate it.

rossberg · 2021-07-08T17:20:47Z

@manoskouk:

Currently, importing a function needs the importing module to replicate its signature. Also, structural types need their layout replicated in any module that uses them. It seems reasonable to me that we should allow the same for GC types if they need importing (in a nominal type system).

Well, yes, but the problem is that you cannot replicate nominal types without changing the meaning of (and potentially breaking) the program.
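A toy Python model of the difference, under the assumption that a structural type is identified by its field layout while a nominal type gets a fresh identity per definition (the class names and the `define_point` helper are hypothetical, and recursion is ignored entirely):

```python
from dataclasses import dataclass, field
from itertools import count

_fresh = count()  # source of fresh identities for nominal definitions

@dataclass(frozen=True)
class StructType:
    """Structural: two definitions with the same fields are the same type."""
    fields: tuple

@dataclass(frozen=True)
class NominalType:
    """Nominal: every definition gets its own identity, whatever its fields."""
    fields: tuple
    ident: int = field(default_factory=lambda: next(_fresh))

def define_point(kind):
    # Each call models one module writing out its own copy of the definition.
    return kind(fields=("i32", "i32"))

struct_a, struct_b = define_point(StructType), define_point(StructType)
nominal_a, nominal_b = define_point(NominalType), define_point(NominalType)

print(struct_a == struct_b)    # replicated structural definitions coincide
print(nominal_a == nominal_b)  # replicated nominal definitions do not
```

In this model, copying a nominal definition into a second module yields a different type, so values created by one module would no longer be accepted by the other.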

@conrad-watt:

So I'd argue that it's not an instant dealbreaker for a hypothetical scheme to be "significantly affected". We'd want to weigh up how much we care about continuing to facilitate it.

Agreed in the abstract, but there is nothing much hypothetical about the ability to do more than just whole-program linking. For one, the module linking proposal is all about supporting that. Another example is my experimental compilers, which depend on it, not for anything fancy but for doing pretty standard code loading stuff.

conrad-watt · 2021-07-08T17:38:57Z

Agreed in the abstract, but there is nothing much hypothetical about the ability to do more than just whole-program linking.

My belief is that explicit/nominal type imports and exports can do more than whole-program linking, so long as someone is managing a convention for how type imports/exports are resolved and disambiguated. This could be done by the toolchain at a "deploy" stage, or by the host, or by some user script (edit: with more flexibility if we assume the importexport mechanism that @tlively sketched).

I somewhat view equi-recursive canonicalisation as the Wasm runtime implementing and natively providing a particularly onerous form of type disambiguation that every program has to pay a penalty for, even if they don't rely on it.

I admit that the kind of "easy" runtime type disambiguation strategies a user/toolchain could implement would be less powerful than equi-recursive canonicalisation, but I think they'd be more powerful than whole-program linking. So a question I'm interested in is - is there a linking scheme in that gap that we want to support?

EDIT: apologies for the accidental closure!

manoskouk · 2021-07-13T09:02:03Z

Well, yes, but the problem is that you cannot replicate nominal types without changing the meaning of (and potentially breaking) the program.

Nominal types defined in another module can be marked as imported. Then, during module instantiation, the module can be instantiated with the runtime type representations that correspond to the imported types, just as it is with runtime representations of functions.
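A minimal sketch of that instantiation step, assuming a hypothetical `Module` class and `TypeRep` runtime representation (neither is a real Wasm API). It models only the dynamic binding of type imports, not validation or static typing:

```python
class TypeRep:
    """Hypothetical runtime representation of a nominal type; identity matters."""
    def __init__(self, name):
        self.name = name

class Module:
    def __init__(self, name, type_imports=(), type_exports=()):
        self.name = name
        self.type_imports = tuple(type_imports)
        self.type_exports = tuple(type_exports)

    def instantiate(self, imports):
        """Bind imported types to supplied representations; create exported ones."""
        missing = [t for t in self.type_imports if t not in imports]
        if missing:
            raise LookupError(f"{self.name}: unresolved type imports {missing}")
        types = {t: imports[t] for t in self.type_imports}
        for t in self.type_exports:
            types[t] = TypeRep(f"{self.name}.{t}")
        return types

# The defining module is instantiated first; importers receive its TypeRep.
runtime = Module("runtime", type_exports=["Object"]).instantiate({})
lib = Module("lib", type_imports=["Object"]).instantiate(runtime)
app = Module("app", type_imports=["Object"]).instantiate(runtime)
```

Here `lib` and `app` end up sharing the very same `TypeRep` object, in the same way that two modules importing one function share the same function instance.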

rossberg · 2021-07-13T15:32:47Z

@conrad-watt:

I somewhat view equi-recursive canonicalisation as the Wasm runtime implementing and natively providing a particularly onerous form of type disambiguation

I'm afraid I don't agree with that framing, which seems rather backwards to me (and not just because I would regard it as the inverse of disambiguation).

Also, please let's not conflate the question of structural types with the choice of semantics for type recursion -- those are rather separate questions.

@manoskouk:

Nominal types defined in another module can be marked as imported. Then, during module instantiation, the module can be instantiated with the runtime type representations that correspond to the imported types, just as it is with runtime representations of functions.

Yes, but that doesn't scale. I'll explain why in today's meeting. One problem is that types affect static typing, which actually makes them very different from functions.

conrad-watt · 2021-07-13T15:59:42Z

I'm afraid I don't agree with that framing, which seems rather backwards to me (and not just because I would regard it as the inverse of disambiguation).

Point taken, using the word "disambiguation" does somewhat presuppose a generative semantics. I can understand that you don't agree with my characterisation of equi-recursive canonicalisation, but how do you feel about the more limited claim that nominal type imports and exports can do better than whole-program linking?

Also, please let's not conflate the question of structural types with the choice of semantics for type recursion -- those are rather separate questions.

I agree that finding an alternative semantics for type recursion while sticking with structural types would be a "third way" here. We've investigated iso-recursive types and the general feeling seems to be that either nominal or equi-recursive types would still be preferable. So unless we can come up with another structural approach to investigate (edit: or improve our perception of iso-recursion), a criticism of equi-recursive types is a criticism of the only structural approach we consider viable.

tlively · 2021-07-13T18:27:55Z

Another option I've been thinking about would be to have nominal types that can be recursive as well as inductive structural types. That way classes, vtables, and methods in an OO language could all lower to recursive nominal types, but things like tuples and free functions could use structural types. I believe that would unlock most of the simplicity and expressiveness of structural types without the overhead of equirecursive canonicalization.

RossTate · 2021-07-13T19:51:36Z

We have yet to see an example that nominal types cannot address pretty easily. In particular, #156 provides nominal solutions to many of the major concerns expressed in today's presentation, such as simple loading and infinite number of types (in fact, it offers another solution distinct from the one I described today), with consensus established (amongst everyone but @rossberg) that these solutions would be satisfactory.

Once such an example illustrating a real need for structural types with type canonicalization is supplied and recognized, then we can properly understand @rossberg's concern and determine how best to address it (possibly as the current MVP does). Note that #156 ended with many people requesting such an example from @rossberg 8 months ago.

tlively · 2022-02-12T01:48:46Z

We've settled on #243 as the type system solution for dynamic linking, so closing this.

RossTate mentioned this issue Jul 1, 2021

Illustration of Staged Compilation for Deferred Loading #229

Closed

conrad-watt closed this as completed Jul 8, 2021

conrad-watt reopened this Jul 8, 2021

tlively closed this as completed Feb 12, 2022
