From 6e782323488e3df00b97ca57588329ec7660885a Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Wed, 15 Sep 2021 16:03:42 +0200 Subject: [PATCH 01/15] Context lookup --- proposals/gc/MVP.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 0fca57c76..1ff21d3b8 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -134,15 +134,21 @@ ctxtype ::= | (rec *). - `unpacked(t) = t` - `unpacked(pt) = i32` +* Context lookup `C($t)` is extended to recursive type indices by ignoring the `rec` marker. + - `C(rec $t) = C($t)` + * Unrolling a possibly recursive context type projects the respective item + - `unroll($t) = unroll(` - `unroll() = ` - `unroll((rec *).i) = (*)[i]` * Expanding a type definition unrolls it and returns its plain definition + - `expand($t) = expand(` - `expand() = ` - where `unroll() = sub x* ` + #### Type Validity Some of the rules define a type as `ok` for a certain index, written `ok(x)`. This controls uses of type indices as supertypes inside a recursive group: the subtype hierarchy must not be cyclic, and hence any type index used for a supertype is required to be smaller than the index `x` of the current type. From be40331b64e3f757d5f4a489a2f2172b69d167c0 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Fri, 17 Sep 2021 07:03:53 +0200 Subject: [PATCH 02/15] Fixes; rename deftype->subtype --- proposals/gc/MVP.md | 78 ++++++++++++++++++++++----------------------- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 1ff21d3b8..9b59202e5 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -93,13 +93,13 @@ New abbreviations are introduced for reference types in binary and text format, #### Type Definitions -* `typedef` is the syntax for an entry in the type section, generalising the existing syntax - - `typedef ::= | rec *` - - `module ::= {..., types vec()}` +* `deftype` is the syntax for an entry in the type section, generalising the existing syntax + - `deftype ::= | rec *` + - `module ::= {..., types vec()}` - a `rec` definition defines a group of mutually recursive types that can refer to each other; it thereby defines several type indices at a time -* `deftype` is a new category of type defining a single type, as a subtype of possible other types - - `deftype ::= sub * ` +* `subtype` is a new category of type defining a single type, as a subtype of possible other types + - `subtype ::= sub * ` - the preexisting syntax with no `sub` clause is redefined to be a shorthand for a `sub` clause with empty `typeidx` list: ` == sub () ` - Note: This allows multiple supertypes. For the MVP, it could be restricted to at most one supertype. @@ -125,7 +125,7 @@ TODO: Need to be able to use `i31` as a type definition. Validity of a module is checked under a context storing the definitions for each type. In the case of recursive types, this definition is given by a respective projection from the full type: ``` -ctxtype ::= | (rec *). +ctxtype ::= | (rec *). ``` #### Auxiliary Definitions @@ -138,12 +138,12 @@ ctxtype ::= | (rec *). - `C(rec $t) = C($t)` * Unrolling a possibly recursive context type projects the respective item - - `unroll($t) = unroll(` - - `unroll() = ` - - `unroll((rec *).i) = (*)[i]` + - `unroll($t) = unroll()` iff `C($t) = ` + - `unroll() = ` + - `unroll((rec *).i) = (*)[i]` * Expanding a type definition unrolls it and returns its plain definition - - `expand($t) = expand(` + - `expand($t) = expand()` iff `C($t) = ` - `expand() = ` - where `unroll() = sub x* ` @@ -154,35 +154,35 @@ ctxtype ::= | (rec *). Some of the rules define a type as `ok` for a certain index, written `ok(x)`. This controls uses of type indices as supertypes inside a recursive group: the subtype hierarchy must not be cyclic, and hence any type index used for a supertype is required to be smaller than the index `x` of the current type. * a sequence of type definitions is valid if each item is valid within the context containing the prior items - - ` * ok` - - iff ` ok` and extends the context accordingly - - and `* ok` under the extended context + - ` * ok` + - iff ` ok` and extends the context accordingly + - and `* ok` under the extended context -* a plain type definition is valid if its `deftype` is at its type index - - ` ok` and extends the context with `` - - iff ` ok($t)` where `$t` is the next unused (i.e., current) type index +* a plain type definition is valid if its `subtype` is at its type index + - ` ok` and extends the context with `` + - iff ` ok($t)` where `$t` is the next unused (i.e., current) type index * a recursive type definition is valid if its types are valid under the context containing all of them - - `rec * ok` and extends the context with `*` - - iff `* ok($t)` under the extended context(!) - - where `x` is the next unused (i.e., current) type index - - and `N = |*|-1` - - and `* = *[$t:=rec $t, ..., $t+N:=rec $t+N]` - - and `* = (rec *).0, ..., (rec *).N` - - Note: `*` marks all recursive occurrences of type indices from within this group with `rec`; this is expressed here by a validation-time substitution, but an implementation could insert the annotations on the fly during decoding. + - `rec * ok` and extends the context with `*` + - iff `* ok($t)` under the extended context(!) + - where `$t` is the next unused (i.e., current) type index + - and `N = |*|-1` + - and `* = *[$t:=rec $t, ..., $t+N:=rec $t+N]` + - and `* = (rec *).0, ..., (rec *).N` + - Note: `*` marks all recursive occurrences of type indices from within this group with `rec`; this is expressed here by a validation-time substitution, but an implementation could insert the annotations on the fly during decoding. - Because rec type indices `rec $t`cannot occur in the source, the following are invariants that are established by these rules: - - (1) rec indices only occur in the context and inside a `(rec *)`, + - (1) rec indices only occur in the context and inside a `(rec *)`, - (2) they only refer to indices from the same recursive group, - (3) all internal indices within a recursive group are marked with rec, - - (4) all rec indices are bound by the same `(rec *).i` in the context. + - (4) all rec indices are bound by the same `(rec *).i` in the context. Together, these invariants ensure that [type equivalence](#type-equivalence) on recursive types is equivalent to an _iso-recursive_ interpretation. -* a sequence of deftype's is valid of each of them is valid for their respective index - - ` * ok($t)` - - iff ` ok($t)` - - and `* ok($t+1)` +* a sequence of subtype's is valid of each of them is valid for their respective index + - ` * ok($t)` + - iff ` ok($t)` + - and `* ok($t+1)` -* an individual deftype is valid if its definition is valid, matches every supertype, and no supertype has an index higher than its own +* an individual subtype is valid if its definition is valid, matches every supertype, and no supertype has an index higher than its own - `sub $t* ok($t')` - iff ` ok` - and `( <: expand($t))*` @@ -200,8 +200,8 @@ Type equivalence, written `t == t'` here, is defined inductively, as before. All Even recursive and supertype definitions are just congruences: * two recursive types are equivalent if they are equivalent pointwise - - `(rec *) == (rec *)` - - iff `( == )*` + - `(rec *) == (rec *)` + - iff `( == )*` - Note: This rule is only used on types looked up in the context, where [recursive type indices](#types) have been marked with `rec` accordingly. That way, all recursive references are compared using the rule below, which prevents looping. * two subtypes are equivalent if their structure is equivalent and they have equivalent supertypes @@ -215,12 +215,12 @@ Type indices are only equivalent if they are either both non-recursive or both r - `$t == $t'` - iff `$t = ` and `$t' = ` and ` = ` -For type indices marked `rec`, the rules recursively assume that the respective pair of `(rec )` in their definitions is already equal and can be ignored. +For type indices marked `rec`, the rules recursively assume that the respective pair of `(rec )` in their definitions is already equal and can be ignored. * two recursive type indices are equivalent if they project the same index - `(rec $t) == (rec $t')` - - iff `$t = (rec *).i` - - and `$t' = (rec *).i'` + - iff `$t = (rec *).i` + - and `$t' = (rec *).i'` - and `i = i'` This is the only interesting rule. It is sound due to the [invariants](#type-validity) established for recursive type indices. @@ -672,7 +672,7 @@ The opcode for heap types is encoded as an `s33`. | -0x21 | `struct ft*` | `ft* : vec(fieldtype)` | | -0x22 | `array ft` | `ft : fieldtype` | | -0x30 | `sub $t* st` | `$t* : vec(typeidx)`, `st : strtype` | -| -0x31 | `rec dt*` | `dt* : vec(deftype)` | +| -0x31 | `rec dt*` | `dt* : vec(subtype)` | #### Field Types @@ -788,7 +788,7 @@ C |- ft ok C |- array ft ok ``` -#### Defined Types (`C |- * ok(x)`) +#### Defined Types (`C |- * ok(x)`) ``` C |- st ok @@ -803,7 +803,7 @@ C |- dt'* ok(x+1) C |- dt dt'* ok(x) ``` -#### Type Definitions (`C |- * -| C'`) +#### Type Definitions (`C |- * -| C'`) ``` C |- dt ok(|C|) @@ -893,7 +893,7 @@ C |- ft == ft' C |- arrat ft == array ft' ``` -#### Defined Types (`C |- == `) +#### Defined Types (`C |- == `) ``` (C |- x == x')* From 37bebafabb6efebae0ebc0f625f92e17996faaae Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Fri, 17 Sep 2021 07:24:08 +0200 Subject: [PATCH 03/15] Fix example --- proposals/gc/MVP.md | 31 ++++++++++++++++++------------- 1 file changed, 18 insertions(+), 13 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 9b59202e5..babf80382 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -227,15 +227,15 @@ This is the only interesting rule. It is sound due to the [invariants](#type-val Note: Semantically, a type index `rec $t` is best thought of as being a self type variable bound by the enclosing recursive type (that has been alpha-renamed to be the same on both sides during the equivalence check), plus a projection determined by the definition of `$t`. For example, the mutually recursive types ``` -(type (rec - $t1 (struct (field i32 (ref $t2))) - $t2 (struct (field i64 (ref $t1))) -)) +(rec + (type $t1 (struct (field i32 (ref $t2)))) + (type $t2 (struct (field i64 (ref $t1)))) +) ``` which in the context are recorded as ``` -$t1 = (rec (struct (field i32 (ref (rec $t2))))).0 -$t2 = (rec (struct (field i64 (ref (rec $t1))))).1 +$t1 = (rec (struct (field i32 (ref (rec $t2)))) (struct (field i64 (ref $t1)))).0 +$t2 = (rec (struct (field i32 (ref (rec $t2)))) (struct (field i64 (ref $t1)))).1 ``` morally represent the higher-kinded iso-recursive types ``` @@ -246,17 +246,22 @@ where `<...>` denotes a type tuple. (A single syntactic type variable is enough Consequently, if there was an equivalent pair of types, ``` -(type (rec - $u1 (struct (field i32 (ref $u2))) - $u2 (struct (field i64 (ref $u1))) -)) +(rec + (type $u1 (struct (field i32 (ref $u2)))) + (type $u2 (struct (field i64 (ref $u1)))) +) ``` recorded in the context as ``` -$u1 = (rec (struct (field i32 (ref (rec $u2))))).0 -$u2 = (rec (struct (field i64 (ref (rec $u1))))).1 +$u1 = (rec (struct (field i32 (ref (rec $u2)))) (struct (field i64 (ref $u1)))).0 +$u2 = (rec (struct (field i32 (ref (rec $u2)))) (struct (field i64 (ref $u1)))).1 +``` +representing the iso-recursive types +``` +u1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 +u2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 ``` -then comparing `(rec $t1) == (rec $u1)` amounts to checking `a.0 == a.0`, as one would expect. +which are structural identical to the previous ones. With that in mind, comparing `(rec $t1) == (rec $u1)` amounts to checking `a.0 == a.0`, and the above equivalence rule is defined to behave like that. Note 2: This semantics implies that type equivalence checks can be implemented in constant-time by representing all types as trees and canonicalising them bottom-up in linear time upfront. For this purpose, all `(rec $t)` are treated as leaves, only representing the projection index `i` (representing the semantic type `a.i`) and omitting the type index `$t` itself. From 68ce610f42278a3ba543c38b26eeef1606361919 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Fri, 17 Sep 2021 07:50:09 +0200 Subject: [PATCH 04/15] Example --- proposals/gc/MVP.md | 52 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 40 insertions(+), 12 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index babf80382..5d5fd0caf 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -192,6 +192,34 @@ Some of the rules define a type as `ok` for a certain index, written `ok(x)`. Th * as [before](https://github.com/WebAssembly/function-references/proposals/function-references/Overview.md#types), a strtype is valid if all the occurring value types are valid - specifically, a concrete reference type `(ref $t)` is valid when `$t` is defined in the context +Example: Consider two mutually recursive types: +``` +(rec + (type $t1 (struct (field i32 (ref $t2)))) + (type $t2 (struct (field i64 (ref $t1)))) +) +``` +In the context, these will be recorded as: +``` +$t1 = rect1t2.0 +$t2 = rect1t2.1 + +where + +rect1t2 = (rec + (struct (field i32 (ref (rec $t2)))) + (struct (field i64 (ref (rec $t1)))) +) +``` +That is, the recursive occurrences of type indices `$t1` and `$t2` are marked with `rec` and the types themselves are defined as projections from the respective recursion group. +Morally, these bindings represent the higher-kinded iso-recursive types +``` +t1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 +t2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 +``` +where `<...>` denotes a type tuple. Interestingly, a single syntactic type variable is enough for all types, because recursive types cannot nest by construction. Because it is unique, the variables do not actually need to be represented in the Wasm syntax. +On the other hand, by representing a recursive reference by a global index, lookup is still possible as usual and unrolling does not require subtitution. + #### Equivalence @@ -225,25 +253,25 @@ For type indices marked `rec`, the rules recursively assume that the respective This is the only interesting rule. It is sound due to the [invariants](#type-validity) established for recursive type indices. -Note: Semantically, a type index `rec $t` is best thought of as being a self type variable bound by the enclosing recursive type (that has been alpha-renamed to be the same on both sides during the equivalence check), plus a projection determined by the definition of `$t`. For example, the mutually recursive types +Note: Semantically, a type index `rec $t` is best thought of as being a self type variable bound by the enclosing recursive type (that has been alpha-renamed to be the same on both sides during the equivalence check), plus a projection determined by the definition of `$t`. + +Example: As explained above, the mutually recursive types ``` (rec (type $t1 (struct (field i32 (ref $t2)))) (type $t2 (struct (field i64 (ref $t1)))) ) ``` -which in the context are recorded as +would be recorded in the context as ``` -$t1 = (rec (struct (field i32 (ref (rec $t2)))) (struct (field i64 (ref $t1)))).0 -$t2 = (rec (struct (field i32 (ref (rec $t2)))) (struct (field i64 (ref $t1)))).1 +$t1 = (rec (struct (field i32 (ref (rec $t2)))) (struct (field i64 (ref (rec $t1))))).0 +$t2 = (rec (struct (field i32 (ref (rec $t2)))) (struct (field i64 (ref (rec $t1))))).1 ``` -morally represent the higher-kinded iso-recursive types +which morally represents the higher-kinded iso-recursive types ``` t1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 t2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 ``` -where `<...>` denotes a type tuple. (A single syntactic type variable is enough for all types, because recursive types cannot nest by construction. Because it is unique, the variable does not actually need to be represented.) - Consequently, if there was an equivalent pair of types, ``` (rec @@ -253,19 +281,19 @@ Consequently, if there was an equivalent pair of types, ``` recorded in the context as ``` -$u1 = (rec (struct (field i32 (ref (rec $u2)))) (struct (field i64 (ref $u1)))).0 -$u2 = (rec (struct (field i32 (ref (rec $u2)))) (struct (field i64 (ref $u1)))).1 +$u1 = (rec (struct (field i32 (ref (rec $u2)))) (struct (field i64 (ref (rec $u1))))).0 +$u2 = (rec (struct (field i32 (ref (rec $u2)))) (struct (field i64 (ref (rec $u1))))).1 ``` -representing the iso-recursive types +then they represent the iso-recursive types ``` u1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 u2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 ``` -which are structural identical to the previous ones. With that in mind, comparing `(rec $t1) == (rec $u1)` amounts to checking `a.0 == a.0`, and the above equivalence rule is defined to behave like that. +which are structural identical to the previous ones. With that in mind, the rule for comparing `(rec $t1) == (rec $u1)` just compares the projections `.0` in the definition of both types, which amounts to checking `a.0 == a.0`. Note 2: This semantics implies that type equivalence checks can be implemented in constant-time by representing all types as trees and canonicalising them bottom-up in linear time upfront. For this purpose, all `(rec $t)` are treated as leaves, only representing the projection index `i` (representing the semantic type `a.i`) and omitting the type index `$t` itself. -Note 3: It's worth noting that the only relevant difference to a truly nominal type system is the equivalence rule on (non-recursive) type indices: instead of looking at their definitions, a nominal system would require `$t = $t'` syntactically (at least as long as we ignore things like checking imports, where type indices become meaningless). +Note 3: It's worth noting that the only observable difference to a nominal type system is the equivalence rule on (non-recursive) type indices: instead of looking at their definitions, a nominal system would require `$t = $t'` syntactically (at least as long as we ignore things like checking imports, where type indices become meaningless). #### Subtyping From 4b2d4e8ac5983ca97a656e07703fba32ea983f24 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Fri, 17 Sep 2021 07:54:56 +0200 Subject: [PATCH 05/15] Eps --- proposals/gc/MVP.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 5d5fd0caf..f83aaf8de 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -211,13 +211,13 @@ rect1t2 = (rec (struct (field i64 (ref (rec $t1)))) ) ``` -That is, the recursive occurrences of type indices `$t1` and `$t2` are marked with `rec` and the types themselves are defined as projections from the respective recursion group. -Morally, these bindings represent the higher-kinded iso-recursive types +That is, the recursive occurrences of type indices `$t1` and `$t2` are marked with `rec` and the types themselves are defined as projections from their respective recursion group, using their relative inner indices `0` and `1`. +Morally, in type-theoretic terms, these bindings represent the higher-kinded iso-recursive types ``` t1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 t2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 ``` -where `<...>` denotes a type tuple. Interestingly, a single syntactic type variable is enough for all types, because recursive types cannot nest by construction. Because it is unique, the variables do not actually need to be represented in the Wasm syntax. +where `<...>` denotes a type tuple. Interestingly, in our case, a single syntactic type variable is enough for all types, because recursive types cannot nest by construction. Because it is unique, the variables hence do not actually need to be represented in the Wasm syntax. On the other hand, by representing a recursive reference by a global index, lookup is still possible as usual and unrolling does not require subtitution. From e9d922664a8d53b0b3a7d7b5fb1cae3568fb4df2 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Fri, 17 Sep 2021 08:00:11 +0200 Subject: [PATCH 06/15] Eps --- proposals/gc/MVP.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index f83aaf8de..40653e750 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -218,7 +218,7 @@ t1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 t2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 ``` where `<...>` denotes a type tuple. Interestingly, in our case, a single syntactic type variable is enough for all types, because recursive types cannot nest by construction. Because it is unique, the variables hence do not actually need to be represented in the Wasm syntax. -On the other hand, by representing a recursive reference by a global index, lookup is still possible as usual and unrolling does not require subtitution. +On the other hand, by representing a recursive reference by a global index, its definition can still be looked up in the context as usual and unrolling does not require subtitution. #### Equivalence From de796097ea2baa7cc9b71babaf550ccd150d7532 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Fri, 17 Sep 2021 14:28:37 +0200 Subject: [PATCH 07/15] Fix unrolling --- proposals/gc/MVP.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 40653e750..0feb8bb2a 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -140,7 +140,7 @@ ctxtype ::= | (rec *). * Unrolling a possibly recursive context type projects the respective item - `unroll($t) = unroll()` iff `C($t) = ` - `unroll() = ` - - `unroll((rec *).i) = (*)[i]` + - `unroll((rec *).i) = (*)[i][rec $t:=$t,...]` * Expanding a type definition unrolls it and returns its plain definition - `expand($t) = expand()` iff `C($t) = ` From 36d37a73b8c59f0f783bc7850c7d7982bd82dd4d Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Fri, 17 Sep 2021 17:55:12 +0200 Subject: [PATCH 08/15] Simplify semantics: use rec variable, but only in eq rules --- proposals/gc/MVP.md | 114 ++++++++++++++++++-------------------------- 1 file changed, 46 insertions(+), 68 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 0feb8bb2a..9e64942bc 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -27,18 +27,6 @@ All three proposals are prerequisites. ### Types -#### Type Indices - -Type indices are extended with one bit of information that distinguishes regular from recursive type uses. - -* `typeidx` is optionally annotated with `rec` - - `typeidx ::= rec? ` - -The annotation is only used in the internal representation of types and marks type indices used within the scope of their own recursive type definition, i.e., distinguishes recursive from regular uses of type indices. It does not occur in the binary or text format. It is merely a technical annotation that is trivially derived from the syntax of type definitions but makes the rules simpler to formulate. - -Note: A practical implementation can easily derive and insert this extra information during decoding. - - #### Heap Types [Heap types](https://github.com/WebAssembly/function-references/blob/master/proposals/function-references/Overview.md#types) classify reference types and are extended: @@ -140,7 +128,7 @@ ctxtype ::= | (rec *). * Unrolling a possibly recursive context type projects the respective item - `unroll($t) = unroll()` iff `C($t) = ` - `unroll() = ` - - `unroll((rec *).i) = (*)[i][rec $t:=$t,...]` + - `unroll((rec *).i) = (*)[i]` * Expanding a type definition unrolls it and returns its plain definition - `expand($t) = expand()` iff `C($t) = ` @@ -148,7 +136,6 @@ ctxtype ::= | (rec *). - where `unroll() = sub x* ` - #### Type Validity Some of the rules define a type as `ok` for a certain index, written `ok(x)`. This controls uses of type indices as supertypes inside a recursive group: the subtype hierarchy must not be cyclic, and hence any type index used for a supertype is required to be smaller than the index `x` of the current type. @@ -167,15 +154,7 @@ Some of the rules define a type as `ok` for a certain index, written `ok(x)`. Th - iff `* ok($t)` under the extended context(!) - where `$t` is the next unused (i.e., current) type index - and `N = |*|-1` - - and `* = *[$t:=rec $t, ..., $t+N:=rec $t+N]` - - and `* = (rec *).0, ..., (rec *).N` - - Note: `*` marks all recursive occurrences of type indices from within this group with `rec`; this is expressed here by a validation-time substitution, but an implementation could insert the annotations on the fly during decoding. - - Because rec type indices `rec $t`cannot occur in the source, the following are invariants that are established by these rules: - - (1) rec indices only occur in the context and inside a `(rec *)`, - - (2) they only refer to indices from the same recursive group, - - (3) all internal indices within a recursive group are marked with rec, - - (4) all rec indices are bound by the same `(rec *).i` in the context. - Together, these invariants ensure that [type equivalence](#type-equivalence) on recursive types is equivalent to an _iso-recursive_ interpretation. + - and `* = (rec *).0, ..., (rec *).N` * a sequence of subtype's is valid of each of them is valid for their respective index - ` * ok($t)` @@ -207,53 +186,50 @@ $t2 = rect1t2.1 where rect1t2 = (rec - (struct (field i32 (ref (rec $t2)))) - (struct (field i64 (ref (rec $t1)))) + (struct (field i32 (ref $t2))) + (struct (field i64 (ref $t1))) ) ``` -That is, the recursive occurrences of type indices `$t1` and `$t2` are marked with `rec` and the types themselves are defined as projections from their respective recursion group, using their relative inner indices `0` and `1`. -Morally, in type-theoretic terms, these bindings represent the higher-kinded iso-recursive types -``` -t1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 -t2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 -``` -where `<...>` denotes a type tuple. Interestingly, in our case, a single syntactic type variable is enough for all types, because recursive types cannot nest by construction. Because it is unique, the variables hence do not actually need to be represented in the Wasm syntax. -On the other hand, by representing a recursive reference by a global index, its definition can still be looked up in the context as usual and unrolling does not require subtitution. +That is, the types are defined as projections from their respective recursion group, using their relative inner indices `0` and `1`. #### Equivalence -Type equivalence, written `t == t'` here, is defined inductively, as before. All rules are simply the canonical structural pointwise congruences, with the exception of type indices. +Type equivalence, written `t == t'` here, is essentially defined inductively. All rules are simply the canonical congruences, with the exception of the rule for recursive types. -Even recursive and supertype definitions are just congruences: +For the purpose of defining recursive type equivalence, type indices are extended with a special form that distinguishes regular from recursive type uses. -* two recursive types are equivalent if they are equivalent pointwise - - `(rec *) == (rec *)` - - iff `( == )*` - - Note: This rule is only used on types looked up in the context, where [recursive type indices](#types) have been marked with `rec` accordingly. That way, all recursive references are compared using the rule below, which prevents looping. +* `rec.` is a new form of type index + - `typeidx ::= ... | rec.` -* two subtypes are equivalent if their structure is equivalent and they have equivalent supertypes - - `(sub $t* ) == (sub $t'* )` - - iff ` == ` - - and `($t == $t')*` +This form is only used during equivalence checking, to identify and represent "back edges" inside a recursive type. It is merely a technical device for formulating the rules and cannot appear in source code. It is introduced by the following auxiliary meta-function: -Type indices are only equivalent if they are either both non-recursive or both recursive. Regular non-recursive type indices are compared structurally. +* Rolling a context type produces an _iso-recursive_ representation of its underlying recursion group + - `roll($t) = roll_$t()` iff `$t = ` + - `roll_$t() = ` + - `roll_$t((rec *).i) = (rec *).i[$t:=rec.0, ..., $t'+N:=rec.N]` iff `$t' = $t-i` and `N = |*|` + - Note: If a type is not recursive, `roll` is just the identity. + - Note: This definition assumes that all projections of the recursive type are bound to consecutive type indices, so that `$t-i` is the first of them. -* as before, two non-recursive type indices are equivalent if they define equivalent types - - `$t == $t'` - - iff `$t = ` and `$t' = ` and ` = ` +With that: -For type indices marked `rec`, the rules recursively assume that the respective pair of `(rec )` in their definitions is already equal and can be ignored. +* two regular type indices are equivalent if they define equivalent rolled context types: + - `$t == $t'` + - iff `roll($t) == roll($t')` * two recursive type indices are equivalent if they project the same index - - `(rec $t) == (rec $t')` - - iff `$t = (rec *).i` - - and `$t' = (rec *).i'` - - and `i = i'` + - `rec.i == rec.i'` + - iff `i = i'` -This is the only interesting rule. It is sound due to the [invariants](#type-validity) established for recursive type indices. +* two recursive types are equivalent if they are equivalent pointwise + - `(rec *) == (rec *)` + - iff `( == )*` + - Note: This rule is only used on types that have been rolled, which prevents looping. -Note: Semantically, a type index `rec $t` is best thought of as being a self type variable bound by the enclosing recursive type (that has been alpha-renamed to be the same on both sides during the equivalence check), plus a projection determined by the definition of `$t`. +* notably, two subtypes are equivalent if their structure is equivalent and they have equivalent supertypes + - `(sub $t* ) == (sub $t'* )` + - iff ` == ` + - and `($t == $t')*` Example: As explained above, the mutually recursive types ``` @@ -264,13 +240,8 @@ Example: As explained above, the mutually recursive types ``` would be recorded in the context as ``` -$t1 = (rec (struct (field i32 (ref (rec $t2)))) (struct (field i64 (ref (rec $t1))))).0 -$t2 = (rec (struct (field i32 (ref (rec $t2)))) (struct (field i64 (ref (rec $t1))))).1 -``` -which morally represents the higher-kinded iso-recursive types -``` -t1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 -t2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 +$t1 = (rec (struct (field i32 (ref $t2))) (struct (field i64 (ref $t1)))).0 +$t2 = (rec (struct (field i32 (ref $t2))) (struct (field i64 (ref $t1)))).1 ``` Consequently, if there was an equivalent pair of types, ``` @@ -281,17 +252,24 @@ Consequently, if there was an equivalent pair of types, ``` recorded in the context as ``` -$u1 = (rec (struct (field i32 (ref (rec $u2)))) (struct (field i64 (ref (rec $u1))))).0 -$u2 = (rec (struct (field i32 (ref (rec $u2)))) (struct (field i64 (ref (rec $u1))))).1 +$u1 = (rec (struct (field i32 (ref $u2))) (struct (field i64 (ref $u1)))).0 +$u2 = (rec (struct (field i32 (ref $u2))) (struct (field i64 (ref $u1)))).1 ``` -then they represent the iso-recursive types +then to check the equivalence `$t1 == $u1`, both types are rolled into iso-recursive types first: +``` +roll($t1) = (rec (struct (field i32 (ref rec.1))) (struct (field i64 (ref rec.0)))).0 +roll($u1) = (rec (struct (field i32 (ref rec.1))) (struct (field i64 (ref rec.0)))).0 +``` +In this case, it is immediately apparent that these are equivalent types. + +Note: In type-theoretic terms, these are higher-kinded iso-recursive types: ``` -u1 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 -u2 = (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 +roll($t1) ~ (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 +roll($t2) ~ (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 ``` -which are structural identical to the previous ones. With that in mind, the rule for comparing `(rec $t1) == (rec $u1)` just compares the projections `.0` in the definition of both types, which amounts to checking `a.0 == a.0`. +where `<...>` denotes a type tuple. However, in our case, a single syntactic type variable `rec` is enough for all types, because recursive types cannot nest by construction. -Note 2: This semantics implies that type equivalence checks can be implemented in constant-time by representing all types as trees and canonicalising them bottom-up in linear time upfront. For this purpose, all `(rec $t)` are treated as leaves, only representing the projection index `i` (representing the semantic type `a.i`) and omitting the type index `$t` itself. +Note 2: This semantics implies that type equivalence checks can be implemented in constant-time by representing all types as trees in rolled form and canonicalising them bottom-up in linear time upfront. Note 3: It's worth noting that the only observable difference to a nominal type system is the equivalence rule on (non-recursive) type indices: instead of looking at their definitions, a nominal system would require `$t = $t'` syntactically (at least as long as we ignore things like checking imports, where type indices become meaningless). From 7580a3048eef77b56571ae2534e2ed2ab23e7949 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Fri, 17 Sep 2021 22:56:32 +0200 Subject: [PATCH 09/15] Clean up --- proposals/gc/MVP.md | 41 ++++++++++++++++++----------------------- 1 file changed, 18 insertions(+), 23 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 9e64942bc..ef3ff133c 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -122,9 +122,6 @@ ctxtype ::= | (rec *). - `unpacked(t) = t` - `unpacked(pt) = i32` -* Context lookup `C($t)` is extended to recursive type indices by ignoring the `rec` marker. - - `C(rec $t) = C($t)` - * Unrolling a possibly recursive context type projects the respective item - `unroll($t) = unroll()` iff `C($t) = ` - `unroll() = ` @@ -205,17 +202,17 @@ For the purpose of defining recursive type equivalence, type indices are extende This form is only used during equivalence checking, to identify and represent "back edges" inside a recursive type. It is merely a technical device for formulating the rules and cannot appear in source code. It is introduced by the following auxiliary meta-function: * Rolling a context type produces an _iso-recursive_ representation of its underlying recursion group - - `roll($t) = roll_$t()` iff `$t = ` - - `roll_$t() = ` - - `roll_$t((rec *).i) = (rec *).i[$t:=rec.0, ..., $t'+N:=rec.N]` iff `$t' = $t-i` and `N = |*|` - - Note: If a type is not recursive, `roll` is just the identity. + - `tie($t) = tie_$t()` iff `$t = ` + - `tie_$t() = ` + - `tie_$t((rec *).i) = (rec *).i[$t:=rec.0, ..., $t'+N:=rec.N]` iff `$t' = $t-i` and `N = |*|` + - Note: If a type is not recursive, `tie` is just the identity. - Note: This definition assumes that all projections of the recursive type are bound to consecutive type indices, so that `$t-i` is the first of them. With that: -* two regular type indices are equivalent if they define equivalent rolled context types: +* two regular type indices are equivalent if they define equivalent tied context types: - `$t == $t'` - - iff `roll($t) == roll($t')` + - iff `tie($t) == tie($t')` * two recursive type indices are equivalent if they project the same index - `rec.i == rec.i'` @@ -224,7 +221,7 @@ With that: * two recursive types are equivalent if they are equivalent pointwise - `(rec *) == (rec *)` - iff `( == )*` - - Note: This rule is only used on types that have been rolled, which prevents looping. + - Note: This rule is only used on types that have been tied, which prevents looping. * notably, two subtypes are equivalent if their structure is equivalent and they have equivalent supertypes - `(sub $t* ) == (sub $t'* )` @@ -255,21 +252,21 @@ recorded in the context as $u1 = (rec (struct (field i32 (ref $u2))) (struct (field i64 (ref $u1)))).0 $u2 = (rec (struct (field i32 (ref $u2))) (struct (field i64 (ref $u1)))).1 ``` -then to check the equivalence `$t1 == $u1`, both types are rolled into iso-recursive types first: +then to check the equivalence `$t1 == $u1`, both types are tied into iso-recursive types first: ``` -roll($t1) = (rec (struct (field i32 (ref rec.1))) (struct (field i64 (ref rec.0)))).0 -roll($u1) = (rec (struct (field i32 (ref rec.1))) (struct (field i64 (ref rec.0)))).0 +tie($t1) = (rec (struct (field i32 (ref rec.1))) (struct (field i64 (ref rec.0)))).0 +tie($u1) = (rec (struct (field i32 (ref rec.1))) (struct (field i64 (ref rec.0)))).0 ``` In this case, it is immediately apparent that these are equivalent types. Note: In type-theoretic terms, these are higher-kinded iso-recursive types: ``` -roll($t1) ~ (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 -roll($t2) ~ (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 +tie($t1) ~ (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).0 +tie($t2) ~ (mu a. <(struct (field i32 (ref a.1))), (struct i64 (field (ref a.0)))>).1 ``` where `<...>` denotes a type tuple. However, in our case, a single syntactic type variable `rec` is enough for all types, because recursive types cannot nest by construction. -Note 2: This semantics implies that type equivalence checks can be implemented in constant-time by representing all types as trees in rolled form and canonicalising them bottom-up in linear time upfront. +Note 2: This semantics implies that type equivalence checks can be implemented in constant-time by representing all types as trees in tied form and canonicalising them bottom-up in linear time upfront. Note 3: It's worth noting that the only observable difference to a nominal type system is the equivalence rule on (non-recursive) type indices: instead of looking at their definitions, a nominal system would require `$t = $t'` syntactically (at least as long as we ignore things like checking imports, where type indices become meaningless). @@ -822,8 +819,7 @@ C |- dt ok(|C|) C |- dt -| C,dt x = |C| N = |dt*|-1 -dt'* = dt*[x:=rec x,...,x+N:=rec x+N] -C' = C,(rec dt'*).0,...,(rec dt'*).N +C' = C,(rec dt*).0,...,(rec dt*).N C' |- dt* ok(x) ------------------------------------- C |- rec dt* -| C' @@ -854,14 +850,13 @@ C |- struct.get i : [(ref x)] -> [t] #### Type Indices (`C |- == `) ``` -C |- C(x) == C(x') ------------------- +C |- tie(x) == tie(x') +---------------------- C |- x == x' -C(x) = (rec dt*).i -C(x') = (rec dt'*).i + -------------------- -C |- rec x == rec x' +C |- rec.i == rec.i ``` #### Value Types (`C |- == `) From 70af4a2157001da433935440d4eaf11c7076363d Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Tue, 28 Sep 2021 15:28:26 +0200 Subject: [PATCH 10/15] Update proposals/gc/MVP.md Co-authored-by: Thomas Lively <7121787+tlively@users.noreply.github.com> --- proposals/gc/MVP.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index ef3ff133c..0885ad50b 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -204,7 +204,7 @@ This form is only used during equivalence checking, to identify and represent "b * Rolling a context type produces an _iso-recursive_ representation of its underlying recursion group - `tie($t) = tie_$t()` iff `$t = ` - `tie_$t() = ` - - `tie_$t((rec *).i) = (rec *).i[$t:=rec.0, ..., $t'+N:=rec.N]` iff `$t' = $t-i` and `N = |*|` + - `tie_$t((rec *).i) = (rec *).i[$t':=rec.0, ..., $t'+N:=rec.N]` iff `$t' = $t-i` and `N = |*| - 1` - Note: If a type is not recursive, `tie` is just the identity. - Note: This definition assumes that all projections of the recursive type are bound to consecutive type indices, so that `$t-i` is the first of them. From f7cd0b6c7899300889161e429b5f87be40276792 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Tue, 28 Sep 2021 16:02:10 +0200 Subject: [PATCH 11/15] Update proposals/gc/MVP.md Co-authored-by: Thomas Lively <7121787+tlively@users.noreply.github.com> --- proposals/gc/MVP.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 0885ad50b..7cb43e218 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -896,7 +896,7 @@ C |- struct ft* == struct ft'* C |- ft == ft' -------------------------- -C |- arrat ft == array ft' +C |- array ft == array ft' ``` #### Defined Types (`C |- == `) From 553c1af3835355d66aeb19d3f86ce8d013c49132 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Tue, 28 Sep 2021 16:27:36 +0200 Subject: [PATCH 12/15] Sync rule meta vars with grammar --- proposals/gc/MVP.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 7cb43e218..918bac76b 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -204,7 +204,7 @@ This form is only used during equivalence checking, to identify and represent "b * Rolling a context type produces an _iso-recursive_ representation of its underlying recursion group - `tie($t) = tie_$t()` iff `$t = ` - `tie_$t() = ` - - `tie_$t((rec *).i) = (rec *).i[$t':=rec.0, ..., $t'+N:=rec.N]` iff `$t' = $t-i` and `N = |*| - 1` + - `tie_$t((rec *).i) = (rec *).i[$t':=rec.0, ..., $t'+N:=rec.N]` iff `$t' = $t-i` and `N = |*|-1` - Note: If a type is not recursive, `tie` is just the identity. - Note: This definition assumes that all projections of the recursive type are bound to consecutive type indices, so that `$t-i` is the first of them. @@ -796,7 +796,7 @@ C |- ft ok C |- array ft ok ``` -#### Defined Types (`C |- * ok(x)`) +#### Sub Types (`C |- * ok(x)`) ``` C |- st ok @@ -805,29 +805,29 @@ C |- st ok -------------------------- C |- sub x* st ok(x') -C |- dt ok(x) -C |- dt'* ok(x+1) +C |- st ok(x) +C |- st'* ok(x+1) ------------------- -C |- dt dt'* ok(x) +C |- st st'* ok(x) ``` -#### Type Definitions (`C |- * -| C'`) +#### Defined Types (`C |- * -| C'`) ``` -C |- dt ok(|C|) +C |- st ok(|C|) --------------- -C |- dt -| C,dt +C |- st -| C,st -x = |C| N = |dt*|-1 -C' = C,(rec dt*).0,...,(rec dt*).N -C' |- dt* ok(x) +x = |C| N = |st*|-1 +C' = C,(rec st*).0,...,(rec st*).N +C' |- st* ok(x) ------------------------------------- -C |- rec dt* -| C' +C |- rec st* -| C' -C |- td -| C' -C' |- td'* ok +C |- dt -| C' +C' |- dt'* ok --------------- -C |- td td'* ok +C |- dt dt'* ok ``` #### Instructions (`C |- : [t1*] -> [t2*]`) From afef45f5c4446ed58690bab49f20aa1554fa64e4 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Wed, 29 Sep 2021 11:18:43 +0200 Subject: [PATCH 13/15] Remove rtt.sub and rtt depth --- proposals/gc/MVP.md | 81 ++++++++++++++++----------------------------- 1 file changed, 28 insertions(+), 53 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 918bac76b..98bd4b260 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -47,10 +47,9 @@ All three proposals are prerequisites. - `heaptype ::= ... | i31` - the type of unboxed scalars -* `rtt ? ` is a new heap type that is a runtime representation of the static type `` - - `heaptype ::= ... | rtt ? ` - - `rtt n? t ok` iff `t ok` - - the constant `n`, if present, encodes the static knowledge that this type has `n` dynamic supertypes (see [Runtime types](#runtime-types)) +* `rtt ` is a new heap type that is a runtime representation of the static type `` + - `heaptype ::= ... | rtt ` + - `rtt t ok` iff `t ok` * Note: heap types `func` and `extern` already exist via [reference types proposal](https://github.com/WebAssembly/reference-types), and `(ref null? $t)` via [typed references](https://github.com/WebAssembly/function-references) @@ -75,8 +74,8 @@ New abbreviations are introduced for reference types in binary and text format, * `i31ref` is a new reference type - `i31ref == (ref i31)` -* `rtt ? ` is a new reference type - - `(rtt ? $t) == (ref (rtt ? $t))` +* `rtt ` is a new reference type + - `(rtt $t) == (ref (rtt $t))` #### Type Definitions @@ -306,13 +305,9 @@ In addition to the [existing rules](https://github.com/WebAssembly/function-refe - if `$t = ` - or `$t = type ht` and `ht <: func` (imports) -* `rtt n? $t` is a subtype of `eq` - - `rtt n? $t <: eq` - -* `rtt n $t` is a subtype of `rtt $t` - - `rtt n $t1 <: rtt $t2` - - if `$t1 == $t2` - - Note: `rtt n? $t1` is *not* a subtype of `rtt n? $t2`, if `$t1` is merely a subtype of `$t2`; such covariant subtyping would be unsound, since RTTs are used in both co- and contravariant roles (e.g., both when constructing and consuming a reference) +* `rtt $t` is a subtype of `eq` + - `rtt $t <: eq` + - Note: `rtt $t1` is *not* a subtype of `rtt $t2`, unless `$t1` and `$t2` are equivalent; covariant subtyping would be unsound, since RTTs are used in both co- and contravariant roles (e.g., both when constructing and consuming a reference) Note: This creates a hierarchy of *abstract* Wasm heap types that looks as follows. ``` @@ -363,39 +358,29 @@ Subtyping is not defined on type definitions. #### Runtime Types -* Runtime types (RTTs) are explicit values representing concrete types at runtime; a value of type `rtt ? ` is a dynamic representative of the static type ``. +* Runtime types (RTTs) are explicit values representing concrete types at runtime; a value of type `rtt ` is a dynamic representative of the static type ``. * All RTTs are explicitly created and all operations involving dynamic type information (like casts) operate on explicit RTT operands. This allows maximum flexibility and custom choices wrt which RTTs to represent a source type. -* There is a runtime subtyping hierarchy on RTTs; creating an RTT allows providing a *parent type* in the form of an existing RTT. - -* An RTT value r1 is *equal* to another RTT value r2 iff they both represent the same static type and either of the following holds: - - r1 and r2 both have no parents, or - - r1 and r2 both have equal RTT values as parents. - -* An RTT value r1 is a *sub-RTT* of another RTT value r2 iff either of the following holds: - - r1 and r2 are equal RTT values, or - - r1 has a parent that is a sub-RTT of r2. - -* The count `` in the static type of an RTT value, if present, denotes the length of the supertype chain, i.e., its "inheritance depth" of _concrete types_ (not counting abstract supertypes like `dataref` or `anyref`, which are always at the top of the hierarchy). If this information is present, it enables more efficient implementation of runtime casts in an engine; if it is absent (e.g., to abstract the depth of a subtype graph), then the engine has to read it from the dynamic RTT value. +* An RTT value r1 is *equal* to another RTT value r2 iff they both represent the same static type. -* Validation requires that each RTT's parent type is a representative of a static supertype; runtime subtyping hence is a sub-relation of static subtyping (a graph with fewer nodes and edges). +* An RTT value r1 is a *subtype* of another RTT value r2 iff they represent static types that are in a respective subtype relation. -* At the same time, runtime subtyping forms a linear hierarchy such that the relation can be checked efficiently using standard implementation techniques (the runtime subtype hierarchy is a tree-shaped graph). - -Note: RTT values correspond to type descriptors or "shape" objects as they exist in various engines. RTT equality can be implemented as a single pointer test by memoising RTT values. More interestingly, runtime casts along the hierachy encoded in these values can be implemented in an engine efficiently by using well-known techniques such as including a vector of its (direct and indirect) super-RTTs in each RTT value (with itself as the last entry). The value `` then denotes the length of this vector. A subtype check between two RTT values can be implemented as follows using such a representation. Assume RTT value v1 has static type `(rtt n1? $t1)` and v2 has type `(rtt n2? $t2)`. To check whether v1 denotes a sub-RTT of v2, first verify that `n1 >= n2` -- if both `n1` and `n2` are known statically, this can be performed at compile time; if either is not statically known, it has to be read from the respective RTT value dynamically, and `n1 >= n2` becomes a dynamic check. Then compare v2 to the n2-th entry in v1's supertype vector. If they are equal, v1 is a sub-RTT. +Note: RTT values correspond to type descriptors or "shape" objects as they exist in various engines. RTT equality can be implemented as a single pointer test by memoising RTT values. More interestingly, runtime casts along the hierachy encoded in these values can be implemented in an engine efficiently by using well-known techniques such as including a vector of its (direct and indirect) super-RTTs in each RTT value (with itself as the last entry). A subtype check between two RTT values can be implemented as follows using such a representation. Assume RTT value v1 has static type `(rtt $t1)` and v2 has type `(rtt $t2)`. Let `n1` and `n2` be the lenghts of the respective supertype vectors. To check whether v1 denotes a subtype RTT of v2, first verify that `n1 >= n2` -- if both `n1` and `n2` are known statically, this can be performed at compile time; if either is not statically known, it has to be read from the respective RTT value dynamically, and `n1 >= n2` becomes a dynamic check. Then compare v2 to the n2-th entry in v1's supertype vector. If they are equal, v1 is a subtype RTT. In the case of actual casts, the static type of RTT v1 (obtained from the value to cast) is not known at compile time, so `n1` is dynamic as well. (Note that `$t1` and `$t2` are not relevant for the dynamic semantics, but merely for validation.) +Note: This assumes that there is at most one supertype. For hierarchies with multiple supertypes, more complex tests would be necessary. + Example: Consider three types and corresponding RTTs: ``` (type $A (struct)) -(type $B (struct (field i32))) -(type $C (struct (field i32 i64))) +(type $B (sub $A (struct (field i32)))) +(type $C (sub $B (struct (field i32 i64)))) (global $rttA (rtt 0 $A) (rtt.canon $A)) -(global $rttB (rtt 1 $B) (rtt.sub $B (global.get $rttA))) -(global $rttC (rtt 2 $C) (rtt.sub $C (global.get $rttB))) +(global $rttB (rtt 1 $B) (rtt.canon $B)) +(global $rttC (rtt 2 $C) (rtt.canon $C)) ``` Here, `$rttA` would carry supertype vector `[$rttA]`, `$rttB` has `[$rttA, $rttB]`, and `$rttC` has `[$rttA, $rttB, $rttC]`. @@ -414,7 +399,7 @@ This can compile to machine code that (1) reads the RTT from `$x`, (2) checks th * Reference values of data or function type have an associated runtime type: - for structures or arrays, it is the RTT value provided upon creation, - - for functions, it is the RTT value for the function's type. + - for functions, it is the RTT value for the function's type (which may be recursive). * Note: as a future extension, we could allow a value's RTT to be a supertype of the value's actual type. For example, a structure or array with RTT `any` would become fully opaque to runtime type checks, and an implementation may choose to optimize away its RTT. @@ -430,11 +415,11 @@ This can compile to machine code that (1) reads the RTT from `$x`, (2) checks th #### Structures * `struct.new_with_rtt ` allocates a structure with RTT information determining its [runtime type](#values) and initialises its fields with given values - - `struct.new_with_rtt $t : [t'* (rtt n $t)] -> [(ref $t)]` + - `struct.new_with_rtt $t : [t'* (rtt $t)] -> [(ref $t)]` - iff `expand($t) = struct (mut t')*` * `struct.new_default_with_rtt ` allocates a structure of type `$t` and initialises its fields with default values - - `struct.new_default_with_rtt $t : [(rtt n $t)] -> [(ref $t)]` + - `struct.new_default_with_rtt $t : [(rtt $t)] -> [(ref $t)]` - iff `expand($t) = struct (mut t')*` - and all `t'*` are defaultable @@ -455,11 +440,11 @@ This can compile to machine code that (1) reads the RTT from `$x`, (2) checks th #### Arrays * `array.new_with_rtt ` allocates an array with RTT information determining its [runtime type](#values) - - `array.new_with_rtt $t : [t' i32 (rtt n $t)] -> [(ref $t)]` + - `array.new_with_rtt $t : [t' i32 (rtt $t)] -> [(ref $t)]` - iff `expand($t) = array (var t')` * `array.new_default_with_rtt ` allocates an array and initialises its fields with the default value - - `array.new_default_with_rtt $t : [i32 (rtt n $t)] -> [(ref $t)]` + - `array.new_default_with_rtt $t : [i32 (rtt $t)] -> [(ref $t)]` - iff `expand($t) = array (var t')` - and `t'` is defaultable @@ -572,16 +557,10 @@ Note: The `br_on_*` instructions allow an operand of unrelated reference type, e #### Runtime Types * `rtt.canon ` returns the RTT of the specified type - - `rtt.canon $t : [] -> [(rtt 0 $t)]` + - `rtt.canon $t : [] -> [(rtt $t)]` - multiple invocations of this instruction yield the same observable RTTs - this is a *constant instruction* -* `rtt.sub ` returns an RTT for `typeidx` as a sub-RTT of a the parent RTT operand - - `rtt.sub $t : [(rtt n? $t')] -> [(rtt (n+1)? $t)]` - - iff `(type $t) <: (type $t')` - - multiple invocations of this instruction with the same operand yield the same observable RTTs - - this is a *constant instruction* - TODO: Add the ability to generate new (non-canonical) RTT values to implement casting in nominal type hierarchies? @@ -590,19 +569,19 @@ TODO: Add the ability to generate new (non-canonical) RTT values to implement ca RTT-based casts can only be performed with respect to concrete types, and require a data or function reference as input, which are known to carry an RTT. * `ref.test` tests whether a reference value's [runtime type](#values) is a [runtime subtype](#runtime) of a given RTT - - `ref.test : [t' (rtt n? $t)] -> [i32]` + - `ref.test : [t' (rtt $t)] -> [i32]` - iff `t' <: (ref null data)` or `t' <: (ref null func)` - returns 1 if the first operand is not null and its runtime type is a sub-RTT of the RTT operand, 0 otherwise * `ref.cast` casts a reference value down to a type given by a RTT representation - - `ref.cast : [(ref null1? ht) (rtt n? $t)] -> [(ref null2? $t)]` + - `ref.cast : [(ref null1? ht) (rtt $t)] -> [(ref null2? $t)]` - iff `ht <: data` or `ht <: func` - and `null1? = null2?` - returns null if the first operand is null - traps if the first operand is not null and its runtime type is not a sub-RTT of the RTT operand * `br_on_cast ` branches if a value can be cast down to a given reference type - - `br_on_cast $l : [t0* t (rtt n? $t')] -> [t0* t]` + - `br_on_cast $l : [t0* t (rtt $t')] -> [t0* t]` - iff `$l : [t0* t']` - and `t <: (ref null data)` or `t <: (ref null func)` - and `(ref $t') <: t'` @@ -610,7 +589,7 @@ RTT-based casts can only be performed with respect to concrete types, and requir - passes cast operand along with branch, plus possible extra args * `br_on_cast_fail ` branches if a value can not be cast down to a given reference type - - `br_on_cast_fail $l : [t0* t (rtt n? $t')] -> [t0* (ref $t')]` + - `br_on_cast_fail $l : [t0* t (rtt $t')] -> [t0* (ref $t')]` - iff `$l : [t0* t']` - and `t <: (ref null data)` or `t <: (ref null func)` - and `t <: t'` @@ -625,7 +604,6 @@ Note: These instructions allow an operand of unrelated reference type, even thou In order to allow RTTs to be initialised as globals, the following extensions are made to the definition of *constant expressions*: * `rtt.canon` is a constant instruction -* `rtt.sub` is a constant instruction * `global.get` is a constant instruction and can access preceding (immutable) global definitions, not just imports as in the MVP @@ -653,7 +631,6 @@ This extends the [encodings](https://github.com/WebAssembly/function-references/ | -0x14 | `(ref null ht)` | `ht : heaptype (s33)` | from funcref proposal | | -0x15 | `(ref ht)` | `ht : heaptype (s33)` | from funcref proposal | | -0x16 | `i31ref` | | shorthand | -| -0x17 | `(rtt n $t)` | `n : u32`, `i : typeidx` | shorthand | | -0x18 | `(rtt $t)` | `i : typeidx` | shorthand | | -0x19 | `dataref` | | shorthand | @@ -669,7 +646,6 @@ The opcode for heap types is encoded as an `s33`. | -0x12 | `any` | | | | -0x13 | `eq` | | | | -0x16 | `i31` | | | -| -0x17 | `(rtt n i)` | `n : u32`, `i : typeidx` | | | -0x18 | `(rtt i)` | `i : typeidx` | | | -0x19 | `data` | | | @@ -712,7 +688,6 @@ The opcode for heap types is encoded as an `s33`. | 0xfb21 | `i31.get_s` | | | 0xfb22 | `i31.get_u` | | | 0xfb30 | `rtt.canon $t` | `$t : typeidx` | -| 0xfb31 | `rtt.sub $t` | `$t : typeidx` | | 0xfb40 | `ref.test $t` | `$t : typeidx` | | 0xfb41 | `ref.cast $t` | `$t : typeidx` | | 0xfb42 | `br_on_cast $l` | `$l : labelidx` | From a6385fddd4214790640f7b5ce698953d65c7e0d6 Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Mon, 14 Feb 2022 13:10:03 +0100 Subject: [PATCH 14/15] Address comments --- proposals/gc/MVP.md | 55 ++++++++++++++++++++++++--------------------- 1 file changed, 29 insertions(+), 26 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 98bd4b260..2e529ad80 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -20,9 +20,7 @@ Based on the following proposals: * [typed function references](https://github.com/WebAssembly/function-references), which introduces typed references `(ref null? $t)` etc. -* [type imports](https://github.com/WebAssembly/proposal-type-imports), which allows type definitions to be imported and exported - -All three proposals are prerequisites. +Both proposals are prerequisites. ### Types @@ -81,14 +79,15 @@ New abbreviations are introduced for reference types in binary and text format, #### Type Definitions * `deftype` is the syntax for an entry in the type section, generalising the existing syntax - - `deftype ::= | rec *` + - `deftype ::= rec *` - `module ::= {..., types vec()}` - a `rec` definition defines a group of mutually recursive types that can refer to each other; it thereby defines several type indices at a time + - a single type definition, as in Wasm before this proposal, is reinterpreted as a short-hand for a recursive group containing just one type * `subtype` is a new category of type defining a single type, as a subtype of possible other types - `subtype ::= sub * ` - the preexisting syntax with no `sub` clause is redefined to be a shorthand for a `sub` clause with empty `typeidx` list: ` == sub () ` - - Note: This allows multiple supertypes. For the MVP, it could be restricted to at most one supertype. + - Note: This allows multiple supertypes. For the MVP, it is restricted to at most one supertype. * `strtype` is a new category of types covering the different forms of concrete structural reference types - `strtype ::= | | ` @@ -112,7 +111,7 @@ TODO: Need to be able to use `i31` as a type definition. Validity of a module is checked under a context storing the definitions for each type. In the case of recursive types, this definition is given by a respective projection from the full type: ``` -ctxtype ::= | (rec *). +ctxtype ::= . ``` #### Auxiliary Definitions @@ -122,12 +121,11 @@ ctxtype ::= | (rec *). - `unpacked(pt) = i32` * Unrolling a possibly recursive context type projects the respective item - - `unroll($t) = unroll()` iff `C($t) = ` - - `unroll() = ` + - `unroll($t) = unroll()` iff `$t = ` - `unroll((rec *).i) = (*)[i]` * Expanding a type definition unrolls it and returns its plain definition - - `expand($t) = expand()` iff `C($t) = ` + - `expand($t) = expand()` iff `$t = ` - `expand() = ` - where `unroll() = sub x* ` @@ -141,11 +139,7 @@ Some of the rules define a type as `ok` for a certain index, written `ok(x)`. Th - iff ` ok` and extends the context accordingly - and `* ok` under the extended context -* a plain type definition is valid if its `subtype` is at its type index - - ` ok` and extends the context with `` - - iff ` ok($t)` where `$t` is the next unused (i.e., current) type index - -* a recursive type definition is valid if its types are valid under the context containing all of them +* a group of recursive type definitions is valid if its types are valid under the context containing all of them - `rec * ok` and extends the context with `*` - iff `* ok($t)` under the extended context(!) - where `$t` is the next unused (i.e., current) type index @@ -202,10 +196,9 @@ This form is only used during equivalence checking, to identify and represent "b * Rolling a context type produces an _iso-recursive_ representation of its underlying recursion group - `tie($t) = tie_$t()` iff `$t = ` - - `tie_$t() = ` - `tie_$t((rec *).i) = (rec *).i[$t':=rec.0, ..., $t'+N:=rec.N]` iff `$t' = $t-i` and `N = |*|-1` - - Note: If a type is not recursive, `tie` is just the identity. - Note: This definition assumes that all projections of the recursive type are bound to consecutive type indices, so that `$t-i` is the first of them. + - Note: If a type is not recursive, `tie` is just the identity. With that: @@ -267,7 +260,8 @@ where `<...>` denotes a type tuple. However, in our case, a single syntactic typ Note 2: This semantics implies that type equivalence checks can be implemented in constant-time by representing all types as trees in tied form and canonicalising them bottom-up in linear time upfront. -Note 3: It's worth noting that the only observable difference to a nominal type system is the equivalence rule on (non-recursive) type indices: instead of looking at their definitions, a nominal system would require `$t = $t'` syntactically (at least as long as we ignore things like checking imports, where type indices become meaningless). +Note 3: It's worth noting that the only observable difference to the rules for a nominal type system is the equivalence rule on (non-recursive) type indices: instead of comparing the definitions of their recursive groups, a nominal system would require `$t = $t'` syntactically (at least as long as we ignore things like checking imports, where type indices become meaningless). +Consequently, using a single big recursion group in this system makes it behave like a nominal system. #### Subtyping @@ -300,10 +294,8 @@ In addition to the [existing rules](https://github.com/WebAssembly/function-refe * Any concrete type is a subtype of either `data` or `func` - `$t <: data` - if `$t = ` or `$t = ` - - or `$t = type ht` and `ht <: data` (imports) - `$t <: func` - if `$t = ` - - or `$t = type ht` and `ht <: func` (imports) * `rtt $t` is a subtype of `eq` - `rtt $t <: eq` @@ -649,14 +641,29 @@ The opcode for heap types is encoded as an `s33`. | -0x18 | `(rtt i)` | `i : typeidx` | | | -0x19 | `data` | | | -#### Defined Types +#### Structured Types | Opcode | Type | Parameters | | ------ | --------------- | ---------- | | -0x21 | `struct ft*` | `ft* : vec(fieldtype)` | | -0x22 | `array ft` | `ft : fieldtype` | -| -0x30 | `sub $t* st` | `$t* : vec(typeidx)`, `st : strtype` | -| -0x31 | `rec dt*` | `dt* : vec(subtype)` | + +#### Subtypes + +| Opcode | Type | Parameters | Note | +| ------ | --------------- | ---------- | ---- | +| -0x21 | `struct ft*` | `ft* : vec(fieldtype)` | shorthand | +| -0x22 | `array ft` | `ft : fieldtype` | shorthand | +| -0x30 | `sub $t* st` | `$t* : vec(typeidx)`, `st : strtype` | | + +#### Defined Types + +| Opcode | Type | Parameters | Note | +| ------ | --------------- | ---------- | ---- | +| -0x21 | `struct ft*` | `ft* : vec(fieldtype)` | shorthand | +| -0x22 | `array ft` | `ft : fieldtype` | shorthand | +| -0x30 | `sub $t* st` | `$t* : vec(typeidx)`, `st : strtype` | shorthand | +| -0x31 | `rec dt*` | `dt* : vec(subtype)` | | #### Field Types @@ -789,10 +796,6 @@ C |- st st'* ok(x) #### Defined Types (`C |- * -| C'`) ``` -C |- st ok(|C|) ---------------- -C |- st -| C,st - x = |C| N = |st*|-1 C' = C,(rec st*).0,...,(rec st*).N C' |- st* ok(x) From ab8bc68b73cac21faca3396e068ffb3fec1b8c8a Mon Sep 17 00:00:00 2001 From: Andreas Rossberg Date: Tue, 15 Feb 2022 09:34:50 +0100 Subject: [PATCH 15/15] Typos; clarification --- proposals/gc/MVP.md | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/proposals/gc/MVP.md b/proposals/gc/MVP.md index 2e529ad80..2949af4e8 100644 --- a/proposals/gc/MVP.md +++ b/proposals/gc/MVP.md @@ -315,9 +315,18 @@ RTTs are below `eq`. In addition, the abstract heap type `extern` is also a subtype of `any`. Its interpretation is defined by the host environment. It may contain additional host-defined types that are neither of the above three leaf type categories. -It may also overlap with some or all of these categories, as would be observable by applying a classification instruction like `ref.is_func` to a value of type `externref`. +It may also overlap with some or all of these categories, +as would be observable by applying a classification instruction like `ref.is_func` to a value of type `externref`. The possible outcomes of such an operation hence depend on the host environment. -(For example, in a JavaScript embedding, `externref` could be inhabited by all JS values -- which is a natural choice, because JavaScript is untyped; but some of its values are JS-side representations of Wasm values per the JS API, and those can also be observed as `data` or `func` references. Another possible interpretation could be that `data` is disjoint from `extern`, which would be determined by the coercions allowed by the JS API at the JS/Wasm boundary. While such an interpretation is probably not attractive for JavaScript, it would be natural in other embeddings such as the C/C++ API, where different references are represented with different host types.) +(For example, in a JavaScript embedding, `externref` could be inhabited by all JS values -- +which is a natural choice, because JavaScript is untyped; +but some of its values are JS-side representations of Wasm values per the JS API, +and those can also be observed as `data` or `func` references. +Another possible interpretation could be that `data` is disjoint from `extern`, +which would be determined by the coercions allowed by the JS API at the JS/Wasm boundary. +While such an interpretation is probably not attractive for JavaScript, +it would be natural in other embeddings such as the C/C++ API, +where different references are represented with different host types.) Note: In the future, this hierarchy could be refined to distinguish compound data types that are not subtypes of `eq`. @@ -358,9 +367,22 @@ Subtyping is not defined on type definitions. * An RTT value r1 is a *subtype* of another RTT value r2 iff they represent static types that are in a respective subtype relation. -Note: RTT values correspond to type descriptors or "shape" objects as they exist in various engines. RTT equality can be implemented as a single pointer test by memoising RTT values. More interestingly, runtime casts along the hierachy encoded in these values can be implemented in an engine efficiently by using well-known techniques such as including a vector of its (direct and indirect) super-RTTs in each RTT value (with itself as the last entry). A subtype check between two RTT values can be implemented as follows using such a representation. Assume RTT value v1 has static type `(rtt $t1)` and v2 has type `(rtt $t2)`. Let `n1` and `n2` be the lenghts of the respective supertype vectors. To check whether v1 denotes a subtype RTT of v2, first verify that `n1 >= n2` -- if both `n1` and `n2` are known statically, this can be performed at compile time; if either is not statically known, it has to be read from the respective RTT value dynamically, and `n1 >= n2` becomes a dynamic check. Then compare v2 to the n2-th entry in v1's supertype vector. If they are equal, v1 is a subtype RTT. +Note: RTT values correspond to type descriptors or "shape" objects as they exist in various engines. +RTT equality can be implemented as a single pointer test by memoising RTT values. +More interestingly, runtime casts along the hierarchy encoded in these values can be implemented in an engine efficiently +by using well-known techniques such as including a vector of its (direct and indirect) super-RTTs in each RTT value (with itself as the last entry). +A subtype check between two RTT values can be implemented as follows using such a representation. +Assume RTT value v1 has static type `(rtt $t1)` and v2 has type `(rtt $t2)`. +Let `n1` and `n2` be the lengths of the respective supertype vectors. +To check whether v1 denotes a subtype RTT of v2, first verify that `n1 >= n2` -- +if both `n1` and `n2` are known statically, this can be performed at compile time; +if either is not statically known (`$t1` and `n1` are typically unknown during a cast), +it has to be read from the respective RTT value dynamically, and `n1 >= n2` becomes a dynamic check. +Then compare v2 to the n2-th entry in v1's supertype vector. +If they are equal, v1 is a subtype RTT. In the case of actual casts, the static type of RTT v1 (obtained from the value to cast) is not known at compile time, so `n1` is dynamic as well. -(Note that `$t1` and `$t2` are not relevant for the dynamic semantics, but merely for validation.) +(Note that `$t1` and `$t2` are not relevant for the dynamic semantics, +but merely for validation.) Note: This assumes that there is at most one supertype. For hierarchies with multiple supertypes, more complex tests would be necessary.