I just wanted to chime in with some comments about the performance of those two bits in the current proposal. I do see performance overhead in it, but I don't see it as dire as the relevant sections make it sound. Specifically, this details a potential way to do it in:
O(1) time for computing type equivalence. (It's literally pointer equality.)
O(nd) time for computing subtyping, where n is the number of child nodes and d is the depth of the type.
O(b) space, where b is the number of bytes used to define the backing (type ...).
No action is necessarily required of this - it's purely informative.
Internal structure
The proposal overview details the type grammar as this:
For type equality and subtype checking, all I care about is value_type and def_type, nothing else. The first is the type used for parameters and the second is used for type names. So I can transform the grammar to this to make it more regular and easier to process:
(I can just convert the def_type types into (ref <def_type>), since (type ...) bodies can only ever contain a single <def_type>. Since that's the only difference, I can ignore it and tie (type ...) to ref_type instances instead.)
Non-reference types would be represented via sentinels referring to each type directly. These would be tagged to denote they're primitives and not pointers, to save memory. Reference types would be represented via pointers to nodes in a persistent tree, where "array" nodes carry a child type and "shape" nodes carry a child count followed by that many child types. "Shape" nodes can also carry a bit denoting whether they're callable, but this isn't required, as function references' nullptr sentinel is enough to tell them apart from struct references. Each of these exists in a mutable and an immutable variant, rather than distinguishing type and (mut type). This simplifies comparisons and saves a bit of memory in the process.
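As a rough illustration (my own sketch, not from the proposal), the primitive sentinels could be encoded as small tagged integers stuffed directly into the pointer value, assuming the low bits of real node pointers are free due to alignment. The names here (makePrimitiveSentinel, Primitive, the k* constants) are hypothetical:

#include <cstdint>

struct Type;  // real nodes are heap allocations; primitive sentinels are not

// Tag bits, matching the TypeInfo flags used in the algorithm further down.
constexpr uintptr_t kPrimitiveTag = 1 << 0;
constexpr uintptr_t kMutableTag   = 1 << 1;

// Hypothetical primitive IDs; the real set would track the spec's value types.
enum class Primitive : uintptr_t { I32, I64, F32, F64 };

// Encode a primitive (and its mutability) directly in the "pointer" value, so
// no allocation is ever needed for non-reference types.
static inline const Type* makePrimitiveSentinel(Primitive p, bool isMutable) {
    uintptr_t bits = ((uintptr_t) p << 2) | kPrimitiveTag
                   | (isMutable ? kMutableTag : 0);
    return reinterpret_cast<const Type*>(bits);
}

// Distinguishing a sentinel from a real node is a single bit test.
static inline bool isPrimitiveSentinel(const Type* type) {
    return (uintptr_t) type & kPrimitiveTag;
}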
This of course doesn't strictly mirror the spec, but acts as a form of desugaring. Here's how the spec types would desugar (a worked example follows the list):
Array references would be represented as "array" nodes with the child type set to the array reference's child type.
Struct references would be represented as "shape" nodes with the child types set to the respective fields' types.
Function references would be represented as "callable shape" nodes with the child types set to the parameters and return values in order, delimited by a single nullptr sentinel.
Type references that have not yet been declared would be represented as "callable array" nodes with the child type set to the referenced type index. (These would be resolved before any type equality or subtype checking can occur.)
And of course, type aliases are resolved to the relevant sentinel, not by name. This simplifies comparisons and is critical to the speed benefit.
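As a hypothetical worked example of that desugaring (internShape, internArray, and the sentinel constants are placeholder names for whatever the interning layer exposes, not part of the proposal):

#include <cstddef>

struct Type;                        // opaque interned node
extern const Type* const I32;       // tagged primitive sentinels, as sketched above
extern const Type* const MUT_I32;
extern const Type* const F64;
extern const Type* internShape(bool callable, size_t total, const Type* const* children);
extern const Type* internArray(const Type* child);

// (type (struct (field i32) (field f64))) desugars to a plain "shape" node
// whose children are the two field sentinels:
const Type* structExample() {
    const Type* fields[] = { I32, F64 };
    return internShape(/*callable=*/false, 2, fields);
}

// (type (func (param i32 f64) (result i32))) desugars to a "callable shape"
// node; a single nullptr sentinel separates parameters from results:
const Type* funcExample() {
    const Type* sig[] = { I32, F64, nullptr, I32 };
    return internShape(/*callable=*/true, 4, sig);
}

// (type (array (mut i32))) desugars to an "array" node whose child is the
// mutable-i32 sentinel (mutability folded into the child, per the text above):
const Type* arrayExample() {
    return internArray(MUT_I32);
}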
This can be built up in memory and persisted using:
A tree structure for shapes where each struct and function type represents a key path into that tree.
Node keys are primitive types, array types, or shapes in this tree.
Node values are the corresponding shape for that path.
Node values are not created until first use.
An array of tree nodes representing array references.
A map of uninitialized type indices to lists of dependent target pointers.
"Array" nodes are added via a pointer to their child field.
"Shape" nodes are added via a pointer to the relevant child index.
When parsing nodes, the path is used to identify the node to access, and if a node doesn't exist, it's created before returning or recursing. These nodes do need to be collected like normal heap objects, but they don't require a sophisticated algorithm; absent recursive references, you could even use reference counting. They are meant to be globally shared and interned, much like strings and well-known symbols, but they only have to be global to a given execution context, not to the entire world.
After parsing a type, the type index map is searched for the entry keyed by the current type index, its list of dependent pointers is taken, and the entry is removed from the map. Each pointer in the list is then assigned the new child directly.
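A rough sketch of those bookkeeping structures, under the assumptions above. The container choices are illustrative only: a map keyed by the full child list stands in for the key-path tree described above, and the allocation bodies are omitted, just like the primitive helper in the algorithm below:

#include <cstdint>
#include <cstddef>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

struct Type;  // opaque interned node, as in the sketches above

struct TypeInterner {
    // Interned "shape" nodes, one per distinct (callable, child list) key.
    // Nodes are only allocated on the first lookup of a given key.
    std::map<std::pair<bool, std::vector<const Type*>>, const Type*> shapes;

    // Interned "array" nodes, one per distinct child type.
    std::map<const Type*, const Type*> arrays;

    // Type indices referenced before they were declared, mapped to the slots
    // (an array node's child field or a shape node's child entry) that must
    // be patched once that index is parsed.
    std::unordered_map<uint32_t, std::vector<const Type**>> pending;

    const Type* internShape(bool callable, std::vector<const Type*> children);  // allocate on miss
    const Type* internArray(const Type* child);                                 // allocate on miss

    void addPending(uint32_t index, const Type** slot) {
        pending[index].push_back(slot);
    }

    // After type `index` has been parsed and interned: patch every dependent
    // slot with the new child, then drop the entry, as described above.
    void resolve(uint32_t index, const Type* resolved) {
        auto it = pending.find(index);
        if (it == pending.end()) return;
        for (const Type** slot : it->second) *slot = resolved;
        pending.erase(it);
    }
};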
This keeps the memory overhead worst-case O(b) space, where b is the number of bytes used to define the backing (type ...). Retrieving and potentially allocating is itself done in O(b) worst-case (with b defined above), assuming constant-time allocation, memory loads, and memory stores. The constant factor itself isn't particularly high, and the memoization would likely mitigate most of it.
Performing the checks
Type equivalence is simple pointer equality. Part of the reason to memoize the structs and function types is to allow this. Subtype checking is a bit more complicated: it requires some memory traversal and iteration. Assuming constant-time memory loads, it can be kept down to O(nd) time worst case, where n is the number of child nodes and d is the max depth of the smallest tree, or equivalently, O(b) time worst-case, where b is the number of bytes used to define the backing (type ...). This does adapt - it skips entire structs that are equivalent and it skips entire structs that carry too many fields to be a supertype or equal. It also adapts recursively, so if it can't adapt to one child, it can still adapt to the rest. So it should not normally hit this worst case scenario.
Subtype checking would roughly follow this algorithm:
#include <cstdint>
#include <cstddef>
#include <cstring>
#include <vector>

#define if_likely(T) if (__builtin_expect(!!(T), 1))
#define if_unlikely(T) if (__builtin_expect(!!(T), 0))

struct Type;

// Implementation omitted for brevity
extern bool isInequalPrimitiveAssignableTo(const Type* a, const Type* b);
enum class TypeInfo : uintptr_t {
    PrimitiveTag = 1 << 0,  // set in the pointer value itself for primitive sentinels
    MutableTag   = 1 << 1,
    Shape        = 1 << 2,  // struct/function node carrying a child count
    Callable     = 1 << 3,  // function node (params/results split by a nullptr child)
};

struct TypeAlloc {
    const TypeInfo info;
    union {
        struct {                     // "shape" nodes
            size_t total;            // number of children
            const Type* children[];  // flexible array member (compiler extension)
        };
        const Type* child;           // "array" nodes
    };
};
static inline bool isPrimitive(const Type *type) {
    return (uintptr_t) type & (uintptr_t) TypeInfo::PrimitiveTag;
}
bool isAssignableTo(const Type *self, const Type *other) {
    while (true) {
        if_likely (self == other) return true;
        if_unlikely (isPrimitive(self) != isPrimitive(other)) return false;
        if_unlikely (isPrimitive(self)) {
            return isInequalPrimitiveAssignableTo(self, other);
        }
        const TypeAlloc* a = reinterpret_cast<const TypeAlloc*>(self);
        const TypeAlloc* b = reinterpret_cast<const TypeAlloc*>(other);
        const TypeInfo info = a->info;
        if_unlikely (info != b->info) return false;
        if_likely ((uintptr_t) info & (uintptr_t) TypeInfo::Shape) {
            // Width subtyping: `self` must have at least as many children as
            // `other`, and each of `other`'s children must be covered.
            size_t total = b->total;
            if (a->total < total) return false;
            for (size_t i = 0; i < total; i++) {
                if_unlikely (!isAssignableTo(a->children[i], b->children[i])) {
                    return false;
                }
            }
            return true;
        }
        // "Array" nodes: loop on the single child rather than recursing.
        self = a->child;
        other = b->child;
    }
}
Of course, this is itself somewhat optimized, but there's still room. For one example, the shape check could try the fast path first, namely all children being pointer-identical, and return early when that holds. (This could be a simple memcmp over the child arrays after comparing the totals.)
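For illustration, that fast path could look roughly like this, assuming the TypeAlloc layout above (childrenIdentical is a hypothetical helper):

#include <cstddef>
#include <cstring>

struct Type;

// Compare the first `count` child pointers bit-for-bit. Because nodes are
// interned, identical pointers mean identical types, so the recursive loop
// can be skipped entirely when this returns true.
static inline bool childrenIdentical(const Type* const* a, const Type* const* b, size_t count) {
    return std::memcmp(a, b, count * sizeof(const Type*)) == 0;
}

In the Shape branch above, this would run right after the total check, falling through to the per-child loop only when some child differs.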
Other information
I don't have benchmarks. I just wanted to show where the baseline sits academically, to demonstrate that this is efficiently implementable.
This is a pretty long issue comment, so it's possible I missed something. If I did, feel free to let me know and I'll correct it in an edit.
Fortunately, there are well-established results from computer science we can turn to.
Deciding equivalence between recursive types is equivalent to deciding equivalence of DFAs, which has O(n log n) complexity (via Hopcroft's algorithm). If you minimise all types first, it is O(1) afterwards.
Deciding subtyping between recursive types is equivalent to deciding inclusion of DFAs, which has polynomial complexity.
Now, practical programs do not exercise these worst-case complexities, since recursion among types is usually used in very stylised and limited ways that remain linear. However, the concern is that an attacker might use the worst case as a DoS attack on an engine. One obvious mitigation might be to impose a linear implementation limit on these checks in sensitive environments like the web.
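One concrete way to read that mitigation (my own sketch, not a proposal): thread a step budget through the check and bail out once it is exhausted, so an adversarial module can't force superlinear work. The budget size and the failure policy are deliberately left open; isAssignableTo is the routine from the earlier comment:

#include <cstddef>

struct Type;

// Hypothetical bounded variant: same traversal as isAssignableTo, but every
// node visited costs one unit of `budget`; when it hits zero the engine bails
// out and applies whatever policy it chooses (reject, defer, or fall back).
bool isAssignableToBounded(const Type* self, const Type* other, size_t& budget);

// Usage sketch: a web engine might cap the work at some multiple of the
// module's type-section size, making validation linear by construction.
bool checkWithLinearLimit(const Type* a, const Type* b, size_t typeSectionBytes) {
    size_t budget = 16 * typeSectionBytes;  // arbitrary illustrative constant
    return isAssignableToBounded(a, b, budget);
}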
The performance of equirecursive type canonicalization has turned out to be too high in practice, even using state-of-the-art DFA minimization algorithms, so we've settled on the isorecursive system defined in #243, which allows for a linear-time canonicalization algorithm.