[Wasm GC] Add a GC-Lowering pass which lowers GC to MVP #4000

Open · kripken wants to merge 198 commits into base: main
Conversation

kripken (Member) commented Jul 17, 2021

This converts, e.g., a struct.set into an appropriate write to linear memory.
The hard part is implementing things like RTT semantics, casts, and subtyping,
which requires adding a runtime.

The layout of things in linear memory is pretty simple, and outlined at the
beginning of the pass code in a comment.
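For illustration, the lowering of a struct field access can be sketched outside of Binaryen as plain pointer arithmetic on a byte array standing in for linear memory. The header size and field offsets below are made-up values for a struct with two i64 fields, not the pass's actual layout:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Linear memory stand-in.
std::vector<uint8_t> memory(1024);

// Made-up layout: a header (say, a kind tag plus an RTT pointer) followed
// by the struct's fields, here two i64s.
constexpr uint32_t kHeaderSize = 8;
constexpr uint32_t kFieldOffset[] = {0, 8};

// struct.set $t i (local.get $ref) (i64.const v)  ~>  an i64.store at
// ref + header + offset(i).
void structSetI64(uint32_t ref, uint32_t field, int64_t value) {
  std::memcpy(&memory[ref + kHeaderSize + kFieldOffset[field]], &value,
              sizeof(value));
}

// struct.get lowers to the matching i64.load.
int64_t structGetI64(uint32_t ref, uint32_t field) {
  int64_t value;
  std::memcpy(&value, &memory[ref + kHeaderSize + kFieldOffset[field]],
              sizeof(value));
  return value;
}
```

A struct.new then becomes a call into the runtime's allocator for header plus field sizes, followed by stores of the initial values.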

This is useful for performance evaluations of wasm GC (since it allows the wasm
to be compiled through LLVM, for example) as well as for functioning as a polyfill
for wasm GC. The latter would almost allow a language to use wasm GC today and
compile down to MVP wasm until VMs implement it; the one missing piece is that the
collector does not yet do actual collection (which could be added later if there is
interest).

Tested on the existing wasm GC benchmarks from Dart. Not fuzzed yet.

Apologies for the size of this PR, but it's the minimal amount of code that
is actually usable and testable...

@kripken kripken requested review from tlively and aheejin July 17, 2021 00:09
kripken (Member, Author) commented Jul 19, 2021

@tlively It looks like the --help lit tests don't auto-update. Is that intentional? I see they are .test and not .wast, so I assume there is a reason?

tlively (Member) commented Jul 19, 2021

Right, we would need a separate update script to auto-update those tests because their output is not Wasm. Writing a script that maintains the current factoring of shared options into different files might get complicated, so I thought that maintaining the tests by hand would be fine for now.

tlively (Member) left a comment

Not nearly finished reviewing, but here are some early comments.

// | ptr* | List of types. Each is a pointer to the rtt.canon for the |
// | | type. In an rtt.canon, this points to the object itself, |
// | | that is, we will have ptr => [kind, 1, ptr]. An rtt.sub |
// | | copies the list of the parent, and appends the new type at |
tlively (Member):
IIUC, rtt.canon does not contain any supertypes, so wouldn't this list be empty rather than point to itself?

kripken (Member, Author):
You're right, I'll clarify the comment. RttSupers only contains the parents, but here we must contain the full list, as we have no extra field to store the new type like Literal has (the "type" field there). So the list here is never empty.

Comment on lines 86 to 90
  Expression*
  makeSimpleSignedLoad(Expression* ptr, Type type, Address offset = 0) {
    auto size = type.getByteSize();
    return makeLoad(size, true, offset, size, ptr, type);
  }
tlively (Member):
It looks like this is never used. Could we remove it and rename makeSimpleUnsignedLoad to makeSimpleLoad? I don't think the sign matters anyway if the load is the full width of the resulting data.

kripken (Member, Author):
Good point, done.
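To see why the sign is irrelevant for a full-width load: sign- and zero-extension only differ when the loaded width is narrower than the destination, as in this small standalone example:

```cpp
#include <cstdint>

// An 8-bit load into an i32 must choose an extension, and the choice
// matters for values with the high bit set.
int32_t load8Signed(uint8_t byte) {
  return static_cast<int8_t>(byte);  // sign-extend
}
int32_t load8Unsigned(uint8_t byte) {
  return byte;                       // zero-extend
}

// A full-width 32-bit load has no bits to extend, so there is only one
// possible result: the loaded bit pattern itself.
int32_t load32(uint32_t bits) { return static_cast<int32_t>(bits); }
```

So makeSimpleLoad, which always loads the full byte size of the type, can pick either signedness without changing behavior.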

}

// Make a constant for a pointer value. This handles wasm32/64 differences.
Expression* makePointerConst(Address addr) {
tlively (Member):
Should these be methods on the standard Builder class? I imagine that as we do more with 64-bit memories, this kind of thing will become more common.

kripken (Member, Author):
Maybe. I'd suggest leaving them here for now, and adding a pointer-builder.h header eventually when we find the need to use them elsewhere.
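A minimal sketch of the wasm32/64 dispatch such a helper performs (the struct and names below are stand-ins, not Binaryen's Builder or Literal API): a pointer constant becomes an i32.const under wasm32 and an i64.const under wasm64:

```cpp
#include <cstdint>
#include <string>

// Stand-in for the constant expression the builder would emit.
struct PointerConst {
  bool is64;
  uint64_t value;
};

PointerConst makePointerConst(uint64_t addr, bool memory64) {
  if (!memory64) {
    // Under wasm32, addresses are 32-bit; keep only the low bits.
    return {false, addr & 0xffffffffull};
  }
  return {true, addr};
}

// Render as wat-style text, for inspection.
std::string render(const PointerConst& c) {
  return (c.is64 ? std::string("(i64.const ") : std::string("(i32.const ")) +
         std::to_string(c.value) + ")";
}
```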

}

// Null-check a pointer.
Expression* makePointerNullCheck(Expression* a) {
tlively (Member):
Maybe makePointerIsNull to clarify that it will return true if the pointer is null?

kripken (Member, Author):
Good point, it's ambiguous as it is. Done.


    // Record the original types of things, which may be needed later.
    if (type.isRef() || type.isRtt()) {
      originalTypes[getCurrentPointer()] = type.getHeapType();
tlively (Member):
Is getCurrentPointer() the same as curr?

kripken (Member, Author):
No, getCurrent() gets curr, effectively, while getCurrentPointer gets the pointer to curr.
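The distinction can be sketched as follows (illustrative types, not Binaryen's actual walker): the pointer-to-slot is what lets a pass replace the current expression within its parent in place:

```cpp
struct Expression {
  int id;
};

// Minimal walker stand-in: currp points at the slot in the parent node
// that holds the expression currently being visited.
struct Walker {
  Expression** currp;

  Expression* getCurrent() { return *currp; }        // the node itself
  Expression** getCurrentPointer() { return currp; } // the slot holding it
  void replaceCurrent(Expression* e) { *currp = e; } // swap in a new node
};
```

Using the pointer as a map key (as originalTypes does above) therefore identifies the position in the tree, which stays valid even if the expression in that slot is later replaced.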

bashor commented Oct 28, 2022

@kripken could you please rebase it to the main branch?

kripken (Member, Author) commented Oct 28, 2022

@bashor That would be a large effort, I'm afraid, as the spec has changed a lot in the last 1.5 years. The spec is still changing, also, so I think it would be best to wait for it to fully stabilize to not do a lot of redundant work here.

Also, note that this does not actually perform GC - it puts data in linear memory but does not collect, nor does it have logic to scan roots. Those could be added but it is another large chunk of work. (Though, if we wait then wasm may add a feature for root scanning at least.)

Do you have an urgent need for this?

bashor commented Oct 31, 2022

> The spec is still changing, also, so I think it would be best to wait for it to fully stabilize to not do a lot of redundant work here.

It feels to me like it's already stable enough.

> Also, note that this does not actually perform GC - it puts data in linear memory but does not collect, nor does it have logic to scan roots.

It's already something to start with :)

> Do you have an urgent need for this?

Well, we consider ways to try Kotlin/Wasm outside of browsers and it seems like the simplest way for now.

mraleph commented Nov 8, 2022

Just leaving a note here that we are similarly interested in this from the Dart2Wasm side. It's not pressing, but it might be an interesting fallback for uses outside of the browser.

kripken (Member, Author) commented Nov 8, 2022

Good to know this would be useful. From my side, I intend to get to it once the spec and binaryen's implementation of it are stable (to avoid wasted work), and once performance is in a better place (which is what I'm focused on now).

Note, though, that getting this to a fully production-ready state would require tracking locals on the stack and adding a mark-sweep implementation. But leaking memory (as the PR does now) should be enough to unblock experiments in this space.

dcodeIO (Contributor) commented Nov 11, 2022

Perhaps as a data point for a potential minimal integration story, here's what I could imagine: when building with GC lowering, instead of compiling into a fresh module, start from an existing module containing the "runtime". The runtime provides the necessary integration points for the lowering pass:

  • An alloc(size, id) function that is called with
    • an array's or struct's byte size
    • a unique id of the heap type
  • A getid(ptr) function to obtain the unique id (given to alloc) of an object according to the runtime's memory layout of GC objects.
  • A link(parentPtr, childPtr) function that is called when a parent-child relationship is established (say: parent.foo = child, where both parent and child are GC-typed), serving as an integration point for a GC that utilizes a write barrier.
  • A visit(ptr) function for visiting (marking) a GC object.
  • A heap_base global (for the pass to amend). Its original value marks where the lowering pass puts the static information it requires; the pass then amends the global to point past that data, which becomes the new heap_base.
  • A stack_size global or pass argument, for a shadow stack region. Could either place the stack at the start of linear memory by convention, in a second memory, or insert at heap_base and amend again.

The pass would then additionally generate:

  • Spilling of GC-typed locals (pointers) to the shadow stack.
  • A visitGlobals function calling the runtime's provided visit for each GC-typed global.
  • A visitStack function calling the runtime's provided visit for each live shadow stack item. Can perhaps be merged with visitGlobals to become visitRoots.
  • A visitObject function switching over every possible heap type (by unique id) that is aware of each struct's GC-typed fields (respectively, each array's element type) and their offsets relative to the struct's or array's address after lowering, calling the runtime's provided visit function for the object and for each GC-typed field or element with their respective pointers.

With this in place, an incremental GC could step when, say, alloc is called. Allocations are under the control of the linked runtime, starting at heap_base. Marking starts with visitGlobals and visitStack and the runtime can traverse from there by (incrementally) calling visitObject on what it finds to be reachable. Sweeping can then free any object that was alloced but didn't get touched by visit. Pinning, if necessary, can be implemented on the runtime side.
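As a rough illustration of how a runtime built on hooks like these could collect (all names and types below are hypothetical; in the real design the generated visitRoots/visitObject functions would drive the traversal): mark everything reachable from the roots, then sweep anything allocated but unmarked:

```cpp
#include <iterator>
#include <set>
#include <vector>

using Ptr = unsigned;

struct Heap {
  std::set<Ptr> allocated;  // everything handed out by alloc
  std::set<Ptr> marked;
  // Outgoing GC-typed references per object; in the real design the
  // generated visitObject would enumerate these from each type's layout.
  std::vector<std::vector<Ptr>> edges;

  void collect(const std::vector<Ptr>& roots) {
    // Mark: start from what visitGlobals/visitStack would report.
    marked.clear();
    std::vector<Ptr> work(roots);
    while (!work.empty()) {
      Ptr p = work.back();
      work.pop_back();
      if (!marked.insert(p).second) {
        continue;  // already visited
      }
      for (Ptr child : edges[p]) {  // visitObject(p)
        work.push_back(child);
      }
    }
    // Sweep: free anything allocated but never marked.
    for (auto it = allocated.begin(); it != allocated.end();) {
      it = marked.count(*it) ? std::next(it) : allocated.erase(it);
    }
  }
};
```

An incremental collector would interleave the mark loop with calls to alloc rather than running it to completion, using the link write barrier to stay correct while the mutator runs.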

This is off the top of my head while looking at the GC we use, but perhaps it's already useful :)

kripken (Member, Author) commented Nov 11, 2022

Thanks for the feedback @dcodeIO !

It does seem like if we want to integrate with incremental GC and write barriers then we'd need a fairly comprehensive "runtime" layer like that. Maybe it makes sense to do.

I was hoping the runtime could be simpler, though. My hope is that wasm GC would be the fast version, while the lowered MVP version could be slower, since it's a fallback while VMs work to implement wasm GC (which is hopefully not for long). Given that, I was hoping incremental GC, write barriers, etc. would not be needed in the MVP version. The runtime would then include something like alloc/free for getting space for GC objects, but it would leave mark/sweep to LowerGC (which would not be super-efficient). But those are just some general thoughts; I don't have a full design in mind.

tlively (Member) commented Nov 11, 2022

If we had wasm-merge functionality, we could merge in arbitrary runtime modules provided by us or provided by users with more specific needs. cc @ashleynh

dcodeIO (Contributor) commented Nov 12, 2022

I made such a runtime (a variant with language-provided alloc/free) to get an initial idea. It's 3 KB, has no memory or table of its own, and is incremental-capable; the start function can probably also be refactored away at the cost of a branch. The MM is a variant of TLSF, and the GC is tri-color mark-and-sweep. I haven't tested it, though, so it may or may not be functional already (what's called alloc above is __new here). Perhaps that helps to judge complexity :)

kripken (Member, Author) commented Nov 28, 2022

This came up in an offline discussion today. Thinking about speed, it seems that even a simple wasm VM implementation of GC could be much faster than this polyfill (due to things like scanning the stack, etc.). Given that, it seems like the polyfill would only help for cases where speed doesn't matter too much.

In the discussion I joked that we could compile a wasm GC VM to wasm to run GC on VMs without GC. But maybe that actually makes some sense? If we don't care about speed, and just want a way to run the code, then that could be easy and good enough. This could use spidermonkey.wasm or wasm3 or something else.

mraleph commented Nov 28, 2022

> Thinking about speed, it seems that even a simple wasm VM implementation of GC could be much faster than this polyfill (due to things like scanning the stack, etc.).

I think predicting actual speed is hard here. A real GC also has to scan the stack, and compiled Wasm+GC code similarly has to spill live values to the stack across calls, so it's unclear to me whether the difference is going to be all that bad.

On the other hand, the lowered wasm can probably skip some of the checks that Wasm+GC needs to do to satisfy the type system.

kripken (Member, Author) commented Nov 28, 2022

@mraleph Ah, good point that the lowering can be a little unsafe where it makes sense. That cuts the other way and could make it potentially faster.

CountBleck added a commit to CountBleck/binaryen that referenced this pull request Aug 10, 2023
Apparently WebAssembly#4000 did the exact same thing with the
exact same name.