-
Notifications
You must be signed in to change notification settings - Fork 786
[Wasm GC] Add a GC-Lowering pass which lowers GC to MVP #4000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@tlively looks like the |
Right, we would need a separate update script to auto-update those tests because their output is not Wasm. Writing a script that maintains the current factoring of shared options into different files might get complicated, so I thought that maintaining the tests by hand would be fine for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not nearly finished reviewing, but here are some early comments.
// | ptr* | List of types. Each is a pointer to the rtt.canon for the | | ||
// | | type. In an rtt.canon, this points to the object itself, | | ||
// | | that is, we will have ptr => [kind, 1, ptr]. An rtt.sub | | ||
// | | copies the list of the parent, and appends the new type at | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, rtt.canon does not contain any supertypes, so wouldn't this list be empty rather than point to itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I'll clarify the comment. RttSupers only contains the parents, but here we must contain the full list, as we have no extra field to store the new type like Literal has (the "type" field there). So the list here is never empty.
src/passes/GCLowering.cpp
Outdated
Expression* | ||
makeSimpleSignedLoad(Expression* ptr, Type type, Address offset = 0) { | ||
auto size = type.getByteSize(); | ||
return makeLoad(size, true, offset, size, ptr, type); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is never used. Could we remove it and rename makeSimpleUnsignedLoad
to makeSimpleLoad
? I don't think the sign matters anyway if the load is the full width of the resulting data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done.
} | ||
|
||
// Make a constant for a pointer value. This handles wasm32/64 differences. | ||
Expression* makePointerConst(Address addr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be methods on the standard Builder
class? I image that as we do more with 64-bit memories, this kind of thing will become more common.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. I'd suggest leaving them here for now, and adding a pointer-builder.h
header eventually when we find the need to use them elsewhere.
src/passes/GCLowering.cpp
Outdated
} | ||
|
||
// Null-check a pointer. | ||
Expression* makePointerNullCheck(Expression* a) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe makePointerIsNull
to clarify that it will return true if the pointer is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, it's ambiguous as it is. Done.
|
||
// Record the original types of things, which may be needed later. | ||
if (type.isRef() || type.isRtt()) { | ||
originalTypes[getCurrentPointer()] = type.getHeapType(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is getCurrentPointer()
the same as curr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, getCurrent()
gets curr
, effectively, while getCurrentPointer
gets the pointer to curr
.
@kripken could you please rebase it to the main branch? |
@bashor That would be a large effort, I'm afraid, as the spec has changed a lot in the last 1.5 years. The spec is still changing, also, so I think it would be best to wait for it to fully stabilize to not do a lot of redundant work here. Also, note that this does not actually perform GC - it puts data in linear memory but does not collect, nor does it have logic to scan roots. Those could be added but it is another large chunk of work. (Though, if we wait then wasm may add a feature for root scanning at least.) Do you have an urgent need for this? |
It feels to me like it's already stable enough.
It's already something to start with :)
Well, we consider ways to try Kotlin/Wasm outside of browsers and it seems like the simplest way for now. |
Just leaving a note here that we are similarly interested in this from Dart2Wasm side. It's not pressing, but might be an interesting fallback for uses outside of the browser. |
Good to know this would be useful. From my side, I intend to get to it once the spec and binaryen's implementation of it is stable (to avoid wasted work), and once performance is in a better place (which is what I'm focused on now). Note though that getting this to full production-ready state would require tracking locals on the stack and adding a mark-sweep implementation. But leaking memory (as the PR does now) should be enough to unblock experiments in this space. |
Perhaps as a data point for a potential minimal integration story, here's what I could imagine: When building with GC lowering, instead of compiling a fresh module, construct from an existing module containing the "runtime". The runtime provides the necessary integration points for the lowering pass:
The pass would then additionally generate:
With this in place, an incremental GC could step when, say, From the top of my head while looking at GC we use, but perhaps that's already useful :) |
Thanks for the feedback @dcodeIO ! It does seem like if we want to integrate with incremental GC and write barriers then we'd need a fairly comprehensive "runtime" layer like that. Maybe it makes sense to do. I was hoping the runtime could be simpler, though, since my hope is that wasm GC would be the fast version while the lowered MVP version could be slower since it's a fallback while VMs work to implement wasm GC (which is hopefully not for long). Given that, I was hoping incremental GC and write barriers etc. would not be needed in the MVP version. The runtime would then include something like |
If we had |
Made such a runtime (variant with language-provided alloc/free) to get an initial idea. 3KB, no memory or table on its own, incremental-capable, the start function can probably also be refactored away at the cost of a branch. The MM is a variant of TLSF, the GC is tri-color mark and sweep. Haven't tested, though, might or might not be functional already (what's |
This came up in an offline discussion today. Thinking about speed, it seems that even a simple wasm VM implementation of GC could be much faster than this polyfill (due to things like scanning the stack, etc.). Given that, it seems like the polyfill would only help for cases where speed doesn't matter too much. In the discussion I joked that we could compile a wasm GC VM to wasm to run GC on VMs without GC. But maybe that actually makes some sense? If we don't care about speed, and just want a way to run the code, then that could be easy and good enough. This could use spidermonkey.wasm or wasm3 or something else. |
I think predicting actual speed is hard here. GC also has to scan the stack and compiled Wasm+GC code similarly has to spill live values on the stack across calls, so it's unclear to me if the difference is going to be all that bad. On the other hand lowered Wasm can probably skip some of the checks that Wasm+GC needs to do to satisfy the type system. |
@mraleph Ah, good point that the lowering can be a little unsafe where it makes sense. That would work in the other way and make it potentially faster. |
Apparently WebAssembly#4000 did the exact same thing with the exact same name.
This converts e.g. a
struct.set
to an appropriate write to linear memory, etc.The hard part is implementing things like RTT semantics with casts and
subtyping etc., which requires a runtime to be added there.
The layout of things in linear memory is pretty simple, and outlined at the
beginning of the pass code in a comment.
This is useful for performance evaluations of wasm GC (as it allows the wasm
to be compiled through LLVM, for example) as well as functioning as a polyfill
for wasm GC. The latter almost allows a language today to use wasm GC and to
compile down to MVP wasm for now until VMs implement it, except for the
collector not doing actual collection yet (that could be added later if there is
interest).
Tested on the existing wasm GC benchmarks from Dart. Not fuzzed yet.
Apologies for the size of this PR, but it's the minimal amount of code that
is actually usable and testable...