Skip to content

Collaborative Garbage Collection #1428

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
RossTate opened this issue Aug 16, 2021 · 2 comments
Open

Collaborative Garbage Collection #1428

RossTate opened this issue Aug 16, 2021 · 2 comments

Comments

@RossTate
Copy link

I'm finally following up on my presentation to the CG on April 13th. Sorry for the delay!

Technology

As a reminder, I had developed an algorithm that enables independent garbage collectors to have explicit "cross-references" into each others' heaps and yet, using just some seemingly lightweight coordination with a central cross-reference manager, be able to identify which (cross-)references are no longer reachable (even from other heaps). As concrete examples, these independent garbage collectors could be different GCs implemented in different linear memories and/or the host garbage collector (which the GC proposal directly integrates with). Furthermore, while collecting references is the intended application, the algorithm more generally is about resources and as such could be used to collect resources across independent systems, such as in the component model.

I won't reiterate the algorithm here, though an old write-up can be found here, and I'm happy to field questions. The slides explaining the algorithm can be found here in (pdf) or (pptx).

The only things the algorithm imposes on the independent garbage collectors is

  • crossrefs must be registered with a central (e.g. host-provided) cross-reference manager
  • incoming cross-references are treated as roots, but ones that might be "grey" (rather than "black")—the cross-reference manager provides this color information (as well as indicates which incoming cross-references are no longer needed, i.e. "white")
  • each GC must propagate the additional "grey" color and inform the cross-reference manager which outgoing cross-references are reachable from "black" roots (either "black" incoming cross-references or local roots) or just "grey" roots (just "grey" incoming cross-references) or are simply no longer needed (i.e. "white")

Importantly, the collaborating GCs can each be implemented completely differently, the cross-reference manager needs no knowledge of their internals, and they can all be running at different times/frequencies and in parallel or alternatingly.

The presentation did not offer a suggestion for how specifically to integrate the technology into either the WebAssembly ecosystem or spec.

Discussion

Given that, here are the discussion points that were raised during the meeting (and of course people are free to raise more here):

Granularity

The intent is that this is for very coarse-grained cross-reference tracking. That is, cross-references would be created/registered at foreign-function-call boundaries. At that point, a language implementation would likely make an internal proxy for the crossref, where the proxy has the typical structure expected by the implementation's runtime. When that runtime's garbage collector runs, it would color the internal references (including the proxies), and afterwards it would use the color of the proxies to update the cross-reference manager about the color of the crossrefs.

Firefox Compartments

@lukewagner mentioned that something like this was used in Firefox for compartments. Unfortunately the only link I could find on this was dead, so I'd be interested to hear more.

Integrating with core WebAssembly

For coordinating linear-memory GCs in separate modules, or possibly even for collecting resources across components, this algorithm can probably just be implemented as a library. But for coordinating linear-memory GCs with host-managed GC, one might need to extend core wasm. One way to do so might be something like "smart" tables, where the table is augmented with color information that the wasm module updates according to the algorithm. When the host (also acting as the cross-reference manager) determines an entry in the table is no longer "reachable", it could replace the entry with null. This could let linear-memory systems store externref values and the like in a way that still permits cycles between the linear-memory program and the (JS) host to be detected and cleaned up.

Multithreading

One potential application of this would be in multithreaded garbage collection, wherein each thread (or WebWorker) would have its own heap, and all references to objects in the heaps of other threads would be proxied through cross-references. (Though this is only one small part of the much larger topic of multithreading and garbage collection.)

Implementing Linear-Memory GC in Wasm

For using crossrefs for linear-memory GC (rather than, say, for Interface-Types resources), the question was asked about whether linear-memory GC could be implemented efficiently in WebAssembly. My understanding is that the key technology that is missing for this is some way to efficiently determine the linear-memory GC-roots currently on the stack, say by using some form of stack walking/inspection (e.g. #1356).

Integration with GC proposal

The GC proposal could be extended with a notion of cross-references (e.g. a type with corresponding instructions) that would let the host-managed garbage collector know how to propagate its color information to the cross-reference manager. (As mentioned before, in addition to integrating with linear-memory GCs, this could be used for cross-thread garbage collection—a cross-reference field of a structure would inform the single-threaded garbage collector where to propagate reachability to the cross-threaded garbage collector.) Similarly, the "smart" tables mentioned above could have GC-ref entries. Thus both proposals could integrate with and complement each other, or this proposal could be used to integrate GC with things like Interface-Types resources.

Next Steps

I do not have particularly strong feelings on how to employ this technology. For anyone that has an application in mind, I am happy to work with them to explore that direction. Feel free to use this thread as a space to test which such directions are viable and worth exploring.

@fitzgen
Copy link
Contributor

fitzgen commented Aug 17, 2021

Firefox Compartments

@lukewagner mentioned that something like this was used in Firefox for compartments. Unfortunately the only link I could find on this was dead, so I'd be interested to hear more.

It's been a while since I looked at this code, and I think a bit of it has changed since the introduction of JS::Realm, but here are a couple relevant links:

All references between compartments go through special proxy objects called cross-compartment wrappers. Compartments can be GC'd independently of each other, and all objects pointed to by a wrapper that is not in the set of compartments currently being GC'd are considered GC roots. These reverse edges are stored in the ObjectWrapperMap linked above.

I don't remember how/when the wrappers are GC'd globally. It may simply require GC'ing all compartments at the same time.

@titzer
Copy link

titzer commented Aug 18, 2021

Re: stack-walking API, I think it is far simpler, in terms of total system complexity, for applications to just use a shadow stack in memory. Wasm engines do not currently support deoptimization in top tiers, which would be necessary to support inspection and update of (reference) locals.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants