
Generalize runtime #1503


Merged: dcodeIO merged 14 commits into master from tracing-prep on Oct 22, 2020
Conversation

dcodeIO
Member

dcodeIO commented Oct 15, 2020

Minor changes.

  • Both the MM and the GC now use a butterfly representation, with the header before and the data beyond the pointer
  • MM and GC are now separate, with __alloc, __realloc and __free now essentially being malloc, realloc and free in C
  • The MM header is now one word (4 bytes in Wasm32; Wasm64 is now also supported)
  • The GC header is now 16 bytes in Wasm32 (Wasm64 is now also supported)
  • The total object header is now 20 bytes in Wasm32 (one additional word to be utilized by tracing GCs)
  • __alloc does not take an id argument anymore
  • __alloc for GC objects is now __new (with an id argument)
  • __realloc for GC objects is now __renew
  • __realloc and __free are now exposed by the full/stub runtimes
  • __allocArray is now __newArray
  • __allocBuffer is now __newBuffer
  • __allocString is now __newString
  • This is still ARC as an intermediate step, no tracing yet
  • gc.auto has been removed due to unforeseen consequences with the execution stack
  • Adds a new heap namespace providing alloc, realloc, free and reset (stub only); see the usage sketch after this list
  • Probably more, need to check the diff :)
  • I've read the contributing guidelines
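
As a rough usage sketch (editorial illustration, not code from the PR), assuming __new takes the size plus the runtime id as listed above, and that heap.alloc / heap.realloc / heap.free behave like their C counterparts:

// Unmanaged, C-like allocation via the MM only (one-word MM header):
let buf = heap.alloc(256);      // roughly malloc(256)
buf = heap.realloc(buf, 512);   // roughly realloc(buf, 512)
heap.free(buf);                 // roughly free(buf)

// Managed allocation via the GC (full object header):
class Foo { x: i32; }
let foo = changetype<Foo>(__new(offsetof<Foo>(), idof<Foo>()));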

@dcodeIO
Member Author

dcodeIO commented Oct 15, 2020

New MM header

// ╒════════════ Memory manager block layout (32-bit) ═════════════╕
//    3                   2                   1
//  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0  bits
// ├─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┤
// │                           MM info                             │ -4
// ╞>ptr═══════════════════════════════════════════════════════════╡
// │                              ...                              │

New GC (full object) header

// ╒══════════ Garbage collector object layout (32-bit) ═══════════╕
//    3                   2                   1
//  1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0  bits
// ├─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┤
// │                     Memory manager block                      │ -20
// ╞═══════════════════════════════════════════════════════════════╡
// │                            GC info                            │ -16
// ├───────────────────────────────────────────────────────────────┤
// │                            GC info                            │ -12
// ├───────────────────────────────────────────────────────────────┤
// │                            RT id                              │ -8
// ├───────────────────────────────────────────────────────────────┤
// │                            RT size                            │ -4
// ╞>ptr═══════════════════════════════════════════════════════════╡
// │                              ...                              │
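
As an editorial aside: with this layout the header fields sit at fixed negative offsets from the object pointer, so in Wasm32 they could be read roughly as below. The helper names are hypothetical, not part of the runtime.

// Hypothetical helpers reading the Wasm32 layout above; offsets are
// relative to the object pointer, which points just past the header.
function mmInfoOf(ptr: usize): usize { return load<usize>(ptr - 20); }  // MM info
function gcInfo1Of(ptr: usize): usize { return load<usize>(ptr - 16); } // GC info
function gcInfo2Of(ptr: usize): usize { return load<usize>(ptr - 12); } // GC info
function rtIdOf(ptr: usize): u32 { return load<u32>(ptr - 8); }         // RT id
function rtSizeOf(ptr: usize): u32 { return load<u32>(ptr - 4); }       // RT size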

@dcodeIO changed the title from "Generalize runtime to prepare of tracing/Wasm GC" to "Generalize runtime to prepare for tracing/Wasm GC" on Oct 15, 2020
@jtenner
Contributor

jtenner commented Oct 15, 2020

@willemneal can we check to see if this breaks aspect?

@jtenner
Contributor

jtenner commented Oct 15, 2020

Also, this gave me a chuckle: "minor changes." This looks like a really big deal. Thanks for the work.

@dcodeIO
Member Author

dcodeIO commented Oct 15, 2020

Speaking of breaking changes, this would be the beginning of multiple breaking changes in a row. This one already breaks anything depending on memory layout or runtime APIs (very likely also aspect), and replacing ARC with ITCM will remove __retain and __release (potentially changing other things as well), again breaking anything depending on the runtime (very likely also aspect). As such it seems reasonable to hold off on updating aspect until the breaking changes are done, or I could keep this on a branch in the meantime and do one large breaking merge to master?

@MaxGraey
Member

MaxGraey commented Oct 15, 2020

Regarding breaking changes. Just wondering why __alloc / __realloc changed to __new / __renew?

@dcodeIO
Member Author

dcodeIO commented Oct 15, 2020

These do different things now. For instance, __alloc obtains a memory manager block without a runtime header (just a one-word MM header), while __new uses __alloc under the hood and also adds the object header for GC. That essentially gives everyone malloc/realloc/free with just one word of overhead, e.g. to write C-like code that is not GCed. One particular place where this is useful is the WASI bindings, which really only need malloc.
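
For illustration only (this is an editorial sketch, not the runtime's actual implementation), __new layered on top of __alloc might look roughly like this, using the 16-byte GC header from the layout above and ignoring GC bookkeeping:

// Sketch only: the real __new also initializes the GC info words and
// registers the object with the collector.
const GC_HEADER_SIZE: usize = 16; // Wasm32, per the layout above

function sketchNew(size: usize, id: u32): usize {
  let block = __alloc(GC_HEADER_SIZE + size); // plain MM block (one-word MM header)
  store<u32>(block + 8, id);                  // RT id
  store<u32>(block + 12, <u32>size);          // RT size
  return block + GC_HEADER_SIZE;              // object pointer, past the full header
}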

@MaxGraey
Member

MaxGraey commented Oct 15, 2020

I see, it's just a question about naming. What about using gc.alloc, gc.realloc and gc.collect under a "gc" namespace? They could be exported to the host as __gc_alloc and __gc_realloc.

@dcodeIO
Member Author

dcodeIO commented Oct 16, 2020

We can add wrappers (or forwarding builtins) for memory.alloc, memory.realloc and memory.free if people think that's useful. The unfortunate side effect is that someone might want to export these manually and has to do export { memory }, which will also trigger compilation of memcpy and friends. In the past, the __ functions turned out to allow more granular control, but I agree that they don't look as nice.

@dcodeIO
Member Author

dcodeIO commented Oct 16, 2020

Could also be a builtin module maybe:

// std/malloc.ts
export { __alloc as malloc, __realloc as realloc, __free as free };
// assembly/index.ts
import { malloc, realloc, free } from "malloc";

@MaxGraey
Member

MaxGraey commented Oct 16, 2020

I like the convention with a memory namespace as well. It could btw be shorter, as "mm" I guess. So:

// gc stuff
gc.alloc, gc.realloc, gc.collect

// memory stuff
mm.alloc, mm.realloc, mm.free, mm.reset

// other memory utils
mm.fill, mm.copy, mm.repeat, mm.compare

// intrinsics (for compatibility with wasm)
memory.fill, memory.copy, memory.init, etc.

@dcodeIO
Member Author

dcodeIO commented Oct 16, 2020

What if we'd just rename

  • __alloc -> malloc
  • __realloc -> realloc
  • __free -> free

and make these available globally, but mark them @unsafe? I guess that'd be quite convenient for everyone who already knows enough C, and it makes this more of a feature than an implementation detail.

Other than that I think it's fine to hide the GC details (except gc.collect) because these will be going away eventually anyway.

@MaxGraey
Member

MaxGraey commented Oct 16, 2020

Also a good variant. In this case it would be:

// gc stuff
gc.alloc, gc.realloc

// memory stuff
alloc, realloc, free, reset

// intrinsics
memory.fill, memory.copy, memory.init

Another alternative is to use a "heap" namespace:

heap.alloc, heap.realloc, heap.free, heap.reset

I think in any case it's better to use "alloc" instead of "malloc".

@dcodeIO
Member Author

dcodeIO commented Oct 16, 2020

I like the heap namespace idea :)

@MaxGraey
Member

MaxGraey commented Oct 16, 2020

Also, a great advantage of the "heap" namespace is that we could put __heap_base into it as a heap.base constant, which usually no one knows about. And with a namespace we get great autocompletion.
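
A small usage sketch of the proposed namespace (editorial illustration; heap.base is only proposed here, not an existing constant):

// Hypothetical: with a heap namespace, __heap_base could surface as heap.base.
let start: usize = heap.base;   // proposed alias for __heap_base
let scratch = heap.alloc(64);   // MM allocations live at or above the heap base
assert(scratch >= start);
heap.free(scratch);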

@jtenner
Contributor

jtenner commented Oct 16, 2020

First, I really like these changes. Would it be possible to be tagged on these sorts of changes more pre-emptively?

@dcodeIO
Member Author

dcodeIO commented Oct 17, 2020

Looking at this more, it becomes apparent that we'll need some sort of shadow stack after all, albeit only for managed objects. A tracing GC will run incrementally alongside the program and needs to know what's currently on the stack, so it doesn't prematurely free what the current function (for instance one calling gc.collect) still needs. Might look something like this:

function doSomethingManaged(): void {
  var S = __stacksave(8);
  S[-1] = __new(123);
  S[-2] = __new(234);
  gc.collect(); // must not free 123 and 234
  __stackrestore(S);
}

So far it looks like the compiler will be able to tell statically how much stack space to reserve, and only needs to insert __stacksave and __stackrestore if stack space actually has to be reserved (i.e. managed objects are new'ed). In fact this might even be useful for ARC, in that it wouldn't have to retain/release locals, hmm. And once we have that, there might also be an opportunity to provide stack.alloc for users to utilize. It also means that we'll have to reserve stack space between the end of static data and the start of the heap, of course. 🤔

@dcodeIO
Member Author

dcodeIO commented Oct 18, 2020

One way to model a shadow stack like this could be:

  • An allocation of a managed object takes a stack slot (one per allocation site)
  • A function return of a managed object takes a stack slot (one volatile return slot per function)
  • Assignments to locals take a stack slot (one per unique managed local)

Inside loops, the same stack slot is reused by the same allocation or function return within the loop, effectively keeping everything the function explicitly or implicitly references alive for as long as the function is. The overhead is the memory for maintaining the stack, potentially a well-predicted check that the stack doesn't overflow, the stores to the stack slots, and traversing the stack in addition to other roots when marking/sweeping. Overall that doesn't look so bad. A typical function using managed objects might end up with, say, 10 stack slots (~40 bytes of stack space), and a solid maximum stack size might be one page (65536 bytes, ~1638 frames deep).
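
As a rough illustration of how these rules might map onto an ordinary function (an editorial reading of the model above, not part of the PR):

// Illustrative only: comments mark where the model above would assign slots.
function example(n: i32): string {
  let parts = new Array<string>();  // allocation site + managed local: stack slots
  for (let i = 0; i < n; ++i) {
    parts.push(i.toString());       // toString() allocates: same slot every iteration
  }
  let s = parts.join(",");          // join() return value + managed local `s`: slots
  return s;                         // caller receives it via its volatile return slot
}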

@dcodeIO
Member Author

dcodeIO commented Oct 19, 2020

Alternative: Only do GC work when the Wasm execution stack is fully unwound, i.e. when returning from a directly called export.

// index.ts

var _depth = 0;
var _collect_called = false;

export function someExport(): void {
  ++_depth;
  // code of someExport
  if (!--_depth) {
    if (_collect_called) doFullGc();
    else doSomeGc();
  }
}

Does not require a shadow stack, but also isn't as granularly incremental anymore. Much easier to implement however, at hardly any runtime cost. @MaxGraey Wdyt?
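
An editorial assumption about how gc.collect would fit into this scheme (not spelled out above): it would merely set the flag while inside an export, deferring the actual collection until the stack unwinds.

// Assumed semantics only, not actual runtime code; reuses _depth,
// _collect_called and doFullGc from the sketch above.
function collect(): void {
  if (_depth) _collect_called = true; // still inside an export: defer
  else doFullGc();                    // stack fully unwound: collect right away
}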

@MaxGraey
Member

MaxGraey commented Oct 19, 2020

It seems it will be a full collection most of the time (stop-the-world mode). Not sure we need an incremental collector in this case at all. I think we should direct our efforts toward approximate liveness-assisted GC, but that requires finishing our IR first.

@dcodeIO
Member Author

dcodeIO commented Oct 19, 2020

None of this is optimal, I agree, and it seems the best we can do is to make a forward-looking decision, ideally with Wasm GC in mind, so we are not blocked on this forever. Would everyone be OK with me making that decision and taking the blame for it, so that we can move on?

@dcodeIO
Member Author

dcodeIO commented Oct 22, 2020

Going to merge this patch, even without tracing, along with #1513, combining multiple breaking changes into one version, likely 0.17.

@dcodeIO changed the title from "Generalize runtime to prepare for tracing/Wasm GC" to "Generalize runtime" on Oct 22, 2020
@dcodeIO merged commit 8c97612 into master on Oct 22, 2020
@jtenner
Contributor

jtenner commented Oct 22, 2020

Okay. It looks like this will be affecting as-pect user installs. Will have to work on this soon. Thanks for the changes!

@dcodeIO deleted the tracing-prep branch on June 1, 2021 at 15:20