GC/RT: Deferred Reference Counting? #1534

dcodeIO · 2020-11-03T18:36:29Z

With the changes introduced in #1503, in particular having one more header word to work with, the following deferred mechanism akin to autorelease pools may be possible to improve our ARC implementation:

Repurpose gcInfo2 (originally introduced for tracing) to maintain a per-function autorelease pool as a linked list
Let the list's tail be a sentinel value, e.g. -1, so it can be distinguished from null
When compiling a function, instead of refcounting locals (to emulate a stack), link objects being kept alive by the function to the function's autorelease pool / list. Typical entries are managed function arguments, foreign function return values and managed values loaded from memory.
If there's a reachable pool_track in a code path, insert a pool_create incl. the necessary additional local at the start of a function, and a pool_release when exiting the function.

// Prototypical implementation

import { OBJECT, TOTAL_OVERHEAD } from "rt/common";

/** Creates a new autorelease pool. */
@inline function pool_create(): usize {
  return -1;
}

/** Tracks an object in an autorelease pool. */
function pool_track(pool: usize, ptr: usize): usize {
  if (ptr) {
    var obj = changetype<OBJECT>(ptr - TOTAL_OVERHEAD);
    // Only track objects that are not yet part of a pool. If the object is part
    // of a pool already, it is either part of this pool, or of a parent pool,
    // which is guaranteed to be released later than this pool.
    if (!obj.gcInfo2) {
      obj.gcInfo2 = pool;
      pool = changetype<usize>(obj);
    }
  }
  return pool;
}

/** Releases an autorelease pool. */
function pool_release(pool: usize): void {
  while (pool != -1) {
    let obj = changetype<OBJECT>(pool);
    pool = obj.gcInfo2;
    obj.gcInfo2 = 0;
    decrement(changetype<usize>(obj));
  }
}

// Example (comments are compiler inserts)

class Obj {}

function foo(a: Obj): Obj {                             // `a` is likely tracked in an outer pool
  /**/var pool = pool_create();                         // create pool for the function
  /**/pool = pool_track(pool, changetype<usize>(a));    // track `a`, likely already in a pool
  var b = bar(a);
  /**/pool = pool_track(pool, changetype<usize>(b));    // track `b`, likely not yet in a pool
  /**/__retain(b);                                      // retain the return value
  /**/pool_release(pool);                               // release `a` (unlikely), `b` (likely)
  return b;
}

function bar(b: Obj): Obj {                             // `b` is likely tracked in an outer pool
  /**/var pool = pool_create();                         // create pool for the function
  /**/pool = pool_track(pool, changetype<usize>(b));    // track `b`, likely already in a pool
  /**/__retain(b);                                      // retain the return value
  /**/pool_release(pool);                               // release `b` (unlikely)
  return b;
}

Giving us:

No need to inject retain/release on locals, greatly simplifying compiler logic in that performing autoreleases is now a runtime instead of a compiler detail. Can then strip functionality like Constraints.WILL_RETAIN, LocalFlags.RETAINED, Compiler#performAutoreleases, finishAutoreleases, skippedAutoreleases, tryUndoAutorelease, delayAutorelease etc. from the compiler itself.
Reduces unnecessary write barriers on release, in that redundant retain/release pairs are implicitly deduplicated by means of adding objects to one pool max.
No additional allocations necessary to implement (i.e. no shadow stack)
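
The deduplication point can be illustrated with a small plain-TypeScript simulation of the prototype above. This is only a sketch, not the runtime: `SimObject`, `TAIL`, `poolCreate`, `poolTrack`, `poolRelease` and the `decrements` counter are hypothetical stand-ins, with `next` modelling `gcInfo2` as an object reference instead of a raw pointer.

```typescript
// Hypothetical model of a managed object: `next` plays the role of gcInfo2.
class SimObject {
  next: SimObject | null = null; // null means "not part of any pool"
  rc = 1;                        // pretend reference count
  constructor(public name: string) {}
}

// Sentinel marking the end of a pool list, distinct from null (the -1 above).
const TAIL = new SimObject("<tail>");

// Counts how often the pretend `decrement` runs.
let decrements = 0;

function poolCreate(): SimObject {
  return TAIL;
}

function poolTrack(pool: SimObject, obj: SimObject | null): SimObject {
  // Only link objects that are not yet part of a pool.
  if (obj !== null && obj.next === null) {
    obj.next = pool;
    return obj;
  }
  return pool;
}

function poolRelease(pool: SimObject): void {
  while (pool !== TAIL) {
    const obj = pool;
    pool = obj.next as SimObject;
    obj.next = null;
    obj.rc--;
    decrements++;
  }
}

// An outer function tracks `a`, then an inner function tracks it twice more.
const a = new SimObject("a");
let outer = poolCreate();
outer = poolTrack(outer, a);   // `a` joins the outer pool
let inner = poolCreate();
inner = poolTrack(inner, a);   // already pooled, so nothing is linked
inner = poolTrack(inner, a);
poolRelease(inner);            // inner pool is empty
const afterInner = decrements;
poolRelease(outer);            // the single decrement happens here
```

In this model, three track calls on the same object result in exactly one decrement, performed by the outermost pool that first linked it.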

I'm likely missing something, but seems promising, so pinning it here. @MaxGraey Thoughts?


MaxGraey · 2020-11-03T19:15:03Z

That's a great idea! Totally agree we could use Deferred RC. Also interesting is the Lazy RC approach based on pure RC: #89 (comment)

dcodeIO · 2020-11-19T06:01:23Z

Little update from my side: Tore the compiler apart in the meantime, removed the previous mechanism and added enough of the new one to do some basic tests. It turned out that more intrinsics are necessary to handle all possible cases, leading to

__retain / __release
__defer: Conditionally defers a release of an RC+1 value, e.g. received return value.
__forward: Conditionally ensures RC+1, e.g. sending return value with no pool.
__keepalive: Conditionally defers a release of an RC+0 value (roughly __defer(__forward(...))), e.g. accessing a global or field.
__commit / __commit_with_value

Currently contemplating, as this is becoming more complicated as I go and I can't quite imagine how documentation for all this stuff would not be half a book. Once more considering sledgehammering our way out of this entirely with stop-the-world mark-sweep for the time being.

MaxGraey · 2020-11-19T13:12:02Z

> Currently contemplating, as this is becoming more complicated as I go and I can't quite imagine how documentation for all this stuff would not be half a book

I don't think all of those methods need to be documented. As I understand it, only __retain / __release and probably __commit / __commit_with_value are necessary for interop, right? The rest will only be used internally in std.

dcodeIO · 2020-11-19T23:17:02Z

So I followed the rabbit hole I dug in my last comment to see where that leads and sledgehammered reference counting out of existence in a new branch, and wow, did I like that.

Source code is recognizable in generated binaries again
Only RT exports to reason about are __new and __collect
Tracing GC is just ~270 LOC incl. comments

I then went a step further and burned the --runtime flag at the stake to make it one runtime (with a --noExportRuntime flag replacing none/half), and wow again. Put less dramatically, what fell out of that experiment feels so much more in line with my vision for AS as a "lean and mean" Wasm compiler thingy that I am tempted to simply accept that __collect may only be called externally for the time being, simply because the Wasm MVP doesn't support random access to the execution stack. Not calling __collect effectively yields what stub was before, but with a proper memory manager behind heap.alloc and friends.

Going to think about this for a bit since a radical change like this is certainly, well, spicy.

jtenner · 2020-11-20T01:27:32Z

Spicy, difficult to deal with, and exciting. My guess is that pulling the band-aid off now is easier than doing it later. Just leave us all some time to test it out first so we can write migration instructions (and develop our testing frameworks more properly, if that's okay!)

Change isn't always better. But if it makes AssemblyScript better in the long run, then we should accept the burden of migration sooner rather than later.

dcodeIO · 2020-11-23T16:30:31Z

So this just happened: Using the new tracing runtime, I managed to get the bootstrap test to work and decided to attach Rtrace to it expecting to make myself sad but this actually looks legit. Manually calling __collect at the end of the run yields

33554432b of memory, 415683 allocs, 2962 resizes, 2718 moves, 415642 frees

indicating that there are 41 objects still alive in globals somewhere, about in the ballpark of what I'd expect. As one can see, the compiler needs something in between 16 and 32mb of memory to perform the bootstrap test when collecting at the end.

The way this works now is that the loader itself implements __retain and __release (these are not exports anymore, so no call overhead), and usage of these is only necessary where objects remain live in JS beyond a call to __collect. They can be omitted in purely synchronous code. Now when __collect is called, an import env.mark is called that iterates over all the things currently retained and __marks them. The loader provides all that. The mechanism amounts to reference counting (pinning) at the boundary, with tracing internally.
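
Pinning at the boundary, as described here, might look roughly like the following. This is not the actual loader code: the `Pins` class, its method names and the `mark` callback parameter are assumptions made for illustration. Only the idea of counting external references on the JS side and marking them during a collection is taken from the comment.

```typescript
// Hypothetical sketch of loader-side pinning; not the real loader API.
type Ptr = number;

class Pins {
  // Maps a pinned pointer to its number of outstanding external references.
  private counts = new Map<Ptr, number>();

  retain(ptr: Ptr): Ptr {
    if (ptr) this.counts.set(ptr, (this.counts.get(ptr) ?? 0) + 1);
    return ptr;
  }

  release(ptr: Ptr): void {
    if (!ptr) return;
    const n = this.counts.get(ptr);
    if (n === undefined) throw new Error(`pointer ${ptr} is not retained`);
    if (n === 1) this.counts.delete(ptr);
    else this.counts.set(ptr, n - 1);
  }

  // Stand-in for the `env.mark` import: report every pinned pointer once.
  markAll(mark: (ptr: Ptr) => void): void {
    for (const ptr of this.counts.keys()) mark(ptr);
  }
}

// Usage: 1024 is retained twice, 2048 is retained and released again.
const pins = new Pins();
pins.retain(1024);
pins.retain(1024);
pins.retain(2048);
pins.release(2048);

// Pretend the module called `env.mark` during a collection.
const marked: Ptr[] = [];
pins.markAll(ptr => { marked.push(ptr); });
```

In this sketch a pointer is marked once no matter how often it was retained, since the count only decides when the pin goes away.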

Also added an experimental __keepalive API that eliminates the need to call __retain and __release in engines supporting FinalizationRegistry (learned that's about everything but Safari). Quite bizarre implementation that returns a new Number(ptr) wrapper around the pointer, acting as a substitute to the pointer, with lifetime tracked by the JS engine. Once the JS engine collects that wrapper because it is not referenced any longer, the loader automatically __releases the external reference.
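
A minimal sketch of the wrapper idea, assuming an engine with `FinalizationRegistry`. The `keepalive`, `retain` and `release` functions and the `pinned` map are hypothetical stand-ins here, not the loader's actual implementation.

```typescript
// Hypothetical stand-ins for the loader's pinning functions.
const pinned = new Map<number, number>();

function retain(ptr: number): void {
  pinned.set(ptr, (pinned.get(ptr) ?? 0) + 1);
}

function release(ptr: number): void {
  const n = pinned.get(ptr) ?? 0;
  if (n <= 1) pinned.delete(ptr);
  else pinned.set(ptr, n - 1);
}

// When a registered wrapper has been garbage collected, the engine may call
// back with the held value (the raw pointer), which is released here.
const registry = new FinalizationRegistry<number>(ptr => release(ptr));

function keepalive(ptr: number): Number {
  retain(ptr);
  // A boxed Number is an object, so the JS engine can track its lifetime,
  // yet it still coerces to the pointer value where a number is expected.
  const wrapper = new Number(ptr);
  registry.register(wrapper, ptr);
  return wrapper;
}

// Usage: the wrapper substitutes for the pointer while it is referenced.
const ref = keepalive(4096);
const rawPointer = ref.valueOf();
```

Engines do not guarantee when, or whether, finalization callbacks run, so a sketch like this can only complement explicit releases in long-running programs.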

An open question remains how practical all this will be in the wild for long-running or complex programs.

dcodeIO · 2020-11-24T08:38:13Z

Investigating the remaining 41 live objects, these are:

2 global Strings LIBRARY_PREFIX, INDEX_SUFFIX in src/common
1 global Uint8Array v128_zero (incl 1 ArrayBuffer) in src/util/vector
1 global Set declaredElements (incl 1 ArrayBuffer) in src/program, with 2 Elements
22 global Types Type.XY in src/types
1 global Map builtins (incl 2 ArrayBuffers) in src/builtins
1 global Map function_builtins (incl 2 ArrayBuffers) in src/builtins
1 global State reusableState in src/tokenizer
1 global Set typedElements (incl 1 ArrayBuffer) in src/program, with 2 Elements

So that's working as intended :)
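
As a quick sanity check, the breakdown can be tallied against the Rtrace figures from the previous comment. The tally assumes that each listed ArrayBuffer and each contained Element counts as one live object. That reading is an assumption and is not stated explicitly in the list.

```typescript
// Figures copied from the Rtrace output quoted earlier in the thread.
const allocs = 415683;
const frees = 415642;
const live = allocs - frees;

// Each row: [description, objects themselves, backing ArrayBuffers, contained Elements]
const globalsTally: Array<[string, number, number, number]> = [
  ["Strings LIBRARY_PREFIX, INDEX_SUFFIX", 2, 0, 0],
  ["Uint8Array v128_zero", 1, 1, 0],
  ["Set declaredElements", 1, 1, 2],
  ["Types Type.XY", 22, 0, 0],
  ["Map builtins", 1, 2, 0],
  ["Map function_builtins", 1, 2, 0],
  ["State reusableState", 1, 0, 0],
  ["Set typedElements", 1, 1, 2],
];

// Sum all three count columns over all rows.
const tallied = globalsTally.reduce(
  (sum, [, own, buffers, elements]) => sum + own + buffers + elements,
  0
);

console.log(live, tallied); // prints "41 41"
```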

stale · 2020-12-24T10:52:51Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

dcodeIO · 2021-01-29T00:08:57Z

Superseded by #1559

stale bot added the stale label Dec 24, 2020

dcodeIO closed this as completed Jan 29, 2021
