
GC/RT: Deferred Reference Counting? #1534


Closed
dcodeIO opened this issue Nov 3, 2020 · 9 comments

@dcodeIO
Member

dcodeIO commented Nov 3, 2020

With the changes introduced in #1503, in particular having one more header word to work with, the following deferred mechanism, akin to autorelease pools, may be a way to improve our ARC implementation:

  • Repurpose gcInfo2 (originally introduced for tracing) to maintain a per-function autorelease pool as a linked list
  • Let the list's tail be a sentinel value, e.g. -1, so it can be distinguished from null
  • When compiling a function, instead of refcounting locals (to emulate a stack), link objects being kept alive by the function to the function's autorelease pool / list. Typical entries are managed function arguments, foreign function return values and managed values loaded from memory.
  • If there's a reachable pool_track in a code path, insert a pool_create incl. the necessary additional local at the start of a function, and a pool_release when exiting the function.
// Prototypical implementation

import { OBJECT, TOTAL_OVERHEAD } from "rt/common";

/** Creates a new autorelease pool. */
@inline function pool_create(): usize {
  return -1;
}

/** Tracks an object in an autorelease pool. */
function pool_track(pool: usize, ptr: usize): usize {
  if (ptr) {
    var obj = changetype<OBJECT>(ptr - TOTAL_OVERHEAD);
    // Only track objects that are not yet part of a pool. If the object is part
    // of a pool already, it is either part of this pool, or of a parent pool,
    // which is guaranteed to be released later than this pool.
    if (!obj.gcInfo2) {
      obj.gcInfo2 = pool;
      pool = changetype<usize>(obj);
    }
  }
  return pool;
}

/** Releases an autorelease pool. */
function pool_release(pool: usize): void {
  while (pool != -1) {
    let obj = changetype<OBJECT>(pool);
    pool = obj.gcInfo2;
    obj.gcInfo2 = 0;
    decrement(changetype<usize>(obj));
  }
}
// Example (lines prefixed with /**/ are compiler inserts)

class Obj {}

function foo(a: Obj): Obj {                             // `a` is likely tracked in an outer pool
  /**/var pool = pool_create();                         // create pool for the function
  /**/pool = pool_track(pool, changetype<usize>(a));    // track `a`, likely already in a pool
  var b = bar(a);
  /**/pool = pool_track(pool, changetype<usize>(b));    // track `b`, likely not yet in a pool
  /**/__retain(b);                                      // retain the return value
  /**/pool_release(pool);                               // release `a` (unlikely), `b` (likely)
  return b;
}

function bar(b: Obj): Obj {                             // `b` is likely tracked in an outer pool
  /**/var pool = pool_create();                         // create pool for the function
  /**/pool = pool_track(pool, changetype<usize>(b));    // track `b`, likely already in a pool
  /**/__retain(b);                                      // retain the return value
  /**/pool_release(pool);                               // release `b` (unlikely)
  return b;
}

Giving us:

  • No need to inject retain/release on locals, greatly simplifying compiler logic, since performing autoreleases becomes a runtime detail instead of a compiler detail. Functionality like Constraints.WILL_RETAIN, LocalFlags.RETAINED, Compiler#performAutoreleases, finishAutoreleases, skippedAutoreleases, tryUndoAutorelease, delayAutorelease etc. can then be stripped from the compiler itself.
  • Reduces unnecessary write barriers on release: since each object is added to at most one pool, redundant retain/release pairs are implicitly deduplicated.
  • No additional allocations necessary to implement (i.e. no shadow stack)
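To make the linked-list bookkeeping of the prototype easier to follow, here is a hedged TypeScript simulation of it. It is a sketch under assumptions, not AssemblyScript runtime code: fake addresses index a `Map` of headers, `rc` stands in for the real reference count, and `heap`, `alloc`, `poolCreate`, `poolTrack` and `poolRelease` are hypothetical stand-ins for the runtime's memory and for `pool_create`, `pool_track` and `pool_release`. It shows the sentinel tail (`-1`) keeping "tracked, at the end of a list" distinct from "untracked" (`0`), and an inner pool skipping an object an outer pool already tracks.

```typescript
// Hypothetical simulation of the proposed autorelease pool. "Objects" are
// fake addresses mapping to headers instead of real AssemblyScript blocks.
const SENTINEL = -1; // list tail, distinguishable from 0 ("in no pool")

interface Header {
  rc: number;      // simulated reference count
  gcInfo2: number; // 0 = untracked, otherwise next entry or SENTINEL
}

const heap = new Map<number, Header>();

// Hypothetical allocator: registers a header at a fake address.
function alloc(ptr: number, rc: number): number {
  heap.set(ptr, { rc, gcInfo2: 0 });
  return ptr;
}

// An empty pool is just the sentinel.
function poolCreate(): number {
  return SENTINEL;
}

// Links the object in front of the pool's head unless it is already
// tracked by this pool or by a parent pool (which is released later).
function poolTrack(pool: number, ptr: number): number {
  if (ptr === 0) return pool; // null is never tracked
  const obj = heap.get(ptr)!;
  if (obj.gcInfo2 === 0) {
    obj.gcInfo2 = pool; // old head becomes the next entry
    return ptr;         // object becomes the new head
  }
  return pool;
}

// Walks the list, unlinking each object and dropping one reference.
function poolRelease(pool: number): void {
  while (pool !== SENTINEL) {
    const obj = heap.get(pool)!;
    pool = obj.gcInfo2; // advance before unlinking
    obj.gcInfo2 = 0;
    obj.rc -= 1;        // stands in for decrement()
  }
}

// Usage: an outer frame tracks `a`; an inner frame tracks `a` again
// (a no-op) plus `b`, so releasing the inner pool only touches `b`.
const a = alloc(16, 1);
const b = alloc(32, 1);
let outer = poolCreate();
outer = poolTrack(outer, a);
let inner = poolCreate();
inner = poolTrack(inner, a);
inner = poolTrack(inner, b);
poolRelease(inner);
poolRelease(outer);
```

Because the inner pool skips `a`, only the outer pool ever decrements it, which is the retain/release deduplication the second benefit above refers to.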

I'm likely missing something, but seems promising, so pinning it here. @MaxGraey Thoughts?

@MaxGraey
Member

MaxGraey commented Nov 3, 2020

That's a great idea! Totally agree we could use deferred RC. Also interesting is the lazy RC approach based on pure RC: #89 (comment)

@dcodeIO
Member Author

dcodeIO commented Nov 19, 2020

Little update from my side: I tore the compiler apart in the meantime, removed the previous mechanism and added enough of the new one to do some basic tests. It turned out that more intrinsics are necessary to handle all possible cases, leading to:

  • __retain / __release
  • __defer: Conditionally defers a release of an RC+1 value, e.g. received return value.
  • __forward: Conditionally ensures RC+1, e.g. sending return value with no pool.
  • __keepalive: Conditionally defers a release of an RC+0 value (roughly __defer(__forward(...))), e.g. accessing a global or field.
  • __commit / __commit_with_value

Currently contemplating, as this is becoming more complicated as I go, and I can't quite imagine how documentation for all this stuff would not be half a book. Once more considering sledgehammering our way out of this entirely with stop-the-world mark-sweep for the time being.

@MaxGraey
Member

Currently contemplating, as this is becoming more complicated as I go and I can't quite imagine how documentation for all this stuff would not be half a book

I don't think all of these methods need to be documented. As I understand it, only __retain / __release and probably __commit / __commit_with_value are necessary for interop, right? The rest would only be used internally in std.

@dcodeIO
Member Author

dcodeIO commented Nov 19, 2020

So I followed the rabbit hole I dug in my last comment to see where that leads and sledgehammered reference counting out of existence in a new branch, and wow, did I like that.

  • Source code is recognizable in generated binaries again
  • Only RT exports to reason about are __new and __collect
  • Tracing GC is just ~270 LOC incl. comments

I then went a step further and burned the --runtime flag at the stake to make it one runtime (with a --noExportRuntime flag replacing none/half), and wow again. Less dramatically put, what fell out of that experiment feels so much more in line with my vision for AS as a "lean and mean" Wasm compiler thingy that I am tempted to simply accept that, for the time being, __collect may only be called externally, because the Wasm MVP doesn't support random access to the execution stack. Never calling __collect then effectively yields what the stub runtime was before, but with a proper memory manager behind heap.alloc and friends.
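For readers less familiar with tracing, one stop-the-world mark-sweep cycle can be sketched in a few lines of TypeScript. This is a toy model, not the ~270 LOC runtime mentioned above: `GcObject` and `ToyHeap` are hypothetical, and roots are passed in explicitly, standing in for globals plus anything pinned from outside, since the collector cannot scan the Wasm execution stack for pointers. Unlike reference counting, it also reclaims unreachable cycles.

```typescript
// Hypothetical toy object: a mark bit plus outgoing references.
class GcObject {
  readonly id: number;
  marked = false;
  refs: GcObject[] = [];
  constructor(id: number) {
    this.id = id;
  }
}

// Hypothetical toy heap with a stop-the-world mark-sweep collector.
class ToyHeap {
  private objects = new Set<GcObject>();
  private nextId = 1;

  alloc(): GcObject {
    const obj = new GcObject(this.nextId++);
    this.objects.add(obj);
    return obj;
  }

  get size(): number {
    return this.objects.size;
  }

  // Marks everything reachable from the given roots, then frees the rest.
  // Returns the number of objects freed.
  collect(roots: GcObject[]): number {
    const work = roots.slice();
    while (work.length > 0) {
      const obj = work.pop()!;
      if (obj.marked) continue;
      obj.marked = true;
      for (const ref of obj.refs) work.push(ref);
    }
    let freed = 0;
    this.objects.forEach((obj) => {
      if (obj.marked) {
        obj.marked = false; // reset for the next cycle
      } else {
        this.objects.delete(obj);
        freed++;
      }
    });
    return freed;
  }
}
```

Calling `collect` only from the outside means no Wasm frame can hold a pointer the roots do not know about, which is why the restriction to external calls keeps the scheme sound under the MVP.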

Going to think about this for a bit since a radical change like this is certainly, well, spicy.

@jtenner
Contributor

jtenner commented Nov 20, 2020

Spicy, difficult to deal with, and exciting. My guess is that pulling the band-aid off now is easier than doing it later. Just leave us all some time to test it out first so we can write migration instructions (and develop our testing frameworks more properly, if that's okay!)

Change isn't always better. But if it makes AssemblyScript better in the long run, then we should accept the burden of migration sooner rather than later.

@dcodeIO
Member Author

dcodeIO commented Nov 23, 2020

So this just happened: using the new tracing runtime, I managed to get the bootstrap test to work and decided to attach Rtrace to it, expecting to make myself sad, but this actually looks legit. Manually calling __collect at the end of the run yields:

33554432b of memory, 415683 allocs, 2962 resizes, 2718 moves, 415642 frees

indicating that there are 41 objects still alive in globals somewhere, roughly in the ballpark of what I'd expect. As one can see, the compiler needs somewhere between 16 and 32 MiB of memory to run the bootstrap test when collecting at the end.
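A quick check of the Rtrace figures: the live-object count is simply allocations minus frees, and the reported memory size converts to an exact number of MiB and of 64 KiB Wasm pages. Only the three numbers are taken from the output above; the variable names are illustrative.

```typescript
// Figures copied from the Rtrace summary above.
const memoryBytes = 33554432;
const allocs = 415683;
const frees = 415642;

const liveObjects = allocs - frees;               // allocations never freed
const memoryMiB = memoryBytes / (1024 * 1024);    // bytes to MiB
const memoryPages = memoryBytes / (64 * 1024);    // Wasm pages are 64 KiB

console.log(`${liveObjects} live objects, ${memoryMiB} MiB (${memoryPages} pages)`);
// prints "41 live objects, 32 MiB (512 pages)"
```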

The way this works now is that the loader itself implements __retain and __release (these are not exports anymore, so there is no call overhead). They are only necessary where objects must remain live in JS beyond a call to __collect, and can be omitted in purely synchronous code. When __collect is called, it calls an import env.mark that iterates over everything currently retained and __marks it; the loader provides all of that. The mechanism thus amounts to reference counting (pinning) at the boundary, with tracing internally.
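The boundary pinning described above could look roughly like the following loader-side table. It is a hedged sketch: `PinTable` and its methods are hypothetical, and the exact shape of the `env.mark` import and of the `__mark` export it calls are assumptions, not the loader's actual code. Pins are counted so that nested retain/release pairs from JS balance out, and `markAll` reports every pinned pointer to the collector as an extra root.

```typescript
// Hypothetical loader-side pin table: reference counting only at the
// JS/Wasm boundary, with the Wasm side tracing everything else.
class PinTable {
  private counts = new Map<number, number>();

  // Pins a pointer (null is ignored) and returns it for convenience.
  retain(ptr: number): number {
    if (ptr !== 0) this.counts.set(ptr, (this.counts.get(ptr) ?? 0) + 1);
    return ptr;
  }

  // Drops one pin; the pointer stops being a root when its count hits 0.
  release(ptr: number): void {
    const n = this.counts.get(ptr);
    if (n === undefined) return; // not pinned, nothing to do
    if (n > 1) this.counts.set(ptr, n - 1);
    else this.counts.delete(ptr);
  }

  // Intended to run inside the (assumed) env.mark import during __collect:
  // every pinned pointer is handed to the collector's mark function.
  markAll(mark: (ptr: number) => void): void {
    this.counts.forEach((_count, ptr) => mark(ptr));
  }
}
```

Wiring it up might then amount to supplying `env.mark` as a function that calls `pins.markAll` with the module's `__mark` export, again assuming `__mark` takes a single pointer argument.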

Also added an experimental __keepalive API that eliminates the need to call __retain and __release in engines supporting FinalizationRegistry (I learned that's about everything but Safari). It's a quite bizarre implementation: it returns a new Number(ptr) wrapper around the pointer that acts as a substitute for the pointer, with its lifetime tracked by the JS engine. Once the JS engine collects the wrapper because it is no longer referenced, the loader automatically __releases the external reference.
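The `__keepalive` trick can be sketched with the standard `FinalizationRegistry` API. `makeKeepalive` and its `retain`/`release` callbacks are hypothetical names, not the loader's API: the sketch pins the pointer, hands out a `Number` wrapper whose lifetime the JS engine tracks, and registers the raw pointer as the held value so that the release runs once the wrapper is collected. Finalization timing is up to the engine, so the release may happen late or, per the specification, not at all before the program exits.

```typescript
// Hypothetical sketch of the __keepalive idea using FinalizationRegistry.
function makeKeepalive(
  retain: (ptr: number) => void,
  release: (ptr: number) => void,
): (ptr: number) => Number {
  // The cleanup callback receives the held value, i.e. the raw pointer.
  const registry = new FinalizationRegistry<number>(release);
  return function keepalive(ptr: number): Number {
    retain(ptr);                     // pin while the wrapper is reachable
    const handle = new Number(ptr);  // object the engine can track
    registry.register(handle, ptr);  // release(ptr) after handle is collected
    return handle;
  };
}
```

Callers keep passing the wrapper around and unwrap it with `valueOf()` when calling into Wasm; once the last reference to it is dropped, when the release happens is up to the engine's GC.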

The open question remains how practical all this will be in the wild for long-running or complex programs.

@dcodeIO
Member Author

dcodeIO commented Nov 24, 2020

I investigated the remaining 41 live objects; these are:

  • 2 global Strings LIBRARY_PREFIX, INDEX_SUFFIX in src/common
  • 1 global Uint8Array v128_zero (incl 1 ArrayBuffer) in src/util/vector
  • 1 global Set declaredElements (incl 1 ArrayBuffer) in src/program, with 2 Elements
  • 22 global Types Type.XY in src/types
  • 1 global Map builtins (incl 2 ArrayBuffers) in src/builtins
  • 1 global Map function_builtins (incl 2 ArrayBuffers) in src/builtins
  • 1 global State reusableState in src/tokenizer
  • 1 global Set typedElements (incl 1 ArrayBuffer) in src/program, with 2 Elements

So that's working as intended :)
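As a sanity check that the list accounts for every object Rtrace reported, the counts can be tallied, assuming each String, Type, Set, Map, typed array, ArrayBuffer, State and Element is one allocation. The labels are just the entries above; `liveByGlobal` is an illustrative name.

```typescript
// Live objects per global, as listed above; owned ArrayBuffers and
// Elements are counted together with the global that holds them.
const liveByGlobal: Array<[string, number]> = [
  ["LIBRARY_PREFIX, INDEX_SUFFIX (2 Strings)", 2],
  ["v128_zero (Uint8Array + ArrayBuffer)", 2],
  ["declaredElements (Set + ArrayBuffer + 2 Elements)", 4],
  ["Type.XY (22 Types)", 22],
  ["builtins (Map + 2 ArrayBuffers)", 3],
  ["function_builtins (Map + 2 ArrayBuffers)", 3],
  ["reusableState (State)", 1],
  ["typedElements (Set + ArrayBuffer + 2 Elements)", 4],
];

const total = liveByGlobal.reduce((sum, [, n]) => sum + n, 0);
console.log(total); // 41
```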

@stale

stale bot commented Dec 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Dec 24, 2020
@dcodeIO
Member Author

dcodeIO commented Jan 29, 2021

Superseded by #1559

@dcodeIO dcodeIO closed this as completed Jan 29, 2021

3 participants