Skip to content

[generator] Use GC.KeepAlive for reference type method parameters. #725

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 24, 2020

Conversation

jpobst
Copy link
Contributor

@jpobst jpobst commented Sep 22, 2020

Fixes: #719

Under certain circumstances, it is possible for a method parameter to get GC collected before native code is finished using it. Add an appropriate GC.KeepAlive (param) to ensure the object stays alive until we are done with it. As an optimization, this is only needed for reference types.

Context: #719

@brendanzagaeski has been investigating a Xamarin.Android app crash:

 JNI DETECTED ERROR IN APPLICATION: use of deleted global reference 0x3d86
     from android.view.View crc64720bb2db43a66fe9.FragmentContainer.n_onCreateView(android.view.LayoutInflater, android.view.ViewGroup, android.os.Bundle)
 …
   at crc64720bb2db43a66fe9.FragmentContainer.n_onCreateView(Native method)
   at crc64720bb2db43a66fe9.FragmentContainer.onCreateView(FragmentContainer.java:41)

This had been a head-scratcher, but we had a GREF log, so all should
be clear, right?

 09-10 17:56:48.280 10123 10123 I monodroid-gref: +g+ grefc 1141 gwrefc 0 obj-handle 0x79/I -> new-handle 0x3d86/G from thread '(null)'(1)
 09-10 17:56:48.294 10123 10123 I monodroid-gref: +w+ grefc 1140 gwrefc 2 obj-handle 0x3d86/G -> new-handle 0x1e3/W from thread 'finalizer'(10123)
 09-10 17:56:48.294 10123 10123 I monodroid-gref: -g- grefc 1139 gwrefc 2 handle 0x3d86/G from thread 'finalizer'(10123)

The GREF log *wasn't* immediately clear: sure, the GREF was turned
into a Weak-GREF, and the Weak-GREF was then collected, but none of
this explained *how* were were using this deleted GREF.

(We were at this state of affairs for months: we "know" we're using
a deleted GREF, but we don't know *how* or *why*.  It was a very hard
to hit bug.)

Eventually we had a ["that's funny"][0] event: *sure*, the GREF is
deleted, but:

 1. Why is it being deleted by the finalizer?
 2. …14ms *after construction*?

A [Garbage Collection][1] refresher may be in order, but in short:

 1. All `Java.Lang.Object` subclasses are "bridged" objects.

 2. During a GC, all "collectable" bridged objects are gathered.
    A collectable object is one in which nothing in the managed
    GC references the object.

 3. Once the GC is complete, all gathered collectable bridged objects
    are passed to a "cross references" callback.

    The callback is called *outside* the "scope" of a GC; "the world"
    is *not* frozen, other threads may be executing.

 4. The "cross references" callback is the
    `MonoGCBridgeCallbacks::cross_references` field provided provided
    to [`mono_gc_register_bridge_callbacks()`][2].

    In a Xamarin.Android app, the "cross references" callback will
    "toggle" a JNI Global Reference to a JNI Weak Global Reference,
    invoke a Java-side GC, and then try to obtain a
    JNI Global Reference from the JNI Weak Global Reference.
    If a non-`NULL` pointer is returned, the bridged object is kept
    alive.  If a `NULL` pointer is returned, the bridged object will
    be considered dead, and will be added to the Finalization Queue
    (as `Java.Lang.Object` has a finalizer).

Thus, it seemed "funny" that within 14ms an instance was created,
GC'd, and determined to be garbage, *especially* when we *knew* that
this instance was being passed to Java, which we expected to retain
the instance.  (Yet wasn't…?)

After much banging of heads, and the yeoman's work of creating a
simplified and consistently reproducible test case, we *think* we
know the cause of the crash.

Consider our normal Managed-to-Java marshaling code, e.g.

 partial class MaterialButton  {
     public unsafe MaterialButton (global::Android.Content.Context context)
      : base (IntPtr.Zero, JniHandleOwnership.DoNotTransfer)
     {
         const string __id = "(Landroid/content/Context;)V";

         if (((global::Java.Lang.Object) this).Handle != IntPtr.Zero)
             return;

         try {
             JniArgumentValue* __args = stackalloc JniArgumentValue [1];
 /* 1 */     __args [0] = new JniArgumentValue ((context == null) ? IntPtr.Zero : ((global::Java.Lang.Object) context).Handle);
 /* 2 */     var __r = _members.InstanceMethods.StartCreateInstance (__id, ((object) this).GetType (), __args);
             SetHandle (__r.Handle, JniHandleOwnership.TransferLocalRef);
 /* 3 */     _members.InstanceMethods.FinishCreateInstance (__id, this, __args);
         } finally {
         }
     }
 }

At (1), `context.Handle` is -- a JNI GREF -- is stored into a
`JniArgumentValue* __args` value, and at (3) `__args` is passed into
JNI, which will presumably "Do Something" with that handle.

However, nothing ties `context.Handle` to `context`, so from
(2) onward, `context` *may* be eligible for garbage collection.

See also Chris Brumme's [Lifetime, GC.KeepAlive, handle recycling][3]
blog article.  It's about .NET Framework, but the same fundamental
multithreading concepts apply.

The fix is to *ensure* that `context` is kept alive
*for as long as* `context.Handle` will be used, i.e. across the
JNI `_members.InstanceMethods.FinishCreateInstance()` call:

 partial class MaterialButton  {
     public unsafe MaterialButton (global::Android.Content.Context context)
      : base (IntPtr.Zero, JniHandleOwnership.DoNotTransfer)
     {
         const string __id = "(Landroid/content/Context;)V";

         if (((global::Java.Lang.Object) this).Handle != IntPtr.Zero)
             return;

         try {
             JniArgumentValue* __args = stackalloc JniArgumentValue [1];
 /* 1 */     __args [0] = new JniArgumentValue ((context == null) ? IntPtr.Zero : ((global::Java.Lang.Object) context).Handle);
 /* 2 */     var __r = _members.InstanceMethods.StartCreateInstance (__id, ((object) this).GetType (), __args);
             SetHandle (__r.Handle, JniHandleOwnership.TransferLocalRef);
 /* 3 */     _members.InstanceMethods.FinishCreateInstance (__id, this, __args);
         } finally {
 /* 4 */     global::System.GC.KeepAlive (context);
         }
     }
 }

This should prevent e.g. `context` from being prematurely GC'd,
which in turn should prevent the `JNI DETECTED ERROR` message.

Update `tools/generator` to emit `GC.KeepAlive()` statements for
every parameter type which isn't a value type (`enum`, `int`, `string`,
etc.).  `string` is considered a value type because we always send a
"deep copy" of the string contents, so it won't matter if the `string`
instance is GC'd immediately.

[0]: https://quoteinvestigator.com/2015/03/02/eureka-funny/
[1]: https://docs.microsoft.com/xamarin/android/internals/garbage-collection
[2]: http://docs.go-mono.com/?link=xhtml%3adeploy%2fmono-api-gc.html
[3]: https://docs.microsoft.com/archive/blogs/cbrumme/lifetime-gc-keepalive-handle-recycling
@jpobst jpobst marked this pull request as ready for review September 22, 2020 14:36
@jonpryor jonpryor merged commit 1f21f38 into master Sep 24, 2020
@jonpryor jonpryor deleted the master-keepalives branch September 24, 2020 23:54
@jpobst jpobst added this to the 11.1 (16.9 / 8.9) milestone Oct 13, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use GC.KeepAlive() to prevent early collection of method parameters
2 participants