Skip to content

[generator] Use GC.KeepAlive for reference type method parameters. #722

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Sep 21, 2020

Conversation

jpobst
Copy link
Contributor

@jpobst jpobst commented Sep 18, 2020

NOTE: This is a PR against d16-8. This will have to rewritten for master due to other changes in this code.

Context: #719

Under certain circumstances, it is possible for a method parameter to get GC collected before native code is finished using it. Add an appropriate GC.Collect (param) to ensure the object stays alive until we are done with it. As an optimization, this is only needed for reference types.

@@ -162,6 +162,7 @@ public partial class MyClass {
_members.InstanceMethods.InvokeVirtualVoidMethod (__id, this, __args);
} finally {
JNIEnv.DeleteLocalRef (native_value);
global::System.GC.KeepAlive (value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not need to be addressed for d16-8: We don't need a GC.KeepAlive() for types which will be "deep marshaled". This includes System.String and array types. For master, would we be able to skip this?

@@ -260,6 +261,7 @@ public partial class MyClass {
return __rm;
} finally {
JNIEnv.DeleteLocalRef (native_key);
global::System.GC.KeepAlive (key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not need to be addressed for d16-8: We don't need a GC.KeepAlive() for types which will be "deep marshaled". This includes System.String and array types. For master, would we be able to skip this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should be able to do that.

For the array types, would we never need the KeepAlive, or does it depend on the array element type?

ie: int[] versus object[]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will depend on the array element type, as if we have a Java.Lang.Object[], then everything within the array needs to be kept alive, and GC.KeepAlive(array) will ensure that everything within the array is kept alive…

So maybe all array types should retain the GC.KeepAlive()

@@ -103,6 +103,7 @@ public partial class MyClass {
__args [0] = new JniArgumentValue (value);
_members.InstanceMethods.InvokeAbstractVoidMethod (__id, this, __args);
} finally {
global::System.GC.KeepAlive (value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byte is a value type. Can we remove this GC.KeepAlive()?

@jonpryor
Copy link
Member

jonpryor commented Sep 18, 2020

Commit message:

Context: https://github.com/xamarin/java.interop/issues/719

@brendanzagaeski has been investigating a Xamarin.Android app crash:

	JNI DETECTED ERROR IN APPLICATION: use of deleted global reference 0x3d86
	    from android.view.View crc64720bb2db43a66fe9.FragmentContainer.n_onCreateView(android.view.LayoutInflater, android.view.ViewGroup, android.os.Bundle)
	
	  at crc64720bb2db43a66fe9.FragmentContainer.n_onCreateView(Native method)
	  at crc64720bb2db43a66fe9.FragmentContainer.onCreateView(FragmentContainer.java:41)

This had been a head-scratcher, but we had a GREF log, so all should
be clear, right?

	09-10 17:56:48.280 10123 10123 I monodroid-gref: +g+ grefc 1141 gwrefc 0 obj-handle 0x79/I -> new-handle 0x3d86/G from thread '(null)'(1)
	09-10 17:56:48.294 10123 10123 I monodroid-gref: +w+ grefc 1140 gwrefc 2 obj-handle 0x3d86/G -> new-handle 0x1e3/W from thread 'finalizer'(10123)
	09-10 17:56:48.294 10123 10123 I monodroid-gref: -g- grefc 1139 gwrefc 2 handle 0x3d86/G from thread 'finalizer'(10123)

The GREF log *wasn't* immediately clear: sure, the GREF was turned
into a Weak-GREF, and the Weak-GREF was then collected, but none of
this explained *how* were were using this deleted GREF.

(We were at this state of affairs for months: we "know" we're using
a deleted GREF, but we don't know *how* or *why*.  It was a very hard
to hit bug.)

Eventually we had a ["that's funny"][0] event: *sure*, the GREF is
deleted, but:

 1. Why is it being deleted by the finalizer?
 2. …14ms *after construction*?

A [Garbage Collection][1] refresher may be in order, but in short:

 1. All `Java.Lang.Object` subclasses are "bridged" objects.

 2. During a GC, all "collectable" bridged objects are gathered.
    A collectable object is one in which nothing in the managed
    GC references the object.

 3. Once the GC is complete, all gathered collectable bridged objects
    are passed to a "cross references" callback.

    The callback is called *outside* the "scope" of a GC; "the world"
    is *not* frozen, other threads may be executing.

 4. The "cross references" callback is the
    `MonoGCBridgeCallbacks::cross_references` field provided provided
    to [`mono_gc_register_bridge_callbacks()`][2].
    
    In a Xamarin.Android app, the "cross references" callback will
    "toggle" a JNI Global Reference to a JNI Weak Global Reference,
    invoke a Java-side GC, and then try to obtain a
    JNI Global Reference from the JNI Weak Global Reference.
    If a non-`NULL` pointer is returned, the bridged object is kept
    alive.  If a `NULL` pointer is returned, the bridged object will
    be considered dead, and will be added to the Finalization Queue
    (as `Java.Lang.Object` has a finalizer).

Thus, it seemed "funny" that within 14ms an instance was created,
GC'd, and determined to be garbage, *especially* when we *knew* that
this instance was being passed to Java, which we expected to retain
the instance.  (Yet wasn't…?)

After much banging of heads, and the yeoman's work of creating a
simplified and consistently reproducible test case, we *think* we
know the cause of the crash.

Consider our normal Managed-to-Java marshaling code, e.g.

	partial class MaterialButton  {
	    public unsafe MaterialButton (global::Android.Content.Context context)
	    	: base (IntPtr.Zero, JniHandleOwnership.DoNotTransfer)
	    {
	        const string __id = "(Landroid/content/Context;)V";

	        if (((global::Java.Lang.Object) this).Handle != IntPtr.Zero)
	            return;

	        try {
	            JniArgumentValue* __args = stackalloc JniArgumentValue [1];
	/* 1 */     __args [0] = new JniArgumentValue ((context == null) ? IntPtr.Zero : ((global::Java.Lang.Object) context).Handle);
	/* 2 */     var __r = _members.InstanceMethods.StartCreateInstance (__id, ((object) this).GetType (), __args);
	            SetHandle (__r.Handle, JniHandleOwnership.TransferLocalRef);
	/* 3 */     _members.InstanceMethods.FinishCreateInstance (__id, this, __args);
	        } finally {
	        }
	    }
	}

At (1), `context.Handle` is -- a JNI GREF -- is stored into a
`JniArgumentValue* __args` value, and at (3) `__args` is passed into
JNI, which will presumably "Do Something" with that handle.

However, nothing ties `context.Handle` to `context`, so from
(2) onward, `context` *may* be eligible for garbage collection.

See also Chris Brumme's [Lifetime, GC.KeepAlive, handle recycling][3]
blog article.  It's about .NET Framework, but the same fundamental
multithreading concepts apply.

The fix is to *ensure* that `context` is kept alive
*for as long as* `context.Handle` will be used, i.e. across the
JNI `_members.InstanceMethods.FinishCreateInstance()` call:

	partial class MaterialButton  {
	    public unsafe MaterialButton (global::Android.Content.Context context)
	    	: base (IntPtr.Zero, JniHandleOwnership.DoNotTransfer)
	    {
	        const string __id = "(Landroid/content/Context;)V";

	        if (((global::Java.Lang.Object) this).Handle != IntPtr.Zero)
	            return;

	        try {
	            JniArgumentValue* __args = stackalloc JniArgumentValue [1];
	/* 1 */     __args [0] = new JniArgumentValue ((context == null) ? IntPtr.Zero : ((global::Java.Lang.Object) context).Handle);
	/* 2 */     var __r = _members.InstanceMethods.StartCreateInstance (__id, ((object) this).GetType (), __args);
	            SetHandle (__r.Handle, JniHandleOwnership.TransferLocalRef);
	/* 3 */     _members.InstanceMethods.FinishCreateInstance (__id, this, __args);
	        } finally {
	/* 4 */     global::System.GC.KeepAlive (context);
	        }
	    }
	}

This should prevent e.g. `context` from being prematurely GC'd,
which in turn should prevent the `JNI DETECTED ERROR` message.

Update `tools/generator` to emit `GC.KeepAlive()` statements for
every parameter type which isn't a value type (`enum`, `int`, `string`,
etc.).  `string` is considered a value type because we always send a
"deep copy" of the string contents, so it won't matter if the `string`
instance is GC'd immediately.

[0]: https://quoteinvestigator.com/2015/03/02/eureka-funny/
[1]: https://docs.microsoft.com/xamarin/android/internals/garbage-collection
[2]: http://docs.go-mono.com/?link=xhtml%3adeploy%2fmono-api-gc.html
[3]: https://docs.microsoft.com/archive/blogs/cbrumme/lifetime-gc-keepalive-handle-recycling

@brendanzagaeski
Copy link
Contributor

Commit message comments:

A [Garbage Collection]][1] refresher may be in order, but in short:

There's a tiny typo of a double ]].


 2. Bridge objects are sent to the
    `MonoGCBridgeCallbacks::cross_references` field provided provided
    to [`mono_gc_register_bridge_callbacks()`][2] to see if they're
    *actually* collectable.

 3. Bridge objects (2) are only sent if they've gone through a
    "normal" GC *once* and were eligible for "normal" collection, and
    then are sent through the bridge callback during a *subsequent* GC.

I got tripped up when I first read through this because when I was reading 2 before seeing 3, I thought 2 was saying that all bridged objects are sent to cross_references(). Maybe it would be OK to swap the order of 2 and 3 and reword a tiny bit, something like:

 2. Bridge objects that have gone through a "normal" GC *once* and
    were eligible for "normal" collection are sent through an
    additional bridge callback step during a *subsequent* GC.

 3. In the additional callback step, bridge objects are sent to the
    `MonoGCBridgeCallbacks::cross_references` field provided provided
    to [`mono_gc_register_bridge_callbacks()`][2] to see if they're
    *actually* collectable.

 3. Bridge objects (2) are only sent if they've gone through a
    "normal" GC *once* and were eligible for "normal" collection, and
    then are sent through the bridge callback during a *subsequent* GC.

Thus, it seemed "funny" that within 14ms an instance was created,
GC'd (determined to be garbage), and GC'd again (bridge processing),
*especially* when we *knew* that this instance was being passed to
Java, which was expected to retain the instance.  (Yet wasn't…?)

Based on the log output from the test apps, it seemed like the conversion to a weak global reference and back to a plain (strong) global reference all happened on the first GC right after the problematic type was instantiated, so it didn't seem like there were two GCs involved. Maybe in the latest versions, a single GC covers all 3 steps: marking the managed types (world stopped), running the Java GC (world running), and then completing the managed collection (world stopped)?

Example log lines to illustrate, with an initial gref handle value of 0x26f6:

I mono-stdout: Before instantiation.
I monodroid-gref: +g+ grefc 11 gwrefc 0 obj-handle 0x89/L -> new-handle 0x26ea/G from thread '(null)'(1)
I monodroid-gref: +g+ grefc 12 gwrefc 0 obj-handle 0x79/I -> new-handle 0x26f6/G from thread '(null)'(1)
I monodroid-gref: +g+ grefc 13 gwrefc 0 obj-handle 0x75/I -> new-handle 0x2706/G from thread '(null)'(1)
I monodroid-gref: +w+ grefc 13 gwrefc 1 obj-handle 0x26f6/G -> new-handle 0x1b7/W from thread 'finalizer'(6779)
I monodroid-gref: -g- grefc 12 gwrefc 1 handle 0x26f6/G from thread 'finalizer'(6779)
I dappsingleview: Explicit concurrent copying GC freed 3641(686KB) AllocSpace objects, 1(20KB) LOS objects, 59% free, 1043KB/2MB, paused 36us total 4.594ms
I monodroid-gref: +g+ grefc 13 gwrefc 1 obj-handle 0x1b7/W -> new-handle 0x26f2/G from thread 'finalizer'(6779)
I monodroid-gref: -w- grefc 13 gwrefc 0 handle 0x1b7/W from thread 'finalizer'(6779)
D Mono    : GC_TAR_BRIDGE bridges 1 objects 1 opaque 0 colors 1 colors-bridged 1 colors-visible 1 xref 0 cache-hit 0 cache-semihit 0 cache-miss 0 setup 0.01ms tarjan 0.00ms scc-setup 0.01ms gather-xref 0.00ms xref-setup 0.00ms cleanup 0.01ms
D Mono    : GC_BRIDGE: Complete, was running for 5.07ms
D Mono    : GC_MAJOR: (user request) time 0.79ms, stw 1.01ms los size: 0K in use: 0K
D Mono    : GC_MAJOR_SWEEP: major size: 848K in use: 93K

If the grefs can indeed turn over on the first GC after they're created, then there could be a chance that the "good" lifetime for the problematic object would also have been quite close to 14ms in the original few test scenarios. For example, if by chance the nursery filled up just after HideSoftInputFromWindow() returned instead of just before it returned, the "good" lifetime might not have been much longer than 14ms, depending on how long the call to InvokeAbstractBooleanMethod() took to return. If that's true, then it was perhaps only coincidental that the 14ms liftime looked suspicious.


	partial class InputMethodManager {
	    public unsafe bool HideSoftInputFromWindow (Android.OS.IBinder? windowToken, [global::Android.Runtime.GeneratedEnum] Android.Views.InputMethods.HideSoftInputFlags flags)
	    {
	        const string __id = "hideSoftInputFromWindow.(Landroid/os/IBinder;I)Z";
	        try {
	            JniArgumentValue* __args = stackalloc JniArgumentValue [2];
	/* 1 */     __args [0] = new JniArgumentValue ((windowToken == null) ? IntPtr.Zero : ((global::Java.Lang.Object) windowToken).Handle);
	/* 2 */     __args [1] = new JniArgumentValue ((int) flags);
	/* 3 */     var __rm = _members.InstanceMethods.InvokeAbstractBooleanMethod (__id, this, __args);
	            return __rm;
	        } finally {
	        }
	    }
	}

In case it might be desirable to match it up with the scenario from the original larger test app that was associated with the FragmentContainer.n_onCreateView() call stack in the beginning of the commit message, this example could optionally use the matching Google.Android.Material.Button.MaterialButton..ctor marshaling code:

	partial class MaterialButton  {
		public unsafe MaterialButton (global::Android.Content.Context context)
			: base (IntPtr.Zero, JniHandleOwnership.DoNotTransfer)
		{
			const string __id = "(Landroid/content/Context;)V";

			if (((global::Java.Lang.Object) this).Handle != IntPtr.Zero)
				return;

			try {
				JniArgumentValue* __args = stackalloc JniArgumentValue [1];
	/* 1 */			__args [0] = new JniArgumentValue ((context == null) ? IntPtr.Zero : ((global::Java.Lang.Object) context).Handle);
	/* 2 */			var __r = _members.InstanceMethods.StartCreateInstance (__id, ((object) this).GetType (), __args);
				SetHandle (__r.Handle, JniHandleOwnership.TransferLocalRef);
	/* 3 */			_members.InstanceMethods.FinishCreateInstance (__id, this, __args);
			} finally {
			}
		}
	}

And then windowToken could be replaced with context in the paragraphs after that. Of course, the HideSoftInputFromWindow() scenario could also happen in a call stack involving FragmentContainer.n_onCreateView(), so the current example is fine too.


[0]: https://quoteinvestigator.com/2015/03/02/eureka-funny/
[1]: https://docs.microsoft.com/en-us/xamarin/android/internals/garbage-collection
[2]: http://docs.go-mono.com/?link=xhtml%3adeploy%2fmono-api-gc.html
[3]: https://docs.microsoft.com/en-us/archive/blogs/cbrumme/lifetime-gc-keepalive-handle-recycling

Optionally, the en-us/ segment of the docs.microsoft.com URLs could be omitted to allow automatic selection of localized versions for anyone navigating to the links.

Copy link
Contributor

@brendanzagaeski brendanzagaeski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ha, I just noticed something that might be good to fix up before merging this PR. I got the example marshaling code for the MaterialButton constructor by building the AndroidX bindings, and for that build I was actually using the installer package from the xamarin-android PR that includes the change from this Java.Interop PR: dotnet/android#5136.

For other generated methods in the output, GC.KeepAlive() is included as expected:

public virtual unsafe void AddOnCheckedChangeListener (global::Google.Android.Material.Button.MaterialButton.IOnCheckedChangeListener listener)
{
	const string __id = "addOnCheckedChangeListener.(Lcom/google/android/material/button/MaterialButton$OnCheckedChangeListener;)V";
	try {
		JniArgumentValue* __args = stackalloc JniArgumentValue [1];
		__args [0] = new JniArgumentValue ((listener == null) ? IntPtr.Zero : ((global::Java.Lang.Object) listener).Handle);
		_members.InstanceMethods.InvokeVirtualVoidMethod (__id, this, __args);
	} finally {
		global::System.GC.KeepAlive (listener);
	}
}

But it's not yet being added for the MaterialButton constructors:

// Metadata.xml XPath constructor reference: path="/api/package[@name='com.google.android.material.button']/class[@name='MaterialButton']/constructor[@name='MaterialButton' and count(parameter)=3 and parameter[1][@type='android.content.Context'] and parameter[2][@type='android.util.AttributeSet'] and parameter[3][@type='int']]"
[Register (".ctor", "(Landroid/content/Context;Landroid/util/AttributeSet;I)V", "")]
public unsafe MaterialButton (global::Android.Content.Context context, global::Android.Util.IAttributeSet attrs, int defStyleAttr)
	: base (IntPtr.Zero, JniHandleOwnership.DoNotTransfer)
{
	const string __id = "(Landroid/content/Context;Landroid/util/AttributeSet;I)V";

	if (((global::Java.Lang.Object) this).Handle != IntPtr.Zero)
		return;

	try {
		JniArgumentValue* __args = stackalloc JniArgumentValue [3];
		__args [0] = new JniArgumentValue ((context == null) ? IntPtr.Zero : ((global::Java.Lang.Object) context).Handle);
		__args [1] = new JniArgumentValue ((attrs == null) ? IntPtr.Zero : ((global::Java.Lang.Object) attrs).Handle);
		__args [2] = new JniArgumentValue (defStyleAttr);
		var __r = _members.InstanceMethods.StartCreateInstance (__id, ((object) this).GetType (), __args);
		SetHandle (__r.Handle, JniHandleOwnership.TransferLocalRef);
		_members.InstanceMethods.FinishCreateInstance (__id, this, __args);
	} finally {
	}
}

Google.Android.Material.Button.MaterialButton.cs.txt

@BrzVlad
Copy link
Member

BrzVlad commented Sep 19, 2020

This looks good, there are some excessive KeepAlive's. I agree with @jonpryor that we don't need the keep alive for non-ref types, or arrays of non-ref. Overdoing it is not the end of the world, since the overhead is tiny, but I would add a comment somewhere if we are being conservative about it.

@jpobst
Copy link
Contributor Author

jpobst commented Sep 19, 2020

@brendanzagaeski Good catch! It looks like constructors use a different code path. I can add that on Monday.

@jonpryor
Copy link
Member

Replying before I update the commit message

@brendanzagaeski wrote:

it seemed like the conversion to a weak global reference and back to a plain (strong) global reference all happened on the first GC right after the problematic type was instantiated

Insufficient information provided; it's impossible to tell. All we know is that the GREF +g+ is created outside of a GC, and that the +w+/-g- is done within the MonoGCBridgeCallbacks::cross_references callback…which is also "outside" of a GC, but after part of the GC work has been performed. We have no way of knowing how many GC's occurred between the +g+ and the +w+ without enabling Mono's GC logging (export MONO_LOG_MASK=gc, etc.), which the original log file didn't contain.

My (incorrect?) understanding/asumption is that the GC bridge is built upon/reuses some of the infrastructure for finalizable objects, and that -- like finalizable objects -- bridged objects are first collected "normally", and when they are considered to be garbage/unreferenced, they are placed on a "finalization queue", and it is during the a subsequent GC which processes the finalization queue that the GC bridge callback is invoked.

@BrzVlad: Is the above correct? Are bridgeable objects "like" finalizable objects? Furthermore, considering that e.g. Java.Lang.Object has a finalizer, is that "quasi-moot", in that -- as they have a finalizer -- they need to be treated like finalizable objects anyway?

Example log lines to illustrate, with an initial gref handle value of 0x26f6:

The provided example does have Mono's GC log messages enabled, which is handy, and further suggests that I'm wrong (yay?), and that the bridge callback is invoked as part of the first GC, and not part of a "second" "process the finalization queue" GC.

That said, Mono's GC messages don't appear to print when the GC started, only after it's completed, which is a tad irksome.

@BrzVlad
Copy link
Member

BrzVlad commented Sep 21, 2020

Since I have worked only sporadically with bridge code, take what I say with a grain of salt. I haven't tested this in practice, but the flow should be as follows :

There are two objects OC#, OJava. OC# always has a finalizer. Let's say both objects are no longer reachable.

Mono GC1 occurs, OC# is seen as being dead, but, because it is bridge object, it is resurrected and part of the bridge objects whose liveness needs to be checked on java side. World is resumed, Gref for OJava is switched to weak ref. Collection is triggered on Java side. OJava is dead after collection, Mono GC is informed that OC# is technically no longer a bridge object (meaning java no longer keeps this object alive).

Mono GC2 occurs, OC# is seen as being dead, but, because it has a finalizer, it is resurrected and its finalizer is enqueued.

Mono GC3 occurs. At this point if time, OC# is probably dead for good, its memory is reclaimed.

In conclusion, java object can die immediately after C# object is initially not reachable, while the C# object will survive a few more GC iterations, since it gets resurrected.

@jpobst
Copy link
Contributor Author

jpobst commented Sep 21, 2020

Added a commit to:

  • Exclude GC.KeepAlive for java.lang.String.
  • Add KeepAlives for parameters in constructors.

Note: For constructors that take a Java.Lang.ICharSequence parameter we add an additional constructor overload that takes a string instead. For 16.8, these overloaded string constructors will generate a GC.KeepAlive because we'll need to do some deeper changes to fix that. I will make these changes for the master version.

@jonpryor jonpryor merged commit 79d9533 into d16-8 Sep 21, 2020
@jonpryor jonpryor deleted the keep-alive branch September 21, 2020 18:14
@jpobst jpobst added this to the 11.0 (16.8 / 8.8) milestone Sep 22, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Apr 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants