[release/9.0] Fix edge cases in Tarjan GC bridge (Android) #114391
…otnet#112970) Fix typo in GC bridge comparison message (SCCS -> XREFS)
* [mono][sgen] Fix DUMP_GRAPH debug option build for tarjan bridge
* [mono][sgen] Don't create ScanData* during debug dumping of SCCs
  It serves no purpose and it would later crash the runtime since we didn't patch the lockword back in place.
* [mono][sgen] Fix some null deref crashes in DUMP_GRAPH debug option
* [mono][tests] Add bridge tests
  These are ported from some of the bridge tests we had on mono/mono. In order to test them we compare between the output of the new and the tarjan bridge.
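
The comparison between the new and the tarjan bridge mentioned in the last commit can be pictured roughly as below. This is only a sketch, assuming each bridge reports a list of SCCs (as lists of object names) plus xrefs as pairs of SCC indices, and that the two may number their SCCs differently. The names `normalize` and `same_bridge_result` are hypothetical and are not taken from the runtime tests.

```python
def normalize(sccs, xrefs):
    """Turn one bridge's output into an order-independent form.

    sccs:  list of lists of object names (one inner list per SCC)
    xrefs: list of (src_index, dst_index) pairs indexing into sccs
    """
    canon_sccs = [frozenset(s) for s in sccs]
    # Replace SCC indices by the SCC contents so numbering no longer matters.
    canon_xrefs = {(canon_sccs[src], canon_sccs[dst]) for src, dst in xrefs}
    return frozenset(canon_sccs), frozenset(canon_xrefs)


def same_bridge_result(a, b):
    """True if two (sccs, xrefs) outputs describe the same graph."""
    return normalize(*a) == normalize(*b)


# Hypothetical outputs: same graph, different SCC numbering and member order.
out_new = ([["x", "y"], ["z"]], [(0, 1)])
out_tarjan = ([["z"], ["y", "x"]], [(1, 0)])
# Hypothetical faulty output: the xref between the two SCCs is missing.
out_lossy = ([["z"], ["y", "x"]], [])

print(same_bridge_result(out_new, out_tarjan))  # True
print(same_bridge_result(out_new, out_lossy))   # False
```

Comparing contents rather than indices means a lost xref shows up as a mismatch even when the SCC counts agree.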
…formation (dotnet#112825)
* Fix an edge case in the Tarjan SCC that led to losing xref information
  In the Tarjan SCC bridge processing there's a color graph used to find out connections between SCCs. There was a rare case which only manifested when a cycle in the object graph points to another cycle that points to a bridge object. We only recognized direct bridge pointers but not pointers to other non-bridge SCCs that in turn point to bridges and where we already calculated the xrefs. These xrefs were then lost.
* Add test case to sgen-bridge-pathologies and add an assert to catch the original bug
* Add review
---------
Co-authored-by: Vlad Brezae <[email protected]>
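
The graph shape described in this commit (bridge, then a cycle, then another cycle, then a bridge) can be reproduced in a small model. The sketch below is not the runtime's color graph: it runs a textbook Tarjan SCC pass, condenses the graph, and then collects the bridge SCCs each bridge SCC can reach through non-bridge SCCs. The `direct_only` flag is a deliberately crude stand-in for the failure mode (it refuses to look through any non-bridge SCC), and all function and object names are hypothetical.

```python
from collections import defaultdict


def tarjan_scc(graph):
    """Map every node to the id of its strongly connected component."""
    index, low, scc_of = {}, {}, {}
    stack, on_stack = [], set()
    counter = [0, 0]  # [next DFS index, next SCC id]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc_of[w] = counter[1]
                if w == v:
                    break
            counter[1] += 1

    for v in graph:
        if v not in index:
            visit(v)
    return scc_of


def bridge_xrefs(graph, bridges, direct_only=False):
    """Return {(src_bridge, dst_bridge)} pairs between bridge SCCs."""
    scc_of = tarjan_scc(graph)
    bridge_sccs = {scc_of[b] for b in bridges}
    # One representative bridge name per bridge SCC, for readable output.
    rep = {s: min(b for b in bridges if scc_of[b] == s) for s in bridge_sccs}

    succ = defaultdict(set)  # edges of the condensed (SCC-level) graph
    for v, targets in graph.items():
        for w in targets:
            if scc_of[v] != scc_of[w]:
                succ[scc_of[v]].add(scc_of[w])

    memo = {}

    def reachable_bridges(s):
        if s not in memo:
            found = set()
            for t in succ[s]:
                if t in bridge_sccs:
                    found.add(t)
                elif not direct_only:
                    # Reuse the xrefs already computed for the non-bridge SCC.
                    found |= reachable_bridges(t)
            memo[s] = found
        return memo[s]

    return {(rep[s], rep[t]) for s in bridge_sccs for t in reachable_bridges(s)}


# b1 -> cycle (a1, a2) -> cycle (c1, c2) -> b2
graph = {
    "b1": ["a1"],
    "a1": ["a2"],
    "a2": ["a1", "c1"],
    "c1": ["c2"],
    "c2": ["c1", "b2"],
    "b2": [],
}
bridges = {"b1", "b2"}

print(bridge_xrefs(graph, bridges))                    # {('b1', 'b2')}
print(bridge_xrefs(graph, bridges, direct_only=True))  # set()
```

In this model the xref from `b1` to `b2` only survives when the result computed for an intermediate non-bridge SCC is carried forward, which is the property the commit restores.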
…duplication (dotnet#113044)
* [SGen/Tarjan] Handle edge case with node heaviness changing due to deduplication
  Do early deduplication
  Fix Windows build
  Add test cases to sgen-bridge-pathologies
* Move test code
* Remove old code
* Add extra check (no change to functionality)
lgtm. we will take for consideration in 9.0.x
@filipnavara there's a failure in the new test on wasm:
Not sure why it doesn't happen in main
Would it make sense to disable the test on WASM to unblock us? WASM doesn't use the GC bridge anyway. (I am currently in the middle of moving our office so I don't know when I will be able to address it.)
Sure, I can handle it.
Friendly reminder that code complete for the May release is next Monday, April 14th. If you want this change included in that release, please merge the PR before EOD Monday.
/ba-g failures in the wasm tests are unrelated, it doesn't even compile this code.
Merged commit 703efd5 into dotnet:release/9.0-staging
I can no longer reproduce the original bug reported by a customer in #115611 using their repro project with the latest changes.
Backport of #112825, #112970, #113044 and #113703 to release/9.0-staging
/cc @BrzVlad @vitek-karas
Customer Impact
Two customer issues on Android were traced back to bugs in the MonoVM Tarjan GC bridge responsible for bridging .NET and Java garbage collectors.
In the first issue (#115611) the runtime failed to report edges to the Android GC bridge code, resulting in live objects being collected in the Java runtime and their .NET peers throwing `ObjectDisposedException`. The particular pattern could be triggered by code using `await` on the UI thread where the continuation was scheduled to run through the synchronization context on the same thread.
In the second issue (#106410) the runtime crashed on an assertion for object graphs of a certain shape. The cause of the crash was an incorrect assumption about duplicate edges in a compressed object graph and how later stages of the algorithm process them. When the later stages deduplicated the edges, some of the graph properties could change in an unforeseen way, leading to a logic error in the algorithm.
Regression
Both of these bugs have existed in the .NET MonoVM / Android code since its inception. Legacy Xamarin.Android offered 3 different bridge algorithms (`old`, `new`, and `tarjan`), with the `new` bridge being the default for a while. The default was changed to `tarjan` before MonoVM was merged into .NET.
Testing
Test cases reproducing both issues were presented and verified against the fix. Additionally, new test code was introduced to specifically exercise the problematic patterns as part of the .NET runtime testing infrastructure.
Risk
Medium - the changes simply make reporting the graph more accurate. Even though we have high confidence in the changes, we recognize that they touch a sensitive area.