This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Ensure threads are merged when tearing down the Rasterizer #19919

Merged
merged 20 commits into from
Aug 19, 2020


cyanglaz
Contributor

@cyanglaz cyanglaz commented Jul 21, 2020

Description

This PR fixes the deadlock in Shell::OnPlatformViewDestroyed when dynamic thread merging is enabled.
With this fix, we make sure the threads are merged before we tear down the rasterizer.
We also make sure the threads are unmerged after the rasterizer is torn down, so the surface can be re-created on the raster thread.

go/flutter-thread-merging-rasterizer-teardown
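The teardown order described above (merge, tear down, unmerge) can be sketched with standard C++ threading primitives. This is a minimal illustration, not the engine code: SketchMerger, TeardownWithMergedThreads, and the string log are hypothetical, and a std::thread stands in for posting a task to the raster task runner.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the engine's RasterThreadMerger. It only tracks a
// "merged" flag under a mutex so one thread can block until another merges.
class SketchMerger {
 public:
  void Merge() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      merged_ = true;
    }
    cv_.notify_all();
  }

  void Unmerge() {
    std::lock_guard<std::mutex> lock(mutex_);
    merged_ = false;
  }

  void WaitUntilMerged() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return merged_; });
  }

  bool IsMerged() {
    std::lock_guard<std::mutex> lock(mutex_);
    return merged_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool merged_ = false;
};

// Hypothetical ordering of the fix, run on the "platform" thread.
// `log` records the merge state observed at teardown and after unmerging.
void TeardownWithMergedThreads(SketchMerger& merger,
                               std::vector<std::string>& log) {
  // Stand-in for posting the merge request to the raster thread.
  std::thread raster([&merger] { merger.Merge(); });
  merger.WaitUntilMerged();  // Block until the merge is visible here.
  log.push_back(merger.IsMerged() ? "teardown while merged"
                                  : "teardown while unmerged");
  merger.Unmerge();  // Lets the surface be re-created on the raster thread.
  log.push_back(merger.IsMerged() ? "still merged" : "unmerged");
  raster.join();
}
```

Because WaitUntilMerged() only returns after Merge() has set the flag, the teardown step in this sketch always observes the merged state, and the unmerge always happens afterwards.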

TODO:
1. remove logging code.
2. add tests.

Related Issues

Fixes flutter/flutter#57067, flutter/flutter#23975

Tests

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process.

  • I read the contributor guide and followed the process outlined there for submitting PRs.
  • I signed the CLA.
  • I read and followed the C++, Objective-C, Java style guides for the engine.
  • I read the tree hygiene wiki page, which explains my responsibilities.
  • I updated/added relevant documentation.
  • All existing and new tests are passing.
  • I am willing to follow-up on review comments in a timely manner.

Breaking Change

Did any tests fail when you ran them? Please read handling breaking changes.

@flutter-dashboard

It looks like this pull request may not have tests. Please make sure to add tests before merging. If you need an exemption to this rule, contact Hixie on the #hackers channel in Chat.

Reviewers: Read the Tree Hygiene page and make sure this patch meets those guidelines before LGTMing.

@cyanglaz
Contributor Author

@iskakaushik @chinmaygarde I haven't documented the new APIs in this PR nor have I added tests yet. I wanted to run this solution by you guys first. Let me know if you see obvious drawbacks in this solution.

@auto-assign auto-assign bot requested a review from liyuqian July 21, 2020 19:19
@cyanglaz cyanglaz requested a review from blasten July 21, 2020 19:21
@blasten

blasten commented Jul 22, 2020

direction LGTM. How are you planning to test it?

@cyanglaz
Contributor Author

direction LGTM. How are you planning to test it?

I think we can test this using ShellTests (https://github.com/flutter/engine/blob/master/shell/common/shell_test.h).
I just haven't wired it up yet, because I wanted to get approval of the general direction of this solution first.

@chinmaygarde chinmaygarde self-requested a review July 27, 2020 21:17
@chinmaygarde chinmaygarde added the Work in progress (WIP) Not ready (yet) for review! label Jul 30, 2020
task_runners_.GetPlatformTaskRunner()) {
return true;
}
if (surface_ == nullptr || raster_thread_merger_.get() == nullptr) {

In what case is surface_ == nullptr?

Contributor Author


If the rasterizer has already been torn down, surface_ is null, and in that case we shouldn't try to merge the threads or wait for them to be merged.
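A minimal sketch of the guard discussed here, using simplified stand-in types. FakeSurface, FakeMerger, FakeRasterizer, RequestMerge, and the bool return value are all hypothetical, not engine APIs; only the member names surface_ and raster_thread_merger_ echo the snippet above. Once Teardown() has reset the surface, the merge request is skipped instead of being issued and waited on.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; illustrative only.
struct FakeSurface {};

struct FakeMerger {
  int merge_requests = 0;
  void RequestMerge() { ++merge_requests; }
};

struct FakeRasterizer {
  std::unique_ptr<FakeSurface> surface_;
  std::shared_ptr<FakeMerger> raster_thread_merger_;

  // Returns false and does nothing when there is nothing to merge for:
  // the rasterizer was already torn down (surface_ is null) or there is no
  // merger (for example, when dynamic thread merging is not in use).
  bool EnsureThreadsAreMerged() {
    if (surface_ == nullptr || raster_thread_merger_ == nullptr) {
      return false;
    }
    raster_thread_merger_->RequestMerge();
    return true;
  }

  // Tearing down releases the surface, which disables later merge requests.
  void Teardown() { surface_.reset(); }
};
```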

Contributor Author

@cyanglaz cyanglaz left a comment


@blasten I updated the PR with your comments.


@blasten

blasten commented Aug 12, 2020

EmbedderTest.PushingMutlipleFramesSetsUpNewRecordingCanvasWithoutCustomCompositor test failed in the last run

Comment on lines 437 to 440
/// @attention If raster and platform task runners are not the same or not
/// merged. This method will try to merge the task runners and
/// might block the current thread and wait until the 2 task
/// runners are merged.

nit: run-on sentence

Contributor Author


done


@blasten blasten left a comment


LGTM + just minor nits. Excited to give this a try!

@@ -26,6 +32,15 @@ void RasterThreadMerger::MergeWithLease(size_t lease_term) {
is_merged_ = task_queues_->Merge(platform_queue_id_, gpu_queue_id_);
lease_term_ = lease_term;
}
merged_condition_.notify_one();
Contributor


grab lock.
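One reading of the "grab lock." nit, sketched with standard C++ primitives: the state the waiter checks is written while holding the same mutex that WaitUntilMerged() uses, so the waiter cannot test the predicate, miss the update, and then sleep through the notification. LeasedMerger and its members are hypothetical; only the names MergeWithLease, is_merged_, lease_term_, and merged_condition_ echo the snippets in this PR.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical sketch, not the engine's RasterThreadMerger.
class LeasedMerger {
 public:
  void MergeWithLease(std::size_t lease_term) {
    {
      // Update the predicate under the mutex the waiter holds while checking it.
      std::lock_guard<std::mutex> lock(mutex_);
      is_merged_ = true;
      lease_term_ = lease_term;
    }
    // Notifying after releasing the lock is fine once the state is published.
    merged_condition_.notify_one();
  }

  void WaitUntilMerged() {
    std::unique_lock<std::mutex> lock(mutex_);
    // The predicate overload also re-checks after spurious wakeups.
    merged_condition_.wait(lock, [this] { return is_merged_; });
  }

  std::size_t LeaseTerm() {
    std::lock_guard<std::mutex> lock(mutex_);
    return lease_term_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable merged_condition_;
  bool is_merged_ = false;
  std::size_t lease_term_ = 0;
};
```

Holding the lock during notify_one() itself is optional; what matters for correctness is that the flag is written under the mutex.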

@@ -55,7 +67,9 @@ class RasterThreadMerger
fml::TaskQueueId gpu_queue_id_;
fml::RefPtr<fml::MessageLoopTaskQueues> task_queues_;
std::atomic_int lease_term_;
- bool is_merged_;
+ std::atomic_bool is_merged_;
Contributor


Can this be replaced by lease_term_ > 0?

Contributor Author


done
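A minimal sketch of the suggestion above, assuming a merger whose only merge state is an atomic lease counter: IsMerged() is derived from lease_term_ > 0, so there is no separate flag that could drift out of sync with the lease. LeaseOnlyMerger and DecrementLease() are hypothetical names, not the engine's API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sketch: "merged" is derived from the lease instead of being
// stored in a separate is_merged_ flag.
class LeaseOnlyMerger {
 public:
  void MergeWithLease(int lease_term) { lease_term_.store(lease_term); }

  // Decrements the lease without letting it go below zero. On failure,
  // compare_exchange_weak reloads `current`, so the loop retries with the
  // latest value if another thread changed it in between.
  void DecrementLease() {
    int current = lease_term_.load();
    while (current > 0 &&
           !lease_term_.compare_exchange_weak(current, current - 1)) {
    }
  }

  bool IsMerged() const { return lease_term_.load() > 0; }

 private:
  std::atomic_int lease_term_{0};
};
```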

Contributor Author

@cyanglaz cyanglaz left a comment


@iskakaushik @blasten Updated with your suggestions and also added some unit tests for raster_thread_merger.


@cyanglaz
Contributor Author

Some tests are failing because raster_thread_merger runs on platforms that don't support it; see flutter/flutter#38844.
I've created #20487 to fix that. It needs to land before the tests here can pass.

@@ -246,7 +246,7 @@ bool MessageLoopTaskQueues::Unmerge(TaskQueueId owner) {
bool MessageLoopTaskQueues::Owns(TaskQueueId owner,
TaskQueueId subsumed) const {
std::lock_guard guard(queue_mutex_);
- return subsumed == queue_entries_.at(owner)->owner_of || owner == subsumed;
+ return subsumed == queue_entries_.at(owner)->owner_of;

nit: Would you mind adding a TODO, and filing an issue about reverting this logic once Android is fixed?

Contributor Author


We might not actually need to revert this, as I think it makes sense. Waiting for @iskakaushik to confirm.

Contributor


Yup, this is good.
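The behavioral difference in this Owns() change can be modeled with simplified types. This is a hypothetical model, not fml::MessageLoopTaskQueues: TaskQueueId is a plain int here, kUnmerged is an invented sentinel, and QueueEntry keeps only owner_of. As we read the diff, the old condition reported a queue as owning itself, so with identical platform and raster queues an Owns()-based check could report a merge that never happened; the new condition reports ownership only after an actual merge.

```cpp
#include <cassert>
#include <map>

using TaskQueueId = int;            // Hypothetical: the real type is a class.
constexpr TaskQueueId kUnmerged = -1;  // Hypothetical "owns nothing" marker.

struct QueueEntry {
  TaskQueueId owner_of = kUnmerged;  // Queue this one has subsumed, if any.
};

// Before the change: a queue was also reported as owning itself.
bool OwnsBefore(const std::map<TaskQueueId, QueueEntry>& entries,
                TaskQueueId owner, TaskQueueId subsumed) {
  return subsumed == entries.at(owner).owner_of || owner == subsumed;
}

// After the change: ownership only reflects an actual merge.
bool OwnsAfter(const std::map<TaskQueueId, QueueEntry>& entries,
               TaskQueueId owner, TaskQueueId subsumed) {
  return subsumed == entries.at(owner).owner_of;
}
```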

fml::AutoResetWaitableEvent term_platform;
fml::AutoResetWaitableEvent latch_wait_until_merged;
std::thread thread_platform([&]() {
TEST(RasterThreadMerger, HandleTaskQueuesAreTheSame) {

nit: missing checks for ASSERT_TRUE(raster_thread_merger_->TaskQueuesAreSame()), and ASSERT_FALSE(raster_thread_merger_->TaskQueuesAreSame()) in a separate TEST.

Contributor Author


I think we should make raster_thread_merger_->TaskQueuesAreSame() private. Users of raster_thread_merger shouldn't need to know whether the merging is static or dynamic, so the tests only need to check whether the threads are merged.
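A sketch of that API shape, under the assumption that queue IDs are plain ints and a positive lease means a dynamic merge is active: TaskQueuesAreSame() stays private, and the public IsMerged() covers both the static case (same queue) and the dynamic case (active lease), which is all a test would need to check. MergerFacade is hypothetical, not the engine's class.

```cpp
#include <cassert>

// Hypothetical sketch: callers only ask whether the threads are merged,
// without knowing whether the merge is static or dynamic.
class MergerFacade {
 public:
  MergerFacade(int platform_queue_id, int raster_queue_id)
      : platform_queue_id_(platform_queue_id),
        raster_queue_id_(raster_queue_id) {}

  // A lease only matters when the queues are distinct.
  void MergeWithLease(int lease_term) {
    if (!TaskQueuesAreSame()) {
      lease_term_ = lease_term;
    }
  }

  bool IsMerged() const { return TaskQueuesAreSame() || lease_term_ > 0; }

 private:
  bool TaskQueuesAreSame() const {
    return platform_queue_id_ == raster_queue_id_;
  }

  int platform_queue_id_;
  int raster_queue_id_;
  int lease_term_ = 0;
};
```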


@blasten blasten left a comment


LGTM + some minor nits

Contributor

@iskakaushik iskakaushik left a comment


LGTM

if (engine) {
engine->OnOutputSurfaceDestroyed();
}
// Step 1: Next, tell the raster thread that its rasterizer should suspend
// access to the underlying surface.
if (should_post_raster_task) {

@cyanglaz I was thinking about this. Sorry for all these questions:

  1. Is it possible that should_post_raster_task is true at this point even though it should be false?
  2. Why isn't this checking a global shared state (such as RasterThreadMerger::IsMerged()) instead of a local state?
  3. Could this be the reason why we need to merge the threads to fix the race condition?

Contributor Author


  1. should_post_raster_task only exists for static thread merging. Its value is always true with dynamic thread merging.
  2. I'm not sure why it was a local variable, but global state would also work.
  3. Yes, this is why we need to merge the threads here. Thread merging happens on the raster thread while this code runs on the platform thread, so we cannot know whether the threads are merged.


If the platform thread is blocked at this point waiting on the latch, and the raster queue is running on the platform thread, can we release the latch and indicate that the rasterizer should be torn down right after the latch is awaited?

Labels
cla: yes, platform-ios, waiting for tree to go green (This PR is approved and tested, but waiting for the tree to be green to land.)
Development

Successfully merging this pull request may close these issues.

Shell:OnPlatformViewDestroyed not working with thread merging
6 participants