Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Avoid capturing this unsafely in MultiFrameCodec #16824

Merged
merged 11 commits into from
Mar 10, 2020

Conversation

dnfield
Copy link
Contributor

@dnfield dnfield commented Feb 27, 2020

fixes flutter/flutter#13359

Creates a State object to share as a shared_ptr instead of capturing this. I don't believe we can use a fml::WeakPtr, since we would have to create it on the IO task runner, which might be blocked until after the MultiFrameCodec is destroyed on the UI runner.

Avoids decoding the image if the collect happens before we start image decoding - this seems like the right thing to do to me, but in honesty if I don't do this the test is segfaulting when I try to drain the unref queue and I'm not quite sure why.

Moves some of the logic that's sharable from the Dart Isolate unit tests to a common file for use by the image decoder tests.

@Qxyat
Copy link
Contributor

Qxyat commented Mar 4, 2020

We encounter a problem which is the app crash at the time the engine exit due to MultiFrameCodec share between ui and io thread. When this one will land?

@dnfield
Copy link
Contributor Author

dnfield commented Mar 4, 2020

Ahh this was passing on macOS but not Linux.

I'll fix up the test harness and then it just needs a review.

@dnfield dnfield requested a review from liyuqian March 4, 2020 17:26
@dnfield
Copy link
Contributor Author

dnfield commented Mar 4, 2020

Test harness was missing a config to export the needed symbols on Linux - done and should all be passing now.

@@ -557,5 +612,87 @@ TEST(ImageDecoderTest, VerifySubpixelDecodingPreservesExifOrientation) {
assert_image(decode({}, 100));
}

TEST_F(ImageDecoderFixtureTest,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is pretty big, it can't be groked with a passing perusal. I recommend adding a docstring at the beginning of the test, or pulling out some helper functions so that the gist can be parsed quicker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to the top. I'm not quite sure how to refactor this into smaller functions while still preserving the ordering that's needed, but maybe when I get in we can talk about it some more offline.

latch.Wait();

// Destroy the IO manager
runners.GetIOTaskRunner()->PostTask([&]() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is alot of work here to serialize async operations with latches. It would be easier to follow if you made a PerformSyncTask function that takes in a task runner and a lambda and doesn't complete until the task runner finishes the lambda.

I've tried to add this in the past and have met resistance, maybe for tests at least people will be accepting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is idiomatic code for how the rest of the engine repo does it. I'd be hesitant to add a new test-only class for this that itself needs to be tested...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boilerplate that detracts from the point shouldn't be encouraged. I wasn't think a new class I was thinking this:

void PerformSyncTask(TaskRunner runner, Function func) {
  Latch latch;
  runner->PerformTask({
    func();
    latch.signal();
  });
  latch.wait();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The synchronization work here is also highly constructed to simulate collecting the engine while IO work is ongoing in a repeatable, non-flaky way. I'm not sure it would help to have something like PerformSyncTask, since I need to specifically interleave certain actions. I hope the comment I added at the top clarifies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this specific test, I already have the latch and would need to have it anyway. The added function would only save a line or two of code here :\

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same thing could be said of for-loops versus foreach-loops. It isn't about LOC, its about avoiding errors, extra variables, and having a meaningful abstraction with which to build off of.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - and I think such a decision on style is something we should follow consistently in the repo.

I'm not opposed to making this change so much as opposed to doing it differently just here, or even just in this test file, compared to the way we do it everywhere else. It'd be easier to just refactor all the callsites of this if they're all pretty similar, rather than doing it specially here.

fml::WeakPtr<GrContext> resourceContext,
fml::RefPtr<flutter::SkiaUnrefQueue> unref_queue,
size_t trace_id);
struct State {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class docstring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. It's probably not in proper doxygen format, but this is a private class anyway.

@@ -0,0 +1,198 @@
// Copyright 2013 The Flutter Authors. All rights reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be a a big refactor. It would be easier to review and manage if this was split into a different PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll split this into a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dnfield dnfield mentioned this pull request Mar 4, 2020
tonic::DartState::Current(), callback_handle),
this, trace_id, ui_task_runner = task_runners.GetUITaskRunner(),
io_manager = dart_state->GetIOManager()]() mutable {
GetNextFrameAndInvokeCallback(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've seen this misuse of this in other places. The pattern to fix it in most places seems to be sending in an fml::WeakPtr, and check its validity before dereference so another thread can safely destruct the object. I wonder if there's any particular reason why we can't use fml::WeakPtr here, and have to create a new State object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't use a WeakPtr here because we have to access the object across threads, and WeakPtr is not thread safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we can't create the weak ptr on the IO task runner because our test case would fail - e.g. if the IO runner is blocked, and the object gets collected before the task to create the weak ptr runs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liyuqian: While weak pointers can be copied safely across threads, they may only be used on the thread on which their factories live. And weak pointer factories can only be created and collected on a single thread. In this case, this is the UI thread. Your intuition is absolutely correct but only in cases where the object will only be used on the one thread. This is not one of those cases.

// don't need to be explicitly supplied by the embedder. In AOT, these
// snapshots will be present in the application AOT dylib.
if (DartVM::IsRunningPrecompiledCode()) {
PrepareSettingsForAOTWithSymbols(settings, aot_symbols_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: FML_CHECK the result here. We don't want to fail later if for some reason we mess up symbol packaging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -14,7 +20,56 @@
namespace flutter {
namespace testing {

using ImageDecoderFixtureTest = ThreadTest;
class ImageDecoderFixtureTest : public ThreadTest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Consider putting this in its own TU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// runner.
// - Unlatches the IO task runner
auto settings = CreateSettingsForFixture();
settings.leak_vm = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe configure these in CreateSettingsForFixture?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

auto gif_codec = SkCodec::MakeFromData(gif_mapping);
ASSERT_TRUE(gif_codec);

auto loop = fml::ConcurrentMessageLoop::Create();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't being used anywhere. Also, one will be created for you when you instantiate the VMRef. Just getting rid of it is fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed - this was copypasted from another test and I missed dropping it during refactoring.

fml::RefPtr<MultiFrameCodec> codec;
EXPECT_TRUE(isolate->RunInIsolateScope([&]() -> bool {
Dart_Handle library = Dart_LookupLibrary(Dart_NewStringFromCString(
"package:engine_root/ui/fixtures/ui_test.dart"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the test depend on the path of the fixture which would be brittle. Doesn't Dart_RootLibrary achieve the same thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, I didn't know about Dart_RootLibrary. Done.

tonic::DartState::Current(), callback_handle),
this, trace_id, ui_task_runner = task_runners.GetUITaskRunner(),
io_manager = dart_state->GetIOManager()]() mutable {
GetNextFrameAndInvokeCallback(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liyuqian: While weak pointers can be copied safely across threads, they may only be used on the thread on which their factories live. And weak pointer factories can only be created and collected on a single thread. In this case, this is the UI thread. Your intuition is absolutely correct but only in cases where the object will only be used on the one thread. This is not one of those cases.

const std::unique_ptr<SkCodec> codec_;
const int frameCount_;
const int repetitionCount_;
int nextFrameIndex_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While using shared pointers solves the lifecycle issues of State, it does not prevent data races due to concurrent access of state fields. Thats why non-const members of such objects are usually guarded by a mutex. In this case, auditing all uses seems to indicate that that all non-const members (except live_) are only read and updated on the IO thread. So it's safe but not obvious how.

Can we group these non const members together with a comment that makes it clear that these will only be read and updated on the IO thread? If we want to be paranoid, we could use a mutex but that is probably not necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I'd like to avoid the mutex since this is a potentially performance critical path, e.g. for an animated image firing on every vsync.

state = state_, trace_id,
ui_task_runner = task_runners.GetUITaskRunner(),
io_manager = dart_state->GetIOManager()]() mutable {
state->GetNextFrameAndInvokeCallback(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The live_ variable is unnecessary. You can copy state via a weak_ptr instead of a shared_ptr. Here, you can attempt a lock() and if that fails, you can do what you are doing in line 144.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can do what you are doing in line 144.

Wait. Why are we even invoking the next frame callback on a muli-frame codec that has died (which would be indicated by the weak pointer lock failing)?

Instead of invoking the next frame callback when the weak pointer lock fails, scheduled a task onto the UI thread that just releases the callback but not invoke it (that has Dart state and you cant just drop it on the IO thread).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, variable that check for "live"-ness are usually always an anti-pattern as the information can be invalid immediately after the check. There are valid use cases like double checked locks of course.

In this case, it seems it was an optimization I suppose. But again, the same can be achieved using weak_ptrs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on where it's safe to create the weak_ptr for the State. I can't do it on the UI thread, and by the time I get to the IO thread the UI object might have been collected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh you mean a std::weak_ptr. I get it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I was thinking we had to still make the callback because some Dart code might be awaiting the completion of the callback, but now that I think about it that doesn't make sense, since the only way this can happen is if the engine is collected. If anyone is actually awaiting the callback, the codec must still be alive. I'm correct in that thinking, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - I believe

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only question is whether we need to guard a bit more so that if we get collected after this (e.g. we're still alive before we tell Skia to decode a frame, but we're dead after Skia finishes) - will this still be safe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock will fail in that case.

@dnfield dnfield added the waiting for tree to go green This PR is approved and tested, but waiting for the tree to be green to land. label Mar 10, 2020
@fluttergithubbot
Copy link
Contributor

This pull request is not suitable for automatic merging in its current state.

  • The status or check suite Linux Host Engine has failed. Please fix the issues identified (or deflake) before re-applying this label.
  • The status or check suite Mac Web Engine has failed. Please fix the issues identified (or deflake) before re-applying this label.

@fluttergithubbot fluttergithubbot removed the waiting for tree to go green This PR is approved and tested, but waiting for the tree to be green to land. label Mar 10, 2020
@dnfield
Copy link
Contributor Author

dnfield commented Mar 10, 2020

I've run this test with the same random seed that flaked over 100 times on both mac and linux locally in profile mode - it hasn't failed. I'm retrying the bot here: https://ci.chromium.org/p/flutter/builders/try/Linux%20Host%20Engine/4950. If it succeeds I plan to land it.

The mac web failure is a flake that has nothing to do with this PR.

@dnfield
Copy link
Contributor Author

dnfield commented Mar 10, 2020

FWIW, the flake on Linux host_profile didn't even happen in the new test or codepath touched by this PR :\

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 10, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 10, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Mar 11, 2020
jason-simmons added a commit to jason-simmons/flutter that referenced this pull request Mar 11, 2020
2020-03-11 [email protected] Roll src/third_party/dart 4093d08271f6..37530145ff53 (4 commits) (flutter/engine#17090)
2020-03-11 [email protected] Roll src/third_party/skia bf355123ae3b..0340292972b9 (9 commits) (flutter/engine#17089)
2020-03-11 [email protected] Roll fuchsia/sdk/core/mac-amd64 from r_oCI... to 0Z8VF... (flutter/engine#17087)
2020-03-11 [email protected] Roll fuchsia/sdk/core/linux-amd64 from v32mJ... to X3Xm2... (flutter/engine#17086)
2020-03-11 [email protected] Remove the unused method on iOS surface to make the resource context current. (flutter/engine#17084)
2020-03-11 [email protected] Revert "Add support for the Metal backend on all iOS builds. (flutter#17080)" (flutter/engine#17088)
2020-03-11 [email protected] Roll src/third_party/dart ace1d9b9213a..4093d08271f6 (12 commits) (flutter/engine#17082)
2020-03-11 [email protected] Add support for the Metal backend on all iOS builds. (flutter/engine#17080)
2020-03-11 [email protected] Roll src/third_party/skia d3f67dbf9f36..bf355123ae3b (9 commits) (flutter/engine#17079)
2020-03-11 [email protected] Disable Embedder11yTest::A11yTreeIsConsistent to unblock LUCI. (flutter/engine#17081)
2020-03-10 [email protected] Gather demangled stack traces and report the same to console on crashes. (flutter/engine#16450)
2020-03-10 [email protected] Implement asynchronous texture uploads when using the Metal backend on iOS. (flutter/engine#17046)
2020-03-10 [email protected] Roll src/third_party/dart 97674262bc29..ace1d9b9213a (14 commits) (flutter/engine#17078)
2020-03-10 [email protected] Add RTree to flow (flutter/engine#16923)
2020-03-10 [email protected] Roll src/third_party/skia 78dac6dcb222..d3f67dbf9f36 (6 commits) (flutter/engine#17072)
2020-03-10 [email protected] Revert "Fix bounds of image_filter_layer (flutter#16960)" (flutter/engine#17074)
2020-03-10 [email protected] Use the ELF loader to setup AOT symbols in benchmark runner. (flutter/engine#17051)
2020-03-10 [email protected] Roll src/third_party/skia 23899c64e3db..78dac6dcb222 (19 commits) (flutter/engine#17069)
2020-03-10 [email protected] Roll dart to 97674262bc29447dc59d5c93024b18b27d4bcf98. (flutter/engine#17067)
2020-03-10 [email protected] [web] Fixes for Firefox & Safari double underline decoration bugs. (flutter/engine#16994)
2020-03-10 [email protected] Avoid capturing this unsafely in MultiFrameCodec (flutter/engine#16824)
2020-03-10 [email protected] Revert "Revert "fix shadows and mask filter blurs (flutter#16963)" (flutter#17008)" (flutter/engine#17040)
2020-03-10 [email protected] Add support for firefox mac installer. Update web_ui pubspec for http.wq (flutter/engine#17044)
2020-03-09 [email protected] fix "TREE INCONSISTENT" noise in compositing_test.dart (flutter/engine#16995)
2020-03-09 [email protected] Add more child lifecycle tests (flutter/engine#16689)
2020-03-09 [email protected] Add libfreetype6-dev to desktop Linux dependencies (flutter/engine#17020)
2020-03-09 [email protected] Disable shell benchmarks (flutter/engine#17038)
2020-03-09 [email protected] Fix bounds of image_filter_layer (flutter/engine#16960)
2020-03-09 [email protected] Record fml and shell benchmarks (flutter/engine#16991)
2020-03-09 [email protected] Roll src/third_party/skia c56950442dd1..23899c64e3db (11 commits) (flutter/engine#17033)
2020-03-09 [email protected] use commit date instead of author date (flutter/engine#17032)
jason-simmons added a commit to flutter/flutter that referenced this pull request Mar 11, 2020
2020-03-11 [email protected] Roll src/third_party/dart 4093d08271f6..37530145ff53 (4 commits) (flutter/engine#17090)
2020-03-11 [email protected] Roll src/third_party/skia bf355123ae3b..0340292972b9 (9 commits) (flutter/engine#17089)
2020-03-11 [email protected] Roll fuchsia/sdk/core/mac-amd64 from r_oCI... to 0Z8VF... (flutter/engine#17087)
2020-03-11 [email protected] Roll fuchsia/sdk/core/linux-amd64 from v32mJ... to X3Xm2... (flutter/engine#17086)
2020-03-11 [email protected] Remove the unused method on iOS surface to make the resource context current. (flutter/engine#17084)
2020-03-11 [email protected] Revert "Add support for the Metal backend on all iOS builds. (#17080)" (flutter/engine#17088)
2020-03-11 [email protected] Roll src/third_party/dart ace1d9b9213a..4093d08271f6 (12 commits) (flutter/engine#17082)
2020-03-11 [email protected] Add support for the Metal backend on all iOS builds. (flutter/engine#17080)
2020-03-11 [email protected] Roll src/third_party/skia d3f67dbf9f36..bf355123ae3b (9 commits) (flutter/engine#17079)
2020-03-11 [email protected] Disable Embedder11yTest::A11yTreeIsConsistent to unblock LUCI. (flutter/engine#17081)
2020-03-10 [email protected] Gather demangled stack traces and report the same to console on crashes. (flutter/engine#16450)
2020-03-10 [email protected] Implement asynchronous texture uploads when using the Metal backend on iOS. (flutter/engine#17046)
2020-03-10 [email protected] Roll src/third_party/dart 97674262bc29..ace1d9b9213a (14 commits) (flutter/engine#17078)
2020-03-10 [email protected] Add RTree to flow (flutter/engine#16923)
2020-03-10 [email protected] Roll src/third_party/skia 78dac6dcb222..d3f67dbf9f36 (6 commits) (flutter/engine#17072)
2020-03-10 [email protected] Revert "Fix bounds of image_filter_layer (#16960)" (flutter/engine#17074)
2020-03-10 [email protected] Use the ELF loader to setup AOT symbols in benchmark runner. (flutter/engine#17051)
2020-03-10 [email protected] Roll src/third_party/skia 23899c64e3db..78dac6dcb222 (19 commits) (flutter/engine#17069)
2020-03-10 [email protected] Roll dart to 97674262bc29447dc59d5c93024b18b27d4bcf98. (flutter/engine#17067)
2020-03-10 [email protected] [web] Fixes for Firefox & Safari double underline decoration bugs. (flutter/engine#16994)
2020-03-10 [email protected] Avoid capturing this unsafely in MultiFrameCodec (flutter/engine#16824)
2020-03-10 [email protected] Revert "Revert "fix shadows and mask filter blurs (#16963)" (#17008)" (flutter/engine#17040)
2020-03-10 [email protected] Add support for firefox mac installer. Update web_ui pubspec for http.wq (flutter/engine#17044)
2020-03-09 [email protected] fix "TREE INCONSISTENT" noise in compositing_test.dart (flutter/engine#16995)
2020-03-09 [email protected] Add more child lifecycle tests (flutter/engine#16689)
2020-03-09 [email protected] Add libfreetype6-dev to desktop Linux dependencies (flutter/engine#17020)
2020-03-09 [email protected] Disable shell benchmarks (flutter/engine#17038)
2020-03-09 [email protected] Fix bounds of image_filter_layer (flutter/engine#16960)
2020-03-09 [email protected] Record fml and shell benchmarks (flutter/engine#16991)
2020-03-09 [email protected] Roll src/third_party/skia c56950442dd1..23899c64e3db (11 commits) (flutter/engine#17033)
2020-03-09 [email protected] use commit date instead of author date (flutter/engine#17032)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't pass MultiFrameCodec across threads
7 participants