-
Notifications
You must be signed in to change notification settings - Fork 6k
Avoid capturing this unsafely in MultiFrameCodec #16824
Conversation
We encounter a problem which is the app crash at the time the engine exit due to |
Ahh this was passing on macOS but not Linux. I'll fix up the test harness and then it just needs a review. |
Test harness was missing a config to export the needed symbols on Linux - done and should all be passing now. |
@@ -557,5 +612,87 @@ TEST(ImageDecoderTest, VerifySubpixelDecodingPreservesExifOrientation) { | |||
assert_image(decode({}, 100)); | |||
} | |||
|
|||
TEST_F(ImageDecoderFixtureTest, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is pretty big, it can't be groked with a passing perusal. I recommend adding a docstring at the beginning of the test, or pulling out some helper functions so that the gist can be parsed quicker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment to the top. I'm not quite sure how to refactor this into smaller functions while still preserving the ordering that's needed, but maybe when I get in we can talk about it some more offline.
latch.Wait(); | ||
|
||
// Destroy the IO manager | ||
runners.GetIOTaskRunner()->PostTask([&]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is alot of work here to serialize async operations with latches. It would be easier to follow if you made a PerformSyncTask function that takes in a task runner and a lambda and doesn't complete until the task runner finishes the lambda.
I've tried to add this in the past and have met resistance, maybe for tests at least people will be accepting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is idiomatic code for how the rest of the engine repo does it. I'd be hesitant to add a new test-only class for this that itself needs to be tested...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Boilerplate that detracts from the point shouldn't be encouraged. I wasn't think a new class I was thinking this:
void PerformSyncTask(TaskRunner runner, Function func) {
Latch latch;
runner->PerformTask({
func();
latch.signal();
});
latch.wait();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The synchronization work here is also highly constructed to simulate collecting the engine while IO work is ongoing in a repeatable, non-flaky way. I'm not sure it would help to have something like PerformSyncTask
, since I need to specifically interleave certain actions. I hope the comment I added at the top clarifies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On this specific test, I already have the latch and would need to have it anyway. The added function would only save a line or two of code here :\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same thing could be said of for-loops versus foreach-loops. It isn't about LOC, its about avoiding errors, extra variables, and having a meaningful abstraction with which to build off of.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - and I think such a decision on style is something we should follow consistently in the repo.
I'm not opposed to making this change so much as opposed to doing it differently just here, or even just in this test file, compared to the way we do it everywhere else. It'd be easier to just refactor all the callsites of this if they're all pretty similar, rather than doing it specially here.
fml::WeakPtr<GrContext> resourceContext, | ||
fml::RefPtr<flutter::SkiaUnrefQueue> unref_queue, | ||
size_t trace_id); | ||
struct State { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It's probably not in proper doxygen format, but this is a private class anyway.
testing/dart_isolate_runner.h
Outdated
@@ -0,0 +1,198 @@ | |||
// Copyright 2013 The Flutter Authors. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to be a a big refactor. It would be easier to review and manage if this was split into a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll split this into a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tonic::DartState::Current(), callback_handle), | ||
this, trace_id, ui_task_runner = task_runners.GetUITaskRunner(), | ||
io_manager = dart_state->GetIOManager()]() mutable { | ||
GetNextFrameAndInvokeCallback( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen this misuse of this
in other places. The pattern to fix it in most places seems to be sending in an fml::WeakPtr
, and check its validity before dereference so another thread can safely destruct the object. I wonder if there's any particular reason why we can't use fml::WeakPtr
here, and have to create a new State
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't use a WeakPtr here because we have to access the object across threads, and WeakPtr is not thread safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we can't create the weak ptr on the IO task runner because our test case would fail - e.g. if the IO runner is blocked, and the object gets collected before the task to create the weak ptr runs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liyuqian: While weak pointers can be copied safely across threads, they may only be used on the thread on which their factories live. And weak pointer factories can only be created and collected on a single thread. In this case, this is the UI thread. Your intuition is absolutely correct but only in cases where the object will only be used on the one thread. This is not one of those cases.
// don't need to be explicitly supplied by the embedder. In AOT, these | ||
// snapshots will be present in the application AOT dylib. | ||
if (DartVM::IsRunningPrecompiledCode()) { | ||
PrepareSettingsForAOTWithSymbols(settings, aot_symbols_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: FML_CHECK
the result here. We don't want to fail later if for some reason we mess up symbol packaging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -14,7 +20,56 @@ | |||
namespace flutter { | |||
namespace testing { | |||
|
|||
using ImageDecoderFixtureTest = ThreadTest; | |||
class ImageDecoderFixtureTest : public ThreadTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Consider putting this in its own TU.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// runner. | ||
// - Unlatches the IO task runner | ||
auto settings = CreateSettingsForFixture(); | ||
settings.leak_vm = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe configure these in CreateSettingsForFixture
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
auto gif_codec = SkCodec::MakeFromData(gif_mapping); | ||
ASSERT_TRUE(gif_codec); | ||
|
||
auto loop = fml::ConcurrentMessageLoop::Create(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't being used anywhere. Also, one will be created for you when you instantiate the VMRef. Just getting rid of it is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed - this was copypasted from another test and I missed dropping it during refactoring.
fml::RefPtr<MultiFrameCodec> codec; | ||
EXPECT_TRUE(isolate->RunInIsolateScope([&]() -> bool { | ||
Dart_Handle library = Dart_LookupLibrary(Dart_NewStringFromCString( | ||
"package:engine_root/ui/fixtures/ui_test.dart")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the test depend on the path of the fixture which would be brittle. Doesn't Dart_RootLibrary
achieve the same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, I didn't know about Dart_RootLibrary
. Done.
tonic::DartState::Current(), callback_handle), | ||
this, trace_id, ui_task_runner = task_runners.GetUITaskRunner(), | ||
io_manager = dart_state->GetIOManager()]() mutable { | ||
GetNextFrameAndInvokeCallback( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liyuqian: While weak pointers can be copied safely across threads, they may only be used on the thread on which their factories live. And weak pointer factories can only be created and collected on a single thread. In this case, this is the UI thread. Your intuition is absolutely correct but only in cases where the object will only be used on the one thread. This is not one of those cases.
lib/ui/painting/multi_frame_codec.h
Outdated
const std::unique_ptr<SkCodec> codec_; | ||
const int frameCount_; | ||
const int repetitionCount_; | ||
int nextFrameIndex_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While using shared pointers solves the lifecycle issues of State
, it does not prevent data races due to concurrent access of state fields. Thats why non-const members of such objects are usually guarded by a mutex. In this case, auditing all uses seems to indicate that that all non-const members (except live_
) are only read and updated on the IO thread. So it's safe but not obvious how.
Can we group these non const members together with a comment that makes it clear that these will only be read and updated on the IO thread? If we want to be paranoid, we could use a mutex but that is probably not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I'd like to avoid the mutex since this is a potentially performance critical path, e.g. for an animated image firing on every vsync.
state = state_, trace_id, | ||
ui_task_runner = task_runners.GetUITaskRunner(), | ||
io_manager = dart_state->GetIOManager()]() mutable { | ||
state->GetNextFrameAndInvokeCallback( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The live_
variable is unnecessary. You can copy state
via a weak_ptr
instead of a shared_ptr
. Here, you can attempt a lock()
and if that fails, you can do what you are doing in line 144.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do what you are doing in line 144.
Wait. Why are we even invoking the next frame callback on a muli-frame codec that has died (which would be indicated by the weak pointer lock failing)?
Instead of invoking the next frame callback when the weak pointer lock fails, scheduled a task onto the UI thread that just releases the callback but not invoke it (that has Dart state and you cant just drop it on the IO thread).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, variable that check for "live"-ness are usually always an anti-pattern as the information can be invalid immediately after the check. There are valid use cases like double checked locks of course.
In this case, it seems it was an optimization I suppose. But again, the same can be achieved using weak_ptrs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear on where it's safe to create the weak_ptr for the State. I can't do it on the UI thread, and by the time I get to the IO thread the UI object might have been collected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohhh you mean a std::weak_ptr. I get it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason I was thinking we had to still make the callback because some Dart code might be await
ing the completion of the callback, but now that I think about it that doesn't make sense, since the only way this can happen is if the engine is collected. If anyone is actually awaiting the callback, the codec must still be alive. I'm correct in that thinking, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - I believe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only question is whether we need to guard a bit more so that if we get collected after this (e.g. we're still alive before we tell Skia to decode a frame, but we're dead after Skia finishes) - will this still be safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock will fail in that case.
This pull request is not suitable for automatic merging in its current state.
|
I've run this test with the same random seed that flaked over 100 times on both mac and linux locally in profile mode - it hasn't failed. I'm retrying the bot here: https://ci.chromium.org/p/flutter/builders/try/Linux%20Host%20Engine/4950. If it succeeds I plan to land it. The mac web failure is a flake that has nothing to do with this PR. |
FWIW, the flake on Linux host_profile didn't even happen in the new test or codepath touched by this PR :\ |
2020-03-11 [email protected] Roll src/third_party/dart 4093d08271f6..37530145ff53 (4 commits) (flutter/engine#17090) 2020-03-11 [email protected] Roll src/third_party/skia bf355123ae3b..0340292972b9 (9 commits) (flutter/engine#17089) 2020-03-11 [email protected] Roll fuchsia/sdk/core/mac-amd64 from r_oCI... to 0Z8VF... (flutter/engine#17087) 2020-03-11 [email protected] Roll fuchsia/sdk/core/linux-amd64 from v32mJ... to X3Xm2... (flutter/engine#17086) 2020-03-11 [email protected] Remove the unused method on iOS surface to make the resource context current. (flutter/engine#17084) 2020-03-11 [email protected] Revert "Add support for the Metal backend on all iOS builds. (flutter#17080)" (flutter/engine#17088) 2020-03-11 [email protected] Roll src/third_party/dart ace1d9b9213a..4093d08271f6 (12 commits) (flutter/engine#17082) 2020-03-11 [email protected] Add support for the Metal backend on all iOS builds. (flutter/engine#17080) 2020-03-11 [email protected] Roll src/third_party/skia d3f67dbf9f36..bf355123ae3b (9 commits) (flutter/engine#17079) 2020-03-11 [email protected] Disable Embedder11yTest::A11yTreeIsConsistent to unblock LUCI. (flutter/engine#17081) 2020-03-10 [email protected] Gather demangled stack traces and report the same to console on crashes. (flutter/engine#16450) 2020-03-10 [email protected] Implement asynchronous texture uploads when using the Metal backend on iOS. (flutter/engine#17046) 2020-03-10 [email protected] Roll src/third_party/dart 97674262bc29..ace1d9b9213a (14 commits) (flutter/engine#17078) 2020-03-10 [email protected] Add RTree to flow (flutter/engine#16923) 2020-03-10 [email protected] Roll src/third_party/skia 78dac6dcb222..d3f67dbf9f36 (6 commits) (flutter/engine#17072) 2020-03-10 [email protected] Revert "Fix bounds of image_filter_layer (flutter#16960)" (flutter/engine#17074) 2020-03-10 [email protected] Use the ELF loader to setup AOT symbols in benchmark runner. (flutter/engine#17051) 2020-03-10 [email protected] Roll src/third_party/skia 23899c64e3db..78dac6dcb222 (19 commits) (flutter/engine#17069) 2020-03-10 [email protected] Roll dart to 97674262bc29447dc59d5c93024b18b27d4bcf98. (flutter/engine#17067) 2020-03-10 [email protected] [web] Fixes for Firefox & Safari double underline decoration bugs. (flutter/engine#16994) 2020-03-10 [email protected] Avoid capturing this unsafely in MultiFrameCodec (flutter/engine#16824) 2020-03-10 [email protected] Revert "Revert "fix shadows and mask filter blurs (flutter#16963)" (flutter#17008)" (flutter/engine#17040) 2020-03-10 [email protected] Add support for firefox mac installer. Update web_ui pubspec for http.wq (flutter/engine#17044) 2020-03-09 [email protected] fix "TREE INCONSISTENT" noise in compositing_test.dart (flutter/engine#16995) 2020-03-09 [email protected] Add more child lifecycle tests (flutter/engine#16689) 2020-03-09 [email protected] Add libfreetype6-dev to desktop Linux dependencies (flutter/engine#17020) 2020-03-09 [email protected] Disable shell benchmarks (flutter/engine#17038) 2020-03-09 [email protected] Fix bounds of image_filter_layer (flutter/engine#16960) 2020-03-09 [email protected] Record fml and shell benchmarks (flutter/engine#16991) 2020-03-09 [email protected] Roll src/third_party/skia c56950442dd1..23899c64e3db (11 commits) (flutter/engine#17033) 2020-03-09 [email protected] use commit date instead of author date (flutter/engine#17032)
2020-03-11 [email protected] Roll src/third_party/dart 4093d08271f6..37530145ff53 (4 commits) (flutter/engine#17090) 2020-03-11 [email protected] Roll src/third_party/skia bf355123ae3b..0340292972b9 (9 commits) (flutter/engine#17089) 2020-03-11 [email protected] Roll fuchsia/sdk/core/mac-amd64 from r_oCI... to 0Z8VF... (flutter/engine#17087) 2020-03-11 [email protected] Roll fuchsia/sdk/core/linux-amd64 from v32mJ... to X3Xm2... (flutter/engine#17086) 2020-03-11 [email protected] Remove the unused method on iOS surface to make the resource context current. (flutter/engine#17084) 2020-03-11 [email protected] Revert "Add support for the Metal backend on all iOS builds. (#17080)" (flutter/engine#17088) 2020-03-11 [email protected] Roll src/third_party/dart ace1d9b9213a..4093d08271f6 (12 commits) (flutter/engine#17082) 2020-03-11 [email protected] Add support for the Metal backend on all iOS builds. (flutter/engine#17080) 2020-03-11 [email protected] Roll src/third_party/skia d3f67dbf9f36..bf355123ae3b (9 commits) (flutter/engine#17079) 2020-03-11 [email protected] Disable Embedder11yTest::A11yTreeIsConsistent to unblock LUCI. (flutter/engine#17081) 2020-03-10 [email protected] Gather demangled stack traces and report the same to console on crashes. (flutter/engine#16450) 2020-03-10 [email protected] Implement asynchronous texture uploads when using the Metal backend on iOS. (flutter/engine#17046) 2020-03-10 [email protected] Roll src/third_party/dart 97674262bc29..ace1d9b9213a (14 commits) (flutter/engine#17078) 2020-03-10 [email protected] Add RTree to flow (flutter/engine#16923) 2020-03-10 [email protected] Roll src/third_party/skia 78dac6dcb222..d3f67dbf9f36 (6 commits) (flutter/engine#17072) 2020-03-10 [email protected] Revert "Fix bounds of image_filter_layer (#16960)" (flutter/engine#17074) 2020-03-10 [email protected] Use the ELF loader to setup AOT symbols in benchmark runner. (flutter/engine#17051) 2020-03-10 [email protected] Roll src/third_party/skia 23899c64e3db..78dac6dcb222 (19 commits) (flutter/engine#17069) 2020-03-10 [email protected] Roll dart to 97674262bc29447dc59d5c93024b18b27d4bcf98. (flutter/engine#17067) 2020-03-10 [email protected] [web] Fixes for Firefox & Safari double underline decoration bugs. (flutter/engine#16994) 2020-03-10 [email protected] Avoid capturing this unsafely in MultiFrameCodec (flutter/engine#16824) 2020-03-10 [email protected] Revert "Revert "fix shadows and mask filter blurs (#16963)" (#17008)" (flutter/engine#17040) 2020-03-10 [email protected] Add support for firefox mac installer. Update web_ui pubspec for http.wq (flutter/engine#17044) 2020-03-09 [email protected] fix "TREE INCONSISTENT" noise in compositing_test.dart (flutter/engine#16995) 2020-03-09 [email protected] Add more child lifecycle tests (flutter/engine#16689) 2020-03-09 [email protected] Add libfreetype6-dev to desktop Linux dependencies (flutter/engine#17020) 2020-03-09 [email protected] Disable shell benchmarks (flutter/engine#17038) 2020-03-09 [email protected] Fix bounds of image_filter_layer (flutter/engine#16960) 2020-03-09 [email protected] Record fml and shell benchmarks (flutter/engine#16991) 2020-03-09 [email protected] Roll src/third_party/skia c56950442dd1..23899c64e3db (11 commits) (flutter/engine#17033) 2020-03-09 [email protected] use commit date instead of author date (flutter/engine#17032)
fixes flutter/flutter#13359
Creates a
State
object to share as ashared_ptr
instead of capturing this. I don't believe we can use afml::WeakPtr
, since we would have to create it on the IO task runner, which might be blocked until after the MultiFrameCodec is destroyed on the UI runner.Avoids decoding the image if the collect happens before we start image decoding - this seems like the right thing to do to me, but in honesty if I don't do this the test is segfaulting when I try to drain the unref queue and I'm not quite sure why.
Moves some of the logic that's sharable from the Dart Isolate unit tests to a common file for use by the image decoder tests.