
Made Picture::toImage happen on the IO thread with no need for an onscreen surface. #9813

Merged · 10 commits · Jul 16, 2019

Conversation

gaaclarke (Member)

This fixes ShaderWarmUp for iOS (possibly Android too).
References: flutter/flutter#35923

I suspect there are a few other issues related to this bug too; this was probably a problem for Android as well. Here are a few extra benefits of this change:

  1. Now ui.Picture.toImage won't interfere with regular rendering.
  2. In the fullscreen case we should be getting a few more milliseconds of shader warm-up.
  3. It simplifies the API by removing the SnapshotDelegate.

I have a few concerns about this PR:

  1. The thread safety of PersistentCache is suspect. I'll have to audit it further. Chinmay said it should be thread-safe.
  2. There is a race condition where rendering the first frame to the onscreen surface does not wait for the shader warm-up to complete. In the worst case (assuming PersistentCache is thread-safe) we have already compiled a shader in the warm-up, but the first frame won't have access to it. To fix that we'd have to give ShaderWarmUp some ability to block the GPU thread (one possible mechanism is sketched below).
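
A minimal sketch of one such blocking mechanism, assuming fml's latch primitive; this is an illustration only, not part of this PR:

```cpp
#include "flutter/fml/synchronization/waitable_event.h"

// Hypothetical latch shared between the IO and GPU threads.
static fml::AutoResetWaitableEvent warmup_latch;

// IO thread: signal once the warm-up draws have been flushed.
void OnShaderWarmUpComplete() {
  warmup_latch.Signal();
}

// GPU thread: block the first frame until warm-up has finished, so the
// shaders compiled into PersistentCache are visible to it.
void WaitForShaderWarmUp() {
  warmup_latch.Wait();
}
```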

I'm open to any suggestions for automated testing.

liyuqian (Contributor) left a comment

Thank you Aaron!

This has been something that I wanted to do for a long time. I'd say that doing warm-up/snapshot on the IO thread without blocking is better than doing the work on the GPU thread, or blocking GPU thread in some other ways. By blocking, we're giving users no frame at all during the warm-up; by not blocking, the users can at least see some frames as soon as possible. Of course, we can always provide developers flexibility on whether to block or not.

For unit tests, I think you can add something to testing/dart/canvas_test.dart: record a Picture, call Picture.toImage, and then check that its pixels are as expected. The only challenge I know of is how to run the Dart test locally. So far, the only way I know is to run testing/run_tests.sh. You can modify the script to only run the test that you care about.

Once this gets landed, please feel free to add some comments about the performance and UX difference between this new IO-thread approach, and the old GPU-thread approach in the design doc (https://docs.google.com/document/d/1dTcjrK5Nn7roa3WftaKcoOk7-2u_2HItCahlNc73Xt8/edit#heading=h.ar902g5pm5jo) for our future reference.

}
}

sk_sp<SkImage> MakeRasterSnapshot(sk_sp<SkPicture> picture,
Contributor:

Here are some points that Chinmay and I discussed offline, and that we think would be nice to add to the code comment:

  1. MakeRasterSnapshot makes a CPU-backed image (instead of a GPU-backed image). It enables peeking at pixels from Dart code, or encoding the image for storage, but it introduces extra overhead for converting GPU images to CPU images. (Raster means CPU in Skia's jargon.) See the sketch after this list.

  2. Maybe it's nice to also have a MakeSnapshot API that generates a GPU-backed image without the GPU-to-CPU overhead. It doesn't have to be added in this PR. Maybe just leave a comment here and add it in a future PR, potentially with some benchmarks to verify the performance gain of MakeSnapshot over MakeRasterSnapshot.

  3. Shader warm-up only needs MakeSnapshot (and as Skia pointed out, we can further modify the GrContext to skip all GPU operations other than shader compilation). We can still preserve the behavior of using MakeRasterSnapshot in this PR, but eventually we'll want to switch to the cheaper version. Maybe leave a comment in the code so we don't forget this.

  4. In some cases, the developer may just want a GPU-backed snapshot and directly composite it back into a GPU surface. MakeRasterSnapshot will be quite expensive for this case, as it triggers a GPU-to-CPU copy and then a CPU-to-GPU copy. Maybe leave a comment here to suggest that the future MakeSnapshot API should be used for this case.
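
A minimal sketch of the distinction, using plain Skia calls; the function names are hypothetical and this is not the engine's actual implementation:

```cpp
#include "third_party/skia/include/core/SkImage.h"
#include "third_party/skia/include/core/SkPicture.h"
#include "third_party/skia/include/core/SkSurface.h"

// CPU-backed ("raster") snapshot: forces a device-to-host copy so the pixels
// can be peeked at or encoded, at the cost of the extra transfer.
sk_sp<SkImage> MakeRasterSnapshotSketch(sk_sp<SkPicture> picture,
                                        sk_sp<SkSurface> surface) {
  surface->getCanvas()->drawPicture(picture);
  sk_sp<SkImage> device_image = surface->makeImageSnapshot();
  return device_image ? device_image->makeRasterImage() : nullptr;
}

// GPU-backed snapshot (the proposed future MakeSnapshot): the image stays on
// the device, which is all that shader warm-up or re-compositing needs.
sk_sp<SkImage> MakeSnapshotSketch(sk_sp<SkPicture> picture,
                                  sk_sp<SkSurface> surface) {
  surface->getCanvas()->drawPicture(picture);
  return surface->makeImageSnapshot();
}
```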

gaaclarke (Member Author):

Done. I'll add an issue once this lands.


gaaclarke (Member Author):

> I'd like to add that #2 was discussed here at length:
> #8835
>
> Would it be possible to work #2 in since this section of code is related? Having a GPU-backed image and avoiding the GPU-to-CPU overhead would be beneficial.
>
> Adding @Hixie - I will soon address the concerns brought up there about increased documentation and tests in this area.

Noted. I think it's something we should do as well, but I think it should be prioritized against other issues and not serve as an impediment to this change.

}

{
TRACE_EVENT0("flutter", "DeviceHostTransfer");
Contributor:

Maybe leave a comment here like "Here device means GPU and host means CPU; this is different from use cases like Flutter driver tests where device means mobile devices and host means laptops/desktops."

gaaclarke (Member Author):

Done.

chinmaygarde (Member) left a comment

The ui_unittests target is now OpenGL ES capable on the IO thread. Please add a unit test for this. See image_decoder_unittests.cc for a harness you can use as a reference.
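
A hypothetical outline of such a test, modeled loosely on the referenced harness; the fixture and test names here are assumptions, not real engine APIs:

```cpp
#include "gtest/gtest.h"

// SnapshotTest is a hypothetical fixture standing in for the kind of
// GL-capable harness that image_decoder_unittests.cc sets up.
class SnapshotTest : public ::testing::Test {};

TEST_F(SnapshotTest, CanRasterizePictureOnIOThread) {
  // 1. Record a trivial SkPicture.
  // 2. Post MakeRasterSnapshot to the IO task runner, handing it the test's
  //    GL-capable resource context.
  // 3. Latch until the callback delivers the image, then assert on its
  //    dimensions and pixel contents.
}
```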

MakeSnapshotSurface(picture_bounds, resource_context);
sk_sp<SkImage> raster_image = MakeRasterSnapshot(picture, surface);

fml::TaskRunner::RunNowOrPostTask(
Member:

RunNowOrPostTask is only used if there is some form of latching involved; there isn't any here. Please post the task directly to the UI task runner. I realize this may be because the previous code did the same, but there really is no reason to do it here.
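
A minimal sketch of the suggested change, assuming a ui_task_runner and a raster_image from the surrounding code (both names are stand-ins):

```cpp
#include "flutter/fml/task_runner.h"
#include "third_party/skia/include/core/SkImage.h"

// Hypothetical helper: deliver the finished snapshot to the UI thread.
void DeliverSnapshot(fml::RefPtr<fml::TaskRunner> ui_task_runner,
                     sk_sp<SkImage> raster_image) {
  // RunNowOrPostTask runs inline when the caller is already on the target
  // thread, which only matters when latching on the result; a plain
  // PostTask suffices here.
  ui_task_runner->PostTask([raster_image]() {
    // Resolve the Dart-side callback with the completed image.
  });
}
```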

gaaclarke (Member Author):

Done.

gaaclarke (Member Author):

I had to revert this too; it was causing a deadlock for some tests.

/// @param[in] picture The picture that will get converted to an image.
/// @param[in] surface The surface that will be used to render the picture. This
/// will be CPU or GPU based.
/// @todo Currently this creates a RAM backed image regardless of what type of
Member:

"RAM backed" is odd terminology. Even device memory is in RAM right? Can we stick to just the device/host terminology like you have done for the traces?

gaaclarke (Member Author):

IMO device/host is very confusing. Even yuqian asked me to add a comment to his code to clarify them. Images in RAM vs VRAM is the model I've used when dealing with the discrete-GPU model. Can you tell me exactly what you want here and I'll plop it in.

sk_sp<SkSurface> MakeSnapshotSurface(const SkISize& picture_size,
fml::WeakPtr<GrContext> resource_context) {
SkImageInfo image_info = SkImageInfo::MakeN32Premul(
picture_size.width(), picture_size.height(), SkColorSpace::MakeSRGB());
Member:

If the color space is nullptr, it defaults to sRGB. We have stuck to this default behavior elsewhere. Please do the same here as well.
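
A sketch of the suggested simplification, relying on the default (nullptr) color-space argument instead of passing SkColorSpace::MakeSRGB() explicitly; the helper name is hypothetical:

```cpp
#include "third_party/skia/include/core/SkImageInfo.h"

// MakeN32Premul's color-space parameter defaults to nullptr, which the
// engine treats as sRGB per the convention described above.
SkImageInfo MakeSnapshotImageInfo(const SkISize& picture_size) {
  return SkImageInfo::MakeN32Premul(picture_size.width(),
                                    picture_size.height());
}
```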

gaaclarke (Member Author):

Done.

gaaclarke (Member Author):

I had to remove this change; it started causing tests to fail.

gaaclarke (Member Author) commented Jul 15, 2019

> The ui_unittests target is now OpenGL ES capable on the IO thread. Please add a unit test for this. See image_decoder_unittests.cc for a harness you can use as a reference.

I've tested the unit test I wrote for yuqian. It actually is running on the GPU. Maybe we can add an optional third argument that specifies that you want it to run on the GPU if available. I'm a bit worried this could mess with golden-image tests as-is, since it might move those over to GPU from CPU.

I attempted this, Chinmay, but it is non-trivial for the following reasons:
1) The function I would be testing takes Dart_Handles as input and output. I'm sure there is a way to introspect into them; I'm not sure it's worth doing from the C++ side.
2) The functions rely on a global UIDartState. We could factor that out into an argument and rework it as the subset needed for my function.
3) I'd have to rewrite the golden-image testing I made for Dart in C++.

I think the best way to test this would be to run the test on the Dart side like I did at yuqian's request. Do you have any ideas on how we could take what you did in ui_unittests and get Dart running on top of it so we can test it similarly?

chinmaygarde (Member):

> It actually is running on the GPU.

Flutter tester does not run the GPU backend. Are you sure? The ui_unittests can run with both the GPU and CPU backend on a per test basis.

chinmaygarde (Member):

For example, see image_decoder_unittests.cc for ValidImageResultsInSuccess and CanDecodeWithoutAGPUContext. That target also sets up Dart code to run in AOT as well as JIT modes.

gaaclarke (Member Author):

> > It actually is running on the GPU.
>
> Flutter tester does not run the GPU backend. Are you sure? The ui_unittests can run with both the GPU and CPU backend on a per test basis.

I was mistaken; no, they run on the CPU.

gaaclarke (Member Author):

@chinmaygarde We talked about this offline; the automated test covers the CPU render path. The GPU render path was only tested manually, and there is some future work to get Dart running (with dependencies) on the GPU backend.

chinmaygarde (Member):

Filed flutter/flutter#36224 and flutter/flutter#36225 to remind us to continue this yak shave.

jason-simmons (Member):

This affected the memory usage metrics in the performance benchmarks.

Picture.toImage is called during app startup by the ShaderWarmUp class in the framework. After this change, the total memory usage of Gallery at startup is reported as several MB higher than before in the Android dumpsys meminfo report. The difference is due to a combination of increased native heap and graphics memory consumption.

I'm not sure why moving picture rasterization to the IO thread has this effect. It may be related to implementation details of how Android's malloc handles requests from multiple threads (similar to what we were looking at in flutter/flutter#36079).

@chinmaygarde

gaaclarke (Member Author) commented Jul 17, 2019

> This affected the memory usage metrics in the performance benchmarks.
>
> Picture.toImage is called during app startup by the ShaderWarmUp class in the framework. After this change, the total memory usage of Gallery at startup is reported as several MB higher than before in the Android dumpsys meminfo report. The difference is due to a combination of increased native heap and graphics memory consumption.

Are the performance benchmarks running with the OpenGL backend? If so, a memory difference is expected, because previously ShaderWarmUp could incorrectly run on the CPU renderer, which meant no shaders were actually warmed up. The added memory should be the compiled shaders. I wouldn't expect those to be several MB in size, though; even as raw text they are maybe 1 MB.

gaaclarke (Member Author):

> This affected the memory usage metrics in the performance benchmarks.

Another thing to consider is that, depending on how you are measuring memory usage, this could produce a higher number because the code is more parallelized. For example, if it is measuring peak allocations, we are now potentially running the GPU thread at the same time as the IO thread, so peak usage could be higher. Previously everything was put on the GPU thread, so peak usage was gated on what that one thread was doing.

liyuqian (Contributor):

It seems that most memory regressions are on Android, and I believe the shader warm-up was previously using the GPU on Android (it was only using the CPU on iOS).

Having resources on the IO thread in addition to those on the GPU thread might be a good explanation. In that case, maybe we should find some way to confirm the conjecture. Finally, if the 10MB cost is unavoidable on the IO thread, then we need to figure out the trade-offs to decide whether to use the GPU or IO thread by default (the decision could be different for Android and iOS), and probably provide options for developers to choose which thread to use.

jason-simmons (Member):

You can see the effect on memory usage by starting a simple app created with `flutter create` and running `adb shell dumpsys meminfo`.

I took these samples on a Nexus 5X.

Before this PR (values in KB):

Java Heap:       4124
Native Heap:    13340
Code:           12560
Stack:            556
Graphics:       50792
Private Other:  15660
System:          5161

After:

Java Heap:       4172
Native Heap:    23224
Code:           13556
Stack:            568
Graphics:       61824
Private Other:  15692
System:          7232

The additional memory usage appears to start when `surface->getCanvas()->flush()` is called on the IO thread in `MakeRasterSnapshot`.

gaaclarke (Member Author):

Here are traces from before and after this patch of the first frame:

post.json.gz
pre.json.gz

gaaclarke (Member Author):

Here is one more trace that shows the GPU no longer being blocked by the warm-up. This results in a first frame that arrives roughly 75 ms sooner.

concurrent.json.gz

liyuqian (Contributor) commented Aug 9, 2019

Several notes as @xster and I recently discussed:

  • The iOS persistent cache only has GLSL, not binaries. Hence doing warm-up on the IO thread is going to hurt iOS performance, as the shaders covered in DefaultShaderWarmUp still need to be compiled from GLSL to binary.
  • Warm-up on the IO thread also breaks some GrContext-related resource warm-up, as the IO thread and GPU thread have different GrContexts. (I remember one particular example is the ProgramBuilder, which takes about 30ms to initialize in @xster's trace.)
  • Before this patch, our GPU-thread warm-up probably didn't work for the iOS add-to-app scenario, but it should have been working for standalone iOS apps, as the traces in #9813 (comment) show.

Hence we might want to reconsider how to do shader warm-up on iOS. It seems to be better to use the GPU thread for standalone iOS apps.

liyuqian added a commit to liyuqian/flutter that referenced this pull request Aug 9, 2019
The update is copied from an update we made to a Google-internal
client: cl/260202900

The update will save 1 shader compilation.

This should help solve our regression:
flutter#31203

More regressions on iOS might be introduced later by
flutter/engine#9813 (comment)

Unfortunately, we didn't rebase our benchmarks so such regressions
were not detected.

Hence to fully solve flutter#31203,
we might need to revert some change in
flutter/engine#9813 to make iOS shader warm-up
happen on the GPU thread again.
liyuqian added a commit to flutter/flutter that referenced this pull request Aug 15, 2019