Made Picture::toImage happen on the IO thread with no need for an onscreen surface. #9813
Conversation
Thank you Aaron!
This has been something that I've wanted to do for a long time. I'd say that doing warm-up/snapshot on the IO thread without blocking is better than doing the work on the GPU thread, or blocking the GPU thread in some other way. By blocking, we give users no frames at all during the warm-up; by not blocking, users can at least see some frames as soon as possible. Of course, we can always give developers the flexibility to choose whether to block or not.

For unit tests, I think you can add something to `testing/dart/canvas_test.dart`: record a `Picture`, call `Picture.toImage`, and then check that its pixels are as expected. The only challenge I know of is how to run the Dart test locally; so far, the only way I know is to run `testing/run_tests.sh`. You can modify the script to only run the test you care about.

Once this lands, please feel free to add some comments about the performance and UX differences between this new IO-thread approach and the old GPU-thread approach to the design doc (https://docs.google.com/document/d/1dTcjrK5Nn7roa3WftaKcoOk7-2u_2HItCahlNc73Xt8/edit#heading=h.ar902g5pm5jo) for our future reference.
sk_sp<SkImage> MakeRasterSnapshot(sk_sp<SkPicture> picture,
Here are some points that Chinmay and I discussed offline and think would be nice to add to the code comments:

- `MakeRasterSnapshot` makes a CPU-backed image (instead of a GPU-backed image). That enables peeking at pixels from Dart code, or encoding the image for storage, but it introduces extra overhead for converting GPU images to CPU images. ("Raster" means CPU in Skia's jargon.)
- Maybe it's also nice to have a `MakeSnapshot` API that generates a GPU-backed image without the GPU-to-CPU overhead. It doesn't have to be added in this PR; maybe just leave a comment here and add it in a future PR, potentially with some benchmarks to verify the performance gain of `MakeSnapshot` over `MakeRasterSnapshot`. (A rough sketch of the two variants follows this list.)
- Shader warm-up only needs `MakeSnapshot` (and, as Skia pointed out, we can further modify the `GrContext` to skip all GPU operations other than shader compilation). We can still preserve the behavior of using `MakeRasterSnapshot` in this PR, but eventually we'll want to switch to the cheaper version. Maybe leave a comment in the code so we don't forget this.
- In some cases, the developer may just want a GPU-backed snapshot to composite directly back into a GPU surface. `MakeRasterSnapshot` will be quite expensive for that case, as it triggers a GPU-to-CPU copy and then a CPU-to-GPU copy. Maybe leave a comment here suggesting the future `MakeSnapshot` API for this case.
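To make the CPU-backed vs. GPU-backed distinction concrete, here is a minimal sketch of the two variants. It is illustrative only, not the engine's actual implementation: `MakeGpuSnapshotSketch` is a hypothetical name for the cheaper `MakeSnapshot`-style API discussed above, and the only behavioral difference is the `makeRasterImage()` call that forces the GPU-to-CPU copy.

```cpp
#include "third_party/skia/include/core/SkCanvas.h"
#include "third_party/skia/include/core/SkImage.h"
#include "third_party/skia/include/core/SkPicture.h"
#include "third_party/skia/include/core/SkSurface.h"

// CPU-backed ("raster") snapshot: the pixels end up in host memory, so Dart
// code can peek at them or encode them, at the cost of a device-to-host copy
// when the surface is GPU-backed.
sk_sp<SkImage> MakeRasterSnapshotSketch(sk_sp<SkPicture> picture,
                                        sk_sp<SkSurface> surface) {
  surface->getCanvas()->drawPicture(picture);
  sk_sp<SkImage> image = surface->makeImageSnapshot();
  // makeRasterImage() performs the GPU-to-CPU conversion; for a CPU surface
  // it is effectively free.
  return image ? image->makeRasterImage() : nullptr;
}

// Hypothetical GPU-backed snapshot: skip the readback and leave the texture
// on the device, which is all that shader warm-up or re-compositing into a
// GPU surface needs.
sk_sp<SkImage> MakeGpuSnapshotSketch(sk_sp<SkPicture> picture,
                                     sk_sp<SkSurface> surface) {
  surface->getCanvas()->drawPicture(picture);
  return surface->makeImageSnapshot();  // Stays GPU-backed for a GPU surface.
}
```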
Done. I'll add an issue once this lands.
I'd like to add that #2 was discussed here at length:
#8835
Would it be possible to work #2 in since this section of code is related? Having a GPU-backed image and avoiding the GPU-to-CPU overhead would be beneficial.
Adding @Hixie - I will soon address the concerns brought up there about increased documentation and tests in this area.
Noted. I think it's something we should do as well, but I think it should be prioritized against other issues and not serve as an impediment to this change.
{
  TRACE_EVENT0("flutter", "DeviceHostTransfer");
Maybe leave a comment here like "Here device means GPU and host means CPU; this is different from use cases like Flutter driver tests where device means mobile devices and host means laptops/desktops."
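As a rough sketch of where such a comment might live (the enclosing function is illustrative, not the engine's actual code):

```cpp
#include "flutter/fml/trace_event.h"

// Illustrative only: shows the suggested clarifying comment next to the trace.
void TraceDeviceToHostTransferSketch() {
  // "Device" here means the GPU and "host" means the CPU; this is unrelated
  // to Flutter driver tests, where "device" means a phone/tablet and "host"
  // means the developer's laptop/desktop.
  TRACE_EVENT0("flutter", "DeviceHostTransfer");
  // ... the GPU-to-CPU pixel readback would happen inside this scope ...
}
```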
Done.
The `ui_unittests` target is now OpenGL ES capable on the IO thread. Please add a unit test for this; see `image_decoder_unittests.cc` for a harness you can use as a reference.
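For reference, such a test might look roughly like the sketch below, loosely following the `image_decoder_unittests.cc` style. The commented-out `RunOnIOThreadWithGLContext` helper is a hypothetical placeholder for whatever harness plumbing the real test would use.

```cpp
#include "flutter/testing/testing.h"
#include "third_party/skia/include/core/SkCanvas.h"
#include "third_party/skia/include/core/SkPicture.h"
#include "third_party/skia/include/core/SkPictureRecorder.h"

namespace flutter {
namespace testing {

TEST(PictureToImageTest, RasterSnapshotHasExpectedPixels) {
  // Record a trivial 1x1 picture filled with a known color.
  SkPictureRecorder recorder;
  SkCanvas* canvas = recorder.beginRecording(1, 1);
  canvas->clear(SK_ColorGREEN);
  sk_sp<SkPicture> picture = recorder.finishRecordingAsPicture();
  ASSERT_TRUE(picture != nullptr);

  // Hypothetical harness call: rasterize on the IO thread with the test's GL
  // resource context, then hand the resulting image back to the test thread.
  // sk_sp<SkImage> image = RunOnIOThreadWithGLContext(
  //     [&](fml::WeakPtr<GrContext> resource_context) {
  //       auto surface = MakeSnapshotSurface(SkISize::Make(1, 1), resource_context);
  //       return MakeRasterSnapshot(picture, surface);
  //     });

  // Verify the pixel that was drawn.
  // SkPixmap pixmap;
  // ASSERT_TRUE(image->peekPixels(&pixmap));
  // EXPECT_EQ(pixmap.getColor(0, 0), SK_ColorGREEN);
}

}  // namespace testing
}  // namespace flutter
```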
      MakeSnapshotSurface(picture_bounds, resource_context);
  sk_sp<SkImage> raster_image = MakeRasterSnapshot(picture, surface);

  fml::TaskRunner::RunNowOrPostTask(
`RunNowOrPostTask` is only used if there is some form of latching involved, and there isn't any here. Please post the task directly to the UI task runner. I realize this may have been because the previous code did the same, but there really is no reason to do it here.
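A minimal sketch of that suggestion, assuming a UI task runner, a `raster_image`, and a callback like those in the surrounding code (the helper name and parameters here are hypothetical): hand the finished snapshot back with a plain `PostTask`, since no latching is required.

```cpp
#include <functional>
#include <utility>

#include "flutter/fml/make_copyable.h"
#include "flutter/fml/memory/ref_ptr.h"
#include "flutter/fml/task_runner.h"
#include "third_party/skia/include/core/SkImage.h"

// Hypothetical helper illustrating the review suggestion.
void PostSnapshotToUIThread(fml::RefPtr<fml::TaskRunner> ui_task_runner,
                            sk_sp<SkImage> raster_image,
                            std::function<void(sk_sp<SkImage>)> callback) {
  ui_task_runner->PostTask(fml::MakeCopyable(
      [callback = std::move(callback),
       raster_image = std::move(raster_image)]() mutable {
        // Deliver the CPU-backed snapshot back on the UI thread.
        callback(std::move(raster_image));
      }));
}
```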
Done.
I had to revert this too. It was causing a deadlock for some tests.
/// @param[in] picture The picture that will get converted to an image.
/// @param[in] surface The surface that will be used to render the picture. This
/// will be CPU or GPU based.
/// @todo Currently this creates a RAM backed image regardless of what type of
"RAM backed" is odd terminology. Even device memory is in RAM right? Can we stick to just the device/host terminology like you have done for the traces?
IMO device/host is very confusing. Even yuqian asked me to add a comment to his code to clarify them. Images in RAM vs. VRAM is the model I've used when dealing with discrete GPUs. Can you tell me exactly what you want here and I'll plop it in.
sk_sp<SkSurface> MakeSnapshotSurface(const SkISize& picture_size,
                                     fml::WeakPtr<GrContext> resource_context) {
  SkImageInfo image_info = SkImageInfo::MakeN32Premul(
      picture_size.width(), picture_size.height(), SkColorSpace::MakeSRGB());
If the colorspace is `nullptr`, it defaults to `SRGB`. We have stuck to this default behavior elsewhere; please do the same here as well.
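As a sketch of that suggestion (note that, per the reply below, the change was later backed out because it caused test failures), the colorspace argument would simply be omitted. The helper name here is hypothetical, and the "defaults to sRGB" claim is the reviewer's.

```cpp
#include "third_party/skia/include/core/SkImageInfo.h"
#include "third_party/skia/include/core/SkSize.h"

// Sketch only: rely on the default (nullptr) colorspace, which the reviewer
// notes is treated as sRGB, instead of passing SkColorSpace::MakeSRGB().
SkImageInfo MakeSnapshotImageInfoSketch(const SkISize& picture_size) {
  return SkImageInfo::MakeN32Premul(picture_size.width(),
                                    picture_size.height());
}
```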
Done.
I had to remove this change; it started causing tests to fail.

I've tested the unit test I wrote for yuqian. It actually is running on the GPU. Maybe we can add an optional third argument that specifies you want it to run on the GPU if available. I'm a bit worried this could mess with golden image tests as-is, since it might be moving those over to the GPU from the CPU.
Flutter tester does not run the GPU backend. Are you sure? The …

For example, see image_decoder_unittests.cc for …
This reverts commit 3e3e711.
I was mistaken; no, they run on the CPU.
@chinmaygarde We talked about this offline: the automated test exercises the CPU render path. The GPU render path was only tested manually, and there is some future work to get Dart running (with dependencies) on the GPU backend.
Filed flutter/flutter#36224 and flutter/flutter#36225 to remind us to continue this yak shave.
This reverts commit ca8776e.
This affected the memory usage metrics in the performance benchmarks.
I'm not sure why moving picture rasterization to the IO thread has this effect. This may be related to implementation details of how Android malloc handles requests from multiple threads (similar to what we were looking at in flutter/flutter#36079).
Are the performance benchmarks running with the OpenGL backend? If so, a memory difference is expected, because previously shader warm-up could have been incorrectly running on the CPU renderer, which meant no shaders were actually warmed up. The added memory should be the compiled shaders. I wouldn't expect those to be several megabytes in size, though; even as raw text they are maybe 1 MB.
Another thing to consider: depending on how you are measuring memory usage, this could produce a higher number because the code is more parallelized. For example, if it is measuring peak allocations, we are now potentially running the GPU thread at the same time as the IO thread, so peak usage could be higher, whereas previously everything was put on the GPU thread, so peak usage was gated on what that one thread was doing.
It seems that most memory regressions are on Android, and I believe the shader warm-up was previously using the GPU on Android (it was only using the CPU on iOS). Having resources on the IO thread in addition to those on the GPU thread might be a good explanation. In that case, maybe we should find some way to confirm the conjecture. Finally, if the 10MB cost is unavoidable on the IO thread, then we need to figure out the trade-offs to decide whether to use the GPU or IO thread by default (the decision could be different for Android and iOS), and probably provide options for developers to choose which thread to use.
You can see the effect on memory usage by starting a simple app created with … I took these samples on a Nexus 5X.

Before this PR:

After:

The additional memory usage appears to start when …
Here are traces of the first frame from before and after this patch:
Here is one more trace that shows the GPU no longer being blocked by the warm-up. This results in a first frame that appears roughly 75 ms sooner.
Several notes as @xster and I recently discussed:
Hence we might want to reconsider how to do shader warm-up on iOS. It seems better to use the GPU thread for standalone iOS apps.
The update is copied from an update we made to a Google-internal client: cl/260202900. The update will save one shader compilation, which should help solve our regression: #31203. More regressions on iOS might be introduced later by flutter/engine#9813 (comment); unfortunately, we didn't rebase our benchmarks, so such regressions were not detected. Hence, to fully solve #31203, we might need to revert some changes in flutter/engine#9813 to make iOS shader warm-up happen on the GPU thread again.
This fixes ShaderWarmUp for iOS (possibly Android too).
References: flutter/flutter#35923
I suspect there are a few other issues related to this bug too; this was probably an issue for Android as well. Here are a few extra benefits of this change:
I have a few concerns for this PR:
I'm open to any suggestions for automated testing.