[Impeller] take advantage of DisplayList culling #41606

flar · 2023-04-29T01:46:46Z

Switches the calls that dispatch into an Impeller Dispatcher to use a cull rect, enabling pre-culling of out-of-bounds ops.

This change showed roughly a 2x improvement in rendering performance on the non-intersecting platform view benchmark, but that was measured without the recent changes to destructive blend modes in the Impeller renderer.

flar · 2023-04-29T01:49:58Z

A/B Comparison of Framework ToT against a local engine that includes both flutter/flutter#125717 and this PR:

                   Score                     Average A (noise) Average B (noise) Speed-up
average_frame_build_time_millis                   0.56 (0.00%)      0.46 (0.00%)  1.22x  
worst_frame_build_time_millis                     3.04 (0.00%)      4.26 (0.00%)  0.71x  
90th_percentile_frame_build_time_millis           0.96 (0.00%)      0.65 (0.00%)  1.48x  
99th_percentile_frame_build_time_millis           2.93 (0.00%)      2.94 (0.00%)  1.00x  
average_frame_rasterizer_time_millis             55.05 (0.00%)     25.20 (0.00%)  2.18x  
worst_frame_rasterizer_time_millis              105.66 (0.00%)     62.40 (0.00%)  1.69x  
90th_percentile_frame_rasterizer_time_millis     72.62 (0.00%)     31.63 (0.00%)  2.30x  
99th_percentile_frame_rasterizer_time_millis    104.08 (0.00%)     54.28 (0.00%)  1.92x
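For reading these numbers: the Speed-up column matches Average A (Framework ToT) divided by Average B (local engine with this PR), so values above 1x mean B is faster and values below 1x (for example worst_frame_build_time_millis at 0.71x) mean B is slower. A small C++ check against a few rows of the table; SpeedUp and RoundTo2 are illustrative helpers, not part of the benchmark tooling:

```cpp
#include <cassert>
#include <cmath>

// Speed-up as it appears in the A/B table: baseline time (A) over candidate time (B).
// A result above 1.0 means the candidate engine (B) is faster.
double SpeedUp(double average_a_ms, double average_b_ms) {
  return average_a_ms / average_b_ms;
}

// Rounds to two decimal places, matching the table's "2.18x" style.
double RoundTo2(double value) {
  return std::round(value * 100.0) / 100.0;
}
```

For example, RoundTo2(SpeedUp(55.05, 25.20)) evaluates to 2.18, the average rasterizer speed-up reported above.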

jonahwilliams · 2023-04-29T02:00:29Z

shell/gpu/gpu_surface_metal_impeller.mm

+        if (!buffer_damage.has_value()) {
+          auto size = surface->GetSize();
+          buffer_damage = SkIRect::MakeWH(size.width, size.height);
+          clip_rect = impeller::IRect::MakeXYWH(0, 0, size.width, size.height);
+        }
+        impeller::DlDispatcher impeller_dispatcher(clip_rect.value());
+        display_list->Dispatch(impeller_dispatcher, buffer_damage.value());


I implemented this a bit differently for Impeller: rather than adding a clip, I added a translate and then shrank the size of the root render pass, which causes Impeller to cull.

translate: https://github.com/flutter/engine/blob/main/flow/compositor_context.cc#L189

Shrink: https://github.com/flutter/engine/blob/main/impeller/renderer/backend/metal/surface_mtl.mm#L57

I'm not sure whether this clip would already do the right thing, though.

which causes impeller to cull.

Causes it to cull where?

impeller::Dispatcher::DrawRect immediately calls Canvas::DrawRect without any tests.
Canvas::DrawRect (which sometimes does something different for strokes and mask blurs) immediately creates and adds an entity without any other checks.

Where is the culling done if that rectangle was so far outside the clip that it isn't visible? The Dispatch method on DL can do this kind of bounds culling and then the impeller::Dispatcher never even sees the op...
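A simplified sketch of that idea: when Dispatch is given a cull rect, ops whose recorded bounds miss the rect are skipped before the receiver is called, so the Impeller dispatcher never builds entities for them. The types and function here are hypothetical stand-ins, not the real DisplayList or impeller classes, and the linear scan stands in for the spatial index (RTree) that DisplayList uses to find intersecting ops.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-ins for illustration; not the real DisplayList API.
struct Rect {
  float left, top, right, bottom;

  bool Intersects(const Rect& other) const {
    return left < other.right && other.left < right &&
           top < other.bottom && other.top < bottom;
  }
};

struct RecordedOp {
  Rect bounds;                   // Bounds of the op, recorded with the DisplayList.
  std::function<void()> replay;  // Forwards the op to the receiver (dispatcher).
};

// Replays only the ops whose bounds intersect cull_rect and returns how many
// were forwarded. Without a cull rect every op is replayed, as in an
// un-culled Dispatch. The linear scan stands in for an RTree query.
int DispatchCulled(const std::vector<RecordedOp>& ops,
                   const std::optional<Rect>& cull_rect) {
  int dispatched = 0;
  for (const RecordedOp& op : ops) {
    if (cull_rect.has_value() && !op.bounds.Intersects(*cull_rect)) {
      continue;  // The receiver never sees this op.
    }
    op.replay();
    ++dispatched;
  }
  return dispatched;
}
```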

I'm not sure this code here can apply a translation. If there is no damage rect, the cull rect is the size of the surface so it is already sized correctly and 0-based. If there is a damage rect, then the remainder of the surface must remain intact so we need to render the culled ops "in situ" with respect to origin and surface size.

While I still don't think I see all of the relevance of your comment, I don't think we can do translations and resizes of surfaces here because of the nature of this code and the damage rect...?

Ding ding! With your later comment that the blinking cursor was missing I think I finally got what you are saying. Unlike the other platforms that reuse the original framebuffer surface to render the partial damage inside a clip in the middle of the surface, Impeller renders the damage to a reduced size surface and then blits it into place?

In that case then the Impeller cull rect should always be set to just the rect(size) value with no origin, but the Dispatch call should be called with the damage rect since it is in the coordinate space of the original frame?

Makes sense to me

Looking again, it looks like the translate happens before the frame DisplayList is generated, so both rects should be 0-origin-based?

And, doing all of that fixes the blinking cursor problem...
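A minimal sketch of where this discussion landed, assuming (per the last two comments) that the translate is applied before the frame's DisplayList is recorded and that the root render pass is shrunk to the damage size, so both the dispatcher's cull rect and the rect passed to Dispatch are 0-origin. RootCullRect, ISize, and IRect are hypothetical names; this is not the code that was merged.

```cpp
#include <cassert>
#include <optional>

// Hypothetical integer geometry types for illustration; not the engine's types.
struct ISize {
  int width;
  int height;
};

struct IRect {
  int x;
  int y;
  int width;
  int height;
};

// Picks the rect used to cull the root DisplayList for an Impeller frame.
// Assumption (from this thread): with partial repaint, the content is translated
// so the damage region starts at the origin and the render pass is shrunk to the
// damage size, so the cull rect is always 0-origin and only its size varies.
IRect RootCullRect(ISize surface_size, const std::optional<IRect>& damage) {
  if (!damage.has_value()) {
    // Full repaint: cull to the whole surface.
    return IRect{0, 0, surface_size.width, surface_size.height};
  }
  // Partial repaint: keep the damage size but drop its origin.
  return IRect{0, 0, damage->width, damage->height};
}
```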

bdero · 2023-04-29T02:09:14Z

impeller/aiks/canvas.cc

@@ -23,15 +23,25 @@
 namespace impeller {

 Canvas::Canvas() {
-  Initialize();
+  Initialize(Rect::MakeLTRB(-1e9, -1e9, 1e9, 1e9));


Nit: We've been using Rect::MakeMaximum() for max coverage/bounds.

How well does MakeMaximum deal with being transformed?

Rather than worry about specifying a huge clip that might encourage degenerate math, I changed the cull_rect to an std::optional.
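A sketch of why std::optional helps here: with "no value" meaning unbounded, the canvas never has to transform or intersect an artificially huge rect. It also shows the distinction an optional introduces between an absent cull rect (cull nothing) and a present but empty one (cull everything). Rect and IntersectCullRect are hypothetical names, not impeller's API.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical float rect in device (screen) space; not impeller::Rect.
struct Rect {
  float left, top, right, bottom;
  bool IsEmpty() const { return !(left < right && top < bottom); }
};

// Narrows an optional cull rect by a new clip's device-space bounds.
// std::nullopt means "no culling yet", so no sentinel rect such as
// (-1e9, -1e9, 1e9, 1e9) ever has to be pushed through a transform.
std::optional<Rect> IntersectCullRect(const std::optional<Rect>& cull_rect,
                                      const Rect& clip_bounds) {
  if (!cull_rect.has_value()) {
    return clip_bounds;  // First clip: it becomes the cull rect.
  }
  Rect result{std::max(cull_rect->left, clip_bounds.left),
              std::max(cull_rect->top, clip_bounds.top),
              std::min(cull_rect->right, clip_bounds.right),
              std::min(cull_rect->bottom, clip_bounds.bottom)};
  if (result.IsEmpty()) {
    // An empty (but present) rect culls everything; nullopt would cull nothing.
    return Rect{0, 0, 0, 0};
  }
  return result;
}
```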

bdero · 2023-04-29T02:57:32Z

impeller/entity/entity_pass.h

@@ -236,6 +236,7 @@ class EntityPass {

 struct CanvasStackEntry {
  Matrix xformation;
+  Rect clip_bounds;


Could we rename this to clip_coverage and add a docstring noting that this rect is screen space? Impeller loosely standardized around using "bounds" for local space and "coverage" for screen space (unless qualified with a specific space term).

One question, does "coverage" imply exact coverage? This would be a conservative estimate of just the bounds of the device clip.

I moved this struct to canvas.h since it's not used from anywhere else, and how do you feel about "cull_rect" for a name since it is only used for culling, not for clipping?

Nah, "coverage" is always assumed to be a best effort minimum in Impeller. "cull_rect" SGTM.

bdero · 2023-04-29T03:33:08Z

impeller/aiks/canvas.cc

+const Rect Canvas::GetCurrentLocalClipBounds() const {
+  Matrix inverse = xformation_stack_.back().xformation.Invert();
+  return xformation_stack_.back().clip_bounds.TransformBounds(inverse);
+}
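The method above maps the screen-space rect back into the current local coordinate space by inverting the current transform and taking the bounds of the transformed corners. A 2D-only sketch of that computation, assuming an invertible affine transform (the real Matrix is 4x4); Affine, TransformBounds, and LocalCullBounds are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

struct Rect {
  float left, top, right, bottom;
};

// 2D affine transform: x' = a*x + c*y + tx, y' = b*x + d*y + ty.
struct Affine {
  float a, b, c, d, tx, ty;

  // Assumes the transform is invertible (non-zero determinant).
  Affine Invert() const {
    float det = a * d - b * c;
    return Affine{d / det,  -b / det, -c / det, a / det,
                  (c * ty - d * tx) / det, (b * tx - a * ty) / det};
  }
};

// Transforms all four corners and returns their bounding box, which is a
// conservative (possibly larger) rect when the transform rotates or skews.
Rect TransformBounds(const Rect& r, const Affine& m) {
  const float xs[4] = {r.left, r.right, r.left, r.right};
  const float ys[4] = {r.top, r.top, r.bottom, r.bottom};
  Rect out{0, 0, 0, 0};
  for (int i = 0; i < 4; ++i) {
    float x = m.a * xs[i] + m.c * ys[i] + m.tx;
    float y = m.b * xs[i] + m.d * ys[i] + m.ty;
    if (i == 0) {
      out = Rect{x, y, x, y};
    } else {
      out.left = std::min(out.left, x);
      out.top = std::min(out.top, y);
      out.right = std::max(out.right, x);
      out.bottom = std::max(out.bottom, y);
    }
  }
  return out;
}

// Screen-space cull rect -> local space. No cull rect stays "no cull rect".
std::optional<Rect> LocalCullBounds(const std::optional<Rect>& screen_cull_rect,
                                    const Affine& current_transform) {
  if (!screen_cull_rect.has_value()) {
    return std::nullopt;
  }
  return TransformBounds(*screen_cull_rect, current_transform.Invert());
}
```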


Might be good to add a regular test (doesn't use OpenPlaygroundHere) in aiks_unittests which uses this method to verify the intersection.

Is there a way I can have a test that creates a wide ranging DL, then dispatches it to an impeller Dispatcher with a cull_rect (and one that does that with a sub-DisplayList as well), and then checks that there are only Entities for the ops that should have made it through the culling?

I created a new canvas_unittests file and wrote a whole suite of tests which found several bugs in Rect::CutOut().

With respect to making sure that the Entities and Passes are culled by the cull rect, we already have tests of the DisplayList RTree mechanism that make sure that the culled Dispatch variant correctly omits ops in the playback, and with the new tests in canvas_unittests.cc that check that Impeller's Canvas manages the cull rect according to the desired behavior - is there anything left to test here?

I'd say this level of test is beyond the call of duty. 😆

Well, given that it found errors in an existing geom method, isn't it just "at the call of duty"?

We already cover rect methods in geometry_unittests.cc. It sounds like I probably missed a case there.
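For context on the kind of edge cases such tests exercise: a conservative cut-out should shrink the bounds only when the removed rect spans a full side, and leave them alone for holes, corner bites, and rects that merely touch an edge. The sketch below is written under that assumption; CutOut here is a hypothetical free function, and its exact semantics may differ from Impeller's Rect::CutOut.

```cpp
#include <cassert>

struct Rect {
  float left, top, right, bottom;
};

bool operator==(const Rect& a, const Rect& b) {
  return a.left == b.left && a.top == b.top && a.right == b.right &&
         a.bottom == b.bottom;
}

// Returns conservative bounds of `r` with `cut` removed. The result only
// shrinks when `cut` removes a full-width or full-height strip; a hole in the
// middle or a bite out of a corner leaves the bounds unchanged.
Rect CutOut(const Rect& r, const Rect& cut) {
  bool overlaps = cut.left < r.right && r.left < cut.right &&
                  cut.top < r.bottom && r.top < cut.bottom;
  if (!overlaps) {
    return r;  // Touching or disjoint: nothing is removed.
  }
  bool spans_width = cut.left <= r.left && cut.right >= r.right;
  bool spans_height = cut.top <= r.top && cut.bottom >= r.bottom;
  if (spans_width && spans_height) {
    return Rect{0, 0, 0, 0};  // Fully covered: nothing remains.
  }
  if (spans_width) {
    if (cut.top <= r.top) return Rect{r.left, cut.bottom, r.right, r.bottom};
    if (cut.bottom >= r.bottom) return Rect{r.left, r.top, r.right, cut.top};
  }
  if (spans_height) {
    if (cut.left <= r.left) return Rect{cut.right, r.top, r.right, r.bottom};
    if (cut.right >= r.right) return Rect{r.left, r.top, cut.left, r.bottom};
  }
  return r;
}
```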

bdero · 2023-04-29T03:39:38Z

Impeller already tracks the clip bounds in EntityPass at render time, but this change will prevent lots of unnecessary API traffic from hitting Impeller in the first place when platform views are being used, which seems like a good idea to me.

I've been strongly considering pulling some of the culling behavior out into Aiks anyhow so that we can throw away culled things earlier, and so I'm not opposed to tracking a screen space coverage rectangle for the clip in the transform stack.

flar · 2023-04-30T10:19:42Z

Another run of the Impeller non-intersecting platform view benchmark on the latest commit produced similar results to before:

                   Score                     Average A (noise) Average B (noise) Speed-up
average_frame_build_time_millis                   0.56 (0.00%)      0.48 (0.00%)  1.16x  
worst_frame_build_time_millis                     3.30 (0.00%)      4.72 (0.00%)  0.70x  
90th_percentile_frame_build_time_millis           0.93 (0.00%)      0.88 (0.00%)  1.06x  
99th_percentile_frame_build_time_millis           3.11 (0.00%)      3.61 (0.00%)  0.86x  
average_frame_rasterizer_time_millis             56.10 (0.00%)     25.27 (0.00%)  2.22x  
worst_frame_rasterizer_time_millis              105.56 (0.00%)     61.11 (0.00%)  1.73x  
90th_percentile_frame_rasterizer_time_millis     73.32 (0.00%)     31.10 (0.00%)  2.36x  
99th_percentile_frame_rasterizer_time_millis    103.67 (0.00%)     54.79 (0.00%)  1.89x

jonahwilliams · 2023-05-01T16:41:44Z

I tried running this on the Flutter gallery (the old one in flutter/flutter). On the text fields demo, the blinking cursor no longer shows up correctly and the console is filled with messages of the form:

dispatched with culling rect on DL with no rtree

flar · 2023-05-01T20:18:40Z

I tried running this on the Flutter gallery (the old one in flutter/flutter). On the text fields demo, the blinking cursor no longer shows up correctly and the console is filled with messages of the form:
dispatched with culling rect on DL with no rtree

I'll have to see where we are creating these DLs without an rtree, but the blinking cursor is likely because I didn't understand what you were saying above about how Impeller manages partial repaints. I'm curious if my latest reply there indicates if I'm on the same page.

bdero

LGTM! (Modulo the blinking cursor issue mentioned by Jonah)


flar · 2023-05-01T22:07:40Z

The latest version seems to fix both the blinking cursor and "DL with no RTree" problems.


zanderso · 2023-05-02T15:59:50Z

SkiaPerf is reporting some nice results from this PR: https://flutter-flutter-perf.skia.org/e/?begin=1682965609&end=1683036391&keys=X5edb569869225d453e9d784e6e3952ea&num_commits=50&request_type=1&xbaroffset=34645

zanderso · 2023-05-02T16:00:51Z

Also reductions in cpu/gpu util on the backdrop filter benchmarks https://flutter-flutter-perf.skia.org/e/?begin=1682965609&end=1683036391&keys=X711aa6c13f994bdd30843f360c7e708f&num_commits=50&request_type=1&xbaroffset=34645

flar · 2023-05-02T18:59:19Z

I have been going through these results. The improvement on the non-intersecting platform view benchmark was quite a lot smaller than what I measured on my iPhone 6.

I did not expect any impact on benchmarks that didn't use Impeller, though. All of the changes were in Impeller-specific code, so the massive improvement on the backdrop perf benchmark was unexpected. I can see that it would benefit from the culling, since the animation does not overlap the backdrop entries, so with proper culling they are not attempted on each frame. But this PR would only affect that benchmark if it was run using Impeller...? (Of course, since all benchmarks are run on Impeller now, it makes sense that the backdrop benchmark would see a huge benefit from the culling.)

jonahwilliams · 2023-05-02T19:29:51Z

I think the much earlier culling is driving most of those improvements, since the dirty region itself is fairly small (just the animating rect). Arguably we should turn off partial repaint for those benchmarks...

For the non-intersecting platform views, there is no partial repaint, so it seems like the additional culling may not be significant once we fixed Impeller to not encode those extra instructions? They're just draw rects, so it's fairly cheap.

I think the opportunity for that benchmark is in recognizing that the overlay surfaces are empty. To take advantage of that, culling isn't sufficient - we need to skip the entire surface.

flar · 2023-05-02T20:10:07Z

I think the much earlier culling is driving most of those improvements, since the dirty region itself is fairly small (just the animating rect). Arguably we should turn off partial repaint for those benchmarks...

I need to go back and check out the backdrop perf, but I think it was originally testing the caching of BDFLayers and that was before partial repaints. The current situation makes an excellent test of partial repaints, though. If we "fixed" the benchmark to continue rendering the BDFLayers by using a larger animating rect (or one above and one below), we'd then be measuring Impeller's lack of layer caching.

flar · 2023-05-02T20:12:35Z

For the non-intersecting platform views, there is no partial repaint, so it seems like the additional culling may not be significant once we fixed Impeller to not encode those extra instructions? They're just draw rects, so it's fairly cheap.

Actually, isn't there partial repaint, but it isn't very significant? The scrolling list is the only part changing so the header would not need to be redrawn...?

Either way, there is now culling: before, we didn't cull to anything, and now we cull at least to the size of the surface, which skips all of the data outside of the surface. I haven't looked closely at what the benchmark generates, but depending on the "overdraw" of the scrolling list, tons of non-visible rects may have been rendered on each frame.

Update: It looks like "ShouldRender" was doing culling against the entity's coverage which may have eliminated a lot of them, but the DL culling is per-op, not per-entity, so perhaps it eliminated a bunch more?

jonahwilliams · 2023-05-02T20:22:44Z

Partial repaint is disabled with platform views. Also, neither backend raster-cached backdrop filters.

flar · 2023-05-02T20:45:46Z

Partial repaint is disabled with platform views.

Good point. So the gain there is the granularity of the culling.

Also, neither backend raster-cached backdrop filters.

When the benchmark was written it was to measure the performance of backdrop filters in order to track the effects of caching them. There was a prototype, but getting it right ended up being harder than expected and so the work was abandoned, but the benchmark remains. At the time we decided that it was good to keep it to measure the performance of that operation even if our short-term plans didn't pan out as we might have other ways of improving it. (flutter/flutter#34870)

jonahwilliams reviewed Apr 29, 2023


bdero reviewed Apr 29, 2023


bdero assigned flar Apr 29, 2023

bdero added the e: impeller label Apr 29, 2023

bdero changed the title from "take advantage of DisplayList culling in Impeller" to "[Impeller] take advantage of DisplayList culling" Apr 29, 2023

flar requested a review from bdero April 30, 2023 10:23

bdero approved these changes May 1, 2023


flar added 4 commits May 1, 2023 14:55

take advantage of DisplayList culling in Impeller

03dbc94

make cull_rect std::optional and add unit tests of Canvas clip bounds methods

81af5eb

fix warnings about DL with no RTree while running gallery

77ad0e7

fix cull_rect handling in Impeller submit callback

fa1cfa8

flar force-pushed the Impeller-culling-dispatch branch from 756e78b to fa1cfa8 Compare May 1, 2023 21:57

flar requested a review from jonahwilliams May 1, 2023 22:07

flar added the autosubmit Merge PR when tree becomes green via auto submit App label May 1, 2023

auto-submit bot merged commit e64aee0 into flutter:main May 1, 2023

engine-flutter-autoroll mentioned this pull request May 2, 2023

Roll Flutter Engine from 7d87410a51d5 to 4162dcc6daec (5 revisions) flutter/flutter#125850

Closed

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request May 2, 2023

e64aee0be [Impeller] take advantage of DisplayList culling (flutter/engine#41606)

3ea8d86

engine-flutter-autoroll mentioned this pull request May 2, 2023

Roll Flutter Engine from 7d87410a51d5 to 4162dcc6daec (5 revisions) flutter/flutter#125856

Closed

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request May 2, 2023

e64aee0be [Impeller] take advantage of DisplayList culling (flutter/engine#41606)

fca5b67

engine-flutter-autoroll mentioned this pull request May 2, 2023

Roll Flutter Engine from 7d87410a51d5 to 024bf946232d (6 revisions) flutter/flutter#125858

Merged

engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request May 2, 2023

e64aee0be [Impeller] take advantage of DisplayList culling (flutter/engine#41606)

7f7f254

jonahwilliams mentioned this pull request May 6, 2023

[Impeller] Enabling RTree regressed canvas drawing with a large number of operations flutter/flutter#126202

Closed

jonahwilliams mentioned this pull request Nov 24, 2023

[Impeller] CPU overhead of many small canvas operations is surprisingly high. flutter/flutter#138004

Closed

flar mentioned this pull request Nov 27, 2023

SurfaceFrame root DisplayLists will no longer prepare an RTree #48422

Merged
