This repository was archived by the owner on Feb 25, 2025. It is now read-only.

[Impeller] take advantage of DisplayList culling #41606

Merged
merged 4 commits into from
May 1, 2023


flar
Contributor

@flar flar commented Apr 29, 2023

This PR switches the calls that dispatch a DisplayList into an Impeller Dispatcher so that they pass a cull rect, which enables pre-culling of out-of-bounds ops.

This change showed an improvement of around 2x in the rendering performance of the non-intersecting platform view benchmark, but that was measured without the recent changes to the destructive blend modes in the Impeller renderer.

@flar
Contributor Author

flar commented Apr 29, 2023

A/B Comparison of Framework ToT against a local engine that includes both flutter/flutter#125717 and this PR:

                   Score                     Average A (noise) Average B (noise) Speed-up
average_frame_build_time_millis                   0.56 (0.00%)      0.46 (0.00%)  1.22x  
worst_frame_build_time_millis                     3.04 (0.00%)      4.26 (0.00%)  0.71x  
90th_percentile_frame_build_time_millis           0.96 (0.00%)      0.65 (0.00%)  1.48x  
99th_percentile_frame_build_time_millis           2.93 (0.00%)      2.94 (0.00%)  1.00x  
average_frame_rasterizer_time_millis             55.05 (0.00%)     25.20 (0.00%)  2.18x  
worst_frame_rasterizer_time_millis              105.66 (0.00%)     62.40 (0.00%)  1.69x  
90th_percentile_frame_rasterizer_time_millis     72.62 (0.00%)     31.63 (0.00%)  2.30x  
99th_percentile_frame_rasterizer_time_millis    104.08 (0.00%)     54.28 (0.00%)  1.92x  
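
For readers of the table: the Speed-up column appears to be Average A divided by Average B, so values below 1 indicate a regression. This reading is an inference from the numbers rather than something stated in the thread; two rows checked by hand:

```latex
\[
\text{speed-up} = \frac{\text{Average A}}{\text{Average B}}, \qquad
\frac{55.05}{25.20} \approx 2.18, \qquad
\frac{3.04}{4.26} \approx 0.71
\]
```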

Comment on lines 121 to 127
if (!buffer_damage.has_value()) {
  auto size = surface->GetSize();
  buffer_damage = SkIRect::MakeWH(size.width, size.height);
  clip_rect = impeller::IRect::MakeXYWH(0, 0, size.width, size.height);
}
impeller::DlDispatcher impeller_dispatcher(clip_rect.value());
display_list->Dispatch(impeller_dispatcher, buffer_damage.value());
Member

I implemented this a bit differently for Impeller: rather than adding a clip, I added a translate and then shrank the size of the root render pass, which causes Impeller to cull.

translate: https://github.com/flutter/engine/blob/main/flow/compositor_context.cc#L189

Shrink: https://github.com/flutter/engine/blob/main/impeller/renderer/backend/metal/surface_mtl.mm#L57

I'm not sure if this clip would already do the right thing though.

Contributor Author

> which causes impeller to cull.

Causes it to cull where?

impeller::Dispatcher::DrawRect immediately calls Canvas::DrawRect without any tests.
Canvas::DrawRect (sometimes does something different for strokes and mask blurs) immediately creates and adds an entity without any other checks.

Where is the culling done if that rectangle was so far outside the clip that it isn't visible? The Dispatch method on DL can do this kind of bounds culling and then the impeller::Dispatcher never even sees the op...
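
To illustrate the distinction being drawn here, the following is a minimal C++ sketch of per-op bounds culling at dispatch time, where the receiver never sees an op whose bounds miss the cull rect. It is not the DisplayList implementation (which, per this thread, relies on an RTree); `Rect`, `Op`, and `DispatchCulled` are hypothetical names, and a linear scan stands in for the spatial query.

```cpp
#include <cassert>
#include <vector>

// Hypothetical axis-aligned rect: left, top, right, bottom.
struct Rect {
  float l, t, r, b;
  // True when the two rects share a non-empty area.
  bool Intersects(const Rect& o) const {
    return l < o.r && o.l < r && t < o.b && o.t < b;
  }
};

// Hypothetical recorded op: conservative bounds plus an id for inspection.
struct Op {
  Rect bounds;
  int id;
};

// Returns the ids of the ops that would be delivered to the receiver.
// Ops whose bounds do not intersect the cull rect are skipped up front,
// so nothing downstream has to create and then reject an entity for them.
std::vector<int> DispatchCulled(const std::vector<Op>& ops, const Rect& cull) {
  std::vector<int> delivered;
  for (const Op& op : ops) {
    if (op.bounds.Intersects(cull)) {
      delivered.push_back(op.id);
    }
  }
  return delivered;
}
```

With a 100x100 cull rect, an op located at (500, 500) would be dropped before the receiver is called, while ops that overlap the rect even partially are still delivered.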

Contributor Author

I'm not sure this code here can apply a translation. If there is no damage rect, the cull rect is the size of the surface so it is already sized correctly and 0-based. If there is a damage rect, then the remainder of the surface must remain intact so we need to render the culled ops "in situ" with respect to origin and surface size.

While I still don't think I see all of the relevance of your comment, I don't think we can do translations and resizes of surfaces here because of the nature of this code and the damage rect...?

Contributor Author

Ding ding! With your later comment that the blinking cursor was missing I think I finally got what you are saying. Unlike the other platforms that reuse the original framebuffer surface to render the partial damage inside a clip in the middle of the surface, Impeller renders the damage to a reduced size surface and then blits it into place?

In that case then the Impeller cull rect should always be set to just the rect(size) value with no origin, but the Dispatch call should be called with the damage rect since it is in the coordinate space of the original frame?

Member

Yes!

Member

Makes sense to me

Contributor Author

Looking again, it looks like the translate happens before the frame DisplayList is generated, so both rects should be 0-origin-based?

And, doing all of that fixes the blinking cursor problem...
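
The conclusion of this thread can be sketched roughly as follows. This is an illustrative C++ sketch only, assuming (as described above) that the partial-repaint target is a reduced-size surface and that the content has already been translated by the damage origin before the frame DisplayList is built. `IRect`, `ISize`, `IPoint`, `ZeroBasedCullRect`, and `ToTargetSpace` are hypothetical names, not engine APIs.

```cpp
#include <cassert>
#include <optional>

// Hypothetical integer geometry types for the sketch.
struct IRect { int x, y, w, h; };
struct ISize { int w, h; };
struct IPoint { int x, y; };

// Cull rect for the render target. With damage, the target is only as large
// as the damage rect, and its origin is (0, 0) in target space.
IRect ZeroBasedCullRect(ISize surface, std::optional<IRect> damage) {
  if (damage.has_value()) {
    return IRect{0, 0, damage->w, damage->h};
  }
  return IRect{0, 0, surface.w, surface.h};
}

// Maps a point from full-frame coordinates into target space by applying
// the same translation that is assumed to have been applied to the content.
IPoint ToTargetSpace(IPoint frame_point, std::optional<IRect> damage) {
  if (!damage.has_value()) {
    return frame_point;
  }
  return IPoint{frame_point.x - damage->x, frame_point.y - damage->y};
}
```

Under these assumptions, a blinking cursor inside the damage region lands inside the zero-based cull rect after translation, whereas a cull rect that kept the damage origin would reject it.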

@@ -23,15 +23,25 @@
 namespace impeller {

 Canvas::Canvas() {
-  Initialize();
+  Initialize(Rect::MakeLTRB(-1e9, -1e9, 1e9, 1e9));
Member

Nit: We've been using Rect::MakeMaximum() for max coverage/bounds.

Contributor Author

How well does MakeMaximum deal with being transformed?

Contributor Author

Rather than worry about specifying a huge clip that might encourage degenerate math, I changed the cull_rect to an std::optional.
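
As a rough illustration of the optional approach (not the code in this PR), an absent cull rect can mean "unbounded", so no huge sentinel rectangle ever has to be transformed. `Rect`, `IntersectCull`, and `IsCulled` below are hypothetical names for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical axis-aligned rect: left, top, right, bottom.
struct Rect {
  float l, t, r, b;
  bool IsEmpty() const { return l >= r || t >= b; }
};

// Narrows the cull rect by a clip. std::nullopt means "no cull rect yet",
// so the first clip simply becomes the cull rect.
std::optional<Rect> IntersectCull(const std::optional<Rect>& cull,
                                  const Rect& clip) {
  if (!cull.has_value()) {
    return clip;
  }
  return Rect{std::max(cull->l, clip.l), std::max(cull->t, clip.t),
              std::min(cull->r, clip.r), std::min(cull->b, clip.b)};
}

// An op can be skipped only when a cull rect exists and the op's bounds
// do not overlap it. With no cull rect, nothing is ever culled.
bool IsCulled(const std::optional<Rect>& cull, const Rect& bounds) {
  if (!cull.has_value()) {
    return false;
  }
  return cull->IsEmpty() || bounds.r <= cull->l || cull->r <= bounds.l ||
         bounds.b <= cull->t || cull->b <= bounds.t;
}
```

The design choice is that the "no limit" state is represented explicitly rather than by extreme coordinates, which avoids any question of how such coordinates behave under a transform.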

Member

SGTM

@@ -236,6 +236,7 @@ class EntityPass {

 struct CanvasStackEntry {
   Matrix xformation;
+  Rect clip_bounds;
Member

Could we rename this to clip_coverage and add a docstring noting that this rect is screen space? Impeller loosely standardized around using "bounds" for local space and "coverage" for screen space (unless qualified with a specific space term).

Contributor Author

One question, does "coverage" imply exact coverage? This would be a conservative estimate of just the bounds of the device clip.

Contributor Author

I moved this struct to canvas.h since it's not used from anywhere else, and how do you feel about "cull_rect" for a name since it is only used for culling, not for clipping?

Member

Nah, "coverage" is always assumed to be a best effort minimum in Impeller. "cull_rect" SGTM.

const Rect Canvas::GetCurrentLocalClipBounds() const {
  Matrix inverse = xformation_stack_.back().xformation.Invert();
  return xformation_stack_.back().clip_bounds.TransformBounds(inverse);
}
Member

Might be good to add a regular test (doesn't use OpenPlaygroundHere) in aiks_unittests which uses this method to verify the intersection.
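
For context on what such a test would check, here is a simplified C++ sketch of mapping a device-space cull rect back into local space through the inverse of the current transform. It deliberately handles only a positive scale plus translation instead of a full matrix; `Rect`, `ScaleTranslate`, and `LocalCullRect` are hypothetical names, not Impeller types.

```cpp
#include <cassert>

// Hypothetical axis-aligned rect: left, top, right, bottom.
struct Rect { float l, t, r, b; };

// Simplified transform: device = local * scale + translate.
// Assumes sx > 0 and sy > 0 so that edges keep their order when inverted.
struct ScaleTranslate { float sx, sy, tx, ty; };

// Applies the inverse transform to each edge of the device-space cull rect,
// yielding the region of local space that can still be visible.
Rect LocalCullRect(const Rect& device_cull, const ScaleTranslate& m) {
  return Rect{(device_cull.l - m.tx) / m.sx, (device_cull.t - m.ty) / m.sy,
              (device_cull.r - m.tx) / m.sx, (device_cull.b - m.ty) / m.sy};
}
```

For example, a device cull rect of (0, 0, 100, 100) under a 2x scale and a (10, 20) translation corresponds to the local rect (-5, -10, 45, 40). A rotation or skew would require transforming all four corners and taking their bounds, which this sketch does not attempt.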

Contributor Author

Is there a way I can have a test that creates a wide ranging DL, then dispatches it to an impeller Dispatcher with a cull_rect (and one that does that with a sub-DisplayList as well), and then checks that there are only Entities for the ops that should have made it through the culling?

Contributor Author

I created a new canvas_unittests file and wrote a whole suite of tests which found several bugs in Rect::CutOut().

Contributor Author

With respect to making sure that the Entities and Passes are culled by the cull rect: we already have tests of the DisplayList RTree mechanism that make sure the culled Dispatch variant correctly omits ops in the playback, and the new tests in canvas_unittests.cc check that Impeller's Canvas manages the cull rect according to the desired behavior. Is there anything left to test here?

Member

@bdero bdero May 1, 2023

I'd say this level of test is beyond the call of duty. 😆

Contributor Author

Well, given that it found errors in an existing geom method, isn't it just "at the call of duty"?

Member

We already cover rect methods in geometry_unittests.cc. It sounds like I probably missed a case there.

@bdero
Member

bdero commented Apr 29, 2023

Impeller already tracks the clip bounds in EntityPass at render time, but this change will prevent lots of unnecessary API traffic from hitting Impeller in the first place when platform views are being used, which seems like a good idea to me.

I've been strongly considering pulling some of the culling behavior out into Aiks anyhow so that we can throw away culled things earlier, and so I'm not opposed to tracking a screen space coverage rectangle for the clip in the transform stack.

@bdero bdero changed the title from "take advantage of DisplayList culling in Impeller" to "[Impeller] take advantage of DisplayList culling" Apr 29, 2023
@flar
Contributor Author

flar commented Apr 30, 2023

Another run of the Impeller non-intersecting platform view benchmark on the latest commit produced similar results to before:

                   Score                     Average A (noise) Average B (noise) Speed-up
average_frame_build_time_millis                   0.56 (0.00%)      0.48 (0.00%)  1.16x  
worst_frame_build_time_millis                     3.30 (0.00%)      4.72 (0.00%)  0.70x  
90th_percentile_frame_build_time_millis           0.93 (0.00%)      0.88 (0.00%)  1.06x  
99th_percentile_frame_build_time_millis           3.11 (0.00%)      3.61 (0.00%)  0.86x  
average_frame_rasterizer_time_millis             56.10 (0.00%)     25.27 (0.00%)  2.22x  
worst_frame_rasterizer_time_millis              105.56 (0.00%)     61.11 (0.00%)  1.73x  
90th_percentile_frame_rasterizer_time_millis     73.32 (0.00%)     31.10 (0.00%)  2.36x  
99th_percentile_frame_rasterizer_time_millis    103.67 (0.00%)     54.79 (0.00%)  1.89x  

@flar flar requested a review from bdero April 30, 2023 10:23
@jonahwilliams
Member

I tried running this on the Flutter gallery (the old one in flutter/flutter). On the text fields demo, the blinking cursor no longer shows up correctly and the console is filled with messages of the form:

dispatched with culling rect on DL with no rtree

@flar
Contributor Author

flar commented May 1, 2023

> I tried running this on the Flutter gallery (the old one in flutter/flutter). On the text fields demo, the blinking cursor no longer shows up correctly and the console is filled with messages of the form:
>
> dispatched with culling rect on DL with no rtree

I'll have to see where we are creating these DLs without an rtree, but the blinking cursor is likely because I didn't understand what you were saying above about how Impeller manages partial repaints. I'm curious if my latest reply there indicates if I'm on the same page.

Member

@bdero bdero left a comment

LGTM! (Modulo the blinking cursor issue mentioned by Jonah)

@flar flar force-pushed the Impeller-culling-dispatch branch from 756e78b to fa1cfa8 Compare May 1, 2023 21:57
@flar
Contributor Author

flar commented May 1, 2023

The latest version seems to fix both the blinking cursor and "DL with no RTree" problems.

@flar flar requested a review from jonahwilliams May 1, 2023 22:07
@flar flar added the autosubmit label (Merge PR when tree becomes green via auto submit App) May 1, 2023
@auto-submit auto-submit bot merged commit e64aee0 into flutter:main May 1, 2023
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request May 2, 2023
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request May 2, 2023
…125858)

flutter/engine@7d87410...024bf94

2023-05-02 [email protected] Roll Skia from 38e56b6da8f9 to c9e0992be00b (3 revisions) (flutter/engine#41655)
2023-05-02 [email protected] Roll Dart SDK from dc4a048e3cf7 to 25c29435f73e (1 revision) (flutter/engine#41654)
2023-05-01 [email protected] [Impeller] take advantage of DisplayList culling (flutter/engine#41606)
2023-05-01 [email protected] Use os_dimension in framework tests. (flutter/engine#41649)
2023-05-01 [email protected] Turn @staticInterop tear-off into closure (flutter/engine#41643)
2023-05-01 [email protected] Roll Fuchsia Linux SDK from SJOgKviZ-kwWd1Z1u... to ur2ymZJCZSj64s6Q2... (flutter/engine#41648)

@zanderso
Member

zanderso commented May 2, 2023

@zanderso
Member

zanderso commented May 2, 2023

@flar
Contributor Author

flar commented May 2, 2023

I have been going through these results. The impact on the non-intersecting platform view benchmark was less than I measured on my iPhone 6 by quite a lot.

I did not expect any impact on any benchmarks that didn't use Impeller, though. All of the changes were in code specific to Impeller, so the massive improvement on the backdrop perf benchmark was unexpected. I can see that it would benefit from the culling, though, as the animation is not over the backdrop entries so with proper culling they will not be attempted on each frame. But, that would only be impacted by this PR if the benchmark was run using Impeller...? (Of course, since all benchmarks are run on Impeller now, it makes sense that the backdrop benchmark would see a huge benefit from the culling)

@jonahwilliams
Member

I think the much earlier culling is driving most of those improvements, since the dirty region itself is fairly small (just the animating rect). Arguably we should turn off partial repaint for those benchmarks...

For the non-intersecting platform views, there is no partial repaint, so it seems like the additional culling may not be significant once we fixed Impeller to not encode those extra instructions? They're just draw rects, so it's fairly cheap.

I think the opportunity for that benchmark is in recognizing that the overlay surfaces are empty. To take advantage of that, culling isn't sufficient - we need to skip the entire surface.

@flar
Contributor Author

flar commented May 2, 2023

> I think the much earlier culling is driving most of those improvements, since the dirty region itself is fairly small (just the animating rect). Arguably we should turn off partial repaint for those benchmarks...

I need to go back and check out the backdrop perf, but I think it was originally testing the caching of BDFLayers and that was before partial repaints. The current situation makes an excellent test of partial repaints, though. If we "fixed" the benchmark to continue rendering the BDFLayers by using a larger animating rect (or one above and one below), we'd then be measuring Impeller's lack of layer caching.

@flar
Contributor Author

flar commented May 2, 2023

> For the non-intersecting platform views, there is no partial repaint, so it seems like the additional culling may not be significant once we fixed Impeller to not encode those extra instructions? They're just draw rects, so it's fairly cheap.

Actually, isn't there partial repaint, but it isn't very significant? The scrolling list is the only part changing so the header would not need to be redrawn...?

Either way, there is culling because before we didn't cull to anything and now we cull at least to the size of the surface which skips all of the data that is outside of the surface. I haven't looked closely at what is generated by the benchmark, but depending on the "overdraw" of the scrolling list, tons of non-visible rects may have been rendered on each frame.

Update: It looks like "ShouldRender" was doing culling against the entity's coverage which may have eliminated a lot of them, but the DL culling is per-op, not per-entity, so perhaps it eliminated a bunch more?

@jonahwilliams
Member

Partial repaint is disabled with platform views. Also, neither backend raster cached backdrop filters.

@flar
Contributor Author

flar commented May 2, 2023

> Partial repaint is disabled with platform views.

Good point. So the gain there is the granularity of the culling.

> Also, neither backend raster cached backdrop filters.

When the benchmark was written it was to measure the performance of backdrop filters in order to track the effects of caching them. There was a prototype, but getting it right ended up being harder than expected and so the work was abandoned, but the benchmark remains. At the time we decided that it was good to keep it to measure the performance of that operation even if our short-term plans didn't pan out as we might have other ways of improving it. (flutter/flutter#34870)
