This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Contributor

@flar flar commented Apr 24, 2020

A recent change to how OpacityLayer manages its single child for caching purposes (see #14559) actually broke the caching of that child. This fix restores the caching and adds tests that verify the OpacityLayer really does cache its child in practice (as long as there is only one).

@flar flar requested review from iskakaushik and liyuqian April 24, 2020 00:00
@auto-assign auto-assign bot requested a review from gaaclarke April 24, 2020 00:00
Contributor

@liyuqian liyuqian left a comment

Adding MockRasterCache to our unit tests is great! We might want to minimize the duplicate code in MockRasterCache so we're testing the real RasterCache code instead of testing the duplicate code in MockRasterCache. I see that your MockRasterCache revealed some poor designs of the old RasterCache which make the mock test quite difficult. Please let me know if you're able to fix it in this PR. If not, maybe I'll try to come up with a cleaning PR to improve the design and unblock your mock test.
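One way to read the suggestion above, as a rough sketch only: keep the caching policy in the real class and let the mock override a single virtual hook, so tests exercise the shared logic rather than a copy of it. All names here (`SimpleRasterCache`, `MockSimpleRasterCache`, `Prepare`, `Rasterize`) are hypothetical stand-ins and not the engine's actual RasterCache API.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical, simplified stand-ins; not the engine's real classes.
struct CachedImage {
  int width;
  int height;
};

class SimpleRasterCache {
 public:
  explicit SimpleRasterCache(std::size_t access_threshold)
      : access_threshold_(access_threshold) {}
  virtual ~SimpleRasterCache() = default;

  // Shared policy: an entry is rasterized only once it has been seen
  // access_threshold_ times. Returns true if the entry is cached afterwards.
  bool Prepare(int layer_id, int width, int height) {
    assert(width >= 0 && height >= 0);
    Entry& entry = entries_[layer_id];
    entry.access_count++;
    if (entry.has_image) {
      return true;
    }
    if (entry.access_count < access_threshold_) {
      return false;
    }
    entry.image = Rasterize(width, height);
    entry.has_image = true;
    return true;
  }

  bool IsCached(int layer_id) const {
    auto it = entries_.find(layer_id);
    return it != entries_.end() && it->second.has_image;
  }

 protected:
  // The only step a test double needs to replace.
  virtual CachedImage Rasterize(int width, int height) {
    // A real implementation would draw into an offscreen surface here.
    return CachedImage{width, height};
  }

 private:
  struct Entry {
    std::size_t access_count = 0;
    bool has_image = false;
    CachedImage image{0, 0};
  };

  std::size_t access_threshold_;
  std::unordered_map<int, Entry> entries_;
};

// The mock reuses Prepare() unchanged and only records rasterization calls.
class MockSimpleRasterCache : public SimpleRasterCache {
 public:
  using SimpleRasterCache::SimpleRasterCache;

  int rasterize_calls() const { return rasterize_calls_; }

 protected:
  CachedImage Rasterize(int width, int height) override {
    ++rasterize_calls_;
    return CachedImage{width, height};
  }

 private:
  int rasterize_calls_ = 0;
};
```

With this shape, a test that counts `rasterize_calls()` is checking the threshold logic in the base class, not a duplicate of it.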

@flar flar force-pushed the fix-and-test-opacity-child-caching branch from 54724ac to 8a77a19 Compare May 16, 2020 00:00
@flar flar requested a review from chinmaygarde May 16, 2020 00:01
Contributor Author

flar commented May 16, 2020

The A/B results from the fading_child_animation_perf__timeline_summary benchmark with this latest version of the fix:

                   Score                     Average A (noise)  Average B (noise)   Speed-up
average_frame_build_time_millis                   1.70 (2.65%)       1.90 (5.50%)    0.90x  
worst_frame_build_time_millis                     5.50 (10.38%)      9.43 (29.76%)   0.58x  
90th_percentile_frame_build_time_millis           2.64 (4.73%)       2.98 (27.60%)   0.89x  
99th_percentile_frame_build_time_millis           4.50 (3.32%)       5.49 (0.56%)    0.82x  
average_frame_rasterizer_time_millis              4.61 (0.64%)       3.79 (0.41%)    1.22x  
worst_frame_rasterizer_time_millis                8.23 (9.12%)       9.23 (4.08%)    0.89x  
90th_percentile_frame_rasterizer_time_millis      4.99 (0.57%)       4.06 (0.57%)    1.23x  
99th_percentile_frame_rasterizer_time_millis      6.20 (1.84%)       7.19 (2.31%)    0.86x  
average_vsync_transitions_missed                  0.00 (0.00%)       0.50 (100.00%)  0.00x  
90th_percentile_vsync_transitions_missed          0.00 (0.00%)       0.50 (100.00%)  0.00x  
99th_percentile_vsync_transitions_missed          0.00 (0.00%)       0.50 (100.00%)  0.00x  

Note that some of the results shown as "worse" had high variability and were worse mainly because of a few outlier results. I also had to do a lot of runs because every Nth run would get locked at 16ms. Those runs aren't helpful: the lock-up comes from getting stuck on a vsync wait, not from the performance of this code in either form (pre- or post-fix).

The main items of interest are the roughly 20% improvements in the average and 90th-percentile rasterizer times (1.22x and 1.23x).
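For reference, the Speed-up column in the table appears to be Average A divided by Average B (for example 4.61 / 3.79 ≈ 1.22). A small sketch of that arithmetic follows; the helper names `SpeedUp` and `TimeReduction` are made up for illustration and are not part of the benchmark tooling.

```cpp
#include <cassert>
#include <cmath>

// Speed-up as it appears to be reported in the table: time A / time B.
double SpeedUp(double time_a_ms, double time_b_ms) {
  assert(time_b_ms > 0.0);
  return time_a_ms / time_b_ms;
}

// Fraction of frame time saved going from A to B.
double TimeReduction(double time_a_ms, double time_b_ms) {
  assert(time_a_ms > 0.0);
  return (time_a_ms - time_b_ms) / time_a_ms;
}
```

For the average rasterizer time, `SpeedUp(4.61, 3.79)` is about 1.22 while `TimeReduction(4.61, 3.79)` is about 0.18, so "20%" is a round figure between the two ways of expressing the same change.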

I think I know one reason why the 99th-percentile and worst frames are not as good: currently we preroll the child layers even though we've already cached their rasterization. While there are some aspects of preroll that we might want to continue to do once we've decided to cache a layer, we don't want the child layers to be doing their own raster caching since it will be irrelevant; their output is already cached at an ancestor level. Indeed, that's exactly what happens with this benchmark: the child is a picture layer and it happily wastes lots of cycles deciding on and populating a cache with its contents even though they won't be used. I've decided that this issue should be fixed outside the scope of this particular work since it is already wasting cycles whether or not this fix lands.
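To make the wasted work concrete, here is a minimal sketch of one possible way to avoid it. It is not what this PR does (that follow-up is explicitly out of scope), and every name in it (`PrerollContextSketch`, `subtree_cached_by_ancestor`, the `*Sketch` layer classes) is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified preroll state; not the engine's PrerollContext.
struct PrerollContextSketch {
  // Set while visiting a subtree whose output an ancestor has already cached.
  bool subtree_cached_by_ancestor = false;
  // Counts how often a layer went through its own cache bookkeeping.
  int cache_population_attempts = 0;
};

class LayerSketch {
 public:
  virtual ~LayerSketch() = default;
  virtual void Preroll(PrerollContextSketch& context) = 0;
};

class PictureLayerSketch : public LayerSketch {
 public:
  void Preroll(PrerollContextSketch& context) override {
    // Bounds computation and similar work would still happen here.
    if (context.subtree_cached_by_ancestor) {
      return;  // Our output would never be used, so skip the cache work.
    }
    ++context.cache_population_attempts;
  }
};

class OpacityLayerSketch : public LayerSketch {
 public:
  explicit OpacityLayerSketch(bool children_are_cached)
      : children_are_cached_(children_are_cached) {}

  void Add(std::unique_ptr<LayerSketch> child) {
    children_.push_back(std::move(child));
  }

  void Preroll(PrerollContextSketch& context) override {
    assert(!children_.empty());
    // Tell descendants that an ancestor already holds their rasterization,
    // then restore the flag so sibling subtrees are unaffected.
    const bool saved = context.subtree_cached_by_ancestor;
    context.subtree_cached_by_ancestor = saved || children_are_cached_;
    for (const auto& child : children_) {
      child->Preroll(context);
    }
    context.subtree_cached_by_ancestor = saved;
  }

 private:
  bool children_are_cached_;
  std::vector<std::unique_ptr<LayerSketch>> children_;
};
```

The flag is saved and restored around the child traversal so that only the cached subtree sees it.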

Member

@gaaclarke gaaclarke left a comment

Please change the format of your docstrings, they aren't doxygen compatible.

canvas_size.height());
internal_nodes_canvas.addCanvas(canvas);
Layer::PaintContext paintContext = {
(SkCanvas*)&internal_nodes_canvas,
Member

nit: argument name comments would be a bit helpful here. Google has standardized on `/*arg_name=*/`.

Contributor Author

@flar flar May 19, 2020

Hmmm, I find that notation detracts from readability more than it helps. It has only appeared in a couple of places mostly in the Fuchsia modules so far and the few that I looked at look like what was once simple code becoming text soup. That discussion is probably for another venue, though.

In this particular case, every field is being copied over except for a couple. The somewhat clean code that makes it obvious that this particular process is a "mostly copy transfer/constructor" would start to look like /* really_long_and_obvious_name= */ really_long_and_obvious_name,. Is that adding any value at that point?

The only two lines which are less than obvious are the null for the view_embedder (which is basically, we don't get to this code if there was a view_embedder in the first place so I could replace it with context->view_embedder and it would be identical and more obvious, but waste code) and passing the supplied canvas variable to the leaf_nodes_canvas field only because the field has an adorned name.

I didn't write this code, I moved it a few lines in the source file. I feel it's pretty obvious so I don't see much value in adding what amounts to obfuscation notations unrelated to the actual fix that is happening just because a diff engine's robotic algorithm makes it look like new code in the review tool.
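For readers following the disagreement, this sketch shows a "mostly copy" initializer where only the two less obvious arguments carry a name comment. The struct and its fields are hypothetical and far smaller than the real `Layer::PaintContext`; only `internal_nodes_canvas`, `leaf_nodes_canvas` and `view_embedder` are names taken from the discussion, and `checkerboard` is invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-ins for the real canvas and embedder types.
struct CanvasSketch {};
struct ViewEmbedderSketch {};

// A much smaller, made-up version of a paint context.
struct PaintContextSketch {
  CanvasSketch* internal_nodes_canvas;
  CanvasSketch* leaf_nodes_canvas;
  ViewEmbedderSketch* view_embedder;
  bool checkerboard;
};

// Builds a context for painting into a cache surface by copying most fields.
PaintContextSketch MakeCachePaintContext(const PaintContextSketch& context,
                                         CanvasSketch* internal_nodes_canvas,
                                         CanvasSketch* canvas) {
  assert(canvas != nullptr);
  return PaintContextSketch{
      internal_nodes_canvas,         // same name as the field, no comment needed
      /*leaf_nodes_canvas=*/canvas,  // variable name differs from the field
      /*view_embedder=*/nullptr,     // a bare nullptr says nothing on its own
      context.checkerboard,          // plain copy, obvious from the name
  };
}
```

Commenting every line in the same way would repeat each field name twice, which is the "text soup" concern raised above.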

Contributor Author

@gaaclarke I've made some updates on the code that you commented on, did they satisfy the changes you were requesting?

Member

Contributor Author

From that page:

When the meaning of a function argument is nonobvious, consider one of the following remedies:
- ...
- ...
- ...
- ...
- As a last resort, use comments to clarify argument meanings at the call site.

I don't think the remaining arguments qualify as "when the meaning is nonobvious" and I don't think this situation calls for a last resort. Further, I think that blindly using that last resort rule on every function call goes against everything that paragraph stands for. Their own example of a clean source example only uses it for one argument and that argument isn't clear from context.

The format isn't dictated; they only indicate that a comment be used, and I used a comment in the manner that all of the surrounding code in that module uses.

Contributor Author

flar commented May 18, 2020

I have the new refactoring done, but I am going over the doxygen style guidelines before I do another push...

Contributor Author

flar commented May 19, 2020

I've updated a lot of doc comments and did the refactoring to simplify the subclassing of MockRasterCache. Running the A/B test again for verification produces the following results on 5 runs:

                   Score                     Average A (noise)  Average B (noise)   Speed-up
average_frame_build_time_millis                   1.78 (1.38%)       1.84 (6.09%)    0.97x  
worst_frame_build_time_millis                     5.52 (13.25%)      6.01 (11.60%)   0.92x  
90th_percentile_frame_build_time_millis           2.60 (3.62%)       2.76 (29.30%)   0.94x  
99th_percentile_frame_build_time_millis           4.60 (3.92%)       5.38 (2.48%)    0.86x  
average_frame_rasterizer_time_millis              4.68 (0.73%)       3.77 (0.58%)    1.24x  
worst_frame_rasterizer_time_millis                9.18 (9.14%)       9.58 (12.81%)   0.96x  
90th_percentile_frame_rasterizer_time_millis      5.07 (0.91%)       4.04 (0.35%)    1.25x  
99th_percentile_frame_rasterizer_time_millis      6.37 (6.76%)       6.60 (9.45%)    0.97x  
average_vsync_transitions_missed                  0.00 (0.00%)       0.20 (200.00%)  0.00x  
90th_percentile_vsync_transitions_missed          0.00 (0.00%)       0.20 (200.00%)  0.00x  
99th_percentile_vsync_transitions_missed          0.00 (0.00%)       0.20 (200.00%)  0.00x  

Contributor Author

flar commented May 20, 2020

I'm trying to figure out why the Linux Host Engine build is dying in the RasterCache testing code.

For one thing, it is intermittent - maybe 50% of the time. Perhaps it is related to some uninitialized variables. It doesn't seem to happen without my changes, though, so it is not a pre-existing condition.

The exact assert failure is in the Skia quickReject method during the call to drawImage from RCR::draw() and it is an assert that the device bounds are empty iff the raster clip is empty, which seems to be a setup issue in the canvas code, but then why would it only occur with these changes? (Note that the canvas in question is not created by the RasterCache code, it is created by the test code in raster_cache_unittests.cc.)

Still investigating...

Contributor Author

flar commented May 20, 2020

I'm following up with Skia, but the failures I'm seeing in the Linux tests here seem to be a long-standing issue with the empty constructor for SkCanvas. The RasterCache unit tests use an empty SkCanvas to test the Draw() methods and the empty constructor leaves an uninitialized fDeviceClipBounds field which later causes an assert. I'm not sure why we never ran into this before this fix, but if we were always allocating them out of freshly mapped memory it would have always been an empty rect. Changing something in the code can cause it to be allocated in stale memory and then the uninitialized values would show through.
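The failure mode described above can be illustrated with a small sketch. This is not Skia's code: `RectSketch`, `EmptyCanvasSketch` and their members are invented, and the sketch shows the safe pattern (default member initializers) rather than reproducing the bug, since reading an uninitialized field is undefined behavior.

```cpp
#include <cassert>

// Hypothetical integer rect; all-zero means empty.
struct RectSketch {
  int left = 0;
  int top = 0;
  int right = 0;
  int bottom = 0;

  RectSketch() = default;
  RectSketch(int l, int t, int r, int b)
      : left(l), top(t), right(r), bottom(b) {}

  bool IsEmpty() const { return left >= right || top >= bottom; }
};

class EmptyCanvasSketch {
 public:
  // Safe even with no arguments: the members below have default initializers,
  // so the bounds never depend on whatever was in memory before.
  EmptyCanvasSketch() = default;

  EmptyCanvasSketch(int width, int height)
      : device_clip_bounds_(0, 0, width, height),
        clip_is_empty_(width <= 0 || height <= 0) {}

  // Returns true when drawing into `rect` cannot touch any pixel.
  bool QuickReject(const RectSketch& rect) const {
    // Same kind of invariant as the assert described above:
    // the device bounds are empty if and only if the clip is empty.
    assert(device_clip_bounds_.IsEmpty() == clip_is_empty_);
    if (clip_is_empty_ || rect.IsEmpty()) {
      return true;
    }
    return rect.right <= device_clip_bounds_.left ||
           rect.left >= device_clip_bounds_.right ||
           rect.bottom <= device_clip_bounds_.top ||
           rect.top >= device_clip_bounds_.bottom;
  }

 private:
  RectSketch device_clip_bounds_;  // defaults to an empty rect
  bool clip_is_empty_ = true;
};
```

If `device_clip_bounds_` had no initializer, the invariant would hold only when the object happened to land in zeroed memory, which would match the intermittent failures described.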

I'm following up with Skia to verify my assumptions, but pending that I don't think there is anything wrong with this fix as is.

Either way, it is ready for review...

@flar flar force-pushed the fix-and-test-opacity-child-caching branch from 86ff9a6 to 3daf579 Compare May 21, 2020 09:24
Contributor Author

flar commented May 21, 2020

I have rebased the fix on top of the Skia roll with their fix to the SkCanvas init so hopefully the Linux tests will be successful now.

@flar flar force-pushed the fix-and-test-opacity-child-caching branch from 3daf579 to 185635b Compare May 21, 2020 19:06
@flar flar force-pushed the fix-and-test-opacity-child-caching branch from 6b9c595 to ac3f66e Compare May 22, 2020 00:40
Contributor Author

flar commented May 27, 2020

This is still blocked on changes requested by @gaaclarke

Member

@gaaclarke gaaclarke left a comment

lgtm, fyi @brief is optional, doxygen will take the first sentence to mean brief.

Contributor Author

flar commented May 28, 2020

> lgtm, fyi @brief is optional, doxygen will take the first sentence to mean brief.

Good to know. It looks 99% similar to javadocs, and javadocs has that same "first sentence" rule, but I don't know what all the differences might be.
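As a rough illustration of the two comment styles being discussed, the sketch below documents one made-up function with an explicit `@brief` and another relying on the first sentence. The function names and parameters are hypothetical, and whether the first sentence is picked up automatically depends on the project's Doxygen configuration (for example the `JAVADOC_AUTOBRIEF` setting), so treat the second form as an assumption about the setup.

```cpp
#include <cassert>
#include <cstddef>

//------------------------------------------------------------------------------
/// @brief      Decides whether an entry has been seen often enough to cache.
///
/// With an explicit @brief the summary line does not depend on configuration.
///
/// @param[in]  access_count  Number of frames in which the entry was seen.
/// @param[in]  threshold     Minimum count before caching starts.
///
/// @return     True if the entry should be cached.
///
bool ShouldCacheExplicitBrief(std::size_t access_count, std::size_t threshold) {
  return access_count >= threshold;
}

//------------------------------------------------------------------------------
/// Decides whether an entry has been seen often enough to cache. This first
/// sentence serves as the brief description when automatic briefs are enabled.
///
/// @param[in]  access_count  Number of frames in which the entry was seen.
/// @param[in]  threshold     Minimum count before caching starts.
///
/// @return     True if the entry should be cached.
///
bool ShouldCacheImplicitBrief(std::size_t access_count, std::size_t threshold) {
  assert(threshold > 0);  // a zero threshold would cache everything
  return access_count >= threshold;
}
```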

Contributor Author

flar commented May 28, 2020

I'm going to hold off on the squash and merge until we find the cause of the recent regression in the engine. Hopefully we'll hear back on the bisect in the next hour or two.

Labels

cla: yes perf: speed Performance issues related to (mostly rendering) speed severe: performance Relates to speed or footprint issues.


4 participants