[Impeller] Adds the ability to specify a golden threshold #40824
fyi @Piinks @chinmaygarde
We don't want thresholds; we want to follow the Skia GPU team's guidelines on having the Gold results be tackled out of band.
I thought those things weren't mutually exclusive? No matter how failures are triaged we don't want this test to be tripping up all the time, right?
Golden file changes have been found for this pull request. Click here to view and triage (e.g. because this is an intentional change). If you are still iterating on this change and are not ready to resolve the images on the Flutter Gold dashboard, consider marking this PR as a draft pull request above. You will still be able to view image results on the dashboard, commenting will be silenced, and the check will not try to resolve itself until marked ready for review.
I'm actually a bit confused by Skia Gold's dashboard: the max diff pixels is 7864, yet the number of differing pixels is reported as 24. Shouldn't this have passed instead of being flagged?
cc @mdebbar
I am not familiar with how the engine is configured for Gold.
The understanding is that each gold test has multiple valid variants. As long as a triager confirms that two variants are sufficiently alike, the test will pass unless an entirely new variant is detected.
Ahh yea, I've heard mention of such a feature. Who can help us with that?
The way the fuzzy matcher works in Gold is by doing two checks:
What's happening is your image is passing the 1st condition but it's failing on the 2nd.
@chinmaygarde there was a lot of talk about this in different channels. I'll try to summarize it:
I think we still want a tiny bit of fuzzing in addition to the other mechanisms. This PR makes it so all the tests pass if fewer than 1% of pixels differ and each differing pixel is off by a color component delta of less than 4. (I increased the color delta for the rotated text to 40.) PTAL, let me know if you want to discuss this further and we can try to suss everything out.
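To make the rule in the comment above concrete, here is a minimal sketch of a two-condition fuzzy comparison. It is not Skia Gold's implementation or the code in this PR: the function name, the parameter names, and the choice to measure a pixel's delta as its largest per-channel difference are all assumptions made for illustration (Gold may compute the delta differently, for example as a sum over channels).

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA, each channel 0-255


def fuzzy_match(
    expected: Sequence[Pixel],
    actual: Sequence[Pixel],
    max_diff_fraction: float = 0.01,  # "fewer than 1% of pixels" (from the thread)
    max_color_delta: int = 4,         # "less than 4" (from the thread)
) -> bool:
    """Hypothetical helper: True if `actual` is close enough to `expected`."""
    if len(expected) != len(actual):
        return False  # images of different sizes never match
    if not expected:
        return True

    differing = 0
    for exp_px, act_px in zip(expected, actual):
        if exp_px == act_px:
            continue
        differing += 1
        # Assumption: a pixel's delta is its largest per-channel difference.
        delta = max(abs(e - a) for e, a in zip(exp_px, act_px))
        if delta >= max_color_delta:
            return False  # one pixel is off by too much

    # The share of differing pixels must stay under the allowed fraction.
    return differing < max_diff_fraction * len(expected)
```

Both conditions have to hold at once, so an image with only a handful of differing pixels can still fail if any one of them exceeds the color delta. That would be consistent with the dashboard confusion earlier in this thread.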
I think we still want a tiny bit of fuzzing in addition to the other mechanisms...
I think that is totally reasonable. But I'd rather not have the tests get into the business of determining the deltas like in the current version of the patch.
If there is a global fuzz value you are comfortable with, let's set it in the test harness. If not, we can leave it out altogether. I'd rather not have to think about how to figure out maxDiffPixelsPercent and max_color_delta for each screenshot. That is cognitive overhead we don't want to add every time we write a test. Especially if the process is going to be semi-manual anyway.
Okay, I switched it up so the test_runner uses the same values across all tests.
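As a rough illustration of what "the same values across all tests" could look like, the sketch below keeps one shared configuration in the harness and derives a per-image pixel budget from it. Every name here (FuzzConfig, HARNESS_FUZZ, thresholds_for, and the dictionary keys) is hypothetical and does not come from the engine's test_runner or from Gold; the 1% and 8 values are simply numbers mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzConfig:
    """Harness-wide fuzz settings (hypothetical names, not the engine's)."""

    max_diff_pixels_fraction: float  # share of pixels allowed to differ
    max_color_delta: int             # per-pixel color delta tolerance

    def max_different_pixels(self, width: int, height: int) -> int:
        # Turn the fraction into an absolute pixel budget, rounding down.
        return int(width * height * self.max_diff_pixels_fraction)


# One shared instance, so individual tests never pick their own thresholds.
HARNESS_FUZZ = FuzzConfig(max_diff_pixels_fraction=0.01, max_color_delta=8)


def thresholds_for(width: int, height: int) -> dict:
    """Thresholds the harness would report for an image of the given size."""
    return {
        "max_different_pixels": HARNESS_FUZZ.max_different_pixels(width, height),
        "max_color_delta": HARNESS_FUZZ.max_color_delta,
    }
```

With these assumed values, a 1024x768 image gets a budget of 7864 differing pixels, which happens to equal the "max diff pixels" figure quoted earlier in the thread. The actual image size is not stated there, so treat that as a plausible reading rather than a confirmed one.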
Golden file changes are available for triage from new commit. Click here to view.
I'm a bit confused about what Skia Gold is doing. The color delta is clearly set to 8, but the Skia Gold report shows it set to zero. Let's land this and keep an eye on it. It may be some latent state or an inability to change the fuzzy threshold after the fact; we'll see.
This does not appear to be working as intended. The correct config for fuzzy matching is being received by Gold, but it looks like this image is just flaky. @camsim99 came across it in #40924 and, looking at the digest, we can see the image has a different color dot for every commit. This means it is not producing a consistent image, and is blocking PRs from landing. https://flutter-engine-gold.skia.org/search?issue=40924&crs=github&patchsets=3&corpus=flutter-engine
reverts #40818