[minibench] Drop outliers from benchmark result #8919
Conversation
Currently the result has large variance due to outliers, so use only the middle 80% of samples (trimmed mean with trim fraction 0.2).
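The "trimmean 0.2" statistic can be sketched as follows: sort the latency samples, drop 10% from each tail (20% total), and average what remains. This is a minimal illustrative sketch, not the minibench implementation; the sample latencies are made up.

```python
def trimmed_mean(samples, trim=0.2):
    """Mean of the middle (1 - trim) fraction of samples.

    `trim` is the total fraction discarded, split evenly between
    both tails (Excel TRIMMEAN convention), so trim=0.2 keeps the
    middle 80% of the measurements.
    """
    xs = sorted(samples)
    k = int(len(xs) * trim / 2)  # samples dropped per tail
    if k:
        xs = xs[k:-k]
    return sum(xs) / len(xs)

# Hypothetical inference latencies in ms, with outliers at both tails.
latencies = [12.1, 11.9, 12.0, 12.2, 30.5, 11.8, 12.0, 5.0, 12.1, 11.9]
print(trimmed_mean(latencies, 0.2))  # 5.0 and 30.5 are excluded -> 12.0
```

The plain mean of this sample is pulled to about 14.05 ms by the two outliers, while the trimmed mean stays at 12.0 ms, which is why run-to-run variance shrinks.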
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8919
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit ce0902f with merge base 2ee3ffa. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@kirklandsign has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
The result is not bad: the difference between runs is now reduced to about 2%. Load time, unfortunately, cannot be addressed yet, because we only take one load-time measurement per run. @huydhn I now use the "avg_inference_latency" field, but it is actually a trimmed mean. Please let me know if you are unhappy with reusing the existing field. Honestly I think it's ok 😜 ic4
Again, a very promising result after I tried one more time. ic4
LGTM!
I'm ok with this too, although I don't know enough statistics to decide whether 0.2 is a reasonable value, i.e. why not 0.1. For load time, we could consider running minibench via adb multiple times, but that feels like overkill; maybe it's ok to have a load time with high variance and just use a higher alert threshold for that metric on the dashboard.
Unfortunately I tried 0.1, but the result was not as good; the trimmed distribution is still quite left-skewed.
Summary
Currently the result has large variance due to outliers, so use only the middle 80% of samples (trimmed mean with trim fraction 0.2).
Test plan
CI