
Auto-post results in PRs that significantly affect perf #711


Closed
nnethercote opened this issue Jul 24, 2020 · 7 comments · Fixed by #992
Labels
C-feature-request A feature request

Comments

@nnethercote (Contributor)

When a PR lands and is measured, if it improves or regresses performance by more than a certain amount, it would be good for the results to be automatically added in a comment on the PR. That way we'll get faster feedback to PR authors.

Choosing the threshold of significance is the hard part. When doing manual perf triage I usually record results where at least one run changed by more than 1%. But occasionally I ignore results, because some of our benchmarks are noisy. So maybe we should require that 3 or more runs change by 1% or more.
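The rule being proposed could be sketched roughly as follows. This is a hypothetical illustration, not rustc-perf's actual code; the function name, data shape, and default values are all assumptions based on the numbers discussed above.

```python
# Hypothetical sketch of the proposed heuristic: flag a PR as significant
# when at least `min_runs` benchmark runs change by at least `threshold`
# (relative change, in either direction).

def is_significant(changes, threshold=0.01, min_runs=3):
    """`changes` maps a benchmark run name to its relative change,
    e.g. 0.015 for a 1.5% regression, -0.02 for a 2% improvement."""
    big = [name for name, delta in changes.items() if abs(delta) >= threshold]
    return len(big) >= min_runs

# Two runs over 1% is not enough under this rule; three is.
print(is_significant({"a": 0.012, "b": -0.011, "c": 0.002}))  # False
print(is_significant({"a": 0.012, "b": -0.011, "c": 0.020}))  # True
```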

@jyn514 (Member) commented Jul 24, 2020

1% seems like a pretty low bar. Maybe 3% so we're sure it's not noise?

@Mark-Simulacrum (Member)

I think we should start at 1% and go up (or add exceptions) from there, since false positives are pretty low impact here IMO.

@nnethercote (Contributor, Author)

And maybe for Doc changes the threshold should be higher? Not sure about that; Doc runs are still so new that I don't have a good feel for their variability.

@jyn514 (Member) commented Jul 24, 2020

A higher threshold sounds fine. 90% of doc time normally comes from running cargo check on the earlier crates, so it's not high impact.
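One way to accommodate this would be per-profile thresholds, with a higher bar for Doc runs. This is a minimal sketch of that idea; the profile names and percentage values are illustrative assumptions, not settings from rustc-perf.

```python
# Hypothetical per-profile significance thresholds, with a higher bar
# for Doc runs as suggested above. All values are illustrative.
THRESHOLDS = {
    "check": 0.01,
    "debug": 0.01,
    "opt": 0.01,
    "doc": 0.03,  # Doc runs are newer and their variability is less understood
}

def threshold_for(profile):
    """Return the relative-change threshold for a given build profile,
    falling back to 1% for unknown profiles."""
    return THRESHOLDS.get(profile, 0.01)

print(threshold_for("doc"))    # 0.03
print(threshold_for("check"))  # 0.01
```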

@jyn514 (Member) commented Jul 24, 2020

I guess the exception is enormous single crates ... How awful would it be to add winapi or stm32ral to the benchmarks?

@Mark-Simulacrum (Member)

We do not currently have the budget to add any more benchmarks; indeed, there is active discussion about culling our benchmarks (or optimizing what we collect, etc.; see https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/script-servo-2.20takes.20way.20too.20long).

@tgnottingham (Contributor)

I was just about to write up an issue suggesting we post the perf results for all merge commits (or all merge commits that receive a perf run; I'm not sure those are different).

Perhaps we could post either a comment indicating the final perf results, or one indicating the final perf results and that there was a potential regression.

The usefulness of the former would be more for posterity and investigation. It would be nice to be able to go to a PR and easily see what its final merged perf results were, especially if the heuristic we use to indicate regressions doesn't catch all forms of them.
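The two kinds of comment described above could share one code path: always build a results comment, and add a warning marker when the regression heuristic fires. A minimal sketch, assuming a hypothetical `build_comment` helper and illustrative wording:

```python
# Hypothetical sketch of the "always post, flag regressions" idea:
# every merge commit gets a results comment, and the header is marked
# when a potential regression was detected.

def build_comment(results_summary, regression_detected):
    """Return the comment body to post on the merged PR.
    `results_summary` is the rendered perf results table/text."""
    header = "Final perf results for this merge"
    if regression_detected:
        header += " (potential regression detected)"
    return f"{header}\n\n{results_summary}"

print(build_comment("instructions: +0.2%", False))
print(build_comment("instructions: +2.5%", True))
```

This keeps the "posterity" comment cheap while still surfacing regressions prominently, even when the heuristic misses some forms of regression the raw results would reveal.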
