How best to reproduce CI perf results locally? #1592

Closed
Nadrieril opened this issue May 23, 2023 · 6 comments
Comments

@Nadrieril
Member

In this PR, rust-timer found a 4% regression on instruction counts on the match-stress benchmark. When measuring locally, I consistently find instead a 14% improvement on that same benchmark (even after rebasing on master). This is pretty annoying because I can't try to find the source of the regression locally at all.

Do you know what could cause such a difference? Could a difference in architecture explain that? Is there anything I could do to make the results closer to CI? Maybe flags or environment variables I could set?

@Mark-Simulacrum
Member

How are you building locally? CI uses several non-default settings (e.g., PGO and a different LTO configuration), all of which can have a large impact, particularly for stress tests dominated by a small amount of code.

@Nadrieril
Member Author

My config.toml is just profile = "compiler", and I use the binary produced by ./x.py test tests/ui, found at ./build/host/stage1/bin/rustc.

@nnethercote
Contributor

When the CI results don't match my local results I usually assume that PGO is the cause. +4% vs -14% is an unusually large difference, though!

@nnethercote
Contributor

Oh, one important thing: here is the config.toml I use:

changelog-seen = 2

[rust]
debuginfo-level = 1
use-lld = true
jemalloc = true

If you're on Linux, the jemalloc = true line is important, because the shipped Linux compiler uses jemalloc and that can make a big difference.

@nnethercote
Contributor

I don't think there is any more to be done, so I will close this issue. Please reopen if you disagree.

@Nadrieril
Member Author

I tried jemalloc and thin LTO and the difference is the same. I assume it's PGO then, which seems like a pain to make work.

I think what could be done is to add a paragraph to the README that mentions these options and PGO, for the next person who is confused. (Is there a way to know exactly which settings CI uses, btw?)
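
For anyone landing here later, a rough sketch of a config.toml that combines the options mentioned in this thread. It starts from the config posted above and adds thin LTO. The lto key under [rust] is an assumption about the bootstrap config of that era, so check it against config.example.toml in your checkout. This is not a copy of the actual CI settings, and it does not cover PGO, which needs profile data and which the thread suspects is the main cause of the difference.

```toml
# Sketch only: options discussed in this thread, not the real CI configuration.
changelog-seen = 2

[rust]
debuginfo-level = 1
use-lld = true
# The shipped Linux compiler uses jemalloc (per the comment above).
jemalloc = true
# Assumption: thin LTO is selected through this key; verify the name and
# accepted values in config.example.toml before relying on it.
lto = "thin"
```

Even with these settings, local results may still differ from CI, since jemalloc and thin LTO alone did not close the gap in this thread.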
