Remove layout::size_align #72189
Conversation
r? @kennytm (rust_highfive has picked a reviewer for you, use r? to override) |
r? @ghost |
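For context on what removing `layout::size_align` amounts to, here is a minimal sketch of the shape of the change, written against the public `std::alloc::Layout` API rather than the actual libcore internals (the free functions below are illustrative stand-ins, and the exact body of the real helper is assumed, not quoted):

```rust
use std::alloc::Layout;
use std::mem;

// Assumed shape of the private helper the PR removes: it just pairs up a
// type's size and alignment.
const fn size_align<T>() -> (usize, usize) {
    (mem::size_of::<T>(), mem::align_of::<T>())
}

// Before: a `Layout::new::<T>()`-style constructor goes through the helper.
const fn layout_new_with_helper<T>() -> Layout {
    let (size, align) = size_align::<T>();
    // SAFETY: the size and alignment of a Rust type always form a valid layout.
    unsafe { Layout::from_size_align_unchecked(size, align) }
}

// After: the helper is gone and size/align are fetched directly.
const fn layout_new_direct<T>() -> Layout {
    // SAFETY: as above.
    unsafe { Layout::from_size_align_unchecked(mem::size_of::<T>(), mem::align_of::<T>()) }
}

fn main() {
    assert_eq!(layout_new_with_helper::<u64>(), layout_new_direct::<u64>());
    assert_eq!(layout_new_direct::<u64>(), Layout::new::<u64>());
}
```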
I was getting weird perf results with this trivial change locally, so I'm going to do a perf CI run. @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit bf8ab30732fe02fa27ef69db654830e0802320e6 with merge e394819560f57a7fd5628b99e3e6d5b5d85dfeab... |
☀️ Try build successful - checks-actions, checks-azure |
Queued e394819560f57a7fd5628b99e3e6d5b5d85dfeab with parent 23ffeea, future comparison URL. |
Finished benchmarking try commit e394819560f57a7fd5628b99e3e6d5b5d85dfeab, comparison URL. |
These results are so weird for such a trivial change.
I will investigate further. |
First, let's look at the improvement in the benchmark that sped up. In the Cachegrind diff, the -452M comes from an encoding/decoding entry; encoding/decoding certainly dominates that benchmark run, and I also took Cachegrind results without this PR's patch applied for comparison. Perhaps the lesson here is this: because encoding/decoding code dominates the execution time of these runs, minor changes in the inlining and code generation of that code occasionally have outsized effects. Also, this is a really weird benchmark. We don't normally see encoding/decoding anything like that dominant. |
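To make the inlining point concrete, here is a hypothetical example (none of these functions are from the benchmark or from rustc): two semantically identical decode loops can end up with different generated code purely because a tiny helper does or does not get inlined, and in code this hot that difference is measurable:

```rust
// Hypothetical hot decode loop: semantically trivial refactors can nudge the
// optimizer's inlining decisions and change the generated code.

// Version 1: the chunk width is computed through a small helper.
fn widths() -> (usize, usize) {
    (4, 4)
}

fn decode_v1(buf: &[u8]) -> u64 {
    let (step, _) = widths();
    buf.chunks_exact(step)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]) as u64)
        .sum()
}

// Version 2: the helper is removed and the constant is written in place.
// The two versions do the same work, but the optimizer may inline and
// vectorize them differently, which is invisible at the source level.
fn decode_v2(buf: &[u8]) -> u64 {
    buf.chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]) as u64)
        .sum()
}

fn main() {
    let buf = vec![1u8; 4096];
    assert_eq!(decode_v1(&buf), decode_v2(&buf));
}
```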
As for the big change in query counts: comparing the before and after runs, the number of executions of some of the queries drops from 166 to 65, and all other query counts are either identical or almost identical. I don't know why the number of queries would change so drastically. Any suggestions? |
The LTO query changes are probably due to different CGU partitioning, cc @rust-lang/wg-mir-opt |
Now for the regression. It is clearly all to do with metadata encoding. Comparing profile results for the old and new code, there are much higher counts for all the function entries/exits in the new code, so this definitely looks like inlining effects. I barely understand the encoding code, but presumably it's writing stuff into vectors, which would explain how those extra costs arise. |
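As a sketch of that "writing stuff into vectors" pattern (a hypothetical toy encoder, not rustc's actual metadata encoder): each write goes through a very hot path that occasionally has to grow the output buffer, so the whole encode is sensitive to how that path gets inlined and how often the buffer reallocates and copies:

```rust
// Hypothetical encoder writing records into a growing Vec<u8>.
// Each write may trigger a capacity doubling, i.e. a realloc plus a copy of
// everything written so far, so the per-item write path is very hot.
struct Encoder {
    out: Vec<u8>,
}

impl Encoder {
    fn new() -> Self {
        Encoder { out: Vec::new() }
    }

    fn emit_u32(&mut self, v: u32) {
        // extend_from_slice grows the buffer on demand (amortized doubling).
        self.out.extend_from_slice(&v.to_le_bytes());
    }
}

fn main() {
    let mut enc = Encoder::new();
    let mut reallocs = 0;
    let mut cap = enc.out.capacity();
    for i in 0..1_000_000u32 {
        enc.emit_u32(i);
        if enc.out.capacity() != cap {
            // Capacity changed: the Vec reallocated and copied its contents.
            reallocs += 1;
            cap = enc.out.capacity();
        }
    }
    println!("wrote {} bytes with {} reallocations", enc.out.len(), reallocs);
}
```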
Now for the other benchmark: I did the same old-code vs. new-code profile comparison there as well. |
In general, it seems like the growth path for the output vectors is the sensitive part. |
I was able to replicate the improvement locally. |
Also, this is a case where instruction counts are misleading. The instruction count improvement was 38%, but on CI the wall-time improvement was only 7.8%, and on my machine it was only 2.6%. Presumably this relates to the fact that LLVM codegen is multi-threaded. |
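A toy back-of-the-envelope model of why those numbers can diverge (the thread count and the serial/parallel split below are assumptions for illustration, not measurements): instruction counts add up work across all threads, but wall time only sees the critical path, so savings that land inside parallel LLVM codegen shrink wall time by much less than they shrink instruction counts:

```rust
// Toy model (illustrative numbers only): instruction counts sum work across
// all threads, but wall time approximates serial work plus the longest
// parallel stretch, so codegen-side savings are diluted in wall time.
fn main() {
    let threads = 8.0; // assumed codegen thread count
    let serial_insns = 30.0; // assumed front-end work, arbitrary units
    let parallel_insns = 70.0; // assumed codegen work, spread across threads

    let old_icount = serial_insns + parallel_insns;
    let old_wall = serial_insns + parallel_insns / threads;

    // Suppose the patch removes 38% of total instructions, all from codegen.
    let saved = 0.38 * old_icount;
    let new_icount = old_icount - saved;
    let new_wall = serial_insns + (parallel_insns - saved) / threads;

    println!("icount improvement:    {:.1}%", (1.0 - new_icount / old_icount) * 100.0);
    println!("wall-time improvement: {:.1}%", (1.0 - new_wall / old_wall) * 100.0);
}
```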
Yes -- CGU partitioning is intended to be deterministic on the set of inputs (and even under some changes, for incremental) but we've historically seen that it can be quite prone to shifts when altering underlying functions. See rust-lang/compiler-team#281 for a recent meeting proposal. |
Let's try this again, to see if anything has changed. @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit bf8ab30732fe02fa27ef69db654830e0802320e6 with merge f2dd1363e1fd146df330d259407f731150074f05... |
☀️ Try build successful - checks-azure |
Queued f2dd1363e1fd146df330d259407f731150074f05 with parent 4512721, future comparison URL. |
This isn't going anywhere. |
Force-pushed from 5d95e14 to 423615a
Force-pushed from 423615a to 086cecb
Let's give it one more try. @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 086cecb with merge 57ad04fc500e914cfa0d54b97a7e879f751b1a17... |
☀️ Try build successful - checks-actions, checks-azure |
Queued 57ad04fc500e914cfa0d54b97a7e879f751b1a17 with parent 1f5d69d, future comparison URL. |
Finished benchmarking try commit (57ad04fc500e914cfa0d54b97a7e879f751b1a17): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying `rollup-` to bors. Importantly, though, if the results of this run are non-neutral, do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
Still terrible results. |
Make `Layout::new::<T>()` a constant instead of multiple NullOps: rust-lang#72189 and rust-lang#79827 suggest that this is perf-relevant, so let's see what happens. r? ghost
No description provided.
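For reference, a minimal sketch of the idea behind that cross-referenced change, as I understand it (illustrative user-level code only; the actual change is in how MIR is generated for `Layout::new`, which can't be shown directly here): because everything involved is `const`, the whole `Layout` value for a given `T` can be produced as a single compile-time constant rather than assembled from separate size-of and align-of operations at runtime:

```rust
use std::alloc::Layout;
use std::mem;

// `Layout::new::<T>()` is a `const fn`, so the complete layout of a sized type
// is known at compile time and can be baked straight into a constant.
const U64_LAYOUT: Layout = Layout::new::<u64>();

// The same value assembled from its parts, also entirely at compile time.
// SAFETY: the size and alignment of a Rust type always form a valid layout.
const U64_LAYOUT_FROM_PARTS: Layout =
    unsafe { Layout::from_size_align_unchecked(mem::size_of::<u64>(), mem::align_of::<u64>()) };

fn main() {
    assert_eq!(U64_LAYOUT, U64_LAYOUT_FROM_PARTS);
    assert_eq!(U64_LAYOUT.size(), mem::size_of::<u64>());
}
```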