Experiment with using HashMap::with_capacity throughout the compiler #137005
Comments
Do note that parts of this test case are flawed. Also, I had played around with these test cases – and variations of them – for a few hours when they were first shared, and noticed that some performance could already be gained by skipping a few resize steps. I also tested some code where hash values were cached, but that ended up being less clearly beneficial, even with the expensive default hasher, at least in this test case. It may very well be the case that caching is more effective where the maps contain more complex types¹ than simple word- or pointer-sized ones, where hashing could involve touching more data and/or following some indirection.
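For illustration, here is a minimal sketch of the hash-caching idea mentioned above. The `CachedKey` wrapper is hypothetical, not code from the experiment: the hash is computed once at construction, and `Hash` replays the cached value so probes and resizes never re-walk the key's data.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hash, Hasher};

// Hypothetical sketch: a key wrapper that caches its hash up front.
struct CachedKey<K> {
    hash: u64, // computed once at construction
    key: K,
}

impl<K: Hash> CachedKey<K> {
    fn new(key: K, s: &RandomState) -> Self {
        let hash = s.hash_one(&key);
        CachedKey { hash, key }
    }
}

impl<K> Hash for CachedKey<K> {
    // Replay the cached hash instead of rehashing the key's data.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash);
    }
}

impl<K: Eq> PartialEq for CachedKey<K> {
    fn eq(&self, other: &Self) -> bool {
        // Cheap hash comparison first; full key comparison only on a match.
        self.hash == other.hash && self.key == other.key
    }
}

impl<K: Eq> Eq for CachedKey<K> {}
```

A map would then be declared as e.g. `HashMap<CachedKey<String>, V>` (wrapping lookup keys with the same `RandomState`). As the comment above suggests, the win should be largest for keys whose hashing touches a lot of data or follows indirection.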
@rustbot claim
I'm interested in working on this. I've made some contributions to rust-lang/rust-clippy before. Is that experience sufficient to work with a mentor, or should I be looking for even easier issues?
Reserve `HashSet` capacity before inserting cfgs/check-cfgs
This PR tries to reserve capacity before inserting cfgs/check-cfgs into their hashset. Related to rust-lang#137005; mostly an experiment, but even if perf is neutral it's still good practice.
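The pattern the PR describes has a simple general shape; a sketch (the function name and types are illustrative, not the PR's actual code):

```rust
use std::collections::HashSet;

// Illustrative only: reserve up front when the element count is known,
// so the set allocates once instead of rehashing at each growth step.
fn collect_cfgs(cfgs: Vec<String>) -> HashSet<String> {
    let mut set = HashSet::with_capacity(cfgs.len());
    set.extend(cfgs);
    set
}
```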
I think that analysing the hashmap usage should be doable. You can build the compiler with debuginfo (…)
You could also get a pre-made cachegrind file for any given perf run; see rust-lang/rustc-dev-guide#1692. But yes, ideally you could create it locally; that lets you verify sooner that your changes have an impact. Note that we build with PGO and LTO in CI, so your numbers will not match perf.rlo exactly, but as long as you have a relative improvement that's still encouraging.
Back in #90743 I did try initializing the types and predicates interning tables with a higher initial capacity and also more aggressive resizing; neither approach looked good on perf.
I think this needs some care. I suspect there will be an extremely skewed distribution, e.g. 99.9% of the potential benefit would come from 0.1% of the hashmaps. This PR has already inspired #137069, which was about pre-sizing a hashmap with a few dozen entries in it. That is not a good use of anyone's time. So for anyone planning to look into this, please do some measurements first to determine which hashmaps are reliably the biggest, and don't waste time on anything smaller. I suspect hashmaps with hundreds of thousands or millions of entries would be good candidates.
[perf experiment] Changed interners to start preallocated with an increased capacity
Inspired by rust-lang#137005. *Not meant to be merged in its current form.*
Added a `with_capacity` function to `InternedSet`. Changed the `CtxtInterners` to start with `InternedSets` preallocated with a capacity. This *does* increase memory usage very slightly (by ~1 MB at the start), although that increase quickly disappears for larger crates (since they require such capacity anyway). A local perf run indicates this improves compile times for small crates (like `ripgrep`), without a negative effect on larger ones.
The current default capacities are chosen somewhat arbitrarily, and are relatively low. Depending on what kind of memory usage is acceptable, it may be beneficial to increase that capacity for some interners. From a second local perf run (with the capacity of `_type` increased to `131072`), it looks like increasing the size of the preallocated type interner has the biggest impact.
What would be the maximum acceptable memory usage increase? I think most people would not mind sacrificing 1-2 MB for an improvement in compile speed, but I am curious what the general opinion is here.
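A rough sketch of the shape of this change, using simplified stand-ins rather than the actual rustc types:

```rust
use std::collections::HashSet;

// Simplified stand-in for rustc's interner tables.
struct InternedSet<T> {
    set: HashSet<T>,
}

impl<T: std::hash::Hash + Eq> InternedSet<T> {
    // The constructor the PR describes adding: start with room for
    // `capacity` entries so early interning never triggers a resize.
    fn with_capacity(capacity: usize) -> Self {
        InternedSet { set: HashSet::with_capacity(capacity) }
    }
}

// The interner context would then pre-size its hottest tables at startup,
// trading a fixed up-front allocation (the ~1 MB mentioned above) for
// fewer rehashes during compilation.
struct CtxtInterners {
    types: InternedSet<String>, // stand-in element type
}

impl CtxtInterners {
    fn new() -> Self {
        // 131072 is the `_type` capacity tried in the second perf run above.
        CtxtInterners { types: InternedSet::with_capacity(131072) }
    }
}
```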
I have been looking into where most resizes of HashMaps occur (when building an optimized build of cargo), and I have a list of the most common sources (made using …
EDIT: changed the results to be weighted based on reallocation size (which corresponds to the cost of a resize).
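A sketch of how such a weighted measurement could be made without patching hashbrown itself (a hypothetical wrapper, not necessarily the tooling used above): detect capacity changes around each insert and charge the new capacity as the cost, since a resize rehashes every existing entry into the larger table.

```rust
use std::collections::HashMap;
use std::hash::Hash;

// Hypothetical instrumentation wrapper.
struct TracingMap<K, V> {
    inner: HashMap<K, V>,
    resizes: u64,
    // Sum of new capacities at each resize: a proxy for total rehash cost.
    weighted_cost: u64,
}

impl<K: Hash + Eq, V> TracingMap<K, V> {
    fn new() -> Self {
        TracingMap { inner: HashMap::new(), resizes: 0, weighted_cost: 0 }
    }

    fn insert(&mut self, k: K, v: V) -> Option<V> {
        let before = self.inner.capacity();
        let result = self.inner.insert(k, v);
        let after = self.inner.capacity();
        if after > before {
            self.resizes += 1;
            self.weighted_cost += after as u64;
        }
        result
    }
}
```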
Yep, this is exactly the "99.9% of the potential benefit would come from 0.1% of the hashmaps" I mentioned earlier :)
We have found several times that rustc's performance is basically a giant benchmark of our hashmaps. @orlp notes:
If we can avoid resizing our hashmaps, that will avoid quite a lot of needless hashing. Given that so much of our benchmarks is made up of hashing, that could yield a significant overall performance improvement. @orlp wrote this helpful benchmark giving an idea of the order of magnitude of improvement we're talking about:
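(The benchmark itself did not survive in this copy of the thread. As a stand-in, not @orlp's actual code, here is a minimal comparison in the same spirit: fill a map by incremental growth versus pre-sizing it, and time both.)

```rust
use std::collections::HashMap;
use std::time::Instant;

fn main() {
    const N: usize = 10_000_000;

    // Grow-as-you-go: every resize rehashes all entries inserted so far.
    let start = Instant::now();
    let mut grown = HashMap::new();
    for i in 0..N {
        grown.insert(i, i);
    }
    println!("default:       {:?}", start.elapsed());

    // Pre-sized: one allocation, each key hashed exactly once.
    let start = Instant::now();
    let mut presized = HashMap::with_capacity(N);
    for i in 0..N {
        presized.insert(i, i);
    }
    println!("with_capacity: {:?}", start.elapsed());
}
```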
There are two main parts to this issue:

1. Measuring where hashmap resizes occur (e.g. by instrumenting `hashbrown::raw::RawTableInner::resize_inner`). This will also let us know which maps are most in need of pre-allocating (which will be helpful in part 2).
2. (and then, of course, actually switching to using `with_capacity`)

@rustbot label T-compiler I-slow E-medium E-mentor