-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Improve zerofill in Vec::resize and Read::read_to_end #26849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
buf.set_len(len + additional); | ||
buf[len..].set_memory(0); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be a method on Vec<u8>
or even Vec<T> where T: Copy
in general (with element supplied). The latter should work since a plain for loop with *elt = 0
will be recognized as memset for the T=u8 case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe .resize() would be fine if it used a for loop and not extend. I'll look. I sort of take that back. Doing it right will just be another place with a complicated for loop..
looks good to me, worthwhile hack for now |
but do you have any kind of measurements (or a benchmark)? |
I've personally not been a huge fan of small optimizations like this in the codebase in the past as it's basically impossible to keep track of "what incantation today optimizes to what we want". In the ideal world we'd fix |
This bit me today in |
This PR came out of a post here: https://users.rust-lang.org/t/reading-from-stdin-performance/2025 . The difference is pretty severe - for the shootout reverse-complement test we're talking maybe 15-20% of the total runtime is taken up by zeroing of a buffer for a read in file. It's about 10x slower to 0-init a Vec by extend vs memset. If the sentiment is against adding small hacks like this, I'd favour either (if it's reasonably achievable) making extend optimise appropriately as soon as possible, or adding a simpler method to vec for Copy data that will optimise more reliably (see https://users.rust-lang.org/t/reading-from-stdin-performance/2025/13?u=alisdairo ). I realise that we probably don't want to pollute Vec with too many methods, but the inability to safely 0-init/extend a buffer with reasonable performance is a pretty bad black eye for a lot of performance sensitive work - especially in a world where we avoid use of uninitialised buffers. |
I did the benchmarks in that forum thread, but that's just different ways to zerofill a vector (code here). I'll use @AlisdairO and try this new code in a file reading benchmark of the simple kind. (The benchmark is for zeroing 64 kB on an corei7-avx)
After the forum post I confirmed |
@alexcrichton It's a pretty important optimization -- it's the same area where we have contemplated even using uninitialized memory. So just calling a better memset.. it's necessary, we need to do this, we just need to find the best way how to. |
Just to clarify, this same pattern is used directly in BufReader. Would fixing that be in scope for this PR or should I file a separate bug? |
@gmjosack: Those two lines should just be Benchmark check.. it does! Huzzah!
Edit: Added a commit for that. I think that's enough? |
Thanks! Looking forward to this PR making it through. This had pretty significant impact on some code discussed on Reddit. |
@gmjosack The good news is that zeroing memory can be 10x faster, is that enough to reach parity with your Python program? |
Yeah, even dropping with_capacity to 2k (over the default 64k) increased performance significantly. |
I think that in Instead tweaking and trying to find a safe code that compiles to memset would be fragile and would fit the description of "what incantation today optimizes to what we want" much better. This code is head on, we use |
The more I think about this the more I think it makes sense to add a method to Vec to extend/initialise to some constant. I can understand reluctance about over-polluting Vec, but I think there's a not-so-terrible argument even in the absence of a performance improvement - which is that the ergonomics of extend are not especially great for simple initialisation. Add to that the fact that the simpler code is clearly easier for the compiler to optimise, making it possible to avoid building up hacks in the codebase to work around it. |
Benchmark for simplest possible input file → vector data grab. Using unbuffered File since we'll read directly into the Vec anyway. // based on @AlisdairO's test
use std::env;
use std::fs;
use std::io::Read;
fn main() {
let arg = env::args().nth(1).unwrap();
let mut f = fs::File::open(arg).unwrap();
let mut data = Vec::with_capacity(1024);
f.read_to_end(&mut data).unwrap();
println!("{}", data.len());
} Input file size: 493 MB. Compiler options: rustc 1.3.0-nightly (20f421c 2015-07-06)
With this PR:
And yes, for some reason perf localizes the decimal point.. Edit: Timings using |
Like @Stebalien said on the forum, there is room for improvement. This is just the general |
Improving
|
Pushed new angle of attack: improving Vec::resize, by having it share code with The new version of the PR introduces no new unsafe code, only tweaking |
} else { | ||
self.truncate(new_len); | ||
} | ||
} | ||
|
||
/// Extend the vector by `n` additional clones of `value`. | ||
#[inline(always)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you omit this #[inline(always)]
tag? This seems like a fairly significant chunk of code to always inline. I would think the tag could be omitted entirely as well (this is already generic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only used in two locations and I want it to inline into them (Vec::resize and from_elem). Do you think this is not going to work out that way? It's a private function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok since #[inline(always)] "leaks out" this is not as intended.
Nice wins @bluss! I'm always a fan of re-using or improving what exists :) r=me with a few minor comments here and there |
We needed a more efficient way to zerofill the vector in read_to_end. This to reduce the memory intialization overhead to a minimum. Use the implementation of `std::vec::from_elem` (used for the vec![] macro) for Vec::resize as well. For simple element types like u8, this compiles to memset, so it makes Vec::resize much more efficient.
Use the vec![] macro directly to create a sized, zeroed vector. This should result in a big speedup when creating BufReader, because vec![0; cap] compiles to a memset call, while the previous extend code currently did not.
Removed the inline, addressed .clone(). I took the liberty of renaming |
relnotes → better perf is good news that might want to be bragged about. |
nice. |
Improve zerofill in Vec::resize and Read::read_to_end We needed a more efficient way to zerofill the vector in read_to_end. This to reduce the memory intialization overhead to a minimum. Use the implementation of `std::vec::from_elem` (used for the vec![] macro) for Vec::resize as well. For simple element types like u8, this compiles to memset, so it makes Vec::resize much more efficient.
In a followup to PR #26849, improve one more location for I/O where we can use `Vec::resize` to ensure better performance when zeroing buffers. Use the `vec![elt; n]` macro everywhere we can in the tree. It replaces `repeat(elt).take(n).collect()` which is more verbose, requires type hints, and right now produces worse code. `vec![]` is preferable for vector initialization. The `vec![]` replacement touches upon one I/O path too, Stdin::read for windows, and that should be a small improvement. r? @alexcrichton
Thanks for tagging this for relnotes @bluss. |
Improve zerofill in Vec::resize and Read::read_to_end
We needed a more efficient way to zerofill the vector in read_to_end.
This to reduce the memory intialization overhead to a minimum.
Use the implementation of
std::vec::from_elem
(used for the vec![]macro) for Vec::resize as well. For simple element types like u8, this
compiles to memset, so it makes Vec::resize much more efficient.