read_to_end is very slow (>30x slower than 0.10) #15177
Can you be sure that the 0.10 version isn't being optimized away by returning the result of `read_to_end`?
Yes, it was part of a larger example when I noticed this. Adding `black_box` around it does not change the results.
I'm only able to run […]
I have code doing […]
Exporting […]
@thestinger I'm not satisfied with that answer. A 30x performance regression in a common utility function is not something to sweep under the rug.
@brson: It's not a 30x performance regression in a common utility function. It's a micro-benchmark of the utility function where dirty page purging performs poorly.
It's a clear cut memory vs. performance trade-off. Either it consumes more memory or it wastes time purging the dirty pages. Disabling it by default would fix the regression on some microbenchmarks, but memory usage would regress back to where it was before jemalloc was used. Allocators need to make many performance compromises, and this happens to be one with a user-facing knob. It could be exposed from the heap module. jemalloc used to have a special case specifically to make microbenchmarks perform better with dirty page purging enabled, but the special case was removed because it was quite arbitrary. I think it's better for a microbenchmark to be an honest reflection of the performance on real workloads.
Perhaps it does make sense to disable purging by default to improve performance in trivial applications, and have large applications like the Rust compiler and Servo explicitly turn purging on. It's not the choice jemalloc makes upstream, though.
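For anyone who wants to experiment with the knob directly: in the jemalloc releases of this era, purging aggressiveness was controlled by the `lg_dirty_mult` option (the base-2 log of the minimum active:dirty page ratio; `-1` disables purging entirely). A sketch, assuming the bundled jemalloc honours the standard `MALLOC_CONF` environment variable (depending on how it was built, a prefixed name such as `JE_MALLOC_CONF` may be needed); the binary name is hypothetical:

```sh
# Disable dirty page purging entirely: fastest on alloc/free-heavy
# microbenchmarks, at the cost of higher resident memory.
MALLOC_CONF=lg_dirty_mult:-1 ./read_to_end_bench --bench

# The default is 3: dirty pages may accumulate up to 1/8 of active pages
# before being purged. Larger values purge more aggressively.
MALLOC_CONF=lg_dirty_mult:5 ./read_to_end_bench --bench
```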
A demonstration of how the dirty page purging ratio plays into this. First, the benchmark as-is:

```rust
extern crate test;

use test::{Bencher, black_box};
use std::io::BufReader;

#[bench]
fn read_to_end(b: &mut Bencher) {
    let bytes = Vec::from_elem(100, 10u8);
    b.iter(|| {
        let mut reader = BufReader::new(bytes.as_slice());
        black_box(reader.read_to_end())
    })
}
```

Then the same benchmark with a large allocation kept live for its duration. The extra capacity raises the active page count, so the dirty pages freed on each iteration stay under the purge ratio and purging stops dominating the loop:

```rust
extern crate test;

use test::{Bencher, black_box};
use std::io::BufReader;

#[bench]
fn read_to_end(b: &mut Bencher) {
    // Held live so active pages dwarf the dirty pages freed per iteration.
    let _huge = Vec::<u64>::with_capacity(100000);
    let bytes = Vec::from_elem(100, 10u8);
    b.iter(|| {
        let mut reader = BufReader::new(bytes.as_slice());
        black_box(reader.read_to_end())
    })
}
```
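To reproduce, a sketch of compiling and running a `#[bench]` harness with the rustc of that era (the file name is hypothetical):

```sh
# Build with optimizations and the libtest harness, then run benchmarks only.
rustc -O --test bench.rs -o bench
./bench --bench
```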
Closing in favour of #18236, which is an actionable issue. There is no performance bug here, but there is a tunable performance vs. memory trade-off and we should consider a different default.