Description
I asked about this on users.rust-lang.org, and one user suggested that it may indeed be a bug in the optimizer. The behavior below is reproducible on both the latest nightly and stable Rust, on the x86_64-pc-windows-msvc and x86_64-unknown-linux-gnu targets (the latter tested inside WSL).
Our test subject is the following snippet, which solves the maximum subarray problem:
```rust
pub fn max_subarray_bad(arr: &[i32]) -> (usize, usize, i32)
{
    let prefixes = arr
        .iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > 0 {
                s.1 = s.1 + *v;
            } else {
                *s = (i, *v);
            }
            Some(*s)
        });
    let (right_idx, (left_idx, sum)) = prefixes
        .enumerate()
        .max_by_key(|&(_, (_, sum))| sum)
        .unwrap();
    (left_idx, right_idx + 1, sum)
}
```
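As a sanity check on the semantics (the input array and expected tuple here are my own worked example, not from the report): the function returns a half-open `(left, right)` index range plus the maximum sum.

```rust
// Kadane-style scan: returns (left, right_exclusive, max_sum).
pub fn max_subarray_bad(arr: &[i32]) -> (usize, usize, i32)
{
    let prefixes = arr
        .iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            // Extend the running sum while it is positive, otherwise restart here.
            if s.1 > 0 {
                s.1 = s.1 + *v;
            } else {
                *s = (i, *v);
            }
            Some(*s)
        });
    let (right_idx, (left_idx, sum)) = prefixes
        .enumerate()
        .max_by_key(|&(_, (_, sum))| sum)
        .unwrap();
    (left_idx, right_idx + 1, sum)
}

fn main() {
    // Best subarray of [-2, 1, -3, 4, -1, 2, 1, -5, 4] is [4, -1, 2, 1], sum 6,
    // occupying indices 3..7.
    assert_eq!(
        max_subarray_bad(&[-2, 1, -3, 4, -1, 2, 1, -5, 4]),
        (3, 7, 6)
    );
    println!("ok");
}
```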
If we benchmark it with the criterion crate, using this benchmark code:
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_linear(c: &mut Criterion) {
    const N: usize = 1000000;
    c.bench_function(&format!("max_subarray([..]) N = {}", N), |b| {
        b.iter(|| max_subarray::max_subarray_bad(black_box(&vec![0; N])))
    });
}

criterion_group!(benches, benchmark_linear);
criterion_main!(benches);
```
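For reproducibility, the criterion wiring in `Cargo.toml` would look roughly like this. The bench target name and the criterion version are assumptions on my part (the report does not include its manifest; its output only shows a binary called `scratch`):

```toml
[dev-dependencies]
criterion = "0.3"

[[bench]]
# Assumed name, matching the "scratch-..." binary in the output below.
name = "scratch"
harness = false   # let criterion supply its own main
```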
then the output of cargo bench on my machine is:
```
Benchmarking max_subarray([..]) N = 1000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.6s or reduce sample count to 30.
max_subarray([..]) N = 1000000
                        time:   [3.2324 ms 3.2700 ms 3.3141 ms]
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe
     Running target\release\deps\scratch-b68a42551ab01289.exe
```
But the slight change of moving the 0 in the expression s.1 > 0 into a let binding outside the closure makes a great difference. The function is now:
```rust
pub fn max_subarray_bad(arr: &[i32]) -> (usize, usize, i32)
{
    let zro = 0;
    let prefixes = arr
        .iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > zro {
                s.1 = s.1 + *v;
            } else {
                *s = (i, *v);
            }
            Some(*s)
        });
    let (right_idx, (left_idx, sum)) = prefixes
        .enumerate()
        .max_by_key(|&(_, (_, sum))| sum)
        .unwrap();
    (left_idx, right_idx + 1, sum)
}
```
But the cargo bench output indicates an almost 20% performance gain!
```
Benchmarking max_subarray([..]) N = 1000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 12.9s or reduce sample count to 40.
max_subarray([..]) N = 1000000
                        time:   [2.5705 ms 2.5806 ms 2.5913 ms]
                        change: [-20.260% -19.668% -19.124%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
     Running target\release\deps\scratch-b68a42551ab01289.exe
```
You can check that switching that expression back and forth between 0 and zro indeed results in a roughly 20% performance change.
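To compare the two variants without criterion, here is a rough, criterion-free sketch using std::time::Instant. It reduces the function to just the scan's hot loop (where the comparison under suspicion lives), so it is a simplification of the original code, not the exact benchmark; single-shot timings are noisy, and std::hint::black_box needs Rust 1.66+.

```rust
use std::hint::black_box;
use std::time::Instant;

// Variant with the literal 0 inside the closure.
fn scan_literal(arr: &[i32]) -> i32 {
    arr.iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > 0 { s.1 += *v } else { *s = (i, *v) }
            Some(s.1)
        })
        .max()
        .unwrap()
}

// Variant comparing against a value hoisted into an outer let binding.
fn scan_hoisted(arr: &[i32]) -> i32 {
    let zro = 0;
    arr.iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > zro { s.1 += *v } else { *s = (i, *v) }
            Some(s.1)
        })
        .max()
        .unwrap()
}

fn main() {
    let data = vec![0i32; 1_000_000];

    let t = Instant::now();
    black_box(scan_literal(black_box(&data)));
    let lit = t.elapsed();

    let t = Instant::now();
    black_box(scan_hoisted(black_box(&data)));
    let hoist = t.elapsed();

    // Numbers will vary run to run; the results themselves must agree.
    println!("literal: {:?}, hoisted: {:?}", lit, hoist);
}
```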
By the way, changing let zro = 0 into const zro: i32 = 0 results in the performance drop as well.
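Spelled out, that const variant would look like this (a sketch; I use an upper-case name here only because a lower-case const draws the non_upper_case_globals warning, which should not affect codegen):

```rust
pub fn max_subarray_const(arr: &[i32]) -> (usize, usize, i32)
{
    // Compile-time constant, so the optimizer sees it like the literal 0 again.
    const ZRO: i32 = 0;
    let prefixes = arr
        .iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > ZRO {
                s.1 = s.1 + *v;
            } else {
                *s = (i, *v);
            }
            Some(*s)
        });
    let (right_idx, (left_idx, sum)) = prefixes
        .enumerate()
        .max_by_key(|&(_, (_, sum))| sum)
        .unwrap();
    (left_idx, right_idx + 1, sum)
}

fn main() {
    // Same worked example as above: best subarray is indices 3..7, sum 6.
    assert_eq!(
        max_subarray_const(&[-2, 1, -3, 4, -1, 2, 1, -5, 4]),
        (3, 7, 6)
    );
    println!("ok");
}
```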
This looks like an optimizer bug to me. Could someone verify it?