
Strange performance drops with const literals in closures #68632

Closed as not planned
@idkravitz

Description


I asked about this on users.rust-lang.org, and one user suggested that it may indeed be a bug in the optimizer. The behavior below is reproducible on both the latest nightly and latest stable Rust, on the x86_64-pc-windows-msvc and x86_64-unknown-linux-gnu targets (the latter tested inside WSL).

Our test subject is the following snippet, which solves the maximum subarray problem:

pub fn max_subarray_bad(arr: &[i32]) -> (usize, usize, i32)
{
    let prefixes = arr
        .iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > 0 {
                s.1 = s.1 + *v;
            } else {
                *s = (i, *v);
            }
            Some(*s)
        });
    let (right_idx, (left_idx, sum)) = prefixes
        .enumerate()
        .max_by_key(|&(_, (_, sum))| sum)
        .unwrap();
    
    (left_idx, right_idx + 1, sum)
}
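For reference (my addition, not part of the original report), here is a quick sanity check of what the function returns — (left index, exclusive right index, sum) — on the classic maximum-subarray example array:

```rust
// The function is copied from the snippet above so this example is
// self-contained.
fn max_subarray_bad(arr: &[i32]) -> (usize, usize, i32) {
    let prefixes = arr.iter().enumerate().scan((0, 0), |s, (i, v)| {
        if s.1 > 0 {
            s.1 = s.1 + *v;
        } else {
            *s = (i, *v);
        }
        Some(*s)
    });
    let (right_idx, (left_idx, sum)) =
        prefixes.enumerate().max_by_key(|&(_, (_, sum))| sum).unwrap();
    (left_idx, right_idx + 1, sum)
}

fn main() {
    // The best subarray of this classic example is [4, -1, 2, 1],
    // spanning indices 3..7 with sum 6.
    assert_eq!(max_subarray_bad(&[-2, 1, -3, 4, -1, 2, 1, -5, 4]), (3, 7, 6));
}
```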

If we benchmark it with the criterion crate using this benchmark code:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn benchmark_linear(c: &mut Criterion) {
    const N: usize = 1000000;
    c.bench_function(&format!("max_subarray([..]) N = {}", N), |b| {
        b.iter(|| max_subarray::max_subarray_bad(black_box(&vec![0; N])))
    });
}

criterion_group!(benches, benchmark_linear);
criterion_main!(benches);

then the output of cargo bench on my machine is:

Benchmarking max_subarray([..]) N = 1000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 16.6s or reduce sample count to 30.
max_subarray([..]) N = 1000000
                        time:   [3.2324 ms 3.2700 ms 3.3141 ms]
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) high mild
  4 (4.00%) high severe

     Running target\release\deps\scratch-b68a42551ab01289.exe

But the slight change of hoisting the 0 in the expression s.1 > 0 into a let binding outside the closure makes a great difference. The function is now this:

pub fn max_subarray_bad(arr: &[i32]) -> (usize, usize, i32)
{
    let zro = 0;
    let prefixes = arr
        .iter()
        .enumerate()
        .scan((0, 0), |s, (i, v)| {
            if s.1 > zro {
                s.1 = s.1 + *v;
            } else {
                *s = (i, *v);
            }
            Some(*s)
        });
    let (right_idx, (left_idx, sum)) = prefixes
        .enumerate()
        .max_by_key(|&(_, (_, sum))| sum)
        .unwrap();
    
    (left_idx, right_idx + 1, sum)
}
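The two variants are behaviorally identical; only the generated code differs. As a quick check (my addition), both functions agree on a few inputs:

```rust
// Both variants, copied from the report; the only difference is the
// literal `0` vs. the `let zro = 0` binding in the comparison.
fn max_subarray_lit(arr: &[i32]) -> (usize, usize, i32) {
    let prefixes = arr.iter().enumerate().scan((0, 0), |s, (i, v)| {
        if s.1 > 0 { s.1 = s.1 + *v; } else { *s = (i, *v); }
        Some(*s)
    });
    let (right_idx, (left_idx, sum)) =
        prefixes.enumerate().max_by_key(|&(_, (_, sum))| sum).unwrap();
    (left_idx, right_idx + 1, sum)
}

fn max_subarray_var(arr: &[i32]) -> (usize, usize, i32) {
    let zro = 0;
    let prefixes = arr.iter().enumerate().scan((0, 0), |s, (i, v)| {
        if s.1 > zro { s.1 = s.1 + *v; } else { *s = (i, *v); }
        Some(*s)
    });
    let (right_idx, (left_idx, sum)) =
        prefixes.enumerate().max_by_key(|&(_, (_, sum))| sum).unwrap();
    (left_idx, right_idx + 1, sum)
}

fn main() {
    let inputs: [&[i32]; 3] = [
        &[-2, 1, -3, 4, -1, 2, 1, -5, 4],
        &[1, 2, 3],
        &[-1, -2, -3],
    ];
    for arr in inputs {
        // Same result from both variants on every input.
        assert_eq!(max_subarray_lit(arr), max_subarray_var(arr));
    }
}
```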

And the cargo bench output indicates an almost 20% performance gain!

Benchmarking max_subarray([..]) N = 1000000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 12.9s or reduce sample count to 40.
max_subarray([..]) N = 1000000
                        time:   [2.5705 ms 2.5806 ms 2.5913 ms]
                        change: [-20.260% -19.668% -19.124%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

     Running target\release\deps\scratch-b68a42551ab01289.exe

You can check that switching back and forth between 0 and zro in that expression consistently reproduces the ~20% performance difference.

By the way, changing let zro = 0 into const zro: i32 = 0 brings the performance drop back as well.
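One plausible mechanical explanation (my speculation, not confirmed in this issue): a const item, like a literal, is inlined at its use site, while a let binding is a runtime local that the closure must capture, which changes the closure's layout and what the optimizer sees. A small illustration of the capture difference:

```rust
fn main() {
    let zro = 0;
    // Comparing against a literal captures nothing, so the closure
    // is zero-sized.
    let lit = |x: i32| x > 0;
    // Comparing against the `let` binding captures `zro` (by value
    // here because of `move`; by reference without it).
    let cap = move |x: i32| x > zro;
    assert_eq!(std::mem::size_of_val(&lit), 0);
    assert_eq!(std::mem::size_of_val(&cap), std::mem::size_of::<i32>());
    assert!(lit(1) && cap(1));
}
```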

This looks like an optimizer bug to me. Could someone verify it?

Metadata

Labels: C-bug (Category: This is a bug.), I-slow (Issue: Problems and improvements with respect to performance of generated code.), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
