32x performance regression for AVX2 intrinsics in Rust v1.87 #142603

Closed
Shnatsel opened this issue Jun 17, 2025 · 5 comments
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) C-bug Category: This is a bug. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@Shnatsel
Member

Reproduction steps

git clone https://github.com/mcountryman/simd-adler32
cd simd-adler32
cargo +1.86.0 bench
cargo +1.87.0 bench

On AVX2-capable CPUs with Rust 1.86, the AVX2 code path performs well, at or above the SSE3 level.

On Rust 1.87 performance collapses completely, with the AVX2 implementation being the slowest, behind even the scalar implementation.

SSE2, SSE3 and AVX-512 are not affected.

The simd-adler32 crate uses runtime feature detection and explicit AVX2 intrinsics from std::arch.
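For readers unfamiliar with that pattern, here is a minimal sketch of runtime feature detection combined with explicit AVX2 intrinsics. It is not the simd-adler32 source: the names `sum_bytes`, `sum_bytes_avx2`, and `sum_bytes_scalar` are hypothetical, and the kernel is a plain byte sum rather than Adler-32. Only the dispatch shape (`is_x86_feature_detected!` guarding a `#[target_feature(enable = "avx2")]` function) is meant to resemble what such crates typically do.

```rust
// Illustrative sketch only; all function names here are hypothetical.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback: sums every byte into a u64.
fn sum_bytes_scalar(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}

/// AVX2 path: processes 32 bytes per iteration.
/// Must only be called after AVX2 support has been confirmed.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_bytes_avx2(data: &[u8]) -> u64 {
    let zero = _mm256_setzero_si256();
    let mut acc = _mm256_setzero_si256(); // four u64 accumulator lanes
    let mut chunks = data.chunks_exact(32);
    for chunk in &mut chunks {
        // Unaligned 256-bit load of the next 32 bytes.
        let v = _mm256_loadu_si256(chunk.as_ptr() as *const __m256i);
        // SAD against zero yields the sum of each 8-byte group as a u64 lane.
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(v, zero));
    }
    let mut lanes = [0u64; 4];
    _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
    // Add the lanes together, then handle the tail that did not fill a vector.
    lanes.iter().sum::<u64>() + sum_bytes_scalar(chunks.remainder())
}

/// Public entry point: picks the AVX2 path at runtime when available.
pub fn sum_bytes(data: &[u8]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 availability was checked on the line above.
            return unsafe { sum_bytes_avx2(data) };
        }
    }
    sum_bytes_scalar(data)
}
```

Because the AVX2 function is compiled with a different target feature set than its caller, code generation for calls crossing that boundary is exactly the kind of place where a compiler change can affect performance without changing results.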

Version it worked on

1.86

Version with regression

1.87

rustc --version --verbose:

rustc 1.87.0 (17067e9ac 2025-05-09)
binary: rustc
commit-hash: 17067e9ac6d7ecb70e50f92c1944e545188d2359
commit-date: 2025-05-09
host: x86_64-unknown-linux-gnu
release: 1.87.0
LLVM version: 20.1.1
@Shnatsel Shnatsel added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jun 17, 2025
@rustbot rustbot added needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jun 17, 2025
@workingjubilee
Member

Does this repro on nightly?

@jieyouxu jieyouxu added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. A-SIMD Area: SIMD (Single Instruction Multiple Data) E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed regression-untriaged Untriaged performance or correctness regression. labels Jun 17, 2025
@Shnatsel
Member Author

It is not an issue on the latest nightly:

rustc 1.89.0-nightly (586ad391f 2025-06-15)
binary: rustc
commit-hash: 586ad391f5ee4519acc7cae340e34673bae762b1
commit-date: 2025-06-15
host: x86_64-unknown-linux-gnu
release: 1.89.0-nightly
LLVM version: 20.1.5

@workingjubilee
Member

beta:

variants/avx2-10b       time:   [4.3867 ns 4.4002 ns 4.4148 ns]                               
                        thrpt:  [2.1095 GiB/s 2.1166 GiB/s 2.1230 GiB/s]
                 change:
                        time:   [-27.344% -27.114% -26.882%] (p = 0.00 < 0.05)
                        thrpt:  [+36.766% +37.200% +37.635%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
variants/avx2-10k       time:   [144.94 ns 145.11 ns 145.27 ns]                              
                        thrpt:  [64.109 GiB/s 64.181 GiB/s 64.256 GiB/s]
                 change:
                        time:   [-98.792% -98.789% -98.787%] (p = 0.00 < 0.05)
                        thrpt:  [+8140.7% +8160.3% +8179.8%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) low mild
variants/avx2-100k      time:   [1.4382 µs 1.4408 µs 1.4435 µs]                                
                        thrpt:  [64.517 GiB/s 64.639 GiB/s 64.756 GiB/s]
                 change:
                        time:   [-98.791% -98.786% -98.782%] (p = 0.00 < 0.05)
                        thrpt:  [+8110.1% +8139.9% +8168.7%]
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

Sorry for the inconvenience! Please wait ~10 days for this to be fixed.
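To read the Criterion output above, it can help to convert a relative time change into a throughput change. The sketch below assumes only that throughput is inversely proportional to time; the function names are hypothetical, and Criterion computes its own estimates from the raw samples, so results recomputed from the rounded percentages will differ slightly from the printed ones.

```rust
/// Speedup factor implied by a relative time change,
/// e.g. -0.98789 for a reported "-98.789%".
fn speedup(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change)
}

/// Relative throughput change implied by a relative time change.
fn throughput_change(time_change: f64) -> f64 {
    speedup(time_change) - 1.0
}
```

For the avx2-10k row, `speedup(-0.98789)` comes out at roughly 82x, in line with the reported throughput change of about +8160%. For the avx2-10b row, `throughput_change(-0.27114)` is about +37%.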

@jieyouxu jieyouxu removed E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 17, 2025
@nikic
Contributor

nikic commented Jun 17, 2025

Do we have any idea what caused/fixed this? I initially thought it might be #135408, but that shouldn't affect the avx2 code working on __m256i, right?

@tgross35
Contributor

I was thinking the same thing at first - but you are correct, #135408 only touched vector types up to 128 bits.

6 participants