Unnecessary loop unrolling to handle tail when tail length has a smaller known size #130795

Open
okaneco opened this issue Sep 24, 2024 · 0 comments
Labels
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
I-slow: Issue: Problems and improvements with respect to performance of generated code.

okaneco commented Sep 24, 2024

In the following code, the first while loop should process 8 bytes at a time and exit early if a non-ASCII byte is found.

After that loop, the number of remaining bytes is known to be bytes.len() % 8 (at most 7), but auto-vectorization still unrolls the tail loop to test 32 bytes and 8 bytes at a time, sizes the tail can never reach:
https://rust.godbolt.org/z/z8hnb9PGY

pub const fn is_ascii(bytes: &[u8]) -> bool {
    const N1: usize = 8;
    let mut i = 0;
    while i + N1 <= bytes.len() {
        let chunk_end = i + N1;
        let mut count = 0;
        while i < chunk_end {
            count += (bytes[i] <= 127) as u8;
            i += 1;
        }

        if count != N1 as u8 {
            return false;
        }
    }

    // Process the remaining `bytes.len() % N1` bytes.
    let mut is_ascii = true;
    while i < bytes.len() {
        is_ascii &= bytes[i] <= 127;
        i += 1;
    }
    is_ascii
}
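
For comparison, here is a minimal sketch of the same check written with `chunks_exact`, where the tail comes from `remainder()` and its length is `bytes.len() % N1` by construction. The name `is_ascii_chunked` is hypothetical and not part of the report, and this variant is not a `const fn` because it uses iterators. Whether LLVM generates a tighter tail for this form has not been checked here; it is only meant to make the bound on the tail length explicit.

```rust
/// Hypothetical non-const variant of `is_ascii` from the report.
/// Returns true if every byte is in the ASCII range (<= 127).
pub fn is_ascii_chunked(bytes: &[u8]) -> bool {
    const N1: usize = 8;

    // Walk the slice in exact 8-byte chunks, exiting early on a bad chunk.
    let mut chunks = bytes.chunks_exact(N1);
    for chunk in &mut chunks {
        let mut count = 0u8;
        for &b in chunk {
            count += (b <= 127) as u8;
        }
        if count != N1 as u8 {
            return false;
        }
    }

    // `remainder()` holds the `bytes.len() % N1` bytes that did not fill
    // a whole chunk, so its length is always strictly less than N1.
    let tail = chunks.remainder();
    debug_assert!(tail.len() < N1);

    let mut is_ascii = true;
    for &b in tail {
        is_ascii &= b <= 127;
    }
    is_ascii
}
```

The result should agree with the standard library's `<[u8]>::is_ascii` for any input, which gives a quick way to sanity-check either version.
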

rustbot added the needs-triage label Sep 24, 2024
nikic added the A-LLVM, I-slow, and C-optimization labels and removed the needs-triage label Sep 24, 2024
3 participants