Skip to content

matches!(n, -1 | 1) for signed NonZero suboptimal #84311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Juici opened this issue Apr 18, 2021 · 3 comments
Open

matches!(n, -1 | 1) for signed NonZero suboptimal #84311

Juici opened this issue Apr 18, 2021 · 3 comments
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Juici
Copy link
Contributor

Juici commented Apr 18, 2021

The code generated when comparing a signed non-zero type to -1 or 1 varies wildly, and appears to generate suboptimal branching code.

A more in-depth analysis of the generated assembly can be found here: https://godbolt.org/z/qxzGrv9nn.

Basic

Looking at these two functions, one would expect them to generate the same assembly.

pub fn is_one_1(len: NonZeroI32) -> bool {
    matches!(len.get(), -1 | 1)
}

pub fn is_one_2(len: NonZeroI32) -> bool {
    len.get() == -1 || len.get() == 1
}

However that is not the case, with is_one_1 in fact branching as seen below. Whilst I have not run benchmarks to see if branching is more performant, one would expect it not to be.

example::is_one_1:
        cmp     edi, 1
        je      .LBB0_3
        cmp     edi, -1
        jne     .LBB0_2
.LBB0_3:
        mov     al, 1
        ret
.LBB0_2:
        xor     eax, eax
        ret

example::is_one_2:
        cmp     edi, -1
        sete    cl
        cmp     edi, 1
        sete    al
        or      al, cl
        ret

What if we include 0 in the comparison?

Since len is a NonZeroI32, it is guaranteed that len.get() != 0 otherwise it is undefined behaviour. This means that the function below should behave the same as is_one_1.

pub fn is_one_3(len: NonZeroI32) -> bool {
    matches!(len.get(), -1 | 0 | 1)
}

In fact the generated assembly for this is the smallest yet.

example::is_one_3:
        add     edi, 1
        cmp     edi, 3
        setb    al
        ret

This assembly is actually also generated when we include 0 in the comparison in the style of is_one_2.

pub fn is_one_4(len: NonZeroI32) -> bool {
    len.get() == -1 || len.get() == 0 || len.get() == 1
}
example::is_one_4:
        add     edi, 1
        cmp     edi, 3
        setb    al
        ret

Expected behaviour

The expected behaviour of the compiler would be to generate assembly that matches between the different implementations of this check.

@Juici Juici added the C-bug Category: This is a bug. label Apr 18, 2021
@jyn514 jyn514 added A-codegen Area: Code generation I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 18, 2021
@AngelicosPhosphoros
Copy link
Contributor

Relates to #73338

I think, this issue cannot be solved without LLVM patch.
Luckily, @nikic already patched LLVM (llvm/llvm-project@9bad7de) and this issue would likely to be solved after upgrading LLVM.

@scottmcm
Copy link
Member

Note that it's normal for NonZero to not actually help LLVM on .get(), see #49572

@alex
Copy link
Member

alex commented Dec 29, 2024

The original two functions now generate identical (clever) x86: https://rust.godbolt.org/z/vaP1dGaax . Just needs a codegen test (presumabely).

@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants