Skip to content

Improve llvm.ucmp.iN.i1 codegen (specifically the 1-bit inputs case) #129401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
scottmcm opened this issue Mar 1, 2025 · 7 comments
Open

Improve llvm.ucmp.iN.i1 codegen (specifically the 1-bit inputs case) #129401

scottmcm opened this issue Mar 1, 2025 · 7 comments
Assignees
Labels
good first issue https://github.com/llvm/llvm-project/contribute llvm:SelectionDAG SelectionDAGISel as well missed-optimization

Comments

@scottmcm
Copy link

scottmcm commented Mar 1, 2025

(Context: I was making a rustc PR and accidentally regressed bool::cmp by having it use llvm.ucmp, thus this bug that it would be nice if ucmp just was smart about it.)

llvm.ucmp.i8.i1(a, b) is actually the same as just zext(a) - zext(b): https://alive2.llvm.org/ce/z/oHq3bh

But today they don't codegen the same: https://llvm.godbolt.org/z/nxWdYhvTo

define noundef range(i8 -1, 2) i8 @src(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %0 = call i8 @llvm.ucmp.i8.i1(i1 %a, i1 %b)
  ret i8 %0
}

define noundef range(i8 -1, 2) i8 @tgt(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %aa = zext i1 %a to i8
  %bb = zext i1 %b to i8
  %0 = sub nsw i8 %aa, %bb
  ret i8 %0
}

on x64 gives

src:                                    # @src
        cmp     dil, sil
        seta    al
        sbb     al, 0
        ret
tgt:                                    # @tgt
        mov     eax, edi
        sub     al, sil
        ret

I don't know if it's better to InstSimplify the ucmp to sext(b) + zext(a) or to improve the codegen for the i1 case, but either way, it'd be nice if the intrinsic worked optimally for i1 in addition to the wider widths.

EDIT: based on comments below, sound like it'd be better to have the codegen special-case this, rather than optimize it away in the middle-end.

@bassiounix
Copy link
Contributor

The arguments of llvm.ucmp may be of any integer type or a vector with integer element type. The argument types must match each other, and the return type must be at least as wide as i2, to hold the three possible return values (-1, 0, 1).

Does these rules apply to the second approach?

@scottmcm
Copy link
Author

scottmcm commented Mar 2, 2025

I wrote this specifically about when the inputs are i1. The sub approach is wrong for other sizes, and for things like llvm.ucmp.i8.i8 or llvm.ucmp.i8.i32 the current codegen is already great.

(I suppose this could also be about anything that's range(iN 0, 2), but that's much less important IMHO.)

@scottmcm scottmcm changed the title Improve llvm.ucmp.i8.i1 codegen Improve llvm.ucmp.iN.i1 codegen (specifically the 1-bit inputs case) Mar 2, 2025
@dtcxzyw dtcxzyw added good first issue https://github.com/llvm/llvm-project/contribute missed-optimization llvm:SelectionDAG SelectionDAGISel as well and removed llvm:optimizations labels Mar 3, 2025
@dtcxzyw
Copy link
Member

dtcxzyw commented Mar 3, 2025

We can handle this special case in TargetLowering::expandCMP.

@llvmbot
Copy link
Member

llvmbot commented Mar 3, 2025

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

  1. Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
  2. In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
  3. Fix the issue locally.
  4. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
  5. Create a Git commit.
  6. Run git clang-format HEAD~1 to format your changes.
  7. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

@llvmbot
Copy link
Member

llvmbot commented Mar 3, 2025

@llvm/issue-subscribers-good-first-issue

Author: None (scottmcm)

(Context: I was making a rustc PR and accidentally regressed `bool::cmp` by having it use `llvm.ucmp`, thus this bug that it would be nice if `ucmp` just was smart about it.)

llvm.ucmp.i8.i1(a, b) is actually the same as just zext(a) - zext(b): <https://alive2.llvm.org/ce/z/oHq3bh>

But today they don't codegen the same: <https://llvm.godbolt.org/z/nxWdYhvTo>

define noundef range(i8 -1, 2) i8 @<!-- -->src(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %0 = call i8 @<!-- -->llvm.ucmp.i8.i1(i1 %a, i1 %b)
  ret i8 %0
}

define noundef range(i8 -1, 2) i8 @<!-- -->tgt(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %aa = zext i1 %a to i8
  %bb = zext i1 %b to i8
  %0 = sub nsw i8 %aa, %bb
  ret i8 %0
}

on x64 gives

src:                                    # @<!-- -->src
        cmp     dil, sil
        seta    al
        sbb     al, 0
        ret
tgt:                                    # @<!-- -->tgt
        mov     eax, edi
        sub     al, sil
        ret

I don't know if it's better to InstSimplify the ucmp to sext(b) + zext(a) or to improve the codegen for the i1 case, but either way, it'd be nice if the intrinsic worked optimally for i1 in addition to the wider widths.

@dipeshs809
Copy link
Contributor

Hi @scottmcm , I would like to work on this issue, can you assign this to me ?

@scottmcm
Copy link
Author

scottmcm commented Mar 4, 2025

@dipeshs809 I have no permissions to assign people here, but your note in chat here is probably sufficient to claim it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue https://github.com/llvm/llvm-project/contribute llvm:SelectionDAG SelectionDAGISel as well missed-optimization
Projects
None yet
Development

No branches or pull requests

6 participants