Improve `llvm.ucmp.iN.i1` codegen (specifically the 1-bit inputs case) #129401

scottmcm · 2025-03-01T20:59:30Z

(Context: I was making a rustc PR and accidentally regressed bool::cmp by having it use llvm.ucmp, thus this bug that it would be nice if ucmp just was smart about it.)

llvm.ucmp.i8.i1(a, b) is actually the same as just zext(a) - zext(b): https://alive2.llvm.org/ce/z/oHq3bh

But today they don't codegen the same: https://llvm.godbolt.org/z/nxWdYhvTo

define noundef range(i8 -1, 2) i8 @src(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %0 = call i8 @llvm.ucmp.i8.i1(i1 %a, i1 %b)
  ret i8 %0
}

define noundef range(i8 -1, 2) i8 @tgt(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %aa = zext i1 %a to i8
  %bb = zext i1 %b to i8
  %0 = sub nsw i8 %aa, %bb
  ret i8 %0
}

on x64 gives

src:                                    # @src
        cmp     dil, sil
        seta    al
        sbb     al, 0
        ret
tgt:                                    # @tgt
        mov     eax, edi
        sub     al, sil
        ret

~~I don't know if it's better to InstSimplify the ucmp to sext(b) + zext(a) or to improve the codegen for the i1 case, but either way,~~ it'd be nice if the intrinsic worked optimally for i1 in addition to the wider widths.

EDIT: based on comments below, sound like it'd be better to have the codegen special-case this, rather than optimize it away in the middle-end.

The text was updated successfully, but these errors were encountered:

bassiounix · 2025-03-02T02:51:28Z

The arguments of llvm.ucmp may be of any integer type or a vector with integer element type. The argument types must match each other, and the return type must be at least as wide as i2, to hold the three possible return values (-1, 0, 1).

Does these rules apply to the second approach?

scottmcm · 2025-03-02T05:18:56Z

I wrote this specifically about when the inputs are i1. The sub approach is wrong for other sizes, and for things like llvm.ucmp.i8.i8 or llvm.ucmp.i8.i32 the current codegen is already great.

(I suppose this could also be about anything that's range(iN 0, 2), but that's much less important IMHO.)

dtcxzyw · 2025-03-03T05:35:21Z

We can handle this special case in TargetLowering::expandCMP.

llvmbot · 2025-03-03T05:35:26Z

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

Check that no other contributor has already been assigned to this issue. If you believe that no one is actually working on it despite an assignment, ping the person. After one week without a response, the assignee may be changed.
In the comments of this issue, request for it to be assigned to you, or just create a pull request after following the steps below. Mention this issue in the description of the pull request.
Fix the issue locally.
Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
Create a Git commit.
Run git clang-format HEAD~1 to format your changes.
Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation. Mention this issue in the description of the pull request.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

llvmbot · 2025-03-03T05:35:28Z

@llvm/issue-subscribers-good-first-issue

Author: None (scottmcm)

(Context: I was making a rustc PR and accidentally regressed `bool::cmp` by having it use `llvm.ucmp`, thus this bug that it would be nice if `ucmp` just was smart about it.)

llvm.ucmp.i8.i1(a, b) is actually the same as just zext(a) - zext(b): <https://alive2.llvm.org/ce/z/oHq3bh>

But today they don't codegen the same: <https://llvm.godbolt.org/z/nxWdYhvTo>

define noundef range(i8 -1, 2) i8 @<!-- -->src(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %0 = call i8 @<!-- -->llvm.ucmp.i8.i1(i1 %a, i1 %b)
  ret i8 %0
}

define noundef range(i8 -1, 2) i8 @<!-- -->tgt(i1 noundef zeroext %a, i1 noundef zeroext %b) unnamed_addr {
start:
  %aa = zext i1 %a to i8
  %bb = zext i1 %b to i8
  %0 = sub nsw i8 %aa, %bb
  ret i8 %0
}

on x64 gives

src:                                    # @<!-- -->src
        cmp     dil, sil
        seta    al
        sbb     al, 0
        ret
tgt:                                    # @<!-- -->tgt
        mov     eax, edi
        sub     al, sil
        ret

I don't know if it's better to InstSimplify the ucmp to sext(b) + zext(a) or to improve the codegen for the i1 case, but either way, it'd be nice if the intrinsic worked optimally for i1 in addition to the wider widths.

dipeshs809 · 2025-03-04T17:49:41Z

Hi @scottmcm , I would like to work on this issue, can you assign this to me ?

scottmcm · 2025-03-04T18:44:29Z

@dipeshs809 I have no permissions to assign people here, but your note in chat here is probably sufficient to claim it.

llvmbot added the new issue label Mar 1, 2025

scottmcm changed the title ~~Improve llvm.ucmp.i8.i1 codegen~~ Improve llvm.ucmp.iN.i1 codegen (specifically the 1-bit inputs case) Mar 2, 2025

frederick-vs-ja added llvm:optimizations and removed new issue labels Mar 3, 2025

dtcxzyw added good first issue https://github.com/llvm/llvm-project/contribute missed-optimization llvm:SelectionDAG SelectionDAGISel as well and removed llvm:optimizations labels Mar 3, 2025

dtcxzyw assigned dipeshs809 Mar 5, 2025

scottmcm mentioned this issue Apr 2, 2025

[X86] Recognize Clang's <=> code pattern #60012

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve `llvm.ucmp.iN.i1` codegen (specifically the 1-bit inputs case) #129401

Improve `llvm.ucmp.iN.i1` codegen (specifically the 1-bit inputs case) #129401

scottmcm commented Mar 1, 2025 •

edited

Loading

bassiounix commented Mar 2, 2025

scottmcm commented Mar 2, 2025

dtcxzyw commented Mar 3, 2025

llvmbot commented Mar 3, 2025

llvmbot commented Mar 3, 2025

dipeshs809 commented Mar 4, 2025

scottmcm commented Mar 4, 2025

Improve llvm.ucmp.iN.i1 codegen (specifically the 1-bit inputs case) #129401

Improve llvm.ucmp.iN.i1 codegen (specifically the 1-bit inputs case) #129401

Comments

scottmcm commented Mar 1, 2025 • edited Loading

bassiounix commented Mar 2, 2025

scottmcm commented Mar 2, 2025

dtcxzyw commented Mar 3, 2025

llvmbot commented Mar 3, 2025

llvmbot commented Mar 3, 2025

dipeshs809 commented Mar 4, 2025

scottmcm commented Mar 4, 2025

Improve `llvm.ucmp.iN.i1` codegen (specifically the 1-bit inputs case) #129401

Improve `llvm.ucmp.iN.i1` codegen (specifically the 1-bit inputs case) #129401

scottmcm commented Mar 1, 2025 •

edited

Loading