Skip to content

Add GSoC 2024 3-way comparison page #56

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Oct 7, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 96 additions & 0 deletions content/posts/2024-08-29-gsoc-three-way-comparison.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
---
author: "Volodymyr Vasylkun (Poseydon)"
date: "2024-10-07"
tags: ["GSoC", "optimization", "intrinsics"]
title: "GSoC 2024: 3-way comparison intrinsics"
---

Hello everyone! My name is Volodymyr, and in this post I would like to talk about the project I have been working on for the past couple of months as part of Google Summer of Code 2024. The aim of the project was to introduce 3-way comparison intrinsics to LLVM IR and add a decent level of optimizations for them.

# Background

Three-way comparison is an operation present in many high-level languages, such as C++ and its spaceship operator or Rust and the `Ord` trait. It operates on two values for which there is a defined comparison operation and returns `-1` if the first operand is less than the second, `0` if they are equal, and `1` otherwise. At the moment, compilers that use LLVM express this operation using different sequences of instructions which are optimized and lowered individually rather than as a single operation. Adding an intrinsic for this operation would therefore help us generate better machine code on some targets, as well as potentially optimize patterns in the middle-end that we didn't optimize before.

# What was done

Over the course of the project I have added two new intrinsics to the LLVM IR: `llvm.ucmp` for an unsigned 3-way comparison and `llvm.scmp` for a signed comparison. They both take two arguments that must be integers or vectors of integers and return an integer or a vector of integers with the same number of elements. The arguments and the result do not need to have the same type.

In the middle-end the following passes received some support for these intrinsics:

* InstSimplify ([#1](https://github.com/llvm/llvm-project/pull/93730), [#2](https://github.com/llvm/llvm-project/pull/95601))
* InstCombine ([#1](https://github.com/llvm/llvm-project/pull/96118), [#2](https://github.com/llvm/llvm-project/pull/98360), [#3](https://github.com/llvm/llvm-project/pull/101049), [#4](https://github.com/llvm/llvm-project/pull/105272), [#5](https://github.com/llvm/llvm-project/pull/105583))
* [CorrelatedValuePropagation](https://github.com/llvm/llvm-project/pull/97235)
* [ConstraintElimination](https://github.com/llvm/llvm-project/pull/97974)

I have also added folds of idiomatic ways that a 3-way comparison can be expressed to a call to the corresponding intrinsic.

In the backend there are two different ways of expanding the intrinsics: [as a nested select](https://github.com/llvm/llvm-project/pull/91871) (i.e. `(x < y) ? -1 : (x > y ? 1 : 0)`) or [as a subtraction of zero-extended comparisons](https://github.com/llvm/llvm-project/pull/98774) (`zext(x > y) - zext(x < y)`). The second option is the default one, but targets can choose to use the first one through a TLI hook.

# Results

I think that overall the project was successful and brought a small positive change to LLVM. To demonstrate its impact in a small test case, the following function in C++ that uses the spaceship operator was compiled twice, first with Clang 18.1 and then with Clang built from the main branch of LLVM repository:

```C++
#include <compare>

std::strong_ordering cmp(unsigned int a, unsigned int b)
{
return a <=> b;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say that we would get an even more compelling example by using the spaceship operator, because it generates pretty terrible asm on 18.1 (8 instructions instead of 3): https://c.godbolt.org/z/WGP7vba14

But we need llvm/llvm-project#107314 to land before we actually get the good new codegen for this pattern :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch has landed, so now this example goes from

spaceship(unsigned int, unsigned int):
        xor     ecx, ecx
        cmp     edi, esi
        mov     eax, 0
        sbb     eax, eax
        or      al, 1
        cmp     edi, esi
        movzx   eax, al
        cmove   eax, ecx
        ret

to

spaceship(unsigned int, unsigned int):
        cmp     edi, esi
        seta    al
        sbb     al, 0
        ret

which is quite nice!

```

With Clang 18.1:

```text
; ====== LLVM IR ======
define i8 @cmp(i32 %a, i32 %b) {
entry:
%cmp.lt = icmp ult i32 %a, %b
%sel.lt = select i1 %cmp.lt, i8 -1, i8 1
%cmp.eq = icmp eq i32 %a, %b
%sel.eq = select i1 %cmp.eq, i8 0, i8 %sel.lt
ret i8 %sel.eq
}

; ====== x86_64 assembly ======
cmp:
xor ecx, ecx
cmp edi, esi
mov eax, 0
sbb eax, eax
or al, 1
cmp edi, esi
movzx eax, al
cmove eax, ecx
ret
```

With freshly built Clang:

```plain
; ====== LLVM IR ======
define i8 @cmp(i32 %a, i32 %b) {
entry:
%sel.eq = tail call i8 @llvm.ucmp.i8.i32(i32 %a, i32 %b)
ret i8 %sel.eq
}

; ====== x86_64 assembly ======
cmp:
cmp edi, esi
seta al
sbb al, 0
ret
```

As you can see, the number of instructions in the generated code had gone down considerably (from 8 to 3 excluding `ret`). Although this isn't much and is a small synthetic test, it can still make a noticeable impact if code like this is found in a hot path somewhere.

The impact of these changes on real-world code is much harder to quantify. Looking at llvm-opt-benchmark, there are quite a few places where the intrinsics are being used, which suggests that some improvement must have taken place, although it is unlikely to be significant in all but very few cases.

# Future Work

There are still many opportunities for optimization in the middle-end, some of which are already known and being worked on at the time of writing this, others are yet to be discovered. I would also like to allow pointers and vectors of pointers to be valid operands for the intrinsics, although that would be quite a minor change. In the backend I would also like to work on better handling of intrinsics in GlobalISel, which is something that I didn't have enough time for and other members of LLVM community had helped me with.

# Acknowledgements

None of this would have been possible without my two amazing mentors, Nikita Popov and Dhruv Chawla, and the LLVM community as a whole. Thank you for helping me on this journey and I am looking forward to working with you in the future.