Skip to content

PartialOrd derived for C-like enum isn't properly optimized for the <= operator #73338

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pubfnbar opened this issue Jun 14, 2020 · 6 comments · Fixed by #83819
Closed

PartialOrd derived for C-like enum isn't properly optimized for the <= operator #73338

pubfnbar opened this issue Jun 14, 2020 · 6 comments · Fixed by #83819
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@pubfnbar
Copy link
Contributor

The manual version:

use std::{
    cmp::Ordering,
    time::SystemTime,
};

#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq)]
enum Foo {
    Zero,
    One,
    Two,
}

impl PartialOrd for Foo {
    fn partial_cmp(&self, _rhs: &Foo) -> Option<Ordering> {
        panic!()
    }

    fn le(&self, rhs: &Foo) -> bool {
        *self as u32 <= *rhs as u32
    }
}

fn main() {
    let opaque = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .as_secs() as u32
        % 3;

    let foo = unsafe { *(&opaque as *const _ as *const Foo) };

    if foo <= Foo::One {
        println!("here");
    }
}

The manual version ASM:

playground::main:
	subq	$88, %rsp
	callq	*std::time::SystemTime::now@GOTPCREL(%rip)
	movq	%rax, 56(%rsp)
	movq	%rdx, 64(%rsp)
	leaq	8(%rsp), %rdi
	leaq	56(%rsp), %rsi
	xorl	%edx, %edx
	xorl	%ecx, %ecx
	callq	*std::time::SystemTime::duration_since@GOTPCREL(%rip)
	cmpq	$1, 8(%rsp)
	je	.LBB4_4
	movl	16(%rsp), %eax
	movl	$2863311531, %ecx
	imulq	%rax, %rcx
	shrq	$33, %rcx
	leal	(%rcx,%rcx,2), %ecx
	subl	%ecx, %eax
	cmpl	$1, %eax
	ja	.LBB4_3
	leaq	.L__unnamed_2(%rip), %rax
	movq	%rax, 8(%rsp)
	movq	$1, 16(%rsp)
	movq	$0, 24(%rsp)
	leaq	.L__unnamed_3(%rip), %rax
	movq	%rax, 40(%rsp)
	movq	$0, 48(%rsp)
	leaq	8(%rsp), %rdi
	callq	*std::io::stdio::_print@GOTPCREL(%rip)

The derived version:

use std::{
    cmp::Ordering,
    time::SystemTime,
};

#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
enum Foo {
    Zero,
    One,
    Two,
}

fn main() {
    let opaque = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .as_secs() as u32
        % 3;

    let foo = unsafe { *(&opaque as *const _ as *const Foo) };

    if foo <= Foo::One {
        println!("here");
    }
}

The derived version ASM:

playground::main:
	subq	$88, %rsp
	callq	*std::time::SystemTime::now@GOTPCREL(%rip)
	movq	%rax, 56(%rsp)
	movq	%rdx, 64(%rsp)
	leaq	8(%rsp), %rdi
	leaq	56(%rsp), %rsi
	xorl	%edx, %edx
	xorl	%ecx, %ecx
	callq	*std::time::SystemTime::duration_since@GOTPCREL(%rip)
	cmpq	$1, 8(%rsp)
	je	.LBB4_4
	movl	16(%rsp), %eax
	movl	$2863311531, %ecx
	imulq	%rax, %rcx
	shrq	$33, %rcx
	leal	(%rcx,%rcx,2), %ecx
	xorl	%edx, %edx
	subl	%ecx, %eax
	setne	%dl
	xorl	%ecx, %ecx
	cmpl	$1, %eax
	leaq	-1(%rdx,%rdx), %rax
	cmovneq	%rax, %rcx
	addq	$1, %rcx
	cmpq	$1, %rcx
	ja	.LBB4_3
	leaq	.L__unnamed_2(%rip), %rax
	movq	%rax, 8(%rsp)
	movq	$1, 16(%rsp)
	movq	$0, 24(%rsp)
	leaq	.L__unnamed_3(%rip), %rax
	movq	%rax, 40(%rsp)
	movq	$0, 48(%rsp)
	leaq	8(%rsp), %rdi
	callq	*std::io::stdio::_print@GOTPCREL(%rip)

The diff is that while the manual version does:

	subl	%ecx, %eax
	cmpl	$1, %eax

...the derived version does:

	xorl	%edx, %edx
	subl	%ecx, %eax
	setne	%dl
	xorl	%ecx, %ecx
	cmpl	$1, %eax
	leaq	-1(%rdx,%rdx), %rax
	cmovneq	%rax, %rcx
	addq	$1, %rcx
	cmpq	$1, %rcx

All the other PartialOrd operators are optimized properly.

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 14, 2020
@dotdash
Copy link
Contributor

dotdash commented Jun 24, 2020

LLVM can't properly optimize the match against Some(Less | Equal), if we instead match against None | Some(Greater) and negate the result, LLVM does find a way to optimize the code, because the IR we generate is somewhat simpler having to handle only one of the Some() cases.

I didn't really look into how to maybe teach LLVM to handle this case, and didn't see anything obvious in the way we generate IR that could be improved here either.

Should I prepare a PR to adjust the default impl of le with a comment that explains why it's written that way, or should we investigate more here?

@pubfnbar
Copy link
Contributor Author

I wonder if this issue is related to #83623

@AngelicosPhosphoros
Copy link
Contributor

I wonder if this issue is related to #83623

We could test it in next nightly.

@AngelicosPhosphoros
Copy link
Contributor

I wonder if this issue is related to #83623

We could test it in next nightly.

Well, this issue doesn't fixed by it (which is expected because we changed MIR of && and ||, not match) so we need to make changes in LLVM or change our generated code.

@AngelicosPhosphoros
Copy link
Contributor

Created PR with fix.

@nikic
Copy link
Contributor

nikic commented Apr 4, 2021

Upstream fix for LLVM: llvm/llvm-project@9bad7de

bors added a commit to rust-lang-ci/rust that referenced this issue Apr 5, 2021
…artial-eq-impl, r=Mark-Simulacrum

Optimize jumps in PartialOrd le

Closes rust-lang#73338
This change stops default implementation of `le()` method of PartialOrd from generating jumps.
@bors bors closed this as completed in ed0d8fa Apr 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants