Skip to content

Fix has_cpuid implementation #492

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 25, 2018
Merged

Fix has_cpuid implementation #492

merged 11 commits into from
Jun 25, 2018

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Jun 23, 2018

@gnzlbg gnzlbg changed the title Temporary "fix" for has_cpuid implementation Fix has_cpuid implementation Jun 23, 2018
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jun 23, 2018

cc @petrochenkov @japaric @parched @rkruppe please review

I am not 100% sure of the inline assembly, and it is really hard to find useful documentation about it for rust.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jun 23, 2018

I also would like to add one comment to each line of inline assembly, but it seems that this is not supported.

pop %eax
xor %eax, %ecx
shrd %eax, 21
popfd
Copy link

@stevecheckoway stevecheckoway Jun 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either this line (along with the corresponding pushfd) is going to break test_has_cpuid which is testing that eflags has changed or, more likely, that test was already totally broken and should just be deleted. In fact, since the comparison is going to potentially modify eflags, I don't think that test makes any sense at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_has_cpuid just checks that the function returns true, there is a new test that checks that has_cpuid is indempotent. Maybe both tests should also check that the EFLAGS register wasn't modified, what do you think?

Copy link
Contributor Author

@gnzlbg gnzlbg Jun 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I think I didn't fully gotten what you meant on the first read. I just changed the test to verify that calling has_cpuid does not modify the EFLAGS since I think it should be idempotent.

That is, it should try to modify the eflags, see if they were modified, and then revert the modification before returning the result.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    fn test_has_cpuid() {
        unsafe {
            let before = __readeflags();

            if cpuid::has_cpuid() {
                assert!(before != __readeflags());
            } else {
                assert!(before == __readeflags());
            }
        }
    }

is testing that eflags changes if cpuid is supported and doesn't change if it's not. But the second and third __readeflags() happen after performing a comparison and a conditional jump. The comparison (probably a test instruction in this case) is going to set/reset bits in eflags which makes the comparison with before meaningless.

@hanna-kruppe
Copy link

@gnzlbg

I also would like to add one comment to each line of inline assembly, but it seems that this is not supported.

It's supported, but you have to use assembly comment syntax, which is # rather than // on x86.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jun 23, 2018

@rkruppe thanks, I haven't used Rust inline assembly in a long time, and I just wasn't able to find any good documentation about this anywhere.

Copy link

@stevecheckoway stevecheckoway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. There's no guarantee what the state of eflags is going to be at any point as far as I know.

For one thing, the comparison at the end of has_cpuid and certainly the debug_assert! in debug mode is going to set bits in eflags.

I think you should delete test_has_cpuid altogether. test_has_cpuid_idempotent is fine as you initially wrote it.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jun 23, 2018

So I've added some comments to the assembly, and I've fixed the test. I'll still wait for @petrochenkov 's feedback before merging this. Let me know if the comments are unclear or you find any other issues.

// If it is, then `cpuid` is available.
let eax: i32;
asm!(r#"
# Save ecx (__fastcall needs it preserved) and save a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect either saving ecx in the asm or declaring it as clobbered, not both. If it's in the list of clobbered registers, the backend is responsible for saving and restoring ecx if it contains a live value at the locations. Since that's simpler and less error prone than manual saving and restoring (and can even avoid the save-restore dance altogether in many cases), I'd prefer that.

# So if the value of the 21st bit is 1, cpuid is available,
# and if it is zero, it isn't because we didn't manage to
# modify it:
shrd %eax, 21
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the problem is that this is the instruction for doubles, it should be shr, or shrl, but none of these appear to work.

@stevecheckoway
Copy link

stevecheckoway commented Jun 23, 2018

Here's an alternative implementation that's slightly shorter and doesn't explicitly name the registers leaving llvm free to choose them.

    let result: u32;
    let _temp: u32;
    unsafe {
        asm!(r#"
            pushfd
            pop     $0
            mov     $1, $0
            xor     $0, 0x200000
            push    $0
            popfd
            pushfd
            pop     $0
            xor     $0, $1
            shr     $0, 21
            "#
            : "=r"(result), "=r"(_temp)
            :
            : "cc", "memory"
            : "intel");
    }
    result != 0

Testing for not 0 leads to very slightly better code gen than testing for 1. I don't think this needs to be volatile since it only needs to be emitted if the result is actually used.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jun 24, 2018

@stevecheckoway I've pushed your solution in the last commit, mind reviewing?

@@ -78,34 +78,53 @@ pub unsafe fn __cpuid(leaf: u32) -> CpuidResult {
}

/// Does the host support the `cpuid` instruction?
#[inline]
#[inline(never)]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

# are zero:
xor $0, $1
# Store in $0 the value of the 21st bit
shr $0, 21

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this was in @stevecheckoway's version too, but I wonder why keep shifting if you're going to check for non-zero anyway?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, the real reason that I left the shift is because I didn't want to read this table in sufficient detail to determine if there were any important interactions between the popfd and pushfd.

image

There's actually a good reason to do the shift (or use result & 0x20_0000 != 0). Of course, it's possible that an interrupt could occur between the popfd and the pushfd and the OS would be free to change flags like the AC flag or (even less likely) the IOPL bits. But a much better reason to do it is the trap flag can be set by attaching a debugger, breaking on the pushfd and single stepping it.

Here's an example program demonstrating this:

#![feature(asm)]

fn main() {
    let eflags1: u32;
    let eflags2: u32;
    unsafe {
        asm!(r#"
            pushfd
            mov     $0, dword ptr [esp]
            popfd
            pushfd
            pop     $1
            "#
            : "=r"(eflags1), "=r"(eflags2)
            :
            : "cc", "memory"
            : "intel");
    }
    println!("{:x}", eflags1);
    println!("{:x}", eflags2);
}

Running it outside the debugger gives

$ rustc -g --target i686-apple-darwin bad.rs
$ ./bad
286
286
$ ./bad
282
282

so you can see that the parity flag is changing between runs for some reason, but the two attempts reading the flags are consistent. Now let's do it with a debugger (edited to remove the unhelpful output)

$ otool -tv bad|grep -A1 popfl
0000279f	popfl
000027a0	pushfl
$ rust-lldb bad
[snip]
(lldb) b &bad::main
Breakpoint 1: where = bad`bad::main::hc90b428e8cbd006a at bad.rs:3, address = 0x00002780
(lldb) r
Process 3423 launched: '/Users/steve/bad' (i386)
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) br set -a 0x27a0
Breakpoint 2: where = bad`bad::main::hc90b428e8cbd006a + 32 at bad.rs:7, address = 0x000027a0
(lldb) c
Process 3423 resuming
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) si
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) c
Process 3423 resuming
286
386
Process 3423 exited with status = 0 (0x00000000)

So as you can see, the other flags really can change. Now I don't have access to an old-enough Intel processor that cpuid isn't supported to demonstrate the issue there. So either the shift or a mask is needed to select the appropriate bit.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. If the other bits could change, it would surely be safer to do the bitwise and?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm really tempted to be minimally smart here. It seems best to just stick to the exact sequence of instructions Intel suggests in the document linked in the comment, they gotta know best, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think there’s any danger of Intel adding new flags to eflags and all of bits 22 through 31 (63 in rflags) are reserved and have value zero.

That said, not doing a shift but instead masking result and testing for nonzero in Rust code and not in assembly should compile to a test r32, imm32 and setne r8 rather than shr r32, imm8 test r32, r32 and setne r8. I can’t imagine that’s ever going to be important though.

Using Intel’s code is fine too. It’s just doing some unnecessary work

Copy link

@stevecheckoway stevecheckoway Jun 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the 32-bit toolchains supported by rustup, only one potentially doesn't have the cpuid instruction:

i386-apple-ios
i586-pc-windows-msvc
i586-unknown-linux-gnu
i586-unknown-linux-musl
i686-apple-darwin (installed)
i686-linux-android
i686-pc-windows-gnu
i686-pc-windows-msvc
i686-unknown-freebsd
i686-unknown-linux-gnu
i686-unknown-linux-musl

I think i386-apple-ios is the iOS simulator. I don't know if it actually supports cpuid. All of the rest of them are targeting at least a Pentium which supports cpuid. Unless there are plans to support early 486s, this could just return true.

Copy link
Contributor Author

@gnzlbg gnzlbg Jun 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM supports more targets than the ones rustc supports and these can be targeted using xargo which is being integrated into cargo.

I think it is a good idea to add optimizations of this feature for targets that are known to have the cpuid function. The easiest one might be to return true on x86 if the target supports mmx. That covers all those targets except i386-apple-ios.

EDIT: I've filled #497 , PRs welcome ;)

@gnzlbg gnzlbg merged commit fd3a3f2 into rust-lang:master Jun 25, 2018
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jun 25, 2018

So thanks you all for the feedback! I was traveling this weekend and this should have been fixed better sooner but hopefully now its good enough. Should be picked up by the next stdsimd update in rust-lang/rust : rust-lang/rust#51695

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants