Fix has_cpuid implementation #492

gnzlbg · 2018-06-23T11:19:42Z

Closes rust-lang/rust#51691

rust-lang/rust#51691

gnzlbg · 2018-06-23T12:08:50Z

cc @petrochenkov @japaric @parched @rkruppe please review

I am not 100% sure of the inline assembly, and it is really hard to find useful documentation about it for rust.

gnzlbg · 2018-06-23T12:09:40Z

I also would like to add one comment to each line of inline assembly, but it seems that this is not supported.

stevecheckoway · 2018-06-23T12:30:35Z

coresimd/x86/cpuid.rs

+                pop %eax
+                xor %eax, %ecx
+                shrd %eax, 21
+                popfd


Either this line (along with the corresponding pushfd) is going to break test_has_cpuid which is testing that eflags has changed or, more likely, that test was already totally broken and should just be deleted. In fact, since the comparison is going to potentially modify eflags, I don't think that test makes any sense at all.

test_has_cpuid just checks that the function returns true, there is a new test that checks that has_cpuid is indempotent. Maybe both tests should also check that the EFLAGS register wasn't modified, what do you think?

sorry, I think I didn't fully gotten what you meant on the first read. I just changed the test to verify that calling has_cpuid does not modify the EFLAGS since I think it should be idempotent.

That is, it should try to modify the eflags, see if they were modified, and then revert the modification before returning the result.

fn test_has_cpuid() { unsafe { let before = __readeflags(); if cpuid::has_cpuid() { assert!(before != __readeflags()); } else { assert!(before == __readeflags()); } } }

is testing that eflags changes if cpuid is supported and doesn't change if it's not. But the second and third __readeflags() happen after performing a comparison and a conditional jump. The comparison (probably a test instruction in this case) is going to set/reset bits in eflags which makes the comparison with before meaningless.

hanna-kruppe · 2018-06-23T12:37:09Z

@gnzlbg

I also would like to add one comment to each line of inline assembly, but it seems that this is not supported.

It's supported, but you have to use assembly comment syntax, which is # rather than // on x86.

gnzlbg · 2018-06-23T12:38:16Z

@rkruppe thanks, I haven't used Rust inline assembly in a long time, and I just wasn't able to find any good documentation about this anywhere.

stevecheckoway

I don't think this is correct. There's no guarantee what the state of eflags is going to be at any point as far as I know.

For one thing, the comparison at the end of has_cpuid and certainly the debug_assert! in debug mode is going to set bits in eflags.

I think you should delete test_has_cpuid altogether. test_has_cpuid_idempotent is fine as you initially wrote it.

gnzlbg · 2018-06-23T13:04:16Z

So I've added some comments to the assembly, and I've fixed the test. I'll still wait for @petrochenkov 's feedback before merging this. Let me know if the comments are unclear or you find any other issues.

hanna-kruppe · 2018-06-23T13:10:32Z

coresimd/x86/cpuid.rs

+            // If it is, then `cpuid` is available.
+            let eax: i32;
+            asm!(r#"
+                # Save ecx (__fastcall needs it preserved) and save a


I would expect either saving ecx in the asm or declaring it as clobbered, not both. If it's in the list of clobbered registers, the backend is responsible for saving and restoring ecx if it contains a live value at the locations. Since that's simpler and less error prone than manual saving and restoring (and can even avoid the save-restore dance altogether in many cases), I'd prefer that.

…y clobbered

gnzlbg · 2018-06-23T14:55:36Z

coresimd/x86/cpuid.rs

+                # So if the value of the 21st bit is 1, cpuid is available,
+                # and if it is zero, it isn't because we didn't manage to
+                # modify it:
+                shrd %eax, 21


So the problem is that this is the instruction for doubles, it should be shr, or shrl, but none of these appear to work.

stevecheckoway · 2018-06-23T15:23:26Z

Here's an alternative implementation that's slightly shorter and doesn't explicitly name the registers leaving llvm free to choose them.

    let result: u32;
    let _temp: u32;
    unsafe {
        asm!(r#"
            pushfd
            pop     $0
            mov     $1, $0
            xor     $0, 0x200000
            push    $0
            popfd
            pushfd
            pop     $0
            xor     $0, $1
            shr     $0, 21
            "#
            : "=r"(result), "=r"(_temp)
            :
            : "cc", "memory"
            : "intel");
    }
    result != 0

Testing for not 0 leads to very slightly better code gen than testing for 1. I don't think this needs to be volatile since it only needs to be emitted if the result is actually used.

gnzlbg · 2018-06-24T09:07:56Z

@stevecheckoway I've pushed your solution in the last commit, mind reviewing?

hanna-kruppe · 2018-06-24T09:38:28Z

coresimd/x86/cpuid.rs

@@ -78,34 +78,53 @@ pub unsafe fn __cpuid(leaf: u32) -> CpuidResult {
 }

 /// Does the host support the `cpuid` instruction?
-#[inline]
+#[inline(never)]


hanna-kruppe · 2018-06-24T09:40:29Z

coresimd/x86/cpuid.rs

+                    # are zero:
+                    xor     $0, $1
+                    # Store in $0 the value of the 21st bit
+                    shr     $0, 21


I know this was in @stevecheckoway's version too, but I wonder why keep shifting if you're going to check for non-zero anyway?

Honestly, the real reason that I left the shift is because I didn't want to read this table in sufficient detail to determine if there were any important interactions between the popfd and pushfd.

There's actually a good reason to do the shift (or use result & 0x20_0000 != 0). Of course, it's possible that an interrupt could occur between the popfd and the pushfd and the OS would be free to change flags like the AC flag or (even less likely) the IOPL bits. But a much better reason to do it is the trap flag can be set by attaching a debugger, breaking on the pushfd and single stepping it.

Here's an example program demonstrating this:

#![feature(asm)] fn main() { let eflags1: u32; let eflags2: u32; unsafe { asm!(r#" pushfd mov $0, dword ptr [esp] popfd pushfd pop $1 "# : "=r"(eflags1), "=r"(eflags2) : : "cc", "memory" : "intel"); } println!("{:x}", eflags1); println!("{:x}", eflags2); }

Running it outside the debugger gives

$ rustc -g --target i686-apple-darwin bad.rs $ ./bad 286 286 $ ./bad 282 282

so you can see that the parity flag is changing between runs for some reason, but the two attempts reading the flags are consistent. Now let's do it with a debugger (edited to remove the unhelpful output)

$ otool -tv bad|grep -A1 popfl 0000279f popfl 000027a0 pushfl $ rust-lldb bad [snip] (lldb) b &bad::main Breakpoint 1: where = bad`bad::main::hc90b428e8cbd006a at bad.rs:3, address = 0x00002780 (lldb) r Process 3423 launched: '/Users/steve/bad' (i386) Process 3423 stopped [snip] Target 0: (bad) stopped. (lldb) br set -a 0x27a0 Breakpoint 2: where = bad`bad::main::hc90b428e8cbd006a + 32 at bad.rs:7, address = 0x000027a0 (lldb) c Process 3423 resuming Process 3423 stopped [snip] Target 0: (bad) stopped. (lldb) si Process 3423 stopped [snip] Target 0: (bad) stopped. (lldb) c Process 3423 resuming 286 386 Process 3423 exited with status = 0 (0x00000000)

So as you can see, the other flags really can change. Now I don't have access to an old-enough Intel processor that cpuid isn't supported to demonstrate the issue there. So either the shift or a mask is needed to select the appropriate bit.

Great point. If the other bits could change, it would surely be safer to do the bitwise and?

Actually, I'm really tempted to be minimally smart here. It seems best to just stick to the exact sequence of instructions Intel suggests in the document linked in the comment, they gotta know best, right?

I don’t think there’s any danger of Intel adding new flags to eflags and all of bits 22 through 31 (63 in rflags) are reserved and have value zero.

That said, not doing a shift but instead masking result and testing for nonzero in Rust code and not in assembly should compile to a test r32, imm32 and setne r8 rather than shr r32, imm8 test r32, r32 and setne r8. I can’t imagine that’s ever going to be important though.

Using Intel’s code is fine too. It’s just doing some unnecessary work

Looking at the 32-bit toolchains supported by rustup, only one potentially doesn't have the cpuid instruction:

i386-apple-ios i586-pc-windows-msvc i586-unknown-linux-gnu i586-unknown-linux-musl i686-apple-darwin (installed) i686-linux-android i686-pc-windows-gnu i686-pc-windows-msvc i686-unknown-freebsd i686-unknown-linux-gnu i686-unknown-linux-musl

I think i386-apple-ios is the iOS simulator. I don't know if it actually supports cpuid. All of the rest of them are targeting at least a Pentium which supports cpuid. Unless there are plans to support early 486s, this could just return true.

LLVM supports more targets than the ones rustc supports and these can be targeted using xargo which is being integrated into cargo.

I think it is a good idea to add optimizations of this feature for targets that are known to have the cpuid function. The easiest one might be to return true on x86 if the target supports mmx. That covers all those targets except i386-apple-ios.

EDIT: I've filled #497 , PRs welcome ;)

gnzlbg · 2018-06-25T08:17:41Z

So thanks you all for the feedback! I was traveling this weekend and this should have been fixed better sooner but hopefully now its good enough. Should be picked up by the next stdsimd update in rust-lang/rust : rust-lang/rust#51695

gnzlbg added 2 commits June 23, 2018 13:07

Fixes has_cpuid implementation

7f12738

rust-lang/rust#51691

use inline assembly instead of the __{read,write}eflags intrinsics

a374f60

gnzlbg changed the title ~~Temporary "fix" for has_cpuid implementation~~ Fix has_cpuid implementation Jun 23, 2018

gnzlbg mentioned this pull request Jun 23, 2018

std::arch::x86::has_cpuid() alternates between true and false when CPUID is supported rust-lang/rust#51691

Closed

stevecheckoway reviewed Jun 23, 2018

View reviewed changes

fix has_cpuid_test

f87c949

stevecheckoway suggested changes Jun 23, 2018

View reviewed changes

fix test and add comments

5760ad3

hanna-kruppe reviewed Jun 23, 2018

View reviewed changes

no need to save/restore ecx in the inline assembly because its alread…

b12aef6

…y clobbered

est31 mentioned this pull request Jun 23, 2018

Update stdsimd rust-lang/rust#51695

Merged

gnzlbg commented Jun 23, 2018

View reviewed changes

fix the shift

277a49a

Use Stephen Checkoway solution

3605ad3

hanna-kruppe reviewed Jun 24, 2018

View reviewed changes

gnzlbg added 4 commits June 24, 2018 11:57

remove unnecessary shift

c70093b

remove unnecesary unsafe block

6ee5955

use and to test the 21st bit

9f9427e

hoist the and out of the inline assembly

1831237

gnzlbg merged commit fd3a3f2 into rust-lang:master Jun 25, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix has_cpuid implementation #492

Fix has_cpuid implementation #492

gnzlbg commented Jun 23, 2018 •

edited

Loading

gnzlbg commented Jun 23, 2018

gnzlbg commented Jun 23, 2018

stevecheckoway Jun 23, 2018 •

edited

Loading

gnzlbg Jun 23, 2018

gnzlbg Jun 23, 2018 •

edited

Loading

stevecheckoway Jun 23, 2018

hanna-kruppe commented Jun 23, 2018

gnzlbg commented Jun 23, 2018

stevecheckoway left a comment

gnzlbg commented Jun 23, 2018

hanna-kruppe Jun 23, 2018

gnzlbg Jun 23, 2018

stevecheckoway commented Jun 23, 2018 •

edited

Loading

gnzlbg commented Jun 24, 2018

hanna-kruppe Jun 24, 2018

hanna-kruppe Jun 24, 2018

stevecheckoway Jun 24, 2018

hanna-kruppe Jun 24, 2018

hanna-kruppe Jun 24, 2018

stevecheckoway Jun 24, 2018

stevecheckoway Jun 24, 2018 •

edited

Loading

gnzlbg Jun 24, 2018 •

edited

Loading

gnzlbg commented Jun 25, 2018

Fix has_cpuid implementation #492

Fix has_cpuid implementation #492

Conversation

gnzlbg commented Jun 23, 2018 • edited Loading

gnzlbg commented Jun 23, 2018

gnzlbg commented Jun 23, 2018

stevecheckoway Jun 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gnzlbg Jun 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hanna-kruppe commented Jun 23, 2018

gnzlbg commented Jun 23, 2018

stevecheckoway left a comment

Choose a reason for hiding this comment

gnzlbg commented Jun 23, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stevecheckoway commented Jun 23, 2018 • edited Loading

gnzlbg commented Jun 24, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stevecheckoway Jun 24, 2018 • edited Loading

Choose a reason for hiding this comment

gnzlbg Jun 24, 2018 • edited Loading

Choose a reason for hiding this comment

gnzlbg commented Jun 25, 2018

gnzlbg commented Jun 23, 2018 •

edited

Loading

stevecheckoway Jun 23, 2018 •

edited

Loading

gnzlbg Jun 23, 2018 •

edited

Loading

stevecheckoway commented Jun 23, 2018 •

edited

Loading

stevecheckoway Jun 24, 2018 •

edited

Loading

gnzlbg Jun 24, 2018 •

edited

Loading