-
Notifications
You must be signed in to change notification settings - Fork 287
Fix has_cpuid implementation #492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
has_cpuid
implementation
cc @petrochenkov @japaric @parched @rkruppe please review I am not 100% sure of the inline assembly, and it is really hard to find useful documentation about it for rust. |
I also would like to add one comment to each line of inline assembly, but it seems that this is not supported. |
coresimd/x86/cpuid.rs
Outdated
pop %eax | ||
xor %eax, %ecx | ||
shrd %eax, 21 | ||
popfd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either this line (along with the corresponding pushfd
) is going to break test_has_cpuid
which is testing that eflags has changed or, more likely, that test was already totally broken and should just be deleted. In fact, since the comparison is going to potentially modify eflags
, I don't think that test makes any sense at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_has_cpuid
just checks that the function returns true
, there is a new test that checks that has_cpuid
is indempotent. Maybe both tests should also check that the EFLAGS register wasn't modified, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I think I didn't fully gotten what you meant on the first read. I just changed the test to verify that calling has_cpuid
does not modify the EFLAGS
since I think it should be idempotent.
That is, it should try to modify the eflags, see if they were modified, and then revert the modification before returning the result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn test_has_cpuid() {
unsafe {
let before = __readeflags();
if cpuid::has_cpuid() {
assert!(before != __readeflags());
} else {
assert!(before == __readeflags());
}
}
}
is testing that eflags
changes if cpuid
is supported and doesn't change if it's not. But the second and third __readeflags()
happen after performing a comparison and a conditional jump. The comparison (probably a test
instruction in this case) is going to set/reset bits in eflags
which makes the comparison with before
meaningless.
It's supported, but you have to use assembly comment syntax, which is |
@rkruppe thanks, I haven't used Rust inline assembly in a long time, and I just wasn't able to find any good documentation about this anywhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. There's no guarantee what the state of eflags
is going to be at any point as far as I know.
For one thing, the comparison at the end of has_cpuid
and certainly the debug_assert!
in debug mode is going to set bits in eflags
.
I think you should delete test_has_cpuid
altogether. test_has_cpuid_idempotent
is fine as you initially wrote it.
So I've added some comments to the assembly, and I've fixed the test. I'll still wait for @petrochenkov 's feedback before merging this. Let me know if the comments are unclear or you find any other issues. |
coresimd/x86/cpuid.rs
Outdated
// If it is, then `cpuid` is available. | ||
let eax: i32; | ||
asm!(r#" | ||
# Save ecx (__fastcall needs it preserved) and save a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect either saving ecx in the asm or declaring it as clobbered, not both. If it's in the list of clobbered registers, the backend is responsible for saving and restoring ecx if it contains a live value at the locations. Since that's simpler and less error prone than manual saving and restoring (and can even avoid the save-restore dance altogether in many cases), I'd prefer that.
coresimd/x86/cpuid.rs
Outdated
# So if the value of the 21st bit is 1, cpuid is available, | ||
# and if it is zero, it isn't because we didn't manage to | ||
# modify it: | ||
shrd %eax, 21 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the problem is that this is the instruction for doubles, it should be shr, or shrl, but none of these appear to work.
Here's an alternative implementation that's slightly shorter and doesn't explicitly name the registers leaving llvm free to choose them. let result: u32;
let _temp: u32;
unsafe {
asm!(r#"
pushfd
pop $0
mov $1, $0
xor $0, 0x200000
push $0
popfd
pushfd
pop $0
xor $0, $1
shr $0, 21
"#
: "=r"(result), "=r"(_temp)
:
: "cc", "memory"
: "intel");
}
result != 0 Testing for not 0 leads to very slightly better code gen than testing for 1. I don't think this needs to be volatile since it only needs to be emitted if the result is actually used. |
@stevecheckoway I've pushed your solution in the last commit, mind reviewing? |
coresimd/x86/cpuid.rs
Outdated
@@ -78,34 +78,53 @@ pub unsafe fn __cpuid(leaf: u32) -> CpuidResult { | |||
} | |||
|
|||
/// Does the host support the `cpuid` instruction? | |||
#[inline] | |||
#[inline(never)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
coresimd/x86/cpuid.rs
Outdated
# are zero: | ||
xor $0, $1 | ||
# Store in $0 the value of the 21st bit | ||
shr $0, 21 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this was in @stevecheckoway's version too, but I wonder why keep shifting if you're going to check for non-zero anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, the real reason that I left the shift is because I didn't want to read this table in sufficient detail to determine if there were any important interactions between the popfd
and pushfd
.
There's actually a good reason to do the shift (or use result & 0x20_0000 != 0
). Of course, it's possible that an interrupt could occur between the popfd
and the pushfd
and the OS would be free to change flags like the AC flag or (even less likely) the IOPL bits. But a much better reason to do it is the trap flag can be set by attaching a debugger, breaking on the pushfd
and single stepping it.
Here's an example program demonstrating this:
#![feature(asm)]
fn main() {
let eflags1: u32;
let eflags2: u32;
unsafe {
asm!(r#"
pushfd
mov $0, dword ptr [esp]
popfd
pushfd
pop $1
"#
: "=r"(eflags1), "=r"(eflags2)
:
: "cc", "memory"
: "intel");
}
println!("{:x}", eflags1);
println!("{:x}", eflags2);
}
Running it outside the debugger gives
$ rustc -g --target i686-apple-darwin bad.rs
$ ./bad
286
286
$ ./bad
282
282
so you can see that the parity flag is changing between runs for some reason, but the two attempts reading the flags are consistent. Now let's do it with a debugger (edited to remove the unhelpful output)
$ otool -tv bad|grep -A1 popfl
0000279f popfl
000027a0 pushfl
$ rust-lldb bad
[snip]
(lldb) b &bad::main
Breakpoint 1: where = bad`bad::main::hc90b428e8cbd006a at bad.rs:3, address = 0x00002780
(lldb) r
Process 3423 launched: '/Users/steve/bad' (i386)
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) br set -a 0x27a0
Breakpoint 2: where = bad`bad::main::hc90b428e8cbd006a + 32 at bad.rs:7, address = 0x000027a0
(lldb) c
Process 3423 resuming
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) si
Process 3423 stopped
[snip]
Target 0: (bad) stopped.
(lldb) c
Process 3423 resuming
286
386
Process 3423 exited with status = 0 (0x00000000)
So as you can see, the other flags really can change. Now I don't have access to an old-enough Intel processor that cpuid
isn't supported to demonstrate the issue there. So either the shift or a mask is needed to select the appropriate bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point. If the other bits could change, it would surely be safer to do the bitwise and?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm really tempted to be minimally smart here. It seems best to just stick to the exact sequence of instructions Intel suggests in the document linked in the comment, they gotta know best, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think there’s any danger of Intel adding new flags to eflags and all of bits 22 through 31 (63 in rflags) are reserved and have value zero.
That said, not doing a shift but instead masking result and testing for nonzero in Rust code and not in assembly should compile to a test r32, imm32
and setne r8
rather than shr r32, imm8
test r32, r32
and setne r8
. I can’t imagine that’s ever going to be important though.
Using Intel’s code is fine too. It’s just doing some unnecessary work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the 32-bit toolchains supported by rustup, only one potentially doesn't have the cpuid
instruction:
i386-apple-ios
i586-pc-windows-msvc
i586-unknown-linux-gnu
i586-unknown-linux-musl
i686-apple-darwin (installed)
i686-linux-android
i686-pc-windows-gnu
i686-pc-windows-msvc
i686-unknown-freebsd
i686-unknown-linux-gnu
i686-unknown-linux-musl
I think i386-apple-ios
is the iOS simulator. I don't know if it actually supports cpuid
. All of the rest of them are targeting at least a Pentium which supports cpuid
. Unless there are plans to support early 486s, this could just return true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LLVM supports more targets than the ones rustc supports and these can be targeted using xargo
which is being integrated into cargo
.
I think it is a good idea to add optimizations of this feature for targets that are known to have the cpuid function. The easiest one might be to return true
on x86
if the target supports mmx
. That covers all those targets except i386-apple-ios
.
EDIT: I've filled #497 , PRs welcome ;)
So thanks you all for the feedback! I was traveling this weekend and this should have been fixed better sooner but hopefully now its good enough. Should be picked up by the next |
Closes rust-lang/rust#51691