-
Notifications
You must be signed in to change notification settings - Fork 287
Implement avx512bf16 intrinsics #998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information. |
"vcvtneps2bf16 {1} {{k1}}, {0}", | ||
in(xmm_reg) a, | ||
inout(xmm_reg) src => result, | ||
in("edi") mask, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to use at&t syntax for now. Some older LLVM versions that are still supported by rustc don't correctly support intel syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @bjorn3 for your suggestion. I have updated it.
I tested and debugged my code with launching the command "TARGET=x86_64-unknown-linux-gnu ci/run.sh" on the cooperlake. Are any other tests required? |
in(xmm_reg) a, | ||
lateout(xmm_reg) result, | ||
options(att_syntax) | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this implemented using asm!
instead of LLVM intrinsics?
You should fix the CI errors.
|
: "={xmm0}"(result) | ||
: "{xmm0}"(a) | ||
: "xmm0" | ||
:); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use llvm_asm!
, it is going to be deprecated soon.
Instead just wrap the asm!
in a cfg_if
and provide a dummy unreachable!()
implementation on other platforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. I have updated the code following your suggestions.
crates/core_arch/Cargo.toml
Outdated
@@ -25,3 +25,6 @@ maintenance = { status = "experimental" } | |||
[dev-dependencies] | |||
stdarch-test = { version = "0.*", path = "../stdarch-test" } | |||
std_detect = { version = "0.*", path = "../std_detect" } | |||
|
|||
[dependencies] | |||
cfg-if = "0.1.10" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core_arch can't have any dependencies as it is included in libcore, which everyone else depends on, including cfg-if.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Then is there any way to use cfg_if?
cfg_if! { | ||
if #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] { | ||
let mut result:__m128bh; | ||
asm!( | ||
"vcvtneps2bf16 {0}, {1}", | ||
in(xmm_reg) a, | ||
lateout(xmm_reg) result, | ||
options(att_syntax) | ||
); | ||
result | ||
} | ||
else { | ||
unreachable!() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cfg_if! { | |
if #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] { | |
let mut result:__m128bh; | |
asm!( | |
"vcvtneps2bf16 {0}, {1}", | |
in(xmm_reg) a, | |
lateout(xmm_reg) result, | |
options(att_syntax) | |
); | |
result | |
} | |
else { | |
unreachable!() | |
} | |
} | |
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))] | |
{ | |
let mut result:__m128bh; | |
asm!( | |
"vcvtneps2bf16 {0}, {1}", | |
in(xmm_reg) a, | |
lateout(xmm_reg) result, | |
options(att_syntax) | |
); | |
result | |
} | |
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))] | |
{ | |
unreachable!() | |
} |
This should work instead of cfg-if.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much. It works. :)
The CI tests for i686-unknown-linux-gnu are still failed. It always failed with the error "register class |
We really should be using LLVM intrinsics for this instead of inline asm. From what I understand, the main issue is that we can't represent a |
|
I would recommend asking the compiler team on Zulip. I think this would require defining a wrapper struct like this: #[repr(simd_mask(4))]
struct i1x4(u8); Then you would need to adjust the calling convention code to handle this type. But I'm sure the compiler team will have a clearer idea of what would be required. |
Thanks for all of your advices. You are right and we should call LLVM intrinsics to avoid the issues. Actually I asked the question in rust github several days ago. But they didn't give me the final answer. I will try to ask it on Zulip. |
|
OK. I will also mention this to them. |
thread 'core_arch::x86::avx512bf16::assert__mm256_mask_dpbf16_ps_vdpbf16ps' panicked at 'failed to find instruction When rustc tried to generate your code, it did not use vcvtne2ps2bf16 instruction. |
Yes, I tried it on my test machine. They are generated correctly. It seems that vcvtneps2bf16 is not supported by the compiler in the CI environment. |
You can remove i586 in .github/workflows/main.yml temporary for test.
|
Hi
Hi Amanieu, does bf16 related feature supported by the rust compiler in CI environment? The CI tests failed because failed to find instruction vdpbf16ps in the disassembly', crates/stdarch-test/src/lib.rs:148:9 |
The |
Maybe try upgrading the |
How to upgrade it? Do I have the permission to do it? |
Just modify the |
If the MSVC targets still give you trouble then you can disable the test for those targets in the source code by modifying the |
@Amanieu Thank you very much for your help. If we want to run those bf16 tests in CI, another emulator needs to be set up. From what I learned, only cooper lake supports avx512bf16. If we start the emulator through "sde64 -cpx" to simulate cooper lake, then the other tests such as vnni will fail. So I suggest to start a separate emulator for bf16 tests only if CI doesn't allow to skip any tests. |
I thought cooper lake did support vnni? In any case, I'm happy to merge this if you don't think you can get the emulator working. |
Sorry for my typo, cooper lake doesn't support vbmi,vbmi2, so those tests will fail. Could you please help to merge this PR? I will add a new emulator target for bf16 tests later. |
Implement all avx512bf16 intrinsic APIs.