Skip to content

Implement avx512bf16 intrinsics #998

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Feb 10, 2021
Merged

Conversation

kangshan1157
Copy link
Contributor

Implement all avx512bf16 intrinsic APIs.

@rust-highfive
Copy link

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

"vcvtneps2bf16 {1} {{k1}}, {0}",
in(xmm_reg) a,
inout(xmm_reg) src => result,
in("edi") mask,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use at&t syntax for now. Some older LLVM versions that are still supported by rustc don't correctly support intel syntax.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bjorn3 for your suggestion. I have updated it.

@kangshan1157
Copy link
Contributor Author

I tested and debugged my code with launching the command "TARGET=x86_64-unknown-linux-gnu ci/run.sh" on the cooperlake. Are any other tests required?

in(xmm_reg) a,
lateout(xmm_reg) result,
options(att_syntax)
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this implemented using asm! instead of LLVM intrinsics?

@Amanieu
Copy link
Member

Amanieu commented Feb 9, 2021

You should fix the CI errors.

  • The asm! is causing the doc build to fail, but you can fix that by #[cfg]ing it out for non-x86.
  • The verifier doesn't support the newly added types.

: "={xmm0}"(result)
: "{xmm0}"(a)
: "xmm0"
:);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use llvm_asm!, it is going to be deprecated soon.

Instead just wrap the asm! in a cfg_if and provide a dummy unreachable!() implementation on other platforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. I have updated the code following your suggestions.

@@ -25,3 +25,6 @@ maintenance = { status = "experimental" }
[dev-dependencies]
stdarch-test = { version = "0.*", path = "../stdarch-test" }
std_detect = { version = "0.*", path = "../std_detect" }

[dependencies]
cfg-if = "0.1.10"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

core_arch can't have any dependencies as it is included in libcore, which everyone else depends on, including cfg-if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then is there any way to use cfg_if?

Comment on lines 153 to 167
cfg_if! {
if #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
let mut result:__m128bh;
asm!(
"vcvtneps2bf16 {0}, {1}",
in(xmm_reg) a,
lateout(xmm_reg) result,
options(att_syntax)
);
result
}
else {
unreachable!()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cfg_if! {
if #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] {
let mut result:__m128bh;
asm!(
"vcvtneps2bf16 {0}, {1}",
in(xmm_reg) a,
lateout(xmm_reg) result,
options(att_syntax)
);
result
}
else {
unreachable!()
}
}
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
{
let mut result:__m128bh;
asm!(
"vcvtneps2bf16 {0}, {1}",
in(xmm_reg) a,
lateout(xmm_reg) result,
options(att_syntax)
);
result
}
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
{
unreachable!()
}

This should work instead of cfg-if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much. It works. :)

@kangshan1157
Copy link
Contributor Author

kangshan1157 commented Feb 9, 2021

The CI tests for i686-unknown-linux-gnu are still failed. It always failed with the error "register class xmm_reg requires the sse target feature". I added "target_feature(enable = "sse"), but it didn't work.
It seems that asm! doesn't support well on that target. Do I have to use "llvm_asm!" for x86 target?

@Amanieu
Copy link
Member

Amanieu commented Feb 9, 2021

xmm_reg can't be used on i586 because it doesn't support SSE (enabling it with #[target_feature] doesn't help, it needs to be globally enabled).

We really should be using LLVM intrinsics for this instead of inline asm. From what I understand, the main issue is that we can't represent a i1x4 with Rust code. Support for this needs to be added to rustc.

@minybot
Copy link
Contributor

minybot commented Feb 9, 2021

We really should be using LLVM intrinsics for this instead of inline asm. From what I understand, the main issue is that we can't represent a i1x4 with Rust code. Support for this needs to be added to rustc.
Btw, how difficult is it to add i1 to rustc?

@Amanieu
Copy link
Member

Amanieu commented Feb 9, 2021

I would recommend asking the compiler team on Zulip.

I think this would require defining a wrapper struct like this:

#[repr(simd_mask(4))]
struct i1x4(u8);

Then you would need to adjust the calling convention code to handle this type.

But I'm sure the compiler team will have a clearer idea of what would be required.

@kangshan1157
Copy link
Contributor Author

Thanks for all of your advices. You are right and we should call LLVM intrinsics to avoid the issues. Actually I asked the question in rust github several days ago. But they didn't give me the final answer. I will try to ask it on Zulip.
Moreover I will remove those asm implemented functions in this PR and recover them once i1x4 is supported by rustc.

@minybot
Copy link
Contributor

minybot commented Feb 10, 2021

Thanks for all of your advices. You are right and we should call LLVM intrinsics to avoid the issues. Actually I asked the question in rust github several days ago. But they didn't give me the final answer. I will try to ask it on Zulip.
Moreover I will remove those asm implemented functions in this PR and recover them once i1x4 is supported by rustc.
In some of my case to implement avx512f_avx512vl. LLVM use i1x2. I think rustc need to support i1 to solve those issues.

@kangshan1157
Copy link
Contributor Author

Thanks for all of your advices. You are right and we should call LLVM intrinsics to avoid the issues. Actually I asked the question in rust github several days ago. But they didn't give me the final answer. I will try to ask it on Zulip.
Moreover I will remove those asm implemented functions in this PR and recover them once i1x4 is supported by rustc.
In some of my case to implement avx512f_avx512vl. LLVM use i1x2. I think rustc need to support i1 to solve those issues.

OK. I will also mention this to them.
Moreover CI tests for x86_64-unknown-linux-gnu and i586-unkonwn-linux-gnu failed because of the error "thread 'core_arch::x86::avx512bf16::assert__mm512_maskz_cvtneps_pbh_vcvtneps2bf16' panicked at 'failed to find instruction vcvtneps2bf16 in the disassembly', crates/stdarch-test/src/lib.rs:148:9". How can I resolve this? I tried them on my test machine and they work normally.

@minybot
Copy link
Contributor

minybot commented Feb 10, 2021

Moreover CI tests for x86_64-unknown-linux-gnu and i586-unkonwn-linux-gnu failed because of the error "thread 'core_arch::x86::avx512bf16::assert__mm512_maskz_cvtneps_pbh_vcvtneps2bf16' panicked at 'failed to find instruction vcvtneps2bf16 in the disassembly', crates/stdarch-test/src/lib.rs:148:9". How can I resolve this? I tried them on my test machine and they work normally.

thread 'core_arch::x86::avx512bf16::assert__mm256_mask_dpbf16_ps_vdpbf16ps' panicked at 'failed to find instruction vdpbf16ps in the disassembly', crates/stdarch-test/src/lib.rs:148:9

When rustc tried to generate your code, it did not use vcvtne2ps2bf16 instruction.
---- core_arch::x86::avx512bf16::assert__mm512_cvtne2ps_pbh_vcvtne2ps2bf16 stdout ----
disassembly for stdarch_test_shim__mm512_cvtne2ps_pbh_vcvtne2ps2bf16:
0: lea 0x1fdfe8(%rip),%rax # 2edcdf <anon.e2d3528a6cfe9ff3fa80755a098d527b.13.llvm.18001095552927935278+0x34>
1: lea 0x611312(%rip),%rcx # 701010 <_ZN12stdarch_test11_DONT_DEDUP17h0d83dd6e7a766ff3E>
2: mov %rax,(%rcx)
3: (bad)
4: rol $0xf,%ebx
5: (bad)
6: test %al,(%rax)
7: add %al,(%rax)
8:
Usually, it is relating to your code error.
Do you try to compile your code to assembly?

@kangshan1157
Copy link
Contributor Author

Moreover CI tests for x86_64-unknown-linux-gnu and i586-unkonwn-linux-gnu failed because of the error "thread 'core_arch::x86::avx512bf16::assert__mm512_maskz_cvtneps_pbh_vcvtneps2bf16' panicked at 'failed to find instruction vcvtneps2bf16 in the disassembly', crates/stdarch-test/src/lib.rs:148:9". How can I resolve this? I tried them on my test machine and they work normally.

thread 'core_arch::x86::avx512bf16::assert__mm256_mask_dpbf16_ps_vdpbf16ps' panicked at 'failed to find instruction vdpbf16ps in the disassembly', crates/stdarch-test/src/lib.rs:148:9

When rustc tried to generate your code, it did not use vcvtne2ps2bf16 instruction.
---- core_arch::x86::avx512bf16::assert__mm512_cvtne2ps_pbh_vcvtne2ps2bf16 stdout ----
disassembly for stdarch_test_shim__mm512_cvtne2ps_pbh_vcvtne2ps2bf16:
0: lea 0x1fdfe8(%rip),%rax # 2edcdf <anon.e2d3528a6cfe9ff3fa80755a098d527b.13.llvm.18001095552927935278+0x34>
1: lea 0x611312(%rip),%rcx # 701010 <_ZN12stdarch_test11_DONT_DEDUP17h0d83dd6e7a766ff3E>
2: mov %rax,(%rcx)
3: (bad)
4: rol $0xf,%ebx
5: (bad)
6: test %al,(%rax)
7: add %al,(%rax)
8:
Usually, it is relating to your code error.
Do you try to compile your code to assembly?

Yes, I tried it on my test machine. They are generated correctly. It seems that vcvtneps2bf16 is not supported by the compiler in the CI environment.

@minybot
Copy link
Contributor

minybot commented Feb 10, 2021

You can remove i586 in .github/workflows/main.yml temporary for test.

Yes, I tried it on my test machine. They are generated correctly. It seems that vcvtneps2bf16 is not supported by the compiler in the CI environment.

@kangshan1157
Copy link
Contributor Author

I would recommend asking the compiler team on Zulip.

I think this would require defining a wrapper struct like this:

#[repr(simd_mask(4))]
struct i1x4(u8);

Then you would need to adjust the calling convention code to handle this type.

But I'm sure the compiler team will have a clearer idea of what would be required.

Hi

I would recommend asking the compiler team on Zulip.

I think this would require defining a wrapper struct like this:

#[repr(simd_mask(4))]
struct i1x4(u8);

Then you would need to adjust the calling convention code to handle this type.

But I'm sure the compiler team will have a clearer idea of what would be required.

Hi Amanieu, does bf16 related feature supported by the rust compiler in CI environment? The CI tests failed because failed to find instruction vdpbf16ps in the disassembly', crates/stdarch-test/src/lib.rs:148:9

@Amanieu
Copy link
Member

Amanieu commented Feb 10, 2021

The (bad) in the disassembly is actually because the disassembler is too old and doesn't recognize the instruction. LLVM is still generating the correct code.

@Amanieu
Copy link
Member

Amanieu commented Feb 10, 2021

Maybe try upgrading the Dockerfile from ubuntu:18.04 to ubuntu:20.04.

@kangshan1157
Copy link
Contributor Author

Maybe try upgrading the Dockerfile from ubuntu:18.04 to ubuntu:20.04.

How to upgrade it? Do I have the permission to do it?

@Amanieu
Copy link
Member

Amanieu commented Feb 10, 2021

Just modify the Dockerfile in ci/docker for all platforms where CI fails.

@Amanieu
Copy link
Member

Amanieu commented Feb 10, 2021

If the MSVC targets still give you trouble then you can disable the test for those targets in the source code by modifying the cfg_attr to exclude target_env = "msvc".

@kangshan1157
Copy link
Contributor Author

kangshan1157 commented Feb 10, 2021

Just modify the Dockerfile in ci/docker for all platforms where CI fails.

@Amanieu Thank you very much for your help.
For x86_64-unknown-linux-gnu-emulated target, as it doesn't allow skipping tests, those avx512bf16 related tests will fail. So I force to skip those tests through adding "SKIP_TESTS" environment variable. Now all of the CI tests can pass.

If we want to run those bf16 tests in CI, another emulator needs to be set up. From what I learned, only cooper lake supports avx512bf16. If we start the emulator through "sde64 -cpx" to simulate cooper lake, then the other tests such as vnni will fail. So I suggest to start a separate emulator for bf16 tests only if CI doesn't allow to skip any tests.

@Amanieu
Copy link
Member

Amanieu commented Feb 10, 2021

If we start the emulator through "sde64 -cpx" to simulate cooper lake, then the other tests such as vnni will fail.

I thought cooper lake did support vnni?

In any case, I'm happy to merge this if you don't think you can get the emulator working.

@kangshan1157
Copy link
Contributor Author

If we start the emulator through "sde64 -cpx" to simulate cooper lake, then the other tests such as vnni will fail.

I thought cooper lake did support vnni?

In any case, I'm happy to merge this if you don't think you can get the emulator working.

Sorry for my typo, cooper lake doesn't support vbmi,vbmi2, so those tests will fail. Could you please help to merge this PR? I will add a new emulator target for bf16 tests later.

@Amanieu Amanieu merged commit a06cb4c into rust-lang:master Feb 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants