Enable AVX512 kernels when compiling with nvc (NVIDIA HPC C compiler) #4162
Comments
Thanks. Based on an older OpenMPI ticket I found, this might be from release 22.2 onwards. Unfortunately the official release notes are too brief to allow me to verify this easily. |
22.3 it is, 22.2 still stumbled. Will apply the updated patch later today. |
Terrific! Thank you! FYI - I have found bugs in our current release when compiling the CASUM and ZASUM kernels for SKYLAKEX. I have an open bug report on this internally. As soon as I can root cause the issue, I will try to get our developers to fix it for the next release. These will show up as wrong answers in the test suite when run on a system that supports AVX512. For our internal testing, I have temporarily disabled these kernels in the KERNEL.SKYLAKEX file. |
Oops - are they related to the microkernels that your patch enables? (As an aside, I notice the casum one is not modified by it.) And can they be worked around by reducing the optimization level with a suitable pragma? Certainly do not want to produce wrong answers here, whether in the testsuite or in user code... |
I'm not sure yet. Unfortunately, I found this a few weeks ago, but I have been busy with other priorities and have not gotten back to looking at it again until this week. I'll run some experiments and let you know what I find. Don't change anything on your end just yet until I have a better handle on this. |
Here is the symptom: Test of subprogram number 7 CBLAS_DZASUM CASE N INCX INCY MODE I COMP(I) TRUE(I) DIFFERENCE SIZE(I)
|
Just tried lowering the opt level to -O1 for these two kernels - did not help. I'll dive in and see what's going on here. |
Building everything with |
@martin-frbg - thanks for the feedback on the docs, I'll pass that along to our docs person. I've traced the problem to our optimizer miscompiling the casum_microk_skylake-2.c and zasum_microk_skylake-2.c files. We are not loading the correct values into the XMM registers in the case where n = 2. (And probably the others, but that is the case I'm focused on at the moment.) I'm writing up my findings in my internal bug report on this issue. I'm hoping we can get this fixed for the 23.9 release, which is due out in September. But I'll let you know once I confirm that. |
Hmm, I'm currently playing with this again, and just disabling the casum microkernel did not appear to fix the problem for me. I'll see if I forgot to revert some of the previous changes.. |
So the issue appears to be that nvc actually defines |
Oh interesting. Wondering if we recently removed this in our development branch? I'll have to do some further investigation. Sorry for the trouble. |
So I think I understand what's going on here now. nvc defines I have a simple program that dumps out the definition of
On an Ubuntu 18.04 system, gcc-7.4.0 is the system default. So this program dumps
However, our build system is CentOS 7, for maximum backward-compatibility. gcc-4.8.5 is the system default GCC on this distribution:
However, even if I bring a newer GCC into the $PATH, nvc still refers to the system default version:
This explains why the kernels were not enabled on our build system. So I'm thinking testing on |
Sorry for the bad formatting above - can't figure out how to make github make it look right here. |
You can also use |
Thanks - edited the above comment to make it more readable. |
Will have to check if e.g. the Intel compiler defines |
so current Intel defines |
Interestingly, a similar miscompilation of the casum/zasum microkernels appears to be present in the current release candidate of LLVM17 as well. |
It seems NVHPC 23.09 still has this issue, the lapack testcase has a huge number of failures unless I replace 2309 by e.g. 2311 or some other higher number. |
Hmm that's bad. 23.11 just came out, I'll test ASAP |
I just tested 23.11 and it still has the exact same issue.
|
Me too, but it's dinnertime so I'm not allowed to push a PR right away :) |
In the end I think the compiler (and icx as well, also LLVM-based) simply optimized out the code since it was dodgy: |
I have a preliminary patch to a bunch of the SKYLAKEX (AVX512) kernels that enables them to be compiled with nvc, the NVIDIA HPC C compiler. Without this patch, OpenBLAS runs at about half the speed on AVX512 systems when compiled with nvc compared with gcc or clang.
Unfortunately, I do not know offhand the exact release of nvc in which the AVX512 intrinsics were enabled, so if you want to further refine this patch to guard a specific nvc version or later, I'll have to do some more investigation. This support has been in our compilers for a while, at least a year or two.
I can follow up with further info if needed.
nvc.patch