Add bfloat16 based dot and conversion with single/double #2796

Merged

Conversation

Guobing-Chen
Contributor

  1. Added a bfloat16-based dot product as a new API: shdot
  2. Implemented a generic kernel and a Cooperlake-specific (AVX512-BF16) kernel for shdot
  3. Added 4 conversion APIs between the bfloat16 data type and single/double: shstobf16, shdtobf16, sbf16tos, dbf16tod
    shstobf16 -- convert a single-precision float array to a bfloat16 array
    shdtobf16 -- convert a double-precision float array to a bfloat16 array
    sbf16tos -- convert a bfloat16 array to a single-precision float array
    dbf16tod -- convert a bfloat16 array to a double-precision float array
  4. Implemented generic kernels for all 4 conversion APIs, and Cooperlake-specific kernels for shstobf16 and shdtobf16
  5. Updated the level-1 threading helper functions and macros to support multi-threading for these new APIs
  6. Fixed the Cooperlake platform detection/specification issue in dynamic-arch builds
  7. Changed the typedef of bfloat16 from unsigned short to the stricter uint16_t
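
For readers unfamiliar with the format, the conversions listed in item 3 can be sketched in plain C as follows. This is only an illustration and not the kernel code from this PR: the names `bf16_t`, `f32_to_bf16`, `bf16_to_f32` and `f32_array_to_bf16` are hypothetical, and the round-to-nearest-even behaviour is an assumption about what a conversion would typically do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical alias; the PR only states that bfloat16 is a uint16_t typedef. */
typedef uint16_t bf16_t;

/* Narrow one float to bfloat16 by keeping the upper 16 bits of its
 * IEEE 754 binary32 encoding. Round-to-nearest-even is an assumption. */
static bf16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* bit-exact reinterpretation */

    if ((bits & 0x7fffffffu) > 0x7f800000u) {  /* NaN: keep it a quiet NaN */
        return (bf16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7fff plus the lowest kept bit so that ties round to even. */
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_t)(bits >> 16);
}

/* Widen one bfloat16 to float; this direction is always exact. */
static float bf16_to_f32(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Array form of the narrowing conversion, unit stride only. */
static void f32_array_to_bf16(size_t n, const float *in, bf16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = f32_to_bf16(in[i]);
    }
}
```

Widening is lossless because every bfloat16 value is also a valid single-precision value; only the narrowing direction has to choose a rounding rule.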

@Guobing-Chen
Contributor Author

Guobing-Chen commented Aug 27, 2020

Some more information about the 4 conversion APIs in this PR:
OpenBLAS needs these conversion APIs to help users make better use of OpenBLAS's bfloat16 BLAS APIs. There are no libraries in the community that provide conversion between bfloat16 and single/double. We can't just leave the implementation to our users; instead, we can provide a professional implementation with the best performance.
Another reason we need these conversion APIs is that on platforms without bfloat16 hardware support, the fallback generic kernels use them to convert back to the single data type and then compute with the s-based BLAS APIs.
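
The fallback path described above could look roughly like the sketch below: each bfloat16 element is widened to single precision and the product is accumulated in a float. The names `widen_bf16` and `bf16_dot_fallback` are hypothetical, the sketch assumes non-negative strides, and it is not the generic kernel added by this PR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t bf16_t;   /* hypothetical alias for the uint16_t typedef */

/* Widen bfloat16 to float by placing its 16 bits in the upper half
 * of a binary32 encoding. */
static float widen_bf16(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Dot product of two strided bfloat16 vectors, accumulated in single
 * precision. Assumes incx >= 0 and incy >= 0 (an illustration-only limit). */
static float bf16_dot_fallback(size_t n,
                               const bf16_t *x, ptrdiff_t incx,
                               const bf16_t *y, ptrdiff_t incy)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float xi = widen_bf16(x[(ptrdiff_t)i * incx]);
        float yi = widen_bf16(y[(ptrdiff_t)i * incy]);
        acc += xi * yi;
    }
    return acc;
}
```

Accumulating in single precision keeps the reduction no less accurate than the inputs, which is also what the AVX512-BF16 dot-product instruction does in hardware.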


Signed-off-by: Chen, Guobing <[email protected]>
@Guobing-Chen force-pushed the BF16_dot_coversion_apis branch from 3e7c5e5 to deaeb6c on September 4, 2020
@conradsnicta

conradsnicta commented Sep 7, 2020

I'm not sure it's a good idea to name the "bfloat16 based dot" shdot. Perhaps bf16dot would be more appropriate.

The sh prefix in shdot indicates (or strongly suggests) "half single precision". The bfloat16 type is certainly not "half single precision". Instead, it's a mutated version of fp32.

The proper "half single precision" type is already defined in the IEEE 754-2008 standard: https://en.wikipedia.org/wiki/Half-precision_floating-point_format. This is not the same as bfloat16.
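
To make the difference concrete: bfloat16 uses 1 sign bit, 8 exponent bits and 7 mantissa bits, while IEEE 754 binary16 uses 1 sign bit, 5 exponent bits and 10 mantissa bits. The sketch below splits the same 16-bit pattern both ways; the type and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Raw fields of a 16-bit floating-point encoding. */
typedef struct {
    unsigned sign;
    unsigned exponent;   /* biased exponent */
    unsigned mantissa;   /* stored fraction bits */
} fp16_fields;

/* bfloat16 layout: 1 sign bit, 8 exponent bits (bias 127), 7 mantissa bits. */
static fp16_fields split_bfloat16(uint16_t v)
{
    fp16_fields f = { (unsigned)v >> 15, ((unsigned)v >> 7) & 0xffu, v & 0x7fu };
    return f;
}

/* IEEE 754 binary16 layout: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits. */
static fp16_fields split_binary16(uint16_t v)
{
    fp16_fields f = { (unsigned)v >> 15, ((unsigned)v >> 10) & 0x1fu, v & 0x3ffu };
    return f;
}
```

For example, the pattern 0x3f80 is exactly 1.0 as bfloat16 (biased exponent 127, mantissa 0), but read as binary16 it has biased exponent 15 and mantissa 896, which is 1.875. The two types cannot be told apart from the bits alone.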

Related discussion: #2767. Extract from comment by @mgates3: "... we're proposing to move away from single letter precisions, so likely will never standardize on h for half" and "... differentiating IEEE half and bfloat16 half with one letter is confusing".

@martin-frbg
Collaborator

If you are quoting from #2767, you will have seen how the SH.. names came to be. Please comment there if you are not satisfied with Edelsohn's dictum.

@martin-frbg
Collaborator

Holding back on this one as it appears to have caused some collateral damage (SIGSEGV in regular single precision) on ppc.

@Guobing-Chen
Contributor Author

> Holding back on this one as it appears to have caused some collateral damage (SIGSEGV in regular single precision) on ppc.

Anything I can do to help? Unfortunately, I don't have a PPC machine to reproduce it on.

@martin-frbg
Collaborator

Already trying to track it down in the unicamp OpenPOWER minicloud, just wanted to let you know that it has nothing to do with the ongoing discussion about names.

@Guobing-Chen
Contributor Author

Well received, thanks much for the information.

@martin-frbg
Collaborator

Seems to have been unrelated breakage in my build environment, sorry for the delay.

@martin-frbg martin-frbg added this to the 0.3.11 milestone Sep 14, 2020
@martin-frbg martin-frbg merged commit 91c84e1 into OpenMathLib:develop Sep 14, 2020