Add fp16 mul op support and bind paddle fp16 to numpy fp16 #9017
Conversation
paddle/fluid/operators/mul_op.cc
Outdated
both input tensors to float16 data types if needed and use the float16
compute kernel to generate the output tensor also in float16 data type.
This attribute is by default false and normally would only be set to
true in inference stage for performance optimization.
The Volta generation of GPUs introduces Tensor Cores, which provide 8x more throughput than single precision math pipelines. Each Tensor Core performs D = A x B + C, where A, B, C and D are matrices. A and B are half-precision 4x4 matrices, whereas D and C can be either half or single precision 4x4 matrices. In other words, Tensor Core math can accumulate half precision products into either single or half-precision outputs.
Read more at: http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html#ixzz59ge7vMHO
It seems that the output of Tensor Cores can be either fp16 or fp32.
For now, we only take fp16 inference into consideration, right?
If all the operators support an fp16 kernel and the input data is fp16, our framework will choose the fp16 kernel automatically.
I am not sure whether we need this use_float16 attribute. For now, we have this attribute for MulOp; do we need to add it to other operators, like SumOperator, as well?
Or we can add a cast operator first if the input data is fp32. After the input data is cast to fp16, the rest of the operators will choose the fp16 kernel automatically.
For now, we check the data types of an operator's inputs and enforce that they are the same. I am not sure what will happen if one input is fp16 and the other input is fp32.
It seems that the output of Tensor Cores can be either fp16 or fp32.
For now, we only take fp16 inference into consideration, right?
To use Tensor Cores to calculate C = A * B, where A and B are both fp16, we have three ways to get the output C:
- If C is fp16, then we can use cublasHgemm, and the compute type is fp16.
- If C is fp16, we can also use cublasGemmEx() with the compute type set to fp32, meaning that internally the computation is done in fp32.
- If C is fp32, we can only use cublasGemmEx(), with the compute type set to fp32.
Not sure which mode is the most computationally efficient one.
Right now, we only consider generating fp16 output, but introducing cublasGemmEx() and providing the option of fp16 gemm with fp32 output can be a future to-do item. The sketch below shows why the compute type matters numerically.
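As a rough illustration of why the compute type matters, here is a small NumPy sketch. It is not cuBLAS or Tensor Core code, just a scalar simulation of one output element, and the vector length and random values are made up:

```python
import numpy as np

# One output element C[i][j] = sum_k A[i][k] * B[k][j] with fp16 inputs,
# accumulated either in fp32 (cublasGemmEx-style compute type) or in fp16
# (cublasHgemm-style compute type). This is only a conceptual simulation;
# real Tensor Cores keep the individual products in higher precision.
rng = np.random.RandomState(0)
a = rng.rand(1024).astype(np.float16)
b = rng.rand(1024).astype(np.float16)

# Reference result in float64.
ref = np.dot(a.astype(np.float64), b.astype(np.float64))

# fp16 products accumulated in fp32, rounded to fp16 once at the end.
acc32 = np.float32(0.0)
for x, y in zip(a, b):
    acc32 += np.float32(x) * np.float32(y)
out_fp32_acc = np.float16(acc32)

# Everything in fp16: every partial sum is rounded back to fp16.
acc16 = np.float16(0.0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + x * y)
out_fp16_acc = acc16

print("fp32-accumulation error:", abs(float(out_fp32_acc) - ref))
print("fp16-accumulation error:", abs(float(out_fp16_acc) - ref))
```

In general, the fp16 accumulation drifts noticeably further from the reference, which is why keeping the compute type at fp32 is attractive even when the output is fp16.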
For now, we check the data types of an operator's inputs and enforce that they are the same. I am not sure what will happen if one input is fp16 and the other input is fp32.
Before the operator calls the compute kernel, it will compare the expected data type (via GetExpectedKernelType()) with the actual data type (via GetKernelTypeForVar()) for each input tensor and do data_type_transform (similar to cast op) if necessary.
The default GetExpectedKernelType() would indeed enforce the input data types to be the same. That is why I override GetExpectedKernelType(), so that we can deal with the situation where one input is fp16 and the other is fp32 (data_type_transform is applied where necessary by comparing each tensor's data_type to the expected data_type).
I am not sure whether we need this use_float16 attribute. For now, we have this attribute for MulOp; do we need to add it to other operators, like SumOperator, as well?
Or we can add a cast operator first if the input data is fp32. After the input data is cast to fp16, the rest of the operators will choose the fp16 kernel automatically.
Good point! After some thought, I agree that we'd prefer not to add the use_float16 attribute, so that we don't accept input tensors with different data types. We can use the cast op to bridge different operators if needed. Adding the use_float16 attribute would complicate the code and make it more error-prone.
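For example, a minimal sketch of the cast-op approach, assuming the fluid Python API of that time (the exact layer names, fluid.layers.cast and fluid.layers.mul, are assumptions here, not something this PR adds):

```python
import paddle.fluid as fluid

# fp32 inputs from the outside world.
x = fluid.layers.data(name='x', shape=[32], dtype='float32')
y = fluid.layers.data(name='y', shape=[32, 64], dtype='float32',
                      append_batch_size=False)

# Bridge with the cast op instead of a per-operator use_float16 attribute.
x_fp16 = fluid.layers.cast(x=x, dtype='float16')
y_fp16 = fluid.layers.cast(x=y, dtype='float16')

# Both inputs are now fp16, so the framework can pick the fp16 mul kernel
# automatically without any extra attribute on the operator.
out = fluid.layers.mul(x=x_fp16, y=y_fp16)
```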
actual = outs[idx]
actual_t = np.array(actual)
# paddle float16 is exposed to python as uint16 type
# reinterpret the memory as numpy.float16
Why not just expose it to Python as float16 directly?
Done.
QiJune left a comment
LGTM!
fix #8816
fix #9021
numpy float16 is internally represented as numpy.uint16. Hence, we create the binding with the help of the uint16_t type.
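As a small illustration of that bit-level equivalence, here is a NumPy sketch (the hex constants are just example IEEE-754 half-precision bit patterns):

```python
import numpy as np

# float16 and uint16 are both 16-bit types, so the same buffer can be
# viewed as either one without copying or converting any data.
bits = np.array([0x3C00, 0x4000, 0xC000], dtype=np.uint16)  # raw half bits
vals = bits.view(np.float16)   # reinterpret the memory -> [1.0, 2.0, -2.0]
back = vals.view(np.uint16)    # round-trips to the original bit patterns

print(vals)                                                # [ 1.  2. -2.]
print(np.array_equal(back, bits))                          # True
print(np.dtype(np.float16).itemsize == np.dtype(np.uint16).itemsize)  # True
```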