
Conversation

Contributor

@kexinzhao kexinzhao commented Mar 13, 2018

fix #8816
fix #9021

A numpy float16 value is internally stored as 16 bits with the same layout as numpy.uint16. Hence, we create the binding with the help of the uint16_t type.
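
For context, a minimal NumPy sketch (not part of this PR) of the observation above: a float16 value and a uint16 value share the same 16-bit memory layout, so a buffer exposed as uint16 can be reinterpreted as float16 without copying.

    import numpy as np

    a = np.array([1.0, 0.5, -2.0], dtype=np.float16)
    bits = a.view(np.uint16)          # raw IEEE 754 binary16 bit patterns
    restored = bits.view(np.float16)  # reinterpret the same memory as float16

    assert restored.dtype == np.float16
    assert np.array_equal(a, restored)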

@kexinzhao kexinzhao added the 预测 (formerly Inference; covers C-API inference issues, etc.) label Mar 13, 2018
@kexinzhao kexinzhao requested review from QiJune, Xreki and dzhwinter March 13, 2018 07:19
@kexinzhao kexinzhao requested a review from jacquesqiao March 14, 2018 01:56
both input tensors to float16 data types if needed and use the float16
compute kernel to generate the output tensor also in float16 data type.
This attribute is by default false and normally would only be set to
true in inference stage for performance optimization.
Member

@QiJune QiJune Mar 14, 2018

The Volta generation of GPUs introduces Tensor Cores, which provide 8x more throughput than single precision math pipelines. Each Tensor Core performs D = A x B + C, where A, B, C and D are matrices. A and B are half-precision 4x4 matrices, whereas D and C can be either half or single precision 4x4 matrices. In other words, Tensor Core math can accumulate half precision products into either single or half-precision outputs.

Read more at: http://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html

It seems that the output of Tensor Cores can be either fp16 or fp32.

For now, we are only considering fp16 inference, right?

Member

If all the operators support fp16 kernels and the input data is fp16, our framework will choose the fp16 kernel automatically.

I am not sure we need this use_float16 attribute. Right now we have this attribute for MulOp; do we also need to add it to other operators, like SumOperator?

Or we can add a CastOperator first if the input data is fp32. Once the input data has been cast to fp16, the rest of the operators will choose fp16 kernels automatically.
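
A minimal sketch of this cast-first approach (not part of this PR, and assuming the Fluid Python API exposes fluid.layers.data, fluid.layers.cast and fluid.layers.mul with the signatures used here):

    import paddle.fluid as fluid

    # One fp32 input and one fp16 input.
    x = fluid.layers.data(name='x', shape=[16, 32], dtype='float32')
    y = fluid.layers.data(name='y', shape=[32, 8], dtype='float16')

    # Insert an explicit cast op so that both inputs of mul are fp16;
    # mul can then pick its fp16 kernel automatically.
    x_fp16 = fluid.layers.cast(x, dtype='float16')
    out = fluid.layers.mul(x_fp16, y)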

Member

Right now we check the data types of an operator's inputs and enforce them to be the same. I am not sure what will happen if one input is fp16 and the other is fp32.

Contributor Author

@kexinzhao kexinzhao Mar 14, 2018

It seems that the output of Tensor Cores can be either fp16 or fp32.
For now, we are only considering fp16 inference, right?

To use Tensor Cores to calculate C = A * B, where A and B are both fp16, we have three ways to get the output C:

  1. If C is fp16, then we can use cublasHgemm, where the compute type is fp16.

  2. If C is fp16, we can also use cublasGemmEx() with the compute type set to fp32, meaning the computation is done internally in fp32.

  3. If C is fp32, we can only use cublasGemmEx(), with the compute type set to fp32.

I am not sure which mode is the most computationally efficient.

Right now we only consider generating fp16 output, but introducing cublasGemmEx() and providing the option of fp16 GEMM with fp32 output can be a future to-do item.
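
A small NumPy sketch (an analogy only, not cuBLAS) of the difference between the compute type and the output type in modes 2 and 3:

    import numpy as np

    A = np.random.rand(4, 4).astype(np.float16)
    B = np.random.rand(4, 4).astype(np.float16)

    # Accumulate in fp32 (the "compute type"), independent of the output type.
    acc = A.astype(np.float32) @ B.astype(np.float32)

    C_fp32 = acc                     # fp32 output (mode 3)
    C_fp16 = acc.astype(np.float16)  # fp16 output with fp32 accumulation (mode 2)

    # The two results differ only by the final rounding to half precision.
    print(np.abs(C_fp32 - C_fp16.astype(np.float32)).max())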

Contributor Author

@kexinzhao kexinzhao Mar 14, 2018

Right now we check the data types of an operator's inputs and enforce them to be the same. I am not sure what will happen if one input is fp16 and the other is fp32.

Before the operator calls the compute kernel, it compares the expected data type (via GetExpectedKernelType()) with the actual data type (via GetKernelTypeForVar()) for each input tensor and performs data_type_transform (similar to the cast op) if necessary.

The default GetExpectedKernelType() does indeed enforce that the data types are the same. That is why I override GetExpectedKernelType(), so that we can handle the situation where one input is fp16 and the other is fp32 (performing data_type_transform if necessary by comparing each tensor's data_type with the expected data_type).

Contributor Author

I am not sure we need this use_float16 attribute. Right now we have this attribute for MulOp; do we also need to add it to other operators, like SumOperator?
Or we can add a CastOperator first if the input data is fp32. Once the input data has been cast to fp16, the rest of the operators will choose fp16 kernels automatically.

Good point! After some thought, I agree that we should not add the use_float16 attribute, so that we do not accept input tensors with different data types. We can use the cast op to bridge operators if needed. Adding a use_float16 attribute would complicate the code and make it more error-prone.

actual = outs[idx]
actual_t = np.array(actual)
# paddle float16 is exposed to python as uint16 type
# reinterpret the memory as numpy.float16
Member

Why not just expose it to Python as float16 directly?

Contributor Author

Done.

Member

@QiJune QiJune left a comment

LGTM!

@kexinzhao kexinzhao merged commit e26f112 into PaddlePaddle:develop Mar 15, 2018
@kexinzhao kexinzhao deleted the numpy_fp16_mul_op branch March 15, 2018 02:42

Development

Successfully merging this pull request may close these issues:

Bind paddle float16 with numpy float16
Add FP16 support for mul op
