Add a lookup table op and a CUDA helper. #3620
Conversation
1. Finish the lookup table CPU and GPU kernels. 2. Add some CUDA helpers. 3. Add some math functors.
paddle/framework/CMakeLists.txt
recurrent_op
uniform_random_op
gaussian_random_op
lookup_table_op
The lookup_table_op and the other ops listed here in framework/CMakeLists.txt are targets defined in operators/CMakeLists.txt. This implies a cyclic dependency: framework and operators depend on each other. Can we remove this cyclic dependency?
It seems that the solution is to move target pybind into a third package, say, paddle/pybind, in parallel with paddle/framework and paddle/operators.
I will do it.
namespace functor {

template <typename T>
struct Set<platform::CPUPlace, T> {
The name Set here is a little confusing: it could refer either to a data container or to the assignment operation. If it is the latter, then since a class name should be a noun, we might want Setter instead of Set.
Set is not needed. Use Eigen::Vector's operator= instead.
@reyoung Done.
@@ -0,0 +1,5 @@
if(WITH_GPU)
I see that we are creating a subdirectory and a namespace to hold a new class Set. Do we have a plan to add many more functors in the near future? If so, I think this is OK; otherwise, creating a whole subdirectory looks like overkill.
Now we have the paddle/operators/math subdirectory, but it uses plain functions rather than functors (function objects). I notice that TensorFlow and WarpCTC use functors to implement some common math functions, and functors are also used in Paddle's function subdirectory. Functors have some advantages: they can carry state, and they make it convenient to specialize for different data types, compared with plain functions (discussed with @hedaoyuan). We can see a lot of duplicated code in operators/math/math_function.cc and operators/math/math_function.cu for the float and double types.
So maybe it is better to move math/math_function to functor/math_functor.
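To illustrate the duplication point, here is a minimal sketch (hypothetical names, not Paddle's actual code) of how a functor templated on place and element type lets a single definition serve both float and double, and how a functor can carry state:

```cpp
#include <cstddef>
#include <vector>

// Stand-in for platform::CPUPlace in this sketch.
struct CPUPlace {};

namespace functor {

// Primary template, specialized per device place.
template <typename Place, typename T>
struct Scale;

// One CPU definition covers float, double, and any other numeric T,
// instead of two near-identical copies of a plain function.
template <typename T>
struct Scale<CPUPlace, T> {
  // Functors can carry state (here: the scaling factor alpha).
  explicit Scale(T alpha) : alpha_(alpha) {}
  void operator()(const T* x, T* y, std::size_t n) const {
    for (std::size_t i = 0; i < n; ++i) y[i] = alpha_ * x[i];
  }
  T alpha_;
};

}  // namespace functor
```

A GPU specialization `Scale<GPUPlace, T>` could then live in a .cu file with the same interface.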
We have some internal operations on tensors in our code. These internal operations currently live in the paddle/operators/math directory as global functions. If functors make the code cleaner, we can use functors instead of functions (just like TensorFlow does).
@@ -0,0 +1,55 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
cuda_helper.h => cuda.h ?
CUDA also has a cuda.h header. In Caffe2, the file with a similar function is called common_gpu.h, and in TensorFlow it is called cuda_kernel_helper.h.
namespace paddle {
namespace platform {
I don't think this header is necessary. Using CUDA's Thrust, one can easily write an element-wise operator, and Thrust is a header-only library shipped with the CUDA toolkit.
This header defines atomicAdd for the float and double types. The code for the double type is a little complex. I think we need a common file for basic, commonly used CUDA utilities.
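For context: a double-precision atomicAdd is commonly implemented as a compare-and-swap loop over the value's 64-bit bit pattern (this is the approach shown in the CUDA C Programming Guide for devices without native double atomicAdd). Below is a rough host-side C++ analogue of that technique using std::atomic; the names are illustrative, not Paddle's actual code:

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Host-side analogue of the CAS loop used to implement atomicAdd(double*, double)
// on CUDA devices without native support: reinterpret the double's bits as a
// 64-bit integer, then compare-and-swap until the update sticks.
double atomic_add_double(std::atomic<std::uint64_t>* addr, double val) {
  std::uint64_t old_bits = addr->load();
  std::uint64_t new_bits;
  double old_val;
  do {
    std::memcpy(&old_val, &old_bits, sizeof(double));
    double new_val = old_val + val;
    std::memcpy(&new_bits, &new_val, sizeof(double));
    // compare_exchange_weak reloads old_bits on failure, so the loop
    // retries with the freshly observed value.
  } while (!addr->compare_exchange_weak(old_bits, new_bits));
  return old_val;  // like CUDA's atomicAdd, return the previous value
}
```

On the device, the same shape is written with atomicCAS on an unsigned long long and __double_as_longlong/__longlong_as_double conversions.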
@reyoung Removed the Set functor and removed CUDA_1D_KERNEL_LOOP.
if (alpha == static_cast<T>(0)) {
  memset(YData, 0, N * sizeof(T));
} else {
  framework::EigenVector<T, Eigen::RowMajor, Eigen::DenseIndex>::Flatten(*Y)
It is fine to always use Eigen's setConstant in the operator kernel header. memset is not always faster than std::copy or another carefully implemented setConstant or copy method, so there is no need to special-case zero.
Ok, I'll modify the code.
Please refer to https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/fill_zeros_like_op.h#L29
The following works both for CPU and GPU:
t.device(EigenDevice) = t.constant(T(scalar));
And Set can be a function in operators/math_function.h.
Fix #3621
Add a lookup table operator, which is used as embedding in RNN network.
The table is a dense matrix for now rather than a SparseRow tensor. The TableProjection in Paddle also uses a dense matrix to represent the table by default; SparseRow will be supported in the future. Also add some CUDA helpers.
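For reference, the forward pass of a lookup-table (embedding) op over a dense table amounts to gathering rows by integer ids. A minimal CPU sketch under that assumption (hypothetical names, not the PR's actual kernel):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal CPU sketch of the lookup-table forward pass: for each id, copy the
// corresponding row of the dense table into the output.
// table: N x D row-major embedding matrix; ids: M lookup indices;
// out: M x D row-major result.
template <typename T>
void LookupTable(const std::vector<T>& table, std::int64_t D,
                 const std::vector<std::int64_t>& ids, std::vector<T>* out) {
  out->resize(ids.size() * D);
  for (std::size_t i = 0; i < ids.size(); ++i) {
    for (std::int64_t d = 0; d < D; ++d) {
      (*out)[i * D + d] = table[ids[i] * D + d];
    }
  }
}
```

The backward pass scatters output gradients back into the table rows, which is where the atomicAdd helper discussed above matters on the GPU, since multiple ids may hit the same row.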