Add topk op #3760
Conversation
@@ -16,6 +16,7 @@ py_test(test_cross_entropy_op SRCS test_cross_entropy_op.py)
py_test(test_gather_op SRCS test_gather_op.py)
py_test(test_scatter_op SRCS test_scatter_op.py)
py_test(test_fill_zeros_like_op SRCS test_fill_zeros_like_op.py)
py_test(test_top_k_op SRCS test_top_k_op.py)
Cannot find the file test_top_k_op.py.
Thanks. Added.
I'm looking into http://www.math.grin.edu/%7Eblanchaj/Research/ABGS_KSelection.pdf for some details on the performance of the top-k kernel.
paddle/operators/top_k_op.cc
Outdated
limitations under the License. */

#include "paddle/operators/top_k_op.h"
#include <iostream>
remove this line.
paddle/operators/top_k_op.cc
Outdated
R"DOC(If the input is a vector (1d tensor), finds the k largest entries in the vector and outputs their values and indices as vectors. Thus values[j] is the j-th largest entry in input, and its index is indices[j]. | ||
|
||
For matrices, computes the top k entries in each row. )DOC"); | ||
AddAttr<AttrType>("k", |
AddAttr<int>( "k",
The template <typename AttrType>
is not needed here.
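For reference, a minimal sketch of the suggested change (the .SetDefault call and the attribute description string are illustrative, not taken from the PR):

// Sketch: register "k" as a plain int attribute; no AttrType template needed.
AddAttr<int>("k",
             "Number of top elements to look for along the last dimension.")
    .SetDefault(1);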
paddle/operators/top_k_op.cu
Outdated
}

T v_;
int id_;
Member variables in structs and classes have different naming rules: struct data members take no trailing _.
https://google.github.io/styleguide/cppguide.html#Variable_Names
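A small sketch of the cited rule (type names here are hypothetical):

// Per the Google C++ style guide:
template <typename T>
struct Pair {
  T v;     // struct data member: no trailing underscore
  int id;
};

class SomeKernel {
 private:
  int k_;  // class data member: trailing underscore
};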
paddle/operators/top_k_op.cu
Outdated
auto* input = ctx.Input<Tensor>("X");
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
size_t k = static_cast<AttrType>(ctx.op_.GetAttr<AttrType>("k"));
Same as the comments above: the typename AttrType = int is not necessary.
paddle/operators/top_k_op.cu
Outdated
void Compute(const framework::ExecutionContext& ctx) const override {
  PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
                 "It must use GPUPlace.");
  std::cout << "in TopkOpCUDAKernel" << std::endl;
remove this line.
paddle/operators/top_k_op.cu
Outdated
output->mutable_data<T>(ctx.GetPlace()); | ||
indices->mutable_data<int>(ctx.GetPlace()); | ||
T* output_data = output->data<T>(); | ||
int* indices_data = indices->data<int>(); |
Lines 296 to 299 can be written as follows:
T* output_data = output->mutable_data<T>(ctx.GetPlace());
int* indices_data = indices->mutable_data<int>(ctx.GetPlace());
dim3 threads(256, 1);
dim3 grid(input_height, 1);

KeMatrixTopK<T, 5, 256><<<grid, threads>>>(
Now the cuda stream can be obtained via reinterpret_cast<platform::CUDADeviceContext*>(ctx.device_context())->stream(), so:
auto stream = reinterpret_cast<platform::CUDADeviceContext*>(ctx.device_context())->stream();
KeMatrixTopK<T, 5, 256><<<grid, threads, 0, stream>>>(
...
ctx.device_context() currently returns a const type, so it cannot be reinterpret_cast-ed at the moment, can it?
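One possible workaround, as a sketch only (it assumes device_context() returns a const pointer, as the exchange above implies, and that stream() is a const accessor):

// Hypothetical: keep the const qualifier through the cast instead of
// casting it away.
auto stream = reinterpret_cast<const platform::CUDADeviceContext*>(
                  ctx.device_context())
                  ->stream();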
paddle/operators/top_k_op.h
Outdated
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
// k is determined by Attr
const size_t k = static_cast<AttrType>(ctx.op_.GetAttr<AttrType>("k"));
For typename AttrType = int, same issue as the comments above.
paddle/operators/top_k_op.h
Outdated
auto X = EigenMatrix<T>::From(*input);
auto Out = EigenMatrix<T>::From(*output);
auto Indices = EigenMatrix<T>::From(*indices);
If we use std::partial_sort instead of Eigen's functions, there is no need to convert to an Eigen tensor.
paddle/operators/top_k_op.h
Outdated
// TODO(typhoonzero): make this more efficient
std::vector<std::pair<T, size_t>> vec;
for (size_t j = 0; j < col; j++) {
  vec.push_back(std::pair<T, size_t>(X(i, j), j));
Maybe we can use a plain pointer to access the data, so there is no need to use Eigen (see the sketch below).
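A self-contained sketch of that approach: raw pointers plus std::partial_sort per row. The function name and signature are illustrative, not the PR's code, and it assumes k <= col:

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Top-k of each row of a row-major (row x col) matrix through raw pointers.
template <typename T>
void RowwiseTopK(const T* input, size_t row, size_t col, size_t k,
                 T* values, int* indices) {
  std::vector<std::pair<T, size_t>> vec(col);
  for (size_t i = 0; i < row; ++i) {
    const T* src = input + i * col;
    for (size_t j = 0; j < col; ++j) vec[j] = {src[j], j};
    // Order only the first k entries, largest value first.
    std::partial_sort(vec.begin(), vec.begin() + k, vec.end(),
                      [](const std::pair<T, size_t>& a,
                         const std::pair<T, size_t>& b) {
                        return a.first > b.first;
                      });
    for (size_t j = 0; j < k; ++j) {
      values[i * k + j] = vec[j].first;
      indices[i * k + j] = static_cast<int>(vec[j].second);
    }
  }
}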
Followed the comments; all done. Please review again, thanks very much!
Some minor problems:
paddle/operators/top_k_op.cc
Outdated
PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"),
                        "Input of TopkOP must be initialized.");
auto *input = ctx.Input<framework::Tensor>("X");
const int k = static_cast<int>(ctx.op().GetAttr<int>("k"));
ctx.op().GetAttr -> ctx.GetAttr. If this is merged after PR #3903, please use Attr.
paddle/operators/top_k_op.cc
Outdated
// input must have >= 1d shape.
PADDLE_ENFORCE_GE(input->dims().size(), 1);
// input must have >= k columns.
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k);
The PADDLE_ENFORCE_GE calls need error messages (see the example below).
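For example (the message text here is illustrative):

// input must have >= 1d shape.
PADDLE_ENFORCE_GE(input->dims().size(), 1,
                  "input of TopkOP must have >= 1d shape");
// input must have >= k columns.
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k,
                  "input of TopkOP must have >= k columns");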
paddle/operators/top_k_op.cu
Outdated
dim3 grid(input_height, 1);

// auto stream = reinterpret_cast<platform::CUDADeviceContext*>(
//     ctx.device_context())->stream();
paddle/operators/top_k_op.h
Outdated
    framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];
Eigen::DSizes<int, 2> flat2dims(row, col);
X.reshape(flat2dims);
X -> x. all variable names should be lowercase.
const size_t row = framework::product(
    framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];
Eigen::DSizes<int, 2> flat2dims(row, col);
I think it is usually: row = inputdims[0], col = product(inputdims[1:...]).
These are two different ways of flattening; I filed an issue about this: https://github.com/PaddlePaddle/Paddle/issues/3771. TF's implementation does it this way (see the sketch below).
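For concreteness, a sketch contrasting the two flattening conventions (variable names follow the diff above; row2/col2 are hypothetical):

// Convention used in this PR (matches TF's top_k): the last axis becomes
// the columns, everything before it is folded into the rows.
const size_t row = framework::product(
    framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];

// The alternative suggested above: the first axis becomes the rows,
// everything after it is folded into the columns.
const size_t row2 = static_cast<size_t>(inputdims[0]);
const size_t col2 = framework::product(
    framework::slice_ddim(inputdims, 1, inputdims.size()));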
public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    // Get the top k elements of each row of input tensor
    // FIXME: only deal with matrix(2d tensor).
Actually, I think we can support only rank = 2 for now; if users need more, they can add a reshape op before TopK.
Agreed; we can merge this first and extend it later.
Need to add the cuda stream in the next PR.
Fix #3759
There are still problems when using a 3d tensor input.