Add topk op #3760


Merged: 19 commits into PaddlePaddle:develop, Sep 7, 2017

Conversation

typhoonzero (Contributor):

Fixes #3759

There are still problems when using a 3-D tensor input.

@@ -16,6 +16,7 @@ py_test(test_cross_entropy_op SRCS test_cross_entropy_op.py)
py_test(test_gather_op SRCS test_gather_op.py)
py_test(test_scatter_op SRCS test_scatter_op.py)
py_test(test_fill_zeros_like_op SRCS test_fill_zeros_like_op.py)
py_test(test_top_k_op SRCS test_top_k_op.py)
@jacquesqiao (Member), Aug 30, 2017:

Cannot find the file test_top_k_op.py.

typhoonzero (Author):

Thanks. Added.

@typhoonzero (Author):

I'm looking into http://www.math.grin.edu/%7Eblanchaj/Research/ABGS_KSelection.pdf for details on the performance of the top-k kernel.

limitations under the License. */

#include "paddle/operators/top_k_op.h"
#include <iostream>
Contributor:

remove this line.

R"DOC(If the input is a vector (1d tensor), finds the k largest entries in the vector and outputs their values and indices as vectors. Thus values[j] is the j-th largest entry in input, and its index is indices[j].

For matrices, computes the top k entries in each row. )DOC");
AddAttr<AttrType>("k",
Contributor:

AddAttr<int>("k",

The template <typename AttrType> is not needed here.

}

T v_;
int id_;
Contributor:

Member variables in structs and classes follow different naming rules: struct members do not take the trailing underscore (_).

https://google.github.io/styleguide/cppguide.html#Variable_Names

auto* input = ctx.Input<Tensor>("X");
auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
size_t k = static_cast<AttrType>(ctx.op_.GetAttr<AttrType>("k"));
Contributor:

Same as the comment above: the typename AttrType = int is not necessary.

void Compute(const framework::ExecutionContext& ctx) const override {
PADDLE_ENFORCE(platform::is_gpu_place(ctx.GetPlace()),
"It must use GPUPlace.");
std::cout << "in TopkOpCUDAKernel" << std::endl;
Contributor:

remove this line.

output->mutable_data<T>(ctx.GetPlace());
indices->mutable_data<int>(ctx.GetPlace());
T* output_data = output->data<T>();
int* indices_data = indices->data<int>();
Contributor:

Lines 296 to 299 can be written as follows:

T* output_data = output->mutable_data<T>(ctx.GetPlace());
int* indices_data = indices->mutable_data<int>(ctx.GetPlace());

dim3 threads(256, 1);
dim3 grid(input_height, 1);

KeMatrixTopK<T, 5, 256><<<grid, threads>>>(
Contributor:

The CUDA stream can now be obtained via reinterpret_cast<platform::CUDADeviceContext*>(ctx.device_context())->stream(), so:

auto stream = reinterpret_cast<platform::CUDADeviceContext*>(ctx.device_context())->stream();
KeMatrixTopK<T, 5, 256><<<grid, threads, 0, stream>>>(
...

typhoonzero (Author):

ctx.device_context() currently returns a const reference, so it cannot be reinterpret_cast at the moment, can it?

auto* output = ctx.Output<Tensor>("Out");
auto* indices = ctx.Output<Tensor>("Indices");
// k is determined by Attr
const size_t k = static_cast<AttrType>(ctx.op_.GetAttr<AttrType>("k"));
Contributor:

For typename AttrType = int, same problem as the comments above.


auto X = EigenMatrix<T>::From(*input);
auto Out = EigenMatrix<T>::From(*output);
auto Indices = EigenMatrix<T>::From(*indices);
Contributor:

If using the std::partial_sort and not using the function in Eigen, there is no need to convert to Eigen tensor.

// TODO(typhoonzero): make this more efficient
std::vector<std::pair<T, size_t>> vec;
for (size_t j = 0; j < col; j++) {
vec.push_back(std::pair<T, size_t>(X(i, j), j));
Contributor:

Maybe we can use the plain pointer to get data and there is no need to use Eigen.

typhoonzero (Author):

Followed the comments; all done. Please review again, thanks very much!

@qingqing01 (Contributor) left a review:

some minor problems

PADDLE_ENFORCE_NOT_NULL(ctx.InputVar("X"),
"Input of TopkOP must be initialized.");
auto *input = ctx.Input<framework::Tensor>("X");
const int k = static_cast<int>(ctx.op().GetAttr<int>("k"));
Contributor:

ctx.op().GetAttr -> ctx.GetAttr.

If merged after PR #3903, please use Attr.

// input must have >= 1d shape.
PADDLE_ENFORCE_GE(input->dims().size(), 1);
// input must have >= k columns.
PADDLE_ENFORCE_GE(input->dims()[input->dims().size() - 1], k);
Contributor:

Add error messages to these PADDLE_ENFORCE_GE calls.

dim3 grid(input_height, 1);

// auto stream = reinterpret_cast<platform::CUDADeviceContext*>(
// ctx.device_context())->stream();

framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];
Eigen::DSizes<int, 2> flat2dims(row, col);
X.reshape(flat2dims);
Contributor:

X -> x. all variable names should be lowercase.

const size_t row = framework::product(
framework::slice_ddim(inputdims, 0, inputdims.size() - 1));
const size_t col = inputdims[inputdims.size() - 1];
Eigen::DSizes<int, 2> flat2dims(row, col);
Contributor:

I think it's usually: row = inputdims[0], col = product(inputdims[1:...]).

typhoonzero (Author):

These are two different ways of flattening; I filed an issue about this: https://github.com/PaddlePaddle/Paddle/issues/3771. This follows TF's implementation.

public:
void Compute(const framework::ExecutionContext& ctx) const override {
// Get the top k elements of each row of input tensor
// FIXME: only deal with matrix(2d tensor).
Contributor:

Actually, I think we can support only rank = 2 for now; if users need more, they can add a reshape op before TopK.

typhoonzero (Author):

Agreed; we can merge first and add that later.

@typhoonzero typhoonzero changed the title [WIP]Add topk op Add topk op Sep 7, 2017
@qingqing01 (Contributor) left a review:

Need to add the CUDA stream in the next PR.

@typhoonzero typhoonzero merged commit 3fbb692 into PaddlePaddle:develop Sep 7, 2017
@typhoonzero typhoonzero deleted the top_k_op branch December 22, 2017 05:46