
Conversation

@qingqing01 (Contributor) commented Sep 11, 2017

Fix #3883

  • Correctly use host_vector

    • If the default allocator is used with host_vector, the LoD information cannot be accessed inside a CUDA kernel, so the thrust::system::cuda::experimental::pinned_allocator<T> allocator is used instead (see the sketch after this list).
    • Add a unit test, lod_tensor_test.cu, to test LoDTensor on GPU.
  • Expose LoDTensor to Python.

    • Expose LoDTensor in pybind.
    • Add unit tests.
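
For background on why the pinned allocator helps here: page-locked host memory can be read directly from device code on platforms with unified virtual addressing, so a kernel can consume the LoD without an explicit copy. A minimal illustrative sketch (not code from this PR; the kernel and names are hypothetical):

```cpp
#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>

// The Vector alias from the description: a host_vector backed by pinned memory.
template <typename T>
using Vector = thrust::host_vector<
    T, thrust::system::cuda::experimental::pinned_allocator<T>>;

// Hypothetical kernel: reads the LoD offsets straight from pinned host memory.
__global__ void PrintLoD(const size_t* lod, size_t len) {
  for (size_t i = threadIdx.x; i < len; i += blockDim.x) {
    printf("lod[%llu] = %llu\n",
           (unsigned long long)i, (unsigned long long)lod[i]);
  }
}

int main() {
  Vector<size_t> lod;
  lod.push_back(0);  // offset-style LoD for two sequences of length 2
  lod.push_back(2);
  lod.push_back(4);
  // .data() is already a raw host pointer; with pinned memory and unified
  // virtual addressing the kernel can dereference it directly.
  PrintLoD<<<1, 32>>>(lod.data(), lod.size());
  cudaDeviceSynchronize();
  return 0;
}
```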

```cpp
self.set_lod(lod);
#else
paddle::framework::LoD new_lod;
new_lod.reserve(lod.size());
```
A Contributor commented:

Expose other interfaces too, like SliceLevels, SliceInLevel, NumLevels, and NumElements; these interfaces may be used by users as well.

What's more, LODSliceInLevel and LoDSliceLevels may also become operators, which may be used in RNN or some other ops. These can be filed as new issues.

@qingqing01 (Contributor, Author) commented Sep 11, 2017:

> SliceLevels, SliceInLevel, NumLevels, NumElements

Currently, all of these interfaces are used only in C++ code. Just like Tensor, many interfaces are not exposed. I prefer to add them in the future if they are really needed in Python.

```cpp
call the set_tensor and set_lod functions to set them.
)DOC")
.def("set_tensor",
```
A Contributor commented:

These two interfaces are too trivial; adding a constructor, __init__(lod, tensor), may be better, because a LoDTensor is one complete concept, not a union of two different concepts (the implementation is, but that should be hidden from the user). A sketch follows below.
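
For illustration, such a constructor could be bound alongside the existing setters roughly like this (a hypothetical pybind11 sketch; the module name, types, and signatures are assumed, and the PR's actual binding may differ):

```cpp
#include <pybind11/pybind11.h>
namespace py = pybind11;

// Hypothetical sketch; LoDTensor, LoD, and Tensor come from the framework.
PYBIND11_MODULE(core, m) {
  py::class_<LoDTensor>(m, "LoDTensor")
      .def("__init__",
           [](LoDTensor &instance, const LoD &lod, Tensor *tensor) {
             // Construct the LoDTensor from both parts in one step.
             new (&instance) LoDTensor();
             instance.set_lod(lod);
             instance.set_tensor(tensor);
           })
      .def("set_tensor", &LoDTensor::set_tensor)
      .def("set_lod", &LoDTensor::set_lod);
}
```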

A Member commented:

Yes, in Paddle we have a DataConverter to convert the user's data into an argument.
Should we implement a DataConverter, or just expose a user-friendly interface on LoDTensor?

@qingqing01 (Contributor, Author) commented:

@Superjom I added an __init__ constructor in pybind. But I think set_tensor and set_lod are still needed: we usually get an empty LoDTensor from a variable and then need to set its tensor and lod, and sometimes we only need to set the tensor when there is no LoD info. So set_tensor and set_lod are still kept.

@qingqing01 (Contributor, Author) commented:

@QiJune You are right; we should provide a converter that builds the LoD for sequence inputs. The inputs are a list of lists or a list of numpy arrays. It may be more convenient to do the conversion in the Python API rather than in pybind or C++ code (see the sketch below). @reyoung @Superjom
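
Whatever layer the converter lives in, its core is turning per-sequence lengths into offset-style LoD. A minimal sketch (the helper name is hypothetical; shown in C++ for concreteness):

```cpp
#include <vector>

// Hypothetical helper: convert per-sequence lengths into offset-style LoD.
// Two sequences of lengths 2 and 2 yield the offsets {0, 2, 4}.
std::vector<size_t> LengthsToOffsets(const std::vector<size_t> &lengths) {
  std::vector<size_t> offsets(1, 0);
  for (size_t len : lengths) {
    offsets.push_back(offsets.back() + len);
  }
  return offsets;
}
```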

```diff
 #else
 template <typename T>
-using Vector = thrust::host_vector<T>;
+using Vector = thrust::host_vector<
+    T, thrust::system::cuda::experimental::pinned_allocator<T>>;
```
A Collaborator commented:

just thrust::device_vector is OK.

@qingqing01 (Contributor, Author) commented Sep 11, 2017:

Yes, I tested thrust::device_vector and it can also be used. But how to get the raw pointer differs between CPU and GPU.

For example:

```cpp
std::vector<thrust::device_vector<size_t>> lod = {{0, 2, 4}};

// On the CPU side (with thrust::host_vector, e.g. the pinned allocator),
// .data() already returns a raw pointer:
size_t* cpu_ptr = lod[0].data();

// For a GPU kernel (thrust::device_vector), .data() returns a
// thrust::device_ptr, so a cast is needed:
size_t* gpu_ptr = thrust::raw_pointer_cast(lod[0].data());
```

Both methods satisfy our needs. But since the usage differs between CPU and GPU, I prefer thrust::system::cuda::experimental::pinned_allocator<T>; I can change it to thrust::device_vector in the future if any problem comes up.

A Collaborator commented:

Why do we need the raw pointer? Maybe just using thrust::transform is OK.

@qingqing01 (Contributor, Author) commented:

@reyoung For the sequence operators, we need to pass the LoD as an input argument of a CUDA kernel, so the raw pointer (size_t*) is needed. I'm not sure thrust::transform can always be used for every kernel; see the sketch below.
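
As an example of the kind of kernel meant here, each block processes one sequence delimited by the LoD offsets, so the raw size_t* must be a kernel argument (a hypothetical sketch, not code from this PR):

```cpp
// Hypothetical sequence kernel: block b sums the elements of sequence b,
// whose boundaries come from the raw LoD offsets; hence the plain size_t*.
__global__ void SequenceSum(const float* x, float* out,
                            const size_t* lod, size_t num_sequences) {
  size_t seq = blockIdx.x;
  if (seq >= num_sequences || threadIdx.x != 0) return;
  float sum = 0.0f;
  for (size_t i = lod[seq]; i < lod[seq + 1]; ++i) {
    sum += x[i];
  }
  out[seq] = sum;
}
```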


```python
def test_float_lod_tensor(self):
    scope = core.Scope()
    var = scope.new_var("test_tensor")
    var_lod = scope.new_var("test_lod_tensor")
```
@QiJune (Member) commented Sep 11, 2017:

I am wondering whether we need two variables to represent a LoDTensor in operators, and how the tensor memory is managed.

I tried to implement InferShape in the add-two operator:

```cpp
void InferShape(const framework::InferShapeContext &ctx) const override {
  auto tensor_x = ctx.Input<LoDTensor>("X")->tensor();
  auto tensor_y = ctx.Input<LoDTensor>("Y")->tensor();
  PADDLE_ENFORCE_EQ(tensor_x.dims(), tensor_y.dims(),
                    "Two inputs of Add Op must have the same dimension.");
  auto* lod_tensor_out = ctx.Output<LoDTensor>("Out");
  Tensor* out = new Tensor();
  out->Resize(tensor_x.dims());
  lod_tensor_out->set_tensor(out);
}
```

It's hard to know when to delete the pointer out.

It seems better to manage the tensor memory inside LoDTensor. Otherwise we have to use a shared_ptr in LoDTensor:

```cpp
class LoDTensor {
  std::shared_ptr<Tensor> tensor_;
  LoD lod_;
};
```

@qingqing01 (Contributor, Author) commented Sep 11, 2017:

@QiJune Good question. Maybe class LoDTensor : public Tensor { } is better, to avoid using std::unique_ptr or std::shared_ptr? A sketch follows below.
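
A rough sketch of that alternative (illustrative only; member names are assumed): LoDTensor inherits Tensor's storage and memory management, so no extra ownership bookkeeping is needed.

```cpp
// Sketch: LoDTensor extends Tensor, so the tensor memory is managed by the
// base class and no raw pointer ownership question arises.
class LoDTensor : public Tensor {
 public:
  const LoD& lod() const { return lod_; }
  void set_lod(const LoD& lod) { lod_ = lod; }

 private:
  LoD lod_;
};
```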

@qingqing01 (Contributor, Author) commented:

If we use std::unique_ptr, we can fix it in another PR. If we use class LoDTensor : public Tensor { }, I can change it in this PR.

A Contributor commented:

Yes, the tensor should be a part of LoDTensor.

A raw pointer with a bool flag indicating whether this LoDTensor owns the memory seems better (see the sketch below).

If someone else wants to share this Tensor, just copy the pointer.
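
A minimal sketch of that proposal (hypothetical; the field and method names are assumed for illustration):

```cpp
// Sketch: LoDTensor holds a raw Tensor* plus an ownership flag. Sharing the
// tensor just copies the pointer without taking ownership.
class LoDTensor {
 public:
  ~LoDTensor() {
    if (owns_tensor_) delete tensor_;
  }
  void set_tensor(Tensor* t, bool take_ownership = false) {
    if (owns_tensor_) delete tensor_;
    tensor_ = t;
    owns_tensor_ = take_ownership;
  }

 private:
  Tensor* tensor_ = nullptr;
  bool owns_tensor_ = false;  // whether this LoDTensor must delete tensor_
  LoD lod_;
};
```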

@qingqing01 (Contributor, Author) commented:

@Superjom @QiJune @reyoung This problem will be fixed in the next PR.

@reyoung (Collaborator) left a comment:

LGTM, except that I do not think we should get the raw pointer from the LoD information. So maybe device_vector is OK. It is up to you.

@qingqing01 merged commit 6d0d29f into PaddlePaddle:develop on Sep 12, 2017
@qingqing01 deleted the lod_tensor_py branch on November 14, 2019