
Conversation

@qingqing01 (Contributor) commented Sep 11, 2017

Fix #3883

  • Correctly use host_vector

    • If the default allocator is used with host_vector, the LoD information cannot be accessed inside a CUDA kernel, so the thrust::system::cuda::experimental::pinned_allocator<T> allocator is used instead (see the sketch after this list).
    • Add a unit test, lod_tensor_test.cu, to test LoDTensor on GPU.
  • Expose LoDTensor to Python.

    • Expose LoDTensor in pybind.
    • Add unit tests.
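
For background on why the pinned allocator helps here: page-locked host memory can be read directly from device code on platforms with unified virtual addressing, so a kernel can consume the LoD without an explicit copy. A minimal illustrative sketch (not code from this PR; the kernel and names are hypothetical):

```cpp
#include <cstdio>
#include <thrust/host_vector.h>
#include <thrust/system/cuda/experimental/pinned_allocator.h>

// The Vector alias from the description: a host_vector backed by pinned memory.
template <typename T>
using Vector = thrust::host_vector<
    T, thrust::system::cuda::experimental::pinned_allocator<T>>;

// Hypothetical kernel: reads the LoD offsets straight from pinned host memory.
__global__ void PrintLoD(const size_t* lod, size_t len) {
  for (size_t i = threadIdx.x; i < len; i += blockDim.x) {
    printf("lod[%llu] = %llu\n",
           (unsigned long long)i, (unsigned long long)lod[i]);
  }
}

int main() {
  Vector<size_t> lod;
  lod.push_back(0);  // offset-style LoD for two sequences of length 2
  lod.push_back(2);
  lod.push_back(4);
  // .data() is already a raw host pointer; with pinned memory and unified
  // virtual addressing the kernel can dereference it directly.
  PrintLoD<<<1, 32>>>(lod.data(), lod.size());
  cudaDeviceSynchronize();
  return 0;
}
```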

```cpp
self.set_lod(lod);
#else
paddle::framework::LoD new_lod;
new_lod.reserve(lod.size());
```
A Contributor commented:

Expose other interfaces too, like SliceLevels, SliceInLevel, NumLevels, and NumElements; these interfaces may be used by users as well.

What's more, LODSliceInLevel and LoDSliceLevels may also become operators, which may be used in RNN or some other ops. These can be filed as new issues.

@qingqing01 (Contributor, Author) commented Sep 11, 2017:

> SliceLevels, SliceInLevel, NumLevels, NumElements

Currently, all of these interfaces are used only in C++ code. Just like Tensor, many interfaces are not exposed. I prefer to add them in the future if they are really needed in Python.

```cpp
call the set_tensor and set_lod functions to set them.
)DOC")
.def("set_tensor",
```
A Contributor commented:

These two interfaces are too trivial; adding a constructor, __init__(lod, tensor), may be better, because a LoDTensor is one complete concept, not a union of two different concepts (the implementation is, but that should be hidden from the user). A sketch follows below.
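
For illustration, such a constructor could be bound alongside the existing setters roughly like this (a hypothetical pybind11 sketch; the module name, types, and signatures are assumed, and the PR's actual binding may differ):

```cpp
#include <pybind11/pybind11.h>
namespace py = pybind11;

// Hypothetical sketch; LoDTensor, LoD, and Tensor come from the framework.
PYBIND11_MODULE(core, m) {
  py::class_<LoDTensor>(m, "LoDTensor")
      .def("__init__",
           [](LoDTensor &instance, const LoD &lod, Tensor *tensor) {
             // Construct the LoDTensor from both parts in one step.
             new (&instance) LoDTensor();
             instance.set_lod(lod);
             instance.set_tensor(tensor);
           })
      .def("set_tensor", &LoDTensor::set_tensor)
      .def("set_lod", &LoDTensor::set_lod);
}
```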

A Member commented:

Yes, in Paddle we have a DataConverter to convert the user's data into an argument.
Should we implement a DataConverter, or just expose a user-friendly interface on LoDTensor?

@qingqing01 (Contributor, Author) commented:

@Superjom I added an __init__ constructor in pybind. But I think set_tensor and set_lod are still needed: we usually get an empty LoDTensor from a variable and then need to set its tensor and lod, and sometimes we only need to set the tensor when there is no LoD info. So set_tensor and set_lod are still kept.

@qingqing01 (Contributor, Author) commented:

@QiJune You are right; we should provide a converter that builds the LoD for sequence inputs. The inputs are a list of lists or a list of numpy arrays. It may be more convenient to do the conversion in the Python API rather than in pybind or C++ code (see the sketch below). @reyoung @Superjom
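
Whatever layer the converter lives in, its core is turning per-sequence lengths into offset-style LoD. A minimal sketch (the helper name is hypothetical; shown in C++ for concreteness):

```cpp
#include <vector>

// Hypothetical helper: convert per-sequence lengths into offset-style LoD.
// Two sequences of lengths 2 and 2 yield the offsets {0, 2, 4}.
std::vector<size_t> LengthsToOffsets(const std::vector<size_t> &lengths) {
  std::vector<size_t> offsets(1, 0);
  for (size_t len : lengths) {
    offsets.push_back(offsets.back() + len);
  }
  return offsets;
}
```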

```diff
 #else
 template <typename T>
-using Vector = thrust::host_vector<T>;
+using Vector = thrust::host_vector<
+    T, thrust::system::cuda::experimental::pinned_allocator<T>>;
```
A Collaborator commented:

just thrust::device_vector is OK.

@qingqing01 (Contributor, Author) commented Sep 11, 2017:

Yes, I tested thrust::device_vector and it can also be used. But how to get the raw pointer differs between CPU and GPU.

For example:

```cpp
std::vector<thrust::device_vector<size_t>> lod = {{0, 2, 4}};

// On the CPU side (with thrust::host_vector, e.g. the pinned allocator),
// .data() already returns a raw pointer:
size_t* cpu_ptr = lod[0].data();

// For a GPU kernel (thrust::device_vector), .data() returns a
// thrust::device_ptr, so a cast is needed:
size_t* gpu_ptr = thrust::raw_pointer_cast(lod[0].data());
```

Both methods satisfy our needs. But since the usage differs between CPU and GPU, I prefer thrust::system::cuda::experimental::pinned_allocator<T>; I can change it to thrust::device_vector in the future if any problem comes up.

A Collaborator commented:

Why do we need the raw pointer? Maybe just using thrust::transform is OK.

@qingqing01 (Contributor, Author) commented:

@reyoung For the sequence operators, we need to pass the LoD as an input argument of a CUDA kernel, so the raw pointer (size_t*) is needed. I'm not sure thrust::transform can always be used for every kernel; see the sketch below.
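
As an example of the kind of kernel meant here, each block processes one sequence delimited by the LoD offsets, so the raw size_t* must be a kernel argument (a hypothetical sketch, not code from this PR):

```cpp
// Hypothetical sequence kernel: block b sums the elements of sequence b,
// whose boundaries come from the raw LoD offsets; hence the plain size_t*.
__global__ void SequenceSum(const float* x, float* out,
                            const size_t* lod, size_t num_sequences) {
  size_t seq = blockIdx.x;
  if (seq >= num_sequences || threadIdx.x != 0) return;
  float sum = 0.0f;
  for (size_t i = lod[seq]; i < lod[seq + 1]; ++i) {
    sum += x[i];
  }
  out[seq] = sum;
}
```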


```python
def test_float_lod_tensor(self):
    scope = core.Scope()
    var = scope.new_var("test_tensor")
    var_lod = scope.new_var("test_lod_tensor")
```
@QiJune (Member) commented Sep 11, 2017:

I am wondering whether we need two variables to represent a LoDTensor in operators, and how the tensor memory is managed.

I tried to implement InferShape in the add-two operator:

```cpp
void InferShape(const framework::InferShapeContext &ctx) const override {
  auto tensor_x = ctx.Input<LoDTensor>("X")->tensor();
  auto tensor_y = ctx.Input<LoDTensor>("Y")->tensor();
  PADDLE_ENFORCE_EQ(tensor_x.dims(), tensor_y.dims(),
                    "Two inputs of Add Op must have the same dimension.");
  auto* lod_tensor_out = ctx.Output<LoDTensor>("Out");
  Tensor* out = new Tensor();
  out->Resize(tensor_x.dims());
  lod_tensor_out->set_tensor(out);
}
```

It's hard to know when to delete the pointer out.

It seems better to manage the tensor memory inside LoDTensor. Otherwise we have to use a shared_ptr in LoDTensor:

```cpp
class LoDTensor {
  std::shared_ptr<Tensor> tensor_;
  LoD lod_;
};
```

@qingqing01 (Contributor, Author) commented Sep 11, 2017:

@QiJune Good question. Maybe class LoDTensor : public Tensor { } is better, to avoid using std::unique_ptr or std::shared_ptr? A sketch follows below.
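
A rough sketch of that alternative (illustrative only; member names are assumed): LoDTensor inherits Tensor's storage and memory management, so no extra ownership bookkeeping is needed.

```cpp
// Sketch: LoDTensor extends Tensor, so the tensor memory is managed by the
// base class and no raw pointer ownership question arises.
class LoDTensor : public Tensor {
 public:
  const LoD& lod() const { return lod_; }
  void set_lod(const LoD& lod) { lod_ = lod; }

 private:
  LoD lod_;
};
```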

@qingqing01 (Contributor, Author) commented:

If we use std::unique_ptr, we can fix it in another PR. If we use class LoDTensor : public Tensor { }, I can change it in this PR.

A Contributor commented:

Yes, the tensor should be a part of LoDTensor.

A raw pointer with a bool flag indicating whether this LoDTensor owns the memory seems better (see the sketch below).

If someone else wants to share this Tensor, just copy the pointer.
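
A minimal sketch of that proposal (hypothetical; the field and method names are assumed for illustration):

```cpp
// Sketch: LoDTensor holds a raw Tensor* plus an ownership flag. Sharing the
// tensor just copies the pointer without taking ownership.
class LoDTensor {
 public:
  ~LoDTensor() {
    if (owns_tensor_) delete tensor_;
  }
  void set_tensor(Tensor* t, bool take_ownership = false) {
    if (owns_tensor_) delete tensor_;
    tensor_ = t;
    owns_tensor_ = take_ownership;
  }

 private:
  Tensor* tensor_ = nullptr;
  bool owns_tensor_ = false;  // whether this LoDTensor must delete tensor_
  LoD lod_;
};
```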

@qingqing01 (Contributor, Author) commented:

@Superjom @QiJune @reyoung This problem will be fixed in the next PR.

@reyoung (Collaborator) left a comment:

LGTM, except that I do not think we should get the raw pointer from the LoD information. So maybe device_vector is OK. It is up to you.

@qingqing01 merged commit 6d0d29f into PaddlePaddle:develop on Sep 12, 2017
@qingqing01 deleted the lod_tensor_py branch on November 14, 2019