Correctly use host_vector in LoDTensor and expose LoDTensor to Python. #4001
Conversation
self.set_lod(lod);
#else
paddle::framework::LoD new_lod;
new_lod.reserve(lod.size());
Expose the other interfaces too, like SliceLevels, SliceInLevel, NumLevels, and NumElements; users may need them as well.
What's more, LODSliceInLevel and LoDSliceLevels may also become operators, which could be used in RNN or some other ops. These can be tracked as new issues.
SliceLevels, SliceInLevel, NumLevels, NumElements
Right now, all these interfaces are only used in C++ code. Just like Tensor, many interfaces are not exposed. I prefer to add them in the future if they are really needed in Python.
call the set_tensor and set_lod functions to set them.
)DOC")
.def("set_tensor",
These two interfaces are too trivial; adding another constructor, __init__(lod, tensor), may be better, because LoDTensor is one complete concept, not a union of two different concepts (the implementation is, but that should be hidden from the user).
Yes, in Paddle we have a DataConverter to convert the user's data into an argument.
Should we implement a DataConverter, or just expose a user-friendly interface in LoDTensor?
@Superjom I added an __init__ constructor in pybind. But I think set_tensor and set_lod are still needed: we usually get an empty LoDTensor from the variable and then need to set its tensor and lod, and sometimes we only need to set the tensor for the case without lod info. So set_tensor and set_lod are still kept.
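A rough sketch of what such a binding could look like; the factory-style __init__, the helper name, and the assumed LoDTensor(const LoD&, Tensor*) constructor are illustrative, not the final pybind.cc code:

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include "paddle/framework/lod_tensor.h"

namespace py = pybind11;
using paddle::framework::LoD;
using paddle::framework::LoDTensor;
using paddle::framework::Tensor;

void BindLoDTensor(py::module &m) {  // hypothetical helper name
  py::class_<LoDTensor>(m, "LoDTensor")
      // one-step construction, so Python sees LoDTensor as a single concept;
      // the nested Python lists are copied into LoD (a host_vector on GPU builds)
      .def(py::init([](const std::vector<std::vector<size_t>> &lod, Tensor *t) {
        LoD new_lod(lod.begin(), lod.end());
        return LoDTensor(new_lod, t);  // assumes this constructor exists
      }))
      // setters kept for the common pattern of filling in an empty LoDTensor
      // fetched from a Variable (sometimes only the tensor is set)
      .def("set_tensor", [](LoDTensor &self, Tensor *t) { self.set_tensor(t); })
      .def("set_lod", [](LoDTensor &self, const std::vector<std::vector<size_t>> &lod) {
        self.set_lod(LoD(lod.begin(), lod.end()));
      });
}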
#else
template <typename T>
using Vector = thrust::host_vector<T>;  // old alias
using Vector = thrust::host_vector<
    T, thrust::system::cuda::experimental::pinned_allocator<T>>;  // new alias in this PR
just thrust::device_vector is OK.
Yes, I tested thrust::device_vector and it can also be used. But the way to get the raw pointer differs between the CPU and GPU cases.
For example:
// with the pinned host_vector, the raw pointer comes straight from data():
std::vector<Vector<size_t>> lod = {{0, 2, 4}};
size_t* cpu_ptr = lod[0].data();
// with thrust::device_vector, a raw_pointer_cast is needed before the pointer
// can be passed to a GPU kernel:
std::vector<thrust::device_vector<size_t>> lod_gpu = {{0, 2, 4}};
size_t* gpu_ptr = thrust::raw_pointer_cast(lod_gpu[0].data());
Both methods can satisfy our needs. But because of the different usage for CPU and GPU, I prefer thrust::system::cuda::experimental::pinned_allocator<T>; I can change it to thrust::device_vector in the future if any problem comes up.
Why do we need the raw pointer? Maybe just using thrust::transform is OK.
@reyoung For the sequence operators, we need to pass the lod as an input argument of a CUDA kernel, so the raw pointer (size_t*) is needed. I'm not sure thrust::transform can always be used for every kernel.
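A hypothetical kernel of the kind I mean, just to illustrate why a plain size_t* is needed; the kernel and its launch are made up for this comment, not code from this PR:

// Hypothetical sequence-average kernel: it needs the level-0 offsets as a raw
// size_t*, since a thrust vector cannot be passed to a kernel by value.
__global__ void SequenceAvgKernel(const float* in, float* out,
                                  const size_t* lod0, size_t num_seq) {
  int seq = blockIdx.x * blockDim.x + threadIdx.x;
  if (seq >= static_cast<int>(num_seq)) return;
  size_t begin = lod0[seq];
  size_t end = lod0[seq + 1];
  float sum = 0.f;
  for (size_t i = begin; i < end; ++i) sum += in[i];
  out[seq] = sum / static_cast<float>(end - begin);
}

// With the pinned host_vector, lod[0].data() already yields a size_t* the kernel
// can read; with device_vector it would be thrust::raw_pointer_cast(lod[0].data()).
// SequenceAvgKernel<<<grid, block>>>(in, out, lod[0].data(), num_seq);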
def test_float_lod_tensor(self):
    scope = core.Scope()
    var = scope.new_var("test_tensor")
    var_lod = scope.new_var("test_lod_tensor")
I am wondering whether we need two variables to represent a LoDTensor in operators, and how the tensor memory is managed.
I tried to implement InferShape in the add-two operator:
void InferShape(const framework::InferShapeContext &ctx) const override {
  auto tensor_x = ctx.Input<LoDTensor>("X")->tensor();
  auto tensor_y = ctx.Input<LoDTensor>("Y")->tensor();
  PADDLE_ENFORCE_EQ(tensor_x.dims(), tensor_y.dims(),
                    "Two input of Add Op's dimension must be same.");
  auto* lod_tensor_out = ctx.Output<LoDTensor>("Out");
  Tensor* out = new Tensor();
  out->Resize(tensor_x.dims());
  lod_tensor_out->set_tensor(out);
}
It's hard to know when to delete the pointer out.
It seems better to manage the tensor memory inside LoDTensor, or we would have to use a shared_ptr in LoDTensor:
class LoDTensor {
  std::shared_ptr<Tensor> tensor_;
  LoD lod_;
};
@QiJune Good question. Maybe class LoDTensor : public Tensor { } is better, to avoid using std::unique_ptr or std::shared_ptr?
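A minimal sketch of that inheritance idea, just for discussion (nothing here is committed code):

// LoDTensor "is-a" Tensor: the data and dims live in the Tensor base class,
// so no extra pointer (and no unique_ptr/shared_ptr) is needed for ownership.
class LoDTensor : public Tensor {
 public:
  LoDTensor() {}
  explicit LoDTensor(const LoD& lod) : lod_(lod) {}

  void set_lod(const LoD& lod) { lod_ = lod; }
  const LoD& lod() const { return lod_; }

 private:
  LoD lod_;
};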
If we use std::unique_ptr, we can fix it in another PR. If we use class LoDTensor : public Tensor { }, I can change it in this PR.
Yes, the tensor should be a part of LoDTensor.
A raw pointer with a bool flag indicating whether this LoDTensor owns the memory seems better.
If someone else wants to share this Tensor, just copy the pointer.
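A sketch of that pointer-plus-flag design, with made-up method names, just to make the proposal concrete:

// LoDTensor keeps a raw Tensor* plus a flag saying whether it owns it; sharing
// is just copying the pointer without taking ownership.
class LoDTensor {
 public:
  ~LoDTensor() { Release(); }
  // take ownership of a freshly allocated Tensor
  void ResetTensor(Tensor* t) { Release(); tensor_ = t; owns_tensor_ = true; }
  // share a Tensor owned by someone else
  void ShareTensor(Tensor* t) { Release(); tensor_ = t; owns_tensor_ = false; }

 private:
  void Release() {
    if (owns_tensor_) delete tensor_;
    tensor_ = nullptr;
    owns_tensor_ = false;
  }
  Tensor* tensor_ = nullptr;
  bool owns_tensor_ = false;
  LoD lod_;
};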
reyoung left a comment:
LGTM, except I do not think we should get the raw pointer from LoD information.
So maybe device_vector is OK. It is up to you.
Fix #3883
- Correctly use host_vector with the thrust::system::cuda::experimental::pinned_allocator<T> allocator.
- Add lod_tensor_test.cu to test LoDTensor for GPU (see the sketch below).
- Expose LoDTensor to Python.
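For flavor, a GPU check along these lines is the kind of thing lod_tensor_test.cu is meant to cover; this sketch is illustrative and not the test actually added in the PR. A kernel mutates the level-0 offsets through the raw pointer of the pinned host_vector, and the host verifies the result.

#include <gtest/gtest.h>
#include "paddle/framework/lod_tensor.h"

// illustrative kernel: bump every offset by one through the pinned pointer
__global__ void AddOne(size_t* lod0, size_t n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < static_cast<int>(n)) lod0[i] += 1;
}

TEST(LoDTensor, PinnedLoDVisibleOnGPU) {
  paddle::framework::LoD lod(1);
  lod[0].push_back(0);
  lod[0].push_back(2);
  lod[0].push_back(4);

  AddOne<<<1, 32>>>(lod[0].data(), lod[0].size());
  cudaDeviceSynchronize();

  EXPECT_EQ(lod[0][0], 1U);
  EXPECT_EQ(lod[0][2], 5U);
}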