-
Notifications
You must be signed in to change notification settings - Fork 5.9k
gather, scatter with gpu support, passed python test #4483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
gather scatter gpu
paddle/operators/gather.cu.h
Outdated
| * return: output tensor | ||
| */ | ||
| template <typename T> | ||
| void GPUTGather(const Place& place, const Tensor* src, const Tensor* index, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GPUTGather name is not clear, and can we merge GPUGather GPUTGather together?
And we'd better change the parameter Place of GPUTGather to parameter CUDADeviceContext. CUDADeviceContext has a CUDA stream, we should launch a CUDA kernel on specific CUDA stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea. Working on it now...
They were using float64 for FP32 kernel before.
|
Please merge latest develop branch first |
paddle/operators/gather.h
Outdated
| void Gather(const platform::Place& place, const paddle::framework::Tensor* src, | ||
| const paddle::framework::Tensor* index, | ||
| paddle::framework::Tensor* output) { | ||
| void CPUGather(const platform::Place& place, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better unify the parameter of CPUGather and GPUGather, the first parameter should be DeviceContext
void CPUGather(const platform::DeviceContext& ctx...)
paddle/operators/scatter.h
Outdated
| */ | ||
| template <typename T> | ||
| void ScatterUpdate(const platform::Place& place, | ||
| void ScatterAssign(const platform::Place& place, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same with CPUGather
paddle/operators/gather.cu.h
Outdated
| * return: output tensor | ||
| */ | ||
| template <typename T> | ||
| void GPUGather(const platform::DeviceContext& ctx, const Tensor* src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the parameter is input, we should take const T&;
If the parameter is output, we should take T*;
If the parameter is both input and output, we should take T*;
Please refer to https://google.github.io/styleguide/cppguide.html#Reference_Arguments
| template <typename T> | ||
| void GPUGather(const platform::DeviceContext& ctx, const Tensor* src, | ||
| const Tensor* index, Tensor* output) { | ||
| // PADDLE_ENFORCE(platform::is_gpu_place(place)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PADDLE_ENFORCE(platform::is_gpu_place(place));
paddle/operators/scatter.cu.h
Outdated
| * return: output tensor | ||
| */ | ||
| template <typename T> | ||
| void GPUScatterAssign(const platform::DeviceContext& ctx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as CPUGather, input should be const T&
QiJune
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
No description provided.