[Speed]implement cudnn sequence softmax cudnn #8978
Conversation
Do we have benchmark data using this new operator?
ctx.template device_context<platform::CUDADeviceContext>();
use_cudnn &= dev_ctx.cudnn_handle() != nullptr;
}
#endif
The logic of getting use_cudnn here is a little complex. Could you refine it and make it more readable?
Done.
@QiJune @jacquesqiao Updated.
using Tensor = framework::Tensor;

template <typename DeviceContext, typename T>
class SoftmaxCUDNNKernel : public framework::OpKernel<T> {
Since CUDNNKernel is only supported on GPU, the DeviceContext template parameter is unnecessary:
template <typename T>
class SoftmaxCUDNNKernel : public framework::OpKernel<T> {
....
context.cuda_device_context()
...
}
done.
library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
framework::LibraryType library_ = framework::LibraryType::kPlain;
if (use_cudnn && runtime_cudnn_support) {
library_ = framework::LibraryType::kCUDNN;
}
done.
using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;

template <typename DeviceContext, typename T>
No need to take DeviceContext as a template parameter.
done.
library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
framework::LibraryType library_ = framework::LibraryType::kPlain;
if (use_cudnn && runtime_cudnn_support) {
library_ = framework::LibraryType::kCUDNN;
}
done.
QiJune
left a comment
LGTM!
Fixes #8594.
This PR contains four operators: sequence_softmax, sequence_softmax_grad, softmax, and softmax_grad.
Taking the sequence_softmax op as an example, compared with the previous implementation, the sequence_softmax operator's time cost is now lower than the mul operator's.
The time cost drops from 1.94211 to 0.981581 per minibatch (roughly a 2x speedup).
Benchmark screenshots: before optimize / after optimize.