
Conversation


@dzhwinter dzhwinter commented Mar 11, 2018

fix #8594.
This PR contains four operators: sequence_softmax, sequence_softmax_grad, softmax, and softmax_grad.
Taking the sequence_softmax op as an example, compared with the previous implementation, the sequence_softmax operator's time cost is now lower than the mul operator's.
The time cost dropped from 1.94211 to 0.981581 per minibatch.

before optimization

Event                                       Calls       Total       Min.        Max.        Ave.
thread0::sum                                2080        208.066     0.031584    1.06134     0.100032
thread0::sequence_softmax                   67          130.121     0.078336    9.44381     1.94211
thread0::mul_grad                           741         121.969     0.062656    1.37158     0.164601
thread0::lod_tensor_to_array                2           85.2417     30.0434     55.1984     42.6209

after optimization

Event                                     Calls       Total       Min.        Max.        Ave.
thread0::sum                              204128      17150.6     0.011904    8.13488     0.0840188
thread0::mul_grad                         72729       11242.9     0.042464    6.78758     0.154587
thread0::sequence_softmax_grad            6575        6767.53     0.039008    7.94013     1.02928
thread0::sequence_softmax                 6575        6453.9      0.044608    8.09661     0.981581

@dzhwinter dzhwinter changed the title from "Speed/softmax cudnn" to "[Speed]implement cudnn softmax cudnn" Mar 11, 2018
@dzhwinter dzhwinter changed the title from "[Speed]implement cudnn softmax cudnn" to "[Speed]implement cudnn sequence softmax cudnn" Mar 12, 2018
@jacquesqiao
Member

Do we have benchmark data for this new operator?

auto& dev_ctx = ctx.template device_context<platform::CUDADeviceContext>();
use_cudnn &= dev_ctx.cudnn_handle() != nullptr;
}
#endif
Member

@QiJune QiJune Mar 12, 2018


The logic of getting use_cudnn here is a little complex. Could you refine it and make it more readable?

Contributor Author

Done.

@dzhwinter
Contributor Author

@QiJune @jacquesqiao The PR has been updated.

using Tensor = framework::Tensor;

template <typename DeviceContext, typename T>
class SoftmaxCUDNNKernel : public framework::OpKernel<T> {
Member

Since the CUDNN kernel is only supported on GPU, the DeviceContext template parameter can be dropped:

template <typename T>
class SoftmaxCUDNNKernel : public framework::OpKernel<T> {
....
  context.cuda_device_context()
...
}

Contributor Author

done.

library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
Member

framework::LibraryType library_ = framework::LibraryType::kPlain;
if (use_cudnn && runtime_cudnn_support) {
  library_ = framework::LibraryType::kCUDNN;
}

Contributor Author

done.

using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;

template <typename DeviceContext, typename T>
Member

No need to take DeviceContext as a template parameter.

Contributor Author

done.

library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
Member

framework::LibraryType library_ = framework::LibraryType::kPlain;
if (use_cudnn && runtime_cudnn_support) {
  library_ = framework::LibraryType::kCUDNN;
}

Contributor Author

done.

Member

@QiJune QiJune left a comment

LGTM!

@dzhwinter dzhwinter merged commit 128adf5 into PaddlePaddle:develop Mar 15, 2018


Development

Successfully merging this pull request may close these issues.

[Speed] accelerate the softmax cuda implementation

3 participants