
Conversation


@dzhwinter dzhwinter commented Mar 11, 2018

fix #8594.
This PR contains four operators: sequence_softmax, sequence_softmax_grad, softmax, and softmax_grad.
Taking the sequence_softmax op as an example, compared with the previous implementation, the sequence_softmax operator's time cost is now lower than the mul operator's.
The time cost dropped from 1.94211 to 0.981581 per minibatch.

before optimization

Event                                       Calls       Total       Min.        Max.        Ave.
thread0::sum                                2080        208.066     0.031584    1.06134     0.100032
thread0::sequence_softmax                   67          130.121     0.078336    9.44381     1.94211
thread0::mul_grad                           741         121.969     0.062656    1.37158     0.164601
thread0::lod_tensor_to_array                2           85.2417     30.0434     55.1984     42.6209

after optimization

Event                                     Calls       Total       Min.        Max.        Ave.
thread0::sum                              204128      17150.6     0.011904    8.13488     0.0840188
thread0::mul_grad                         72729       11242.9     0.042464    6.78758     0.154587
thread0::sequence_softmax_grad            6575        6767.53     0.039008    7.94013     1.02928
thread0::sequence_softmax                 6575        6453.9      0.044608    8.09661     0.981581

@dzhwinter dzhwinter changed the title from "Speed/softmax cudnn" to "[Speed]implement cudnn softmax cudnn" Mar 11, 2018
@dzhwinter dzhwinter changed the title from "[Speed]implement cudnn softmax cudnn" to "[Speed]implement cudnn sequence softmax cudnn" Mar 12, 2018
@jacquesqiao
Member

Do we have benchmark data for this new operator?

auto& dev_ctx = ctx.template device_context<platform::CUDADeviceContext>();
use_cudnn &= dev_ctx.cudnn_handle() != nullptr;
}
#endif
Member

@QiJune QiJune Mar 12, 2018


The logic of getting use_cudnn here is a little complex. Could you refine it and make it more readable?

Contributor Author

Done.

@dzhwinter
Contributor Author

@QiJune @jacquesqiao The PR has been updated.

using Tensor = framework::Tensor;

template <typename DeviceContext, typename T>
class SoftmaxCUDNNKernel : public framework::OpKernel<T> {
Member

Since the CUDNN kernel is only supported on GPU, the DeviceContext template parameter can be dropped:

template <typename T>
class SoftmaxCUDNNKernel : public framework::OpKernel<T> {
....
  context.cuda_device_context()
...
}

Contributor Author

done.

library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
Member

framework::LibraryType library_ = framework::LibraryType::kPlain;
if (use_cudnn && runtime_cudnn_support) {
  library_ = framework::LibraryType::kCUDNN;
}

Contributor Author

done.

using Tensor = framework::Tensor;
using LoDTensor = framework::LoDTensor;

template <typename DeviceContext, typename T>
Member

No need to take DeviceContext as a template parameter.

Contributor Author

done.

library_ = framework::LibraryType::kCUDNN;
} else {
library_ = framework::LibraryType::kPlain;
}
Member

framework::LibraryType library_ = framework::LibraryType::kPlain;
if (use_cudnn && runtime_cudnn_support) {
  library_ = framework::LibraryType::kCUDNN;
}

Contributor Author

done.

Member

@QiJune QiJune left a comment

LGTM!

@dzhwinter dzhwinter merged commit 128adf5 into PaddlePaddle:develop Mar 15, 2018


Development

Successfully merging this pull request may close these issues.

[Speed] accelerate the softmax cuda implementation

3 participants