add MKL Packed design doc for RNN optimization #6636

tensor-tang · 2017-12-14T14:46:07Z

Click here to review MD.

The Python API part still needs to be discussed; see #6612.

luotao1

Somewhere in the doc, please say whether TensorFlow or other frameworks use (or do not use) a packing technique like this.

luotao1 · 2017-12-15T03:28:00Z

doc/design/mkl/mkl_packed.md

@@ -0,0 +1,91 @@
+# Intel® MKL Packed Optimization on PaddlePaddle: Design Doc


You can drop "Optimization" from the title; it currently wraps onto two lines, which is a bit long.

luotao1 · 2017-12-15T03:29:16Z

doc/design/mkl/mkl_packed.md

+
+## Overview
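+We plan to integrate the GEMM Packed APIs\[[1](#references)\] introduced in Intel® MKL into PaddlePaddle, to take full advantage of Intel platforms and improve PaddlePaddle's performance on Intel architecture.
+At this stage the optimization mainly targets the Recurrent Neural Network(hereafter RNN) layers (including `RecurrentLayer`, `GatedRecurrentLayer`, and `LstmLayer`) and the PaddlePaddle V1 API.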


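Use Chinese full-width parentheses around "以下简称RNN" ("hereafter RNN").

This should apply to both the V1 and V2 APIs, right?

> Use Chinese full-width parentheses around "以下简称RNN" ("hereafter RNN").

OK, thx.

> This should apply to both the V1 and V2 APIs, right?

It should apply to both, but most of the validation work so far has focused on V1, so the doc covers V1 first.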

luotao1 · 2017-12-15T03:33:12Z

doc/design/mkl/mkl_packed.md

+## Key Points
+
+### Background
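+To achieve the best performance, cblas_?gemm in Intel® MKL converts the original data into a Packed format better suited to Intel platforms before computing. This format conversion (Packing) is relatively expensive when the problem itself involves little computation.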


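cblas_?gemm: is the question mark in the middle a typo? Same question for the occurrences below.

Line 25 could be changed to: PaddlePaddle currently uses the cblas_gemm library from Intel® MKL, which converts the original data into a Packed format better suited to Intel platforms before computing. However, this format conversion (Packing) is relatively expensive when the problem itself involves little computation (for example RNN, where the matrix size is typically XXX).

The sentence could be reorganized a bit.

The Chinese text renders Packed as 转换 ("convert"); shouldn't it be 打包 ("pack")?

> cblas_?gemm: is the question mark in the middle a typo?

It is not a typo; the official site writes it this way: https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm
The ? indicates that the precision is selectable.

> Line 25 could be changed to: PaddlePaddle currently uses the cblas_gemm library from Intel® MKL, which converts the original data into a Packed format better suited to Intel platforms before computing. However, this format conversion (Packing) is relatively expensive when the problem itself involves little computation (for example RNN, where the matrix size is typically XXX).

cblas_gemm is a function in the MKL library; it is not a library itself.

The sentence itself can indeed be reworked the way you suggest. Thx.

> The Chinese text renders Packed as 转换 ("convert"); shouldn't it be 打包 ("pack")?

The official site calls the converted data format the "internal packed format" and calls the conversion process "Packing".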

luotao1 · 2017-12-15T03:34:38Z

doc/design/mkl/mkl_packed.md

+
+### Background
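+To achieve the best performance, cblas_?gemm in Intel® MKL converts the original data into a Packed format better suited to Intel platforms before computing. This format conversion (Packing) is relatively expensive when the problem itself involves little computation.
+In some existing cases (for example RNN), multiple calls to cblas_?gemm use the same original data, so repeating Packing on that data in every call is redundant.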


Moreover, because RNN calls cblas_gemm many times with the same original data, repeating Packing on that data in every call becomes redundant.

OK, I'll rework it, thx.

luotao1 · 2017-12-15T03:38:38Z

doc/design/mkl/mkl_packed.md

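+With these APIs, we can pack the original data first and then pass the data, already converted to the Packed format, to the gemm_compute calls that reuse it, thereby avoiding redundant Packing.
+
+### Solution
+In the RNN case, all time states within a single forward/backward pass share the same weight matrix. When only doing inference, every forward pass also uses the same weight matrix, so there is no need to repeat the packing of the weight for each time state in each forward pass.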


Use Chinese terms wherever possible:

forward/backward: 前向, 后向

time state: 时间步

weight: 权重

inference: 推断
Same below.

OK, thx.
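
For readers following this hunk, the pack-once, compute-many idea it describes can be sketched in plain Python. This is only a toy model under stated assumptions: nothing here calls Intel® MKL, the helper names `pack`, `gemm_compute`, and `gemm` are hypothetical, and "packing" is simulated by a simple row-major to column-major conversion rather than MKL's internal packed format.

```python
# Toy model of the pack-once / compute-many pattern. Nothing here calls MKL;
# pack(), gemm_compute() and gemm() are hypothetical stand-ins.

def pack(b):
    """Convert row-major B (k x n) into a list of its columns ("packed")."""
    return [list(col) for col in zip(*b)]


def gemm_compute(a, packed_b):
    """Multiply A (m x k) by B, reading B from its pre-packed columns."""
    return [[sum(x * y for x, y in zip(row, col)) for col in packed_b]
            for row in a]


def gemm(a, b):
    """Reference path: converts B on every call, then computes."""
    return gemm_compute(a, pack(b))


weight = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # shared 3 x 2 matrix
inputs = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]], [[1.0, 1.0, 1.0]]]

packed_weight = pack(weight)                     # conversion happens once
packed_results = [gemm_compute(x, packed_weight) for x in inputs]
naive_results = [gemm(x, weight) for x in inputs]  # converts on every call
```

Both paths produce the same products; the only difference is how often the shared matrix is converted, which is the redundancy the hunk sets out to remove.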

luotao1 · 2017-12-15T03:39:38Z

doc/design/mkl/mkl_packed.md

+### Solution
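+In the RNN case, all time states within a single forward/backward pass share the same weight matrix. When only doing inference, every forward pass also uses the same weight matrix, so there is no need to repeat the packing of the weight for each time state in each forward pass.
+
+Using the newly introduced GEMM Packed APIs, we pack the weight once at layer init, reuse the packed weight during forward/backward, and repack it after every weight update.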


Does "GEMM Packed APIs" refer to the four APIs above? Where they are first listed, you could state that they are the GEMM Packed APIs.
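
The lifecycle described in this hunk (pack at layer init, reuse during forward, repack after each weight update) can be illustrated with a small Python sketch. The class `ToyRecurrentLayer`, its methods, and the simplified recurrence are all hypothetical and exist only for illustration; they are not PaddlePaddle's layers and do not call MKL.

```python
class ToyRecurrentLayer:
    """Hypothetical layer that caches a converted copy of its weight."""

    def __init__(self, weight):
        self.weight = weight
        self.pack_count = 0
        self._repack()                     # pack once at layer init

    def _repack(self):
        # Stand-in for the real format conversion: store the weight columns.
        self.packed = [list(col) for col in zip(*self.weight)]
        self.pack_count += 1

    def forward(self, sequence):
        # Every time step reuses the cached copy; no conversion happens here.
        state = [0.0] * len(self.packed)
        for x in sequence:
            state = [sum((xi + si) * w for xi, si, w in zip(x, state, col))
                     for col in self.packed]
        return state

    def update_weight(self, new_weight):
        self.weight = new_weight
        self._repack()                     # repack after each weight update


sequence = [[1.0, 2.0], [3.0, 4.0]]        # two time steps
layer = ToyRecurrentLayer([[1.0, 0.0], [0.0, 1.0]])
for _ in range(5):
    first_output = layer.forward(sequence)
packs_before_update = layer.pack_count

layer.update_weight([[2.0, 0.0], [0.0, 2.0]])
second_output = layer.forward(sequence)
packs_after_update = layer.pack_count
```

In this toy model the conversion counter moves only at init and on weight updates, regardless of how many forward passes or time steps run in between.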

luotao1 · 2017-12-15T03:41:34Z

doc/design/mkl/mkl_packed.md

+
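+Using the newly introduced GEMM Packed APIs, we pack the weight once at layer init, reuse the packed weight during forward/backward, and repack it after every weight update.
+
+* Before optimization, for a model with sequence length = `T`, the number of Packing operations performed over `N` iterations is: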


sequence_length: 序列长度

iteration: 迭代

Is "model" correct here? There should be no notion of a model at this point.

OK, thx.

model ==> 网络模型 ("network model").

tensor-tang · 2017-12-15T06:00:50Z

Thanks, done.

luotao1 · 2017-12-15T06:09:48Z

doc/design/mkl/mkl_packed.md

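+2. Redundant conversion \
+Because in some existing cases (for example RNN) multiple calls to cblas_?gemm use the same original data, repeating Packing on that data in every call is redundant.
+
+To minimize the time that repeated cblas_?gemm calls spend on Packing, Intel® MKL introduced the following four APIs: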


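You could add a reference for cblas_?gemm: https://software.intel.com/en-us/mkl-developer-reference-c-cblas-gemm to explain that the ? stands for four different precisions.

Sure, no problem. It can be made into a link directly.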
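
To make the `?` placeholder discussed above concrete, here is a minimal Python sketch that expands `cblas_?gemm` into the four standard CBLAS precision variants. The helper `expand` is hypothetical. The thread does not say which precisions the packed routines support, so the sketch expands only the plain gemm name.

```python
# Standard CBLAS precision prefixes that the ? placeholder stands for.
PRECISIONS = {
    "s": "single-precision real",
    "d": "double-precision real",
    "c": "single-precision complex",
    "z": "double-precision complex",
}


def expand(pattern):
    """Replace the ? placeholder with each precision prefix in turn."""
    return [pattern.replace("?", prefix) for prefix in PRECISIONS]


gemm_names = expand("cblas_?gemm")
```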

luotao1 · 2017-12-15T06:13:40Z

doc/design/mkl/mkl_packed.md

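+With these APIs, we can pack the original data first and then pass the data, already converted to the Packed format, to the gemm_compute calls that reuse it, thereby avoiding redundant Packing.
+
+### Solution
+In the RNN case, all **time steps** (时间步) within a single **forward/backward** (前向/后向) pass share the same **weight** (权重). When only doing **inference** (预测), every **forward** pass also uses the same **weight**, so there is no need to repeat the Packing of the **weight** for each **time step** in each **forward** pass.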


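These keywords do not need to be bold; they are all very common. Same below.

inference should be 推断: for example, https://github.com/PaddlePaddle/Mobile/blob/develop/Demo/iOS/AICamera/README_cn.md#目录结构 says "如何调用Paddle C API进行离线推断" ("how to call the Paddle C API for offline inference").

The bold was meant for emphasis and to make the terms easy to find. It can be removed if you think it is unnecessary.

OK.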

tensor-tang

OK, done

tensor-tang · 2017-12-15T06:59:53Z

doc/design/mkl/mkl_packed.md

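+2. Redundant conversion \
+Because in some existing cases (for example RNN) multiple calls to cblas_?gemm use the same original data, repeating Packing on that data in every call is redundant.
+
+To minimize the time that repeated cblas_?gemm calls spend on Packing, Intel® MKL introduced the following four APIs:


Sure, no problem. It can be made into a link directly.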

tensor-tang · 2017-12-15T07:03:14Z

doc/design/mkl/mkl_packed.md

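+With these APIs, we can pack the original data first and then pass the data, already converted to the Packed format, to the gemm_compute calls that reuse it, thereby avoiding redundant Packing.
+
+### Solution
+In the RNN case, all **time steps** (时间步) within a single **forward/backward** (前向/后向) pass share the same **weight** (权重). When only doing **inference** (预测), every **forward** pass also uses the same **weight**, so there is no need to repeat the Packing of the **weight** for each **time step** in each **forward** pass.


The bold was meant for emphasis and to make the terms easy to find. It can be removed if you think it is unnecessary.

OK.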

luotao1

LGTM. Next time, when adding the Python API, you could also add the corresponding links for the four functions such as cblas_?gemm_alloc.

tensor-tang · 2017-12-15T08:38:24Z

OK, no problem.

tensor-tang requested review from wangkuiyi and luotao1 December 14, 2017 14:46

tensor-tang added the MKL label Dec 14, 2017

luotao1 reviewed Dec 15, 2017


tensor-tang commented Dec 15, 2017


luotao1 approved these changes Dec 15, 2017


luotao1 merged commit c13805e into PaddlePaddle:develop Dec 15, 2017

tensor-tang deleted the mkl branch December 15, 2017 08:38

tensor-tang added 4 commits December 15, 2017 22:33

rename mkldnn doc

acaef9a

add mkl packed desgin doc

1670ec0

follow comments and refine doc

3f2fa0a

follow comments and refine doc

c1a6870

tensor-tang mentioned this pull request Dec 17, 2017

update mkl packed design doc #6680

Merged
