Commit e3d0fa6

Merge pull request #4 from QiJune/feature/dynamic_net_doc
Feature/dynamic net doc
2 parents 4a94baa + 3e5d22a commit e3d0fa6

File tree

6 files changed: +46 -13 lines changed


doc/design/dynamic_net/00.how_to_implenment_dynamic_net.md

Lines changed: 22 additions & 9 deletions
@@ -1,6 +1,6 @@
 # The Implementation of Dynamic Neural Networks
 
-Dynamic networks are currently a frontier topic for neural network frameworks. Their advantage is that they solve an important problem of ordinary neural network frameworks: **the definition and the computation of a neural network are separated**. That is, the computation procedure of an ordinary neural network framework is to first define the computation graph of a neural network and then evaluate that graph with a computation engine. A dynamic neural network, by contrast, evaluates every operation directly, defines the computation graph implicitly, and then back-propagates through this implicit graph.
+Dynamic networks are currently a frontier topic for neural network frameworks. Their advantage is that they solve an important problem of ordinary neural network frameworks: **the definition and the computation of a neural network are separated**. That is, the computation procedure of a static neural network framework is to first define the computation graph of a neural network and then evaluate that graph with a computation engine. A dynamic neural network, by contrast, evaluates every operation directly, defines the computation graph implicitly, and then back-propagates through this implicit graph.
 
 The common usage is as follows:
 
@@ -12,7 +12,7 @@ x.fill([0.058, 0.548, ...])
 y = paddle.dyn.data(type=Integer(10))
 y.fill(9)
 
-hidden = paddle.dyn.fc(input=y, size=200)
+hidden = paddle.dyn.fc(input=x, size=200)
 
 # You can use hidden.npvalue() to get this layer's value now.
 
@@ -31,18 +31,31 @@ parameters.update()
 
 ## Problems Solved by Dynamic Neural Networks
 
-A dynamic neural network only has the computation step of a neural network and hides the definition step. The problems it solves are:
+A dynamic neural network only has the computation step of a neural network and hides the definition step, so the user can define a different network for every sample or batch. Compared with a static neural network, a dynamic neural network solves the following problems:
 
-* Arbitrary non-linear operations, such as `if`, can be added during the computation, and the computation graph can differ for different data, e.g. tree-structured neural networks.
+* Arbitrary complex control logic, such as iteration, recursion, and conditional selection, can be added during the computation, and all of this control logic can be implemented in the host language (C++/Python).
+* More complex data types can be supported, and the computation graph can differ for different data.
+* The execution of a dynamic neural network is also its definition, so the user can directly evaluate the parameters, intermediate results, and other information in the network, which makes debugging convenient.
 
-// TODO(qijun): Complete this docs
-
-TBD
 
 ## Implementation Approach for Dynamic Neural Networks
 
-TBD
+The computation graph of a dynamic neural network is defined implicitly; its design philosophy can be found in autograd libraries (e.g. https://github.com/HIPS/autograd). The concrete implementation approach is as follows:
+
+
+1. For each sample, the user defines the neural network structure by composing layers. Every sample owns a graph structure that records that sample's computation graph.
+2. The graph contains the information of every layer, including where its input data comes from, the operation the layer performs, the size of its output, and so on. The information of newly connected layers keeps being appended to the graph.
+3. Layer evaluation is lazy: only when the user explicitly calls the value() method does the execute engine actually run the computation graph recorded in the graph and compute that layer's output. Normally the network is evaluated when forward() is executed.
+4. The user can add control logic while composing layers; the information of the selected branch is also recorded in the graph.
+5. When backward() is executed, the graph's execute engine performs differentiation according to the recorded computation graph and computes the gradients.
+
+
+
 
 ## Requirements of Dynamic Neural Networks on the Framework
 
-TBD
+* The core requirement is that building the computation graph be lightweight enough. The backend is implemented in C++, and a dedicated memory / GPU-memory management strategy should be considered. The frontend Python wrapper must also be thin enough to use the interfaces provided by the C++ backend directly.
+
+* Since layer evaluation is lazy, expression templates can be used to optimize the computation.
+
+* Batching data of different sizes / different network structures for training needs to be considered. In a dynamic network, every sample owns its own computation graph, so compared with a static network, parallel execution on the GPU is relatively difficult.
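To make the recording / lazy-evaluation / backward flow described in the steps above concrete, here is a minimal, self-contained define-by-run sketch in plain Python + NumPy: each operation call appends a node to an implicit graph, value() evaluates the recorded graph lazily, and backward() differentiates it in reverse topological order. This is only an illustration of the design idea; `Node`, `data`, `mul`, `sum_all` and the other names are hypothetical and are not part of the proposed `paddle.dyn` API, whose graph and execute engine would live in the C++ backend as the requirements above note.

```python
import numpy as np


class Node(object):
    """One entry in a per-sample graph: the op, its inputs, and a lazily computed value."""

    def __init__(self, op, inputs, compute, grad_fns):
        self.op = op              # operation name, kept for debugging
        self.inputs = inputs      # upstream Node objects
        self.compute = compute    # forward function, run lazily on the input values
        self.grad_fns = grad_fns  # one function per input: upstream grad -> input grad
        self._value = None
        self.grad = None

    def value(self):
        # Lazy evaluation: the recorded graph is executed only when a value is requested.
        if self._value is None:
            self._value = self.compute([x.value() for x in self.inputs])
        return self._value

    def backward(self):
        # Reverse-mode differentiation over the implicitly recorded graph.
        self.grad = np.ones_like(self.value())
        for node in reversed(topo_sort(self)):
            vals = [x.value() for x in node.inputs]
            for inp, grad_fn in zip(node.inputs, node.grad_fns):
                g = grad_fn(node.grad, vals)
                inp.grad = g if inp.grad is None else inp.grad + g


def topo_sort(root):
    order, seen = [], set()

    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for i in n.inputs:
            visit(i)
        order.append(n)

    visit(root)
    return order


def data(array):
    # A leaf node holding user-provided data.
    return Node('data', [], lambda _: np.asarray(array, dtype='float64'), [])


def mul(a, b):
    # Element-wise multiply; calling it appends a node to the implicit graph.
    return Node('mul', [a, b],
                lambda vals: vals[0] * vals[1],
                [lambda g, vals: g * vals[1],   # d(a*b)/da
                 lambda g, vals: g * vals[0]])  # d(a*b)/db


def sum_all(a):
    # Sum-reduction to a scalar "loss".
    return Node('sum', [a],
                lambda vals: np.sum(vals[0]),
                [lambda g, vals: g * np.ones_like(vals[0])])


x = data([1.0, 2.0, 3.0])
w = data([0.5, 0.5, 0.5])
loss = sum_all(mul(x, w))  # the graph is only recorded here, nothing is computed yet
print(loss.value())        # 3.0 -- the lazy forward pass runs now
loss.backward()
print(w.grad)              # [1. 2. 3.] -- gradient of the loss w.r.t. w
```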

doc/templates/conf.py.cn.in

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ extensions = [
     'sphinx.ext.napoleon',
     'sphinx.ext.graphviz'
 ]
+mathjax_path="https://cdn.bootcss.com/mathjax/2.7.0/MathJax.js"
 table_styling_embed_css = True
 
 autodoc_member_order = 'bysource'

paddle/gserver/layers/HierarchicalSigmoidLayer.h

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ namespace paddle {
 * | |- 5
 * |
 * |-*- 0
-* |- 1
+* |- 1
 * @endcode
 *
 * where * indicates an internal node, and each leaf node represents a class.

paddle/scripts/docker/README.md

Lines changed: 3 additions & 3 deletions
@@ -94,7 +94,7 @@ docker build -t paddle:dev --build-arg UBUNTU_MIRROR=mirror://mirrors.ubuntu.com
 Given the development image `paddle:dev`, the following command builds PaddlePaddle from the source tree on the development computer (host):
 
 ```bash
-docker run -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TEST=OFF" -e "RUN_TEST=OFF" paddle:dev
+docker run --rm -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" -e "WITH_TEST=OFF" -e "RUN_TEST=OFF" paddle:dev
 ```
 
 This command mounts the source directory on the host into `/paddle` in the container, so the default entry point of `paddle:dev`, `build.sh`, could build the source code with possible local changes. When it writes to `/paddle/build` in the container, it writes to `$PWD/build` on the host indeed.
@@ -110,7 +110,7 @@ Users can specify the following Docker build arguments with either "ON" or "OFF"
 - `WITH_AVX`: ***Required***. Set to "OFF" prevents from generating AVX instructions. If you don't know what is AVX, you might want to set "ON".
 - `WITH_TEST`: ***Optional, default OFF***. Build unit tests binaries. Once you've built the unit tests, you can run these test manually by the following command:
 ```bash
-docker run -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" paddle:dev sh -c "cd /paddle/build; make coverall"
+docker run --rm -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=ON" paddle:dev sh -c "cd /paddle/build; make coverall"
 ```
 - `RUN_TEST`: ***Optional, default OFF***. Run unit tests after building. You can't run unit tests without building it.
 
@@ -129,7 +129,7 @@ This production image is minimal -- it includes binary `paddle`, the shared libr
 Again the development happens on the host. Suppose that we have a simple application program in `a.py`, we can test and run it using the production image:
 
 ```bash
-docker run -it -v $PWD:/work paddle /work/a.py
+docker run --rm -it -v $PWD:/work paddle /work/a.py
 ```
 
 But this works only if all dependencies of `a.py` are in the production image. If this is not the case, we need to build a new Docker image from the production image and with more dependencies installs.

python/paddle/v2/dataset/wmt14.py

Lines changed: 13 additions & 0 deletions
@@ -15,8 +15,10 @@
 wmt14 dataset
 """
 import tarfile
+import gzip
 
 from paddle.v2.dataset.common import download
+from paddle.v2.parameters import Parameters
 
 __all__ = ['train', 'test', 'build_dict']
 
@@ -25,6 +27,9 @@
 # this is a small set of data for test. The original data is too large and will be add later.
 URL_TRAIN = 'http://paddlepaddle.cdn.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz'
 MD5_TRAIN = 'a755315dd01c2c35bde29a744ede23a6'
+# this is the pretrained model, whose bleu = 26.92
+URL_MODEL = 'http://paddlepaddle.bj.bcebos.com/demo/wmt_14/wmt14_model.tar.gz'
+MD5_MODEL = '6b097d23e15654608c6f74923e975535'
 
 START = "<s>"
 END = "<e>"
@@ -103,5 +108,13 @@ def test(dict_size):
         download(URL_TRAIN, 'wmt14', MD5_TRAIN), 'test/test', dict_size)
 
 
+def model():
+    tar_file = download(URL_MODEL, 'wmt14', MD5_MODEL)
+    with gzip.open(tar_file, 'r') as f:
+        parameters = Parameters.from_tar(f)
+    return parameters
+
+
 def fetch():
     download(URL_TRAIN, 'wmt14', MD5_TRAIN)
+    download(URL_MODEL, 'wmt14', MD5_MODEL)
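For reference, a minimal usage sketch of the new `model()` helper added in this diff (anything beyond the module path `paddle.v2.dataset.wmt14` and the code shown above is an assumption):

```python
import paddle.v2.dataset.wmt14 as wmt14

# Download (and cache) the pretrained WMT-14 model tarball, then load it
# with Parameters.from_tar(), exactly as model() above does.
parameters = wmt14.model()

# `parameters` is a paddle.v2.parameters.Parameters object; it can be passed
# to any v2 API that accepts pretrained parameters, e.g. when running
# inference for the machine-translation demo.
```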

python/paddle/v2/trainer.py

Lines changed: 6 additions & 0 deletions
@@ -52,6 +52,12 @@ def __init__(self, cost, parameters, update_equation):
         self.__topology__ = topology
         self.__parameters__ = parameters
         self.__topology_in_proto__ = topology.proto()
+
+        # In local mode, disable sparse_remote_update.
+        for param in self.__topology_in_proto__.parameters:
+            if param.sparse_remote_update:
+                param.sparse_remote_update = False
+
         self.__data_types__ = topology.data_type()
         gm = api.GradientMachine.createFromConfigProto(
             self.__topology_in_proto__, api.CREATE_MODE_NORMAL,
