update FAQ. #4379
Changes from 5 commits
8882231
458b726
c4cdddb
0464c94
611dacf
208da87
8177cf9
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -247,7 +247,7 @@ PaddlePaddle uses the name :code:`name` as a parameter's ID; the same name | |
|
|
||
| CMake Warning at cmake/version.cmake:20 (message): | ||
| Cannot add paddle version from git tag | ||
|
|
||
| the user needs to fetch all remote branches to the local machine with :code:`git fetch upstream`, and then re-run cmake. | ||
|
|
||
| 12. A protocol message was rejected because it was too big | ||
|
|
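@@ -316,7 +316,39 @@ The Paddle binary traps floating-point exceptions at runtime; whenever a floating-point | |
| * The model never converges and diverges to extremely large values. | ||
| * The training data is problematic, so the parameters converge to a singular state. Alternatively, the input data scale is too large, with some feature values reaching the millions, in which case matrix multiplication can overflow floating-point numbers. | ||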
|
|
||
| The main remedy is to reduce the learning rate or normalize the data. | ||
| There are two effective remedies: | ||
|
|
||
| 1. Set the :code:`gradient_clipping_threshold` parameter, for example: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| optimizer = paddle.optimizer.RMSProp( | ||
| learning_rate=1e-3, | ||
| gradient_clipping_threshold=10.0, | ||
| regularization=paddle.optimizer.L2Regularization(rate=8e-4)) | ||
|
|
||
| For details, see the `nmt_without_attention <https://github.com/PaddlePaddle/models/blob/develop/nmt_without_attention/train.py#L35>`_ example. | ||
|
|
||
| 2. Set the :code:`error_clipping_threshold` parameter, for example: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| decoder_inputs = paddle.layer.fc( | ||
| act=paddle.activation.Linear(), | ||
| size=decoder_size * 3, | ||
| bias_attr=False, | ||
| input=[context, current_word], | ||
| layer_attr=paddle.attr.ExtraLayerAttribute( | ||
| error_clipping_threshold=100.0)) | ||
|
|
||
| For the complete code, see the `machine translation <https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/train.py#L66>`_ example. | ||
|
|
||
| Differences between the two methods: | ||
|
|
||
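| 1. Both clip gradients, but at different times: the former is applied when the :code:`optimizer` updates the network parameters; the latter is invoked during the backward computation of the activation function; | ||
| 2. They clip different objects: the former clips the gradients of learnable parameters, while the latter clips the gradients propagated back to the preceding layer; | ||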
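The distinction above can be sketched with NumPy. This is only an illustration, assuming both thresholds clip element-wise into ``[-threshold, threshold]``; ``clip_by_threshold``, ``param_grad``, and ``output_grad`` are hypothetical names and not PaddlePaddle APIs.

```python
import numpy as np


def clip_by_threshold(grad, threshold):
    """Clip every element of grad into [-threshold, threshold] (assumed behavior)."""
    return np.clip(grad, -threshold, threshold)


# Gradient of a learnable parameter: clipped when the optimizer applies the
# update (the role played by gradient_clipping_threshold).
param_grad = np.array([0.5, -25.0, 12.0])
clipped_param_grad = clip_by_threshold(param_grad, 10.0)

# Gradient flowing back to the preceding layer: clipped during the backward
# pass of the activation (the role played by error_clipping_threshold).
output_grad = np.array([[150.0, -3.0], [-200.0, 40.0]])
clipped_output_grad = clip_by_threshold(output_grad, 100.0)

print(clipped_param_grad)
print(clipped_output_grad)
```

The same helper is applied to two different tensors on purpose: what differs between the two settings is which gradient is clipped and when, not the clipping arithmetic assumed here.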
|
|
||
| In addition, this kind of problem can also be addressed by reducing the learning rate or normalizing the data. | ||
|
|
||
| 15. ImportError: No module named v2 when running import paddle.v2 as paddle after building and installing | ||
| -------------------------------------------------------------------------------------------------------------- | ||
|
|
@@ -405,9 +437,28 @@ The content of a model parameter file saved by PaddlePaddle consists of a 16-byte header and the network parameters | |
|
|
||
| .. code-block:: python | ||
|
|
||
| out = inferer.infer(input=data_batch, flatten_result=False, field=["value"]) | ||
| out = inferer.infer(input=data_batch, field=["value"]) | ||
|
Contributor
Author
Done |
||
|
|
||
| Note that: | ||
|
|
||
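| * If 2 layers are specified as output layers, the result actually needed is two matrices; | ||
| * Suppose the output A of the first layer is an N1 * M1 matrix and the output B of the second layer is an N2 * M2 matrix; | ||
| * By default, paddle.v2 concatenates A and B horizontally; when N1 and N2 differ, the following error is raised: | ||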
|
|
||
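| Here :code:`flatten_result=False` is set, so the result is a :code:`list` whose length equals the number of output fields; each element of this :code:`list` is itself a :code:`list` made up of the corresponding field results of all output layers, and each field result is a :code:`numpy.array`. The default value of :code:`flatten_result` is :code:`True`; in that case PaddlePaddle concatenates, separately for each field, the results of all output layers by row, and if the corresponding dimensions of the output layers' :code:`numpy.array` results for that field do not match, the program cannot run correctly. | ||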
| .. code-block:: python | ||
|
|
||
| ValueError: all the input array dimensions except for the concatenation axis must match exactly | ||
|
|
||
| The concatenation fails because the output matrices of the layers have different heights. This commonly happens when: | ||
|
|
||
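| * A sequence layer and a non-sequence layer are output at the same time; | ||
| * Multiple output layers process sequences of different lengths; | ||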
|
|
||
| In this case, set :code:`flatten_result=False` when calling the infer interface to skip the concatenation step and avoid the problem above. The infer interface then returns a Python list: | ||
|
|
||
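| * The number of elements in the list equals the number of output layers in the network; | ||
| * Each element of the list is the output matrix of one layer, of type numpy ndarray; | ||
| * The height of each layer's output matrix equals the number of samples for non-sequence input, and the total number of elements in the input sequences for sequence input; its width equals the layer's size in the configuration; | ||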
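The failure mode and the workaround described above can be reproduced with plain NumPy. This is a minimal sketch, assuming the horizontal concatenation behaves like ``numpy.concatenate`` along ``axis=1``; the matrices ``A`` and ``B`` are made-up stand-ins for two layers' outputs, not real PaddlePaddle results.

```python
import numpy as np

# Hypothetical outputs of two output layers:
# A is N1 x M1 and B is N2 x M2, with N1 != N2.
A = np.zeros((4, 3))
B = np.zeros((6, 2))

# Horizontal concatenation needs the same number of rows, so it fails here.
try:
    np.concatenate([A, B], axis=1)
    concat_failed = False
except ValueError:
    concat_failed = True

# Skipping the concatenation step keeps one matrix per output layer,
# which mirrors the list returned when flatten_result=False.
results = [A, B]
shapes = [r.shape for r in results]

print(concat_failed, shapes)
```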
|
|
||
| 20. How to use the :code:`name` parameter of :code:`paddle.layer.memory` | ||
| --------------------------------------------------------------------------- | ||
|
|
@@ -503,7 +554,7 @@ PaddlePaddle currently supports 8 kinds of learning_rate_schedule; these 8 learning_rate_schedu | |
| optimizer = paddle.optimizer.Adam( | ||
| learning_rate=1e-3, | ||
| learning_rate_schedule="manual", | ||
| learning_rate_args="1:1.0,2:0.9,3:0.8",) | ||
| learning_rate_args="1:1.0,2:0.9,3:0.8",) | ||
|
|
||
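| In this example, when the number of trained passes is less than or equal to 1, the learning rate is :code:`1e-3 * 1.0`; when it is greater than 1 and less than or equal to 2, the learning rate is :code:`1e-3 * 0.9`; when it is greater than 2, the learning rate is :code:`1e-3 * 0.8`. | ||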
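The piecewise rule described above can be written out in a few lines of Python. This is a sketch of the arithmetic only, under the assumption that each ``boundary:rate`` pair applies to pass counts up to and including ``boundary`` and that the last rate is reused afterwards; ``parse_manual_schedule`` and ``manual_learning_rate`` are hypothetical helpers, not part of PaddlePaddle.

```python
def parse_manual_schedule(learning_rate_args):
    """Parse "boundary:rate,..." into a list of (boundary, rate) pairs."""
    segments = []
    for item in learning_rate_args.split(","):
        item = item.strip()
        if not item:  # tolerate a trailing comma
            continue
        boundary, rate = item.split(":")
        segments.append((int(boundary), float(rate)))
    return segments


def manual_learning_rate(base_lr, learning_rate_args, num_passes):
    """Return base_lr scaled by the rate of the first segment covering num_passes."""
    segments = parse_manual_schedule(learning_rate_args)
    for boundary, rate in segments:
        if num_passes <= boundary:
            return base_lr * rate
    # Past the last boundary: keep using the final rate.
    return base_lr * segments[-1][1]


for passes in (1, 2, 3, 10):
    print(passes, manual_learning_rate(1e-3, "1:1.0,2:0.9,3:0.8", passes))
```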
|
|
||
|
|
@@ -512,3 +563,30 @@ PaddlePaddle currently supports 8 kinds of learning_rate_schedule; these 8 learning_rate_schedu | |
|
|
||
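| This error usually occurs because the user has set the :code:`name` parameter of different layers to the same value. When it occurs, first find the layers whose :code:`name` values are identical, then set their :code:`name` parameters to distinct values. | ||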
|
|
||
| 24. Differences between the recurrent layers in PaddlePaddle | ||
| -------------------------------------------------------------- | ||
| Taking LSTM as an example, PaddlePaddle contains the following recurrent layers: | ||
|
Contributor
Taking LSTM as an example, PaddlePaddle has the following LSTM-related layers: |
||
|
|
||
| * :code:`paddle.layer.lstmemory` | ||
| * :code:`paddle.networks.simple_lstm` | ||
| * :code:`paddle.networks.lstmemory_group` | ||
| * :code:`paddle.networks.bidirectional_lstm` | ||
|
|
||
| By implementation, they fall into 2 categories: | ||
|
|
||
| 1. Recurrent layers implemented with recurrent_group: | ||
|
|
||
|
|
||
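| * When using this category of recurrent layer, the user can access the intermediate values computed by the recurrent unit within one time step (for example, hidden states and memory cells); | ||
| * The :code:`paddle.networks.lstmemory_group` listed above belongs to this category; | ||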
|
|
||
| 2. Recurrent layers implemented as a single unit: | ||
|
|
||
| * When using this category of recurrent layer, the user can only access its output values; | ||
| * The :code:`paddle.layer.lstmemory`, :code:`paddle.networks.simple_lstm`, and :code:`paddle.networks.bidirectional_lstm` listed above belong to this category; | ||
|
|
||
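| Implementing the recurrent layer as a single unit allows more optimization for CPU and GPU computation, so compared with the recurrent group implementation, the second category of recurrent layer is more computationally efficient. In practice, if you do not need to access the LSTM's intermediate variables and only need the output computed by the recurrent layer, we recommend the second category. | ||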
|
|
||
| In addition, for LSTM, PaddlePaddle also includes the computation unit :code:`paddle.networks.lstmemory_unit`: | ||
|
|
||
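| * Unlike the recurrent layers described above, :code:`paddle.networks.lstmemory_unit` defines the computation of an LSTM unit within one time step; it is not a complete recurrent layer and cannot take sequence data as input; | ||
| * :code:`paddle.networks.lstmemory_unit` can only be used as the step function inside recurrent_group; | ||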
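The split between a one-step unit and a loop over the sequence can be illustrated outside PaddlePaddle. The NumPy sketch below uses the textbook LSTM equations without peephole connections, which is an assumption and may differ from PaddlePaddle's own implementation; ``lstm_step`` and ``run_over_sequence`` are hypothetical names that only mirror the roles of a step function and of the group that applies it at every time step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step: the kind of computation a step function defines."""
    size = h_prev.shape[0]
    gates = W @ x_t + U @ h_prev + b        # shape: (4 * size,)
    i = sigmoid(gates[0 * size:1 * size])   # input gate
    f = sigmoid(gates[1 * size:2 * size])   # forget gate
    o = sigmoid(gates[2 * size:3 * size])   # output gate
    g = np.tanh(gates[3 * size:4 * size])   # candidate cell value
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t


def run_over_sequence(xs, size, W, U, b):
    """Apply lstm_step at every time step and keep the intermediate values."""
    h = np.zeros(size)
    c = np.zeros(size)
    hidden_states, memory_cells = [], []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
        hidden_states.append(h)
        memory_cells.append(c)
    return np.stack(hidden_states), np.stack(memory_cells)


rng = np.random.default_rng(0)
input_dim, size, seq_len = 3, 4, 5
W = rng.standard_normal((4 * size, input_dim)) * 0.1
U = rng.standard_normal((4 * size, size)) * 0.1
b = np.zeros(4 * size)
xs = rng.standard_normal((seq_len, input_dim))

hidden_states, memory_cells = run_over_sequence(xs, size, W, U, b)
print(hidden_states.shape, memory_cells.shape)
```

Because the loop is written explicitly, both the hidden states and the memory cells of every step are available, which corresponds to the first category above; a layer implemented as a single unit would expose only its final outputs.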
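Change items 1 and 2 here to:
the gradient_clipping_threshold parameter; the error_clipping_threshold parameter; then elaborate on the differences between the two further below, since they cannot be explained clearly inside a heading.
applied when the optimizer updates the network parameters; the latter is invoked during the backward computation of the activation function;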