paddle v2 training is slow #3675

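## Background
I am preparing a paddle job for submission to an MPI cluster (the task is to predict how long a user spends reading a piece of video content). While debugging locally on a single machine, I found that with 500 samples, batch_size=256, and 20 passes, each pass takes about 70 seconds. The full training set will be on the order of hundreds of millions of samples, so I want to optimize training speed locally first.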
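
For a rough sense of scale, the local measurement can be extrapolated to the full dataset. This is only a back-of-the-envelope sketch: it assumes throughput stays constant as the data grows and takes the full dataset to be 1e8 samples (the exact count is not known yet). The function name `seconds_per_pass` is just an illustrative helper.

```python
# Back-of-the-envelope extrapolation from the local measurement.
# Assumptions (not measured): throughput stays constant at scale,
# and the full dataset is taken to be 1e8 samples.

def seconds_per_pass(total_samples, measured_samples=500, measured_seconds=70.0):
    """Linearly scale the measured per-pass time to a larger dataset."""
    throughput = measured_samples / measured_seconds  # samples per second
    return total_samples / throughput

full_pass_seconds = seconds_per_pass(1e8)
full_pass_days = full_pass_seconds / 86400.0

print("local throughput: %.2f samples/s" % (500 / 70.0))
print("estimated time per pass at 1e8 samples: %.0f days" % full_pass_days)
```

Under these assumptions a single pass over the full data on one machine would take months, which is why the local speed matters before moving to the cluster.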
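## Current problem
Where is the performance bottleneck, and how can it be optimized?
I printed the time spent in the reader and the total processing time of each pass, as shown below: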
![88](https://user-images.githubusercontent.com/28750940/29699382-de963a0e-898e-11e7-82e7-cd5edc3eed06.png)
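
The timing code itself is not included in the screenshot. One way to separate reader time from total pass time is sketched below, assuming the reader follows the paddle v2 reader-creator convention (a zero-argument callable that returns an iterator of samples). The names `timed_reader`, `stats`, and `toy_reader` are hypothetical and only for illustration; the loop body stands in for the real training step.

```python
import time

def timed_reader(reader, stats):
    """Wrap a reader creator and accumulate the time spent producing samples.

    `reader` is a zero-argument callable returning an iterator of samples.
    `stats` is a plain dict that receives "reader_seconds" and "samples".
    """
    def wrapped():
        it = iter(reader())
        while True:
            start = time.time()
            try:
                sample = next(it)
            except StopIteration:
                # Count the final (empty) fetch too, then stop cleanly.
                stats["reader_seconds"] = stats.get("reader_seconds", 0.0) + (time.time() - start)
                return
            stats["reader_seconds"] = stats.get("reader_seconds", 0.0) + (time.time() - start)
            stats["samples"] = stats.get("samples", 0) + 1
            yield sample
    return wrapped

def toy_reader():
    # Stand-in for the real reader: yields 500 dummy (feature, label) pairs.
    for i in range(500):
        yield (i, i * 2)

stats = {}
pass_start = time.time()
for sample in timed_reader(toy_reader, stats)():
    pass  # the training step would consume the sample here
pass_seconds = time.time() - pass_start

print("reader: %.4fs, whole pass: %.4fs, samples: %d"
      % (stats["reader_seconds"], pass_seconds, stats["samples"]))
```

Only the time spent inside `next(it)` is charged to the reader, so the difference between the pass time and `reader_seconds` approximates the time spent outside data reading.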
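I then removed all of the generalized features for users and video content and kept only the id features. The reader time did not change (the fields being read are the same as before; they are simply not used during model training, so this part of the processing time is necessarily the same). However, each pass became about 10x faster, dropping from about 70 seconds to about 7 seconds, as shown below: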
![new cost](https://user-images.githubusercontent.com/28750940/29699438-4ab6762c-898f-11e7-987b-7cb760816500.png)
## Code for reading the data
![reader](https://user-images.githubusercontent.com/28750940/29699472-87e2d932-898f-11e7-9edd-e07bc8972d2f.png)
## Code for getting user features
![user](https://user-images.githubusercontent.com/28750940/29699512-e1c18f48-898f-11e7-8018-f773e5a09dde.png)
## Code for getting content features
![content](https://user-images.githubusercontent.com/28750940/29699518-f44b2584-898f-11e7-93c0-edd39a8729f3.png)