
Conversation

@qingqing01 (Contributor) commented on Mar 27, 2018

Fix #9386

There is no need to copy data to the first device. Just make the first device share data with the global scope, since they are on the same device.
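A minimal conceptual sketch of the idea (not the actual diff), assuming the fluid framework types of the time (Scope, Variable, LoDTensor, ShareDataWith, TensorCopy); the FeedParameter helper, its signature, and the is_first_device flag are hypothetical simplifications of what parallel_do_op does when it feeds parameters into the device sub-scopes:

```cpp
#include <string>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/framework/tensor_util.h"

namespace paddle {
namespace operators {

// Hypothetical helper: feed one parameter from the global scope into a
// device's sub-scope, sharing memory on the first device and copying on
// the others.
void FeedParameter(const framework::Scope &global_scope,
                   framework::Scope *sub_scope, const std::string &name,
                   const platform::Place &place, bool is_first_device) {
  auto &src = global_scope.FindVar(name)->Get<framework::LoDTensor>();
  auto *dst = sub_scope->Var(name)->GetMutable<framework::LoDTensor>();
  if (is_first_device) {
    // The first device lives on the same place as the global scope, so the
    // sub-scope tensor can simply reuse the global tensor's memory.  Any
    // in-place update (e.g. BN's moving mean/variance) is then visible in
    // the global scope as well.
    dst->ShareDataWith(src);
    dst->set_lod(src.lod());
  } else {
    // The other devices still need an actual copy onto their own place.
    framework::TensorCopy(src, place, dst);
  }
}

}  // namespace operators
}  // namespace paddle
```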

@panyx0718 (Contributor) left a comment


Add some comments to explain the problem?

The global moving_mean and moving_variance are currently not correctly updated with the values calculated in the sub_scopes (unlike the trainable parameters). Perhaps ParallelExecutor has a similar problem to solve.
@tonyyang-svail @reyoung

@qingqing01 (Contributor, Author) commented on Mar 27, 2018

Add some comments to explain the problem?

In #9386, the moving mean/variance in BN are non-trainable parameters. The trainable parameters are updated in the backward pass and copied to the sub-scopes in each mini-batch before the forward pass. Unlike the trainable parameters, the moving means/variances are not updated in the backward pass, so parallel_do_op keeps copying the initial values from the global scope and the statistics computed in the sub-scopes never reach it.

This fix makes the first device share the parameter memory with the global scope. When the moving mean/variance on the first device are updated, they are updated in the global scope as well.

For BN, however, only the moving mean/variance computed on the first device are saved. We may merge them across multiple GPUs and multiple machines in the future.
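A hypothetical sketch of such a merge step (not part of this PR), assuming the statistics are float tensors that already live on the CPU; the MergeMovingStat helper and its signature are invented for illustration:

```cpp
#include <algorithm>
#include <string>
#include <vector>

#include "paddle/fluid/framework/lod_tensor.h"
#include "paddle/fluid/framework/scope.h"

namespace paddle {
namespace operators {

// Hypothetical future step: average a moving statistic (e.g. BN's moving
// mean) over all device sub-scopes and write the result back into the
// global scope.  Assumes float tensors already on the CPU; real code would
// handle device placement and data types.
void MergeMovingStat(const std::vector<framework::Scope *> &sub_scopes,
                     framework::Scope *global_scope,
                     const std::string &name) {
  auto *merged =
      global_scope->FindVar(name)->GetMutable<framework::LoDTensor>();
  float *out = merged->data<float>();
  int64_t numel = merged->numel();
  std::fill(out, out + numel, 0.0f);
  for (auto *sub_scope : sub_scopes) {
    const auto &t = sub_scope->FindVar(name)->Get<framework::LoDTensor>();
    const float *in = t.data<float>();
    for (int64_t i = 0; i < numel; ++i) out[i] += in[i];
  }
  for (int64_t i = 0; i < numel; ++i) {
    out[i] /= static_cast<float>(sub_scopes.size());
  }
}

}  // namespace operators
}  // namespace paddle
```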

@qingqing01 qingqing01 merged commit 25317bd into PaddlePaddle:develop Mar 27, 2018
@qingqing01 qingqing01 deleted the parallel_do_op branch November 14, 2019 05:25