@pkuyym (Contributor) commented Apr 26, 2018

Resolves #10219

@pkuyym requested review from panyx0718 and reyoung April 26, 2018 02:35
@reyoung (Collaborator) previously approved these changes Apr 26, 2018

Cool

@panyx0718 (Contributor) previously approved these changes Apr 26, 2018

Also update the transformer model?

it will share variables from the specified ParallelExecutor.
use_default_grad_scale(bool, default True): If set True, a default
scale value equal to `1./device_count` would be multiplied to
the gradients. Otherwise, a customized scale value should be
Contributor

Multiplied to the gradients of each device, and then aggregated?

Contributor Author

Thanks, followed the comment.
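
To make the `1./device_count` scaling discussed in this thread concrete, here is a minimal sketch in plain Python. It is not PaddlePaddle code: it assumes each device produces its own gradient and that the scaled per-device gradients are then summed, as the review question suggests. The function name `aggregate_gradients` and the sample values are hypothetical and used for illustration only.

```python
def aggregate_gradients(per_device_grads, scale=None):
    """Scale each device's gradient, then sum across devices.

    Assumption for illustration: when `scale` is None, fall back to
    1 / device_count, so the summed result is the mean of the
    per-device gradients.
    """
    device_count = len(per_device_grads)
    if scale is None:
        scale = 1.0 / device_count

    total = [0.0] * len(per_device_grads[0])
    for grads in per_device_grads:
        for i, g in enumerate(grads):
            total[i] += scale * g
    return total


# Two hypothetical devices, each holding a gradient with two elements.
grads = [[2.0, 4.0], [6.0, 8.0]]

default_scaled = aggregate_gradients(grads)            # default 1/device_count scale
custom_scaled = aggregate_gradients(grads, scale=1.0)  # caller-supplied scale
```

With the default scale the aggregated gradient is the per-device average; with a custom scale of `1.0` it is the plain sum, which is why a caller who disables the default is expected to supply a scale of their own.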

use_default_grad_scale(bool, default True): If set True, a default
scale value equal to `1./device_count` would be multiplied to
the gradients. Otherwise, a customized scale value should be
feeded to the network.
Contributor

feeded->fed?

Contributor Author

Done.

@pkuyym (Contributor Author) left a comment

Thanks for your comments. I will update the transformer model after this PR is merged.

@reyoung dismissed stale reviews from panyx0718 and themself via c0ac0cd April 28, 2018 06:07
@pkuyym merged commit 9a8be9d into PaddlePaddle:develop May 2, 2018