[Auto Parallel] Add general gradient merge pass to support auto parallel #38259
Conversation
Thanks for your contribution!
@@ -0,0 +1,349 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Please rename gradient_merge.py to auto_parallel_gradient_merge.py, since this pass may not work for other code.
done
return optimize_ops_desc

def _remove_op_role_var(param, grad):
What is the purpose of _remove_op_role_var?
In the non-auto-parallel case, multi-card training uses the "op_role_var" attribute to record the Vars that need communication (marking which gradients have to be communicated). After gradient merge is added, the op_role_var recorded on the original ops is no longer correct and must be removed; meanwhile, the corresponding optimizer ops need their own op_role_var added so that the merged gradients can be allreduced later.
Adding allreduce based on op_role_var is ParallelExecutor (PE) logic. Auto parallel does not go through PE; instead, each dist op decides on its own whether its gradient needs synchronization and which data-parallel world to synchronize in (there may be multiple dp worlds). op_role_var has no effect in auto parallel (because it cannot distinguish between multiple dp worlds), so there is no need to add it at all.
The op_role_var attribute has been removed from the program.
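For reference, here is a minimal sketch of what stripping op_role_var from the op that produced a gradient can look like; the attribute name and helper calls follow Paddle's existing GradientMergeOptimizer, but treat the exact signatures as assumptions rather than this PR's code.

```python
from paddle.fluid import core

def _remove_op_role_var(param, grad):
    # kOpRoleVarAttrName() resolves to "op_role_var"; it lists the
    # (param, grad) pairs that the executor would allreduce. After gradient
    # merge the raw grad is no longer the tensor to communicate, so the
    # stale attribute is dropped from the backward op that produced it.
    op_maker = core.op_proto_and_checker_maker
    op = grad.op
    if op is not None and op.has_attr(op_maker.kOpRoleVarAttrName()):
        op._remove_attr(op_maker.kOpRoleVarAttrName())
```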
_add_gm_op_role_var(new_grad_op, param, gradient_merge_var,
                    cond_var_name)
new_params_grads.append([param, gradient_merge_var])
return new_params_grads, param_to_gradient_merge
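To make the intent of the snippet above easier to follow, below is a framework-free sketch of the gradient-merge bookkeeping: each parameter keeps a persistable accumulator (playing the role of gradient_merge_var), micro-batch gradients are summed into it, and the optimizer only consumes the accumulated value every k_steps steps. The names and plain-Python form are illustrative assumptions, not the pass's actual code.

```python
def accumulate_gradients(params_grads, accumulators, step, k_steps, avg=True):
    """params_grads: list of (param_name, grad_value) for one micro-batch.

    Returns the (param, merged_grad) pairs the optimizer should see and a
    flag telling whether this step should actually apply the update.
    """
    new_params_to_grads = []
    for param, grad in params_grads:
        # grad@GradientMerge += grad  (the accumulator starts at zero)
        accumulators[param] = accumulators.get(param, 0.0) + grad
        new_params_to_grads.append((param, accumulators[param]))

    apply_update = (step + 1) % k_steps == 0
    if apply_update and avg:
        # average over the merged micro-batches before the optimizer runs
        new_params_to_grads = [(p, g / k_steps) for p, g in new_params_to_grads]
    return new_params_to_grads, apply_update


# Usage: accumulate over 4 micro-batches, then update and reset.
accs = {}
for step in range(8):
    merged, do_update = accumulate_gradients([("w", 1.0)], accs, step, k_steps=4)
    if do_update:
        # optimizer.apply_gradients(merged) would run here
        accs.clear()
```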
It is better to rename new_params_grads to new_params_to_grads, in the same style as param_to_gradient_merge, to explicitly indicate a dict.
done
LGTM
PR types
New features
PR changes
Others
Describe
[Auto Parallel] add gradient merge pass
Refer to https://github.com/xymyeah/gradient_merge_precision_alignment for the results of precision alignment.
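For context, gradient merge in Paddle's static-graph training is usually requested through the fleet distributed strategy; the sketch below uses the existing gradient-merge options of paddle.distributed.fleet.DistributedStrategy as an assumption of how a user would enable the behaviour this pass implements, and the auto-parallel wiring in this PR may expose it differently.

```python
import paddle.distributed.fleet as fleet

strategy = fleet.DistributedStrategy()
# Accumulate gradients over 4 micro-batches and average them before the
# optimizer update, giving an effective batch size of 4x the micro-batch.
strategy.gradient_merge = True
strategy.gradient_merge_configs = {"k_steps": 4, "avg": True}
```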