-
Notifications
You must be signed in to change notification settings - Fork 5.9k
Adding the implementation for rmsprop operator #4565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dzhwinter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Job! there are some small pieces need to fix.
paddle/operators/rmsprop_op.cc
Outdated
| : OpProtoAndCheckerMaker(proto, op_checker) { | ||
| AddInput("Param", "Input parameter"); | ||
| AddInput("Grad", "Input gradient"); | ||
| AddInput("Moment", "Second moment"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is a typo. It is the momentum. And, the comment is not helpful, we should make the comment self-explained, in format of (type):comment. e.g. (tensor): blabla
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, will fix this.
paddle/operators/rmsprop_op.cc
Outdated
| RMSprop | ||
|
|
||
| MomentOut = decayRate * Moment + (1 - decayRate) * Grad * Grad | ||
| ParamOut = Param - learningRate * Grad / (sqrt(MomentOut) + epsilon) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the Paddle old version, tensorflow, caffe2 had implemented rmsprop algorithm. They all follow the paper's formula parameter names, users used to use the same name between different version of our framework.
https://caffe2.ai/docs/operators-catalogue.html#rmsprop
tensorflow
paddle/operators/rmsprop_op.h
Outdated
| class RmspropOpKernel : public framework::OpKernel<T> { | ||
| public: | ||
| void Compute(const framework::ExecutionContext& ctx) const override { | ||
| auto param_out = ctx.Output<Tensor>("ParamOut"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here please use auto* since param_out is a pointer. auto keyword always hidden the real type, pointer can be more clear for users who read the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I just assumed for now that auto will resolve this by itself. But I see the point of make it more understandable for users. Will fix.
| param = np.random.random((123, 321)).astype("float32") | ||
| grad = np.random.random((123, 321)).astype("float32") | ||
| moment = np.zeros((123, 321)).astype("float32") | ||
| learning_rate = np.array([0.01]).astype("float32") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is 'learning_rate' not an attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As Abhinav commented above, I think I will retain this as an input for now, given that we are doing the same in all other PRs too.
| http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf) | ||
| does not have the epsilon attribute. It is added here for numerical stability | ||
| to avoid division by zero. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd better use the same common used parameter name since it is a popular optimizer. User used to the same name between frameworks.
https://github.com/tensorflow/tensorflow/blob/994226a4a992c4a0205bca9e2f394cb644775ad7/tensorflow/core/ops/training_ops.cc#L1281
https://caffe2.ai/docs/operators-catalogue.html#rmsprop
dzhwinter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Adding the implementation for RMSprop:
MeanSquareOut = decay * MeanSquare + (1 - decay) * Grad * Grad
MomentOut = momentum * Moment + LearningRate * Grad / sqrt(MeanSquareOut + epsilon)
ParamOut = Param - MomentOut