This part of backward.md needs to be refined:
> When implementing a specific op, the developer is also asked to implement its backward version, called grad_op. A grad_op takes the gradients of its corresponding op's outputs and computes the gradients of the op's inputs. While building a model's backward part, the framework creates each forward op's grad_op and then strings them together in the reverse order of the forward part. In this way, gradients propagate from the end of the model to its beginning, in other words, from the loss to the parameters.
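A minimal sketch of the mechanism that paragraph describes, in hypothetical Python (this is not Paddle's actual API; `Op`, `make_grad_op`, `append_backward`, and the `@GRAD` suffix are used for illustration only):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    type: str
    inputs: List[str]
    outputs: List[str]


def make_grad_op(op: Op) -> Op:
    # Hypothetical registry lookup: the grad_op consumes the gradients of the
    # forward op's outputs and produces the gradients of the forward op's inputs.
    return Op(
        type=op.type + "_grad",
        inputs=[name + "@GRAD" for name in op.outputs],
        outputs=[name + "@GRAD" for name in op.inputs],
    )


def append_backward(forward_ops: List[Op]) -> List[Op]:
    # Walk the forward ops in reverse and string their grad_ops together,
    # so gradients flow from the loss back to the parameters.
    return [make_grad_op(op) for op in reversed(forward_ops)]


forward = [
    Op("mul", ["x", "w"], ["xw"]),
    Op("add", ["xw", "b"], ["out"]),
    Op("mean", ["out"], ["loss"]),
]
for grad_op in append_backward(forward):
    print(grad_op.type, grad_op.inputs, "->", grad_op.outputs)
```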
@reyoung's commit:
> `grad_op` is not the backward version of the forward operator.
>
> We maintain a mapping between an operator and the operators that will produce its gradient. It is not a one-to-one mapping. Operators in the backward stage (I do not think there should be a backward stage; however, to make it easier to understand, I just assume the operators in a Block can be split into two stages) can also be used in the forward stage.
See #7123 (comment)
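One way to picture that one-to-many mapping (a hedged sketch with hypothetical op descriptions, not the real GradOpMaker interface): the gradients of a matmul-like op can be produced by two ordinary matmul ops, which are themselves usable in the forward stage.

```python
from typing import Dict, List


def grad_ops_for_matmul(op: Dict) -> List[Dict]:
    x, w = op["inputs"]
    (out,) = op["outputs"]
    # dX = dOut * W^T and dW = X^T * dOut are computed with two ordinary
    # matmul-style ops rather than one dedicated "matmul_grad" op, so the
    # op -> grad-op mapping is one-to-many and the produced ops are plain
    # forward operators.
    return [
        {"type": "matmul", "trans_y": True,
         "inputs": [out + "@GRAD", w], "outputs": [x + "@GRAD"]},
        {"type": "matmul", "trans_x": True,
         "inputs": [x, out + "@GRAD"], "outputs": [w + "@GRAD"]},
    ]


print(grad_ops_for_matmul({"inputs": ["X", "W"], "outputs": ["Out"]}))
```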