Design doc for implementing Parameters in CPP. #2249
# Parameters in C++

`Parameters` is a concept we designed in the Paddle V2 API. `Parameters` is a container of parameters that lets Paddle share parameters between topologies. We describe the usage of `Parameters` in [api.md](./api.md).

We previously implemented `Parameters` in Python when designing the V2 API. The current implementation has several defects:
* We just use `memcpy` to share parameters between topologies, which is very inefficient.
* We did not implement sharing parameters during training. We only trigger `memcpy` when training starts.
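A minimal sketch of the sharing model the paragraphs above ask for, assuming each parameter is a plain `std::vector<float>` buffer keyed by name. `Parameters`, `Topology`, `create`, `get` and `trainStep` are hypothetical names for illustration, not Paddle's API. Both topologies hold a pointer to the same `Parameters`, so an update made through one is visible to the other without any `memcpy`:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical container that owns named parameter buffers (not Paddle's API).
class Parameters {
 public:
  // Create a zero-initialized parameter, or return the existing one.
  std::vector<float>& create(const std::string& name, std::size_t size) {
    auto& slot = params_[name];
    if (!slot) slot = std::make_shared<std::vector<float>>(size, 0.0f);
    assert(slot->size() == size);  // the same name must keep the same size
    return *slot;
  }
  std::vector<float>& get(const std::string& name) { return *params_.at(name); }
  std::size_t size() const { return params_.size(); }

 private:
  std::map<std::string, std::shared_ptr<std::vector<float>>> params_;
};

// Hypothetical topology: refers to a shared Parameters instead of owning a copy.
struct Topology {
  std::shared_ptr<Parameters> params;
  // Stand-in for a training step that modifies a weight in place.
  void trainStep(const std::string& name, float delta) {
    for (float& w : params->get(name)) w += delta;
  }
};
```

Because the buffer is shared rather than copied, there is no point at which the two topologies can hold diverging values during training.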

It is necessary to implement `Parameters` on the C++ side. However, this may require refactoring Paddle's code, because Paddle was originally designed to train only one topology, i.e., each `GradientMachine` holds its parameters as a data member. In the current Paddle implementation, three concepts are associated with `Parameters`:

1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`.
   It is evident that we should use `paddle::Parameter` when developing `Parameters`.
   However, the `Parameter` class contains many functions and does not have a clear interface.
   It handles `create/store Parameter`, `serialize/deserialize`, `optimize (i.e., SGD)`, and `randomize/zero`.
   When developing `Parameters`, we only use the `create/store Parameter` functionality.
   We should extract the functionalities of `Parameter` into several classes to clean up the Paddle C++ implementation.

2. `paddle::GradientMachine` and its subclasses, e.g., `paddle::MultiGradientMachine` and `paddle::NeuralNetwork`.
   We should pass `Parameters` to `paddle::GradientMachine` in `forward/backward` to avoid `memcpy` between topologies.
   We also need to handle multi-GPU/multi-CPU training, because `forward` and `backward` may run on multiple GPUs and CPUs.
   `Parameters` should dispatch the parameter values to each device and gather the parameter gradients from each device.

3. `paddle::ParameterUpdater`. The `ParameterUpdater` is used to update parameters in Paddle.
   So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD).
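Item 1 above can be pictured as keeping only storage in the parameter type and moving every other responsibility out of it. The sketch below is one hypothetical decomposition; `ParameterStore`, `zeroParameter`, `randomizeParameter`, `serializeParameter`, `deserializeParameter` and `sgdUpdate` are illustrative names, not existing Paddle classes or functions:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Storage only ("create/store Parameter"): value and gradient buffers.
struct ParameterStore {
  std::string name;
  std::vector<float> value;
  std::vector<float> grad;
  ParameterStore(std::string n, std::size_t size)
      : name(std::move(n)), value(size, 0.0f), grad(size, 0.0f) {}
};

// Initialization ("randomize/zero"), kept outside the storage type.
void zeroParameter(ParameterStore& p) {
  std::fill(p.value.begin(), p.value.end(), 0.0f);
}

void randomizeParameter(ParameterStore& p, unsigned seed, float stddev) {
  std::mt19937 gen(seed);
  std::normal_distribution<float> dist(0.0f, stddev);
  for (float& v : p.value) v = dist(gen);
}

// Serialization ("serialize/deserialize") as raw float bytes.
std::string serializeParameter(const ParameterStore& p) {
  std::string bytes(p.value.size() * sizeof(float), '\0');
  if (!bytes.empty()) std::memcpy(&bytes[0], p.value.data(), bytes.size());
  return bytes;
}

void deserializeParameter(ParameterStore& p, const std::string& bytes) {
  assert(bytes.size() == p.value.size() * sizeof(float));
  if (!bytes.empty()) std::memcpy(p.value.data(), bytes.data(), bytes.size());
}

// Optimization ("optimize, i.e. SGD"): one plain gradient-descent step.
void sgdUpdate(ParameterStore& p, float learningRate) {
  for (std::size_t i = 0; i < p.value.size(); ++i) {
    p.value[i] -= learningRate * p.grad[i];
  }
}
```

With this split, a `Parameters` container only needs `ParameterStore`, while initialization, checkpointing and optimization code can evolve independently.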

The step-by-step approach for implementing `Parameters` in the Paddle C++ core is listed below. Each step should be a separate PR, so the steps can be merged into Paddle one by one.

1. Clean up the `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of `Parameters`.

2. Implement a `Parameters` class that simply stores `paddle::Parameter` objects inside. Make `GradientMachine` use `Parameters` as a class member.

3. Make `Parameters` support multi-CPU and multi-GPU training, to prepare for sharing `Parameter`s between topologies.

Collaborator: Is it that if
Author: MultiGradientMachine can support only one topology while training. But another reason I want to extract
The reasons are as follows:
```cpp
// ParameterExchanger handles all of the parameter-exchange logic
ParameterExchanger exchanger(parameters, used_parameter_names);
exchanger.exchange();
```
2. Another reason to extract the parameter-exchange logic is that MultiGradientMachine is a very heavy class that combines multiple functions, such as multi-device computation, parameter aggregation and distribution, and synchronization logic. If we factor out the parameter-aggregation logic while writing Parameters, the code logic will become clearer.
Author: Maybe a global function is better.
Author: Done. Added this part to the design doc.

Because we need to share `Parameters` between topologies, it is the responsibility of `Parameters` to exchange parameters between GPUs.
`GradientMachine` should not handle how parameters are exchanged, because a `GradientMachine` trains only one topology, whereas Paddle needs to support training many topologies, i.e., many GradientMachines may use one `Parameters`.
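A host-only sketch of the dispatch/gather responsibility described above, written as free functions in the spirit of the "global function" suggestion from the review. Devices are simulated by per-device buffers in ordinary memory; `DeviceReplica`, `dispatchValues` and `gatherGradients` are hypothetical names, and a real implementation would copy into GPU memory and reduce with device-side primitives instead:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-device copy of one parameter: the value read by forward
// and the gradient written by backward on that device.
struct DeviceReplica {
  std::vector<float> value;
  std::vector<float> grad;
};

// Dispatch: copy the master value to every device and clear its gradient.
void dispatchValues(const std::vector<float>& master,
                    std::vector<DeviceReplica>& devices) {
  for (DeviceReplica& d : devices) {
    d.value = master;
    d.grad.assign(master.size(), 0.0f);
  }
}

// Gather: sum the per-device gradients into a single gradient.
std::vector<float> gatherGradients(const std::vector<DeviceReplica>& devices,
                                   std::size_t size) {
  std::vector<float> total(size, 0.0f);
  for (const DeviceReplica& d : devices) {
    assert(d.grad.size() == size);
    for (std::size_t i = 0; i < size; ++i) total[i] += d.grad[i];
  }
  return total;
}
```

Since neither function belongs to a `GradientMachine`, any number of machines sharing one `Parameters` could reuse the same exchange step.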

4. Make `Parameters` an argument of the `forward/backward` functions instead of a data member of `GradientMachine`. For example, `forward` could be `forward(const Parameters& params, ...)` and `backward` could be `backward(Parameters* params, ...)`. After this step, Paddle can share `Parameters` between topologies.

5. `ParameterUpdater` is invoked by `GradientMachine` and `Trainer`, but it updates `Parameters`. At the end of this refactoring, we can change `ParameterUpdater` to use `Parameters` directly, which makes the implementation of `ParameterUpdater` clearer.
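A toy sketch of the target interfaces from steps 4 and 5, assuming a single linear unit `y = dot(w, x)` with squared loss `0.5 * (y - t)^2`. `forward` only reads the parameters, `backward` writes gradients into them, and a hypothetical `SgdUpdater` consumes the same `Parameters` directly. `Param`, `LinearMachine`, `SgdUpdater` and this `Parameters` alias are illustrative, not Paddle's types:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Parameters: name -> (value, gradient).
struct Param {
  std::vector<float> value;
  std::vector<float> grad;
};
using Parameters = std::map<std::string, Param>;

// A "topology" that owns no parameters; it only knows which ones it uses.
class LinearMachine {
 public:
  explicit LinearMachine(std::string weightName) : w_(std::move(weightName)) {}

  // forward(const Parameters&, ...): read-only access to shared parameters.
  float forward(const Parameters& params, const std::vector<float>& x) const {
    const std::vector<float>& w = params.at(w_).value;
    assert(w.size() == x.size());
    float y = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) y += w[i] * x[i];
    return y;
  }

  // backward(Parameters*, ...): accumulate dL/dw = (y - target) * x.
  void backward(Parameters* params, const std::vector<float>& x, float y,
                float target) const {
    Param& p = params->at(w_);
    for (std::size_t i = 0; i < x.size(); ++i) p.grad[i] += (y - target) * x[i];
  }

 private:
  std::string w_;
};

// The updater works on Parameters directly, independent of any machine.
struct SgdUpdater {
  float learningRate;
  void update(Parameters* params) const {
    for (auto& kv : *params) {
      Param& p = kv.second;
      for (std::size_t i = 0; i < p.value.size(); ++i) {
        p.value[i] -= learningRate * p.grad[i];
        p.grad[i] = 0.0f;  // reset the gradient for the next batch
      }
    }
  }
};
```

Two `LinearMachine` instances naming the same weight can train and evaluate against one `Parameters` object, which is the sharing that steps 4 and 5 aim for.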
CPP => C++
The title should be
Done.