Update graph construction design doc #3862

wangkuiyi · 2017-09-04T22:05:47Z

jacquesqiao · 2017-09-04T23:26:37Z

doc/design/graph.md

-In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x.  `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b.
+In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x.  `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b, and the initialization operators.
+
+Initialization operators are "run-once" operators: the `Run` method increments a class data member counter so that each operator runs at most once.  This way, a parameter isn't initialized repeatedly, say, in every minibatch.
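Here is a minimal Python sketch of the run-once idea. It assumes a hypothetical `RunOnceOp` class and uses a plain dict as the scope; neither is Paddle's actual operator API. The counter turns every call after the first into a no-op, so a training loop that keeps calling the initializer does not reset the parameter.

```python
class RunOnceOp:
    """Hypothetical sketch of a "run-once" initialization operator."""

    def __init__(self, output, init_fn):
        self.output = output      # name of the parameter variable to initialize
        self.init_fn = init_fn    # callable producing the initial value
        self.run_count = 0        # counter incremented by run()

    def run(self, scope):
        # Skip every call after the first so the parameter is not reset per minibatch.
        if self.run_count > 0:
            return
        self.run_count += 1
        scope[self.output] = self.init_fn()


scope = {}
init_w = RunOnceOp("W", lambda: 0.0)
for step in range(3):
    init_w.run(scope)                 # only the first call writes W
    scope["W"] = scope["W"] + 1.0     # stand-in for an optimizer update
print(scope["W"], init_w.run_count)  # 3.0 1
```

The counter lives on the operator object. Reinitializing would therefore require resetting it or building a new operator, which is the limitation raised in the reply below.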


Run-once may not be a very good choice, because sometimes the user may want to reinitialize the parameters. Maybe we should think of a better way to do it.

Agreed. It would be great if we could have another solution. How about we keep the run-once operator as a viable solution for now, and update it once we have a better idea?

#3862 (comment) could solve it :)

helinwang · 2017-09-04T22:10:06Z

doc/design/graph.md

 - construct the backward part
 - construct the optimization part

+## The Construction of a Graph


Sorry, I have a comment on a part that is not from this PR:

```python
optimize(cost)
train(cost, reader=mnist.train())
```

I think `train` should take the var returned by the optimizer as its argument, not `cost`. For example, if two optimizers are connected to the cost, specifying only the cost would leave the engine unsure which optimizer to run.

I think the training needs 1) the cost, and 2) the parameters to be optimized to minimize the cost.

The cost is specified in the call to `train`.

Parameters can be created either by a layer function like `layer.fc`, or by the user via `W = paddle.Var(type=parameter, ...)`. Either way, they are marked as parameters and can be updated.

So both the cost and the parameters are known prior to training. What do you think about this approach?
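The following minimal sketch illustrates this approach. It assumes a hypothetical `Var` class with a `kind` field and a hypothetical `fc` helper that registers its `W` and `b` as parameters; this is not Paddle's actual API. Because every parameter is marked when it is created, `train` can discover the parameters by walking back from the cost.

```python
from dataclasses import dataclass, field


@dataclass
class Var:
    name: str
    kind: str = "intermediate"                  # hypothetical marker: "parameter" or not
    inputs: list = field(default_factory=list)  # vars this var is computed from


def fc(x, name):
    # Hypothetical layer function: creates W and b, both marked as parameters.
    w = Var(name + ".W", kind="parameter")
    b = Var(name + ".b", kind="parameter")
    return Var(name + ".out", inputs=[x, w, b])


def parameters_of(cost):
    # Walk back from the cost and collect every var marked as a parameter.
    found, stack, seen = [], [cost], set()
    while stack:
        v = stack.pop()
        if v.name in seen:
            continue
        seen.add(v.name)
        if v.kind == "parameter":
            found.append(v.name)
        stack.extend(v.inputs)
    return sorted(found)


x = Var("images")
y = fc(x, "fc1")
user_w = Var("W2", kind="parameter")  # user-created, like paddle.Var(type=parameter, ...)
cost = Var("cost", inputs=[y, user_w])
print(parameters_of(cost))  # ['W2', 'fc1.W', 'fc1.b']
```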

> the training needs 1) the cost, and 2) the parameter to be optimized to minimize the cost.

I think it needs the optimizer as well (Adam or Adagrad).
For example, if the user does something like:

```python
opt0 = pd.Adam(cost)
opt1 = pd.Adagrad(cost)
train(cost, reader=mnist.train())
```

What optimizer will Paddle use for training? Maybe the code below is more concise:

```python
opt0 = pd.Adam(cost)
opt1 = pd.Adagrad(cost)
train(opt1, reader=mnist.train())
```

However, I just realized the Python code you wrote is perhaps the V2 API, which may allow only one optimizer to be connected to the cost.
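The following minimal sketch shows the ambiguity described above. The `Optimizer` and `resolve` names are hypothetical stand-ins for whatever the engine does internally. Passing a cost works only when exactly one optimizer is attached to it, while passing the optimizer itself is always unambiguous.

```python
class Optimizer:
    """Hypothetical optimizer node attached to a cost variable."""

    def __init__(self, kind, cost, registry):
        self.kind = kind
        self.cost = cost
        registry.setdefault(cost, []).append(self)  # remember cost -> optimizers


def resolve(target, registry):
    # If train() receives an optimizer, there is nothing to guess.
    if isinstance(target, Optimizer):
        return target
    # Otherwise it received a cost; this works only if exactly one optimizer is attached.
    opts = registry.get(target, [])
    if len(opts) != 1:
        raise ValueError(f"{len(opts)} optimizers attached to {target!r}; pass one explicitly")
    return opts[0]


registry = {}
opt0 = Optimizer("Adam", "cost", registry)
opt1 = Optimizer("Adagrad", "cost", registry)
print(resolve(opt1, registry).kind)  # Adagrad
try:
    resolve("cost", registry)
except ValueError as e:
    print(e)  # 2 optimizers attached to 'cost'; pass one explicitly
```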

Yes. What I mean is that we can have two forms of `Block::Eval`:

One accepts targets of type `Variable`:

```cpp
void Block::Eval(vector<Variable*> targets);
```

It traces only the operators in `BlockDesc::ops` that come before the targets.

- Forward computation: because our Python API doesn't expose gradient variables to users, targets have to be forward variables, so from Python this form of `Block::Eval` works only for forward computation.
- Backward computation: in the C++ world, `Block::Eval` can accept gradient variables as its targets. We can create a Python API function, say `backward`, which calls `Block::Eval` with gradient variables to do the backward computation.

The other form of `Block::Eval` accepts operators as targets:

```cpp
void Block::Eval(vector<Operator*> targets);
```

Somewhere in the C++ code, we can enumerate all optimization operators and use them as the targets, which runs the optimization step.
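The following minimal Python sketch makes the two forms concrete under stated assumptions. The `Op` record, the op names, and the `_grad` suffix are hypothetical. `eval_vars` and `eval_ops` only mimic the two proposed `Block::Eval` overloads by returning which operators would run, scanning the ops in their `BlockDesc::ops` order.

```python
from dataclasses import dataclass


@dataclass
class Op:
    name: str
    inputs: tuple
    outputs: tuple


def ops_needed(ops, needed_vars, limit):
    # Walk ops[:limit] backwards, keeping each op that produces a still-needed var.
    needed, keep = set(needed_vars), []
    for i in range(limit - 1, -1, -1):
        op = ops[i]
        if needed & set(op.outputs):
            keep.append(i)
            needed = (needed - set(op.outputs)) | set(op.inputs)
    return sorted(keep)


def eval_vars(ops, targets):
    # Analogue of Block::Eval(vector<Variable*>): trace only ops before the targets.
    return [ops[i].name for i in ops_needed(ops, targets, len(ops))]


def eval_ops(ops, target_names):
    # Analogue of Block::Eval(vector<Operator*>): run the target ops plus what they need.
    keep = set()
    for j, op in enumerate(ops):
        if op.name in target_names:
            keep.add(j)
            keep.update(ops_needed(ops, op.inputs, j))
    return [ops[i].name for i in sorted(keep)]


ops = [
    Op("feed", (), ("x", "label")),
    Op("fc", ("x", "W", "b"), ("y",)),
    Op("mse", ("y", "label"), ("cost",)),
    Op("mse_grad", ("cost", "y", "label"), ("y_grad",)),
    Op("fc_grad", ("y_grad", "x", "W"), ("W_grad", "b_grad")),
    Op("sgd_W", ("W", "W_grad"), ("W",)),
    Op("sgd_b", ("b", "b_grad"), ("b",)),
]
print(eval_vars(ops, ["cost"]))             # ['feed', 'fc', 'mse']
print(eval_ops(ops, ["sgd_W", "sgd_b"]))    # all seven ops, in block order
```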

helinwang · 2017-09-04T23:47:47Z

doc/design/graph.md


 For each parameter, like W and b created by `layer.fc` and marked as double circles in the graphs above, `ConstructOptimizationGraph` creates an optimization operator to apply its gradient.  This results in the complete graph:

 ![](images/graph_construction_example_all.png)
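Here is a minimal sketch of this step. It assumes a plain dict representation of operators, a hypothetical `_grad` naming convention for gradient variables, and SGD as a placeholder optimizer; none of these come from the design doc. `construct_optimization_graph` is an illustrative stand-in for `ConstructOptimizationGraph`.

```python
def construct_optimization_graph(ops, parameters, optimizer="sgd"):
    """Hypothetical helper: append one optimization op per parameter."""
    new_ops = list(ops)  # keep the forward and backward ops unchanged
    for p in parameters:
        # Each op reads the parameter and its gradient, then writes the parameter.
        new_ops.append({"type": optimizer, "inputs": [p, p + "_grad"], "outputs": [p]})
    return new_ops


forward_backward = [
    {"type": "fc", "inputs": ["x", "W", "b"], "outputs": ["y"]},
    {"type": "fc_grad", "inputs": ["y_grad", "x", "W"], "outputs": ["W_grad", "b_grad"]},
]
full = construct_optimization_graph(forward_backward, ["W", "b"])
for op in full[len(forward_backward):]:
    print(op["type"], op["inputs"], "->", op["outputs"])
# sgd ['W', 'W_grad'] -> ['W']
# sgd ['b', 'b_grad'] -> ['b']
```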


I think we should say A depends on B only if, in every step, running A requires running B. For example, we probably should not say "MSE" depends on "init" (although, according to the dependency chain, "MSE" currently does depend on "init" in the graph). Otherwise, we need to come up with a way to let "init" run only once during training.

In my opinion, we need two kinds of directed edges: one for dependency and one for data flow. For this discussion, we may not need to draw the intermediate variables. In the graph below, dotted lines are data flow and solid lines are dependencies. In this representation, the graph has no cycle, and "MSE" no longer depends on "init".

The user can call "init all" to initialize, and call training later (which does not run init again, since there is no dependency).

Got it. I love this idea and the figure! I agree that there are two kinds of dependencies: data dependency and execution dependency. Currently, we treat them as the same and represent both by the order of operators in the array `repeated OpDesc ops` of the protobuf message `BlockDesc`.

I am not sure if it is necessary to explicitly describe these two kinds of dependencies in our protobuf messages. One reason is that I am not sure what `InitAll` is: is it a Var, like those returned by operator binding functions, or is it an operator?

Sorry, I should have made "init all" clearer. It's an OP that joins (merges) all its dependencies: it runs when all its dependencies are done and does nothing itself (it is only used to join the dependencies). Maybe we can call it `join` or `merge`.

The reason we need to explicitly describe these two kinds of dependencies is that the PaddlePaddle scheduler only needs to schedule OPs according to the dependency constraints (data flow is no longer a scheduling constraint). For example, even though "init" writes to var "B" (data flows from "init" to "B"), var "B" no longer depends on "init", so "init" will not be scheduled when doing optimization.
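The following minimal sketch illustrates the scheduling rule described above. It assumes a hypothetical `deps` map that holds only dependency (solid-line) edges. Data-flow edges, such as `init_W` writing `W`, are deliberately absent, so scheduling a training step never pulls in the initializers. `init_all` and `train_step` play the role of the join/merge OP. The op names are illustrative, and the sketch assumes the dependency graph is acyclic.

```python
def schedule(deps, target):
    # Return the ops to run for `target`, dependencies first, following only
    # dependency edges; data-flow edges are never consulted.
    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for d in deps.get(node, ()):
            visit(d)
        order.append(node)

    visit(target)
    return order


deps = {
    "init_W": [],
    "init_b": [],
    "init_all": ["init_W", "init_b"],     # join op: waits for both initializers
    "fc": [],                             # reads W and b via data flow only
    "mse": ["fc"],
    "mse_grad": ["mse"],
    "fc_grad": ["mse_grad"],
    "sgd_W": ["fc_grad"],
    "sgd_b": ["fc_grad"],
    "train_step": ["sgd_W", "sgd_b"],     # join op for one optimization step
}
print(schedule(deps, "init_all"))    # ['init_W', 'init_b', 'init_all']
print(schedule(deps, "train_step"))  # ['fc', 'mse', 'mse_grad', 'fc_grad', 'sgd_W', 'sgd_b', 'train_step']
```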

Another option is TF's approach: there is no var type in the graph. A graph has only OPs, and every directed edge is a tensor rather than a var. A var is represented by a "var OP", which has only an output (the handle for reading/writing) and no input:

Recording the tentative conclusions from offline discussions:

- TensorFlow's graph representation embeds variables into operators and requires users to specify the input, output, and dependent operators for each operator.
- The specification of dependencies looks ugly, so let's follow our current design of using variables and operators.

jacquesqiao

LGTM! Run-once is one solution for parameter initialization, and there may be other ways. We can update this design doc when we find a better solution.

Yi Wang added 2 commits September 4, 2017 14:57

Add initialization operators

29fa887

Add initialization operators

a266a22

wangkuiyi requested a review from jacquesqiao September 4, 2017 22:05

jacquesqiao reviewed Sep 4, 2017


helinwang reviewed Sep 4, 2017


jacquesqiao approved these changes Sep 6, 2017


wangkuiyi merged commit 097d0fe into PaddlePaddle:develop Sep 6, 2017

heavengate pushed a commit to heavengate/Paddle that referenced this pull request Aug 16, 2021

fix doc, test=document_fix (PaddlePaddle#3862)

d6c8947



