PaddlePaddle
diff --git a/doc/design/graph.md b/doc/design/graph.md
Lines changed: 21 additions & 2 deletions
diff --git a/doc/design/images/graph_construction_example.dot b/doc/design/images/graph_construction_example.dot
Lines changed: 4 additions & 0 deletions
diff --git a/doc/design/images/graph_construction_example_all.png b/doc/design/images/graph_construction_example_all.png
4.16 KB
diff --git a/doc/design/images/graph_construction_example_forward_backward.png b/doc/design/images/graph_construction_example_forward_backward.png
4.06 KB
diff --git a/doc/design/images/graph_construction_example_forward_only.png b/doc/design/images/graph_construction_example_forward_only.png
3.01 KB
@@ -1,4 +1,4 @@
-# Design Doc: Computations as Graphs
+# Design Doc: Computations as a Graph
 
 A primary goal of the refactorization of PaddlePaddle is a more flexible representation of deep learning computation, in particular, a graph of operators and variables, instead of sequences of layers as before.
 
@@ -8,6 +8,8 @@ This document explains that the construction of a graph as three steps:
 - construct the backward part
 - construct the optimization part
 
+## The Construction of a Graph
+
 Let us take the problem of image classification as a simple example.  The application program that trains the model looks like:
 
 ```python
@@ -25,7 +27,9 @@ The first four lines of above program build the forward part of the graph.
 
 ![](images/graph_construction_example_forward_only.png)
 
-In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x.  `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b.
+In particular, the first line `x = layer.data("images")` creates variable x and a Feed operator that copies a column from the minibatch to x.  `y = layer.fc(x)` creates not only the FC operator and output variable y, but also two parameters, W and b, and their initialization operators.
+
+Initialization operators are "run-once" operators: the `Run` method increments a counter kept as a class data member so that the operator executes at most once.  As a result, a parameter is not re-initialized in, say, every minibatch.
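
The run-once behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PaddlePaddle implementation: the class `InitOp`, its `run` method, the `run_count` member, and the dict used as a scope are all hypothetical names chosen for this example.

```python
class InitOp:
    """Hypothetical run-once operator: fills a parameter only on its first run."""

    def __init__(self, name, init_fn):
        self.name = name        # name of the parameter to initialize
        self.init_fn = init_fn  # callable that produces the initial value
        self.run_count = 0      # counter kept as a data member

    def run(self, scope):
        # After the first call, do nothing so the parameter is not reset.
        if self.run_count > 0:
            return False
        scope[self.name] = self.init_fn()
        self.run_count += 1
        return True


scope = {}
init_w = InitOp("W", lambda: [0.0, 0.0, 0.0])

for _ in range(3):          # three simulated minibatches
    init_w.run(scope)       # only the first call initializes W
    scope["W"][0] += 1.0    # stand-in for an optimizer update
```

Because the initialization runs only once, the stand-in updates accumulate across the simulated minibatches instead of being overwritten each time.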
 
 In this example, all operators are created as `OpDesc` protobuf messages, and all variables are `VarDesc`.  These protobuf messages are saved in a `BlockDesc` protobuf message.
 
@@ -49,3 +53,18 @@ According to the chain rule of gradient computation, `ConstructBackwardGraph` wo
 For each parameter, such as W and b created by `layer.fc` and marked as double circles in the graphs above, `ConstructOptimizationGraph` creates an optimization operator to apply its gradient.  This results in the complete graph:
 
 ![](images/graph_construction_example_all.png)
+
+## Block and Graph
+
+The words block and graph are interchangeable in the design of PaddlePaddle.  A [Block](https://github.com/PaddlePaddle/Paddle/pull/3708) is a metaphor for the code and local variables in a pair of curly braces in programming languages, where operators are like statements or instructions.  A graph of operators and variables is a representation of the block.
+
+A Block keeps operators in an array `BlockDesc::ops`
+
+```protobuf
+message BlockDesc {
+  repeated OpDesc ops = 1;
+  repeated VarDesc vars = 2;
+}
+```
+
+in the order that they appear in user programs, like the Python program at the beginning of this article.  We can imagine that `ops` holds some forward operators, followed by some gradient operators, and then some optimization operators.
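
To make the ordering concrete, here is a minimal Python sketch that mirrors the `BlockDesc` message with plain dataclasses. It is an assumption-laden illustration rather than the real API: the `append_op` helper, the `inputs`/`outputs` fields, the gradient operator names (`MSE_grad`, `FC_grad`), the optimizer name `SGD`, and the `*_grad` variable names are hypothetical. Only the operator types Feed, Init, FC, and MSE come from the graph in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpDesc:
    type: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class VarDesc:
    name: str


@dataclass
class BlockDesc:
    ops: List[OpDesc] = field(default_factory=list)
    vars: List[VarDesc] = field(default_factory=list)

    def append_op(self, op_type, inputs, outputs):
        # Register each output variable once, then append the operator,
        # so `ops` preserves the order in which operators were created.
        for name in outputs:
            if all(v.name != name for v in self.vars):
                self.vars.append(VarDesc(name))
        self.ops.append(OpDesc(op_type, list(inputs), list(outputs)))


block = BlockDesc()

# Forward part
block.append_op("Feed", [], ["x"])
block.append_op("Feed", [], ["l"])
block.append_op("Init", [], ["W"])
block.append_op("Init", [], ["b"])
block.append_op("FC", ["x", "W", "b"], ["y"])
block.append_op("MSE", ["y", "l"], ["cost"])

# Backward part (hypothetical gradient operator names)
block.append_op("MSE_grad", ["cost"], ["y_grad"])
block.append_op("FC_grad", ["y_grad"], ["W_grad", "b_grad"])

# Optimization part (hypothetical optimizer operator)
block.append_op("SGD", ["W", "W_grad"], ["W"])
block.append_op("SGD", ["b", "b_grad"], ["b"])

op_types = [op.type for op in block.ops]
```

Reading `op_types` from left to right gives the forward operators first, then the gradient operators, then the optimization operators, which is the layout the paragraph above describes for `ops`.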
@@ -2,6 +2,8 @@ digraph ImageClassificationGraph {
         ///////// The forward part /////////
         FeedX [label="Feed", color=blue, shape=box];
         FeedY [label="Feed", color=blue, shape=box];
+        InitW [label="Init", color=blue, shape=diamond];
+        Initb [label="Init", color=blue, shape=diamond];
         FC [label="FC", color=blue, shape=box];
         MSE [label="MSE", color=blue, shape=box];
 
@@ -14,6 +16,8 @@ digraph ImageClassificationGraph {
 
         FeedX -> x -> FC -> y -> MSE -> cost [color=blue];
         FeedY -> l [color=blue];
+        InitW -> W [color=blue];
+        Initb -> b [color=blue];
         W -> FC [color=blue];
         b -> FC [color=blue];
         l -> MSE [color=blue];