126 changes: 126 additions & 0 deletions doc/design/dynamic_rnn.md
# Implementation Doc: Dynamic RNN

## A glance at Dynamic RNN

A recurrent neural network (`RNN` for short) is a common neural network structure in which the model contains a directed cycle. An RNN can use an internal memory to process input sequences with variable lengths.
**Contributor:**
a internal memory --> an internal memory
"process arbitrary sequences of inputs" is not clear enough. Maybe you mean: RNN can use an internal memory to process sequences with variable lengths.

PaddlePaddle Fluid directly represents the `directed cycle` in the `ProgramDesc`, since we do not use a directed acyclic graph to represent our model. The `ProgramDesc` is just like the AST of a programming language; it describes the computation instructions for training a neural network. We use arrays and a while loop to describe the training/inference process of an RNN. The C++ code below demonstrates the forward logic of an RNN generated in the `ProgramDesc` of PaddlePaddle Fluid.
**Contributor:**
training process --> training/inference process
The C++ code below demonstrates the forward logic of RNN which PaddlePaddle Fluid generates in ProgramDesc --> The C++ code below demonstrates the forward logic of RNN generated in ProgramDesc of PaddlePaddle Fluid.

**Collaborator Author:**
Done


```cpp
auto input = LoDTensor(...);  // LoDTensor is the data structure for time series

std::vector<LoDTensor> inputs_for_each_timestep = LoDTensorToTimesteps(input);

std::vector<LoDTensor> memories;
memories.resize(inputs_for_each_timestep.size() + 1);
memories[0] = 0;  // the initial memory
std::vector<LoDTensor> outputs_for_each_timestep;
outputs_for_each_timestep.resize(inputs_for_each_timestep.size());

auto W0 = LoDTensor(...);
auto W1 = LoDTensor(...);
auto Bias = LoDTensor(...);

size_t i = 0;
while (i < inputs_for_each_timestep.size()) {
  auto& step_input = inputs_for_each_timestep[i];
  auto& ex_mem = memories[i];

  auto tmp0 = step_input * W0;
  auto tmp1 = ex_mem * W1;
  auto sum = tmp0 + tmp1 + Bias;
  auto hidden = sigmoid(sum);
  memories[i + 1] = hidden;
  outputs_for_each_timestep[i] = hidden;
  ++i;
}

LoDTensor outputs = TimestepsToLoDTensor(outputs_for_each_timestep);
```

**Contributor:**
LoDTensor() is input? If so, please use input instead.

**Contributor:**
I think memories is also an output, however the type is vector, need convert it to LoDTensor?

The `Dynamic RNN` in PaddlePaddle Fluid is basically syntactic sugar to compose operators such as `while`, `split_lod_tensor_to_timesteps`, and `restore_lod_tensor_from_timesteps`.
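
As a hedged illustration of this composition (only `while`, `split_lod_tensor_to_timesteps`, and `restore_lod_tensor_from_timesteps` are named in this design; every other helper in the sketch below is invented pseudocode, not the real Fluid API), a dynamic RNN layer could roughly expand to:

```cpp
// Pseudocode sketch only. The three operators named above are the real design
// targets; the surrounding helpers (TensorArray, rnn_step, while_op, ...) are
// invented here just to show how the pieces fit together.
auto x_steps = split_lod_tensor_to_timesteps(input);   // LoDTensor -> per-timestep tensors
TensorArray out_steps;                                  // hypothetical array of step outputs
size_t i = 0;
bool cond = i < x_steps.size();

while_op(cond, /* sub_block = */ [&] {
  auto step_out = rnn_step(x_steps[i] /*, previous memory, weights, ... */);
  out_steps.Write(i, step_out);
  ++i;
  cond = i < x_steps.size();   // updated inside the sub-block, so the loop terminates
});

auto output = restore_lod_tensor_from_timesteps(out_steps);
```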

The rest of this document is organized into the following sections:

1. Control flow operators
2. Data manipulation operators of RNN
3. Backward of RNN
**Contributor:**
"1. Data manipulation operators of RNN. / 2. Backward of RNN." --> "2. Data manipulation operators of RNN / 3. Backward of RNN"



## Control flow operators

### WhileOp

The primary control flow operator used to implement dynamic RNN is `WhileOp`. The `WhileOp` holds a sub-block. The operators in the sub-block are executed repeatedly while the condition is true.
**Contributor:**
The WhileOp takes a sub-block. --> The WhileOp holds a sub-block.

**Collaborator Author:**
Done


#### Sub-Block
The program fragment of a while op and its sub-block is:

```text
program {
  block {
    idx: 0          # main block
    parent_idx: -1  # -1 means no parent
    ops: {
      ...           # ops before while op

      op {
        inputs: ...,
        outputs: ...,
        type: "while",
        attrs: {
          attr {
            name: 'sub_block',
            type: 'BlockID',
            value: 1  # the sub block id of this while op is 1
          }
        }
      }
      ...           # ops after while op
    }
  }

  block {
    idx: 1          # the sub_block of while_op
    parent_idx: 0   # parent of the while block is the main block
    ops: {
      ...           # ops inside while
    }
  }
}
```
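
As a small, hedged illustration (the accessor names below are assumed for this sketch rather than taken from the real framework), an executor only needs to read the `sub_block` attribute and look the block up in the same `ProgramDesc`:

```cpp
// Assumed accessor names, for illustration only.
int sub_block_id = while_op_desc.GetAttr<int>("sub_block");   // 1 in the fragment above
const BlockDesc& sub_block = program_desc.Block(sub_block_id);
// The executor can then run the ops of block 1 against some scope:
// executor.Run(scope, sub_block);
```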

#### inputs

The while operator has two kinds of inputs. They are:

* Condition: A bool scalar. When it's False, the While Op will be terminated. Note that this scalar should always be in CPU memory.
  * The condition variable is in the parent block. However, it should be updated inside the sub-block of the while op; otherwise it would result in an endless loop. The condition variable will be an output variable of the while operator, too.
**Contributor:**
in the external block you mean in the parent block?
unless it is an endless loop. --> otherwise it would result to an endless loop.

**Collaborator Author:**
Done

* X: The external input variables, which are required by the operators inside the block of the While Op.
  * For example, suppose there is a fully-connected layer inside the while operator whose input is the output of another operator inside the while operator. This input is not an `external` input of the while operator. However, the weight tensors of this fully-connected layer are external inputs of the while operator, as illustrated in the sketch below.
**Contributor:**
The input of the fully-connected layer is calculated by another operator inside the while operator. --> The input of the fully-connected layer is output of another operator inside the while operator.

**Collaborator Author:**
Done
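
A hedged sketch of this input classification (all names below are invented for illustration, not real operators):

```cpp
// --- parent block ---
auto cond = less_than(i, seq_len);   // the Condition input; it is also an output,
                                     // because the sub-block must update it
auto W = create_parameter(...);      // weight used inside the loop -> external input (X)

// --- sub-block of the while op ---
auto fc_in  = some_op(step_input);   // produced inside the loop, so NOT part of X
auto fc_out = fc(fc_in, W);          // consumes W from the parent block, so W is in X
increment(i);
cond = less_than(i, seq_len);        // keeps the loop finite
```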



#### outputs

* Output: The output variables. They are `assigned` or `push_back`ed by the operators inside the block of the While Op.
  * It is reasonable for the `while operator` to `push_back` its outputs to an array because 1) the while operator is a loop, and 2) the output of every timestep should not be overwritten, since it will be used in the backward stage.
  * The condition and the results of other control-flow-related operators, like `++i` or `i=0`, could be overwritten since they are not required in the backward stage. The corresponding control flow operator of `++i` in the backward stage is `--i`.
**Contributor:**
they do not need when backwards --> they are not required in backward stage.
The backward control flow operator of ++i is --i. --> The corresponding control flow operator of ++i in backward stage is --i.

* The step-scopes: a vector of local scopes whose size equals the number of steps of the While Op. The i-th scope stores the temporary variables generated in the i-th step.
**Contributor:**
which --> whose
equals --> equals to

  * A potential optimization for the `while operator` during inference is to maintain only one step scope, since there is no backward stage in inference; see the sketch after the implementation pseudocode below.
**Contributor:**
of --> for


#### Implementation of the while operator

The implementation is quite simple. It is just a while loop in C++. The pseudocode is:


```cpp
auto global_scope = ...;
vector<Scope> step_scopes;
while (cond) {
  auto cur_scope = global_scope.NewScope();
  step_scopes.push_back(cur_scope);
  executor.Run(cur_scope, sub_block);
}
```
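
For the inference-time optimization mentioned in the outputs section, one possible (purely illustrative) variant is to reuse a single step scope instead of accumulating one scope per step; the cleanup helper below is hypothetical:

```cpp
// Hedged sketch: valid only when no backward pass will need per-step temporaries.
auto& step_scope = global_scope.NewScope();
while (cond) {
  step_scope.ClearLocalVars();          // hypothetical helper to drop the previous step's temporaries
  executor.Run(step_scope, sub_block);
}
```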

#### Backward of the while operator

The backward of the while operator just executes the backward of its sub-block in reverse order. The gradient of the while operator has a
3 changes: 2 additions & 1 deletion paddle/operators/while_op.cc
```diff
@@ -73,7 +73,8 @@ class WhileOpMaker : public framework::OpProtoAndCheckerMaker {
       .AsDuplicable();
   AddInput(
       kCondition,
-      "(Bool) An scalar. When it's False, the While Op will be terminated.")
+      "(Bool) An scalar. When it's False, the While Op will be terminated."
+      " Note that this scalar should always be in CPU memory.")
       .AsDuplicable();
   AddOutput(kOutputs,
             "A set of variables, which will be assigned with values "
```