Create feed_op_and_fetch_op Design Doc #4599
Conversation
```python
def feed_value(variable, np_variable):
    """Overwrite feed_result[variable.name] with a numpy.array
```
feed_result and fetch_value will be serialized and passed from Python to C++; where are they serialized to? (i.e., are they part of ProgramDesc?)
Good point. This part must be redesigned.
I had an offline discussion with @helinwang.
- Each Paddle trainer will have a global Scope and two global Variables (maybe static variables in C++), feed_result and fetch_result. Python cannot create a C++ Variable.
- What the Feed Operator does is take LoDTensors from the global Variables and copy them to its output Variable.

I think that:
- In distributed training, the training data is saved in a distributed file system. The C++ feed_value method will load data from file and set it into the global Variable. This method must be called before Executor::Run.
- In local machine training, we can expose feed_value to Python, and the numpy array will be set into the global Variable.
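A minimal Python sketch of this flow, using a plain dict as a stand-in for the C++ global `feed_result` Variable (the names `feed_value` and the toy `feed_op_run` are illustrative stand-ins, not the actual Paddle implementation):

```python
import numpy as np

# Stand-in for the per-trainer global Variable `feed_result` (a C++
# static in the real design); maps variable names to fed arrays.
feed_result = {}

def feed_value(name, np_array):
    """Overwrite feed_result[name] with a numpy array (called before Run)."""
    feed_result[name] = np.asarray(np_array)

def feed_op_run(output_name, scope):
    """Toy FeedOp: copy the tensor from the global Variable into its
    output Variable inside the given scope."""
    scope[output_name] = feed_result[output_name].copy()

scope = {}
feed_value("image", [[1.0, 2.0], [3.0, 4.0]])
feed_op_run("image", scope)
```

The key property is that Python only writes into the global Variable before `Executor::Run`; the FeedOp itself runs entirely on the C++ side.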
> In distributed training, the training data is saved in a distributed file system. C++ feed_value method will load data from file.

feed_value is only for the argument feed_dict in session.eval, which will come from the network, not from the disk. We will have an OP that reads data from the disk, but that is irrelevant to feed_value.

> In local machine training, we can expose feed_value to Python, and the numpy array will be set into the global Variable.

Everything needs to be serialized; "numpy array will be set into the global Variable" means the Python code is involved in the runtime (which conflicts with the current design).
I think the compile-time/runtime separation is very important. Let's have more discussion if you have other ideas.
@helinwang
Currently, we only take the local machine as our execution environment, so all the data can come from disk.
Let's make the whole training process work first. Having Python code involved in the runtime breaks our design, but it only happens when feeding data to or fetching data from the training process. Most of the other training logic is fine with the compile-time design.
@QiJune sure, thanks for explaining! Could you create an issue for it and put it into the TODO in the GitHub project?
@helinwang Yes, I have created issue #4613. But we don't have a Project for the distributed training feature yet. Let's do it later.
reyoung left a comment
LGTM, but since @helinwang has comments, maybe he will approve this PR.
```
@@ -0,0 +1,120 @@
# FeedOp and FetchOp Design Doc

### Motivation
```
This is a second-level caption, so it should be prefixed by ## instead of ###.
```
### Challenge

1. During the runtime of a particular Op, it only knows which `Variable` to be read from and written to. It doesn't have a direct access to python object.
```
python => Python
```
### Motivation

Python programer needs an interface to feed the data to PaddlePaddle, run the model, and fetch the result from it. Since PaddlePaddle runtime only goes through a graph of ops, we need to design corresponding Ops and add them to the graph.
```
programmer needs => programmers need, or
A Python programmer needs
the data => data
```python
# Run -------------------
while not converge:
    # user loads data
    np_data, np_label = load_input_data()
```
We are going to use the Python Reader API to load the data. This API doesn't split columns; instead, it returns the mini-batch as a sequence of Python tuples.
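As a hedged sketch of that shape of reader (the names `example_reader` and `batch` are hypothetical, not the actual Reader API): each example is a tuple, and a mini-batch is a list of such tuples, with no per-column splitting.

```python
# Hypothetical reader: yields each example as a tuple of columns.
def example_reader():
    for i in range(4):
        image = [float(i), float(i + 1)]  # toy "image" column
        label = i % 2                     # toy "label" column
        yield (image, label)

# Group examples into mini-batches: each mini-batch is a list of tuples.
def batch(reader, batch_size):
    buf = []
    for example in reader():
        buf.append(example)
        if len(buf) == batch_size:
            yield buf
            buf = []

mini_batches = list(batch(example_reader, 2))
```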
```python
np_data, np_label = load_input_data()

# user defines the maping
my_feed_dict = {data: np_data, label: np_label}
```
This constant dict should be moved out of the loop. And it could be of the form `dict = {"image": 0, "label": 1}`.
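A sketch of what that looks like (the variable names and the mini-batch layout here are assumptions for illustration): the name-to-column mapping is constant and lives outside the training loop, and inside the loop the mini-batch is split by column index.

```python
# Constant mapping, defined once outside the training loop.
columns = {"image": 0, "label": 1}

# A mini-batch is a list of example tuples (per the Reader convention).
mini_batch = [([0.0, 1.0], 0), ([1.0, 2.0], 1)]

# Inside the loop: split the mini-batch by column index.
images = [example[columns["image"]] for example in mini_batch]
labels = [example[columns["label"]] for example in mini_batch]
```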
```python
# Build the model -------------------
data = Variable(dim)
```
I am afraid a Variable class is not enough to express the idea here.

```python
image = layer.data(column=dict["image"])
label = layer.data(column=dict["label"])
```

```cpp
// Get Tensor reference in feed_result
string name = ctx.Output<Tensor>("Output")->name();
auto& var = GetScope()->GetVar("feed_result");
auto& input_tensor = var->Get<map<string, LoDTensor>>[name];
```
I think we will need to access an Attribute named "column" here, so that we know which column of the mini-batch should be copied into the image variable.
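A toy Python sketch of that idea (all names here are illustrative stand-ins, not Paddle's operator API): the FeedOp carries a `column` attribute, and at run time it selects that column of the mini-batch from the global feed variable and copies it into its output.

```python
# Stand-in for the global feed variable: the current mini-batch's
# columns, stored by position.
feed_columns = [
    [[0.0, 1.0], [1.0, 2.0]],  # column 0: images
    [0, 1],                    # column 1: labels
]

def feed_op_run(scope, output_name, column):
    """Toy FeedOp: the `column` attribute picks which column of the
    mini-batch is copied into the output variable."""
    scope[output_name] = list(feed_columns[column])

scope = {}
feed_op_run(scope, "image", column=0)
feed_op_run(scope, "label", column=1)
```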
```cpp
// Get Tensor reference in feed_result
string name = ctx.Output<Tensor>("Output")->name();
auto& var = GetScope()->GetVar("feed_result");
auto& input_tensor = var->Get<map<string, LoDTensor>>[name];
```
map => vector
An implementation of feed and fetch is merged at #4815.

Moved this branch to tonyyang-svail/feed-op-desgin.
No description provided.