Conversation

@tonyyang-svail
No description provided.


```Python
def feed_value(variable, np_variable):
    """Overwrite feed_result[variable.name] with a numpy.array"""
```
Contributor

feed_result and fetch_value will be serialized and passed from Python to C++; where do they get serialized to? (i.e., are they part of ProgramDesc?)

Author

Good point. This part must be redesigned.

Member

@QiJune Oct 5, 2017

I have an offline discussion with @helinwang.

  • Each paddle trainer will have a global Scope and two global Variables (maybe static variables in C++), feed_result and fetch_result. Python cannot create a C++ Variable.
  • What the Feed Operator does is take LoDTensors from the global Variables and copy them to its output Variable.

I think that:

  • In distributed training, the training data is saved in a distributed file system. The C++ feed_value method will load data from file and set it into the global Variable. This method must be called before Executor::Run.
  • In local machine training, we can expose feed_value to Python, and the numpy array will be set into the global Variable.
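The two proposals above could be sketched in Python as follows. This is a stand-in for the C++ design; `_feed_result`, `feed_value`, and `feed_op_run` are hypothetical names for illustration, not the real Paddle API:

```python
import numpy as np

# Hypothetical stand-in for the global C++ Variable `feed_result`
# described above; the real design keeps this storage on the C++ side.
_feed_result = {}

def feed_value(name, np_array):
    """Set a numpy array into the global feed storage.

    Per the proposal above, this must be called before Executor::Run.
    """
    _feed_result[name] = np.asarray(np_array)

def feed_op_run(output_name):
    """Model of the Feed Operator: copy the tensor out of the global
    Variable into the operator's output (modeled here as a return)."""
    return _feed_result[output_name].copy()
```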

Contributor

@helinwang Oct 5, 2017

@QiJune

> In distributed training, the training data is saved in a distributed file system. The C++ feed_value method will load data from file.

feed_value is only for the feed_dict argument in session.eval, which will come from the network. It will not come from the disk. We will have an OP that reads data from the disk, but that is irrelevant to feed_value.

> In local machine training, we can expose feed_value to Python, and the numpy array will be set into the global Variable.

Everything needs to be serialized; "numpy array will be set into the global Variable" means Python code is involved in the runtime (which conflicts with the current design).

I think the compile-time/runtime separation is very important. Let's have more discussion if you have other ideas.
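The compile-time/runtime separation being argued for might look like this from the user's side. This is a hedged sketch; the `Session` class and its `eval` signature are illustrative stand-ins, not the actual interface:

```python
# Illustrative sketch: feed_dict values are placed into a global feed
# area before the run; no Python code executes during the run itself.
class Session:
    def __init__(self):
        self.feed_result = {}   # stand-in for the global C++ Variable
        self.fetch_result = {}  # stand-in for fetched outputs

    def eval(self, targets, feed_dict=None):
        # 1. Serialize every fed value into the global feed storage.
        for name, value in (feed_dict or {}).items():
            self.feed_result[name] = value
        # 2. ... the C++ Executor would run the graph here ...
        self.fetch_result = dict(self.feed_result)  # placeholder
        # 3. Return the fetched values for the requested targets.
        return [self.fetch_result.get(t) for t in targets]
```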

Member

@helinwang
Currently, we only take the local machine as our execution environment, so all the data can come from disk.
Let's make the whole training process work first. Having Python code involved in the runtime breaks our design, but it only happens when feeding data into or passing data out of the training process. Most other training logic is fine with the compile-time design.

Contributor

@QiJune sure, thanks for explaining! Could you create an issue for it and put it into the TODO list in the GitHub project?

Member

@helinwang Yes, I have created an issue, #4613. But we don't have a Project for supporting the distributed training feature yet. Let's do that later.

Collaborator

@reyoung left a comment

LGTM, but since @helinwang has comments, maybe he should approve this PR.

@@ -0,0 +1,120 @@
# FeedOp and FetchOp Design Doc

### Motivation
Collaborator

This is a second-level caption; it should be prefixed by ## instead of ###.


### Challenge

1. During the runtime of a particular Op, it only knows which `Variable` to be read from and written to. It doesn't have a direct access to python object.
Collaborator

python => Python


### Motivation

Python programer needs an interface to feed the data to PaddlePaddle, run the model, and fetch the result from it. Since PaddlePaddle runtime only goes through a graph of ops, we need to design corresponding Ops and add them to the graph.
Collaborator

programmer needs => programmers need, or
A Python programmer needs

Collaborator

the data => data

# Run -------------------
while not converge:
# user loads data
np_data, np_label = load_input_data()
Collaborator

We are going to use the Python Reader API to load the data. This API doesn't split columns; instead, it returns the mini-batch as a sequence of Python tuples.

np_data, np_label = load_input_data()
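The Reader API behavior described above (returning the mini-batch as a sequence of tuples rather than pre-split columns) could look like this; the names here are illustrative, not the real API:

```python
# Hypothetical reader: yields each example as an (image, label) tuple,
# so the mini-batch is a sequence of tuples rather than two arrays.
def minibatch_reader(images, labels):
    for image, label in zip(images, labels):
        yield (image, label)

batch = list(minibatch_reader([[0.1, 0.2], [0.3, 0.4]], [0, 1]))
# each element of `batch` is one (image, label) tuple
```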

# user defines the maping
my_feed_dict = {data: np_data, label: np_label}
Collaborator

This constant dict should be moved out of the loop. And it could be of the form

dict = {"image": 0, "label": 1}
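Hoisting the constant mapping out of the loop, as suggested, might look like the following (illustrative names and stand-in data, not the real training loop):

```python
# Constant column mapping, defined once outside the training loop.
column_dict = {"image": 0, "label": 1}

fetched = []
for minibatch in [[("img0", 0), ("img1", 1)]]:  # stand-in data
    for row in minibatch:
        # Each row of the mini-batch is a tuple; the mapping tells us
        # which column holds the image and which holds the label.
        image = row[column_dict["image"]]
        label = row[column_dict["label"]]
        fetched.append((image, label))
```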


```python
# Build the model -------------------
data = Variable(dim)
Collaborator

I am afraid a Variable class is not enough to express the idea here.

image = layer.data(column=dict["image"])
label = layer.data(column=dict["label"])

// Get Tensor reference in feed_result
string name = ctx.Output<Tensor>("Output")->name();
auto& var = GetScope()->GetVar("feed_result");
auto& input_tensor = var->Get<map<string, LoDTensor>>[name];
Collaborator

I think we will need to access an Attribute here named "column", so that we know which column of the mini-batch should be copied into the image variable.
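In Python terms, the "column" Attribute usage described above might look like this. This is only a sketch; the real Feed OP is C++ and its attribute access API differs:

```python
# Stand-in for a Feed OP that reads a "column" attribute to decide
# which column of the mini-batch goes into its output variable.
def feed_op_run(minibatch, attrs):
    column = attrs["column"]  # e.g. 0 for image, 1 for label
    return [row[column] for row in minibatch]

# Selecting the image column (index 0) from a two-example mini-batch.
images = feed_op_run([("img0", 0), ("img1", 1)], {"column": 0})
```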

// Get Tensor reference in feed_result
string name = ctx.Output<Tensor>("Output")->name();
auto& var = GetScope()->GetVar("feed_result");
auto& input_tensor = var->Get<map<string, LoDTensor>>[name];
Collaborator

map => vector

@tonyyang-svail
Author

An implementation of feed and fetch was merged in #4815. Closing this pull request.

@tonyyang-svail
Author

Moved this branch to tonyyang-svail/feed-op-desgin.

@tonyyang-svail tonyyang-svail deleted the tonyyang-svail-feed-op-desgin branch October 18, 2017 03:30