Executor interface design and implementation #4537
Conversation
paddle/framework/executor.cc
Outdated
I think one of the most important things for the executor is that Run should be thread-safe (e.g., it should be OK to do concurrent Runs). This is a must for inference.
That's a good question. We must allow concurrent Runs in inference, with only one copy of the parameters in memory.
I am also thinking about who should do parameter loading/saving, and where. Our topology can be serialized to ProgramDesc, but what will the parameters be serialized to?
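To make the constraint concrete, here is a minimal, self-contained sketch of the pattern under discussion (stand-in Tensor/Scope types and variable names are illustrative, not the real Paddle classes): the parameters live once in a shared, read-only parent scope, and each concurrent Run works in its own child scope.

#include <map>
#include <string>
#include <thread>
#include <vector>

// Stand-in types, not the real Paddle classes.
struct Tensor {
  std::vector<float> data;
};

struct Scope {
  const Scope* parent = nullptr;
  std::map<std::string, Tensor> vars;

  // Look up a variable locally first, then fall back to the shared parent.
  const Tensor* Find(const std::string& name) const {
    auto it = vars.find(name);
    if (it != vars.end()) return &it->second;
    return parent ? parent->Find(name) : nullptr;
  }
};

int main() {
  Scope params;  // one copy of the parameters, loaded once, then read-only
  params.vars["fc.w"] = Tensor{{0.1f, 0.2f}};

  // Each concurrent Run gets its own child scope for activations, so runs
  // never write into shared state; this is what makes Run thread-safe.
  std::vector<std::thread> workers;
  for (int i = 0; i < 4; ++i) {
    workers.emplace_back([&params] {
      Scope local;
      local.parent = &params;
      // An executor's Run(program, &local) would read "fc.w" through the
      // scope chain and write its outputs only into `local`.
      const Tensor* w = local.Find("fc.w");
      (void)w;
    });
  }
  for (auto& t : workers) t.join();
  return 0;
}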
paddle/framework/executor.cc
Outdated
Why do we need LinearListView since we already have GraphView?
LinearListView organizes the topology as a linear list, and operators will be executed sequentially. GraphView organizes the topology as a graph, and further optimization can be applied based on the graph structure.
I think that we can have LinearListView for now; maybe GraphView can be implemented later.
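For illustration, the distinction between the two views might look roughly like this (a sketch with stand-in types; the class names come from this thread, not from merged code):

#include <functional>
#include <string>
#include <vector>

struct OpDesc {  // stand-in for framework::OpDesc
  std::string type;
};

// LinearListView: operators kept in program order and visited strictly one
// by one, which is all the first executor needs.
class LinearListView {
 public:
  explicit LinearListView(std::vector<OpDesc> ops) : ops_(std::move(ops)) {}

  void ForEach(const std::function<void(const OpDesc&)>& run_op) const {
    for (const OpDesc& op : ops_) run_op(op);  // sequential, no reordering
  }

 private:
  std::vector<OpDesc> ops_;
};

// A GraphView would additionally store dependency edges between ops, so
// later passes could reorder, fuse, or parallelize independent subgraphs.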
paddle/framework/executor.cc
Outdated
class LinearListView;
class GraphView;

// Immutable view of a ProgramDesc organized for efficient execution.
This has already been implemented in framework/op_desc.h.
paddle/framework/executor.h
Outdated
virtual void Run() = 0;
};

Executor* NewLocalExecutor(const platform::Place&, const ProgramDesc&, bool);
Rename and redesign NewLocalExecutor into NewExecutor(const std::vector<Place>& places). There is no need for the bool optimize parameter, and ProgramDesc becomes a parameter to Executor::Run.
paddle/framework/executor.h
Outdated
class Executor {
 public:
  virtual ~Executor() {}
@helinwang has a suggestion: given that the construction of an executor could be expensive (including the creation of thread pools), it would be reasonable to reuse an executor to run multiple ProgramDescs.
Therefore we need the constructor Executor(const std::vector<Place>& places); and the Run method virtual void Run(const ProgramDesc& program, Scope* scope);
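A minimal sketch of that interface with stand-in types (the real declarations would live in paddle/framework/executor.h):

#include <vector>

struct Place {};        // stand-in for platform::Place
struct ProgramDesc {};  // stand-in for framework::ProgramDesc
struct Scope {};        // stand-in for framework::Scope

class Executor {
 public:
  // Expensive, reusable setup (device contexts, thread pools) happens once.
  explicit Executor(const std::vector<Place>& places) : places_(places) {}

  // Program and scope are per-call, so one executor can run many programs.
  void Run(const ProgramDesc& program, Scope* scope) {
    // Iterate over the ops of `program` and execute them against `scope`.
    (void)program;
    (void)scope;
  }

 private:
  std::vector<Place> places_;
};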
scope->NewVar(var.name());
}

Scope& local_scope = scope->NewScope();
It seems that we should drop local_scope after invoking Run?
Looks like there is no easy way to do this. I will add a TODO on this.
}
}
}
auto op = paddle::framework::OpRegistry::CreateOp(block.ops(i));
I think it will be extremely slow if we create the operator every time, because the protobuf message will be parsed and copied.
Let's keep the most straightforward implementation (i.e., avoid any possible premature optimization). Once we get it running, we can go ahead and do profiling.
paddle/framework/executor.cc
Outdated
}
}

std::vector<bool> Executor::Preprocess(const ProgramDesc& pdesc) {
Preprocess is a bad name; maybe just Prune is OK, since there are many preprocessing stages, not only pruning.
Done
paddle/operators/feed_op.cc
Outdated
 public:
  FeedOpMaker(framework::OpProto* proto, framework::OpAttrChecker* op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
    AddAttr<int>("data_type", "output data type")
Please make sure that all Input, Output and Attribute names adhere to the naming convention. https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md
Done
paddle/operators/feed_op.cc
Outdated
PADDLE_ENFORCE_GT(tensors.size(), static_cast<size_t>(col));
auto in_dim = tensors[col].dims();
ctx->SetOutputDim("Out", in_dim);
@QiJune should we use the dims attribute to infer the output shape?
Yes, I fixed it.
Could you also add an enforce that (*tensors)[col].numel() == product(dims)? Chances are a user specifies the wrong col.
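A sketch of what the suggested check could look like inside FeedOp's InferShape, assuming the op exposes a dims attribute and an Attrs() accessor on the shape-inference context (names follow the snippets in this thread, not the merged code):

// Illustrative fragment, not the final implementation.
auto dims = ctx->Attrs().Get<std::vector<int>>("dims");
auto out_dim = framework::make_ddim(dims);
// Catch a mismatched `col`: the fed tensor must contain exactly as many
// elements as the declared dims.
PADDLE_ENFORCE_EQ(tensors[col].numel(), framework::product(out_dim),
                  "The feed tensor does not match the declared dims; the col may be wrong.");
ctx->SetOutputDim("Out", out_dim);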
paddle/operators/fetch_op.cc
Outdated
auto input_dim = ctx->GetInputDim("Input");
PADDLE_ENFORCE_GT(tensors->size(), col);
(*tensors)[col].Resize(input_dim);
@QiJune Same for the fetch op: InferShape according to the dims attribute.
Then enforce (*tensors)[col].numel() == product(dims).
@tonyyang-svail
I had a discussion with @jacquesqiao: InferShape will be done at compile time first, so the InferShape of FeedOp and FetchOp cannot use run-time concepts like GlobalScope.
FeedOp has a dims attribute to set its output tensor's dims, and FetchOp does not need a dims attribute; the dims of the Tensors in fetch_result can be set from FetchOp's input tensor.
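A sketch of how the compile-time-only InferShape could look under that design; the attribute, input, and output names follow this thread, and the accessors are assumptions rather than the merged code:

// FeedOp: the output shape comes solely from the "dims" attribute, so no
// run-time concept such as the global scope is needed at compile time.
auto dims = ctx->Attrs().Get<std::vector<int>>("dims");
ctx->SetOutputDim("Out", framework::make_ddim(dims));

// FetchOp: no dims attribute is needed; at run time the fetched tensor simply
// takes the shape of the op's input, e.g.
// (*tensors)[col].Resize(ctx->GetInputDim("Input"));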
paddle/framework/executor.h
Outdated
 * @return
 * vector<bool> Same size as ops. Indicates whether an op should be run.
 */
std::vector<bool> Prune(const ProgramDesc& pdesc, int block_id);
Prune is a high-level optimization step, which should be done before the executor runs. The executor should take a program with no redundant op groups.
paddle/framework/executor.cc
Outdated
std::vector<bool> should_run = Prune(pdesc, block_id);
PADDLE_ENFORCE_EQ(should_run.size(), block.ops_size());
for (size_t i = 0; i < should_run.size(); ++i) {
  // if (should_run[i]) {
No commented-out code, please.
paddle/framework/executor.cc
Outdated
PADDLE_ENFORCE_EQ(should_run.size(), block.ops_size());
for (size_t i = 0; i < should_run.size(); ++i) {
  // if (should_run[i]) {
  if (true) {
It's better to introduce a named constant assigned to true; otherwise, this is a magic value.
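For example (the constant name is purely illustrative):

// Name the temporary switch instead of hard-coding `true`, so the intent
// ("run every op until Prune is trusted") is explicit.
constexpr bool kRunAllOps = true;  // TODO: honor should_run[i] once Prune lands
if (kRunAllOps || should_run[i]) {
  // create and run the op as before
}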
paddle/framework/executor_test.cc
Outdated
USE_OP(fill_constant);
USE_OP(sgd);

using std::string;
Seems unnecessary?
};

/* @Brief
 * Pruning the graph
What is the purpose of Prune? I would think that it needs a target parameter, which could be either a variable or an operator, and that it returns a new ProgramDesc including only the dependent operators. But why doesn't the following code take a target parameter?
explicit Executor(const std::vector<platform::Place>& places);
~Executor();

/* @Brief
I think the C++ code is the documentation, and we don't really need to use Doxygen. Therefore, we can write much shorter comments. For this specific case, Executor::Run, I don't even think it needs a comment.
I am approving this PR so that it doesn't stay open too long, but please consider my comments in #4537 (comment).
In my mind,
- ProgramDesc shouldn't carry targets, because a program includes all the instructions that are supposed to be executed.
- The Prune function's signature should be int/bool Prune(const ProgramDesc* input, const std::vector<std::string>& targets, ProgramDesc* output);
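A self-contained sketch of that signature with stand-in types (the real function would operate on framework::ProgramDesc): starting from the target names, walk the ops backward and keep only those whose outputs are transitively needed.

#include <set>
#include <string>
#include <vector>

struct OpDesc {  // stand-in types, not the real Paddle classes
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};
struct ProgramDesc {
  std::vector<OpDesc> ops;  // a single block, in program order
};

bool Prune(const ProgramDesc* input, const std::vector<std::string>& targets,
           ProgramDesc* output) {
  std::set<std::string> needed(targets.begin(), targets.end());
  std::vector<bool> keep(input->ops.size(), false);
  // Walk ops in reverse: keep an op if it produces a needed variable, and
  // then mark its inputs as needed as well.
  for (int i = static_cast<int>(input->ops.size()) - 1; i >= 0; --i) {
    const OpDesc& op = input->ops[i];
    bool produces_needed = false;
    for (const auto& out : op.outputs) {
      if (needed.count(out)) produces_needed = true;
    }
    if (!produces_needed) continue;
    keep[i] = true;
    for (const auto& in : op.inputs) needed.insert(in);
  }
  output->ops.clear();
  for (size_t i = 0; i < input->ops.size(); ++i) {
    if (keep[i]) output->ops.push_back(input->ops[i]);
  }
  return true;  // could return false or an error code on malformed input
}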
Fix #4523 #4557
Timeline (@tonyyang-svail):

Oct 3rd, Tuesday
- Design the Executor interface (e.g., whether Executor.Run returns vector<Tensor>); it takes a ProgramDesc and a Scope.

Oct 4th, Wednesday

Oct 5th, Thursday
- FeedOp (by @QiJune). Add FeedOp to ProgramDesc so that it will be run first.
- FetchOp (by @QiJune).
- Prune. It takes const ProgramDesc& input (use FeedOp to find feed, and use FetchOp to find target) and returns a vector<bool> indicating whether each op should be run or not.

Oct 6th, Friday
- Prune

Oct 7th, Saturday
- executor_test.cc, Run(), Preprocessing()

Oct 8th, Sunday
- Prune
- Add backward ProtoBuf. (Several bugs found on backward: "There should be a fill_one_like_op as a starting point of backward pass" #4627. Fixed.)
- AppendBackward. Switch to ProgramDescBind.

Oct 9th, Monday
- Integrate InitOp into ProgramDesc. (More discussion needed for where to put the init op.)