Add sequence concat op #4508
Conversation
@@ -0,0 +1,25 @@
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.
Please rename all the source files to "sequence_concat_op.xx"; currently all of our operator source files use lower-case names.
Done.
const size_t level = static_cast<size_t>(ctx->Attrs().Get<int>("level"));
const size_t axis = static_cast<size_t>(ctx->Attrs().Get<int>("axis"));
PADDLE_ENFORCE(level == 0UL || level == 1UL,
               "Sequence Concat Op only support one or two sequence now.");
The sequence_concat operator only accepts a sequence or a nested sequence as its input.
Done.
auto ins_dims = ctx->GetInputsDim("X");
framework::DDim out_dims = ins_dims[0];
const size_t n = ins_dims.size();
for (size_t i = 1; i < n; i++) {
I prefer ++i in a loop (actually there is no difference for an integer value); you can check this: https://google.github.io/styleguide/cppguide.html#Preincrement_and_Predecrement
Done.
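For what it's worth, the style-guide rationale is about non-trivial iterator types, where post-increment has to copy the old value; a tiny standalone illustration (not code from this PR):

```cpp
#include <vector>

int main() {
  std::vector<int> v{1, 2, 3};
  int sum = 0;
  // For a plain integer loop variable, ++i and i++ generate the same code,
  // but for iterators post-increment must return a copy of the old value,
  // so pre-increment is the safer default style.
  for (auto it = v.begin(); it != v.end(); ++it) {
    sum += *it;
  }
  return sum == 6 ? 0 : 1;
}
```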
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("X",
           "Multip LodTensors, the variable-length inputs of "
           "SequenceConcatOp")
Please follow the (type, default value) comment style recommended in our doc: https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/operators/name_convention.md
I think fc_op has a very good comment style.
Done.
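For reference, the (type, default value) style from name_convention.md reads roughly like the sketch below; the names and wording are illustrative, not the exact lines that landed in this PR:

```cpp
// A sketch of the "(type, default value)" comment style; illustrative only.
AddInput("X",
         "(vector<LoDTensor>) Variable-length input sequences of "
         "SequenceConcatOp.")
    .AsDuplicable();
AddAttr<int>("axis",
             "(int, default 0) The axis along which the inputs will be "
             "concatenated.")
    .SetDefault(0);
```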
| "If level is 1, the inputs will be joined with sentence.") | ||
| .SetDefault(0); | ||
| AddComment(R"DOC( | ||
| SequenceConcatOp concat multip LodTensors and only supports one or two levels. |
- The sequence_concat operator concatenates multiple LoDTensors. It only supports sequences (LoDTensor with level=1) or nested sequences (LoDTensor with level=0) as its inputs.
Done.
    axis is 1, level is 1, the Lod of Inputs are the same,
    LoD(x0) = {{0,2,4},{0,1,2,3,4}}; Dims(x0) = (2,3,4)
    LoD(x1) = {{0,2,4},{0,1,2,3,4}}; Dims(x1) = (2,4,4)
    LoD(Out) = {{0,2,4},{01,2,3,4}}; Dims(Out) = (2,7,4)
{01,2,3,4}} --> {0,1,2,3,4}}
Is there a comma missing?
Yep, sorry for the careless mistake. Done.
    LoD(x1) = {{0,2,4},{0,1,2,3,4}}; Dims(x1) = (2,4,4)
    LoD(Out) = {{0,2,4},{01,2,3,4}}; Dims(Out) = (2,7,4)
- Case2:
    If axis is 0, level is 1, the Lod of inputs are different,
"If the axis": add an article before the noun.
Done.
| AddComment(R"DOC( | ||
| SequenceConcatOp concat multip LodTensors and only supports one or two levels. | ||
| - Case1: | ||
| axis is 1, level is 1, the Lod of Inputs are the same, |
"The axis" here as well.
Done.
    If axis is 0, level is 1, the Lod of inputs are different,
    LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (2,3,4)
    LoD(x1) = {{0,3,5}, {0,1,3,4,5}}; Dims(x1) = (3,3,4)
    LoD(Out) = {{0,5,9}, {0,1,2,4,5,6,7,8,9}}; Dims(Out) = (5,3,4)
LoD or Lod? I prefer to keep it consistent in the comment.
Add a NOTE to the doc; as I understand it, the level of all the inputs should be the same (if I am right).
Done.
const size_t n = ins.size();
if (axis == 0UL) {
  if (level == 0) {
    for (size_t i = 1; i < n; i++) {
I prefer ++i, though there is no difference for an integer value...
Done.
 protected:
  void InferShape(framework::InferShapeContextBase* ctx) const override {
    PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
                   "Gradient of Out should not be null.");
The gradient of Out should not be null.
Done.
                       framework::OpAttrChecker* op_checker)
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("X",
           "Multip LodTensors, the variable-length inputs of "
- Set X with AddInput(……).NotInGradient().
- "The input multiple LoDTensors, which are variable-length sequences or nested sequences."
Regarding setting X with AddInput(……).NotInGradient():
Maybe we also need the LoD of X in the gradient kernel to calculate the LoD of Grad(Out).
for (size_t i = 1; i < n; i++) {
  PADDLE_ENFORCE_EQ(ins[i]->NumLevels(), 2UL,
                    "All the LoDTensors of Inputs(X) should "
                    "have two level.");
I am not sure, but it seems there is no check to guarantee that, in the concatenation, except for the dimension of the axis along which to concatenate, the dimensions along all the other axes must be the same?
Yes, that's right, and I added some dimension checks.
Done.
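A minimal standalone sketch of the added dimension check, in plain C++ with a hypothetical helper name (the real check uses PADDLE_ENFORCE_EQ inside the framework): every dimension except the concatenation axis must match across all inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical standalone version of the check: all inputs must have the
// same rank, and every dimension except `axis` must match exactly.
void CheckConcatDims(const std::vector<std::vector<int>>& dims, size_t axis) {
  for (size_t i = 1; i < dims.size(); ++i) {
    assert(dims[i].size() == dims[0].size());
    for (size_t j = 0; j < dims[0].size(); ++j) {
      if (j == axis) continue;  // only the concat axis may differ
      assert(dims[i][j] == dims[0][j]);
    }
  }
}

int main() {
  // Case1 from the doc comment: (2,3,4) and (2,4,4) concatenated along axis 1.
  CheckConcatDims({{2, 3, 4}, {2, 4, 4}}, 1);
  return 0;
}
```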
}

template <typename Place, typename T>
class SequenceConcatOpKernel : public framework::OpKernel {
It seems that in the latest develop branch, all OpKernels should inherit from framework::OpKernel<T> to support different computational accuracies.
Done.
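The requested change is roughly the following sketch, written against the 2017-era API quoted in this thread rather than the final diff:

```cpp
// Before (as in the diff above): the kernel inherits from the
// non-templated base class.
//   class SequenceConcatOpKernel : public framework::OpKernel { ... };

// After: the kernel is parameterized on the element type T, so one
// operator definition can be registered for float, double, and so on.
template <typename Place, typename T>
class SequenceConcatOpKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    // ... concatenation logic ...
  }
};
```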
};

template <typename Place, typename T>
class SequenceConcatGradOpKernel : public framework::OpKernel {
It seems that in the latest develop branch, all OpKernels should inherit from framework::OpKernel<T> to support different computational accuracies.
Done.
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("X",
           "The input Multip LoDTensors, which are variable-length "
           "sequence or nested sequence.")
AddInput("X",
"(A vector of LoDTensor), the input is a vector of LoDTensor, "
"each of which is a variable-length sequence or nested sequence.")There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
      .AsDuplicable();
  AddOutput("Out",
            "A LoDTensor, the variable-length output of "
            "sequence_concat Op.");
AddOutput("Out",
"(A LoDTensor), the variable-length output of "
"sequence_concat Op.");There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
      .SetDefault(0);
  AddAttr<int>("level",
               "(int, default 0)"
               "The level which the inputs will be joined with."
"The level at which the inputs will be joined."
Done.
It only supports sequences ( LoD Tensor with level=1)
or nested sequences (LoD tensor with level=0) as its inputs.
- Case1:
    If the axis is 1, level is 1, the LoD of Inputs are the same,
- Inputs --> input
- If the axis is other than 0 (here, axis is 1 and level is 1), each input should have the same LoD information, and the LoD information of the output stays the same as the input.
Done.
| "(int, default 0)" | ||
| "The level which the inputs will be joined with." | ||
| "If level is 0, the inputs will be joined with " | ||
| "nested sequences." |
If the level is 0, the inputs will be joined at the nested sequence level, which
Done.
    LoD(x1) = {{0,2,4},{0,1,2,3,4}}; Dims(x1) = (2,4,4)
    LoD(Out) = {{0,2,4},{0,1,2,3,4}}; Dims(Out) = (2,7,4)
- Case2:
    If the axis is 0, level is 1, the LoD of inputs are different,
If axis is 0 (here, level is 1), the inputs are concatenated along the time steps, and the LoD information of the output needs to be recomputed.
Done.
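To make Case2 concrete: when the inputs are concatenated along axis 0, the matching offsets of the top LoD level are simply summed, while the lower levels are rebuilt by stitching the per-sequence pieces together. A minimal standalone sketch of the top-level merge (plain C++ with a hypothetical helper, not the operator code):

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper: merge the top LoD level of two inputs concatenated
// along the time-step axis by summing the matching offsets.
std::vector<size_t> MergeTopLevelLoD(const std::vector<size_t>& a,
                                     const std::vector<size_t>& b) {
  std::vector<size_t> out(a.size());
  for (size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}

int main() {
  // Case2 above: {0,2,4} + {0,3,5} -> {0,5,9}
  for (size_t v : MergeTopLevelLoD({0, 2, 4}, {0, 3, 5}))
    std::printf("%zu ", v);  // prints: 0 5 9
  std::printf("\n");
  return 0;
}
```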
    PADDLE_ENFORCE(ctx->HasInput(framework::GradVarName("Out")),
                   "The gradient of Out should not be null.");
    PADDLE_ENFORCE(ctx->HasOutputs(framework::GradVarName("X")),
                   "The gradient of X should not be empty.");
- " should not be empty." --> " should not be null.". Keep the output information the same as check in line 95, since I think these two checks are for almost for the same purpose.
Done.
    // the Output LoD will be modified as followed:
    // LoD(x0) = {{0,2,4}, {0,1,2,3,4}}
    // LoD(x1) = {{0,3,5}, {0,1,3,4,5}}
    // LoD(Out) = {{0,5,9}, {0,1,2,4,5,6,7,8,9}}
I think lines 26 through 38 can be deleted, because the examples in the operator comment already explain the same logic well.
Done.
    PADDLE_ENFORCE_EQ(ins[0]->dims()[j], ins[i]->dims()[j],
                      "The dimensions of all the input LoDTensors "
                      "except for the specify axis should be "
                      "matched exactly.");
Except for the dimension of the specified axis along which all the inputs are concatenated, the dimensions of all the other axes of the input LoDTensors should be the same.
Done.
x1 = np.random.random((11, 6, 3)).astype('float32')
lod1 = [[0, 2, 5, 11], [0, 1, 2, 5, 7, 11]]
axis = 0
level = 1
- Just a personal question: I found that in the comment and the unit test, only level=1 is tested. It is still hard for others to understand how the attribute level works.
- Is it necessary to test level=0?
Done. Added a unit test with level=0.
Sorry, my understanding is not correct. The attribute …
auto out_lod = ins[0]->lod();
const size_t n = ins.size();
if (axis == 0UL) {
  if (level == 0) {
level == 0UL
Done.
        out_lod[0][j] += ins[i]->lod()[0][j];
      }
    }
  } else if (level == 1) {
level == 1UL
Done.
  } else if (level == 1) {
    PADDLE_ENFORCE_EQ(ins[0]->NumLevels(), 2UL,
                      "If the level is 1, all of the inputs "
                      "should be the the nested sequence.");
- "the the" --> there is an extra "the".
- nested sequence --> nested sequences.
Done.
  for (size_t i = 1; i < n; ++i) {
    out_dims[axis] += ins_dims[i][axis];
  }
  ctx->SetOutputDim("Out", out_dims);
When we discussed the framework design earlier, we agreed that InferShape should be able to infer the complete shape information, so the LoD check, set_lod, and concatLoD implementations below may need to be moved here. @reyoung
I agree that all the shape information should be inferred in InferShape, but the current interface does not yet seem to provide a way to access the LoD:
https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/shape_inference.h#L25
    information of the output keeps the same as the input.
    LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)
    LoD(x1) = {{0,2,4}, {0,1,2,3,4}}; Dims(x1) = (4,4,4)
    LoD(Out) = {{0,2,4}, {0,1,2,3,4}}; Dims(Out) = (4,7,4)
Please add a blank line before line 78 and another after line 80; same below.
Done.
lcy-seso left a comment
LGTM.
Fixed #4226