-
Notifications
You must be signed in to change notification settings - Fork 14.1k
[mlir][bufferization]-Replace only one use in TensorEmptyElimination #118958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[mlir][bufferization]-Replace only one use in TensorEmptyElimination #118958
Conversation
@llvm/pr-subscribers-mlir @llvm/pr-subscribers-mlir-bufferization Author: Amir Bishara (amirBish) ChangesIn many cases the emptyTensorElimination can not transform or eliminate the empty tensor which is being inserted into the Two major reasons for that: 1- Failing when trying to find a legal/suitable insertion point for the 2-The EmptyTensorElimination transform replaces the tensor.empty's uses all at once in one apply, rather than replacing only the specific use which was visited in the use-def chain (when traversing from the tensor.insert_slice). This scenario of replacing all the uses of the tensor.empty may lead into additional read effects after bufferization of the specific subset extract/subview which should not be the case. Both cases may result in many copies in the coming bufferization which can not be canonicalized. The first case can be noticed when having a The second case can be noticed when having a This MR only adds the lit tests for the cases shown above (NFC), to emphasize how the transform works, in the coming MRs will upload a slight changes to handle these case. Full diff: https://github.com/llvm/llvm-project/pull/118958.diff 1 Files Affected:
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
index efe59af97d9649..9d9bb443160465 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
@@ -365,3 +365,101 @@ func.func @multiple_materialize_in_destination_buffer(%m: memref<5xf32>, %f: f32
bufferization.materialize_in_destination %selected in restrict writable %m : (tensor<5xf32>, memref<5xf32>) -> ()
return
}
+
+// -----
+
+// `EmptyTensorElimination` fails to find a valid insertion
+// point for the new injected `SubsetExtraction`.
+// CHECK-LABEL: func.func @fail_to_eliminate_any_empty_tensors
+func.func @fail_to_eliminate_any_empty_tensors() -> tensor<5x6x128xf32> {
+ %cst_1 = arith.constant 1.0 : f32
+ %cst_2 = arith.constant 2.0 : f32
+ // CHECK: memref.alloc
+ // CHECK: memref.alloc
+ // CHECK: memref.alloc
+ %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+ %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+ %empty_2 = tensor.empty() : tensor<5x6x64xf32>
+ %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_2 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+ %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+ // CHECK: memref.copy
+ %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+ : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+ %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+ : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+ return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @succeed_to_eliminate_one_empty_tensor
+func.func @succeed_to_eliminate_one_empty_tensor() -> tensor<5x6x128xf32> {
+ %cst_1 = arith.constant 1.0 : f32
+ %cst_2 = arith.constant 2.0 : f32
+ // CHECK: memref.alloc
+ // CHECK: memref.alloc
+ %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+ %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+ %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+ %empty_2 = tensor.empty() : tensor<5x6x64xf32>
+ %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_2 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+ // CHECK: memref.copy
+ %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+ : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+ %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+ : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+ return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// `EmptyTensorElimination` replaces all of the uses of the tensor
+// empty with the new injected `SubsetExtraction`, without to consider
+// the specific use has been tracked, sometimes creating a non existent
+// bufferization conflicts.
+
+// CHECK-ELIM-LABEL: func.func @mutli_use_of_the_same_tensor_empty
+// CHECK-LABEL: func.func @mutli_use_of_the_same_tensor_empty
+func.func @mutli_use_of_the_same_tensor_empty() -> tensor<5x6x128xf32> {
+ %cst_1 = arith.constant 1.0 : f32
+ %cst_2 = arith.constant 2.0 : f32
+ %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+ %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+ // CHECK-ELIM: %[[VAL_3:.*]] = tensor.extract_slice
+ // CHECK-ELIM: linalg.fill ins(%[[VAL_0:.*]] : f32) outs(%[[VAL_3]]
+ // CHECK-ELIM: linalg.fill ins(%[[VAL_1:.*]] : f32) outs(%[[VAL_3]]
+ %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+ %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+ // CHECK: memref.copy
+ %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+ : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+ // CHECK: memref.copy
+ %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+ : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+ return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// CHECK-LABEL: func.func @mutli_use_of_the_same_tensor_empty_creates_non_existent_read
+func.func @mutli_use_of_the_same_tensor_empty_creates_non_existent_read(%arg1: tensor<5x6x128xf32> , %arg2: tensor<5x6x64xf32>)
+ -> (tensor<5x6x128xf32>, tensor<5x6x64xf32>) {
+ %cst_1 = arith.constant 1.0 : f32
+ %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+ // CHECK: memref.alloc
+ %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+ %res_2 = linalg.generic{
+ indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
+ iterator_types = ["parallel", "parallel", "parallel"]
+ }
+ ins(%empty_1 : tensor<5x6x64xf32>)
+ outs(%arg2 :tensor<5x6x64xf32>) {
+ ^bb0(%in: f32, %out: f32):
+ %res = arith.addf %in, %in : f32
+ linalg.yield %res : f32
+ } -> tensor<5x6x64xf32>
+ // CHECK: memref.copy
+ %inserted_slice_1 = tensor.insert_slice %res_1 into %arg1[0, 0, 0][5, 6, 64][1, 1, 1]
+ : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+ return %inserted_slice_1, %res_2 : tensor<5x6x128xf32>, tensor<5x6x64xf32>
+}
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
cc09909
to
ddd3230
Compare
mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp
Outdated
Show resolved
Hide resolved
ddd3230
to
e40d0ef
Compare
@matthias-springer ping, could you have additional look on the changes ? |
mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp
Outdated
Show resolved
Hide resolved
mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp
Outdated
Show resolved
Hide resolved
e40d0ef
to
85ea2c2
Compare
@matthias-springer , Thanks for the review and the feedback. I've just modified the PR to include only the changes of |
mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp
Outdated
Show resolved
Hide resolved
mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp
Outdated
Show resolved
Hide resolved
mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp
Outdated
Show resolved
Hide resolved
…rElimination In many cases the emptyTensorElimination can not transform or eliminate the empty tensor which is being inserted into the `SubsetInsertionOpInterface`. Two major reasons for that: 1- Failing when trying to find a legal/suitable insertion point for the `subsetExtract` which is about to replace the empty tensor. However, we may try to handle this issue by moving the needed values which responsible on building the `subsetExtract` nearby the empty tensor (which is about to be eliminated). Thus increasing the probability to find a legal insertion point. 2-The EmptyTensorElimination transform replaces the tensor.empty's uses all at once in one apply, rather than replacing only the specific use which was visited in the use-def chain (when traversing from the tensor.insert_slice). This scenario of replacing all the uses of the tensor.empty may lead into additional read effects after bufferization of the specific subset extract/subview which should not be the case. Both cases may result in many copies in the coming bufferization which can not be canonicalized. The first case can be noticed when having a `tensor.empty` followed by `SubsetInsertionOpInterface` (or in simple words `tensor.insert_slice`), which have been lowered from `tensor/tosa.concat`. The second case can be noticed when having a `tensor.empty`, with many uses and leading to applying the transformation only once, since the whole uses have been replaced at once. This MR only adds the lit tests for the cases shown above (NFC), to emphasize how the transform works, in the coming MRs will upload a slight changes to handle these cases in a more generic way.
85ea2c2
to
a430ad6
Compare
This MR hanldes the second case where we want to replace only the specific use which was visited in the `use-def` chain (when traversing from the tensor.insert_slice's source). This scenario of replacing all the uses of the tensor.empty may lead into additional read effects after bufferization of the specific subset extract/subview which should not be the case, Thus eliminating a potential copies.
a430ad6
to
11506c3
Compare
In many cases the emptyTensorElimination can not transform or eliminate the empty tensor which is being inserted into the
SubsetInsertionOpInterface
.Two major reasons for that:
1- Failing when trying to find a legal/suitable insertion point for the
subsetExtract
which is about to replace the empty tensor. However, we may try to handle this issue by moving the needed values which responsible on building thesubsetExtract
nearby the empty tensor (which is about to be eliminated). Thus increasing the probability to find a legal insertion point.2-The EmptyTensorElimination transform replaces the tensor.empty's uses all at once in one apply, rather than replacing only the specific use which was visited in the use-def chain (when traversing from the tensor.insert_slice). This scenario of replacing all the uses of the tensor.empty may lead into additional read effects after bufferization of the specific subset extract/subview which should not be the case.
Both cases may result in many copies in the coming bufferization which can not be canonicalized.
The first case can be noticed when having a
tensor.empty
followed bySubsetInsertionOpInterface
(or in simple wordstensor.insert_slice
), which have been lowered fromtensor/tosa.concat
.The second case can be noticed when having a
tensor.empty
, with many uses and leading to applying the transformation only once, since the whole uses have been replaced at once.The first commit in the PR only adds the lit tests for the cases shown above (NFC), to emphasize how the transform works, in the coming MRs will upload a slight changes to handle these case.
The second commit in this PR, we want to replace only the specific use which was visited in the
use-def
chain (when traversing from thetensor.insert_slice
's source).