
[mlir][bufferization]-Replace only one use in TensorEmptyElimination #118958


Conversation

amirBish
Contributor

@amirBish amirBish commented Dec 6, 2024

In many cases, emptyTensorElimination cannot transform or eliminate the empty tensor that is being inserted into the SubsetInsertionOpInterface.

Two major reasons for that:

1- Failing to find a legal/suitable insertion point for the subsetExtract that is about to replace the empty tensor. We may handle this issue by moving the values needed to build the subsetExtract next to the empty tensor (which is about to be eliminated), thus increasing the probability of finding a legal insertion point.

2- The EmptyTensorElimination transform replaces all of the tensor.empty's uses at once, rather than replacing only the specific use that was visited in the use-def chain (when traversing from the tensor.insert_slice). Replacing all uses of the tensor.empty may introduce additional read effects after bufferization of the specific subset extract/subview, which should not be the case.

Both cases may result in many copies during the subsequent bufferization that cannot be canonicalized away.

The first case shows up when a tensor.empty is followed by a SubsetInsertionOpInterface op (in simple words, a tensor.insert_slice), for example as lowered from tensor/tosa.concat. A sketch of such a lowering follows below.
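For context, here is a minimal hand-written sketch (shapes and names are illustrative, not taken from this PR) of how a concat decomposes into a tensor.empty followed by tensor.insert_slice ops:

```mlir
// Illustrative only: a 2-way concat along dim 1 ...
//   %concat = tensor.concat dim(1) %a, %b
//       : (tensor<8x8xf32>, tensor<8x8xf32>) -> tensor<8x16xf32>
// ... decomposes into roughly the following form:
func.func @concat_lowering_sketch(%a: tensor<8x8xf32>, %b: tensor<8x8xf32>) -> tensor<8x16xf32> {
  %empty = tensor.empty() : tensor<8x16xf32>
  %s0 = tensor.insert_slice %a into %empty[0, 0][8, 8][1, 1]
      : tensor<8x8xf32> into tensor<8x16xf32>
  %s1 = tensor.insert_slice %b into %s0[0, 8][8, 8][1, 1]
      : tensor<8x8xf32> into tensor<8x16xf32>
  return %s1 : tensor<8x16xf32>
}
```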

The second case shows up when a tensor.empty has many uses: the transformation is applied only once, since all of the uses are replaced at once. A sketch of this situation follows below.
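A minimal hand-written sketch of the multi-use situation (similar in spirit to the lit tests added by this PR; names and shapes are illustrative):

```mlir
func.func @multi_use_sketch() -> (tensor<8x16xf32>, tensor<8x8xf32>) {
  %c1 = arith.constant 1.0 : f32
  %c2 = arith.constant 2.0 : f32
  %dest = tensor.empty() : tensor<8x16xf32>
  // %empty has two uses, but only %res_a is on the use-def chain of the
  // tensor.insert_slice below; %res_b is unrelated to the destination.
  %empty = tensor.empty() : tensor<8x8xf32>
  %res_a = linalg.fill ins(%c1 : f32) outs(%empty : tensor<8x8xf32>) -> tensor<8x8xf32>
  %res_b = linalg.fill ins(%c2 : f32) outs(%empty : tensor<8x8xf32>) -> tensor<8x8xf32>
  %ins = tensor.insert_slice %res_a into %dest[0, 0][8, 8][1, 1]
      : tensor<8x8xf32> into tensor<8x16xf32>
  return %ins, %res_b : tensor<8x16xf32>, tensor<8x8xf32>
}
```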

The first commit in this PR only adds lit tests for the cases shown above (NFC), to illustrate how the transform currently works; follow-up PRs will upload small changes to handle these cases.

In the second commit of this PR, we replace only the specific use that was visited in the use-def chain (when traversing from the tensor.insert_slice's source).
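On the sketch above, the per-use replacement aims to produce roughly the following after elimination (a hand-written expectation of the intended behavior, not actual pass output):

```mlir
func.func @multi_use_sketch_after_elimination() -> (tensor<8x16xf32>, tensor<8x8xf32>) {
  %c1 = arith.constant 1.0 : f32
  %c2 = arith.constant 2.0 : f32
  %dest = tensor.empty() : tensor<8x16xf32>
  // The traced use is rewritten to operate on the matching subset of the destination ...
  %slice = tensor.extract_slice %dest[0, 0][8, 8][1, 1]
      : tensor<8x16xf32> to tensor<8x8xf32>
  %res_a = linalg.fill ins(%c1 : f32) outs(%slice : tensor<8x8xf32>) -> tensor<8x8xf32>
  // ... while the unrelated use keeps its own tensor.empty and adds no read of %dest.
  %empty = tensor.empty() : tensor<8x8xf32>
  %res_b = linalg.fill ins(%c2 : f32) outs(%empty : tensor<8x8xf32>) -> tensor<8x8xf32>
  %ins = tensor.insert_slice %res_a into %dest[0, 0][8, 8][1, 1]
      : tensor<8x8xf32> into tensor<8x16xf32>
  return %ins, %res_b : tensor<8x16xf32>, tensor<8x8xf32>
}
```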

@llvmbot llvmbot added the mlir and mlir:bufferization (Bufferization infrastructure) labels Dec 6, 2024
@llvmbot
Member

llvmbot commented Dec 6, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-bufferization

Author: Amir Bishara (amirBish)

Changes


This PR only adds lit tests for the cases shown above (NFC), to illustrate how the transform currently works; follow-up PRs will upload small changes to handle these cases.


Full diff: https://github.com/llvm/llvm-project/pull/118958.diff

1 file affected:

  • (modified) mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir (+98)
diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
index efe59af97d9649..9d9bb443160465 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
@@ -365,3 +365,101 @@ func.func @multiple_materialize_in_destination_buffer(%m: memref<5xf32>, %f: f32
   bufferization.materialize_in_destination %selected in restrict writable %m : (tensor<5xf32>, memref<5xf32>) -> ()
   return
 }
+
+// -----
+
+// `EmptyTensorElimination` fails to find a valid insertion
+// point for the newly injected `SubsetExtraction`.
+// CHECK-LABEL:   func.func @fail_to_eliminate_any_empty_tensors
+func.func @fail_to_eliminate_any_empty_tensors() -> tensor<5x6x128xf32> {
+  %cst_1 = arith.constant 1.0 : f32
+  %cst_2 = arith.constant 2.0 : f32
+  // CHECK: memref.alloc
+  // CHECK: memref.alloc
+  // CHECK: memref.alloc
+  %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+  %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %empty_2 = tensor.empty() : tensor<5x6x64xf32>
+  %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_2 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+  // CHECK: memref.copy
+  %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// CHECK-LABEL:   func.func @succeed_to_eliminate_one_empty_tensor
+func.func @succeed_to_eliminate_one_empty_tensor() -> tensor<5x6x128xf32> {
+  %cst_1 = arith.constant 1.0 : f32
+  %cst_2 = arith.constant 2.0 : f32
+  // CHECK: memref.alloc
+  // CHECK: memref.alloc
+  %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+  %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+  %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %empty_2 = tensor.empty() : tensor<5x6x64xf32>
+  %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_2 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  // CHECK: memref.copy
+  %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// `EmptyTensorElimination` replaces all of the uses of the
+// tensor.empty with the newly injected `SubsetExtraction`, without
+// considering which specific use has been tracked, sometimes creating
+// non-existent bufferization conflicts.
+
+// CHECK-ELIM-LABEL:   func.func @mutli_use_of_the_same_tensor_empty
+// CHECK-LABEL:   func.func @mutli_use_of_the_same_tensor_empty
+func.func @mutli_use_of_the_same_tensor_empty() -> tensor<5x6x128xf32> {
+  %cst_1 = arith.constant 1.0 : f32
+  %cst_2 = arith.constant 2.0 : f32
+  %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+  %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+  // CHECK-ELIM: %[[VAL_3:.*]] = tensor.extract_slice
+  // CHECK-ELIM: linalg.fill ins(%[[VAL_0:.*]] : f32) outs(%[[VAL_3]]
+  // CHECK-ELIM: linalg.fill ins(%[[VAL_1:.*]] : f32) outs(%[[VAL_3]]
+  %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  // CHECK: memref.copy
+  %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  // CHECK: memref.copy
+  %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// CHECK-LABEL:   func.func @mutli_use_of_the_same_tensor_empty_creates_non_existent_read
+func.func @mutli_use_of_the_same_tensor_empty_creates_non_existent_read(%arg1: tensor<5x6x128xf32> , %arg2: tensor<5x6x64xf32>)
+    -> (tensor<5x6x128xf32>, tensor<5x6x64xf32>) {
+  %cst_1 = arith.constant 1.0 : f32
+  %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+  // CHECK: memref.alloc
+  %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %res_2 = linalg.generic{
+    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
+    iterator_types = ["parallel", "parallel", "parallel"]
+  }
+  ins(%empty_1 : tensor<5x6x64xf32>)
+  outs(%arg2 :tensor<5x6x64xf32>) {
+  ^bb0(%in: f32, %out: f32):
+    %res = arith.addf %in, %in : f32
+    linalg.yield %res : f32
+  } -> tensor<5x6x64xf32>
+  // CHECK: memref.copy
+  %inserted_slice_1 = tensor.insert_slice %res_1 into %arg1[0, 0, 0][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  return %inserted_slice_1, %res_2 : tensor<5x6x128xf32>, tensor<5x6x64xf32>
+}

@amirBish amirBish changed the title [mlir][bufferization]-Add lit tests for unhandled cases in EmptyTensorElimination [mlir][bufferization]-Add unhandled cases in EmptyTensorElimination Dec 6, 2024

github-actions bot commented Dec 6, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@amirBish amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from cc09909 to ddd3230 Compare December 6, 2024 12:32
@amirBish amirBish changed the title [mlir][bufferization]-Add unhandled cases in EmptyTensorElimination [mlir][bufferization]-Support unhandled cases in EmptyTensorElimination Dec 6, 2024
@amirBish amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from ddd3230 to e40d0ef Compare December 11, 2024 15:47
@amirBish
Contributor Author

@matthias-springer ping, could you take another look at the changes?

@amirBish amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from e40d0ef to 85ea2c2 Compare December 17, 2024 14:48
@amirBish amirBish changed the title [mlir][bufferization]-Support unhandled cases in EmptyTensorElimination [mlir][bufferization]-Replace only one use in TensorEmptyElimination Dec 17, 2024
@amirBish
Contributor Author

@matthias-springer, thanks for the review and the feedback. I've just modified the PR to include only the changes that replace the specific use. After this one lands, I will adopt the approach you suggested of having a buildSubsetExtraction lambda that the user can control; I expect it will abstract the different implementations of finding a valid insertion point and let users provide custom support.

…rElimination

In many cases, emptyTensorElimination cannot transform or eliminate the empty tensor which is being inserted into the `SubsetInsertionOpInterface`.

Two major reasons for that:

1- Failing to find a legal/suitable insertion point for the `subsetExtract` which is about to replace the empty tensor. We may handle this issue by moving the values needed to build the `subsetExtract` next to the empty tensor (which is about to be eliminated), thus increasing the probability of finding a legal insertion point.

2- The EmptyTensorElimination transform replaces all of the tensor.empty's uses at once, rather than replacing only the specific use which was visited in the use-def chain (when traversing from the tensor.insert_slice). Replacing all the uses of the tensor.empty may introduce additional read effects after bufferization of the specific subset extract/subview, which should not be the case.

Both cases may result in many copies during the subsequent bufferization which cannot be canonicalized.

The first case can be noticed when a `tensor.empty` is followed by a `SubsetInsertionOpInterface` (in simple words, a `tensor.insert_slice`), as lowered from `tensor/tosa.concat`.

The second case can be noticed when a `tensor.empty` has many uses: the transformation is applied only once, since all of the uses are replaced at once.

This commit only adds the lit tests for the cases shown above (NFC), to emphasize how the transform currently works; coming PRs will upload slight changes to handle these cases in a more generic way.
@amirBish amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from 85ea2c2 to a430ad6 Compare December 17, 2024 21:26
This commit handles the second case, where we want to replace only the
specific use which was visited in the `use-def` chain (when
traversing from the tensor.insert_slice's source).

Replacing all the uses of the tensor.empty may introduce additional
read effects after bufferization of the specific subset
extract/subview, which should not be the case; this change thus
eliminates potential copies.