[mlir][bufferization]-Replace only one use in TensorEmptyElimination #118958

amirBish · 2024-12-06T12:05:41Z

In many cases the emptyTensorElimination can not transform or eliminate the empty tensor which is being inserted into the SubsetInsertionOpInterface.

Two major reasons for that:

1- Failing when trying to find a legal/suitable insertion point for the subsetExtract which is about to replace the empty tensor. However, we may try to handle this issue by moving the needed values which responsible on building the subsetExtract nearby the empty tensor (which is about to be eliminated). Thus increasing the probability to find a legal insertion point.

2-The EmptyTensorElimination transform replaces the tensor.empty's uses all at once in one apply, rather than replacing only the specific use which was visited in the use-def chain (when traversing from the tensor.insert_slice). This scenario of replacing all the uses of the tensor.empty may lead into additional read effects after bufferization of the specific subset extract/subview which should not be the case.

Both cases may result in many copies in the coming bufferization which can not be canonicalized.

The first case can be noticed when having a tensor.empty followed by SubsetInsertionOpInterface (or in simple words tensor.insert_slice), which have been lowered from tensor/tosa.concat.

The second case can be noticed when having a tensor.empty, with many uses and leading to applying the transformation only once, since the whole uses have been replaced at once.

The first commit in the PR only adds the lit tests for the cases shown above (NFC), to emphasize how the transform works, in the coming MRs will upload a slight changes to handle these case.

The second commit in this PR, we want to replace only the specific use which was visited in the use-def chain (when traversing from the tensor.insert_slice's source).

llvmbot · 2024-12-06T12:06:17Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-bufferization

Author: Amir Bishara (amirBish)

Changes

In many cases the emptyTensorElimination can not transform or eliminate the empty tensor which is being inserted into the SubsetInsertionOpInterface.

Two major reasons for that:

1- Failing when trying to find a legal/suitable insertion point for the subsetExtract which is about to replace the empty tensor. However, we may try to handle this issue by moving the needed values which responsible on building the subsetExtract nearby the empty tensor (which is about to be eliminated). Thus increasing the probability to find a legal insertion point.

2-The EmptyTensorElimination transform replaces the tensor.empty's uses all at once in one apply, rather than replacing only the specific use which was visited in the use-def chain (when traversing from the tensor.insert_slice). This scenario of replacing all the uses of the tensor.empty may lead into additional read effects after bufferization of the specific subset extract/subview which should not be the case.

Both cases may result in many copies in the coming bufferization which can not be canonicalized.

The first case can be noticed when having a tensor.empty followed by SubsetInsertionOpInterface (or in simple words tensor.insert_slice), which have been lowered from tensor/tosa.concat.

The second case can be noticed when having a tensor.empty, with many uses and leading to applying the transformation only once, since the whole uses have been replaced at once.

This MR only adds the lit tests for the cases shown above (NFC), to emphasize how the transform works, in the coming MRs will upload a slight changes to handle these case.

Full diff: https://github.com/llvm/llvm-project/pull/118958.diff

1 Files Affected:

(modified) mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir (+98)

diff --git a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
index efe59af97d9649..9d9bb443160465 100644
--- a/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
+++ b/mlir/test/Dialect/Bufferization/Transforms/one-shot-bufferize-empty-tensor-elimination.mlir
@@ -365,3 +365,101 @@ func.func @multiple_materialize_in_destination_buffer(%m: memref<5xf32>, %f: f32
   bufferization.materialize_in_destination %selected in restrict writable %m : (tensor<5xf32>, memref<5xf32>) -> ()
   return
 }
+
+// -----
+
+// `EmptyTensorElimination` fails to find a valid insertion
+// point for the new injected `SubsetExtraction`.
+// CHECK-LABEL:   func.func @fail_to_eliminate_any_empty_tensors
+func.func @fail_to_eliminate_any_empty_tensors() -> tensor<5x6x128xf32> {
+  %cst_1 = arith.constant 1.0 : f32
+  %cst_2 = arith.constant 2.0 : f32
+  // CHECK: memref.alloc
+  // CHECK: memref.alloc
+  // CHECK: memref.alloc
+  %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+  %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %empty_2 = tensor.empty() : tensor<5x6x64xf32>
+  %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_2 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+  // CHECK: memref.copy
+  %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// CHECK-LABEL:   func.func @succeed_to_eliminate_one_empty_tensor
+func.func @succeed_to_eliminate_one_empty_tensor() -> tensor<5x6x128xf32> {
+  %cst_1 = arith.constant 1.0 : f32
+  %cst_2 = arith.constant 2.0 : f32
+  // CHECK: memref.alloc
+  // CHECK: memref.alloc
+  %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+  %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+  %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %empty_2 = tensor.empty() : tensor<5x6x64xf32>
+  %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_2 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  // CHECK: memref.copy
+  %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// `EmptyTensorElimination` replaces all of the uses of the tensor
+// empty with the new injected `SubsetExtraction`, without to consider
+// the specific use has been tracked, sometimes creating a non existent
+// bufferization conflicts.
+
+// CHECK-ELIM-LABEL:   func.func @mutli_use_of_the_same_tensor_empty
+// CHECK-LABEL:   func.func @mutli_use_of_the_same_tensor_empty
+func.func @mutli_use_of_the_same_tensor_empty() -> tensor<5x6x128xf32> {
+  %cst_1 = arith.constant 1.0 : f32
+  %cst_2 = arith.constant 2.0 : f32
+  %cancatenated_empty = tensor.empty() : tensor<5x6x128xf32>
+  %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+  // CHECK-ELIM: %[[VAL_3:.*]] = tensor.extract_slice
+  // CHECK-ELIM: linalg.fill ins(%[[VAL_0:.*]] : f32) outs(%[[VAL_3]]
+  // CHECK-ELIM: linalg.fill ins(%[[VAL_1:.*]] : f32) outs(%[[VAL_3]]
+  %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %res_2 = linalg.fill ins(%cst_2 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  // CHECK: memref.copy
+  %inserted_slice_1 = tensor.insert_slice %res_1 into %cancatenated_empty[0, 0, 0][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  // CHECK: memref.copy
+  %inserted_slice_2 = tensor.insert_slice %res_2 into %inserted_slice_1[0, 0, 64][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  return %inserted_slice_2 : tensor<5x6x128xf32>
+}
+
+// -----
+
+// CHECK-LABEL:   func.func @mutli_use_of_the_same_tensor_empty_creates_non_existent_read
+func.func @mutli_use_of_the_same_tensor_empty_creates_non_existent_read(%arg1: tensor<5x6x128xf32> , %arg2: tensor<5x6x64xf32>)
+    -> (tensor<5x6x128xf32>, tensor<5x6x64xf32>) {
+  %cst_1 = arith.constant 1.0 : f32
+  %empty_1 = tensor.empty() : tensor<5x6x64xf32>
+  // CHECK: memref.alloc
+  %res_1 = linalg.fill ins(%cst_1 : f32) outs(%empty_1 : tensor<5x6x64xf32>) -> tensor<5x6x64xf32>
+  %res_2 = linalg.generic{
+    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>],
+    iterator_types = ["parallel", "parallel", "parallel"]
+  }
+  ins(%empty_1 : tensor<5x6x64xf32>)
+  outs(%arg2 :tensor<5x6x64xf32>) {
+  ^bb0(%in: f32, %out: f32):
+    %res = arith.addf %in, %in : f32
+    linalg.yield %res : f32
+  } -> tensor<5x6x64xf32>
+  // CHECK: memref.copy
+  %inserted_slice_1 = tensor.insert_slice %res_1 into %arg1[0, 0, 0][5, 6, 64][1, 1, 1]
+      : tensor<5x6x64xf32> into tensor<5x6x128xf32>
+  return %inserted_slice_1, %res_2 : tensor<5x6x128xf32>, tensor<5x6x64xf32>
+}

github-actions · 2024-12-06T12:27:50Z

✅ With the latest revision this PR passed the C/C++ code formatter.

mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp

amirBish · 2024-12-16T15:45:08Z

@matthias-springer ping, could you have additional look on the changes ?

mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp

amirBish · 2024-12-17T14:59:57Z

@matthias-springer , Thanks for the review and the feedback. I've just modified the PR to include only the changes of Replacing the specific use. And after submitting this one will adopt the approach you have suggested to have a lambda of buildSubsetExtraction that user can control. I guess it will abstract the different implementation of finding the valid insertion point and let user have custom support.

mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp

…rElimination In many cases the emptyTensorElimination can not transform or eliminate the empty tensor which is being inserted into the `SubsetInsertionOpInterface`. Two major reasons for that: 1- Failing when trying to find a legal/suitable insertion point for the `subsetExtract` which is about to replace the empty tensor. However, we may try to handle this issue by moving the needed values which responsible on building the `subsetExtract` nearby the empty tensor (which is about to be eliminated). Thus increasing the probability to find a legal insertion point. 2-The EmptyTensorElimination transform replaces the tensor.empty's uses all at once in one apply, rather than replacing only the specific use which was visited in the use-def chain (when traversing from the tensor.insert_slice). This scenario of replacing all the uses of the tensor.empty may lead into additional read effects after bufferization of the specific subset extract/subview which should not be the case. Both cases may result in many copies in the coming bufferization which can not be canonicalized. The first case can be noticed when having a `tensor.empty` followed by `SubsetInsertionOpInterface` (or in simple words `tensor.insert_slice`), which have been lowered from `tensor/tosa.concat`. The second case can be noticed when having a `tensor.empty`, with many uses and leading to applying the transformation only once, since the whole uses have been replaced at once. This MR only adds the lit tests for the cases shown above (NFC), to emphasize how the transform works, in the coming MRs will upload a slight changes to handle these cases in a more generic way.

This MR hanldes the second case where we want to replace only the specific use which was visited in the `use-def` chain (when traversing from the tensor.insert_slice's source). This scenario of replacing all the uses of the tensor.empty may lead into additional read effects after bufferization of the specific subset extract/subview which should not be the case, Thus eliminating a potential copies.

llvmbot added mlir mlir:bufferization Bufferization infrastructure labels Dec 6, 2024

amirBish requested a review from matthias-springer December 6, 2024 12:06

amirBish requested review from MaheshRavishankar and joker-eph December 6, 2024 12:06

amirBish changed the title ~~[mlir][bufferization]-Add lit tests for unhandled cases in EmptyTensorElimination~~ [mlir][bufferization]-Add unhandled cases in EmptyTensorElimination Dec 6, 2024

amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from cc09909 to ddd3230 Compare December 6, 2024 12:32

amirBish changed the title ~~[mlir][bufferization]-Add unhandled cases in EmptyTensorElimination~~ [mlir][bufferization]-Support unhandled cases in EmptyTensorElimination Dec 6, 2024

matthias-springer reviewed Dec 8, 2024

View reviewed changes

mlir/lib/Dialect/Bufferization/Transforms/EmptyTensorElimination.cpp Outdated Show resolved Hide resolved

amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from ddd3230 to e40d0ef Compare December 11, 2024 15:47

matthias-springer reviewed Dec 16, 2024

View reviewed changes

amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from e40d0ef to 85ea2c2 Compare December 17, 2024 14:48

amirBish changed the title ~~[mlir][bufferization]-Support unhandled cases in EmptyTensorElimination~~ [mlir][bufferization]-Replace only one use in TensorEmptyElimination Dec 17, 2024

matthias-springer approved these changes Dec 17, 2024

View reviewed changes

amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from 85ea2c2 to a430ad6 Compare December 17, 2024 21:26

amirBish force-pushed the amirb/master/add_lit_tests_for_tensor_empty_elimination branch from a430ad6 to 11506c3 Compare December 17, 2024 21:31

amirBish merged commit 08aa956 into llvm:main Dec 18, 2024
8 checks passed

amirBish mentioned this pull request Dec 21, 2024

[mlir][bufferization]-Add ControlBuildSubsetExtractionFn to TensorEmptyElimination #120851

Merged

matthias-springer mentioned this pull request Jan 15, 2025

[MLIR] Bufferization behaving differently based on the position of extract_slice. #122869

Open

This was referenced Jun 2, 2025

[MTE] [NFC] use vector to collect globals to tag (#120283) #142329

Closed

[MTE] [NFC] use vector to collect globals to tag (#120283) #142330

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[mlir][bufferization]-Replace only one use in TensorEmptyElimination #118958

[mlir][bufferization]-Replace only one use in TensorEmptyElimination #118958

Uh oh!

amirBish commented Dec 6, 2024 •

edited

Loading

Uh oh!

llvmbot commented Dec 6, 2024 •

edited

Loading

Uh oh!

github-actions bot commented Dec 6, 2024 •

edited

Loading

Uh oh!

Uh oh!

amirBish commented Dec 16, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

amirBish commented Dec 17, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[mlir][bufferization]-Replace only one use in TensorEmptyElimination #118958

[mlir][bufferization]-Replace only one use in TensorEmptyElimination #118958

Uh oh!

Conversation

amirBish commented Dec 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Dec 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Dec 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

amirBish commented Dec 16, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

amirBish commented Dec 17, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

amirBish commented Dec 6, 2024 •

edited

Loading

llvmbot commented Dec 6, 2024 •

edited

Loading

github-actions bot commented Dec 6, 2024 •

edited

Loading