
[mlir] Convert expand_shape to more static form #112265


Merged: 6 commits into llvm:main from fold_expand_of_cast on Oct 25, 2024

Conversation

@IanWood1 (Contributor) commented Oct 14, 2024

Add pattern that converts a tensor.expand_shape op to a more static form.

This matches the pattern tensor.cast -> tensor.expand_shape when the tensor.expand_shape has a foldable tensor.cast producer and some constant-foldable output_shape operands. This makes the tensor.expand_shape more static and allows the static information to be propagated further down the program.

Sink tensor.cast ops through tensor.expand_shape ops when doing so makes the
expand op more static. This allows ops further down to infer their shapes.
When the output_shape values can be determined, convert to a static expand_shape
op and insert cast ops. The top cast will be (dynamic -> static), allowing
it to be propagated upwards, and the bottom will be (static -> dynamic),
allowing it to propagate down (or cancel with adjacent tensor.cast ops).

[skip ci]
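
Concretely, a sketch of the rewrite using the shapes from the fold_expand_of_cast test added in this PR (illustrative, not verbatim compiler output):

  // Before: a generalizing cast feeds the expand, hiding static shape info.
  %0 = tensor.cast %arg0 : tensor<10x10xf32> to tensor<?x?xf32>
  %1 = tensor.expand_shape %0 [[0, 1], [2]] output_shape [%c10, %c1, %c10]
      : tensor<?x?xf32> into tensor<?x?x?xf32>

  // After: a static expand_shape bracketed by two casts. The top cast
  // (dynamic -> static) cancels with the producer cast above; the bottom
  // cast (static -> dynamic) can cancel with downstream tensor.cast ops.
  %2 = tensor.cast %0 : tensor<?x?xf32> to tensor<10x10xf32>
  %3 = tensor.expand_shape %2 [[0, 1], [2]] output_shape [10, 1, 10]
      : tensor<10x10xf32> into tensor<10x1x10xf32>
  %4 = tensor.cast %3 : tensor<10x1x10xf32> to tensor<?x?x?xf32>
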
@IanWood1 IanWood1 changed the title [mlir] Fold expand of cast [mlir] Convert expand_shape to more static form Oct 15, 2024
@IanWood1 IanWood1 requested a review from hanhanW October 15, 2024 16:18

  LogicalResult matchAndRewrite(ExpandShapeOp expandOp,
                                PatternRewriter &rewriter) const override {
    SmallVector<int64_t> newOutputShape(expandOp.getResultType().getShape());
A reviewer (Contributor) commented:

You should check whether the source of expandOp is a tensor.cast operation where the source of the cast has a more static shape than the result (using bool canFoldIntoConsumerOp(CastOp castOp);).
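
For context, canFoldIntoConsumerOp returns true exactly when the cast relaxes its source type to a more dynamic one; a minimal sketch (not from the PR):

  // Foldable into a consumer: the source type is more static than the result.
  %0 = tensor.cast %a : tensor<10x10xf32> to tensor<?x?xf32>
  // Not foldable into a consumer: this cast adds static information.
  %1 = tensor.cast %b : tensor<?x?xf32> to tensor<10x10xf32>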

@hanhanW (Contributor) commented Oct 15, 2024

My main concern here is that the generated casts are not guaranteed to fold with other casts.

There is a ChainedTensorCast pattern, which folds chained tensor.cast ops into a single tensor.cast op. Then you can follow what Mahesh suggested, which folds the producer tensor.cast into the expand_shape op. There is also a canFoldIntoProducerOp, which can be used for the expand_shape -> tensor.cast folding. I'm not sure whether they work here, so please take a look at these two functions.
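
For reference, the ChainedTensorCast canonicalization collapses a chain of casts into one when the types allow; a minimal sketch, assuming these shapes:

  // Two chained casts...
  %0 = tensor.cast %arg : tensor<10x10xf32> to tensor<?x10xf32>
  %1 = tensor.cast %0 : tensor<?x10xf32> to tensor<?x?xf32>
  // ...fold into a single cast:
  %2 = tensor.cast %arg : tensor<10x10xf32> to tensor<?x?xf32>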

@IanWood1 IanWood1 marked this pull request as ready for review October 21, 2024 14:53
@llvmbot (Member) commented Oct 21, 2024

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-tensor

Author: Ian Wood (IanWood1)

Changes

Initially, my idea was to sink tensor.cast ops through tensor.expand_shape ops when doing so makes the expand op more static. But then I realized that the SSA output_shape operands capture shape info that can't otherwise be propagated. From the commit's description:

> When the output_shape values can be determined, convert to a static expand_shape
op and insert cast ops. The top cast will be (dynamic -> static), allowing
it to be propagated upwards, and the bottom will be (static -> dynamic),
allowing it to propagate down (or cancel with adjacent tensor.cast ops).

My main concern here is that the generated casts are not guaranteed to fold with other casts. This is somewhat similar to what linalg does, where it introduces casts on operands when their shapes can be inferred. But I'm not sure this is suited for a canonicalization pattern (I could just add a check to make sure the pattern would fold more than one adjacent cast).

Also, the opposite might happen: the output_shape values are unknown, but a tensor.cast consumer has the static output size information.
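
A hypothetical sketch of that opposite case (%d0, %d1, %d2 are illustrative unknown index values):

  // The expand's output_shape operands are opaque SSA values...
  %e = tensor.expand_shape %src [[0, 1], [2]] output_shape [%d0, %d1, %d2]
      : tensor<?x?xf32> into tensor<?x?x?xf32>
  // ...but a consumer cast carries the static sizes the pattern cannot see.
  %c = tensor.cast %e : tensor<?x?x?xf32> to tensor<10x1x10xf32>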

Sidenote: I disabled CI because drop-unit-extent-dims.mlir will fail: a cast there gets converted to a static form. I just wanted to wait for review to determine whether a fix is needed.


Full diff: https://github.com/llvm/llvm-project/pull/112265.diff

2 Files Affected:

  • (modified) mlir/lib/Dialect/Tensor/IR/TensorOps.cpp (+79-1)
  • (modified) mlir/test/Dialect/Tensor/canonicalize.mlir (+54)
diff --git a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
index 4d6c5965c4fcc3..ee0e8c2d201226 100644
--- a/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
+++ b/mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
@@ -24,6 +24,7 @@
 #include "mlir/IR/TypeUtilities.h"
 #include "mlir/Interfaces/DestinationStyleOpInterface.h"
 #include "mlir/Interfaces/LoopLikeInterface.h"
+#include "mlir/Support/LLVM.h"
 #include "llvm/ADT/DenseSet.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallBitVector.h"
@@ -1982,6 +1983,83 @@ struct FoldDimOfCollapseShape : public OpRewritePattern<DimOp> {
     return success();
   }
 };
+
+struct ConvertToStaticExpandShape : public OpRewritePattern<ExpandShapeOp> {
+  using OpRewritePattern<ExpandShapeOp>::OpRewritePattern;
+
+  LogicalResult matchAndRewrite(ExpandShapeOp expandOp,
+                                PatternRewriter &rewriter) const override {
+    auto castOp = expandOp.getSrc().getDefiningOp<CastOp>();
+    if (!canFoldIntoConsumerOp(castOp))
+      return failure();
+
+    const ArrayRef<int64_t> castSrcShape =
+        castOp.getSource().getType().getShape();
+    const SmallVector<ReassociationIndices, 4> reassoc =
+        expandOp.getReassociationIndices();
+
+    SmallVector<int64_t> newOutputShape(expandOp.getResultType().getShape());
+    SmallVector<Value> dynamicOutputShape;
+    auto outputIt = expandOp.getOutputShape().begin();
+
+    for (const auto &[inputDim, innerReassoc] : llvm::enumerate(reassoc)) {
+      for (const uint64_t outDim : innerReassoc) {
+        if (!ShapedType::isDynamic(newOutputShape[outDim]))
+          continue;
+
+        // If the cast's src type is dynamic, don't infer any of the
+        // corresponding expanded dimensions. `tensor.expand_shape` requires at
+        // least one of the expanded dimensions to be dynamic if the input is
+        // dynamic.
+        Value val = *outputIt;
+        ++outputIt;
+        if (ShapedType::isDynamic(castSrcShape[inputDim])) {
+          dynamicOutputShape.push_back(val);
+          continue;
+        }
+
+        APInt cst;
+        if (matchPattern(val, m_ConstantInt(&cst))) {
+          newOutputShape[outDim] = cst.getSExtValue();
+        } else {
+          dynamicOutputShape.push_back(val);
+        }
+      }
+    }
+
+    // Couldn't match any values, nothing to change
+    if (expandOp.getOutputShape().size() == dynamicOutputShape.size())
+      return failure();
+
+    // Calculate the input shape from the output
+    SmallVector<int64_t> newInputShape(expandOp.getSrcType().getRank(), 1l);
+    for (uint64_t inDim = 0; inDim < newInputShape.size(); inDim++) {
+      for (auto outDim : reassoc[inDim]) {
+        auto ofr = newOutputShape[outDim];
+        if (ShapedType::isDynamic(ofr)) {
+          newInputShape[inDim] = ShapedType::kDynamic;
+          break;
+        }
+        newInputShape[inDim] *= ofr;
+      }
+    }
+
+    SmallVector<OpFoldResult> outputOfr =
+        getMixedValues(newOutputShape, dynamicOutputShape, rewriter);
+    auto inputType = RankedTensorType::get(
+        newInputShape, expandOp.getSrcType().getElementType());
+    auto outputType = RankedTensorType::get(
+        newOutputShape, expandOp.getSrcType().getElementType());
+    auto inputCast = rewriter.create<CastOp>(expandOp.getLoc(), inputType,
+                                             expandOp.getSrc());
+    auto newExpand = rewriter.create<ExpandShapeOp>(
+        expandOp.getLoc(), outputType, inputCast.getResult(),
+        expandOp.getReassociationIndices(), outputOfr);
+    rewriter.replaceOpWithNewOp<CastOp>(expandOp, expandOp.getType(),
+                                        newExpand.getResult());
+    return success();
+  }
+};
 } // namespace
 
 void ExpandShapeOp::getCanonicalizationPatterns(RewritePatternSet &results,
@@ -1989,7 +2067,7 @@ void ExpandShapeOp::getCanonicalizationPatterns(RewritePatternSet &results,
   results.add<
       ComposeReassociativeReshapeOps<ExpandShapeOp, ReshapeOpKind::kExpand>,
       ComposeExpandOfCollapseOp<ExpandShapeOp, CollapseShapeOp>,
-      FoldReshapeWithConstant<ExpandShapeOp>,
+      ConvertToStaticExpandShape, FoldReshapeWithConstant<ExpandShapeOp>,
       FoldReshapeWithSplat<ExpandShapeOp>,
       FoldReshapeWithFromElements<ExpandShapeOp>, FoldDimOfExpandShape,
       FoldDimOfCollapseShape>(context);
diff --git a/mlir/test/Dialect/Tensor/canonicalize.mlir b/mlir/test/Dialect/Tensor/canonicalize.mlir
index 0aa2d33ef17ed4..63f394a14d3899 100644
--- a/mlir/test/Dialect/Tensor/canonicalize.mlir
+++ b/mlir/test/Dialect/Tensor/canonicalize.mlir
@@ -2718,3 +2718,57 @@ func.func @pack_dont_drop_attributes(%arg0: tensor<?x?x?xf16>, %arg1: tensor<128
   %pack = tensor.pack %arg0 padding_value(%cst : f16) outer_dims_perm = [0, 1, 2] inner_dims_pos = [1, 2] inner_tiles = [16, 1] into %arg1 {test_attr} : tensor<?x?x?xf16> -> tensor<128x?x100x16x1xf16>
   return %pack : tensor<128x?x100x16x1xf16>
 }
+
+// -----
+
+func.func @fold_expand_of_cast(%arg0 : tensor<10x10xf32>)
+    -> tensor<10x1x10xf32> {
+  %c1 = arith.constant 1 : index 
+  %c10 = arith.constant 10 : index 
+  %0 = tensor.cast %arg0 : tensor<10x10xf32> to tensor<?x?xf32>
+  %1 = tensor.expand_shape %0 [[0, 1], [2]] output_shape [%c10, %c1, %c10]
+      : tensor<?x?xf32> into tensor<?x?x?xf32>
+  %2 = tensor.cast %1 : tensor<?x?x?xf32> to tensor<10x1x10xf32>
+  return %2 : tensor<10x1x10xf32>
+}
+// CHECK-LABEL:  func.func @fold_expand_of_cast
+//       CHECK:   %[[RES:.+]] = tensor.expand_shape %{{.*}} {{\[}}[0, 1], [2]] output_shape [10, 1, 10]
+//       CHECK:   return %[[RES]]
+
+// -----
+
+func.func @sink_expand_of_cast(%arg0 : tensor<?x10xf32>)
+    -> tensor<?x?x?xf32> {
+  %c1 = arith.constant 1 : index
+  %c10 = arith.constant 10 : index
+  %0 = tensor.cast %arg0 : tensor<?x10xf32> to tensor<?x?xf32>
+  %1 = tensor.expand_shape %0 [[0, 1], [2]] output_shape [%c10, %c1, %c10]
+      : tensor<?x?xf32> into tensor<?x?x?xf32>
+  return %1 : tensor<?x?x?xf32>
+}
+// CHECK-LABEL:  func.func @sink_expand_of_cast
+//   CHECK-DAG:   %[[C10:.*]] = arith.constant 10
+//   CHECK-DAG:   %[[C1:.*]] = arith.constant 1
+//       CHECK:   %[[EXPAND:.+]] = tensor.expand_shape %{{.*}} {{\[}}[0, 1], [2]] 
+//  CHECK-SAME:     output_shape [%[[C10]], %[[C1]], 10]
+//       CHECK:   %[[RES:.+]] = tensor.cast %[[EXPAND]]
+//       CHECK:   return %[[RES]]
+
+// -----
+
+func.func @partial_sink_expand_of_cast(%arg0 : tensor<10x10xf32>, %arg1 : index, %arg2 : index)
+    -> tensor<?x?x?xf32> {
+  %c10 = arith.constant 10 : index
+  %0 = tensor.cast %arg0 : tensor<10x10xf32> to tensor<?x?xf32>
+  %1 = tensor.expand_shape %0 [[0, 1], [2]] output_shape [%arg1, %arg2, %c10]
+      : tensor<?x?xf32> into tensor<?x?x?xf32>
+  return %1 : tensor<?x?x?xf32>
+}
+// CHECK-LABEL:  func.func @partial_sink_expand_of_cast
+//       CHECK:   %[[CAST:.+]] = tensor.cast
+//  CHECK-SAME:     tensor<10x10xf32> to tensor<?x10xf32>
+//       CHECK:   %[[EXPAND:.+]] = tensor.expand_shape %{{.*}} {{\[}}[0, 1], [2]] 
+//  CHECK-SAME:     output_shape [%{{.*}}, %{{.*}}, 10]
+//       CHECK:   %[[RES:.+]] = tensor.cast %[[EXPAND]]
+//  CHECK-SAME:     tensor<?x?x10xf32> to tensor<?x?x?xf32>
+//       CHECK:   return %[[RES]]

@MaheshRavishankar (Contributor) left a comment
This looks mostly good. I left a few comments; please address them before landing.

      return failure();

    // Calculate the input shape from the output
    SmallVector<int64_t> newInputShape(expandOp.getSrcType().getRank(), 1l);
A reviewer (Contributor) commented:

Do you need this? Isn't the input shape the same as the source of the cast operation?

@IanWood1 (Author) replied:

From the partial_sink_expand_of_cast test case:

  %0 = tensor.cast %arg0 : tensor<10x10xf32> to tensor<?x?xf32>
  %1 = tensor.expand_shape %0 [[0, 1], [2]] output_shape [%arg1, %arg2, %c10]
      : tensor<?x?xf32> into tensor<?x?x?xf32>

tensor.expand_shape's src type cannot become fully static because the op requires a dynamic input dim if the output is dynamic. The input cast becomes tensor<10x10xf32> to tensor<?x10xf32> instead of being fully removed. I could just bail on cases where not all SSA values can be matched (i.e., when the input dim could otherwise be made static). That way the input shape would be the same as the tensor.cast's, at the cost of not being able to propagate any of the static dim info.
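
For reference, the rewritten IR for this case, reconstructed from the CHECK lines of that test:

  %0 = tensor.cast %arg0 : tensor<10x10xf32> to tensor<?x10xf32>
  %1 = tensor.expand_shape %0 [[0, 1], [2]] output_shape [%arg1, %arg2, 10]
      : tensor<?x10xf32> into tensor<?x?x10xf32>
  %2 = tensor.cast %1 : tensor<?x?x10xf32> to tensor<?x?x?xf32>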

@joker-eph (Collaborator) commented:
Nit: please clean up the description to only describe the commit before landing the PR.

@@ -1982,14 +1983,90 @@ struct FoldDimOfCollapseShape : public OpRewritePattern<DimOp> {
    return success();
  }
};

struct ConvertToStaticExpandShape : public OpRewritePattern<ExpandShapeOp> {
A reviewer (Collaborator) commented:

Can you please document the pattern with a high-level description of what this pattern is doing? That'll be useful to future folks having to debug or improve this :)
(or just skimming the codebase).

Thanks.

@IanWood1 IanWood1 merged commit 455f71d into llvm:main Oct 25, 2024
8 checks passed
@IanWood1 IanWood1 deleted the fold_expand_of_cast branch October 25, 2024 00:04
@frobtech frobtech mentioned this pull request Oct 25, 2024
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024