[LoopInterchange] Improve profitability check for vectorization #133672

kasuga-fj · 2025-03-31T01:56:22Z

The vectorization profitability has a process to check whether a given loop can be vectorized or not. Since the process is conservative, a loop that can be vectorized may be deemed not to be possible. This can trigger unnecessary exchanges.
This patch improves the profitability decision by mitigating such misjudgments. Before this patch, we considered a loop to be vectorizable only when there are no loop carried dependencies with the IV of the loop. However, a loop carried dependency doesn't prevent vectorization if the distance is positive.

This patch makes the vectorization check more accurate by allowing a loop with the positive dependency. This change potentially increases the number of entries of DepMatrix due to separation between negated and non-negated direction vectors. However, as far as tested with llvm-test-suite, the impact was negligible.

Note that it is difficult to make a complete decision whether a loop can be vectorized or not. To achieve this, we must check the vector width and the distance of dependency.

llvmbot · 2025-03-31T01:56:52Z

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

The vectorization profitability has a process to check whether a given loop can be vectorized or not. Since the process is conservative, a loop that can be vectorized may be deemed not to be possible. This can trigger unnecessary exchanges.
This patch improves the profitability decision by mitigating such misjudgments. Before this patch, we considered a loop to be vectorizable only when there are no loop carried dependencies with the IV of the loop. However, a loop carried dependency doesn't prevent vectorization if the distance is positive. This patch makes the vectorization check more accurate by allowing a loop with the positive dependency. Note that it is difficult to make a complete decision whether a loop can be vectorized or not. To achieve this, we must check the vector width and the distance of dependency.

Full diff: https://github.com/llvm/llvm-project/pull/133672.diff

2 Files Affected:

(modified) llvm/lib/Transforms/Scalar/LoopInterchange.cpp (+103-25)
(modified) llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll (+3-5)

diff --git a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
index b6b0b7d7a947a..0c3a9cbfeed5f 100644
--- a/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopInterchange.cpp
@@ -17,8 +17,8 @@
 #include "llvm/ADT/SmallSet.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/Statistic.h"
+#include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/StringRef.h"
-#include "llvm/ADT/StringSet.h"
 #include "llvm/Analysis/DependenceAnalysis.h"
 #include "llvm/Analysis/LoopCacheAnalysis.h"
 #include "llvm/Analysis/LoopInfo.h"
@@ -80,6 +80,21 @@ enum class RuleTy {
   ForVectorization,
 };
 
+/// Store the information about if corresponding direction vector was negated
+/// by normalization or not. This is necessary to restore the original one from
+/// a row of a dependency matrix because we only manage normalized direction
+/// vectors. Also, duplicate vectors are eliminated, so there may be both
+/// original and negated vectors for a single entry (a row of dependency
+/// matrix). E.g., if there are two direction vectors `[< =]` and `[> =]`, the
+/// later one will be converted to the same as former one by normalization, so
+/// only `[< =]` would be retained in the final result.
+struct NegatedStatus {
+  bool Original = false;
+  bool Negated = false;
+
+  bool isNonNegativeDir(char Dir) const;
+};
+
 } // end anonymous namespace
 
 // Minimum loop depth supported.
@@ -126,9 +141,10 @@ static void printDepMatrix(CharMatrix &DepMatrix) {
 }
 #endif
 
-static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
-                                     Loop *L, DependenceInfo *DI,
-                                     ScalarEvolution *SE,
+static bool populateDependencyMatrix(CharMatrix &DepMatrix,
+                                     std::vector<NegatedStatus> &NegStatusVec,
+                                     unsigned Level, Loop *L,
+                                     DependenceInfo *DI, ScalarEvolution *SE,
                                      OptimizationRemarkEmitter *ORE) {
   using ValueVector = SmallVector<Value *, 16>;
 
@@ -167,7 +183,9 @@ static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
     return false;
   }
   ValueVector::iterator I, IE, J, JE;
-  StringSet<> Seen;
+
+  // Manage all found direction vectors. and map it to the index of DepMatrix.
+  StringMap<unsigned> Seen;
 
   for (I = MemInstr.begin(), IE = MemInstr.end(); I != IE; ++I) {
     for (J = I, JE = MemInstr.end(); J != JE; ++J) {
@@ -182,7 +200,8 @@ static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
         assert(D->isOrdered() && "Expected an output, flow or anti dep.");
         // If the direction vector is negative, normalize it to
         // make it non-negative.
-        if (D->normalize(SE))
+        bool Normalized = D->normalize(SE);
+        if (Normalized)
           LLVM_DEBUG(dbgs() << "Negative dependence vector normalized.\n");
         LLVM_DEBUG(StringRef DepType =
                        D->isFlow() ? "flow" : D->isAnti() ? "anti" : "output";
@@ -214,8 +233,17 @@ static bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level,
         }
 
         // Make sure we only add unique entries to the dependency matrix.
-        if (Seen.insert(StringRef(Dep.data(), Dep.size())).second)
+        unsigned Index = DepMatrix.size();
+        auto [Ite, Inserted] =
+            Seen.try_emplace(StringRef(Dep.data(), Dep.size()), Index);
+        if (Inserted) {
           DepMatrix.push_back(Dep);
+          NegStatusVec.push_back(NegatedStatus{});
+        } else
+          Index = Ite->second;
+
+        NegatedStatus &Status = NegStatusVec[Index];
+        (Normalized ? Status.Negated : Status.Original) = true;
       }
     }
   }
@@ -400,6 +428,7 @@ class LoopInterchangeProfitability {
   bool isProfitable(const Loop *InnerLoop, const Loop *OuterLoop,
                     unsigned InnerLoopId, unsigned OuterLoopId,
                     CharMatrix &DepMatrix,
+                    const std::vector<NegatedStatus> &NegStatusVec,
                     const DenseMap<const Loop *, unsigned> &CostMap,
                     std::unique_ptr<CacheCost> &CC);
 
@@ -409,9 +438,10 @@ class LoopInterchangeProfitability {
       const DenseMap<const Loop *, unsigned> &CostMap,
       std::unique_ptr<CacheCost> &CC);
   std::optional<bool> isProfitablePerInstrOrderCost();
-  std::optional<bool> isProfitableForVectorization(unsigned InnerLoopId,
-                                                   unsigned OuterLoopId,
-                                                   CharMatrix &DepMatrix);
+  std::optional<bool>
+  isProfitableForVectorization(unsigned InnerLoopId, unsigned OuterLoopId,
+                               CharMatrix &DepMatrix,
+                               const std::vector<NegatedStatus> &NegStatusVec);
   Loop *OuterLoop;
   Loop *InnerLoop;
 
@@ -503,8 +533,9 @@ struct LoopInterchange {
                       << "\n");
 
     CharMatrix DependencyMatrix;
+    std::vector<NegatedStatus> NegStatusVec;
     Loop *OuterMostLoop = *(LoopList.begin());
-    if (!populateDependencyMatrix(DependencyMatrix, LoopNestDepth,
+    if (!populateDependencyMatrix(DependencyMatrix, NegStatusVec, LoopNestDepth,
                                   OuterMostLoop, DI, SE, ORE)) {
       LLVM_DEBUG(dbgs() << "Populating dependency matrix failed\n");
       return false;
@@ -543,8 +574,8 @@ struct LoopInterchange {
     for (unsigned j = SelecLoopId; j > 0; j--) {
       bool ChangedPerIter = false;
       for (unsigned i = SelecLoopId; i > SelecLoopId - j; i--) {
-        bool Interchanged =
-            processLoop(LoopList, i, i - 1, DependencyMatrix, CostMap);
+        bool Interchanged = processLoop(LoopList, i, i - 1, DependencyMatrix,
+                                        NegStatusVec, CostMap);
         ChangedPerIter |= Interchanged;
         Changed |= Interchanged;
       }
@@ -559,6 +590,8 @@ struct LoopInterchange {
   bool processLoop(SmallVectorImpl<Loop *> &LoopList, unsigned InnerLoopId,
                    unsigned OuterLoopId,
                    std::vector<std::vector<char>> &DependencyMatrix,
+
+                   const std::vector<NegatedStatus> &NegStatusVec,
                    const DenseMap<const Loop *, unsigned> &CostMap) {
     Loop *OuterLoop = LoopList[OuterLoopId];
     Loop *InnerLoop = LoopList[InnerLoopId];
@@ -572,7 +605,7 @@ struct LoopInterchange {
     LLVM_DEBUG(dbgs() << "Loops are legal to interchange\n");
     LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE, ORE);
     if (!LIP.isProfitable(InnerLoop, OuterLoop, InnerLoopId, OuterLoopId,
-                          DependencyMatrix, CostMap, CC)) {
+                          DependencyMatrix, NegStatusVec, CostMap, CC)) {
       LLVM_DEBUG(dbgs() << "Interchanging loops not profitable.\n");
       return false;
     }
@@ -1197,27 +1230,71 @@ LoopInterchangeProfitability::isProfitablePerInstrOrderCost() {
   return std::nullopt;
 }
 
+static char flipDirection(char Dir) {
+  switch (Dir) {
+  case '<':
+    return '>';
+  case '>':
+    return '<';
+  case '=':
+  case 'I':
+  case '*':
+    return Dir;
+  default:
+    llvm_unreachable("Unknown direction");
+  }
+}
+
+/// Ensure that there are no negative direction dependencies corresponding to \p
+/// Dir.
+bool NegatedStatus::isNonNegativeDir(char Dir) const {
+  assert((Original || Negated) && "Cannot restore the original direction");
+
+  // If both flag is true, it means that there is both as-is and negated
+  // direction. In this case only `=` or `I` don't have negative direction
+  // dependency.
+  if (Original && Negated)
+    return Dir == '=' || Dir == 'I';
+
+  char Restored = Negated ? flipDirection(Dir) : Dir;
+  return Restored == '=' || Restored == 'I' || Restored == '<';
+}
+
 /// Return true if we can vectorize the loop specified by \p LoopId.
-static bool canVectorize(const CharMatrix &DepMatrix, unsigned LoopId) {
+static bool canVectorize(const CharMatrix &DepMatrix,
+                         const std::vector<NegatedStatus> &NegStatusVec,
+                         unsigned LoopId) {
+  // The loop can be vectorized if there are no negative dependencies. Consider
+  // the dependency of `j` in the following example.
+  //
+  //   Positive: ... = A[i][j]       Negative: ... = A[i][j-1]
+  //             A[i][j-1] = ...               A[i][j] = ...
+  //
+  // In the right case, vectorizing the loop can change the loaded value from
+  // `A[i][j-1]`. At the moment we don't take into account the distance of the
+  // dependency and vector width.
+  // TODO: Considering the dependency distance and the vector width can give a
+  // more accurate result. For example, the following loop can be vectorized if
+  // the vector width is less than or equal to 4 x sizeof(A[0][0]).
   for (unsigned I = 0; I != DepMatrix.size(); I++) {
     char Dir = DepMatrix[I][LoopId];
-    if (Dir != 'I' && Dir != '=')
+    if (!NegStatusVec[I].isNonNegativeDir(Dir))
       return false;
   }
   return true;
 }
 
 std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
-    unsigned InnerLoopId, unsigned OuterLoopId, CharMatrix &DepMatrix) {
-  // If the outer loop is not loop independent it is not profitable to move
-  // this to inner position, since doing so would not enable inner loop
-  // parallelism.
-  if (!canVectorize(DepMatrix, OuterLoopId))
+    unsigned InnerLoopId, unsigned OuterLoopId, CharMatrix &DepMatrix,
+    const std::vector<NegatedStatus> &NegStatusVec) {
+  // If the outer loop cannot be vectorized, it is not profitable to move this
+  // to inner position.
+  if (!canVectorize(DepMatrix, NegStatusVec, OuterLoopId))
     return false;
 
-  // If inner loop has dependence and outer loop is loop independent then it is
+  // If inner loop cannot be vectorized and outer loop can be then it is
   // profitable to interchange to enable inner loop parallelism.
-  if (!canVectorize(DepMatrix, InnerLoopId))
+  if (!canVectorize(DepMatrix, NegStatusVec, InnerLoopId))
     return true;
 
   // TODO: Estimate the cost of vectorized loop body when both the outer and the
@@ -1228,6 +1305,7 @@ std::optional<bool> LoopInterchangeProfitability::isProfitableForVectorization(
 bool LoopInterchangeProfitability::isProfitable(
     const Loop *InnerLoop, const Loop *OuterLoop, unsigned InnerLoopId,
     unsigned OuterLoopId, CharMatrix &DepMatrix,
+    const std::vector<NegatedStatus> &NegStatusVec,
     const DenseMap<const Loop *, unsigned> &CostMap,
     std::unique_ptr<CacheCost> &CC) {
   // isProfitable() is structured to avoid endless loop interchange. If the
@@ -1249,8 +1327,8 @@ bool LoopInterchangeProfitability::isProfitable(
       shouldInterchange = isProfitablePerInstrOrderCost();
       break;
     case RuleTy::ForVectorization:
-      shouldInterchange =
-          isProfitableForVectorization(InnerLoopId, OuterLoopId, DepMatrix);
+      shouldInterchange = isProfitableForVectorization(InnerLoopId, OuterLoopId,
+                                                       DepMatrix, NegStatusVec);
       break;
     }
 
diff --git a/llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll b/llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll
index b82dd5141a6b2..b83b6b37a6eda 100644
--- a/llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll
+++ b/llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll
@@ -64,15 +64,13 @@ exit:
 ;   for (int j = 1; j < 256; j++)
 ;     A[i][j-1] = A[i][j] + B[i][j];
 ;
-; FIXME: These loops are exchanged at this time due to the problem of
-; profitablity heuristic for vectorization.
 
-; CHECK:      --- !Passed
+; CHECK:      --- !Missed
 ; CHECK-NEXT: Pass:            loop-interchange
-; CHECK-NEXT: Name:            Interchanged
+; CHECK-NEXT: Name:            InterchangeNotProfitable
 ; CHECK-NEXT: Function:        interchange_unnecesasry_for_vectorization
 ; CHECK-NEXT: Args:
-; CHECK-NEXT:   - String:          Loop interchanged with enclosing loop.
+; CHECK-NEXT:   - String:          Insufficient information to calculate the cost of loop for interchange.
 define void @interchange_unnecesasry_for_vectorization() {
 entry:
   br label %for.i.header

kasuga-fj · 2025-03-31T08:59:43Z

Depends on #133665 and #133667

sjoerdmeijer · 2025-04-02T12:47:12Z

llvm/lib/Transforms/Scalar/LoopInterchange.cpp

@@ -80,6 +80,21 @@ enum class RuleTy {
  ForVectorization,
 };

+/// Store the information about if corresponding direction vector was negated


Before I keep reading the rest of this patch, just wanted to share this first question that I had. I was initially a bit confused about this, and was wondering why we need 2 booleans and 4 states if a direction vector's negated status can only be true or false. But I now guess that the complication here is the unique entries in the dependency matrix, is that right? If that is the case, then I am wondering if it isn't easier to keep all the entries and don't make them unique? Making them unique was a little optimisation that I added recently because I thought that would help, but if this is now complicating things and we need to do all sorts of gymnastics we might as well keep all entries.

But I now guess that the complication here is the unique entries in the dependency matrix, is that right?

Yes. (But holding two boolean values is a bit redundant. What is actually needed are three states. If both of them are false, it is an illegal state.)

I am wondering if it isn't easier to keep all the entries and don't make them unique?

I think it would be simpler. Also, there is no need to stop making entries unique altogether. If duplicate direction vectors are allowed, I think the simplest implementation would be to keep pairs of a direction vector and a boolean value indicating whether the corresponding vector is negated. However, I'm not sure how effective it is to make direction vectors unique. In the worst case, holding pairs of a vector and a boolean value instead of a single vector doubles the number of entries. Is this allowed?

I think duplicated direction vectors are always allowed. They don't add new or different information, so it shouldn't effect the interpretation of the dependence analysis in any way. The only thing that it affects is processing the same information again and again, so the only benefit of making them unique is to avoid that. But if keeping all entries makes the logic easier, there is a good reason to not make them unique. I think adding all the state here complicates things, and if a simple map of original to negated helps, you've certainly got my vote to simplify this.

I think duplicated direction vectors are always allowed. They don't add new or different information, so it shouldn't effect the interpretation of the dependence analysis in any way. The only thing that it affects is processing the same information again and again, so the only benefit of making them unique is to avoid that.

I agree with this, and am only concerned about the compile time degradation. So I will try to compare the difference of the number of entries with or without making them unique. Thank you for your opinion!

I did a quick check using llvm-test-suite, and found using (direction vector, boolean value) pairs as unique keys makes little difference in the number of entries (although in some cases the number of entries increased enormously if I stop making them unique). So changed to keep the negated and non-negated vectors separate.

The vectorization profitability has a process to check whether a given loop can be vectorized or not. Since the process is conservative, a loop that can be vectorized may be deemed not to be possible. This can trigger unnecessary exchanges. This patch improves the profitability decision by mitigating such misjudgments. Before this patch, we considered a loop to be vectorizable only when there are no loop carried dependencies with the IV of the loop. However, a loop carried dependency doesn't prevent vectorization if the distance is positive. This patch makes the vectorization check more accurate by allowing a loop with the positive dependency. Note that it is difficult to make a complete decision whether a loop can be vectorized or not. To achieve this, we must check the vector width and the distance of dependency.

Meinersbur

The PR is described as allowing some loop-carried dependencies, but the only test case is fixing a false-positive which is rejected because "nsufficient information to calculate the cost of loop for interchange". Can you add a positive test with such a positive deendence distance what can now be detected?

sjoerdmeijer · 2025-04-07T19:15:03Z

( I am away from keyboard for 1.5 weeks, can pick this up later, but good thing Michael is looking now too)

kasuga-fj · 2025-04-08T15:05:15Z

Can you add a positive test with such a positive deendence distance what can now be detected?

That makes sense, added it.

llvmbot added the llvm:transforms label Mar 31, 2025

kasuga-fj force-pushed the users/kasuga-fj/loop-interchange-fix-profitable-vectorization branch from 2db59e8 to ae5d9cf Compare April 2, 2025 07:12

kasuga-fj mentioned this pull request Apr 2, 2025

[LoopInterchange] Add tests for the vectorization profitability (NFC) #133665

Merged

kasuga-fj force-pushed the users/kasuga-fj/loop-interchange-improve-profitable-vectorization branch from cdec72a to 72b48ba Compare April 2, 2025 07:13

kasuga-fj requested review from Meinersbur, madhur13490 and sjoerdmeijer April 2, 2025 07:16

kasuga-fj force-pushed the users/kasuga-fj/loop-interchange-fix-profitable-vectorization branch from ae5d9cf to bd84ddc Compare April 2, 2025 07:29

kasuga-fj force-pushed the users/kasuga-fj/loop-interchange-improve-profitable-vectorization branch from 72b48ba to 692e4de Compare April 2, 2025 07:29

kasuga-fj force-pushed the users/kasuga-fj/loop-interchange-fix-profitable-vectorization branch from bd84ddc to b1f0744 Compare April 2, 2025 12:13

kasuga-fj force-pushed the users/kasuga-fj/loop-interchange-improve-profitable-vectorization branch from 692e4de to 1a1c1f6 Compare April 2, 2025 12:14

sjoerdmeijer reviewed Apr 2, 2025

View reviewed changes

Base automatically changed from users/kasuga-fj/loop-interchange-fix-profitable-vectorization to main April 3, 2025 07:21

kasuga-fj force-pushed the users/kasuga-fj/loop-interchange-improve-profitable-vectorization branch from 1a1c1f6 to b1c4248 Compare April 3, 2025 08:00

Handle negated and non negated direction vectors separately.

8f4f814

Meinersbur reviewed Apr 7, 2025

View reviewed changes

Add test that has positive dependencies

cad4db9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LoopInterchange] Improve profitability check for vectorization #133672

[LoopInterchange] Improve profitability check for vectorization #133672

kasuga-fj commented Mar 31, 2025 •

edited

Loading

llvmbot commented Mar 31, 2025

kasuga-fj commented Mar 31, 2025 •

edited

Loading

sjoerdmeijer Apr 2, 2025

kasuga-fj Apr 2, 2025

sjoerdmeijer Apr 2, 2025

kasuga-fj Apr 2, 2025

kasuga-fj Apr 3, 2025

Meinersbur left a comment

sjoerdmeijer commented Apr 7, 2025

kasuga-fj commented Apr 8, 2025

[LoopInterchange] Improve profitability check for vectorization #133672

Are you sure you want to change the base?

[LoopInterchange] Improve profitability check for vectorization #133672

Conversation

kasuga-fj commented Mar 31, 2025 • edited Loading

llvmbot commented Mar 31, 2025

kasuga-fj commented Mar 31, 2025 • edited Loading

sjoerdmeijer Apr 2, 2025

Choose a reason for hiding this comment

kasuga-fj Apr 2, 2025

Choose a reason for hiding this comment

sjoerdmeijer Apr 2, 2025

Choose a reason for hiding this comment

kasuga-fj Apr 2, 2025

Choose a reason for hiding this comment

kasuga-fj Apr 3, 2025

Choose a reason for hiding this comment

Meinersbur left a comment

Choose a reason for hiding this comment

sjoerdmeijer commented Apr 7, 2025

kasuga-fj commented Apr 8, 2025

kasuga-fj commented Mar 31, 2025 •

edited

Loading

kasuga-fj commented Mar 31, 2025 •

edited

Loading