[LoopInterchange] Add tests for the vectorization profitability (NFC) #133665

kasuga-fj · 2025-03-31T01:16:55Z

There is a problem with the current profitability check for vectorization in LoopInterchange. There are both false positives and false negatives. The former means that the heuristic may say that "an exchange is necessary to vectorize the innermost loop" even though it's already possible. The latter means that the heuristic may miss a case where an exchange is necessary to vectorize the innermost loop. Note that this is not a dependency analysis problem. This is caused by incorrect handling of the dependency matrix in the profitability check, so these problems can occur even if the analysis is accurate (no overestimation).

This patch adds tests to clarify the cases that should be fixed. The root cause of these cases is that the heuristic doesn't handle the direction of a dependency correctly.

llvmbot · 2025-03-31T01:17:26Z

@llvm/pr-subscribers-llvm-transforms

Author: Ryotaro Kasuga (kasuga-fj)

Changes

There is a problem with the current profitability check for vectorization in LoopInterchange. There are both false positives and false negatives. The former means that the heuristic may say that "an exchange is necessary to vectorize the innermost loop" even though it's already possible. The latter means that the heuristic may miss a case where an exchange is necessary to vectorize the innermost loop. Note that this is not a dependency analysis problem. These problems can occur even if the analysis is accurate (no overestimation).

This patch adds tests to clarify the cases that should be fixed. The root cause of these cases is that the heuristic doesn't handle the direction of a dependency correctly.

Full diff: https://github.com/llvm/llvm-project/pull/133665.diff

1 Files Affected:

(added) llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll (+108)

diff --git a/llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll b/llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll
new file mode 100644
index 0000000000000..606117e70db86
--- /dev/null
+++ b/llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll
@@ -0,0 +1,108 @@
+; RUN: opt < %s -passes=loop-interchange -cache-line-size=64 \
+; RUN:     -pass-remarks-output=%t -disable-output -loop-interchange-profitabilities=vectorize
+; RUN: FileCheck -input-file %t %s
+
+@A = dso_local global [256 x [256 x float]] zeroinitializer
+@B = dso_local global [256 x [256 x float]] zeroinitializer
+@C = dso_local global [256 x [256 x float]] zeroinitializer
+
+; Check that the below loops are exchanged for vectorization.
+;
+; for (int i = 0; i < 256; i++) {
+;   for (int j = 1; j < 256; j++) {
+;     A[i][j] = A[i][j-1] + B[i][j];
+;     C[i][j] += 1;
+;   }
+; }
+;
+; FIXME: These loops are not exchanged at this time due to the problem of
+; profitablity heuristic for vectorization.
+
+; CHECK:      --- !Missed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            InterchangeNotProfitable
+; CHECK-NEXT: Function:        interchange_necesasry_for_vectorization
+; CHECK-NEXT: Args:
+; CHECK-NEXT:   - String:          Interchanging loops is not considered to improve cache locality nor vectorization.
+; CHECK-NEXT: ...
+define void @interchange_necesasry_for_vectorization() {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i64 [ 1, %entry ], [ %i.next, %for.i.inc ]
+  br label %for.j.body
+
+for.j.body:
+  %j = phi i64 [ 1, %for.i.header ], [ %j.next, %for.j.body ]
+  %j.dec = add nsw i64 %j, -1
+  %a.load.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @A, i64 %i, i64 %j.dec
+  %b.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @B, i64 %i, i64 %j
+  %c.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @C, i64 %i, i64 %j
+  %a = load float, ptr %a.load.index, align 4
+  %b = load float, ptr %b.index, align 4
+  %c = load float, ptr %c.index, align 4
+  %add.0 = fadd float %a, %b
+  %a.store.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @A, i64 %i, i64 %j
+  store float %add.0, ptr %a.store.index, align 4
+  %add.1 = fadd float %c, 1.0
+  store float %add.1, ptr %c.index, align 4
+  %j.next = add nuw nsw i64 %j, 1
+  %cmp.j = icmp eq i64 %j.next, 256
+  br i1 %cmp.j, label %for.i.inc, label %for.j.body
+
+for.i.inc:
+  %i.next = add nuw nsw i64 %i, 1
+  %cmp.i = icmp eq i64 %i.next, 256
+  br i1 %cmp.i, label %exit, label %for.i.header
+
+exit:
+  ret void
+}
+
+; Check that the following innermost loop can be vectorized so that
+; interchangig is unnecessary.
+;
+; for (int i = 0; i < 256; i++)
+;   for (int j = 1; j < 256; j++)
+;     A[i][j-1] = A[i][j] + B[i][j];
+;
+; FIXME: These loops are exchanged at this time due to the problem of
+; profitablity heuristic for vectorization.
+
+; CHECK:      --- !Passed
+; CHECK-NEXT: Pass:            loop-interchange
+; CHECK-NEXT: Name:            Interchanged
+; CHECK-NEXT: Function:        interchange_unnecesasry_for_vectorization
+; CHECK-NEXT: Args:
+; CHECK-NEXT:   - String:          Loop interchanged with enclosing loop.
+define void @interchange_unnecesasry_for_vectorization() {
+entry:
+  br label %for.i.header
+
+for.i.header:
+  %i = phi i64 [ 1, %entry ], [ %i.next, %for.i.inc ]
+  br label %for.j.body
+
+for.j.body:
+  %j = phi i64 [ 1, %for.i.header ], [ %j.next, %for.j.body ]
+  %j.dec = add nsw i64 %j, -1
+  %a.load.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @A, i64 %i, i64 %j
+  %b.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @B, i64 %i, i64 %j
+  %a = load float, ptr %a.load.index, align 4
+  %b = load float, ptr %b.index, align 4
+  %add = fadd float %a, %b
+  %a.store.index = getelementptr nuw inbounds [256 x [256 x float]], ptr @A, i64 %i, i64 %j.dec
+  store float %add, ptr %a.store.index, align 4
+  %j.next = add nuw nsw i64 %j, 1
+  %cmp.j = icmp eq i64 %j.next, 256
+  br i1 %cmp.j, label %for.i.inc, label %for.j.body
+
+for.i.inc:
+  %i.next = add nuw nsw i64 %i, 1
+  %cmp.i = icmp eq i64 %i.next, 256
+  br i1 %cmp.i, label %exit, label %for.i.header
+
+exit:
+  ret void
+}

kasuga-fj · 2025-03-31T08:58:03Z

Depends on #133664
Each case added will be solved by #133667 and #133672
(Sorry for inconvenience, I tried using Graphite, but it didn't work fine due to my network problem).

There is a problem with the current profitability check for vectorization in LoopInterchange. There are both false positives and false negatives. The former means that the heuristic may say that "an exchange is necessary to vectorize the innermost loop" even though it's already possible. The latter means that the heuristic may miss a case where an exchange is necessary to vectorize the innermost loop. Note that this is not a dependency analysis problem. These problems can occur even if the analysis is accurate (no overestimation). This patch adds tests to clarify the cases that should be fixed. The root cause of these cases is that the heuristic doesn't handle the direction of a dependency correctly.

llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll

sjoerdmeijer · 2025-04-02T07:37:40Z

llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll

+; for (int i = 0; i < 256; i++) {
+;   for (int j = 1; j < 256; j++) {
+;     A[i][j] = A[i][j-1] + B[i][j];
+;     C[i][j] += 1;


Just two question about this out of curiousity:

is the write to array C necessary?

and is vectorised with a gather/scatter?

is the write to array C necessary?

Yes. Due to the existence of the write to C, the profitability check makes a wrong decision as "we can vectorize the j-loop". Please see the description of #133667 for more details if you are interested in.

and is vectorised with a gather/scatter?

Maybe yes, if the i-loop becomes the innermost one by interchange and it is vectorized. Note that this depends on LoopVectorizer whether the loop is actually vectorized or not.

The current vectorization profitability check in LoopInterchange only checks if a loop can be vectorized or we cannot due to some dependency. To make better decisions, I think we should estimate the vectorization cost.

llvm/test/Transforms/LoopInterchange/profitability-vectorization-heuristic.ll

sjoerdmeijer

Thanks, LGTM

llvmbot added the llvm:transforms label Mar 31, 2025

kasuga-fj mentioned this pull request Mar 31, 2025

[LoopInterchange] Fix the vectorizable check for a loop #133667

Merged

Base automatically changed from users/kasuga-fj/loop-interchange-option-profitabilities to main April 2, 2025 06:41

kasuga-fj force-pushed the users/kasuga-fj/loop-interchange-vectorzation-test branch from b53b7ce to 0f2ac45 Compare April 2, 2025 07:10

kasuga-fj mentioned this pull request Apr 2, 2025

[LoopInterchange] Improve profitability check for vectorization #133672

Open

kasuga-fj requested review from Meinersbur, madhur13490 and sjoerdmeijer April 2, 2025 07:16

sjoerdmeijer reviewed Apr 2, 2025

View reviewed changes

madhur13490 reviewed Apr 2, 2025

View reviewed changes

Address review comments

96dd374

sjoerdmeijer approved these changes Apr 2, 2025

View reviewed changes

kasuga-fj merged commit cf976bf into main Apr 2, 2025
11 checks passed

kasuga-fj deleted the users/kasuga-fj/loop-interchange-vectorzation-test branch April 2, 2025 12:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LoopInterchange] Add tests for the vectorization profitability (NFC) #133665

[LoopInterchange] Add tests for the vectorization profitability (NFC) #133665

kasuga-fj commented Mar 31, 2025 •

edited

Loading

llvmbot commented Mar 31, 2025

kasuga-fj commented Mar 31, 2025 •

edited

Loading

sjoerdmeijer Apr 2, 2025

kasuga-fj Apr 2, 2025

sjoerdmeijer left a comment

[LoopInterchange] Add tests for the vectorization profitability (NFC) #133665

[LoopInterchange] Add tests for the vectorization profitability (NFC) #133665

Conversation

kasuga-fj commented Mar 31, 2025 • edited Loading

llvmbot commented Mar 31, 2025

kasuga-fj commented Mar 31, 2025 • edited Loading

sjoerdmeijer Apr 2, 2025

Choose a reason for hiding this comment

kasuga-fj Apr 2, 2025

Choose a reason for hiding this comment

sjoerdmeijer left a comment

Choose a reason for hiding this comment

kasuga-fj commented Mar 31, 2025 •

edited

Loading

kasuga-fj commented Mar 31, 2025 •

edited

Loading