Skip to content

[LoopVectorize] Use CodeSize as the cost kind for minsize #124119

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Feb 27, 2025

Conversation

john-brawn-arm
Copy link
Collaborator

Functions marked with minsize should aim for minimum code size, so the vectorizer should use CodeSize for the cost kind and also the cost we compare should be the cost for the entire loop: it shouldn't be divided by the number of vector elements and block costs shouldn't be divided by the block probability.

Possibly we should also be doing this for optsize as well, but there are a lot of tests that assume the current behaviour and the definition of optsize is less clear than minsize (for minsize the goal is to "keep the code size of this function as small as possible" whereas for optsize it's "keep the code size of this function low").

Functions marked with minsize should aim for minimum code size, so the
vectorizer should use CodeSize for the cost kind and also the cost
we compare should be the cost for the entire loop: it shouldn't be
divided by the number of vector elements and block costs shouldn't be
divided by the block probability.

Possibly we should also be doing this for optsize as well, but there
are a lot of tests that assume the current behaviour and the
definition of optsize is less clear than minsize (for minsize the
goal is to "keep the code size of this function as small as possible"
whereas for optsize it's "keep the code size of this function low").
@llvmbot
Copy link
Member

llvmbot commented Jan 23, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: John Brawn (john-brawn-arm)

Changes

Functions marked with minsize should aim for minimum code size, so the vectorizer should use CodeSize for the cost kind and also the cost we compare should be the cost for the entire loop: it shouldn't be divided by the number of vector elements and block costs shouldn't be divided by the block probability.

Possibly we should also be doing this for optsize as well, but there are a lot of tests that assume the current behaviour and the definition of optsize is less clear than minsize (for minsize the goal is to "keep the code size of this function as small as possible" whereas for optsize it's "keep the code size of this function low").


Patch is 140.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124119.diff

4 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+16-4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+1-1)
  • (added) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+1067)
  • (added) llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll (+1043)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7167e2179af535..9a617fc66cb935 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -978,7 +978,9 @@ class LoopVectorizationCostModel {
                              InterleavedAccessInfo &IAI)
       : ScalarEpilogueStatus(SEL), TheLoop(L), PSE(PSE), LI(LI), Legal(Legal),
         TTI(TTI), TLI(TLI), DB(DB), AC(AC), ORE(ORE), TheFunction(F),
-        Hints(Hints), InterleaveInfo(IAI), CostKind(TTI::TCK_RecipThroughput) {}
+        Hints(Hints), InterleaveInfo(IAI) {
+    CostKind = F->hasMinSize() ? TTI::TCK_CodeSize : TTI::TCK_RecipThroughput;
+  }
 
   /// \return An upper bound for the vectorization factors (both fixed and
   /// scalable). If the factors are 0, vectorization and interleaving should be
@@ -4277,6 +4279,13 @@ bool LoopVectorizationPlanner::isMoreProfitable(
       EstimatedWidthB *= *VScale;
   }
 
+  // When optimizing for size choose whichever is smallest, which will be the
+  // one with the smallest cost for the whole loop. On a tie pick the larger
+  // vector width, on the assumption that throughput will be greater.
+  if (CM.CostKind == TTI::TCK_CodeSize)
+    return CostA < CostB ||
+      (CostA == CostB && EstimatedWidthA > EstimatedWidthB);
+
   // Assume vscale may be larger than 1 (or the value being tuned for),
   // so that scalable vectorization is slightly favorable over fixed-width
   // vectorization.
@@ -5506,7 +5515,8 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
       }
 
     // Scale the total scalar cost by block probability.
-    ScalarCost /= getReciprocalPredBlockProb();
+    if (CostKind != TTI::TCK_CodeSize)
+      ScalarCost /= getReciprocalPredBlockProb();
 
     // Compute the discount. A non-negative discount means the vector version
     // of the instruction costs more, and scalarizing would be beneficial.
@@ -5558,7 +5568,8 @@ InstructionCost LoopVectorizationCostModel::expectedCost(ElementCount VF) {
     // the predicated block, if it is an if-else block. Thus, scale the block's
     // cost by the probability of executing it. blockNeedsPredication from
     // Legal is used so as to not include all blocks in tail folded loops.
-    if (VF.isScalar() && Legal->blockNeedsPredication(BB))
+    if (VF.isScalar() && Legal->blockNeedsPredication(BB) &&
+        CostKind != TTI::TCK_CodeSize)
       BlockCost /= getReciprocalPredBlockProb();
 
     Cost += BlockCost;
@@ -5637,7 +5648,8 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
   // conditional branches, but may not be executed for each vector lane. Scale
   // the cost by the probability of executing the predicated block.
   if (isPredicatedInst(I)) {
-    Cost /= getReciprocalPredBlockProb();
+    if (CostKind != TTI::TCK_CodeSize)
+      Cost /= getReciprocalPredBlockProb();
 
     // Add the cost of an i1 extract and a branch
     auto *VecI1Ty =
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index f1228368804beb..b92bfbe716855a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -793,7 +793,7 @@ InstructionCost VPRegionBlock::cost(ElementCount VF, VPCostContext &Ctx) {
 
   // For the scalar case, we may not always execute the original predicated
   // block, Thus, scale the block's cost by the probability of executing it.
-  if (VF.isScalar())
+  if (VF.isScalar() && Ctx.CostKind != TTI::TCK_CodeSize)
     return ThenCost / getReciprocalPredBlockProb();
 
   return ThenCost;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll b/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll
new file mode 100644
index 00000000000000..37dd63b5c093da
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll
@@ -0,0 +1,1067 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; The tests here check for differences in behaviour between the default,
+; optsize, and minsize.
+; RUN: opt -passes=loop-vectorize -S < %s | FileCheck %s --check-prefix=DEFAULT
+; RUN: opt -passes=forceattrs,loop-vectorize -force-attribute=optsize -S < %s | FileCheck %s --check-prefix=OPTSIZE
+; RUN: opt -passes=forceattrs,loop-vectorize -force-attribute=minsize -S < %s | FileCheck %s --check-prefix=MINSIZE
+
+target triple = "aarch64-unknown-linux-gnu"
+
+@A = global [1000 x i16] zeroinitializer, align 2
+@B = global [1000 x i32] zeroinitializer, align 4
+@C = global [1000 x i32] zeroinitializer, align 4
+
+; This should always vectorize, as using vector instructions eliminates the loop
+; which is both faster and smaller (a scalar version is emitted, but the branch
+; to it is false and it's later removed).
+define void @always_vectorize(ptr %p, i32 %x) {
+; DEFAULT-LABEL: define void @always_vectorize(
+; DEFAULT-SAME: ptr [[P:%.*]], i32 [[X:%.*]]) {
+; DEFAULT-NEXT:  [[ENTRY:.*]]:
+; DEFAULT-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; DEFAULT:       [[VECTOR_PH]]:
+; DEFAULT-NEXT:    br label %[[VECTOR_BODY:.*]]
+; DEFAULT:       [[VECTOR_BODY]]:
+; DEFAULT-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 0
+; DEFAULT-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; DEFAULT-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
+; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X]], i64 0
+; DEFAULT-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; DEFAULT-NEXT:    [[TMP3:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; DEFAULT-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; DEFAULT-NEXT:    store <4 x i32> [[TMP3]], ptr [[TMP5]], align 4
+; DEFAULT-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
+; DEFAULT:       [[MIDDLE_BLOCK]]:
+; DEFAULT-NEXT:    br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
+; DEFAULT:       [[SCALAR_PH]]:
+; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; DEFAULT-NEXT:    br label %[[FOR_BODY:.*]]
+; DEFAULT:       [[FOR_BODY]]:
+; DEFAULT-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; DEFAULT-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; DEFAULT-NEXT:    [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; DEFAULT-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP4]], [[X]]
+; DEFAULT-NEXT:    store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; DEFAULT-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; DEFAULT-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 4
+; DEFAULT-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; DEFAULT:       [[FOR_COND_CLEANUP]]:
+; DEFAULT-NEXT:    ret void
+;
+; OPTSIZE-LABEL: define void @always_vectorize(
+; OPTSIZE-SAME: ptr [[P:%.*]], i32 [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; OPTSIZE-NEXT:  [[ENTRY:.*]]:
+; OPTSIZE-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; OPTSIZE:       [[VECTOR_PH]]:
+; OPTSIZE-NEXT:    br label %[[VECTOR_BODY:.*]]
+; OPTSIZE:       [[VECTOR_BODY]]:
+; OPTSIZE-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 0
+; OPTSIZE-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; OPTSIZE-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
+; OPTSIZE-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X]], i64 0
+; OPTSIZE-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; OPTSIZE-NEXT:    [[TMP3:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; OPTSIZE-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; OPTSIZE-NEXT:    store <4 x i32> [[TMP3]], ptr [[TMP5]], align 4
+; OPTSIZE-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
+; OPTSIZE:       [[MIDDLE_BLOCK]]:
+; OPTSIZE-NEXT:    br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
+; OPTSIZE:       [[SCALAR_PH]]:
+; OPTSIZE-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; OPTSIZE-NEXT:    br label %[[FOR_BODY:.*]]
+; OPTSIZE:       [[FOR_BODY]]:
+; OPTSIZE-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; OPTSIZE-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; OPTSIZE-NEXT:    [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; OPTSIZE-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP4]], [[X]]
+; OPTSIZE-NEXT:    store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; OPTSIZE-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; OPTSIZE-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 4
+; OPTSIZE-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; OPTSIZE:       [[FOR_COND_CLEANUP]]:
+; OPTSIZE-NEXT:    ret void
+;
+; MINSIZE-LABEL: define void @always_vectorize(
+; MINSIZE-SAME: ptr [[P:%.*]], i32 [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; MINSIZE-NEXT:  [[ENTRY:.*]]:
+; MINSIZE-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; MINSIZE:       [[VECTOR_PH]]:
+; MINSIZE-NEXT:    br label %[[VECTOR_BODY:.*]]
+; MINSIZE:       [[VECTOR_BODY]]:
+; MINSIZE-NEXT:    [[TMP0:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 0
+; MINSIZE-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 0
+; MINSIZE-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP1]], align 4
+; MINSIZE-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X]], i64 0
+; MINSIZE-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; MINSIZE-NEXT:    [[TMP2:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; MINSIZE-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 0
+; MINSIZE-NEXT:    store <4 x i32> [[TMP2]], ptr [[TMP3]], align 4
+; MINSIZE-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
+; MINSIZE:       [[MIDDLE_BLOCK]]:
+; MINSIZE-NEXT:    br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
+; MINSIZE:       [[SCALAR_PH]]:
+; MINSIZE-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ 4, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; MINSIZE-NEXT:    br label %[[FOR_BODY:.*]]
+; MINSIZE:       [[FOR_BODY]]:
+; MINSIZE-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; MINSIZE-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; MINSIZE-NEXT:    [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; MINSIZE-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP4]], [[X]]
+; MINSIZE-NEXT:    store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; MINSIZE-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; MINSIZE-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 4
+; MINSIZE-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; MINSIZE:       [[FOR_COND_CLEANUP]]:
+; MINSIZE-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i32, ptr %p, i64 %indvars.iv
+  %0 = load i32, ptr %arrayidx, align 4
+  %add = add nsw i32 %0, %x
+  store i32 %add, ptr %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, 4
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+}
+
+; This should vectorize only without optsize, as it needs a scalar version
+; which increases code size.
+define void @vectorize_without_optsize(ptr %p, i32 %x, i64 %n) {
+; DEFAULT-LABEL: define void @vectorize_without_optsize(
+; DEFAULT-SAME: ptr [[P:%.*]], i32 [[X:%.*]], i64 [[N:%.*]]) {
+; DEFAULT-NEXT:  [[ENTRY:.*]]:
+; DEFAULT-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 8
+; DEFAULT-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; DEFAULT:       [[VECTOR_PH]]:
+; DEFAULT-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N]], 8
+; DEFAULT-NEXT:    [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
+; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X]], i64 0
+; DEFAULT-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; DEFAULT-NEXT:    br label %[[VECTOR_BODY:.*]]
+; DEFAULT:       [[VECTOR_BODY]]:
+; DEFAULT-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; DEFAULT-NEXT:    [[TMP0:%.*]] = add i64 [[INDEX]], 0
+; DEFAULT-NEXT:    [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[TMP0]]
+; DEFAULT-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; DEFAULT-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 4
+; DEFAULT-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
+; DEFAULT-NEXT:    [[WIDE_LOAD1:%.*]] = load <4 x i32>, ptr [[TMP3]], align 4
+; DEFAULT-NEXT:    [[TMP4:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; DEFAULT-NEXT:    [[TMP5:%.*]] = add nsw <4 x i32> [[WIDE_LOAD1]], [[BROADCAST_SPLAT]]
+; DEFAULT-NEXT:    store <4 x i32> [[TMP4]], ptr [[TMP2]], align 4
+; DEFAULT-NEXT:    store <4 x i32> [[TMP5]], ptr [[TMP3]], align 4
+; DEFAULT-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
+; DEFAULT-NEXT:    [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; DEFAULT-NEXT:    br i1 [[TMP6]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; DEFAULT:       [[MIDDLE_BLOCK]]:
+; DEFAULT-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
+; DEFAULT-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
+; DEFAULT:       [[SCALAR_PH]]:
+; DEFAULT-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; DEFAULT-NEXT:    br label %[[FOR_BODY:.*]]
+; DEFAULT:       [[FOR_BODY]]:
+; DEFAULT-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; DEFAULT-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; DEFAULT-NEXT:    [[TMP7:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; DEFAULT-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP7]], [[X]]
+; DEFAULT-NEXT:    store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; DEFAULT-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; DEFAULT-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; DEFAULT-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; DEFAULT:       [[FOR_COND_CLEANUP]]:
+; DEFAULT-NEXT:    ret void
+;
+; OPTSIZE-LABEL: define void @vectorize_without_optsize(
+; OPTSIZE-SAME: ptr [[P:%.*]], i32 [[X:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; OPTSIZE-NEXT:  [[ENTRY:.*]]:
+; OPTSIZE-NEXT:    br label %[[FOR_BODY:.*]]
+; OPTSIZE:       [[FOR_BODY]]:
+; OPTSIZE-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; OPTSIZE-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; OPTSIZE-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; OPTSIZE-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP0]], [[X]]
+; OPTSIZE-NEXT:    store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; OPTSIZE-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; OPTSIZE-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; OPTSIZE-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP:.*]], label %[[FOR_BODY]]
+; OPTSIZE:       [[FOR_COND_CLEANUP]]:
+; OPTSIZE-NEXT:    ret void
+;
+; MINSIZE-LABEL: define void @vectorize_without_optsize(
+; MINSIZE-SAME: ptr [[P:%.*]], i32 [[X:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; MINSIZE-NEXT:  [[ENTRY:.*]]:
+; MINSIZE-NEXT:    br label %[[FOR_BODY:.*]]
+; MINSIZE:       [[FOR_BODY]]:
+; MINSIZE-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; MINSIZE-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; MINSIZE-NEXT:    [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; MINSIZE-NEXT:    [[ADD:%.*]] = add nsw i32 [[TMP0]], [[X]]
+; MINSIZE-NEXT:    store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; MINSIZE-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; MINSIZE-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; MINSIZE-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP:.*]], label %[[FOR_BODY]]
+; MINSIZE:       [[FOR_COND_CLEANUP]]:
+; MINSIZE-NEXT:    ret void
+;
+entry:
+  br label %for.body
+
+for.body:
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds i32, ptr %p, i64 %indvars.iv
+  %0 = load i32, ptr %arrayidx, align 4
+  %add = add nsw i32 %0, %x
+  store i32 %add, ptr %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond.not = icmp eq i64 %indvars.iv.next, %n
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+  ret void
+}
+
+; This should be vectorized and tail predicated without optsize, as that's
+; faster, but not with optsize, as it's much larger.
+; FIXME: Currently we avoid tail predication only with minsize
+define void @tail_predicate_without_optsize(ptr %p, i8 %a, i8 %b, i8 %c, i32 %n) {
+; DEFAULT-LABEL: define void @tail_predicate_without_optsize(
+; DEFAULT-SAME: ptr [[P:%.*]], i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]], i32 [[N:%.*]]) {
+; DEFAULT-NEXT:  [[ENTRY:.*]]:
+; DEFAULT-NEXT:    br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; DEFAULT:       [[VECTOR_PH]]:
+; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[A]], i64 0
+; DEFAULT-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
+; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <16 x i8> poison, i8 [[B]], i64 0
+; DEFAULT-NEXT:    [[BROADCAST_SPLAT4:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT3]], <16 x i8> poison, <16 x i32> zeroinitializer
+; DEFAULT-NEXT:    [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <16 x i8> poison, i8 [[C]], i64 0
+; DEFAULT-NEXT:    [[BROADCAST_SPLAT6:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT5]], <16 x i8> poison, <16 x i32> zeroinitializer
+; DEFAULT-NEXT:    br label %[[VECTOR_BODY:.*]]
+; DEFAULT:       [[VECTOR_BODY]]:
+; DEFAULT-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE36:.*]] ]
+; DEFAULT-NEXT:    [[VEC_IND:%.*]] = phi <16 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[PRED_STORE_CONTINUE36]] ]
+; DEFAULT-NEXT:    [[VEC_IND1:%.*]] = phi <16 x i8> [ <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT2:%.*]], %[[PRED_STORE_CONTINUE36]] ]
+; DEFAULT-NEXT:    [[TMP0:%.*]] = icmp ule <16 x i64> [[VEC_IND]], splat (i64 14)
+; DEFAULT-NEXT:    [[TMP1:%.*]] = mul <16 x i8> [[BROADCAST_SPLAT]], [[VEC_IND1]]
+; DEFAULT-NEXT:    [[TMP2:%.*]] = lshr <16 x i8> [[VEC_IND1]], splat (i8 1)
+; DEFAULT-NEXT:    [[TMP3:%.*]] = mul <16 x i8> [[TMP2]], [[BROADCAST_SPLAT4]]
+; DEFAULT-NEXT:    [[TMP4:%.*]] = add <16 x i8> [[TMP3]], [[TMP1]]
+; DEFAULT-NEXT:    [[TMP5:%.*]] = lshr <16 x i8> [[VEC_IND1]], splat (i8 2)
+; DEFAULT-NEXT:    [[TMP6:%.*]] = mul <16 x i8> [[TMP5]], [[BROADCAST_SPLAT6]]
+; DEFAULT-NEXT:    [[TMP7:%.*]] = add <16 x i8> [[TMP4]], [[TMP6]]
+; DEFAULT-NEXT:    [[TM...
[truncated]

Copy link

github-actions bot commented Jan 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@SamTebbs33 SamTebbs33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything aarch64 or arm-specific about these changes. Can the tests be moved to the parent folder?

Comment on lines 1037 to 1066
;.
; DEFAULT: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; DEFAULT: [[META1]] = !{!"llvm.loop.unroll.runtime.disable"}
; DEFAULT: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
; DEFAULT: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; DEFAULT: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
; DEFAULT: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
; DEFAULT: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]]}
; DEFAULT: [[LOOP7]] = distinct !{[[LOOP7]], [[META2]], [[META1]]}
; DEFAULT: [[LOOP8]] = distinct !{[[LOOP8]], [[META1]], [[META2]]}
; DEFAULT: [[LOOP9]] = distinct !{[[LOOP9]], [[META2]], [[META1]]}
; DEFAULT: [[LOOP10]] = distinct !{[[LOOP10]], [[META1]], [[META2]]}
;.
; OPTSIZE: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; OPTSIZE: [[META1]] = !{!"llvm.loop.unroll.runtime.disable"}
; OPTSIZE: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
; OPTSIZE: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; OPTSIZE: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
; OPTSIZE: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
; OPTSIZE: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]]}
; OPTSIZE: [[LOOP7]] = distinct !{[[LOOP7]], [[META2]], [[META1]]}
; OPTSIZE: [[LOOP8]] = distinct !{[[LOOP8]], [[META1]], [[META2]]}
;.
; MINSIZE: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; MINSIZE: [[META1]] = !{!"llvm.loop.unroll.runtime.disable"}
; MINSIZE: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
; MINSIZE: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; MINSIZE: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
; MINSIZE: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
; MINSIZE: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need these. Likewise for the other file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all generated by update_test_checks.py. I could remove them, but the next time the test is updated they would just be added back.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've always been asked to remove them so I just delete those lines from the file after updating.

Copy link
Collaborator

@huntergr-arm huntergr-arm Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally ask that existing metadata lines are removed (unless they're a deliberate part of the test); this is checking the output of opt instead, which adds new metadata.

Removing the output line checks can be done by changing the version flag used by update_test_checks.py at the top of the file -- instead of UTC_ARGS: --version 5, try version 4 or maybe 3. There might be another flag you can use as well but I can't see it from the help text.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like --check-globals none is the flag to do this.

@john-brawn-arm
Copy link
Collaborator Author

I don't see anything aarch64 or arm-specific about these changes. Can the tests be moved to the parent folder?

Costs are calculated by TargetTransformInfo and so are target-dependent. The two test files are also expecting different things, e.g. in the dont_vectorize_with_minsize we expect different vectorization factors in arm and aarch64.

Copy link
Collaborator

@SamTebbs33 SamTebbs33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now, thank you.

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the fix #118100?

Do you have any statistics/perf number on the impact of this?

@@ -5506,7 +5515,8 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
}

// Scale the total scalar cost by block probability.
ScalarCost /= getReciprocalPredBlockProb();
if (CostKind != TTI::TCK_CodeSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the checks here and below may not be covered by the existing tests. Would be good to add test coverage for some of them, if possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't been able to come up with a test to specifically test this, as any test I come up with the cost is large enough that incorrectly halving it doesn't matter because it's still larger than the scalar cost. If getReciprocalPredBlockProb used the actual block probability, instead of assuming it's 0.5 as it currently does, then I could probably do it by setting the block probability to close to zero.

@david-arm
Copy link
Contributor

Does the fix #118100?

Isn't #118100 using -Os? If I understand correctly -Oz adds the minsize opt, right?

@john-brawn-arm
Copy link
Collaborator Author

john-brawn-arm commented Jan 28, 2025

Does the fix #118100?

Isn't #118100 using -Os? If I understand correctly -Oz adds the minsize opt, right?

Yes, it doesn't fix #118100 due to that using -Os, which adds optsize not minsize. Compiling the test case in that with -Oz -fvectorize it looks like we currently don't vectorize for other reasons already.

Do you have any statistics/perf number on the impact of this?

Compiling the llvm-test-suite SingleSource and MultiSource benchmarks with --target=aarch64-none-elf -Oz -fvectorize (we need the -fvectorize because clang disables vectorization by default with -Oz) and looking at the code size change in each object, there's only two objects where this makes a difference:

Object                                                Before After
SingleSource/UnitTests/Vectorizer/recurrences.cpp.obj 3024   2940
MultiSource/Benchmarks/MallocBench/gs/gdevmem.c.obj   5492   5396

@john-brawn-arm
Copy link
Collaborator Author

Ping.

@john-brawn-arm
Copy link
Collaborator Author

Ping. @fhahn, have I sufficiently addressed your comments?

Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@john-brawn-arm john-brawn-arm merged commit 8150ab9 into llvm:main Feb 27, 2025
8 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 27, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-devrel-x86-64 running on ml-opt-devrel-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/13911

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/optsize_minsize.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 4: /b/ml-opt-devrel-x86-64-b1/build/bin/opt -passes=loop-vectorize -S < /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll | /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
+ /b/ml-opt-devrel-x86-64-b1/build/bin/opt -passes=loop-vectorize -S
+ /b/ml-opt-devrel-x86-64-b1/build/bin/FileCheck /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:23:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
                ^
<stdin>:17:2: note: 'next' match was here
 br label %vector.body
 ^
<stdin>:14:11: note: previous match ended here
vector.ph: ; preds = %entry
          ^
<stdin>:15:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0
^
/b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:695:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: [[TMP10:%.*]] = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
                ^
<stdin>:357:2: note: 'next' match was here
 %10 = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
 ^
<stdin>:350:105: note: previous match ended here
 %active.lane.mask.entry = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 15)
                                                                                                        ^
<stdin>:351:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <vscale x 16 x i8> poison, i8 %a, i64 0
^

Input file: <stdin>
Check file: /b/ml-opt-devrel-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          .
          .
          .
         12:  br i1 false, label %scalar.ph, label %vector.ph 
         13:  
         14: vector.ph: ; preds = %entry 
         15:  %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0 
         16:  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer 
         17:  br label %vector.body 
next:23       !~~~~~~~~~~~~~~~~~~~~  error: match on wrong line
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 27, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-5 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/15358

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/optsize_minsize.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 4: /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/opt -passes=loop-vectorize -S < /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll | /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/opt -passes=loop-vectorize -S
+ /Users/buildbot/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
�[1m/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:23:17: �[0m�[0;1;31merror: �[0m�[1mDEFAULT-NEXT: is not on the line after the previous match
�[0m; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
�[0;1;32m                ^
�[0m�[1m<stdin>:17:2: �[0m�[0;1;30mnote: �[0m�[1m'next' match was here
�[0m br label %vector.body
�[0;1;32m ^
�[0m�[1m<stdin>:14:11: �[0m�[0;1;30mnote: �[0m�[1mprevious match ended here
�[0mvector.ph: ; preds = %entry
�[0;1;32m          ^
�[0m�[1m<stdin>:15:1: �[0m�[0;1;30mnote: �[0m�[1mnon-matching line after previous match is here
�[0m %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0
�[0;1;32m^
�[0m�[1m/Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:695:17: �[0m�[0;1;31merror: �[0m�[1mDEFAULT-NEXT: is not on the line after the previous match
�[0m; DEFAULT-NEXT: [[TMP10:%.*]] = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
�[0;1;32m                ^
�[0m�[1m<stdin>:357:2: �[0m�[0;1;30mnote: �[0m�[1m'next' match was here
�[0m %10 = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
�[0;1;32m ^
�[0m�[1m<stdin>:350:105: �[0m�[0;1;30mnote: �[0m�[1mprevious match ended here
�[0m %active.lane.mask.entry = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 15)
�[0;1;32m                                                                                                        ^
�[0m�[1m<stdin>:351:1: �[0m�[0;1;30mnote: �[0m�[1mnon-matching line after previous match is here
�[0m %broadcast.splatinsert = insertelement <vscale x 16 x i8> poison, i8 %a, i64 0
�[0;1;32m^
�[0m
Input file: <stdin>
Check file: /Users/buildbot/buildbot-root/aarch64-darwin/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
�[1m�[0m�[0;1;30m              1: �[0m�[1m�[0;1;46m; ModuleID = '<stdin>' �[0m
�[0;1;30m              2: �[0m�[1m�[0;1;46msource_filename = "<stdin>" �[0m
�[0;1;30m              3: �[0m�[1m�[0;1;46mtarget datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-Fn32" �[0m
�[0;1;30m              4: �[0m�[1m�[0;1;46mtarget triple = "aarch64-unknown-linux-gnu" �[0m
�[0;1;30m              5: �[0m�[1m�[0;1;46m �[0m
�[0;1;30m              6: �[0m�[1m�[0;1;46m@A = global [1000 x i16] zeroinitializer, align 2 �[0m
�[0;1;30m              7: �[0m�[1m�[0;1;46m@B = global [1000 x i32] zeroinitializer, align 4 �[0m
�[0;1;30m              8: �[0m�[1m�[0;1;46m@C = global [1000 x i32] zeroinitializer, align 4 �[0m
�[0;1;30m              9: �[0m�[1m�[0;1;46m �[0m
�[0;1;30m             10: �[0m�[1m�[0;1;46m�[0mdefine void @always_vectorize(ptr %p, i32 %x) {�[0;1;46m �[0m
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 27, 2025

LLVM Buildbot has detected a new failure on builder ml-opt-dev-x86-64 running on ml-opt-dev-x86-64-b2 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/14111

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/optsize_minsize.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 4: /b/ml-opt-dev-x86-64-b1/build/bin/opt -passes=loop-vectorize -S < /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll | /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
+ /b/ml-opt-dev-x86-64-b1/build/bin/opt -passes=loop-vectorize -S
+ /b/ml-opt-dev-x86-64-b1/build/bin/FileCheck /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:23:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
                ^
<stdin>:17:2: note: 'next' match was here
 br label %vector.body
 ^
<stdin>:14:11: note: previous match ended here
vector.ph: ; preds = %entry
          ^
<stdin>:15:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0
^
/b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:695:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: [[TMP10:%.*]] = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
                ^
<stdin>:357:2: note: 'next' match was here
 %10 = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
 ^
<stdin>:350:105: note: previous match ended here
 %active.lane.mask.entry = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 15)
                                                                                                        ^
<stdin>:351:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <vscale x 16 x i8> poison, i8 %a, i64 0
^

Input file: <stdin>
Check file: /b/ml-opt-dev-x86-64-b1/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          .
          .
          .
         12:  br i1 false, label %scalar.ph, label %vector.ph 
         13:  
         14: vector.ph: ; preds = %entry 
         15:  %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0 
         16:  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer 
         17:  br label %vector.body 
next:23       !~~~~~~~~~~~~~~~~~~~~  error: match on wrong line
...

fhahn added a commit that referenced this pull request Feb 27, 2025
PR #124119 wasn't rebased & tested before merging. Update the failing
tests.
@fhahn
Copy link
Contributor

fhahn commented Feb 27, 2025

Looks like this was causing some test failures due to it not begin rebased to the latest main before merging, which missed some tests updates. Should be fixed in 649f4dc.

@john-brawn-arm
Copy link
Collaborator Author

Looks like this was causing some test failures due to it not begin rebased to the latest main before merging, which missed some tests updates. Should be fixed in 649f4dc.

Thanks for fixing that.

@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 27, 2025

LLVM Buildbot has detected a new failure on builder lld-x86_64-ubuntu-fast running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/12068

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/optsize_minsize.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 4: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/opt -passes=loop-vectorize -S < /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll | /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/opt -passes=loop-vectorize -S
+ /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/build/bin/FileCheck /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:23:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
                ^
<stdin>:17:2: note: 'next' match was here
 br label %vector.body
 ^
<stdin>:14:11: note: previous match ended here
vector.ph: ; preds = %entry
          ^
<stdin>:15:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0
^
/home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:695:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: [[TMP10:%.*]] = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
                ^
<stdin>:357:2: note: 'next' match was here
 %10 = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
 ^
<stdin>:350:105: note: previous match ended here
 %active.lane.mask.entry = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 15)
                                                                                                        ^
<stdin>:351:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <vscale x 16 x i8> poison, i8 %a, i64 0
^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/lld-x86_64/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          .
          .
          .
         12:  br i1 false, label %scalar.ph, label %vector.ph 
         13:  
         14: vector.ph: ; preds = %entry 
         15:  %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0 
         16:  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer 
         17:  br label %vector.body 
next:23       !~~~~~~~~~~~~~~~~~~~~  error: match on wrong line
...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 27, 2025

LLVM Buildbot has detected a new failure on builder premerge-monolithic-linux running on premerge-linux-1 while building llvm at step 7 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/24095

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/ARM/optsize_minsize.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 4: /build/buildbot/premerge-monolithic-linux/build/bin/opt -passes=loop-vectorize -S < /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll | /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll --check-prefix=DEFAULT
+ /build/buildbot/premerge-monolithic-linux/build/bin/opt -passes=loop-vectorize -S
+ /build/buildbot/premerge-monolithic-linux/build/bin/FileCheck /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll --check-prefix=DEFAULT
/build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll:23:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
                ^
<stdin>:17:2: note: 'next' match was here
 br label %vector.body
 ^
<stdin>:14:11: note: previous match ended here
vector.ph: ; preds = %entry
          ^
<stdin>:15:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0
^

Input file: <stdin>
Check file: /build/buildbot/premerge-monolithic-linux/llvm-project/llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
         .
         .
         .
        12:  br i1 false, label %scalar.ph, label %vector.ph 
        13:  
        14: vector.ph: ; preds = %entry 
        15:  %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0 
        16:  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer 
        17:  br label %vector.body 
next:23      !~~~~~~~~~~~~~~~~~~~~  error: match on wrong line
        18:  
        19: vector.body: ; preds = %vector.ph 
        20:  %0 = getelementptr inbounds i32, ptr %p, i64 0 
        21:  %1 = getelementptr inbounds i32, ptr %0, i32 0 
        22:  %wide.load = load <4 x i32>, ptr %1, align 4 
         .
         .
         .
>>>>>>

--

...

@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 27, 2025

LLVM Buildbot has detected a new failure on builder llvm-x86_64-debian-dylib running on gribozavr4 while building llvm at step 7 "test-build-unified-tree-check-llvm".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/20614

Here is the relevant piece of the build log for the reference
Step 7 (test-build-unified-tree-check-llvm) failure: test (failure)
******************** TEST 'LLVM :: Transforms/LoopVectorize/AArch64/optsize_minsize.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 4: /b/1/llvm-x86_64-debian-dylib/build/bin/opt -passes=loop-vectorize -S < /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll | /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
+ /b/1/llvm-x86_64-debian-dylib/build/bin/FileCheck /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll --check-prefix=DEFAULT
+ /b/1/llvm-x86_64-debian-dylib/build/bin/opt -passes=loop-vectorize -S
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:23:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
                ^
<stdin>:17:2: note: 'next' match was here
 br label %vector.body
 ^
<stdin>:14:11: note: previous match ended here
vector.ph: ; preds = %entry
          ^
<stdin>:15:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0
^
/b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll:695:17: error: DEFAULT-NEXT: is not on the line after the previous match
; DEFAULT-NEXT: [[TMP10:%.*]] = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
                ^
<stdin>:357:2: note: 'next' match was here
 %10 = call <vscale x 16 x i8> @llvm.stepvector.nxv16i8()
 ^
<stdin>:350:105: note: previous match ended here
 %active.lane.mask.entry = call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 15)
                                                                                                        ^
<stdin>:351:1: note: non-matching line after previous match is here
 %broadcast.splatinsert = insertelement <vscale x 16 x i8> poison, i8 %a, i64 0
^

Input file: <stdin>
Check file: /b/1/llvm-x86_64-debian-dylib/llvm-project/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          .
          .
          .
         12:  br i1 false, label %scalar.ph, label %vector.ph 
         13:  
         14: vector.ph: ; preds = %entry 
         15:  %broadcast.splatinsert = insertelement <4 x i32> poison, i32 %x, i64 0 
         16:  %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer 
         17:  br label %vector.body 
next:23       !~~~~~~~~~~~~~~~~~~~~  error: match on wrong line
...

joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Mar 3, 2025
Functions marked with minsize should aim for minimum code size, so the
vectorizer should use CodeSize for the cost kind and also the cost we
compare should be the cost for the entire loop: it shouldn't be divided
by the number of vector elements and block costs shouldn't be divided by
the block probability.

Possibly we should also be doing this for optsize as well, but there are
a lot of tests that assume the current behaviour and the definition of
optsize is less clear than minsize (for minsize the goal is to "keep the
code size of this function as small as possible" whereas for optsize
it's "keep the code size of this function low").
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Mar 3, 2025
PR llvm#124119 wasn't rebased & tested before merging. Update the failing
tests.
@john-brawn-arm john-brawn-arm deleted the vectorize_minsize branch May 13, 2025 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants