[LoopVectorize] Use CodeSize as the cost kind for minsize #124119

Conversation
Functions marked with minsize should aim for minimum code size, so the vectorizer should use CodeSize as the cost kind. The cost we compare should also be the cost of the entire loop: it shouldn't be divided by the number of vector elements, and block costs shouldn't be divided by the block probability. Possibly we should be doing this for optsize as well, but there are a lot of tests that assume the current behaviour, and the definition of optsize is less clear than minsize (for minsize the goal is to "keep the code size of this function as small as possible", whereas for optsize it's "keep the code size of this function low").
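To make the comparison change concrete, here is a minimal standalone sketch (not the actual LLVM types or the real isMoreProfitable implementation; the VF and cost numbers are invented for illustration):

#include <cstdio>

// Each candidate plan has a vectorization factor (VF) and a cost for one
// iteration of its loop body.
struct Candidate {
  unsigned VF;
  unsigned LoopCost;
};

// TCK_RecipThroughput: amortize the loop cost over the elements processed,
// i.e. compare A.LoopCost/A.VF against B.LoopCost/B.VF (cross-multiplied
// to stay in integer arithmetic).
bool moreProfitableThroughput(Candidate A, Candidate B) {
  return A.LoopCost * B.VF < B.LoopCost * A.VF;
}

// TCK_CodeSize: compare whole-loop costs directly; on a tie prefer the
// wider VF, on the assumption that throughput will be greater.
bool moreProfitableCodeSize(Candidate A, Candidate B) {
  return A.LoopCost < B.LoopCost ||
         (A.LoopCost == B.LoopCost && A.VF > B.VF);
}

int main() {
  Candidate Scalar{1, 3}, Vec4{4, 8};
  // Per element the VF=4 plan wins (8/4 = 2 < 3)...
  printf("%d\n", moreProfitableThroughput(Vec4, Scalar)); // prints 1
  // ...but it costs more for the whole loop, so for code size it loses.
  printf("%d\n", moreProfitableCodeSize(Vec4, Scalar));   // prints 0
}

This is why minsize functions stop dividing by the number of vector elements: under a size objective the amortized per-element cost overstates how attractive wide vectors are.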
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: John Brawn (john-brawn-arm). Patch is 140.13 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/124119.diff 4 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 7167e2179af535..9a617fc66cb935 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -978,7 +978,9 @@ class LoopVectorizationCostModel {
InterleavedAccessInfo &IAI)
: ScalarEpilogueStatus(SEL), TheLoop(L), PSE(PSE), LI(LI), Legal(Legal),
TTI(TTI), TLI(TLI), DB(DB), AC(AC), ORE(ORE), TheFunction(F),
- Hints(Hints), InterleaveInfo(IAI), CostKind(TTI::TCK_RecipThroughput) {}
+ Hints(Hints), InterleaveInfo(IAI) {
+ CostKind = F->hasMinSize() ? TTI::TCK_CodeSize : TTI::TCK_RecipThroughput;
+ }
/// \return An upper bound for the vectorization factors (both fixed and
/// scalable). If the factors are 0, vectorization and interleaving should be
@@ -4277,6 +4279,13 @@ bool LoopVectorizationPlanner::isMoreProfitable(
EstimatedWidthB *= *VScale;
}
+ // When optimizing for size choose whichever is smallest, which will be the
+ // one with the smallest cost for the whole loop. On a tie pick the larger
+ // vector width, on the assumption that throughput will be greater.
+ if (CM.CostKind == TTI::TCK_CodeSize)
+ return CostA < CostB ||
+ (CostA == CostB && EstimatedWidthA > EstimatedWidthB);
+
// Assume vscale may be larger than 1 (or the value being tuned for),
// so that scalable vectorization is slightly favorable over fixed-width
// vectorization.
@@ -5506,7 +5515,8 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
}
// Scale the total scalar cost by block probability.
- ScalarCost /= getReciprocalPredBlockProb();
+ if (CostKind != TTI::TCK_CodeSize)
+ ScalarCost /= getReciprocalPredBlockProb();
// Compute the discount. A non-negative discount means the vector version
// of the instruction costs more, and scalarizing would be beneficial.
@@ -5558,7 +5568,8 @@ InstructionCost LoopVectorizationCostModel::expectedCost(ElementCount VF) {
// the predicated block, if it is an if-else block. Thus, scale the block's
// cost by the probability of executing it. blockNeedsPredication from
// Legal is used so as to not include all blocks in tail folded loops.
- if (VF.isScalar() && Legal->blockNeedsPredication(BB))
+ if (VF.isScalar() && Legal->blockNeedsPredication(BB) &&
+ CostKind != TTI::TCK_CodeSize)
BlockCost /= getReciprocalPredBlockProb();
Cost += BlockCost;
@@ -5637,7 +5648,8 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
// conditional branches, but may not be executed for each vector lane. Scale
// the cost by the probability of executing the predicated block.
if (isPredicatedInst(I)) {
- Cost /= getReciprocalPredBlockProb();
+ if (CostKind != TTI::TCK_CodeSize)
+ Cost /= getReciprocalPredBlockProb();
// Add the cost of an i1 extract and a branch
auto *VecI1Ty =
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index f1228368804beb..b92bfbe716855a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -793,7 +793,7 @@ InstructionCost VPRegionBlock::cost(ElementCount VF, VPCostContext &Ctx) {
// For the scalar case, we may not always execute the original predicated
// block, Thus, scale the block's cost by the probability of executing it.
- if (VF.isScalar())
+ if (VF.isScalar() && Ctx.CostKind != TTI::TCK_CodeSize)
return ThenCost / getReciprocalPredBlockProb();
return ThenCost;
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll b/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll
new file mode 100644
index 00000000000000..37dd63b5c093da
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll
@@ -0,0 +1,1067 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; The tests here check for differences in behaviour between the default,
+; optsize, and minsize.
+; RUN: opt -passes=loop-vectorize -S < %s | FileCheck %s --check-prefix=DEFAULT
+; RUN: opt -passes=forceattrs,loop-vectorize -force-attribute=optsize -S < %s | FileCheck %s --check-prefix=OPTSIZE
+; RUN: opt -passes=forceattrs,loop-vectorize -force-attribute=minsize -S < %s | FileCheck %s --check-prefix=MINSIZE
+
+target triple = "aarch64-unknown-linux-gnu"
+
+@A = global [1000 x i16] zeroinitializer, align 2
+@B = global [1000 x i32] zeroinitializer, align 4
+@C = global [1000 x i32] zeroinitializer, align 4
+
+; This should always vectorize, as using vector instructions eliminates the loop
+; which is both faster and smaller (a scalar version is emitted, but the branch
+; to it is false and it's later removed).
+define void @always_vectorize(ptr %p, i32 %x) {
+; DEFAULT-LABEL: define void @always_vectorize(
+; DEFAULT-SAME: ptr [[P:%.*]], i32 [[X:%.*]]) {
+; DEFAULT-NEXT: [[ENTRY:.*]]:
+; DEFAULT-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; DEFAULT: [[VECTOR_PH]]:
+; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
+; DEFAULT: [[VECTOR_BODY]]:
+; DEFAULT-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 0
+; DEFAULT-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; DEFAULT-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
+; DEFAULT-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X]], i64 0
+; DEFAULT-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; DEFAULT-NEXT: [[TMP3:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; DEFAULT-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; DEFAULT-NEXT: store <4 x i32> [[TMP3]], ptr [[TMP5]], align 4
+; DEFAULT-NEXT: br label %[[MIDDLE_BLOCK:.*]]
+; DEFAULT: [[MIDDLE_BLOCK]]:
+; DEFAULT-NEXT: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
+; DEFAULT: [[SCALAR_PH]]:
+; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; DEFAULT-NEXT: br label %[[FOR_BODY:.*]]
+; DEFAULT: [[FOR_BODY]]:
+; DEFAULT-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; DEFAULT-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; DEFAULT-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; DEFAULT-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP4]], [[X]]
+; DEFAULT-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; DEFAULT-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; DEFAULT-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 4
+; DEFAULT-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; DEFAULT: [[FOR_COND_CLEANUP]]:
+; DEFAULT-NEXT: ret void
+;
+; OPTSIZE-LABEL: define void @always_vectorize(
+; OPTSIZE-SAME: ptr [[P:%.*]], i32 [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; OPTSIZE-NEXT: [[ENTRY:.*]]:
+; OPTSIZE-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; OPTSIZE: [[VECTOR_PH]]:
+; OPTSIZE-NEXT: br label %[[VECTOR_BODY:.*]]
+; OPTSIZE: [[VECTOR_BODY]]:
+; OPTSIZE-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 0
+; OPTSIZE-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; OPTSIZE-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
+; OPTSIZE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X]], i64 0
+; OPTSIZE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; OPTSIZE-NEXT: [[TMP3:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; OPTSIZE-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; OPTSIZE-NEXT: store <4 x i32> [[TMP3]], ptr [[TMP5]], align 4
+; OPTSIZE-NEXT: br label %[[MIDDLE_BLOCK:.*]]
+; OPTSIZE: [[MIDDLE_BLOCK]]:
+; OPTSIZE-NEXT: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
+; OPTSIZE: [[SCALAR_PH]]:
+; OPTSIZE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; OPTSIZE-NEXT: br label %[[FOR_BODY:.*]]
+; OPTSIZE: [[FOR_BODY]]:
+; OPTSIZE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; OPTSIZE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; OPTSIZE-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; OPTSIZE-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP4]], [[X]]
+; OPTSIZE-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; OPTSIZE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; OPTSIZE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 4
+; OPTSIZE-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; OPTSIZE: [[FOR_COND_CLEANUP]]:
+; OPTSIZE-NEXT: ret void
+;
+; MINSIZE-LABEL: define void @always_vectorize(
+; MINSIZE-SAME: ptr [[P:%.*]], i32 [[X:%.*]]) #[[ATTR0:[0-9]+]] {
+; MINSIZE-NEXT: [[ENTRY:.*]]:
+; MINSIZE-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; MINSIZE: [[VECTOR_PH]]:
+; MINSIZE-NEXT: br label %[[VECTOR_BODY:.*]]
+; MINSIZE: [[VECTOR_BODY]]:
+; MINSIZE-NEXT: [[TMP0:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 0
+; MINSIZE-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 0
+; MINSIZE-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP1]], align 4
+; MINSIZE-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X]], i64 0
+; MINSIZE-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; MINSIZE-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; MINSIZE-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP0]], i32 0
+; MINSIZE-NEXT: store <4 x i32> [[TMP2]], ptr [[TMP3]], align 4
+; MINSIZE-NEXT: br label %[[MIDDLE_BLOCK:.*]]
+; MINSIZE: [[MIDDLE_BLOCK]]:
+; MINSIZE-NEXT: br i1 true, label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
+; MINSIZE: [[SCALAR_PH]]:
+; MINSIZE-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4, %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; MINSIZE-NEXT: br label %[[FOR_BODY:.*]]
+; MINSIZE: [[FOR_BODY]]:
+; MINSIZE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; MINSIZE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; MINSIZE-NEXT: [[TMP4:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; MINSIZE-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP4]], [[X]]
+; MINSIZE-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; MINSIZE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; MINSIZE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 4
+; MINSIZE-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; MINSIZE: [[FOR_COND_CLEANUP]]:
+; MINSIZE-NEXT: ret void
+;
+entry:
+ br label %for.body
+
+for.body:
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds i32, ptr %p, i64 %indvars.iv
+ %0 = load i32, ptr %arrayidx, align 4
+ %add = add nsw i32 %0, %x
+ store i32 %add, ptr %arrayidx, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, 4
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+ ret void
+}
+
+; This should vectorize only without optsize, as it needs a scalar version
+; which increases code size.
+define void @vectorize_without_optsize(ptr %p, i32 %x, i64 %n) {
+; DEFAULT-LABEL: define void @vectorize_without_optsize(
+; DEFAULT-SAME: ptr [[P:%.*]], i32 [[X:%.*]], i64 [[N:%.*]]) {
+; DEFAULT-NEXT: [[ENTRY:.*]]:
+; DEFAULT-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[N]], 8
+; DEFAULT-NEXT: br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; DEFAULT: [[VECTOR_PH]]:
+; DEFAULT-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N]], 8
+; DEFAULT-NEXT: [[N_VEC:%.*]] = sub i64 [[N]], [[N_MOD_VF]]
+; DEFAULT-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i32> poison, i32 [[X]], i64 0
+; DEFAULT-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT]], <4 x i32> poison, <4 x i32> zeroinitializer
+; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
+; DEFAULT: [[VECTOR_BODY]]:
+; DEFAULT-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; DEFAULT-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
+; DEFAULT-NEXT: [[TMP1:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[TMP0]]
+; DEFAULT-NEXT: [[TMP2:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 0
+; DEFAULT-NEXT: [[TMP3:%.*]] = getelementptr inbounds i32, ptr [[TMP1]], i32 4
+; DEFAULT-NEXT: [[WIDE_LOAD:%.*]] = load <4 x i32>, ptr [[TMP2]], align 4
+; DEFAULT-NEXT: [[WIDE_LOAD1:%.*]] = load <4 x i32>, ptr [[TMP3]], align 4
+; DEFAULT-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], [[BROADCAST_SPLAT]]
+; DEFAULT-NEXT: [[TMP5:%.*]] = add nsw <4 x i32> [[WIDE_LOAD1]], [[BROADCAST_SPLAT]]
+; DEFAULT-NEXT: store <4 x i32> [[TMP4]], ptr [[TMP2]], align 4
+; DEFAULT-NEXT: store <4 x i32> [[TMP5]], ptr [[TMP3]], align 4
+; DEFAULT-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
+; DEFAULT-NEXT: [[TMP6:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; DEFAULT-NEXT: br i1 [[TMP6]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; DEFAULT: [[MIDDLE_BLOCK]]:
+; DEFAULT-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[N]], [[N_VEC]]
+; DEFAULT-NEXT: br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
+; DEFAULT: [[SCALAR_PH]]:
+; DEFAULT-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
+; DEFAULT-NEXT: br label %[[FOR_BODY:.*]]
+; DEFAULT: [[FOR_BODY]]:
+; DEFAULT-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; DEFAULT-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; DEFAULT-NEXT: [[TMP7:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; DEFAULT-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP7]], [[X]]
+; DEFAULT-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; DEFAULT-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; DEFAULT-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; DEFAULT-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; DEFAULT: [[FOR_COND_CLEANUP]]:
+; DEFAULT-NEXT: ret void
+;
+; OPTSIZE-LABEL: define void @vectorize_without_optsize(
+; OPTSIZE-SAME: ptr [[P:%.*]], i32 [[X:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; OPTSIZE-NEXT: [[ENTRY:.*]]:
+; OPTSIZE-NEXT: br label %[[FOR_BODY:.*]]
+; OPTSIZE: [[FOR_BODY]]:
+; OPTSIZE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; OPTSIZE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; OPTSIZE-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; OPTSIZE-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP0]], [[X]]
+; OPTSIZE-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; OPTSIZE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; OPTSIZE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; OPTSIZE-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP:.*]], label %[[FOR_BODY]]
+; OPTSIZE: [[FOR_COND_CLEANUP]]:
+; OPTSIZE-NEXT: ret void
+;
+; MINSIZE-LABEL: define void @vectorize_without_optsize(
+; MINSIZE-SAME: ptr [[P:%.*]], i32 [[X:%.*]], i64 [[N:%.*]]) #[[ATTR0]] {
+; MINSIZE-NEXT: [[ENTRY:.*]]:
+; MINSIZE-NEXT: br label %[[FOR_BODY:.*]]
+; MINSIZE: [[FOR_BODY]]:
+; MINSIZE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; MINSIZE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[P]], i64 [[INDVARS_IV]]
+; MINSIZE-NEXT: [[TMP0:%.*]] = load i32, ptr [[ARRAYIDX]], align 4
+; MINSIZE-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP0]], [[X]]
+; MINSIZE-NEXT: store i32 [[ADD]], ptr [[ARRAYIDX]], align 4
+; MINSIZE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
+; MINSIZE-NEXT: [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
+; MINSIZE-NEXT: br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP:.*]], label %[[FOR_BODY]]
+; MINSIZE: [[FOR_COND_CLEANUP]]:
+; MINSIZE-NEXT: ret void
+;
+entry:
+ br label %for.body
+
+for.body:
+ %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+ %arrayidx = getelementptr inbounds i32, ptr %p, i64 %indvars.iv
+ %0 = load i32, ptr %arrayidx, align 4
+ %add = add nsw i32 %0, %x
+ store i32 %add, ptr %arrayidx, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, %n
+ br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:
+ ret void
+}
+
+; This should be vectorized and tail predicated without optsize, as that's
+; faster, but not with optsize, as it's much larger.
+; FIXME: Currently we avoid tail predication only with minsize
+define void @tail_predicate_without_optsize(ptr %p, i8 %a, i8 %b, i8 %c, i32 %n) {
+; DEFAULT-LABEL: define void @tail_predicate_without_optsize(
+; DEFAULT-SAME: ptr [[P:%.*]], i8 [[A:%.*]], i8 [[B:%.*]], i8 [[C:%.*]], i32 [[N:%.*]]) {
+; DEFAULT-NEXT: [[ENTRY:.*]]:
+; DEFAULT-NEXT: br i1 false, label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
+; DEFAULT: [[VECTOR_PH]]:
+; DEFAULT-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <16 x i8> poison, i8 [[A]], i64 0
+; DEFAULT-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT]], <16 x i8> poison, <16 x i32> zeroinitializer
+; DEFAULT-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <16 x i8> poison, i8 [[B]], i64 0
+; DEFAULT-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT3]], <16 x i8> poison, <16 x i32> zeroinitializer
+; DEFAULT-NEXT: [[BROADCAST_SPLATINSERT5:%.*]] = insertelement <16 x i8> poison, i8 [[C]], i64 0
+; DEFAULT-NEXT: [[BROADCAST_SPLAT6:%.*]] = shufflevector <16 x i8> [[BROADCAST_SPLATINSERT5]], <16 x i8> poison, <16 x i32> zeroinitializer
+; DEFAULT-NEXT: br label %[[VECTOR_BODY:.*]]
+; DEFAULT: [[VECTOR_BODY]]:
+; DEFAULT-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE36:.*]] ]
+; DEFAULT-NEXT: [[VEC_IND:%.*]] = phi <16 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9, i64 10, i64 11, i64 12, i64 13, i64 14, i64 15>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[PRED_STORE_CONTINUE36]] ]
+; DEFAULT-NEXT: [[VEC_IND1:%.*]] = phi <16 x i8> [ <i8 0, i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15>, %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT2:%.*]], %[[PRED_STORE_CONTINUE36]] ]
+; DEFAULT-NEXT: [[TMP0:%.*]] = icmp ule <16 x i64> [[VEC_IND]], splat (i64 14)
+; DEFAULT-NEXT: [[TMP1:%.*]] = mul <16 x i8> [[BROADCAST_SPLAT]], [[VEC_IND1]]
+; DEFAULT-NEXT: [[TMP2:%.*]] = lshr <16 x i8> [[VEC_IND1]], splat (i8 1)
+; DEFAULT-NEXT: [[TMP3:%.*]] = mul <16 x i8> [[TMP2]], [[BROADCAST_SPLAT4]]
+; DEFAULT-NEXT: [[TMP4:%.*]] = add <16 x i8> [[TMP3]], [[TMP1]]
+; DEFAULT-NEXT: [[TMP5:%.*]] = lshr <16 x i8> [[VEC_IND1]], splat (i8 2)
+; DEFAULT-NEXT: [[TMP6:%.*]] = mul <16 x i8> [[TMP5]], [[BROADCAST_SPLAT6]]
+; DEFAULT-NEXT: [[TMP7:%.*]] = add <16 x i8> [[TMP4]], [[TMP6]]
+; DEFAULT-NEXT: [[TM...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
I don't see anything aarch64 or arm-specific about these changes. Can the tests be moved to the parent folder?
;.
; DEFAULT: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; DEFAULT: [[META1]] = !{!"llvm.loop.unroll.runtime.disable"}
; DEFAULT: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
; DEFAULT: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; DEFAULT: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
; DEFAULT: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
; DEFAULT: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]]}
; DEFAULT: [[LOOP7]] = distinct !{[[LOOP7]], [[META2]], [[META1]]}
; DEFAULT: [[LOOP8]] = distinct !{[[LOOP8]], [[META1]], [[META2]]}
; DEFAULT: [[LOOP9]] = distinct !{[[LOOP9]], [[META2]], [[META1]]}
; DEFAULT: [[LOOP10]] = distinct !{[[LOOP10]], [[META1]], [[META2]]}
;.
; OPTSIZE: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; OPTSIZE: [[META1]] = !{!"llvm.loop.unroll.runtime.disable"}
; OPTSIZE: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
; OPTSIZE: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; OPTSIZE: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
; OPTSIZE: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
; OPTSIZE: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]]}
; OPTSIZE: [[LOOP7]] = distinct !{[[LOOP7]], [[META2]], [[META1]]}
; OPTSIZE: [[LOOP8]] = distinct !{[[LOOP8]], [[META1]], [[META2]]}
;.
; MINSIZE: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
; MINSIZE: [[META1]] = !{!"llvm.loop.unroll.runtime.disable"}
; MINSIZE: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
; MINSIZE: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
; MINSIZE: [[LOOP4]] = distinct !{[[LOOP4]], [[META1]], [[META2]]}
; MINSIZE: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META1]]}
; MINSIZE: [[LOOP6]] = distinct !{[[LOOP6]], [[META1]], [[META2]]}
I don't think we need these. Likewise for the other file.
These are all generated by update_test_checks.py. I could remove them, but the next time the test is updated they would just be added back.
I've always been asked to remove them, so I just delete those lines from the file after updating.
We generally ask that existing metadata lines are removed (unless they're a deliberate part of the test); this is checking the output of opt instead, which adds new metadata.
Removing the output line checks can be done by changing the version flag used by update_test_checks.py at the top of the file -- instead of UTC_ARGS: --version 5, try version 4 or maybe 3. There might be another flag you can use as well, but I can't see it from the help text.
It looks like --check-globals none is the flag to do this.
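For example (an illustrative edit, assuming the flag is accepted in UTC_ARGS as written -- it is not part of this patch), the autogeneration note at the top of the test file would become:

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 --check-globals none

so that subsequent reruns of the script keep the global metadata checks suppressed.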
Costs are calculated by TargetTransformInfo and so are target-dependent. The two test files are also expecting different things, e.g. in dont_vectorize_with_minsize we expect different vectorization factors for arm and aarch64.
Looks good to me now, thank you.
Does this fix #118100?
Do you have any statistics/perf number on the impact of this?
@@ -5506,7 +5515,8 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
   }

   // Scale the total scalar cost by block probability.
-  ScalarCost /= getReciprocalPredBlockProb();
+  if (CostKind != TTI::TCK_CodeSize)
It looks like the checks here and below may not be covered by the existing tests. Would be good to add test coverage for some of them, if possible.
I haven't been able to come up with a test that specifically exercises this: in any test I write, the cost is large enough that incorrectly halving it doesn't matter, because it's still larger than the scalar cost. If getReciprocalPredBlockProb used the actual block probability, instead of assuming it's 0.5 as it currently does, then I could probably construct one by setting the block probability close to zero.
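For reference, a minimal sketch of the scaling being discussed (simplified: InstructionCost is replaced by a plain integer, and this is not the real LoopVectorize.cpp code path):

#include <cassert>

// The cost model assumes a predicated block executes with probability 1/2,
// so it returns the reciprocal, 2.
static unsigned getReciprocalPredBlockProb() { return 2; }

enum CostKind { RecipThroughput, CodeSize };

// For throughput, the cost of a predicated block is halved because it only
// runs on some iterations. For code size the instructions are emitted
// whether or not the block ever executes, so no scaling is applied.
unsigned scaledPredBlockCost(unsigned BlockCost, CostKind Kind) {
  if (Kind != CodeSize)
    BlockCost /= getReciprocalPredBlockProb();
  return BlockCost;
}

int main() {
  assert(scaledPredBlockCost(8, RecipThroughput) == 4);
  assert(scaledPredBlockCost(8, CodeSize) == 8);
  return 0;
}

With a fixed factor of 2, the halving can only flip a vectorization decision when the scalar and vector costs are within a factor of two of each other, which is why a targeted test is hard to construct.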
It doesn't fix #118100, as that uses -Os, which adds optsize, not minsize. Compiling the test case from that issue with -Oz -fvectorize, it looks like we currently don't vectorize anyway, for other reasons.
Compiling the llvm-test-suite SingleSource and MultiSource benchmarks with
Ping.
Ping. @fhahn, have I sufficiently addressed your comments?
LGTM, thanks!
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/13911

LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/15358

LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/137/builds/14111
PR #124119 wasn't rebased & tested before merging. Update the failing tests.
Looks like this was causing some test failures due to it not being rebased onto the latest main before merging, which missed some test updates. Should be fixed in 649f4dc.
Thanks for fixing that.
LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/12068

LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/24095

LLVM Buildbot has detected a new failure on builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/20614