-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. #108777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Florian Hahn (fhahn) ChangesThis can help in cases where pointer alignment info is missing, e.g. #108210 The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check. Patch is 22.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108777.diff 4 Files Affected:
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 57e03f667ba6ff..3fd07d2694d949 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -10120,8 +10120,11 @@ const SCEV *ScalarEvolution::stripInjectiveFunctions(const SCEV *S) const {
/// A and B isn't important.
///
/// If the equation does not have a solution, SCEVCouldNotCompute is returned.
-static const SCEV *SolveLinEquationWithOverflow(const APInt &A, const SCEV *B,
- ScalarEvolution &SE) {
+static const SCEV *
+SolveLinEquationWithOverflow(const APInt &A, const SCEV *B,
+ SmallPtrSetImpl<const SCEVPredicate *> *Predicates,
+
+ ScalarEvolution &SE) {
uint32_t BW = A.getBitWidth();
assert(BW == SE.getTypeSizeInBits(B->getType()));
assert(A != 0 && "A must be non-zero.");
@@ -10137,8 +10140,17 @@ static const SCEV *SolveLinEquationWithOverflow(const APInt &A, const SCEV *B,
//
// B is divisible by D if and only if the multiplicity of prime factor 2 for B
// is not less than multiplicity of this prime factor for D.
- if (SE.getMinTrailingZeros(B) < Mult2)
- return SE.getCouldNotCompute();
+ if (SE.getMinTrailingZeros(B) < Mult2) {
+ if (!Predicates)
+ return SE.getCouldNotCompute();
+ // Try to add a predicate ensuring B is a multiple of A.
+ const SCEV *URem = SE.getURemExpr(B, SE.getConstant(A));
+ const SCEV *Zero = SE.getZero(B->getType());
+ // Avoid adding a predicate that is known to be false.
+ if (SE.isKnownPredicate(CmpInst::ICMP_NE, URem, Zero))
+ return SE.getCouldNotCompute();
+ Predicates->insert(SE.getEqualPredicate(URem, Zero));
+ }
// 3. Compute I: the multiplicative inverse of (A / D) in arithmetic
// modulo (N / D).
@@ -10568,8 +10580,9 @@ ScalarEvolution::ExitLimit ScalarEvolution::howFarToZero(const SCEV *V,
// Solve the general equation.
if (!StepC || StepC->getValue()->isZero())
return getCouldNotCompute();
- const SCEV *E = SolveLinEquationWithOverflow(StepC->getAPInt(),
- getNegativeSCEV(Start), *this);
+ const SCEV *E = SolveLinEquationWithOverflow(
+ StepC->getAPInt(), getNegativeSCEV(Start),
+ AllowPredicates ? &Predicates : nullptr, *this);
const SCEV *M = E;
if (E != getCouldNotCompute()) {
diff --git a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
index 37d6584b1e85f1..302b2e4defd255 100644
--- a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
+++ b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
@@ -1580,6 +1580,12 @@ define i32 @ptr_induction_ult_2(ptr %a, ptr %b) {
; CHECK-NEXT: Loop %loop: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %loop: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %loop: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %loop: Predicated backedge-taken count is (((-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 4)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i2 ((trunc i64 (ptrtoint ptr %b to i64) to i2) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i2))) to i64) == 0
+; CHECK-NEXT: Loop %loop: Predicated symbolic max backedge-taken count is (((-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 4)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i2 ((trunc i64 (ptrtoint ptr %b to i64) to i2) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i2))) to i64) == 0
;
entry:
%cmp.6 = icmp ult ptr %a, %b
@@ -1669,10 +1675,21 @@ define void @ptr_induction_early_exit_eq_1(ptr %a, ptr %b, ptr %c) {
; CHECK-NEXT: Loop %loop: <multiple exits> Unpredictable backedge-taken count.
; CHECK-NEXT: exit count for loop: ***COULDNOTCOMPUTE***
; CHECK-NEXT: exit count for loop.inc: ***COULDNOTCOMPUTE***
+; CHECK-NEXT: predicated exit count for loop.inc: ((-8 + (-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 8)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i3 ((trunc i64 (ptrtoint ptr %b to i64) to i3) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i3))) to i64) == 0
+; CHECK-EMPTY:
; CHECK-NEXT: Loop %loop: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %loop: Unpredictable symbolic max backedge-taken count.
; CHECK-NEXT: symbolic max exit count for loop: ***COULDNOTCOMPUTE***
; CHECK-NEXT: symbolic max exit count for loop.inc: ***COULDNOTCOMPUTE***
+; CHECK-NEXT: predicated symbolic max exit count for loop.inc: ((-8 + (-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 8)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i3 ((trunc i64 (ptrtoint ptr %b to i64) to i3) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i3))) to i64) == 0
+; CHECK-EMPTY:
+; CHECK-NEXT: Loop %loop: Predicated symbolic max backedge-taken count is ((-8 + (-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 8)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i3 ((trunc i64 (ptrtoint ptr %b to i64) to i3) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i3))) to i64) == 0
;
entry:
%cmp = icmp eq ptr %a, %b
diff --git a/llvm/test/Analysis/ScalarEvolution/ne-overflow.ll b/llvm/test/Analysis/ScalarEvolution/ne-overflow.ll
index 49288c85897fd9..73679767f0fb3d 100644
--- a/llvm/test/Analysis/ScalarEvolution/ne-overflow.ll
+++ b/llvm/test/Analysis/ScalarEvolution/ne-overflow.ll
@@ -58,6 +58,12 @@ define void @test_well_defined_infinite_st(i32 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -79,6 +85,12 @@ define void @test_well_defined_infinite_ld(i32 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -100,6 +112,12 @@ define void @test_no_mustprogress(i32 %N) {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -187,6 +205,12 @@ define void @test_abnormal_exit(i32 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -209,10 +233,24 @@ define void @test_other_exit(i32 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: <multiple exits> Unpredictable backedge-taken count.
; CHECK-NEXT: exit count for for.body: i32 9
; CHECK-NEXT: exit count for for.latch: ***COULDNOTCOMPUTE***
+; CHECK-NEXT: predicated exit count for for.latch: ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-EMPTY:
; CHECK-NEXT: Loop %for.body: constant max backedge-taken count is i32 9
; CHECK-NEXT: Loop %for.body: symbolic max backedge-taken count is i32 9
; CHECK-NEXT: symbolic max exit count for for.body: i32 9
; CHECK-NEXT: symbolic max exit count for for.latch: ***COULDNOTCOMPUTE***
+; CHECK-NEXT: predicated symbolic max exit count for for.latch: ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-EMPTY:
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is (9 umin ((-2 + %N) /u 2))
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is (9 umin ((-2 + %N) /u 2))
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -264,6 +302,14 @@ define void @test_sext(i64 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is (%N /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i64 %N to i1) to i64) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is (%N /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i64 %N to i1) to i64) == 0
;
entry:
br label %for.body
@@ -285,6 +331,16 @@ define void @test_zext_of_sext(i64 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is (%N /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nusw>
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i64 %N to i1) to i64) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is (%N /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nusw>
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i64 %N to i1) to i64) == 0
;
entry:
br label %for.body
@@ -307,6 +363,14 @@ define void @test_zext_offset(i64 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-21 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nusw>
+; CHECK-NEXT: Equal predicate: (zext i1 (true + (trunc i64 %N to i1)) to i64) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-21 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nusw>
+; CHECK-NEXT: Equal predicate: (zext i1 (true + (trunc i64 %N to i1)) to i64) == 0
;
entry:
br label %for.body
@@ -329,6 +393,14 @@ define void @test_sext_offset(i64 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-21 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: Equal predicate: (zext i1 (true + (trunc i64 %N to i1)) to i64) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-21 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: Equal predicate: (zext i1 (true + (trunc i64 %N to i1)) to i64) == 0
;
entry:
br label %for.body
diff --git a/llvm/test/Transforms/LoopVectorize/opaque-ptr.ll b/llvm/test/Transforms/LoopVectorize/opaque-ptr.ll
index 13e79a4a47b39c..f62c3c7f42ec4e 100644
--- a/llvm/test/Transforms/LoopVectorize/opaque-ptr.ll
+++ b/llvm/test/Transforms/LoopVectorize/opaque-ptr.ll
@@ -4,18 +4,80 @@
define void @test_ptr_iv_no_inbounds(ptr %p1.start, ptr %p2.start, ptr %p1.end) {
; CHECK-LABEL: @test_ptr_iv_no_inbounds(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[P1_START7:%.*]] = ptrtoint ptr [[P1_START:%.*]] to i64
+; CHECK-NEXT: [[P1_END6:%.*]] = ptrtoint ptr [[P1_END:%.*]] to i64
+; CHECK-NEXT: [[P1_START4:%.*]] = ptrtoint ptr [[P1_START]] to i64
+; CHECK-NEXT: [[P1_END3:%.*]] = ptrtoint ptr [[P1_END]] to i64
+; CHECK-NEXT: [[P1_START2:%.*]] = ptrtoint ptr [[P1_START]] to i64
+; CHECK-NEXT: [[P1_END1:%.*]] = ptrtoint ptr [[P1_END]] to i64
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[P1_END6]], -4
+; CHECK-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[P1_START7]]
+; CHECK-NEXT: [[TMP2:%.*]] = lshr i64 [[TMP1]], 2
+; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 2
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
+; CHECK: vector.scevcheck:
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[P1_END1]] to i2
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[P1_START2]] to i2
+; CHECK-NEXT: [[TMP6:%.*]] = sub i2 [[TMP4]], [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = zext i2 [[TMP6]] to i64
+; CHECK-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i64 [[TMP7]], 0
+; CHECK-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_MEMCHECK:%.*]]
+; CHECK: vector.memcheck:
+; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[P1_END3]], -4
+; CHECK-NEXT: [[TMP9:%.*]] = sub i64 [[TMP8]], [[P1_START4]]
+; CHECK-NEXT: [[TMP10:%.*]] = lshr i64 [[TMP9]], 2
+; CHECK-NEXT: [[TMP11:%.*]] = shl nuw i64 [[TMP10]], 2
+; CHECK-NEXT: [[TMP12:%.*]] = add i64 [[TMP11]], 4
+; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[P1_START]], i64 [[TMP12]]
+; CHECK-NEXT: [[SCEVGEP5:%.*]] = getelementptr i8, ptr [[P2_START:%.*]], i64 [[TMP12]]
+; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[P1_START]], [[SCEVGEP5]]
+; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[P2_START]], [[SCEVGEP]]
+; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
+; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
+; CHECK: vector.ph:
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
+; CHECK-NEXT: [[TMP13:%.*]] = mul i64 [[N_VEC]], 4
+; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[P1_START]], i64 [[TMP13]]
+; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[N_VEC]], 4
+; CHECK-NEXT: [[IND_END8:%.*]] = getelementptr i8, ptr [[P2_START]], i64 [[TMP14]]
+; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
+; CHECK: vector.body:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 4
+; CHECK-NEXT: [[TMP15:%.*]] = add i64 [[OFFSET_IDX]], 0
+; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[P1_START]], i64 [[TMP15]]
+; CHECK-NEXT: [[OFFSET_IDX10:%.*]] = mul i64 [[INDEX]], 4
+; CHECK-NEXT: [[TMP16:%.*]] = add i64 [[OFFSET_IDX10]], 0
+; CHECK-NEXT: [[NEXT_GEP11:%.*]] = getelementptr i8, ptr [[P2_START]], i64 [[TMP16]]
+; CHECK-NEXT: [[TMP17:%.*]] = getelementptr float, ptr [[NEXT_GEP]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP17]], align 4, !alias.scope [[META0:![0-9]+]], !noalias [[META3:![0-9]+]]
+; CHECK-NEXT: [[TMP18:%.*]] = getelementptr float, ptr [[NEXT_GEP11]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD12:%.*]] = load <2 x float>, ptr [[TMP18]], align 4, !alias.scope [[META3]]
+; CHECK-NEXT: [[TMP19:%.*]] = fadd <2 x float> [[WIDE_LOAD]], [[WIDE_LOAD12]]
+; CHECK-NEXT: store <2 x float> [[TMP19]], ptr [[TMP17]], align 4, !alias.scope [[META0]], !noalias [[META3]]
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
+; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK: middle.block:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK: scalar.ph:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[P1_START]], [[ENTRY:%.*]] ], [ [[P1_START]], [[VECTOR_SCEVCHECK]] ], [ [[P1_START]], [[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL9:%.*]] = phi ptr [ [[IND_END8]], [[MIDDLE_BLOCK]] ], [ [[P2_START]], [[ENTRY]] ], [ [[P2_START]], [[VECTOR_SCEVCHECK]] ], [ [[P2_START]], [[VECTOR_MEMCHECK]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[P1:%.*]] = phi ptr [ [[P1_START:%.*]], [[ENTRY:%.*]] ], [ [[P1_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT: [[P2:%.*]] = phi ptr [ [[P2_START:%.*]], [[ENTRY]] ], [ [[P2_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[P1:%.*]] = phi ptr [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[P1_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[P2:%.*]] = phi ptr [ [[BC_RESUME_VAL9]], [[SCALAR_PH]] ], [ [[P2_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[P1_VAL:%.*]] = load float, ptr [[P1]], align 4
; CHECK-NEXT: [[P2_VAL:%.*]] = load float, ptr [[P2]], align 4
; CHECK-NEXT: [[SUM:%.*]] = fadd float [[P1_VAL]], [[P2_VAL]]
; CHECK-NEXT: store float [[SUM]], ptr [[P1]], align 4
; CHECK-NEXT: [[P1_NEXT]] = getelementptr float, ptr [[P1]], i64 1
; CHECK-NEXT: [[P2_NEXT]] = getelementptr float, ptr [[P2]], i64 1
-; CHECK-NEXT: [[C:%.*]] = icmp ne ptr [[P1_NEXT]], [[P1_END:%.*]]
-; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK-NEXT: [[C:%.*]] = icmp ne ptr [[P1_NEXT]], [[P1_END]]
+; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[EXIT]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: exit:
; CHECK-NEXT: ret void
;
@@ -80,14 +142,14 @@ define void @test_ptr_iv_with_inbounds(ptr %p1.start, ptr %p2.start, ptr %p1.end
; CHECK-NEXT: [[TMP12:%.*]] = add i64 [[OFFSET_IDX8]], 0
; CHECK-NEXT: [[NEXT_GEP9:%.*]] = getelementptr i8, ptr [[P2_START]], i64 [[TMP12]]
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr float, ptr [[NEXT_GEP]], i32 0
-; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP13]], align 4, !alias.scope [[META0:![0-9]+]], !noalias [[META3:![0-9]+]]
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP13]], align 4, !alias.scope [[META9:![0-9]+]], !noalias [[META12:![0-9]+]]
; CHECK-NEXT: [[TMP14:%...
[truncated]
|
@llvm/pr-subscribers-llvm-analysis Author: Florian Hahn (fhahn) ChangesThis can help in cases where pointer alignment info is missing, e.g. #108210 The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check. Patch is 22.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/108777.diff 4 Files Affected:
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 57e03f667ba6ff..3fd07d2694d949 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -10120,8 +10120,11 @@ const SCEV *ScalarEvolution::stripInjectiveFunctions(const SCEV *S) const {
/// A and B isn't important.
///
/// If the equation does not have a solution, SCEVCouldNotCompute is returned.
-static const SCEV *SolveLinEquationWithOverflow(const APInt &A, const SCEV *B,
- ScalarEvolution &SE) {
+static const SCEV *
+SolveLinEquationWithOverflow(const APInt &A, const SCEV *B,
+ SmallPtrSetImpl<const SCEVPredicate *> *Predicates,
+
+ ScalarEvolution &SE) {
uint32_t BW = A.getBitWidth();
assert(BW == SE.getTypeSizeInBits(B->getType()));
assert(A != 0 && "A must be non-zero.");
@@ -10137,8 +10140,17 @@ static const SCEV *SolveLinEquationWithOverflow(const APInt &A, const SCEV *B,
//
// B is divisible by D if and only if the multiplicity of prime factor 2 for B
// is not less than multiplicity of this prime factor for D.
- if (SE.getMinTrailingZeros(B) < Mult2)
- return SE.getCouldNotCompute();
+ if (SE.getMinTrailingZeros(B) < Mult2) {
+ if (!Predicates)
+ return SE.getCouldNotCompute();
+ // Try to add a predicate ensuring B is a multiple of A.
+ const SCEV *URem = SE.getURemExpr(B, SE.getConstant(A));
+ const SCEV *Zero = SE.getZero(B->getType());
+ // Avoid adding a predicate that is known to be false.
+ if (SE.isKnownPredicate(CmpInst::ICMP_NE, URem, Zero))
+ return SE.getCouldNotCompute();
+ Predicates->insert(SE.getEqualPredicate(URem, Zero));
+ }
// 3. Compute I: the multiplicative inverse of (A / D) in arithmetic
// modulo (N / D).
@@ -10568,8 +10580,9 @@ ScalarEvolution::ExitLimit ScalarEvolution::howFarToZero(const SCEV *V,
// Solve the general equation.
if (!StepC || StepC->getValue()->isZero())
return getCouldNotCompute();
- const SCEV *E = SolveLinEquationWithOverflow(StepC->getAPInt(),
- getNegativeSCEV(Start), *this);
+ const SCEV *E = SolveLinEquationWithOverflow(
+ StepC->getAPInt(), getNegativeSCEV(Start),
+ AllowPredicates ? &Predicates : nullptr, *this);
const SCEV *M = E;
if (E != getCouldNotCompute()) {
diff --git a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
index 37d6584b1e85f1..302b2e4defd255 100644
--- a/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
+++ b/llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
@@ -1580,6 +1580,12 @@ define i32 @ptr_induction_ult_2(ptr %a, ptr %b) {
; CHECK-NEXT: Loop %loop: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %loop: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %loop: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %loop: Predicated backedge-taken count is (((-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 4)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i2 ((trunc i64 (ptrtoint ptr %b to i64) to i2) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i2))) to i64) == 0
+; CHECK-NEXT: Loop %loop: Predicated symbolic max backedge-taken count is (((-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 4)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i2 ((trunc i64 (ptrtoint ptr %b to i64) to i2) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i2))) to i64) == 0
;
entry:
%cmp.6 = icmp ult ptr %a, %b
@@ -1669,10 +1675,21 @@ define void @ptr_induction_early_exit_eq_1(ptr %a, ptr %b, ptr %c) {
; CHECK-NEXT: Loop %loop: <multiple exits> Unpredictable backedge-taken count.
; CHECK-NEXT: exit count for loop: ***COULDNOTCOMPUTE***
; CHECK-NEXT: exit count for loop.inc: ***COULDNOTCOMPUTE***
+; CHECK-NEXT: predicated exit count for loop.inc: ((-8 + (-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 8)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i3 ((trunc i64 (ptrtoint ptr %b to i64) to i3) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i3))) to i64) == 0
+; CHECK-EMPTY:
; CHECK-NEXT: Loop %loop: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %loop: Unpredictable symbolic max backedge-taken count.
; CHECK-NEXT: symbolic max exit count for loop: ***COULDNOTCOMPUTE***
; CHECK-NEXT: symbolic max exit count for loop.inc: ***COULDNOTCOMPUTE***
+; CHECK-NEXT: predicated symbolic max exit count for loop.inc: ((-8 + (-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 8)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i3 ((trunc i64 (ptrtoint ptr %b to i64) to i3) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i3))) to i64) == 0
+; CHECK-EMPTY:
+; CHECK-NEXT: Loop %loop: Predicated symbolic max backedge-taken count is ((-8 + (-1 * (ptrtoint ptr %a to i64)) + (ptrtoint ptr %b to i64)) /u 8)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i3 ((trunc i64 (ptrtoint ptr %b to i64) to i3) + (-1 * (trunc i64 (ptrtoint ptr %a to i64) to i3))) to i64) == 0
;
entry:
%cmp = icmp eq ptr %a, %b
diff --git a/llvm/test/Analysis/ScalarEvolution/ne-overflow.ll b/llvm/test/Analysis/ScalarEvolution/ne-overflow.ll
index 49288c85897fd9..73679767f0fb3d 100644
--- a/llvm/test/Analysis/ScalarEvolution/ne-overflow.ll
+++ b/llvm/test/Analysis/ScalarEvolution/ne-overflow.ll
@@ -58,6 +58,12 @@ define void @test_well_defined_infinite_st(i32 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -79,6 +85,12 @@ define void @test_well_defined_infinite_ld(i32 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -100,6 +112,12 @@ define void @test_no_mustprogress(i32 %N) {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -187,6 +205,12 @@ define void @test_abnormal_exit(i32 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -209,10 +233,24 @@ define void @test_other_exit(i32 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: <multiple exits> Unpredictable backedge-taken count.
; CHECK-NEXT: exit count for for.body: i32 9
; CHECK-NEXT: exit count for for.latch: ***COULDNOTCOMPUTE***
+; CHECK-NEXT: predicated exit count for for.latch: ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-EMPTY:
; CHECK-NEXT: Loop %for.body: constant max backedge-taken count is i32 9
; CHECK-NEXT: Loop %for.body: symbolic max backedge-taken count is i32 9
; CHECK-NEXT: symbolic max exit count for for.body: i32 9
; CHECK-NEXT: symbolic max exit count for for.latch: ***COULDNOTCOMPUTE***
+; CHECK-NEXT: predicated symbolic max exit count for for.latch: ((-2 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-EMPTY:
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is (9 umin ((-2 + %N) /u 2))
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is (9 umin ((-2 + %N) /u 2))
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i32 %N to i1) to i32) == 0
;
entry:
br label %for.body
@@ -264,6 +302,14 @@ define void @test_sext(i64 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is (%N /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i64 %N to i1) to i64) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is (%N /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i64 %N to i1) to i64) == 0
;
entry:
br label %for.body
@@ -285,6 +331,16 @@ define void @test_zext_of_sext(i64 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is (%N /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nusw>
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i64 %N to i1) to i64) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is (%N /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nusw>
+; CHECK-NEXT: Equal predicate: (zext i1 (trunc i64 %N to i1) to i64) == 0
;
entry:
br label %for.body
@@ -307,6 +363,14 @@ define void @test_zext_offset(i64 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-21 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nusw>
+; CHECK-NEXT: Equal predicate: (zext i1 (true + (trunc i64 %N to i1)) to i64) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-21 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nusw>
+; CHECK-NEXT: Equal predicate: (zext i1 (true + (trunc i64 %N to i1)) to i64) == 0
;
entry:
br label %for.body
@@ -329,6 +393,14 @@ define void @test_sext_offset(i64 %N) mustprogress {
; CHECK-NEXT: Loop %for.body: Unpredictable backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable constant max backedge-taken count.
; CHECK-NEXT: Loop %for.body: Unpredictable symbolic max backedge-taken count.
+; CHECK-NEXT: Loop %for.body: Predicated backedge-taken count is ((-21 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: Equal predicate: (zext i1 (true + (trunc i64 %N to i1)) to i64) == 0
+; CHECK-NEXT: Loop %for.body: Predicated symbolic max backedge-taken count is ((-21 + %N) /u 2)
+; CHECK-NEXT: Predicates:
+; CHECK-NEXT: {0,+,2}<%for.body> Added Flags: <nssw>
+; CHECK-NEXT: Equal predicate: (zext i1 (true + (trunc i64 %N to i1)) to i64) == 0
;
entry:
br label %for.body
diff --git a/llvm/test/Transforms/LoopVectorize/opaque-ptr.ll b/llvm/test/Transforms/LoopVectorize/opaque-ptr.ll
index 13e79a4a47b39c..f62c3c7f42ec4e 100644
--- a/llvm/test/Transforms/LoopVectorize/opaque-ptr.ll
+++ b/llvm/test/Transforms/LoopVectorize/opaque-ptr.ll
@@ -4,18 +4,80 @@
define void @test_ptr_iv_no_inbounds(ptr %p1.start, ptr %p2.start, ptr %p1.end) {
; CHECK-LABEL: @test_ptr_iv_no_inbounds(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[P1_START7:%.*]] = ptrtoint ptr [[P1_START:%.*]] to i64
+; CHECK-NEXT: [[P1_END6:%.*]] = ptrtoint ptr [[P1_END:%.*]] to i64
+; CHECK-NEXT: [[P1_START4:%.*]] = ptrtoint ptr [[P1_START]] to i64
+; CHECK-NEXT: [[P1_END3:%.*]] = ptrtoint ptr [[P1_END]] to i64
+; CHECK-NEXT: [[P1_START2:%.*]] = ptrtoint ptr [[P1_START]] to i64
+; CHECK-NEXT: [[P1_END1:%.*]] = ptrtoint ptr [[P1_END]] to i64
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[P1_END6]], -4
+; CHECK-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[P1_START7]]
+; CHECK-NEXT: [[TMP2:%.*]] = lshr i64 [[TMP1]], 2
+; CHECK-NEXT: [[TMP3:%.*]] = add nuw nsw i64 [[TMP2]], 1
+; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP3]], 2
+; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_SCEVCHECK:%.*]]
+; CHECK: vector.scevcheck:
+; CHECK-NEXT: [[TMP4:%.*]] = trunc i64 [[P1_END1]] to i2
+; CHECK-NEXT: [[TMP5:%.*]] = trunc i64 [[P1_START2]] to i2
+; CHECK-NEXT: [[TMP6:%.*]] = sub i2 [[TMP4]], [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = zext i2 [[TMP6]] to i64
+; CHECK-NEXT: [[IDENT_CHECK:%.*]] = icmp ne i64 [[TMP7]], 0
+; CHECK-NEXT: br i1 [[IDENT_CHECK]], label [[SCALAR_PH]], label [[VECTOR_MEMCHECK:%.*]]
+; CHECK: vector.memcheck:
+; CHECK-NEXT: [[TMP8:%.*]] = add i64 [[P1_END3]], -4
+; CHECK-NEXT: [[TMP9:%.*]] = sub i64 [[TMP8]], [[P1_START4]]
+; CHECK-NEXT: [[TMP10:%.*]] = lshr i64 [[TMP9]], 2
+; CHECK-NEXT: [[TMP11:%.*]] = shl nuw i64 [[TMP10]], 2
+; CHECK-NEXT: [[TMP12:%.*]] = add i64 [[TMP11]], 4
+; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr [[P1_START]], i64 [[TMP12]]
+; CHECK-NEXT: [[SCEVGEP5:%.*]] = getelementptr i8, ptr [[P2_START:%.*]], i64 [[TMP12]]
+; CHECK-NEXT: [[BOUND0:%.*]] = icmp ult ptr [[P1_START]], [[SCEVGEP5]]
+; CHECK-NEXT: [[BOUND1:%.*]] = icmp ult ptr [[P2_START]], [[SCEVGEP]]
+; CHECK-NEXT: [[FOUND_CONFLICT:%.*]] = and i1 [[BOUND0]], [[BOUND1]]
+; CHECK-NEXT: br i1 [[FOUND_CONFLICT]], label [[SCALAR_PH]], label [[VECTOR_PH:%.*]]
+; CHECK: vector.ph:
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[TMP3]], 2
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[TMP3]], [[N_MOD_VF]]
+; CHECK-NEXT: [[TMP13:%.*]] = mul i64 [[N_VEC]], 4
+; CHECK-NEXT: [[IND_END:%.*]] = getelementptr i8, ptr [[P1_START]], i64 [[TMP13]]
+; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[N_VEC]], 4
+; CHECK-NEXT: [[IND_END8:%.*]] = getelementptr i8, ptr [[P2_START]], i64 [[TMP14]]
+; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
+; CHECK: vector.body:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[INDEX]], 4
+; CHECK-NEXT: [[TMP15:%.*]] = add i64 [[OFFSET_IDX]], 0
+; CHECK-NEXT: [[NEXT_GEP:%.*]] = getelementptr i8, ptr [[P1_START]], i64 [[TMP15]]
+; CHECK-NEXT: [[OFFSET_IDX10:%.*]] = mul i64 [[INDEX]], 4
+; CHECK-NEXT: [[TMP16:%.*]] = add i64 [[OFFSET_IDX10]], 0
+; CHECK-NEXT: [[NEXT_GEP11:%.*]] = getelementptr i8, ptr [[P2_START]], i64 [[TMP16]]
+; CHECK-NEXT: [[TMP17:%.*]] = getelementptr float, ptr [[NEXT_GEP]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP17]], align 4, !alias.scope [[META0:![0-9]+]], !noalias [[META3:![0-9]+]]
+; CHECK-NEXT: [[TMP18:%.*]] = getelementptr float, ptr [[NEXT_GEP11]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD12:%.*]] = load <2 x float>, ptr [[TMP18]], align 4, !alias.scope [[META3]]
+; CHECK-NEXT: [[TMP19:%.*]] = fadd <2 x float> [[WIDE_LOAD]], [[WIDE_LOAD12]]
+; CHECK-NEXT: store <2 x float> [[TMP19]], ptr [[TMP17]], align 4, !alias.scope [[META0]], !noalias [[META3]]
+; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
+; CHECK-NEXT: [[TMP20:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP20]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK: middle.block:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP3]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
+; CHECK: scalar.ph:
+; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi ptr [ [[IND_END]], [[MIDDLE_BLOCK]] ], [ [[P1_START]], [[ENTRY:%.*]] ], [ [[P1_START]], [[VECTOR_SCEVCHECK]] ], [ [[P1_START]], [[VECTOR_MEMCHECK]] ]
+; CHECK-NEXT: [[BC_RESUME_VAL9:%.*]] = phi ptr [ [[IND_END8]], [[MIDDLE_BLOCK]] ], [ [[P2_START]], [[ENTRY]] ], [ [[P2_START]], [[VECTOR_SCEVCHECK]] ], [ [[P2_START]], [[VECTOR_MEMCHECK]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
-; CHECK-NEXT: [[P1:%.*]] = phi ptr [ [[P1_START:%.*]], [[ENTRY:%.*]] ], [ [[P1_NEXT:%.*]], [[LOOP]] ]
-; CHECK-NEXT: [[P2:%.*]] = phi ptr [ [[P2_START:%.*]], [[ENTRY]] ], [ [[P2_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[P1:%.*]] = phi ptr [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[P1_NEXT:%.*]], [[LOOP]] ]
+; CHECK-NEXT: [[P2:%.*]] = phi ptr [ [[BC_RESUME_VAL9]], [[SCALAR_PH]] ], [ [[P2_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[P1_VAL:%.*]] = load float, ptr [[P1]], align 4
; CHECK-NEXT: [[P2_VAL:%.*]] = load float, ptr [[P2]], align 4
; CHECK-NEXT: [[SUM:%.*]] = fadd float [[P1_VAL]], [[P2_VAL]]
; CHECK-NEXT: store float [[SUM]], ptr [[P1]], align 4
; CHECK-NEXT: [[P1_NEXT]] = getelementptr float, ptr [[P1]], i64 1
; CHECK-NEXT: [[P2_NEXT]] = getelementptr float, ptr [[P2]], i64 1
-; CHECK-NEXT: [[C:%.*]] = icmp ne ptr [[P1_NEXT]], [[P1_END:%.*]]
-; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[EXIT:%.*]]
+; CHECK-NEXT: [[C:%.*]] = icmp ne ptr [[P1_NEXT]], [[P1_END]]
+; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[EXIT]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: exit:
; CHECK-NEXT: ret void
;
@@ -80,14 +142,14 @@ define void @test_ptr_iv_with_inbounds(ptr %p1.start, ptr %p2.start, ptr %p1.end
; CHECK-NEXT: [[TMP12:%.*]] = add i64 [[OFFSET_IDX8]], 0
; CHECK-NEXT: [[NEXT_GEP9:%.*]] = getelementptr i8, ptr [[P2_START]], i64 [[TMP12]]
; CHECK-NEXT: [[TMP13:%.*]] = getelementptr float, ptr [[NEXT_GEP]], i32 0
-; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP13]], align 4, !alias.scope [[META0:![0-9]+]], !noalias [[META3:![0-9]+]]
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <2 x float>, ptr [[TMP13]], align 4, !alias.scope [[META9:![0-9]+]], !noalias [[META12:![0-9]+]]
; CHECK-NEXT: [[TMP14:%...
[truncated]
|
This can help in cases where pointer alignment info is missing, e.g. llvm#108210 The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check.
3551969
to
c91b942
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice fix!
; exit count for the loop.inc block here, in the same way as | ||
; ptr_induction_eq_1. The problem seems to be in howFarToZero when the | ||
; ControlsOnlyExit is set to false. | ||
; %a and %b may not have the same alignment, so the loop may only via the early |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/so the loop may only via/so the loop may only exit via/
// Avoid adding a predicate that is known to be false. | ||
if (SE.isKnownPredicate(CmpInst::ICMP_NE, URem, Zero)) | ||
return SE.getCouldNotCompute(); | ||
Predicates->insert(SE.getEqualPredicate(URem, Zero)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just thinking in terms of compile-time here - can you get the equivalent result if you first compute SE.getEqualPredicate(URem, Zero)
and then test if the answer is constant-false? I don't know if isKnownPredicate
is expensive or not and whether or not we care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the terminology is a bit confusing, the predicate in isKnownPredicate
refers to condition to check for the arguments, while predicate in getEqualPredicate refers to a SCEVPredicate (used by PredicateScalarEvolution) and those cannot be used by the isKnownPredicate helpers AFAIK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK that's fair enough! Thanks for the explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
LGTM w/optional follow up.
@@ -10137,8 +10140,17 @@ static const SCEV *SolveLinEquationWithOverflow(const APInt &A, const SCEV *B, | |||
// | |||
// B is divisible by D if and only if the multiplicity of prime factor 2 for B | |||
// is not less than multiplicity of this prime factor for D. | |||
if (SE.getMinTrailingZeros(B) < Mult2) | |||
return SE.getCouldNotCompute(); | |||
if (SE.getMinTrailingZeros(B) < Mult2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe try to prove that (urem B, A) == 0 before resorting to the predicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC that check shouldn't be true, otherwise we are missing some logic in getMinTrailingZeros
? Added an assert that the check isn't true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the reframing of urem (B, 1 << Mult2) == 0, I agree with your comment, but I think we could reasonable see divergence with the original urem(B,A) check. Those are just different enough.
My suggestion is basically, can we prove the original fact without using the trailing bits proof strategy? Said differently. what if D = gcd(A, N) is something like 3^M?
Hm, though now I see that the multiplicative inverse code below would need updated too. I'd missed that originally.
(Totally fine to move forward here, this is purely a possible followup.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I stand corrected, the assertion actually uncovered a case where getMinTrailingZeros returns a worse result and I think catching this in getMinTrailingZeros would require matching a specific pattern (https://alive2.llvm.org/ce/z/t3A5X2)
I updated the code to check if the URem is zero if the trailing bits check failed, as we need to build the expression there already. Test is in llvm/test/Analysis/ScalarEvolution/trip-count-urem.ll
if (!Predicates) | ||
return SE.getCouldNotCompute(); | ||
// Try to add a predicate ensuring B is a multiple of A. | ||
const SCEV *URem = SE.getURemExpr(B, SE.getConstant(A)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why B % A
, and not B % (1 << Mult2)
?
I don't think this actually impacts any of the existing testcases; probably worth adding a few so we have coverage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated thanks. Test has been added in 28439a1
return SE.getCouldNotCompute(); | ||
// Try to add a predicate ensuring B is a multiple of 1 << Mult2. | ||
const SCEV *URem = | ||
SE.getURemExpr(B, SE.getConstant(B->getType(), 1 << Mult2)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Mult2 guaranteed less than 32? Don't see anything enforcing as much. Maybe APInt::getOneBitSet(BW, Mult2)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated!
@@ -10137,8 +10140,17 @@ static const SCEV *SolveLinEquationWithOverflow(const APInt &A, const SCEV *B, | |||
// | |||
// B is divisible by D if and only if the multiplicity of prime factor 2 for B | |||
// is not less than multiplicity of this prime factor for D. | |||
if (SE.getMinTrailingZeros(B) < Mult2) | |||
return SE.getCouldNotCompute(); | |||
if (SE.getMinTrailingZeros(B) < Mult2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the reframing of urem (B, 1 << Mult2) == 0, I agree with your comment, but I think we could reasonable see divergence with the original urem(B,A) check. Those are just different enough.
My suggestion is basically, can we prove the original fact without using the trailing bits proof strategy? Said differently. what if D = gcd(A, N) is something like 3^M?
Hm, though now I see that the multiplicative inverse code below would need updated too. I'd missed that originally.
(Totally fine to move forward here, this is purely a possible followup.)
Also adds additional test coverage in Analysis/ScalarEvolution/trip-count-urem.ll Extra test coverage is for #108777.
Adds extra tests for llvm#108777.
Also adds additional test coverage in Analysis/ScalarEvolution/trip-count-urem.ll Extra test coverage is for llvm#108777.
Test needs a few less checks.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/4460 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/110/builds/1494 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/139/builds/4340 Here is the relevant piece of the build log for the reference
|
Store predicates in ExitLimit and ExitNotTaken in a SmallVector instead of a SmallPtrSet. This guarantees the predicates can be iterated on in a predictable manner. This ensures the predicates can be printed and generated in a predictable order. This shifts de-duplication of predicates to construction time for ExitLimit. ExitNotTaken just takes predicates from ExitLimit, so they should also be free of duplicates. This was exposed by 2f7ccaf (#108777). Should fix https://lab.llvm.org/buildbot/#/builders/110/builds/1494.
There was a genuine reverse-iteration failure due to predicates in ExitLimit and ExitNotTaken were stored in SmallPtrSet. I pushed a fix to store them in SmallVectors, doing de-duplication when constructing ExitLimit: 6022a3a |
Also adds additional test coverage in Analysis/ScalarEvolution/trip-count-urem.ll Extra test coverage is for llvm/llvm-project#108777.
…108777) This can help in cases where pointer alignment info is missing, e.g. llvm/llvm-project#108210 The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check. PR: llvm/llvm-project#108777
Store predicates in ExitLimit and ExitNotTaken in a SmallVector instead of a SmallPtrSet. This guarantees the predicates can be iterated on in a predictable manner. This ensures the predicates can be printed and generated in a predictable order. This shifts de-duplication of predicates to construction time for ExitLimit. ExitNotTaken just takes predicates from ExitLimit, so they should also be free of duplicates. This was exposed by 2f7ccaf4a8565628a4c7d2b5a49bb45478940be6 (llvm/llvm-project#108777). Should fix https://lab.llvm.org/buildbot/#/builders/110/builds/1494.
Also adds additional test coverage in Analysis/ScalarEvolution/trip-count-urem.ll Extra test coverage is for llvm/llvm-project#108777.
…108777) This can help in cases where pointer alignment info is missing, e.g. llvm/llvm-project#108210 The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check. PR: llvm/llvm-project#108777
Store predicates in ExitLimit and ExitNotTaken in a SmallVector instead of a SmallPtrSet. This guarantees the predicates can be iterated on in a predictable manner. This ensures the predicates can be printed and generated in a predictable order. This shifts de-duplication of predicates to construction time for ExitLimit. ExitNotTaken just takes predicates from ExitLimit, so they should also be free of duplicates. This was exposed by 2f7ccaf4a8565628a4c7d2b5a49bb45478940be6 (llvm/llvm-project#108777). Should fix https://lab.llvm.org/buildbot/#/builders/110/builds/1494.
This can help in cases where pointer alignment info is missing, e.g. #108210
The predicate is formed for the complex expression that's passed to SolveLinEquationWithOverflow and the checks could probably be pushed closer to the root nodes, which in some cases may be cheaper to check.