Merged
36 commits
36c68a2 [VPlan] Use pointer to member 0 as VPInterleaveRecipe's pointer arg. (fhahn, Sep 25, 2024)
725a1e7 [LV] Add (fhahn, Aug 15, 2024)
8252b0c [VPlan] Add transformation to narrow interleave groups. (fhahn, Jul 21, 2024)
00471e2 !fixp address latest comments. (fhahn, Sep 25, 2024)
61279d1 !fixup address latest comments, thanks! (fhahn, Sep 25, 2024)
885984d Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Sep 26, 2024)
d4cd3aa Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Sep 26, 2024)
9d67dcd Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Sep 26, 2024)
3637cfb !fixup address latest comments, thanks! (fhahn, Sep 26, 2024)
f2dcf3d !fix formatting (fhahn, Sep 27, 2024)
4693c6b Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Sep 27, 2024)
09f2ee5 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Oct 7, 2024)
ee6b265 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Dec 17, 2024)
1937f99 !fixup fix after merge, address comments, thanks (fhahn, Dec 17, 2024)
8edad6b Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Dec 18, 2024)
f08c313 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Dec 18, 2024)
9312264 !fixup address latest comments, thanks! (fhahn, Dec 18, 2024)
8aa6cd6 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Jan 7, 2025)
b5ada93 !fixup after merging main (fhahn, Jan 7, 2025)
95cf546 !fixup remove stray whitespace (fhahn, Jan 7, 2025)
1110761 !fixup reflow comment (fhahn, Jan 7, 2025)
521d8fc Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Mar 15, 2025)
3494339 !fixup update after merge. (fhahn, Mar 15, 2025)
ac323a7 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Mar 15, 2025)
3599a52 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Mar 15, 2025)
7755ba9 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Mar 16, 2025)
3fd2b8d !fixup address comments, thanks! (fhahn, Mar 16, 2025)
b9b4fc2 [LV] Reorganize tests for narrowing interleave group transforms. (fhahn, Mar 16, 2025)
89d4f13 [VPlan] (fhahn, Mar 16, 2025)
e127e33 Merge branch 'vplan-narrow-interleave-mem-only2' into vplan-narrow-in… (fhahn, Mar 16, 2025)
4742f67 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Mar 21, 2025)
b45c3aa !fixup adjustments (fhahn, Mar 21, 2025)
86ac70a !fixup address remaining comments. (fhahn, Mar 21, 2025)
315de55 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Mar 22, 2025)
b79c14c !fixup finalize comments (fhahn, Mar 22, 2025)
0226cb0 Merge remote-tracking branch 'origin/main' into vplan-narrow-interleave (fhahn, Mar 22, 2025)
10 changes: 8 additions & 2 deletions llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
@@ -220,9 +220,15 @@ class VPBuilder {
         new VPInstruction(Instruction::ICmp, Pred, A, B, DL, Name));
   }
 
-  VPInstruction *createPtrAdd(VPValue *Ptr, VPValue *Offset, DebugLoc DL,
+  VPInstruction *createPtrAdd(VPValue *Ptr, VPValue *Offset, DebugLoc DL = {},
                               const Twine &Name = "") {
-    return createInstruction(VPInstruction::PtrAdd, {Ptr, Offset}, DL, Name);
+    return tryInsertInstruction(new VPInstruction(
+        Ptr, Offset, VPRecipeWithIRFlags::GEPFlagsTy(false), DL, Name));
   }
+  VPValue *createInBoundsPtrAdd(VPValue *Ptr, VPValue *Offset, DebugLoc DL = {},
+                                const Twine &Name = "") {
+    return tryInsertInstruction(new VPInstruction(
+        Ptr, Offset, VPRecipeWithIRFlags::GEPFlagsTy(true), DL, Name));
+  }
 
   VPDerivedIVRecipe *createDerivedIV(InductionDescriptor::InductionKind Kind,
5 changes: 3 additions & 2 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -7506,6 +7506,7 @@ LoopVectorizationPlanner::executePlan(
   VPlanTransforms::unrollByUF(BestVPlan, BestUF,
                               OrigLoop->getHeader()->getModule()->getContext());
   VPlanTransforms::optimizeForVFAndUF(BestVPlan, BestVF, BestUF, PSE);
+  VPlanTransforms::narrowInterleaveGroups(BestVPlan, BestVF);
Collaborator: This transform relies on a single VF, but is independent of UF, right? Can it be applied earlier, during optimization of VPlans rather than execution of BestPlan, possibly involving cloning VPlans that span a range of VFs?

Author (fhahn): Yes, this might be a good first candidate for cloning/splitting the VF range for a VPlan. Will look into that as a follow-up!

Collaborator: Should optimizeForVFAndUF() above be renamed eliminateRedundantBranchOnCount()? Otherwise narrowInterleaveGroups() below would consistently be optimizeForVF().

Author (fhahn): Sounds good! There are no other simplifications there so far.

   LLVM_DEBUG(dbgs() << "Executing best plan with VF=" << BestVF
                     << ", UF=" << BestUF << '\n');

@@ -9005,8 +9006,8 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   // Interleave memory: for each Interleave Group we marked earlier as relevant
   // for this VPlan, replace the Recipes widening its memory instructions with a
   // single VPInterleaveRecipe at its insertion point.
-  VPlanTransforms::createInterleaveGroups(InterleaveGroups, RecipeBuilder,
-                                          CM.isScalarEpilogueAllowed());
+  VPlanTransforms::createInterleaveGroups(
+      *Plan, InterleaveGroups, RecipeBuilder, CM.isScalarEpilogueAllowed());
Collaborator: nit: RecipeBuilder already holds Plan.

Author (fhahn): This is part of #10643.


   for (ElementCount VF : Range)
     Plan->addVF(VF);
1 change: 0 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -954,7 +954,6 @@ void VPlan::prepareToExecute(Value *TripCountV, Value *VectorTripCountV,

   IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
   // FIXME: Model VF * UF computation completely in VPlan.
-  assert(VFxUF.getNumUsers() && "VFxUF expected to always have users");
Member: Why drop this check? If this check is dropped, shall it early-exit if VFxUF.getNumUsers() == 0?

Author (fhahn): The narrowing changes the loop from stepping by VFxUF to stepping by 1, removing the user of VFxUF. We could create it only conditionally, but I don't think it would cause any test changes, as there would only be a difference for scalable vectors, which aren't supported for narrowing.

Member: Shall it early-exit then if VFxUF.getNumUsers() == 0?

Author (fhahn): Done, thanks!
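A minimal sketch of why the user disappears (values illustrative; the setOperand call is the one added in VPlanTransforms.cpp below):

    // With VF = 4 and UF = 1, the canonical IV advances by VF x UF per
    // vector iteration:
    //   index.next = index + VFxUF        // a use of VFxUF
    // The narrowing transform rewires the increment to step by 1 instead:
    Inc->setOperand(
        1, Plan.getOrAddLiveIn(ConstantInt::get(CanIV->getScalarType(), 1)));
    // after which VFxUF can end up with getNumUsers() == 0, hence the early
    // exit replacing the assert.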

   unsigned UF = getUF();
   if (VF.getNumUsers()) {
     Value *RuntimeVF = getRuntimeVF(Builder, TCTy, State.VF);
8 changes: 7 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlan.h
@@ -956,7 +956,6 @@ class VPRecipeWithIRFlags : public VPSingleDefRecipe {
     DisjointFlagsTy(bool IsDisjoint) : IsDisjoint(IsDisjoint) {}
   };
 
-protected:
   struct GEPFlagsTy {
     char IsInBounds : 1;
     GEPFlagsTy(bool IsInBounds) : IsInBounds(IsInBounds) {}

@@ -1307,6 +1306,13 @@ class VPInstruction : public VPRecipeWithIRFlags,
     assert(Opcode == Instruction::Or && "only OR opcodes can be disjoint");
   }
 
+  VPInstruction(VPValue *Ptr, VPValue *Offset, GEPFlagsTy Flags = {false},
+                DebugLoc DL = {}, const Twine &Name = "")
+      : VPRecipeWithIRFlags(VPDef::VPInstructionSC,
+                            ArrayRef<VPValue *>({Ptr, Offset}),
+                            GEPFlagsTy(Flags), DL),
+        Opcode(VPInstruction::PtrAdd), Name(Name.str()) {}
+
   VPInstruction(unsigned Opcode, std::initializer_list<VPValue *> Operands,
                 FastMathFlags FMFs, DebugLoc DL = {}, const Twine &Name = "");
41 changes: 16 additions & 25 deletions llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -48,6 +48,9 @@ extern cl::opt<unsigned> ForceTargetInstructionCost;

 bool VPRecipeBase::mayWriteToMemory() const {
   switch (getVPDefID()) {
+  case VPInstructionSC: {
+    return !Instruction::isBinaryOp(cast<VPInstruction>(this)->getOpcode());
+  }
Member: What about cast or select or other instructions here? Is this still correct for them?

Author (fhahn): Yep, unified the logic separately (68ed172) and removed the change here.

   case VPInterleaveSC:
     return cast<VPInterleaveRecipe>(this)->getNumStoreOperands() > 0;
   case VPWidenStoreEVLSC:

@@ -63,6 +66,7 @@ bool VPRecipeBase::mayWriteToMemory() const {
   case VPBranchOnMaskSC:
   case VPScalarIVStepsSC:
   case VPPredInstPHISC:
+  case VPVectorPointerSC:
Collaborator: VPVectorPointerSC should still !mayWriteToMemory, but should no longer assert not having an underlying IR instruction which does?

Author (fhahn): Restored to original, thanks.

     return false;
   case VPBlendSC:
   case VPReductionEVLSC:

@@ -644,7 +648,8 @@ Value *VPInstruction::generate(VPTransformState &State) {
"can only generate first lane for PtrAdd");
Value *Ptr = State.get(getOperand(0), /* IsScalar */ true);
Value *Addend = State.get(getOperand(1), /* IsScalar */ true);
return Builder.CreatePtrAdd(Ptr, Addend, Name);
return isInBounds() ? Builder.CreateInBoundsPtrAdd(Ptr, Addend, Name)
: Builder.CreatePtrAdd(Ptr, Addend, Name);
}
case VPInstruction::ResumePhi: {
Value *IncomingFromVPlanPred =
Expand Down Expand Up @@ -2470,51 +2475,37 @@ void VPInterleaveRecipe::execute(VPTransformState &State) {
   unsigned InterleaveFactor = Group->getFactor();
   auto *VecTy = VectorType::get(ScalarTy, State.VF * InterleaveFactor);
 
-  // Prepare for the new pointers.
-  unsigned Index = Group->getIndex(Instr);
-
   // TODO: extend the masked interleaved-group support to reversed access.
   VPValue *BlockInMask = getMask();
   assert((!BlockInMask || !Group->isReverse()) &&
          "Reversed masked interleave-group not supported.");
 
-  Value *Idx;
+  Value *Index;
   // If the group is reverse, adjust the index to refer to the last vector lane
   // instead of the first. We adjust the index from the first vector lane,
   // rather than directly getting the pointer for lane VF - 1, because the
   // pointer operand of the interleaved access is supposed to be uniform.
   if (Group->isReverse()) {
     Value *RuntimeVF =
         getRuntimeVF(State.Builder, State.Builder.getInt32Ty(), State.VF);
-    Idx = State.Builder.CreateSub(RuntimeVF, State.Builder.getInt32(1));
-    Idx = State.Builder.CreateMul(Idx,
-                                  State.Builder.getInt32(Group->getFactor()));
-    Idx = State.Builder.CreateAdd(Idx, State.Builder.getInt32(Index));
-    Idx = State.Builder.CreateNeg(Idx);
-  } else
-    Idx = State.Builder.getInt32(-Index);
+    Index = State.Builder.CreateSub(RuntimeVF, State.Builder.getInt32(1));
+    Index = State.Builder.CreateMul(Index,
+                                    State.Builder.getInt32(Group->getFactor()));
+    Index = State.Builder.CreateNeg(Index);
+  } else {
+    // TODO: Drop redundant 0-index GEP as follow-up.
+    Index = State.Builder.getInt32(0);
+  }
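A concrete check of the reversed-index arithmetic (VF and factor assumed for illustration):

    // For a reversed group with fixed VF = 4 and Factor = 2:
    //   Index = -((RuntimeVF - 1) * Factor) = -((4 - 1) * 2) = -6
    // so the GEP below steps the uniform base pointer back six elements, to
    // member 0 of the last vector lane; for forward groups Index is simply 0.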

   VPValue *Addr = getAddr();
   Value *ResAddr = State.get(Addr, VPLane(0));
   if (auto *I = dyn_cast<Instruction>(ResAddr))
     State.setDebugLocFrom(I->getDebugLoc());
 
-  // Notice current instruction could be any index. Need to adjust the address
-  // to the member of index 0.
-  //
-  // E.g.  a = A[i+1];     // Member of index 1 (Current instruction)
-  //       b = A[i];       // Member of index 0
-  // Current pointer is pointed to A[i+1], adjust it to A[i].
-  //
-  // E.g.  A[i+1] = a;     // Member of index 1
-  //       A[i]   = b;     // Member of index 0
-  //       A[i+2] = c;     // Member of index 2 (Current instruction)
-  // Current pointer is pointed to A[i+2], adjust it to A[i].
-
   bool InBounds = false;
   if (auto *gep = dyn_cast<GetElementPtrInst>(ResAddr->stripPointerCasts()))
     InBounds = gep->isInBounds();
-  ResAddr = State.Builder.CreateGEP(ScalarTy, ResAddr, Idx, "", InBounds);
+  ResAddr = State.Builder.CreateGEP(ScalarTy, ResAddr, Index, "", InBounds);
 
   State.setDebugLocFrom(Instr->getDebugLoc());
   Value *PoisonVec = PoisonValue::get(VecTy);
176 changes: 170 additions & 6 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -29,6 +29,9 @@
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/PatternMatch.h"

#define LV_NAME "loop-vectorize"
#define DEBUG_TYPE LV_NAME

Member: Why do you need this?

Author (fhahn): This is needed for LLVM_DEBUG below.

Author (fhahn): Removed in latest version, thanks.
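For context, a minimal sketch of why the define is required (standard LLVM debug-macro behavior; the message matches the one printed at the end of this patch):

    // LLVM_DEBUG compiles to nothing in release builds; in assert-enabled
    // builds it only prints when -debug or -debug-only=<DEBUG_TYPE> is passed,
    // so DEBUG_TYPE must be defined before the macro is used.
    #define DEBUG_TYPE "loop-vectorize"
    LLVM_DEBUG(dbgs() << "Narrowed interleave\n");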

 using namespace llvm;
 
 void VPlanTransforms::VPInstructionsToVPRecipes(

@@ -710,6 +713,7 @@ void VPlanTransforms::optimizeForVFAndUF(VPlan &Plan, ElementCount BestVF,
   // TODO: Further simplifications are possible
   // 1. Replace inductions with constants.
   // 2. Replace vector loop region with VPBasicBlock.
+  //
Collaborator (suggested change): remove the stray trailing `//`.

Author (fhahn): Removed, thanks!

 }
 
 /// Sink users of \p FOR after the recipe defining the previous value \p

@@ -1590,14 +1594,19 @@ void VPlanTransforms::dropPoisonGeneratingRecipes(
 }
 
 void VPlanTransforms::createInterleaveGroups(
-    const SmallPtrSetImpl<const InterleaveGroup<Instruction> *> &InterleaveGroups,
+    VPlan &Plan,
+    const SmallPtrSetImpl<const InterleaveGroup<Instruction> *>
+        &InterleaveGroups,
     VPRecipeBuilder &RecipeBuilder, bool ScalarEpilogueAllowed) {
+  if (InterleaveGroups.empty())
+    return;
+
   // Interleave memory: for each Interleave Group we marked earlier as relevant
   // for this VPlan, replace the Recipes widening its memory instructions with a
   // single VPInterleaveRecipe at its insertion point.
+  VPDominatorTree VPDT;
+  VPDT.recalculate(Plan);
   for (const auto *IG : InterleaveGroups) {
-    auto *Recipe =
-        cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getInsertPos()));
     SmallVector<VPValue *, 4> StoredValues;
     for (unsigned i = 0; i < IG->getFactor(); ++i)
       if (auto *SI = dyn_cast_or_null<StoreInst>(IG->getMember(i))) {

@@ -1607,9 +1616,38 @@ void VPlanTransforms::createInterleaveGroups(

     bool NeedsMaskForGaps =
         IG->requiresScalarEpilogue() && !ScalarEpilogueAllowed;
-    auto *VPIG = new VPInterleaveRecipe(IG, Recipe->getAddr(), StoredValues,
-                                        Recipe->getMask(), NeedsMaskForGaps);
-    VPIG->insertBefore(Recipe);
 
+    Instruction *IRInsertPos = IG->getInsertPos();
+    auto *InsertPos =
+        cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IRInsertPos));
+
+    // Get or create the start address for the interleave group.
+    auto *Start =
+        cast<VPWidenMemoryRecipe>(RecipeBuilder.getRecipe(IG->getMember(0)));
+    VPValue *Addr = Start->getAddr();
+    if (!VPDT.properlyDominates(Addr->getDefiningRecipe(), InsertPos)) {
+      bool InBounds = false;
+      if (auto *Gep = dyn_cast<GetElementPtrInst>(
+              getLoadStorePointerOperand(IRInsertPos)->stripPointerCasts()))
+        InBounds = Gep->isInBounds();
+
+      // We cannot re-use the address of the first member because it does not
+      // dominate the insert position. Use the address of the insert position
+      // and create a PtrAdd to adjust the index to start at the first member.
+      APInt Offset(32,
+                   getLoadStoreType(IRInsertPos)->getScalarSizeInBits() / 8 *
+                       IG->getIndex(IRInsertPos),
+                   /*IsSigned=*/true);
+      VPValue *OffsetVPV = Plan.getOrAddLiveIn(
+          ConstantInt::get(IRInsertPos->getParent()->getContext(), -Offset));
+      VPBuilder B(InsertPos);
+      Addr = InBounds ? B.createInBoundsPtrAdd(InsertPos->getAddr(), OffsetVPV)
+                      : B.createPtrAdd(InsertPos->getAddr(), OffsetVPV);
+    }
+    auto *VPIG = new VPInterleaveRecipe(IG, Addr, StoredValues,
+                                        InsertPos->getMask(), NeedsMaskForGaps);
+    VPIG->insertBefore(InsertPos);
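A worked instance of the offset computation (element type and member index assumed for illustration):

    // Suppose the insert position is the member at index 2 of an i32 group:
    //   Offset = (32 / 8) * 2 = 8 bytes
    // so the PtrAdd above adjusts the insert position's address by -8 bytes,
    // yielding the address of member 0, from which the whole group is accessed.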

     unsigned J = 0;
     for (unsigned i = 0; i < IG->getFactor(); ++i)
       if (Instruction *Member = IG->getMember(i)) {

@@ -1623,3 +1661,129 @@
      }
  }
}

static bool supportedLoad(VPWidenRecipe *R0, VPValue *V, unsigned Idx) {
Collaborator (suggested change): give the function a more descriptive name ("supportedLoad" is too general) and a comment explaining what it does. Trying to help: it checks whether an operand of an interleaved store member is either a wide load or a member of an interleaved load. In the former case, the same wide load must also feed the first interleaved store member (i.e., it is "index independent", or "uniform"(?)). In the latter case, the index of the interleaved load member must match that of the interleaved store member it feeds.

    static bool supportedLoad(VPWidenRecipe *StoreMember0, VPValue *OperandOfStoreMemberIdx, unsigned Idx) {

  if (auto *W = dyn_cast_or_null<VPWidenLoadRecipe>(V->getDefiningRecipe())) {
    if (W->getMask())
      return false;
Member (suggested change):

    -    if (W->getMask())
    -      return false;
    +    return !W->getMask();

Author (fhahn): Dropped the check and return, as it is subsumed by the check below, thanks!

    return !W->getMask() && (R0->getOperand(0) == V || R0->getOperand(1) == V);
Collaborator: Must R0 have (at least / exactly) 2 operands? All interleaved store members are widen recipes having the same opcode, but it may be unary?

  }

  if (auto *IR = dyn_cast_or_null<VPInterleaveRecipe>(V->getDefiningRecipe())) {
    return IR->getInterleaveGroup()->getFactor() ==
               IR->getInterleaveGroup()->getNumMembers() &&
           IR->getVPValue(Idx) == V;
Collaborator: All IRs are checked to be "consecutive" when collecting StoreGroups in narrowInterleaveGroups(), including all load IRs. So here IR's factor is known to equal |members|; moreover, both are known to equal VF, implying that IR defines at least Idx+1 values, so it's OK to call getVPValue(Idx).

  }
Member: Drop braces?

Author (fhahn): Done, thanks.

  return false;
}

/// Returns true if \p IR is a full interleave group with factor and number of
/// members both equal to \p VF.
static bool isConsecutiveInterleaveGroup(VPInterleaveRecipe *IR,
                                         ElementCount VF) {
  if (!IR)
    return false;
  auto IG = IR->getInterleaveGroup();
  return IG->getFactor() == IG->getNumMembers() &&
         IG->getNumMembers() == VF.getFixedValue();
Collaborator: nit: pass in VF's fixed value, and check if IG->getFactor() == VF && IG->getNumMembers() == VF, as documented?

Author (fhahn): Done, thanks!

}
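A quick illustration of the "full group" condition (numbers assumed):

    // With fixed VF = 4, a group covering A[4*i+0..3] with all members present
    // has Factor == NumMembers == 4 == VF and qualifies; a group with a gap
    // (say only A[4*i] and A[4*i+2]) has NumMembers == 2 != Factor and is
    // rejected.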

void VPlanTransforms::narrowInterleaveGroups(VPlan &Plan, ElementCount VF) {
  using namespace llvm::VPlanPatternMatch;
  if (VF.isScalable())
    return;
Collaborator: Now get VF's fixed value for what follows.

Author (fhahn): Done, thanks.


  SmallVector<VPInterleaveRecipe *> StoreGroups;
  for (auto &R : *Plan.getVectorLoopRegion()->getEntryBasicBlock()) {
    if (match(&R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
        isa<VPCanonicalIVPHIRecipe>(&R))
Member (suggested change): check the phi first:

    -    if (match(&R, m_BranchOnCount(m_VPValue(), m_VPValue())) ||
    -        isa<VPCanonicalIVPHIRecipe>(&R))
    +    if (isa<VPCanonicalIVPHIRecipe>(&R) ||
    +        match(&R, m_BranchOnCount(m_VPValue(), m_VPValue())))

Author (fhahn): Done, thanks.

      continue;

    // Bail out on recipes not supported at the moment:
    // * phi recipes other than the canonical induction
    // * recipes writing to memory except interleave groups
    // Only support plans with a canonical induction phi.
    if (R.isPhi())
      return;

    auto *IR = dyn_cast<VPInterleaveRecipe>(&R);
Collaborator (suggested change):

    -    auto *IR = dyn_cast<VPInterleaveRecipe>(&R);
    +    auto *InterleaveR = dyn_cast<VPInterleaveRecipe>(&R);

Author (fhahn): Done, thanks!

    if (R.mayWriteToMemory() && !IR)
      return;

    if (!IR)
      continue;

Collaborator: All non-IR loads are allowed, but all IR loads must be consecutive.

Author (fhahn): Yes, we are checking all consumers; added a comment, thanks!

    if (!isConsecutiveInterleaveGroup(IR, VF))
      return;
    if (IR->getStoredValues().empty())
      continue;
Collaborator (suggested change): separate the check with a blank line. (The sequence of early continues and bail-outs should be explained with comments.)

Author (fhahn): Done, thanks!


    auto *Lane0 = dyn_cast_or_null<VPWidenRecipe>(
Collaborator (suggested change):

    -    auto *Lane0 = dyn_cast_or_null<VPWidenRecipe>(
    +    auto *WidenMember0 = dyn_cast_or_null<VPWidenRecipe>(

? Although there are admittedly VF members.

Author (fhahn): Updated, thanks!

        IR->getStoredValues()[0]->getDefiningRecipe());
    if (!Lane0)
      return;
    for (const auto &[I, V] : enumerate(IR->getStoredValues())) {
      auto *R = dyn_cast<VPWidenRecipe>(V->getDefiningRecipe());
Collaborator (suggested change): V could be a live-in; use dyn_cast_or_null?

    -      auto *R = dyn_cast<VPWidenRecipe>(V->getDefiningRecipe());
    +      auto *WidenMemberI = dyn_cast_or_null<VPWidenRecipe>(V->getDefiningRecipe());

Author (fhahn): At this point it must have a defining recipe, as guaranteed by canNarrowLoad. To be generalized in the future.

      if (!R || R->getOpcode() != Lane0->getOpcode())
        return;
      // Work around captured structured bindings being a C++20 extension.
      auto Idx = I;
      if (any_of(R->operands(), [Lane0, Idx](VPValue *V) {
Member (suggested change): use an init-capture:

    -      auto Idx = I;
    -      if (any_of(R->operands(), [Lane0, Idx](VPValue *V) {
    +      if (any_of(R->operands(), [Lane0, Idx = I](VPValue *V) {

Author (fhahn): Done. I think originally Clang complained about this being an extension; looks fine now, thanks!

            return !supportedLoad(Lane0, V, Idx);
          }))
        return;
    }

    StoreGroups.push_back(IR);
  }

  if (StoreGroups.empty())
    return;

  // Narrow operation tree rooted at store groups.
  for (auto *StoreGroup : StoreGroups) {
    auto *Lane0 = cast<VPWidenRecipe>(
        StoreGroup->getStoredValues()[0]->getDefiningRecipe());

    unsigned LoadGroupIdx =
        isa<VPInterleaveRecipe>(Lane0->getOperand(1)->getDefiningRecipe()) ? 1
                                                                           : 0;
    unsigned WideLoadIdx = 1 - LoadGroupIdx;
    auto *LoadGroup = cast<VPInterleaveRecipe>(
        Lane0->getOperand(LoadGroupIdx)->getDefiningRecipe());

    auto *WideLoad = cast<VPWidenLoadRecipe>(
        Lane0->getOperand(WideLoadIdx)->getDefiningRecipe());
Collaborator: So Lane0 is expected to have two operands, one being a wide load and the other a (first) member of an interleave load. Is this guaranteed by the calls to supportedLoad() above?

Author (fhahn): The code has been generalized to narrow all ops (and also to support unary ops).


    // Narrow wide load to uniform scalar load, as transformed VPlan will only
    // process one original iteration.
    auto *N = new VPReplicateRecipe(&WideLoad->getIngredient(),
                                    WideLoad->operands(), true);
Member: Comment the true argument.

Author (fhahn): Done, thanks!

    // Narrow interleave group to wide load, as transformed VPlan will only
    // process one original iteration.
Collaborator (suggested change):

    // Narrow the interleave load group from loading VF*VF elements to a wide
    // load of VF elements, corresponding to the VF interleave group members,
    // as the transformed VPlan will only process one original iteration rather
    // than VF iterations.

(The interleave group employs a "very wide" load, which is "narrowed" down to a "wide" load.)
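To make the "very wide" vs. "wide" distinction concrete, a sketch with assumed numbers (fixed VF = 4, factor 4, i32 elements):

    // Before narrowing, the interleave group loads VF * Factor elements at
    // once and shuffles them apart:
    //   %wide.vec  = load <16 x i32>   ; all members of all lanes
    // After narrowing, one original iteration is processed per vector
    // iteration, so a single wide load of the VF member values suffices:
    //   %wide.load = load <4 x i32>    ; the 4 members of one iteration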

    auto *L = new VPWidenLoadRecipe(
        *cast<LoadInst>(LoadGroup->getInterleaveGroup()->getInsertPos()),
        LoadGroup->getAddr(), LoadGroup->getMask(), true, false,
Member: Comments for the true and false arguments.

Author (fhahn): Added, thanks.

        LoadGroup->getDebugLoc());
    L->insertBefore(LoadGroup);
    N->insertBefore(LoadGroup);
Collaborator: Better to use more descriptive variable names.

    Lane0->setOperand(LoadGroupIdx, L);
    Lane0->setOperand(WideLoadIdx, N);

    auto *S = new VPWidenStoreRecipe(
        *cast<StoreInst>(StoreGroup->getInterleaveGroup()->getInsertPos()),
        StoreGroup->getAddr(), Lane0, nullptr, true, false,
Member: Comments for the true and false arguments.

Author (fhahn): Added, thanks.

        StoreGroup->getDebugLoc());
    S->insertBefore(StoreGroup);
    StoreGroup->eraseFromParent();
  }

  // Adjust induction to reflect that the transformed plan only processes one
  // original iteration.
  auto *CanIV = Plan.getCanonicalIV();
  VPInstruction *Inc = cast<VPInstruction>(CanIV->getBackedgeValue());
Member (suggested change):

    -  VPInstruction *Inc = cast<VPInstruction>(CanIV->getBackedgeValue());
    +  auto *Inc = cast<VPInstruction>(CanIV->getBackedgeValue());

Author (fhahn): Done, thanks!

  Inc->setOperand(
      1, Plan.getOrAddLiveIn(ConstantInt::get(CanIV->getScalarType(), 1)));
  removeDeadRecipes(Plan);
  LLVM_DEBUG(dbgs() << "Narrowed interleave\n");
}
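For orientation, a sketch of the kind of source loop the transform targets (function and variable names assumed for illustration, not taken from the patch's tests):

    // With VF = 2: the two src loads form a load interleave group of factor 2,
    // the two dst stores a store group of factor 2, and f[i] an ordinary wide
    // load. Since factor == VF and member indices line up, each vector
    // iteration is equivalent to one original iteration on <2 x int> values:
    // the groups are narrowed to plain wide memory ops, f[i] becomes a scalar
    // (replicated) load, and the canonical IV steps by 1 instead of VF x UF.
    void scale_pairs(int *dst, const int *src, const int *f, long n) {
      for (long i = 0; i < n; ++i) {
        dst[2 * i]     = src[2 * i]     * f[i];
        dst[2 * i + 1] = src[2 * i + 1] * f[i];
      }
    }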
6 changes: 5 additions & 1 deletion llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -114,11 +114,15 @@ struct VPlanTransforms {
   // widening its memory instructions with a single VPInterleaveRecipe at its
   // insertion point.
   static void createInterleaveGroups(
-      const SmallPtrSetImpl<const InterleaveGroup<Instruction> *> &InterleaveGroups,
+      VPlan &Plan,
+      const SmallPtrSetImpl<const InterleaveGroup<Instruction> *>
+          &InterleaveGroups,
       VPRecipeBuilder &RecipeBuilder, bool ScalarEpilogueAllowed);
 
   /// Remove dead recipes from \p Plan.
   static void removeDeadRecipes(VPlan &Plan);
+
+  static void narrowInterleaveGroups(VPlan &Plan, ElementCount VF);
 };
 
 } // namespace llvm
@@ -14,8 +14,7 @@ define void @interleaved_store_first_order_recurrence(ptr noalias %src, ptr %dst
 ; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i32> [[VECTOR_RECUR]], <4 x i32> [[BROADCAST_SPLAT]], <4 x i32> <i32 3, i32 4, i32 5, i32 6>
 ; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw nsw i64 [[TMP0]], 3
 ; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[DST:%.*]], i64 [[TMP3]]
-; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[TMP4]], i64 2
-; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[TMP6]], i32 -2
+; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i32, ptr [[TMP4]], i32 0
 ; CHECK-NEXT:    [[TMP9:%.*]] = shufflevector <4 x i32> zeroinitializer, <4 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLAT]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
 ; CHECK-NEXT:    [[TMP11:%.*]] = shufflevector <8 x i32> [[TMP9]], <8 x i32> [[TMP10]], <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>