-
Notifications
You must be signed in to change notification settings - Fork 14.2k
[SLP] NFC: Simplify CandidateVFs initialization #144882
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SLP] NFC: Simplify CandidateVFs initialization #144882
Conversation
Also adds a comment to clarify the meaning of MaxRegVF.
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Sander de Smalen (sdesmalen-arm) ChangesAlso adds a comment to clarify the meaning of MaxRegVF. Full diff: https://github.com/llvm/llvm-project/pull/144882.diff 1 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 4551a365a6967..750c93a439c16 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -21198,7 +21198,11 @@ bool SLPVectorizerPass::vectorizeStores(
}
}
+ // MaxRegVF represents the number of instructions (scalar, or vector in
+ // case of revec) that can be vectorized to naturally fit in a vector
+ // register.
unsigned MaxRegVF = MaxVF;
+
MaxVF = std::min<unsigned>(MaxVF, bit_floor(Operands.size()));
if (MaxVF < MinVF) {
LLVM_DEBUG(dbgs() << "SLP: Vectorization infeasible as MaxVF (" << MaxVF
@@ -21207,13 +21211,11 @@ bool SLPVectorizerPass::vectorizeStores(
continue;
}
- unsigned Sz = 1 + Log2_32(MaxVF) - Log2_32(MinVF);
- SmallVector<unsigned> CandidateVFs(Sz + (NonPowerOf2VF > 0 ? 1 : 0));
- unsigned Size = MinVF;
- for (unsigned &VF : reverse(CandidateVFs)) {
- VF = Size > MaxVF ? NonPowerOf2VF : Size;
- Size *= 2;
- }
+ SmallVector<unsigned> CandidateVFs;
+ for (unsigned VF = std::max(MaxVF, NonPowerOf2VF); VF >= MinVF;
+ VF = divideCeil(VF, 2))
+ CandidateVFs.push_back(VF);
+
unsigned End = Operands.size();
unsigned Repeat = 0;
constexpr unsigned MaxAttempts = 4;
|
SmallVector<unsigned> CandidateVFs; | ||
for (unsigned VF = std::max(MaxVF, NonPowerOf2VF); VF >= MinVF; | ||
VF = divideCeil(VF, 2)) | ||
CandidateVFs.push_back(VF); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to keep the preallocated version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for making this change is because I found it really hard to understand:
- What the min-max range of resulting VFs are.
- Whether the list of VF goes from high to low, or low to high.
The parts that I found specifically difficult to follow are:
- Calculation for the size of the SmallVector. This requires adding and subtracting logarithm of min/max VF, and a conditional
+ 1
. I'm also not sure what the value is of passing a size to the constructor. By default this list will be small and SmallVector already allocates a minimum size which is likely to be big enough in practice. - The
reverse
when iterating the values inCandidateVFs
to initialise the VF seemed unnecessary.
I actually had to print the data in CandidateVFs to confirm the code did what I suspected it did :)
In the current code I think it is easier to see what the minimum and maximum values are (the loop starts either at MaxVF
or NonPowerOf2VF
, and ends at MinVF), and to understand the increment (it goes from high -> low, dividing by 2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do prefer the push_back
based version. Computing the size of the vector beforehand adds a lot of unnecessary complexity, whereas we can just push the candidate VFs one by one and not worry about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for making this change is because I found it really hard to understand:
- What the min-max range of resulting VFs are.
- Whether the list of VF goes from high to low, or low to high.
The code is pretty small, so I don't see much difference
The parts that I found specifically difficult to follow are:
- Calculation for the size of the SmallVector. This requires adding and subtracting logarithm of min/max VF, and a conditional
+ 1
. I'm also not sure what the value is of passing a size to the constructor. By default this list will be small and SmallVector already allocates a minimum size which is likely to be big enough in practice.
It highly depends on the target. True for something like X86 to ARM NEON, false for others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It highly depends on the target. True for something like X86 to ARM NEON, false for others
I see how for very wide vectors it may need to allocate a wider buffer. But I doubt that has any meaningful impact on compile-time in practice, given that this code is not on a critical path. Unless that assumption is false, I would favour readability over performance.
The code is pretty small, so I don't see much difference
All I can say is that @gbossu and myself tripped over this code recently and we find this change an improvement in the readability of this function. I hope that you're open to accepting this change, to make the code easier for us and others to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, if you think it makes it easier to understand, though for me the original version is much easier to read :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I also did find that code difficult to understand when I looked at it some weeks ago. Having a straight-forward initialization loop like in this PR would have helped.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/13060 Here is the relevant piece of the build log for the reference
|
Also adds a comment to clarify the meaning of MaxRegVF.