Skip to content

[SLP] NFC: Simplify CandidateVFs initialization #144882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 20, 2025

Conversation

sdesmalen-arm
Copy link
Collaborator

Also adds a comment to clarify the meaning of MaxRegVF.

Also adds a comment to clarify the meaning of MaxRegVF.
@llvmbot
Copy link
Member

llvmbot commented Jun 19, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Sander de Smalen (sdesmalen-arm)

Changes

Also adds a comment to clarify the meaning of MaxRegVF.


Full diff: https://github.com/llvm/llvm-project/pull/144882.diff

1 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+9-7)
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 4551a365a6967..750c93a439c16 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -21198,7 +21198,11 @@ bool SLPVectorizerPass::vectorizeStores(
         }
       }
 
+      // MaxRegVF represents the number of instructions (scalar, or vector in
+      // case of revec) that can be vectorized to naturally fit in a vector
+      // register.
       unsigned MaxRegVF = MaxVF;
+
       MaxVF = std::min<unsigned>(MaxVF, bit_floor(Operands.size()));
       if (MaxVF < MinVF) {
         LLVM_DEBUG(dbgs() << "SLP: Vectorization infeasible as MaxVF (" << MaxVF
@@ -21207,13 +21211,11 @@ bool SLPVectorizerPass::vectorizeStores(
         continue;
       }
 
-      unsigned Sz = 1 + Log2_32(MaxVF) - Log2_32(MinVF);
-      SmallVector<unsigned> CandidateVFs(Sz + (NonPowerOf2VF > 0 ? 1 : 0));
-      unsigned Size = MinVF;
-      for (unsigned &VF : reverse(CandidateVFs)) {
-        VF = Size > MaxVF ? NonPowerOf2VF : Size;
-        Size *= 2;
-      }
+      SmallVector<unsigned> CandidateVFs;
+      for (unsigned VF = std::max(MaxVF, NonPowerOf2VF); VF >= MinVF;
+           VF = divideCeil(VF, 2))
+        CandidateVFs.push_back(VF);
+
       unsigned End = Operands.size();
       unsigned Repeat = 0;
       constexpr unsigned MaxAttempts = 4;

Comment on lines +21214 to +21218
SmallVector<unsigned> CandidateVFs;
for (unsigned VF = std::max(MaxVF, NonPowerOf2VF); VF >= MinVF;
VF = divideCeil(VF, 2))
CandidateVFs.push_back(VF);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to keep the preallocated version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for making this change is because I found it really hard to understand:

  • What the min-max range of resulting VFs are.
  • Whether the list of VF goes from high to low, or low to high.

The parts that I found specifically difficult to follow are:

  • Calculation for the size of the SmallVector. This requires adding and subtracting logarithm of min/max VF, and a conditional + 1. I'm also not sure what the value is of passing a size to the constructor. By default this list will be small and SmallVector already allocates a minimum size which is likely to be big enough in practice.
  • The reverse when iterating the values in CandidateVFs to initialise the VF seemed unnecessary.

I actually had to print the data in CandidateVFs to confirm the code did what I suspected it did :)

In the current code I think it is easier to see what the minimum and maximum values are (the loop starts either at MaxVF or NonPowerOf2VF, and ends at MinVF), and to understand the increment (it goes from high -> low, dividing by 2).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do prefer the push_back based version. Computing the size of the vector beforehand adds a lot of unnecessary complexity, whereas we can just push the candidate VFs one by one and not worry about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for making this change is because I found it really hard to understand:

  • What the min-max range of resulting VFs are.
  • Whether the list of VF goes from high to low, or low to high.

The code is pretty small, so I don't see much difference

The parts that I found specifically difficult to follow are:

  • Calculation for the size of the SmallVector. This requires adding and subtracting logarithm of min/max VF, and a conditional + 1. I'm also not sure what the value is of passing a size to the constructor. By default this list will be small and SmallVector already allocates a minimum size which is likely to be big enough in practice.

It highly depends on the target. True for something like X86 to ARM NEON, false for others

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It highly depends on the target. True for something like X86 to ARM NEON, false for others

I see how for very wide vectors it may need to allocate a wider buffer. But I doubt that has any meaningful impact on compile-time in practice, given that this code is not on a critical path. Unless that assumption is false, I would favour readability over performance.

The code is pretty small, so I don't see much difference

All I can say is that @gbossu and myself tripped over this code recently and we find this change an improvement in the readability of this function. I hope that you're open to accepting this change, to make the code easier for us and others to understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if you think it makes it easier to understand, though for me the original version is much easier to read :)

Copy link
Contributor

@gbossu gbossu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I also did find that code difficult to understand when I looked at it some weeks ago. Having a straight-forward initialization loop like in this PR would have helped.

@sdesmalen-arm sdesmalen-arm merged commit 8747736 into llvm:main Jun 20, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Jun 20, 2025

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building llvm at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/13060

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp   -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants