-
Notifications
You must be signed in to change notification settings - Fork 343
[MicroBenchmark,LoopInterleaving] Re-land - Check performance impact of Loop Interleaving Count with varying loop iterations #56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
nilanjana87
merged 12 commits into
llvm:main
from
nilanjana87:loop_interleaving_microbenchmark_reland
Dec 8, 2023
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…nterleaving Count with varying loop iterations. This microbenchmark attempts to find the right loop trip count threshold for deciding whether to interleave a loop or not for different types of loops, such as loops with or without reduction inside it, loops with or without vectorization inside it. Note: Interleaving count of 1 means interleaving is disabled. Differential Revision: https://reviews.llvm.org/D159475
…ed a redundant function, added test cases where the compiler selects the vectorization configuration
…d vectorization hints with preprocessor-macro-based template functions, as per reviewer comments
…duced test set by default
This was referenced Nov 28, 2023
antmox
approved these changes
Dec 5, 2023
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This one should be fine IMO.
~250secs instead of 2500secs for the 1st version.
nilanjana87
added a commit
to nilanjana87/llvm-project
that referenced
this pull request
Dec 12, 2023
A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial in two cases: 1) when TC > 2 * VW * IC, such that the interleaved vectorized portion of the loop runs at least twice 2) when TC is an exact multiple of VW * IC, such that there is no epilogue loop to run where, TC = trip count, VW = vectorization width, IC = interleaving count We change the interleave count computation based on this information but we leave it the same when the flag InterleaveSmallLoopScalarReductionTrue is set to true, since it handles a special case (https://reviews.llvm.org/D81416).
nilanjana87
added a commit
to nilanjana87/llvm-project
that referenced
this pull request
Dec 12, 2023
A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial in two cases: 1) when TC > 2 * VW * IC, such that the interleaved vectorized portion of the loop runs at least twice 2) when TC is an exact multiple of VW * IC, such that there is no epilogue loop to run where, TC = trip count, VW = vectorization width, IC = interleaving count We change the interleave count computation based on this information but we leave it the same when the flag InterleaveSmallLoopScalarReductionTrue is set to true, since it handles a special case (https://reviews.llvm.org/D81416).
nilanjana87
added a commit
to nilanjana87/llvm-project
that referenced
this pull request
Dec 12, 2023
A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial in two cases: 1) when TC > 2 * VW * IC, such that the interleaved vectorized portion of the loop runs at least twice 2) when TC is an exact multiple of VW * IC, such that there is no epilogue loop to run where, TC = trip count, VW = vectorization width, IC = interleaving count We change the interleave count computation based on this information but we leave it the same when the flag InterleaveSmallLoopScalarReductionTrue is set to true, since it handles a special case (https://reviews.llvm.org/D81416).
nilanjana87
added a commit
to nilanjana87/llvm-project
that referenced
this pull request
Jan 3, 2024
A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial when the post-vectorization remainder tail of the loop is minimal in cases where the vector loop gets to run only a few times. This patch attempts to compute interleaving count (IC) based on the trip count so as to minimize the remainder tail while maximizing the IC.
nilanjana87
added a commit
to llvm/llvm-project
that referenced
this pull request
Jan 4, 2024
[LV] Change loops' interleave count computation A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose interleaving count (IC) between TC/VF & TC/2*VF (VF = vectorization factor), such that remainder TC for the epilogue loop is minimum while the IC is maximum in case the remainder TC is same for both. The initial tests for this change were submitted in PRs: #70272 and #74689.
nilanjana87
added a commit
to swiftlang/llvm-project
that referenced
this pull request
Mar 5, 2024
[LV] Change loops' interleave count computation A set of microbenchmarks in llvm-test-suite (llvm/llvm-test-suite#56), when tested on a AArch64 platform, demonstrates that loop interleaving is beneficial when the vector loop runs at least twice or when the epilogue loop trip count (TC) is minimal. Therefore, we choose interleaving count (IC) between TC/VF & TC/2*VF (VF = vectorization factor), such that remainder TC for the epilogue loop is minimum while the IC is maximum in case the remainder TC is same for both. The initial tests for this change were submitted in PRs: llvm#70272 and llvm#74689.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is for re-committing the microbenchmark that attempts to find the right loop trip count threshold for deciding whether to interleave a loop or not for different types of loops, such as loops with or without reduction inside it, loops with or without vectorization inside it. For context, the loop vectorizer uses a threshold of TinyTripCountInterleaveThreshold that is currently set to 128.
The timeout issue in the previous PR #26 is solved in this PR by using a pre-processor configuration flag of ALL_LOOP_IC_TESTS, which must be passed during compilation to run the whole set of tests, whereas by default it runs a reduced test set.