[VectorCombine] foldBitcastShuffle - peek through any residual bitcasts before creating a new bitcast on top #86119
Conversation
@llvm/pr-subscribers-llvm-transforms

Author: Simon Pilgrim (RKSimon)

Changes: Encountered while working on #67803, wading through the chains of bitcasts that SSE intrinsics introduce - this patch helps prevent cases where the bitcast chains aren't cleared out and we can't perform further combines until after InstCombine/InstSimplify has run. I'm assuming we can't safely put this inside IRBuilderBase.CreateBitCast?

Full diff: https://github.com/llvm/llvm-project/pull/86119.diff

3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
index 7e86137f23f3c8..87e05dde545cd5 100644
--- a/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
+++ b/llvm/lib/Transforms/Vectorize/VectorCombine.cpp
@@ -135,6 +135,18 @@ class VectorCombine {
};
} // namespace
+/// Return the source operand of a potentially bitcasted value while
+/// optionally checking if it has one use. If there is no bitcast or the one
+/// use check is not met, return the input value itself.
+static Value *peekThroughBitcast(Value *V, bool OneUseOnly = false) {
+ if (auto *BitCast = dyn_cast<BitCastInst>(V))
+ if (!OneUseOnly || BitCast->hasOneUse())
+ return BitCast->getOperand(0);
+
+ // V is not a bitcast or V has more than one use and OneUseOnly is true.
+ return V;
+}
+
static bool canWidenLoad(LoadInst *Load, const TargetTransformInfo &TTI) {
// Do not widen load if atomic/volatile or under asan/hwasan/memtag/tsan.
// The widened load may load data from dirty regions or create data races
@@ -751,8 +763,8 @@ bool VectorCombine::foldBitcastShuffle(Instruction &I) {
// bitcast (shuf V0, V1, MaskC) --> shuf (bitcast V0), (bitcast V1), MaskC'
++NumShufOfBitcast;
- Value *CastV0 = Builder.CreateBitCast(V0, NewShuffleTy);
- Value *CastV1 = Builder.CreateBitCast(V1, NewShuffleTy);
+ Value *CastV0 = Builder.CreateBitCast(peekThroughBitcast(V0), NewShuffleTy);
+ Value *CastV1 = Builder.CreateBitCast(peekThroughBitcast(V1), NewShuffleTy);
Value *Shuf = Builder.CreateShuffleVector(CastV0, CastV1, NewMask);
replaceValue(I, *Shuf);
return true;
diff --git a/llvm/test/Transforms/VectorCombine/X86/shuffle-inseltpoison.ll b/llvm/test/Transforms/VectorCombine/X86/shuffle-inseltpoison.ll
index 8c5c6656ca1795..74a58c8d313611 100644
--- a/llvm/test/Transforms/VectorCombine/X86/shuffle-inseltpoison.ll
+++ b/llvm/test/Transforms/VectorCombine/X86/shuffle-inseltpoison.ll
@@ -133,8 +133,7 @@ define <2 x i64> @PR35454_1(<2 x i64> %v) {
; SSE-NEXT: ret <2 x i64> [[BC3]]
;
; AVX-LABEL: @PR35454_1(
-; AVX-NEXT: [[BC:%.*]] = bitcast <2 x i64> [[V:%.*]] to <4 x i32>
-; AVX-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[BC]] to <16 x i8>
+; AVX-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> [[V:%.*]] to <16 x i8>
; AVX-NEXT: [[BC1:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> <i32 12, i32 13, i32 14, i32 15, i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
; AVX-NEXT: [[ADD:%.*]] = shl <16 x i8> [[BC1]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
; AVX-NEXT: [[BC2:%.*]] = bitcast <16 x i8> [[ADD]] to <4 x i32>
@@ -164,8 +163,7 @@ define <2 x i64> @PR35454_2(<2 x i64> %v) {
; SSE-NEXT: ret <2 x i64> [[BC3]]
;
; AVX-LABEL: @PR35454_2(
-; AVX-NEXT: [[BC:%.*]] = bitcast <2 x i64> [[V:%.*]] to <4 x i32>
-; AVX-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[BC]] to <8 x i16>
+; AVX-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> [[V:%.*]] to <8 x i16>
; AVX-NEXT: [[BC1:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 6, i32 7, i32 4, i32 5, i32 2, i32 3, i32 0, i32 1>
; AVX-NEXT: [[ADD:%.*]] = shl <8 x i16> [[BC1]], <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
; AVX-NEXT: [[BC2:%.*]] = bitcast <8 x i16> [[ADD]] to <4 x i32>
diff --git a/llvm/test/Transforms/VectorCombine/X86/shuffle.ll b/llvm/test/Transforms/VectorCombine/X86/shuffle.ll
index 60cfc4d4b07079..d1484fd5ab3399 100644
--- a/llvm/test/Transforms/VectorCombine/X86/shuffle.ll
+++ b/llvm/test/Transforms/VectorCombine/X86/shuffle.ll
@@ -133,8 +133,7 @@ define <2 x i64> @PR35454_1(<2 x i64> %v) {
; SSE-NEXT: ret <2 x i64> [[BC3]]
;
; AVX-LABEL: @PR35454_1(
-; AVX-NEXT: [[BC:%.*]] = bitcast <2 x i64> [[V:%.*]] to <4 x i32>
-; AVX-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[BC]] to <16 x i8>
+; AVX-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> [[V:%.*]] to <16 x i8>
; AVX-NEXT: [[BC1:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> <i32 12, i32 13, i32 14, i32 15, i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
; AVX-NEXT: [[ADD:%.*]] = shl <16 x i8> [[BC1]], <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
; AVX-NEXT: [[BC2:%.*]] = bitcast <16 x i8> [[ADD]] to <4 x i32>
@@ -164,8 +163,7 @@ define <2 x i64> @PR35454_2(<2 x i64> %v) {
; SSE-NEXT: ret <2 x i64> [[BC3]]
;
; AVX-LABEL: @PR35454_2(
-; AVX-NEXT: [[BC:%.*]] = bitcast <2 x i64> [[V:%.*]] to <4 x i32>
-; AVX-NEXT: [[TMP1:%.*]] = bitcast <4 x i32> [[BC]] to <8 x i16>
+; AVX-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> [[V:%.*]] to <8 x i16>
; AVX-NEXT: [[BC1:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 6, i32 7, i32 4, i32 5, i32 2, i32 3, i32 0, i32 1>
; AVX-NEXT: [[ADD:%.*]] = shl <8 x i16> [[BC1]], <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
; AVX-NEXT: [[BC2:%.*]] = bitcast <8 x i16> [[ADD]] to <4 x i32>
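To make the fold concrete, here is a minimal IR sketch of the pattern the patch improves, distilled from the PR35454 tests above (function names are illustrative, not from the patch):

; Before: bitcast (shuffle (bitcast %v)) - previously the fold created the
; new <16 x i8> cast on top of %bc0, leaving a bitcast chain behind.
define <16 x i8> @chain(<2 x i64> %v) {
  %bc0 = bitcast <2 x i64> %v to <4 x i32>
  %shuf = shufflevector <4 x i32> %bc0, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %bc1 = bitcast <4 x i32> %shuf to <16 x i8>
  ret <16 x i8> %bc1
}

; After: peekThroughBitcast lets the new cast start from %v directly, and the
; shuffle mask is rescaled from <4 x i32> lanes to <16 x i8> lanes.
define <16 x i8> @chain_folded(<2 x i64> %v) {
  %tmp = bitcast <2 x i64> %v to <16 x i8>
  %shuf = shufflevector <16 x i8> %tmp, <16 x i8> poison, <16 x i32> <i32 12, i32 13, i32 14, i32 15, i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
  ret <16 x i8> %shuf
}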
FYI, folding bitcast of bitcast is always safe. See llvm-project/llvm/lib/IR/Instructions.cpp, lines 3504 to 3569 (at commit 857161c).
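A minimal IR illustration of that point (not from the patch): a bitcast is a no-op on the underlying bits and both sides have the same total width, so two stacked bitcasts always compose into a single one.

define <8 x i16> @double_cast(<2 x i64> %v) {
  %a = bitcast <2 x i64> %v to <4 x i32>   ; 128 bits -> 128 bits
  %b = bitcast <4 x i32> %a to <8 x i16>   ; 128 bits -> 128 bits
  ret <8 x i16> %b
}

; The cast-pair table referenced above treats bitcast-of-bitcast as always
; eliminable, so this simplifies to:
;   %b = bitcast <2 x i64> %v to <8 x i16>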
(Force-pushed 89881d2 to df919cb.)
It was mainly potential OneUse restrictions that concerned me. VectorCombine won't mind, but I can imagine there are cases where it would be a problem.
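To illustrate the concern with a hypothetical example (not from this patch): if the inner bitcast has another user, peeking through it doesn't remove the original cast - it just roots a second cast at the same source.

define <16 x i8> @multi_use(<2 x i64> %v, ptr %p) {
  %bc = bitcast <2 x i64> %v to <4 x i32>
  store <4 x i32> %bc, ptr %p          ; second user keeps %bc alive
  %shuf = shufflevector <4 x i32> %bc, <4 x i32> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
  %r = bitcast <4 x i32> %shuf to <16 x i8>
  ret <16 x i8> %r
}

; After the fold, %bc survives for the store and a fresh
; bitcast <2 x i64> %v to <16 x i8> is created for the shuffle. Bitcasts are
; free, so VectorCombine doesn't mind the duplicate, but a transform relying
; on one-use checks could be thrown off.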
Any further comments?
Is that for further transforms in VectorCombine or somewhere else? Can we have a test that shows doing this optimization in VectorCombine has a benefit?
It's mainly a phase-ordering problem for #67803; see https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/PhaseOrdering/X86/pr67803.ll
The aim is to merge all this into … Once the peek-through-bitcasts handling is working, my intention is to add a VectorCombine fold for …
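For context, a sketch of the shape of the problem (illustrative only, not taken from pr67803.ll): lowering SSE intrinsics glues operations on different element types together with bitcasts, so shuffle folds can't see their real inputs until the chains are collapsed.

define <2 x i64> @sse_style_chain(<2 x i64> %a, <2 x i64> %b) {
  %a8 = bitcast <2 x i64> %a to <16 x i8>
  %b8 = bitcast <2 x i64> %b to <16 x i8>
  %lo = shufflevector <16 x i8> %a8, <16 x i8> %b8, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
  %r = bitcast <16 x i8> %lo to <2 x i64>
  ret <2 x i64> %r
}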
LGTM.