[RISCV] Allow folding vmerge with implicit passthru when true has tied dest #78565
Conversation
@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

We currently don't fold a vmerge if it has an implicit merge operand and its true operand has a tied dest (i.e. has a passthru operand). This restriction was added in https://reviews.llvm.org/D151596, back when we had separate TU/TA pseudos. It looks like it was added because the policy might not have been handled correctly.

However, the policy should be set correctly if we relax this restriction today, since we compute the policy differently now that we have removed the TU/TA distinction in our pseudos. We use a TUMU policy, and relax it to TAMU iff the vmerge's merge operand is implicit. The reasoning is that the tail elements always come from the vmerge's merge operand[^1], so if the merge operand is implicit-def then the tail is implicit-def, and hence tail agnostic.

> [!NOTE]
> Full diff: https://github.com/llvm/llvm-project/pull/78565.diff

3 Files Affected:
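To make the newly foldable pattern concrete, here is a minimal LLVM IR sketch. It mirrors the new true_tied_dest_vmerge_implicit_passthru test added in the diff below; the function name is illustrative, the "after" assembly comes from that test's CHECK lines, and the "before" shape is an assumption based on the analogous un-folded sequences removed from vmadd-vp.ll.

```llvm
declare <vscale x 2 x i32> @llvm.riscv.vmacc.nxv2i32.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i32>, i64, i64)
declare <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i1>, i64)

define <vscale x 2 x i32> @fold_sketch(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %m, i64 %avl) {
  ; True operand: an unmasked vmacc with a real (tied) passthru and a TU policy.
  %a = call <vscale x 2 x i32> @llvm.riscv.vmacc.nxv2i32.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, i64 %avl, i64 0)
  ; vmerge whose own passthru is implicit-def (poison) and whose false operand
  ; is the same %passthru the vmacc is tied to.
  %b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(<vscale x 2 x i32> poison, <vscale x 2 x i32> %passthru, <vscale x 2 x i32> %a, <vscale x 2 x i1> %m, i64 %avl)
  ret <vscale x 2 x i32> %b
}

; Before this patch the vmerge was not folded, so codegen was an unmasked
; vmacc followed by a separate vmerge.vvm under a "ta, ma" vsetvli (compare
; the vmadd-vp.ll diffs below).
; With this patch (the test's CHECK lines): a single masked op, tail agnostic:
;   vsetvli zero, a0, e32, m1, ta, mu
;   vmacc.vv v8, v9, v10, v0.t
```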
diff --git a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
index 0d8688ba2eaeaff..5b20a2014cc98ca 100644
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3531,11 +3531,6 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
return false;
if (HasTiedDest && !isImplicitDef(True->getOperand(0))) {
- // The vmerge instruction must be TU.
- // FIXME: This could be relaxed, but we need to handle the policy for the
- // resulting op correctly.
- if (isImplicitDef(Merge))
- return false;
SDValue MergeOpTrue = True->getOperand(0);
// Both the vmerge instruction and the True instruction must have the same
// merge operand.
@@ -3545,9 +3540,6 @@ bool RISCVDAGToDAGISel::performCombineVMergeAndVOps(SDNode *N) {
if (IsMasked) {
assert(HasTiedDest && "Expected tied dest");
- // The vmerge instruction must be TU.
- if (isImplicitDef(Merge))
- return false;
// The vmerge instruction must have an all 1s mask since we're going to keep
// the mask from the True instruction.
// FIXME: Support mask agnostic True instruction which would have an
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
index c639f092444fc43..b3166bf5b01071d 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-vops.ll
@@ -1187,3 +1187,41 @@ define <vscale x 2 x i32> @vmerge_larger_vl_false_becomes_tail(<vscale x 2 x i32
%b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(<vscale x 2 x i32> poison, <vscale x 2 x i32> %false, <vscale x 2 x i32> %a, <vscale x 2 x i1> %m, i64 3)
ret <vscale x 2 x i32> %b
}
+
+declare <vscale x 2 x i32> @llvm.riscv.vmacc.nxv2i32.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i32>, i64, i64)
+
+define <vscale x 2 x i32> @true_tied_dest_vmerge_implicit_passthru(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %m, i64 %avl) {
+; CHECK-LABEL: true_tied_dest_vmerge_implicit_passthru:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, ta, mu
+; CHECK-NEXT: vmacc.vv v8, v9, v10, v0.t
+; CHECK-NEXT: ret
+ %a = call <vscale x 2 x i32> @llvm.riscv.vmacc.nxv2i32.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, i64 %avl, i64 0)
+ %b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(
+ <vscale x 2 x i32> poison,
+ <vscale x 2 x i32> %passthru,
+ <vscale x 2 x i32> %a,
+ <vscale x 2 x i1> %m,
+ i64 %avl
+ )
+ ret <vscale x 2 x i32> %b
+}
+
+declare <vscale x 2 x i32> @llvm.riscv.vadd.mask.nxv2i32.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i32>, <vscale x 2 x i1>, i64, i64)
+
+define <vscale x 2 x i32> @true_mask_vmerge_implicit_passthru(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %m, i64 %avl) {
+; CHECK-LABEL: true_mask_vmerge_implicit_passthru:
+; CHECK: # %bb.0:
+; CHECK-NEXT: vsetvli zero, a0, e32, m1, ta, mu
+; CHECK-NEXT: vadd.vv v8, v9, v10, v0.t
+; CHECK-NEXT: ret
+ %a = call <vscale x 2 x i32> @llvm.riscv.vadd.mask.nxv2i32.nxv2i32(<vscale x 2 x i32> %passthru, <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %m, i64 %avl, i64 0)
+ %b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(
+ <vscale x 2 x i32> poison,
+ <vscale x 2 x i32> %passthru,
+ <vscale x 2 x i32> %a,
+ <vscale x 2 x i1> shufflevector(<vscale x 2 x i1> insertelement(<vscale x 2 x i1> poison, i1 true, i32 0), <vscale x 2 x i1> poison, <vscale x 2 x i32> zeroinitializer),
+ i64 %avl
+ )
+ ret <vscale x 2 x i32> %b
+}
diff --git a/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll b/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
index 6a6d7d2a41424c5..d699ee3f1dc29fc 100644
--- a/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/vmadd-vp.ll
@@ -91,9 +91,8 @@ define <vscale x 1 x i8> @vmadd_vv_nxv1i8_ta(<vscale x 1 x i8> %a, <vscale x 1 x
define <vscale x 1 x i8> @vmadd_vx_nxv1i8_ta(<vscale x 1 x i8> %a, i8 %b, <vscale x 1 x i8> %c, <vscale x 1 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv1i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, mf8, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 1 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 1 x i8> %elt.head, <vscale x 1 x i8> poison, <vscale x 1 x i32> zeroinitializer
@@ -192,9 +191,8 @@ define <vscale x 2 x i8> @vmadd_vv_nxv2i8_ta(<vscale x 2 x i8> %a, <vscale x 2 x
define <vscale x 2 x i8> @vmadd_vx_nxv2i8_ta(<vscale x 2 x i8> %a, i8 %b, <vscale x 2 x i8> %c, <vscale x 2 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv2i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, mf4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 2 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 2 x i8> %elt.head, <vscale x 2 x i8> poison, <vscale x 2 x i32> zeroinitializer
@@ -293,9 +291,8 @@ define <vscale x 4 x i8> @vmadd_vv_nxv4i8_ta(<vscale x 4 x i8> %a, <vscale x 4 x
define <vscale x 4 x i8> @vmadd_vx_nxv4i8_ta(<vscale x 4 x i8> %a, i8 %b, <vscale x 4 x i8> %c, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv4i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, mf2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 4 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 4 x i8> %elt.head, <vscale x 4 x i8> poison, <vscale x 4 x i32> zeroinitializer
@@ -394,9 +391,8 @@ define <vscale x 8 x i8> @vmadd_vv_nxv8i8_ta(<vscale x 8 x i8> %a, <vscale x 8 x
define <vscale x 8 x i8> @vmadd_vx_nxv8i8_ta(<vscale x 8 x i8> %a, i8 %b, <vscale x 8 x i8> %c, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv8i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, m1, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 8 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 8 x i8> %elt.head, <vscale x 8 x i8> poison, <vscale x 8 x i32> zeroinitializer
@@ -495,9 +491,8 @@ define <vscale x 16 x i8> @vmadd_vv_nxv16i8_ta(<vscale x 16 x i8> %a, <vscale x
define <vscale x 16 x i8> @vmadd_vx_nxv16i8_ta(<vscale x 16 x i8> %a, i8 %b, <vscale x 16 x i8> %c, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv16i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, m2, ta, ma
-; CHECK-NEXT: vmacc.vx v10, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v10, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, m2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v10, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 16 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 16 x i8> %elt.head, <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer
@@ -596,9 +591,8 @@ define <vscale x 32 x i8> @vmadd_vv_nxv32i8_ta(<vscale x 32 x i8> %a, <vscale x
define <vscale x 32 x i8> @vmadd_vx_nxv32i8_ta(<vscale x 32 x i8> %a, i8 %b, <vscale x 32 x i8> %c, <vscale x 32 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv32i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, m4, ta, ma
-; CHECK-NEXT: vmacc.vx v12, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v12, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, m4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v12, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 32 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 32 x i8> %elt.head, <vscale x 32 x i8> poison, <vscale x 32 x i32> zeroinitializer
@@ -700,9 +694,8 @@ define <vscale x 64 x i8> @vmadd_vv_nxv64i8_ta(<vscale x 64 x i8> %a, <vscale x
define <vscale x 64 x i8> @vmadd_vx_nxv64i8_ta(<vscale x 64 x i8> %a, i8 %b, <vscale x 64 x i8> %c, <vscale x 64 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv64i8_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e8, m8, ta, ma
-; CHECK-NEXT: vmacc.vx v16, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v16, v0
+; CHECK-NEXT: vsetvli zero, a1, e8, m8, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v16, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 64 x i8> poison, i8 %b, i32 0
%vb = shufflevector <vscale x 64 x i8> %elt.head, <vscale x 64 x i8> poison, <vscale x 64 x i32> zeroinitializer
@@ -801,9 +794,8 @@ define <vscale x 1 x i16> @vmadd_vv_nxv1i16_ta(<vscale x 1 x i16> %a, <vscale x
define <vscale x 1 x i16> @vmadd_vx_nxv1i16_ta(<vscale x 1 x i16> %a, i16 %b, <vscale x 1 x i16> %c, <vscale x 1 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv1i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, mf4, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, mf4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 1 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 1 x i16> %elt.head, <vscale x 1 x i16> poison, <vscale x 1 x i32> zeroinitializer
@@ -902,9 +894,8 @@ define <vscale x 2 x i16> @vmadd_vv_nxv2i16_ta(<vscale x 2 x i16> %a, <vscale x
define <vscale x 2 x i16> @vmadd_vx_nxv2i16_ta(<vscale x 2 x i16> %a, i16 %b, <vscale x 2 x i16> %c, <vscale x 2 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv2i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, mf2, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, mf2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 2 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 2 x i16> %elt.head, <vscale x 2 x i16> poison, <vscale x 2 x i32> zeroinitializer
@@ -1003,9 +994,8 @@ define <vscale x 4 x i16> @vmadd_vv_nxv4i16_ta(<vscale x 4 x i16> %a, <vscale x
define <vscale x 4 x i16> @vmadd_vx_nxv4i16_ta(<vscale x 4 x i16> %a, i16 %b, <vscale x 4 x i16> %c, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv4i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, m1, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, m1, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 4 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 4 x i16> %elt.head, <vscale x 4 x i16> poison, <vscale x 4 x i32> zeroinitializer
@@ -1104,9 +1094,8 @@ define <vscale x 8 x i16> @vmadd_vv_nxv8i16_ta(<vscale x 8 x i16> %a, <vscale x
define <vscale x 8 x i16> @vmadd_vx_nxv8i16_ta(<vscale x 8 x i16> %a, i16 %b, <vscale x 8 x i16> %c, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv8i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, m2, ta, ma
-; CHECK-NEXT: vmacc.vx v10, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v10, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, m2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v10, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 8 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 8 x i16> %elt.head, <vscale x 8 x i16> poison, <vscale x 8 x i32> zeroinitializer
@@ -1205,9 +1194,8 @@ define <vscale x 16 x i16> @vmadd_vv_nxv16i16_ta(<vscale x 16 x i16> %a, <vscale
define <vscale x 16 x i16> @vmadd_vx_nxv16i16_ta(<vscale x 16 x i16> %a, i16 %b, <vscale x 16 x i16> %c, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv16i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, m4, ta, ma
-; CHECK-NEXT: vmacc.vx v12, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v12, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, m4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v12, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 16 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 16 x i16> %elt.head, <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer
@@ -1309,9 +1297,8 @@ define <vscale x 32 x i16> @vmadd_vv_nxv32i16_ta(<vscale x 32 x i16> %a, <vscale
define <vscale x 32 x i16> @vmadd_vx_nxv32i16_ta(<vscale x 32 x i16> %a, i16 %b, <vscale x 32 x i16> %c, <vscale x 32 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv32i16_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e16, m8, ta, ma
-; CHECK-NEXT: vmacc.vx v16, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v16, v0
+; CHECK-NEXT: vsetvli zero, a1, e16, m8, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v16, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 32 x i16> poison, i16 %b, i32 0
%vb = shufflevector <vscale x 32 x i16> %elt.head, <vscale x 32 x i16> poison, <vscale x 32 x i32> zeroinitializer
@@ -1410,9 +1397,8 @@ define <vscale x 1 x i32> @vmadd_vv_nxv1i32_ta(<vscale x 1 x i32> %a, <vscale x
define <vscale x 1 x i32> @vmadd_vx_nxv1i32_ta(<vscale x 1 x i32> %a, i32 %b, <vscale x 1 x i32> %c, <vscale x 1 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv1i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, mf2, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, mf2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 1 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 1 x i32> %elt.head, <vscale x 1 x i32> poison, <vscale x 1 x i32> zeroinitializer
@@ -1511,9 +1497,8 @@ define <vscale x 2 x i32> @vmadd_vv_nxv2i32_ta(<vscale x 2 x i32> %a, <vscale x
define <vscale x 2 x i32> @vmadd_vx_nxv2i32_ta(<vscale x 2 x i32> %a, i32 %b, <vscale x 2 x i32> %c, <vscale x 2 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv2i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, m1, ta, ma
-; CHECK-NEXT: vmacc.vx v9, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, m1, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v9, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 2 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 2 x i32> %elt.head, <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
@@ -1612,9 +1597,8 @@ define <vscale x 4 x i32> @vmadd_vv_nxv4i32_ta(<vscale x 4 x i32> %a, <vscale x
define <vscale x 4 x i32> @vmadd_vx_nxv4i32_ta(<vscale x 4 x i32> %a, i32 %b, <vscale x 4 x i32> %c, <vscale x 4 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv4i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, m2, ta, ma
-; CHECK-NEXT: vmacc.vx v10, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v10, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, m2, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v10, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 4 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 4 x i32> %elt.head, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
@@ -1713,9 +1697,8 @@ define <vscale x 8 x i32> @vmadd_vv_nxv8i32_ta(<vscale x 8 x i32> %a, <vscale x
define <vscale x 8 x i32> @vmadd_vx_nxv8i32_ta(<vscale x 8 x i32> %a, i32 %b, <vscale x 8 x i32> %c, <vscale x 8 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv8i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, m4, ta, ma
-; CHECK-NEXT: vmacc.vx v12, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v12, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, m4, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v12, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 8 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 8 x i32> %elt.head, <vscale x 8 x i32> poison, <vscale x 8 x i32> zeroinitializer
@@ -1817,9 +1800,8 @@ define <vscale x 16 x i32> @vmadd_vv_nxv16i32_ta(<vscale x 16 x i32> %a, <vscale
define <vscale x 16 x i32> @vmadd_vx_nxv16i32_ta(<vscale x 16 x i32> %a, i32 %b, <vscale x 16 x i32> %c, <vscale x 16 x i1> %m, i32 zeroext %evl) {
; CHECK-LABEL: vmadd_vx_nxv16i32_ta:
; CHECK: # %bb.0:
-; CHECK-NEXT: vsetvli zero, a1, e32, m8, ta, ma
-; CHECK-NEXT: vmacc.vx v16, a0, v8
-; CHECK-NEXT: vmerge.vvm v8, v8, v16, v0
+; CHECK-NEXT: vsetvli zero, a1, e32, m8, ta, mu
+; CHECK-NEXT: vmadd.vx v8, a0, v16, v0.t
; CHECK-NEXT: ret
%elt.head = insertelement <vscale x 16 x i32> poison, i32 %b, i32 0
%vb = shufflevector <vscale x 16 x i32> %elt.head, <vscale x 16 x i32> poison, <vscale x 16 x i32> zeroinitializer
@@ -1965,9 +1947,8 @@ define <vscale x 1 x i64> @vmadd_vx_nxv1i64_ta(<vscale x 1 x i64> %a, i64 %b, <v
;
; RV64-LABEL: vmadd_vx_nxv1i64_ta:
; RV64: # %bb.0:
-; RV64-NEXT: vsetvli zero, a1, e64, m1, ta, ma
-; RV64-NEXT: vmacc.vx v9, a0, v8
-; RV64-NEXT: vmerge.vvm v8, v8, v9, v0
+; RV64-NEXT: vsetvli zero, a1, e64, m1, ta, mu
+; RV64-NEXT: vmadd.vx v8, a0, v9, v0.t
; RV64-NEXT: ret
%elt.head = insertelement <vscale x 1 x i64> poison, i64 %b, i32 0
%vb = shufflevector <vscale x 1 x i64> %elt.head, <vscale x 1 x i64> poison, <vscale x 1 x i32> zeroinitializer
@@ -2113,9 +2094,8 @@ define <vscale x 2 x i64> @vmadd_vx_nxv2i64_ta(<vscale x 2 x i64> %a, i64 %b, <v
;
; RV64-LABEL: vmadd_vx_nxv2i64_ta:
; RV64: # %bb.0:
-; RV64-NEXT: vsetvli zero, a1, e64, m2, ta, ma
-; RV64-NEXT: vmacc.vx v10, a0, v8
-; RV64-NEXT: vmerge.vvm v8, v8, v10, v0
+; RV64-NEXT: vsetvli zero, a1, e64, m2, ta, mu
+; RV64-NEXT: vmadd.vx v8, a0, v10, v0.t
; RV64-NEXT: ret
%elt.head = insertelement <vscale x 2 x i64> poison, i64 %b, i32 0
%vb = shufflevector <vscale x 2 x i64> %elt.head, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
@@ -2261,9 +2241,8 @@ define <vscale x 4 x i64> @vmadd_vx_nxv4i64_ta(<vscale x 4 x i64> %a, i64 %b, <v
;
; RV64-LABEL: vmadd_vx_nxv4i64_ta:
; RV64: # %bb.0:
-; RV64-NEXT: vsetvli zero, a1, e64, m4, ta, ma
-; RV64-NEXT: vmacc.vx v12, a0, v8
-; RV64-NEXT: vmerge.vvm v8, v8, v12, v0
+; RV64-NEXT: vsetvli zero, a1, e64, m4, ta, mu
+; RV64-NEXT: vmadd.vx v8, a0, v12, v0.t
; RV64-NEXT: ret
%elt.head = insertelement <vscale x 4 x i64> poison, i64 %b, i32 0
%vb = shufflevector <vscale x 4 x i64> %elt.head, <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
@@ -2412,9 +2391,8 @@ define <vscale x 8 x i64> @vmadd_vx_nxv8i64_ta(<vscale x 8 x i64> %a, i64 %b, <v
;
; RV64-LABEL: vmadd_vx_nxv8i64_ta:
; RV64: # %bb.0:
-; RV64-NEXT: vsetvli zero, a1, e64, m8, ta, ma
-; RV64-NEXT: vmacc.vx v16, a0, v8
-; RV64-NEXT: vmerge.vvm v8, v8, v16, v0
+; RV64-NEXT: vsetvli zero, a1, e64, m8, ta, mu
+; RV64-NEXT: vmadd.vx v8, a0, v16, v0.t
; RV64-NEXT: ret
%elt.head = insertelement <vscale x 8 x i64> poison, i64 %b, i32 0
%vb = shufflevector <vscale x 8 x i64> %elt.head, <vscale x 8 x i64> poison, <vscale x 8 x i32> zeroinitializer
Footnotes

[^1]: unless the VL was shrunk, in which case we conservatively use TUMU.
Haven't read the patch, but I'm not convinced of the correctness of the justification. Consider the following case:

The transform would create:

Note the difference in the last element.

Not sure I got my example entirely correct, but the basic idea I'm aiming for is that if the original sequence produces values which consume 4 distinct input registers, there's no folded form for the VADD which can do the same. Unless I'm missing something?
We have an additional constraint that C has to be the same as PT, though. In order for folding to occur, all of these need to be “equivalent”:
- VADD’s passthru
- VMERGE’s passthru
- VMERGE’s false

The code structure doesn’t make this particularly clear, and by “equivalent” I mean they all need to be undef or equal to VMERGE’s false, IIUC. So we always transform from 3 distinct input registers to 3 input registers: A, B, and VMERGE’s false.

So in the example above, C = PT, so the two results would be the same.
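For concreteness, here is a hedged excerpt of that pattern (the same intrinsic calls as the new true_tied_dest_vmerge_implicit_passthru test, with the operand roles from this discussion annotated; the declarations are as in the diff above):

```llvm
; PT = VMACC's passthru, C = VMERGE's false. The fold requires them to be
; "equivalent" (undef or the same value); VMERGE's own passthru being
; implicit-def (poison) is the case this patch newly handles.
%a = call <vscale x 2 x i32> @llvm.riscv.vmacc.nxv2i32.nxv2i32(
       <vscale x 2 x i32> %passthru,   ; PT
       <vscale x 2 x i32> %x, <vscale x 2 x i32> %y, i64 %avl, i64 0)
%b = call <vscale x 2 x i32> @llvm.riscv.vmerge.nxv2i32.nxv2i32(
       <vscale x 2 x i32> poison,      ; VMERGE's passthru: implicit-def
       <vscale x 2 x i32> %passthru,   ; C: VMERGE's false, equal to PT
       <vscale x 2 x i32> %a,          ; true
       <vscale x 2 x i1> %m, i64 %avl)
; With C == PT there are only three distinct vector inputs (%passthru, %x, %y),
; so a single masked vmacc that reads %passthru for inactive lanes gives the
; same result.
```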
The formatting of the above comment got messed up when I replied via email, sorry about that. Should be fixed now.
Rebased, gentle ping
Rebased, ping
Ping
LGTM.
Thanks. Will wait to hear back from @preames as well to make sure the correctness point is addressed.
Could this PR replace #78403?
Not on its own, because the issue in #78403 stems from the fact that
to
So the vselect/vmerge won't be pulled outside the add, which is needed for the vmerge peephole. With that said, I returned to this PR because I was attempting to generalize #78403 by teaching
Thank you for the explanation.
…n't due to it having an implicit merge operand.
…s tied dest

We currently don't fold a vmerge if it has an implicit merge operand and its true operand has a tied dest (i.e. has a passthru operand). This restriction was added in https://reviews.llvm.org/D151596, back whenever we had separate TU/TA pseudos. It looks like it was added because the policy might not have been handled correctly.

However the policy should be set correctly if we relax this restriction today, since we compute the policy differently now that we have removed the TU/TA distinction in our pseudos. We use a TUMU policy, and relax it to TAMU iff the vmerge's merge operand is implicit. The reasoning behind this being that the tail elements always come from the vmerge's merge operand[1], so if the merge operand is implicit-def then the tail is implicit-def, and hence tail agnostic.

[1] unless the VL was shrunk, in which case we conservatively use TUMU.
Sending out one last ping before I land this. I've rebased it and done another look over, and it still seems correct to me.

This PR allows the transform to happen when both the vmerge and the true node have passthru operands, AND the vmerge's passthru is implicit. It was most likely restricted previously because when the vmerge's passthru is implicit the policy becomes tail agnostic. But even if both the vmerge and the true node have passthru operands, it's still correct to do the transform and use a tail agnostic policy, because the original vmerge's tail was undef (its passthru was implicit), so we can clobber the resulting node's tail with a tail agnostic policy (the VL is the same).
Everywhere you wrote "implicit", did you mean "implicit_def"?
Yes, but I think nowadays this might actually be NoRegister.
LGTM
…d dest (llvm#78565)

We currently don't fold a vmerge if it has an implicit-def passthru and its true operand also has a passthru (i.e. tied dest). This restriction was added in https://reviews.llvm.org/D151596, back whenever we had separate TU/TA pseudos. It looks like it was added because the policy might not have been handled correctly.

However the policy should be set correctly if we relax this restriction today, since we compute the policy differently now that we have removed the TU/TA distinction in our pseudos. We use a TUMU policy, and relax it to TAMU iff the vmerge's passthru is implicit-def. The reasoning behind this being that the tail elements always come from the vmerge's passthru[^1], so if vmerge's passthru is implicit-def then the tail is also implicit-def. So a tail agnostic policy is OK.

[^1]: unless the VL was shrunk, in which case we conservatively use TUMU.
LLVM Buildbot has detected a new failure on builder
Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/219
Here is the relevant piece of the build log for the reference:
We currently don't fold a vmerge if it has an implicit-def passthru and its true operand also has a passthru (i.e. tied dest).
This restriction was added in https://reviews.llvm.org/D151596, back whenever we had separate TU/TA pseudos. It looks like it was added because the policy might not have been handled correctly.
However the policy should be set correctly if we relax this restriction today, since we compute the policy differently now that we have removed the TU/TA distinction in our pseudos.
We use a TUMU policy, and relax it to TAMU iff the vmerge's passthru is implicit-def.
The reasoning is that the tail elements always come from the vmerge's passthru[^1], so if the vmerge's passthru is implicit-def then the tail is also implicit-def. So a tail agnostic policy is OK.
Note
The tests were included in a separate commit in this PR so reviewers can see the diff.
Footnotes
[^1]: unless the VL was shrunk, in which case we conservatively use TUMU.