Skip to content

[RISCV] Fix same mask vmerge peephole discarding false operand #107827

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Sep 9, 2024

This fixes the issue raised in #106108 (comment)

True's passthru needs to be equivalent to vmerge's false, but we also allow true's passthru to be undef.

However if it's undef then we need to replace it with false, otherwise we end up discarding the false operand entirely.

The changes in fixed-vectors-strided-load-store-asm.ll undo the changes in #106108 where we introduced this miscompile.

This fixes the issue raised in llvm#106108 (comment)

True's passthru needs to be equivalent to vmerge's false, but we also allow true's passthru to be undef.

However if it's undef then we need to replace it with vmerge's false, otherwise we end up discarding the false operand entirely.

The changes in fixed-vectors-strided-load-store-asm.ll undo the changes in llvm#106108 where we introduced this miscompile.
@llvmbot
Copy link
Member

llvmbot commented Sep 9, 2024

@llvm/pr-subscribers-backend-risc-v

Author: Luke Lau (lukel97)

Changes

This fixes the issue raised in #106108 (comment)

True's passthru needs to be equivalent to vmerge's false, but we also allow true's passthru to be undef.

However if it's undef then we need to replace it with false, otherwise we end up discarding the false operand entirely.

The changes in fixed-vectors-strided-load-store-asm.ll undo the changes in #106108 where we introduced this miscompile.


Full diff: https://github.com/llvm/llvm-project/pull/107827.diff

3 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp (+14-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+12-10)
  • (modified) llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir (+21)
diff --git a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
index 48ea2a01876177..1bad53f807546c 100644
--- a/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
+++ b/llvm/lib/Target/RISCV/RISCVVectorPeephole.cpp
@@ -66,7 +66,7 @@ class RISCVVectorPeephole : public MachineFunctionPass {
   bool convertToWholeRegister(MachineInstr &MI) const;
   bool convertToUnmasked(MachineInstr &MI) const;
   bool convertAllOnesVMergeToVMv(MachineInstr &MI) const;
-  bool convertSameMaskVMergeToVMv(MachineInstr &MI) const;
+  bool convertSameMaskVMergeToVMv(MachineInstr &MI);
   bool foldUndefPassthruVMV_V_V(MachineInstr &MI);
   bool foldVMV_V_V(MachineInstr &MI);
 
@@ -396,7 +396,7 @@ bool RISCVVectorPeephole::convertAllOnesVMergeToVMv(MachineInstr &MI) const {
 /// ->
 /// %true = PseudoVADD_VV_M1_MASK %false, %x, %y, %mask, vl1, sew, policy
 /// %x = PseudoVMV_V_V %passthru, %true, vl2, sew, tu_mu
-bool RISCVVectorPeephole::convertSameMaskVMergeToVMv(MachineInstr &MI) const {
+bool RISCVVectorPeephole::convertSameMaskVMergeToVMv(MachineInstr &MI) {
   unsigned NewOpc = getVMV_V_VOpcodeForVMERGE_VVM(MI);
   if (!NewOpc)
     return false;
@@ -405,18 +405,24 @@ bool RISCVVectorPeephole::convertSameMaskVMergeToVMv(MachineInstr &MI) const {
       !hasSameEEW(MI, *True))
     return false;
 
-  // True's passthru needs to be equivalent to False
-  Register TruePassthruReg = True->getOperand(1).getReg();
-  Register FalseReg = MI.getOperand(2).getReg();
-  if (TruePassthruReg != RISCV::NoRegister && TruePassthruReg != FalseReg)
-    return false;
-
   const MachineInstr *TrueV0Def = V0Defs.lookup(True);
   const MachineInstr *MIV0Def = V0Defs.lookup(&MI);
   assert(TrueV0Def && TrueV0Def->isCopy() && MIV0Def && MIV0Def->isCopy());
   if (TrueV0Def->getOperand(1).getReg() != MIV0Def->getOperand(1).getReg())
     return false;
 
+  // True's passthru needs to be equivalent to False
+  Register TruePassthruReg = True->getOperand(1).getReg();
+  Register FalseReg = MI.getOperand(2).getReg();
+  if (TruePassthruReg != FalseReg) {
+    // If True's passthru is undef see if we can change it to False
+    if (TruePassthruReg != RISCV::NoRegister ||
+        !MRI->hasOneUse(MI.getOperand(3).getReg()) ||
+        !ensureDominates(MI.getOperand(2), *True))
+      return false;
+    True->getOperand(1).setReg(MI.getOperand(2).getReg());
+  }
+
   MI.setDesc(TII->get(NewOpc));
   MI.removeOperand(2); // False operand
   MI.removeOperand(3); // Mask operand
diff --git a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
index 569ada7949b1b5..e57b6a22dd6eab 100644
--- a/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
+++ b/llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll
@@ -62,11 +62,12 @@ define void @gather_masked(ptr noalias nocapture %A, ptr noalias nocapture reado
 ; CHECK-NEXT:    li a4, 5
 ; CHECK-NEXT:  .LBB1_1: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, ma
-; CHECK-NEXT:    vlse8.v v8, (a1), a4, v0.t
-; CHECK-NEXT:    vle8.v v9, (a0)
-; CHECK-NEXT:    vadd.vv v8, v9, v8
-; CHECK-NEXT:    vse8.v v8, (a0)
+; CHECK-NEXT:    vmv1r.v v9, v8
+; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, mu
+; CHECK-NEXT:    vlse8.v v9, (a1), a4, v0.t
+; CHECK-NEXT:    vle8.v v10, (a0)
+; CHECK-NEXT:    vadd.vv v9, v10, v9
+; CHECK-NEXT:    vse8.v v9, (a0)
 ; CHECK-NEXT:    addi a0, a0, 32
 ; CHECK-NEXT:    addi a1, a1, 160
 ; CHECK-NEXT:    bne a0, a2, .LBB1_1
@@ -343,11 +344,12 @@ define void @scatter_masked(ptr noalias nocapture %A, ptr noalias nocapture read
 ; CHECK-NEXT:    li a4, 5
 ; CHECK-NEXT:  .LBB7_1: # %vector.body
 ; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
-; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, ma
-; CHECK-NEXT:    vle8.v v8, (a1)
-; CHECK-NEXT:    vlse8.v v9, (a0), a4, v0.t
-; CHECK-NEXT:    vadd.vv v8, v9, v8
-; CHECK-NEXT:    vsse8.v v8, (a0), a4, v0.t
+; CHECK-NEXT:    vsetvli zero, a3, e8, m1, ta, mu
+; CHECK-NEXT:    vle8.v v9, (a1)
+; CHECK-NEXT:    vmv1r.v v10, v8
+; CHECK-NEXT:    vlse8.v v10, (a0), a4, v0.t
+; CHECK-NEXT:    vadd.vv v9, v10, v9
+; CHECK-NEXT:    vsse8.v v9, (a0), a4, v0.t
 ; CHECK-NEXT:    addi a1, a1, 32
 ; CHECK-NEXT:    addi a0, a0, 160
 ; CHECK-NEXT:    bne a1, a2, .LBB7_1
diff --git a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
index 875d4229bbc6e1..2383842bc49d97 100644
--- a/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
+++ b/llvm/test/CodeGen/RISCV/rvv/rvv-peephole-vmerge-to-vmv.mir
@@ -138,3 +138,24 @@ body: |
     %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 4 /* e16 */, 0 /* tu, mu */
     $v0 = COPY %mask
     %x:vrnov0 = PseudoVMERGE_VVM_M1 %pt, %false, %true, $v0, 8, 5 /* e32 */
+...
+---
+name: same_mask_undef_truepassthru
+body: |
+  bb.0:
+    liveins: $v8, $v0
+    ; CHECK-LABEL: name: same_mask_undef_truepassthru
+    ; CHECK: liveins: $v8, $v0
+    ; CHECK-NEXT: {{  $}}
+    ; CHECK-NEXT: %false:vrnov0 = COPY $v8
+    ; CHECK-NEXT: %mask:vr = COPY $v0
+    ; CHECK-NEXT: $v0 = COPY %mask
+    ; CHECK-NEXT: %true:vrnov0 = PseudoVADD_VV_M1_MASK %false, $noreg, $noreg, $v0, 4, 5 /* e32 */, 1 /* ta, mu */
+    ; CHECK-NEXT: $v0 = COPY %mask
+    %false:vrnov0 = COPY $v8
+    %mask:vr = COPY $v0
+    $v0 = COPY %mask
+    %true:vrnov0 = PseudoVADD_VV_M1_MASK $noreg, $noreg, $noreg, $v0, 4, 5 /* e32 */, 0 /* tu, mu */
+    $v0 = COPY %mask
+    %x:vrnov0 = PseudoVMERGE_VVM_M1 $noreg, %false, %true, $v0, 4, 5 /* e32 */
+...

Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Collaborator

@rofirrim rofirrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been testing this patch and fixes all the issues we encountered. Thanks @lukel97 !

@lukel97 lukel97 merged commit 111932d into llvm:main Sep 9, 2024
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 9, 2024

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building llvm at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/5207

Here is the relevant piece of the build log for the reference
Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[38/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/InOneWeekend-hip-6.0.2.dir/workload/ray-tracing/InOneWeekend/main.cc.o -o External/HIP/InOneWeekend-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/InOneWeekend.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend.reference_output-hip-6.0.2
[39/40] /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /buildbot/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
[40/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -o External/HIP/TheNextWeek-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/TheNextWeek.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/TheNextWeek.reference_output-hip-6.0.2
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
+ ninja -v check-hip-simple
@@@BUILD_STEP Testing HIP test-suite@@@
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 338.75s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 338.75s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
program finished with exit code 1
elapsedTime=450.939746

lukel97 added a commit that referenced this pull request Sep 10, 2024
In #107827 we now set true's passthru to the false operand if it was
undef. We need to remember to also constrain the regclass in case true
is a masked pseudo which needs its passthrus to be in VR[M*]NoV0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants