Skip to content

[RISCV] Move VMV0 elimination past machine SSA opts #126850

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 20, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 2 additions & 6 deletions llvm/lib/Target/RISCV/RISCVTargetMachine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -589,8 +589,6 @@ void RISCVPassConfig::addPreEmitPass2() {

void RISCVPassConfig::addMachineSSAOptimization() {
addPass(createRISCVVectorPeepholePass());
// TODO: Move this to pre regalloc
addPass(createRISCVVMV0EliminationPass());
addPass(createRISCVFoldMemOffsetPass());

TargetPassConfig::addMachineSSAOptimization();
Expand All @@ -604,10 +602,6 @@ void RISCVPassConfig::addMachineSSAOptimization() {
}

void RISCVPassConfig::addPreRegAlloc() {
// TODO: Move this as late as possible before regalloc
if (TM->getOptLevel() == CodeGenOptLevel::None)
addPass(createRISCVVMV0EliminationPass());

addPass(createRISCVPreRAExpandPseudoPass());
if (TM->getOptLevel() != CodeGenOptLevel::None) {
addPass(createRISCVMergeBaseOffsetOptPass());
Expand All @@ -621,6 +615,8 @@ void RISCVPassConfig::addPreRegAlloc() {

if (TM->getOptLevel() != CodeGenOptLevel::None && EnableMachinePipeliner)
addPass(&MachinePipelinerID);

addPass(createRISCVVMV0EliminationPass());
}

void RISCVPassConfig::addFastRegAlloc() {
Expand Down
5 changes: 2 additions & 3 deletions llvm/lib/Target/RISCV/RISCVVMV0Elimination.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -131,10 +131,9 @@ bool RISCVVMV0Elimination::runOnMachineFunction(MachineFunction &MF) {

// Peek through a single copy to match what isel does.
if (MachineInstr *SrcMI = MRI.getVRegDef(Src);
SrcMI->isCopy() && SrcMI->getOperand(1).getReg().isVirtual()) {
assert(SrcMI->getOperand(1).getSubReg() == RISCV::NoSubRegister);
SrcMI->isCopy() && SrcMI->getOperand(1).getReg().isVirtual() &&
SrcMI->getOperand(1).getSubReg() == RISCV::NoSubRegister)
Src = SrcMI->getOperand(1).getReg();
}

BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(RISCV::COPY), RISCV::V0)
.addReg(Src);
Expand Down
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/RISCV/O0-pipeline.ll
Original file line number Diff line number Diff line change
Expand Up @@ -39,11 +39,11 @@
; CHECK-NEXT: RISC-V DAG->DAG Pattern Instruction Selection
; CHECK-NEXT: Finalize ISel and expand pseudo-instructions
; CHECK-NEXT: Local Stack Slot Allocation
; CHECK-NEXT: RISC-V VMV0 Elimination
; CHECK-NEXT: RISC-V Pre-RA pseudo instruction expansion pass
; CHECK-NEXT: RISC-V Insert Read/Write CSR Pass
; CHECK-NEXT: RISC-V Insert Write VXRM Pass
; CHECK-NEXT: RISC-V Landing Pad Setup
; CHECK-NEXT: RISC-V VMV0 Elimination
; CHECK-NEXT: Init Undef Pass
; CHECK-NEXT: Eliminate PHI nodes for register allocation
; CHECK-NEXT: Two-Address instruction pass
Expand Down
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/RISCV/O3-pipeline.ll
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,6 @@
; CHECK-NEXT: RISC-V DAG->DAG Pattern Instruction Selection
; CHECK-NEXT: Finalize ISel and expand pseudo-instructions
; CHECK-NEXT: RISC-V Vector Peephole Optimization
; CHECK-NEXT: RISC-V VMV0 Elimination
; CHECK-NEXT: RISC-V Fold Memory Offset
; CHECK-NEXT: Lazy Machine Block Frequency Analysis
; CHECK-NEXT: Early Tail Duplication
Expand Down Expand Up @@ -129,6 +128,7 @@
; CHECK-NEXT: RISC-V Insert Read/Write CSR Pass
; CHECK-NEXT: RISC-V Insert Write VXRM Pass
; CHECK-NEXT: RISC-V Landing Pad Setup
; CHECK-NEXT: RISC-V VMV0 Elimination
; CHECK-NEXT: Detect Dead Lanes
; CHECK-NEXT: Init Undef Pass
; CHECK-NEXT: Process Implicit Definitions
Expand Down
20 changes: 8 additions & 12 deletions llvm/test/CodeGen/RISCV/rvv/ceil-vp.ll
Original file line number Diff line number Diff line change
Expand Up @@ -1515,40 +1515,36 @@ define <vscale x 16 x double> @vp_ceil_vv_nxv16f64(<vscale x 16 x double> %va, <
; CHECK-NEXT: vmv1r.v v0, v6
; CHECK-NEXT: vsetvli zero, a2, e64, m8, ta, ma
; CHECK-NEXT: vfabs.v v24, v16, v0.t
; CHECK-NEXT: addi a2, sp, 16
; CHECK-NEXT: vs8r.v v24, (a2) # Unknown-size Folded Spill
Copy link
Contributor

@wangpc-pp wangpc-pp Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we reload the value right after the spill here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Luke, have you had a chance to look into this? I don't know that this is a blocker, but this one is really weird. It appears to be the only reload from the stack slot - as in, we didn't need to spill anything here at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's the same thing as this case here, definitely very weird: #126850 (comment)

My main suspicion is that the physical COPYs to $v0 throw everything off, possible due to it assigning a register hint. I had a previous idea to rejig the vmv0 elimination pass to instead emit COPYs to vmv0 around the uses, instead of $v0, and see if the register allocator likes that better. I'll give it another shot.

FWIW these test cases with the weird spilling are very artificial (LMUL 16 types) and I would pray we never see them in the real world.

; CHECK-NEXT: vl8r.v v24, (a2) # Unknown-size Folded Reload
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; CHECK-NEXT: vmflt.vf v6, v24, fa5, v0.t
; CHECK-NEXT: fsrmi a2, 3
; CHECK-NEXT: vmv1r.v v0, v6
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, ma
; CHECK-NEXT: vfcvt.x.f.v v24, v16, v0.t
; CHECK-NEXT: addi a3, sp, 16
; CHECK-NEXT: vs8r.v v24, (a3) # Unknown-size Folded Spill
; CHECK-NEXT: fsrm a2
; CHECK-NEXT: addi a2, sp, 16
; CHECK-NEXT: vl8r.v v24, (a2) # Unknown-size Folded Reload
; CHECK-NEXT: vfcvt.f.x.v v24, v24, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; CHECK-NEXT: vfsgnj.vv v16, v24, v16, v0.t
; CHECK-NEXT: vs8r.v v16, (a2) # Unknown-size Folded Spill
; CHECK-NEXT: bltu a0, a1, .LBB44_2
; CHECK-NEXT: # %bb.1:
; CHECK-NEXT: mv a0, a1
; CHECK-NEXT: .LBB44_2:
; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vsetvli zero, a0, e64, m8, ta, ma
; CHECK-NEXT: vfabs.v v16, v8, v0.t
; CHECK-NEXT: vfabs.v v24, v8, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; CHECK-NEXT: vmflt.vf v7, v16, fa5, v0.t
; CHECK-NEXT: vmflt.vf v7, v24, fa5, v0.t
; CHECK-NEXT: fsrmi a0, 3
; CHECK-NEXT: vmv1r.v v0, v7
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, ma
; CHECK-NEXT: vfcvt.x.f.v v16, v8, v0.t
; CHECK-NEXT: vfcvt.x.f.v v24, v8, v0.t
; CHECK-NEXT: fsrm a0
; CHECK-NEXT: vfcvt.f.x.v v16, v16, v0.t
; CHECK-NEXT: vfcvt.f.x.v v24, v24, v0.t
; CHECK-NEXT: vsetvli zero, zero, e64, m8, ta, mu
; CHECK-NEXT: vfsgnj.vv v8, v16, v8, v0.t
; CHECK-NEXT: addi a0, sp, 16
; CHECK-NEXT: vl8r.v v16, (a0) # Unknown-size Folded Reload
; CHECK-NEXT: vfsgnj.vv v8, v24, v8, v0.t
; CHECK-NEXT: csrr a0, vlenb
; CHECK-NEXT: slli a0, a0, 3
; CHECK-NEXT: add sp, sp, a0
Expand Down
Loading