Skip to content

RegAlloc: Fix verifier error after failed allocation #119690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 21, 2025

Conversation

arsenm
Copy link
Contributor

@arsenm arsenm commented Dec 12, 2024

In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.

We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.

The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.

https://reviews.llvm.org/D122616

@arsenm arsenm marked this pull request as ready for review December 12, 2024 10:47
@llvmbot
Copy link
Member

llvmbot commented Dec 12, 2024

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-regalloc

Author: Matt Arsenault (arsenm)

Changes

In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.

We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.

The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.

https://reviews.llvm.org/D122616


Full diff: https://github.com/llvm/llvm-project/pull/119690.diff

10 Files Affected:

  • (modified) llvm/lib/CodeGen/RegAllocBase.cpp (+36)
  • (modified) llvm/lib/CodeGen/RegAllocBase.h (+6)
  • (modified) llvm/lib/CodeGen/RegAllocBasic.cpp (+1)
  • (modified) llvm/lib/CodeGen/RegAllocGreedy.cpp (+1)
  • (modified) llvm/test/CodeGen/AMDGPU/illegal-eviction-assert.mir (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/issue48473.mir (+2-1)
  • (modified) llvm/test/CodeGen/AMDGPU/regalloc-failure-overlapping-insert-assert.mir (+2-5)
  • (added) llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure0.mir (+59)
  • (added) llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure1.ll (+30)
  • (modified) llvm/test/CodeGen/AMDGPU/remaining-virtual-register-operands.ll (+1-2)
diff --git a/llvm/lib/CodeGen/RegAllocBase.cpp b/llvm/lib/CodeGen/RegAllocBase.cpp
index 980a6756963d9f..bb0c8a32a7bc92 100644
--- a/llvm/lib/CodeGen/RegAllocBase.cpp
+++ b/llvm/lib/CodeGen/RegAllocBase.cpp
@@ -65,6 +65,7 @@ void RegAllocBase::init(VirtRegMap &vrm, LiveIntervals &lis,
   Matrix = &mat;
   MRI->freezeReservedRegs();
   RegClassInfo.runOnMachineFunction(vrm.getMachineFunction());
+  FailedVRegs.clear();
 }
 
 // Visit all the live registers. If they are already assigned to a physical
@@ -128,6 +129,7 @@ void RegAllocBase::allocatePhysRegs() {
 
       // Keep going after reporting the error.
       VRM->assignVirt2Phys(VirtReg->reg(), AvailablePhysReg);
+      FailedVRegs.insert(VirtReg->reg());
     } else if (AvailablePhysReg)
       Matrix->assign(*VirtReg, AvailablePhysReg);
 
@@ -161,6 +163,40 @@ void RegAllocBase::postOptimization() {
   DeadRemats.clear();
 }
 
+void RegAllocBase::cleanupFailedVRegs() {
+  SmallSet<Register, 8> JunkRegs;
+
+  for (Register FailedReg : FailedVRegs) {
+    JunkRegs.insert(FailedReg);
+
+    MCRegister PhysReg = VRM->getPhys(FailedReg);
+    LiveInterval &FailedInterval = LIS->getInterval(FailedReg);
+
+    // The liveness information for the failed register and anything interfering
+    // with the physical register we arbitrarily chose is junk and needs to be
+    // deleted.
+    for (MCRegUnitIterator Units(PhysReg, TRI); Units.isValid(); ++Units) {
+      LiveIntervalUnion::Query &Q = Matrix->query(FailedInterval, *Units);
+      for (const LiveInterval *InterferingReg : Q.interferingVRegs())
+        JunkRegs.insert(InterferingReg->reg());
+    }
+  }
+
+  // TODO: Probably need to set undef on any physreg uses not associated with
+  // a virtual register.
+  for (Register JunkReg : JunkRegs) {
+    // We still should produce valid IR. Kill all the uses and reduce the live
+    // ranges so that we don't think it's possible to introduce kill flags
+    // later which will fail the verifier.
+    for (MachineOperand &MO : MRI->reg_operands(JunkReg)) {
+      if (MO.readsReg())
+        MO.setIsUndef(true);
+    }
+
+    LIS->shrinkToUses(&LIS->getInterval(JunkReg));
+  }
+}
+
 void RegAllocBase::enqueue(const LiveInterval *LI) {
   const Register Reg = LI->reg();
 
diff --git a/llvm/lib/CodeGen/RegAllocBase.h b/llvm/lib/CodeGen/RegAllocBase.h
index 5bd52da61f2dc5..1fdbab694bb0e3 100644
--- a/llvm/lib/CodeGen/RegAllocBase.h
+++ b/llvm/lib/CodeGen/RegAllocBase.h
@@ -37,6 +37,7 @@
 #define LLVM_LIB_CODEGEN_REGALLOCBASE_H
 
 #include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallSet.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegAllocCommon.h"
 #include "llvm/CodeGen/RegisterClassInfo.h"
@@ -81,6 +82,7 @@ class RegAllocBase {
   /// always available for the remat of all the siblings of the original reg.
   SmallPtrSet<MachineInstr *, 32> DeadRemats;
 
+  SmallSet<Register, 2> FailedVRegs;
   RegAllocBase(const RegAllocFilterFunc F = nullptr)
       : shouldAllocateRegisterImpl(F) {}
 
@@ -104,6 +106,10 @@ class RegAllocBase {
   // rematerialization.
   virtual void postOptimization();
 
+  /// Perform cleanups on registers that failed to allocate. This hacks on the
+  /// liveness in order to avoid spurious verifier errors in later passes.
+  void cleanupFailedVRegs();
+
   // Get a temporary reference to a Spiller instance.
   virtual Spiller &spiller() = 0;
 
diff --git a/llvm/lib/CodeGen/RegAllocBasic.cpp b/llvm/lib/CodeGen/RegAllocBasic.cpp
index c05aa1e40e4779..7393267c24980b 100644
--- a/llvm/lib/CodeGen/RegAllocBasic.cpp
+++ b/llvm/lib/CodeGen/RegAllocBasic.cpp
@@ -323,6 +323,7 @@ bool RABasic::runOnMachineFunction(MachineFunction &mf) {
 
   allocatePhysRegs();
   postOptimization();
+  cleanupFailedVRegs();
 
   // Diagnostic output before rewriting
   LLVM_DEBUG(dbgs() << "Post alloc VirtRegMap:\n" << *VRM << "\n");
diff --git a/llvm/lib/CodeGen/RegAllocGreedy.cpp b/llvm/lib/CodeGen/RegAllocGreedy.cpp
index af48e916feab45..f746762b223818 100644
--- a/llvm/lib/CodeGen/RegAllocGreedy.cpp
+++ b/llvm/lib/CodeGen/RegAllocGreedy.cpp
@@ -2772,6 +2772,7 @@ bool RAGreedy::runOnMachineFunction(MachineFunction &mf) {
   if (VerifyEnabled)
     MF->verify(this, "Before post optimization", &errs());
   postOptimization();
+  cleanupFailedVRegs();
   reportStats();
 
   releaseMemory();
diff --git a/llvm/test/CodeGen/AMDGPU/illegal-eviction-assert.mir b/llvm/test/CodeGen/AMDGPU/illegal-eviction-assert.mir
index 99c27fa0bc95c4..443d1e9d1bdf08 100644
--- a/llvm/test/CodeGen/AMDGPU/illegal-eviction-assert.mir
+++ b/llvm/test/CodeGen/AMDGPU/illegal-eviction-assert.mir
@@ -17,8 +17,8 @@
 
 ...
 
-# CHECK: S_NOP 0, implicit-def renamable $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, implicit-def renamable $vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27, implicit-def renamable $vgpr0_vgpr1_vgpr2_vgpr3, implicit-def renamable $vgpr28_vgpr29_vgpr30_vgpr31, implicit-def renamable $vgpr0_vgpr1_vgpr2_vgpr3
-# CHECK: S_NOP 0, implicit killed renamable $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, implicit killed renamable $vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27, implicit killed renamable $vgpr0_vgpr1_vgpr2_vgpr3, implicit killed renamable $vgpr28_vgpr29_vgpr30_vgpr31, implicit killed renamable $vgpr0_vgpr1_vgpr2_vgpr3
+# CHECK: S_NOP 0, implicit-def renamable $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, implicit-def renamable $vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27, implicit-def dead renamable $vgpr0_vgpr1_vgpr2_vgpr3, implicit-def renamable $vgpr28_vgpr29_vgpr30_vgpr31, implicit-def dead renamable $vgpr0_vgpr1_vgpr2_vgpr3
+# CHECK: S_NOP 0, implicit killed renamable $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, implicit killed renamable $vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27, implicit undef renamable $vgpr0_vgpr1_vgpr2_vgpr3, implicit killed renamable $vgpr28_vgpr29_vgpr30_vgpr31, implicit undef renamable $vgpr0_vgpr1_vgpr2_vgpr3
 
 ---
 name:            foo
diff --git a/llvm/test/CodeGen/AMDGPU/issue48473.mir b/llvm/test/CodeGen/AMDGPU/issue48473.mir
index e272bd34803831..ba03855d01cd9b 100644
--- a/llvm/test/CodeGen/AMDGPU/issue48473.mir
+++ b/llvm/test/CodeGen/AMDGPU/issue48473.mir
@@ -43,7 +43,8 @@
 # %25 to $sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67
 
 # CHECK-LABEL: name: issue48473
-# CHECK: S_NOP 0, implicit killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, implicit killed renamable $sgpr12_sgpr13_sgpr14_sgpr15, implicit killed renamable $sgpr16_sgpr17_sgpr18_sgpr19_sgpr20_sgpr21_sgpr22_sgpr23, implicit killed renamable $sgpr24_sgpr25_sgpr26_sgpr27_sgpr28_sgpr29_sgpr30_sgpr31, implicit killed renamable $sgpr84_sgpr85_sgpr86_sgpr87, implicit killed renamable $sgpr36_sgpr37_sgpr38_sgpr39_sgpr40_sgpr41_sgpr42_sgpr43, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7, implicit killed renamable $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51, implicit killed renamable $sgpr88_sgpr89_sgpr90_sgpr91, implicit killed renamable $sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83, implicit killed renamable $sgpr0_sgpr1_sgpr2_sgpr3, implicit killed renamable $sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59, implicit killed renamable $sgpr92_sgpr93_sgpr94_sgpr95, implicit killed renamable $sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75, implicit renamable $sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75, implicit killed renamable $sgpr96_sgpr97_sgpr98_sgpr99, implicit killed renamable $sgpr8_sgpr9_sgpr10_sgpr11, implicit killed renamable $sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67
+# CHECK: S_NOP 0, implicit undef renamable $sgpr0_sgpr1_sgpr2_sgpr3, implicit killed renamable $sgpr12_sgpr13_sgpr14_sgpr15, implicit killed renamable $sgpr16_sgpr17_sgpr18_sgpr19_sgpr20_sgpr21_sgpr22_sgpr23, implicit killed renamable $sgpr24_sgpr25_sgpr26_sgpr27_sgpr28_sgpr29_sgpr30_sgpr31, implicit killed renamable $sgpr84_sgpr85_sgpr86_sgpr87, implicit killed renamable $sgpr36_sgpr37_sgpr38_sgpr39_sgpr40_sgpr41_sgpr42_sgpr43, implicit killed renamable $sgpr4_sgpr5_sgpr6_sgpr7, implicit killed renamable $sgpr44_sgpr45_sgpr46_sgpr47_sgpr48_sgpr49_sgpr50_sgpr51, implicit killed renamable $sgpr88_sgpr89_sgpr90_sgpr91, implicit killed renamable $sgpr76_sgpr77_sgpr78_sgpr79_sgpr80_sgpr81_sgpr82_sgpr83, implicit undef renamable $sgpr0_sgpr1_sgpr2_sgpr3, implicit killed renamable $sgpr52_sgpr53_sgpr54_sgpr55_sgpr56_sgpr57_sgpr58_sgpr59, implicit killed renamable $sgpr92_sgpr93_sgpr94_sgpr95, implicit killed renamable $sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75, implicit renamable $sgpr68_sgpr69_sgpr70_sgpr71_sgpr72_sgpr73_sgpr74_sgpr75, implicit killed renamable $sgpr96_sgpr97_sgpr98_sgpr99, implicit killed renamable $sgpr8_sgpr9_sgpr10_sgpr11, implicit killed renamable $sgpr60_sgpr61_sgpr62_sgpr63_sgpr64_sgpr65_sgpr66_sgpr67
+
 
 ---
 name:            issue48473
diff --git a/llvm/test/CodeGen/AMDGPU/regalloc-failure-overlapping-insert-assert.mir b/llvm/test/CodeGen/AMDGPU/regalloc-failure-overlapping-insert-assert.mir
index fe01728c005633..1e33f242668a60 100644
--- a/llvm/test/CodeGen/AMDGPU/regalloc-failure-overlapping-insert-assert.mir
+++ b/llvm/test/CodeGen/AMDGPU/regalloc-failure-overlapping-insert-assert.mir
@@ -1,10 +1,7 @@
-# RUN: not llc -mtriple=amdgcn -mcpu=gfx908 -verify-machineinstrs=0 -start-before=greedy,1 -stop-after=virtregrewriter,2 %s -o /dev/null 2>&1 | FileCheck -check-prefix=ERR %s
-# RUN: not --crash llc -mtriple=amdgcn -mcpu=gfx908 -verify-machineinstrs -start-before=greedy,1 -stop-after=virtregrewriter,2 %s -o /dev/null 2>&1 | FileCheck -check-prefixes=ERR,VERIFIER %s
-
-# FIXME: We should not produce a verifier error after erroring
+# RUN: not llc -mtriple=amdgcn -mcpu=gfx908 -verify-machineinstrs -start-before=greedy,1 -stop-after=virtregrewriter,1 %s -filetype=null 2>&1 | FileCheck -check-prefix=ERR %s
 
 # ERR: error: inline assembly requires more registers than available
-# VERIFIER: *** Bad machine code: Using an undefined physical register ***
+# ERR-NOT: Bad machine code
 
 # This testcase cannot be compiled with the enforced register
 # budget. Previously, tryLastChanceRecoloring would assert here. It
diff --git a/llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure0.mir b/llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure0.mir
new file mode 100644
index 00000000000000..15102d7ecb6662
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure0.mir
@@ -0,0 +1,59 @@
+# RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -vgpr-regalloc=basic -sgpr-regalloc=basic -start-before=regallocbasic,0 -stop-after=virtregrewriter,2 -verify-machineinstrs -o - %s 2> %t.basic.err | FileCheck -check-prefix=BASIC %s
+# RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -start-before=greedy,0 -stop-after=virtregrewriter,2 -verify-machineinstrs -o - %s 2> %t.greedy.err | FileCheck -check-prefix=GREEDY %s
+
+# RUN: FileCheck -check-prefix=ERR -implicit-check-not=error %s < %t.basic.err
+# RUN: FileCheck -check-prefix=ERR -implicit-check-not=error %s < %t.greedy.err
+
+# This testcase must fail register allocation. It should also not
+# produce a verifier error after doing so. Previously, it would not
+# properly update the liveness for the dummy selected register. As a
+# result, VirtRegRewriter would incorrectly add kill flags which
+# combined with other uses of the physical register produced a
+# verifier error.
+
+# ERR: error: <unknown>:0:0: ran out of registers during register allocation
+
+# GREEDY: SI_SPILL_V256_SAVE undef $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7
+# GREEDY-NEXT: SI_SPILL_V512_SAVE undef $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19
+# GREEDY-NEXT: SI_SPILL_V128_SAVE undef $vgpr0_vgpr1_vgpr2_vgpr3
+
+# GREEDY: dead renamable $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19 = SI_SPILL_V512_RESTORE
+# GREEDY: dead renamable $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7 = SI_SPILL_V256_RESTORE
+# GREEDY: S_NOP 0, implicit undef renamable $vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19, implicit undef renamable $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7, implicit undef renamable $vgpr0_vgpr1_vgpr2_vgpr3
+# GREEDY: S_NOP 0, implicit killed renamable $vgpr20_vgpr21
+
+
+# BASIC: SI_SPILL_V128_SAVE undef $vgpr0_vgpr1_vgpr2_vgpr3
+# BASIC: SI_SPILL_V256_SAVE killed $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23
+# BASIC: SI_SPILL_V512_SAVE undef $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15
+# BASIC: SI_SPILL_V64_SAVE killed $vgpr0_vgpr1, %stack.1, $sgpr32, 0, implicit $exec :: (store (s64) into %stack.1, align 4, addrspace 5)
+# BASIC: dead renamable $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = SI_SPILL_V512_RESTORE
+# BASIC: renamable $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23 = SI_SPILL_V256_RESTORE
+# BASIC: dead renamable $vgpr0_vgpr1_vgpr2_vgpr3 = SI_SPILL_V128_RESTORE
+# BASIC: S_NOP 0, implicit undef renamable $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15, implicit killed renamable $vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23, implicit undef renamable $vgpr0_vgpr1_vgpr2_vgpr3
+# BASIC: renamable $vgpr0_vgpr1 = SI_SPILL_V64_RESTORE
+
+--- |
+  define void @killed_reg_after_regalloc_failure() #0 {
+    ret void
+  }
+
+  attributes #0 = { "amdgpu-waves-per-eu"="10,10" }
+
+...
+---
+name:            killed_reg_after_regalloc_failure
+tracksRegLiveness: true
+machineFunctionInfo:
+  scratchRSrcReg:  '$sgpr0_sgpr1_sgpr2_sgpr3'
+  frameOffsetReg:  '$sgpr33'
+  stackPtrOffsetReg: '$sgpr32'
+body:             |
+  bb.0:
+    S_NOP 0, implicit-def %0:vreg_512, implicit-def %1:vreg_256, implicit-def %2:vreg_128
+    S_NOP 0, implicit-def %3:vreg_64
+    S_NOP 0, implicit %0, implicit %1, implicit %2
+    S_NOP 0, implicit %3
+    S_ENDPGM 0
+
+...
diff --git a/llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure1.ll b/llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure1.ll
new file mode 100644
index 00000000000000..5e466a9470fc53
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure1.ll
@@ -0,0 +1,30 @@
+; RUN: not llc -mtriple=amdgcn -mcpu=gfx908 -verify-machineinstrs -filetype=null %s 2>&1 | FileCheck -check-prefix=ERR -implicit-check-not=error %s
+
+; ERR: error: inline assembly requires more registers than available
+; ERR-NOT: ERROR
+; ERR-NOT: Bad machine code
+
+; This test requires respecting undef on the spill source operand when
+; expanding the pseudos to avoid all verifier errors
+
+%asm.output = type { <16 x i32>, <8 x i32>, <4 x i32>, <3 x i32>, <3 x i32> }
+
+define void @foo(<32 x i32> addrspace(1)* %arg) #0 {
+  %agpr0 = call i32 asm sideeffect "; def $0","=${a0}"()
+  %asm = call %asm.output asm sideeffect "; def $0 $1 $2 $3 $4","=v,=v,=v,=v,=v"()
+  %vgpr0 = extractvalue %asm.output %asm, 0
+  %vgpr1 = extractvalue %asm.output %asm, 1
+  %vgpr2 = extractvalue %asm.output %asm, 2
+  %vgpr3 = extractvalue %asm.output %asm, 3
+  %vgpr4 = extractvalue %asm.output %asm, 4
+  call void asm sideeffect "; clobber", "~{a[0:31]},~{v[0:31]}"()
+  call void asm sideeffect "; use $0","v"(<16 x i32> %vgpr0)
+  call void asm sideeffect "; use $0","v"(<8 x i32> %vgpr1)
+  call void asm sideeffect "; use $0","v"(<4 x i32> %vgpr2)
+  call void asm sideeffect "; use $0","v"(<3 x i32> %vgpr3)
+  call void asm sideeffect "; use $0","v"(<3 x i32> %vgpr4)
+  call void asm sideeffect "; use $0","{a1}"(i32 %agpr0)
+  ret void
+}
+
+attributes #0 = { "amdgpu-waves-per-eu"="8,8" }
diff --git a/llvm/test/CodeGen/AMDGPU/remaining-virtual-register-operands.ll b/llvm/test/CodeGen/AMDGPU/remaining-virtual-register-operands.ll
index 8e3054cceb85b7..d09d6608b216eb 100644
--- a/llvm/test/CodeGen/AMDGPU/remaining-virtual-register-operands.ll
+++ b/llvm/test/CodeGen/AMDGPU/remaining-virtual-register-operands.ll
@@ -1,4 +1,4 @@
-; RUN: not --crash llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs < %s 2>&1 | FileCheck %s
+; RUN: not llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 -verify-machineinstrs -filetype=null %s 2>&1 | FileCheck -implicit-check-not=error %s
 
 ; This testcase fails register allocation at the same time it performs
 ; virtual register splitting (by introducing VGPR to AGPR copies). We
@@ -11,7 +11,6 @@
 ; it takes the first avialable register.
 
 ; CHECK: error: <unknown>:0:0: ran out of registers during register allocation
-; CHECK: Bad machine code: Using an undefined physical register
 define amdgpu_kernel void @alloc_failure_with_split_vregs(float %v0, float %v1) #0 {
   %agpr0 = call float asm sideeffect "; def $0", "=${a0}"()
   %agpr.vec = insertelement <16 x float> undef, float %agpr0, i32 0

@arsenm arsenm force-pushed the users/arsenm/regalloc-fix-verifier-error-after-fail branch from 5c46398 to bca529a Compare December 12, 2024 10:49
Copy link

github-actions bot commented Dec 12, 2024

⚠️ undef deprecator found issues in your code. ⚠️

You can test this locally with the following command:
git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' ad3f7d2c71b062dd1c2fb1fa78e81cc7b3ba53e9 d906c567b996ca992fa4cf4385137e8eb96ca269 llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure1.ll llvm/lib/CodeGen/RegAllocBase.cpp llvm/lib/CodeGen/RegAllocBase.h llvm/lib/CodeGen/RegAllocBasic.cpp llvm/lib/CodeGen/RegAllocGreedy.cpp llvm/test/CodeGen/AMDGPU/ran-out-of-registers-error-all-regs-reserved.ll llvm/test/CodeGen/AMDGPU/remaining-virtual-register-operands.ll llvm/test/CodeGen/X86/inline-asm-assertion.ll

The following files introduce new uses of undef:

  • llvm/lib/CodeGen/RegAllocBase.cpp
  • llvm/test/CodeGen/AMDGPU/register-killed-error-after-alloc-failure1.ll

Undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. You should use poison values for placeholders instead.

In tests, avoid using undef and having tests that trigger undefined behavior. If you need an operand with some unimportant value, you can add a new argument to the function and use that instead.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.

@arsenm arsenm force-pushed the users/arsenm/amdgpu-delete-spills-of-undef-values branch from 2bd0046 to 0f84c63 Compare December 12, 2024 12:08
@arsenm arsenm force-pushed the users/arsenm/regalloc-fix-verifier-error-after-fail branch from bca529a to 8d34266 Compare December 12, 2024 12:08
@arsenm arsenm force-pushed the users/arsenm/amdgpu-delete-spills-of-undef-values branch from 0f84c63 to 08b4cd5 Compare December 13, 2024 01:01
@arsenm arsenm force-pushed the users/arsenm/regalloc-fix-verifier-error-after-fail branch 2 times, most recently from 4ef1f6e to 060a3a6 Compare December 13, 2024 03:07
@arsenm arsenm force-pushed the users/arsenm/amdgpu-delete-spills-of-undef-values branch from 08b4cd5 to 62956df Compare December 16, 2024 01:59
@arsenm arsenm force-pushed the users/arsenm/regalloc-fix-verifier-error-after-fail branch from 060a3a6 to 2c3b6e5 Compare December 16, 2024 02:00
@arsenm arsenm force-pushed the users/arsenm/amdgpu-delete-spills-of-undef-values branch from 62956df to 13d828c Compare December 17, 2024 06:02
Base automatically changed from users/arsenm/amdgpu-delete-spills-of-undef-values to main December 17, 2024 06:08
@RKSimon
Copy link
Collaborator

RKSimon commented Dec 17, 2024

@arsenm any luck with the CodeGen/AMDGPU/register-killed-error-after-alloc-failure0.mir test failure?

@arsenm arsenm force-pushed the users/arsenm/regalloc-fix-verifier-error-after-fail branch from 2c3b6e5 to b26be2e Compare December 18, 2024 02:47
@arsenm
Copy link
Contributor Author

arsenm commented Dec 18, 2024

@arsenm any luck with the CodeGen/AMDGPU/register-killed-error-after-alloc-failure0.mir test failure?

I get different stack IDs for the spills on macOS. Not sure how much I should care, I could just hack around it with a regex check line

In some cases after reporting an allocation failure, this would fail
the verifier. It picks the first allocatable register and assigns it,
but didn't update the liveness appropriately. When VirtRegRewriter
relied on the liveness to set kill flags, it would incorrectly add
kill flags if there was another overlapping kill of the virtual
register.

We can't properly assign the register to an overlapping range, so
break the liveness of the failing register (and any other interfering
registers) instead. Give the virtual register dummy liveness by
effectively deleting all the uses by setting them to undef.

The edge case not tested here which I'm worried about is if the read
of the register is a def of a subregister. I've been unable to come up
with a test where this occurs.

https://reviews.llvm.org/D122616
@arsenm arsenm force-pushed the users/arsenm/regalloc-fix-verifier-error-after-fail branch from 0848303 to d906c56 Compare February 21, 2025 11:52
@arsenm
Copy link
Contributor Author

arsenm commented Feb 21, 2025

Current failure seems unrelated

@arsenm arsenm merged commit 34167f9 into main Feb 21, 2025
8 of 11 checks passed
@arsenm arsenm deleted the users/arsenm/regalloc-fix-verifier-error-after-fail branch February 21, 2025 15:11
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 21, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-ubuntu running on as-builder-4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/187/builds/4464

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/ran-out-of-registers-errors.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: not /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn-amd-amdhsa -stress-regalloc=1 -vgpr-regalloc=greedy -filetype=null /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll 2>&1 | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefixes=CHECK,GREEDY -implicit-check-not=error /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll
+ not /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn-amd-amdhsa -stress-regalloc=1 -vgpr-regalloc=greedy -filetype=null /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -check-prefixes=CHECK,GREEDY -implicit-check-not=error /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll
RUN: at line 2: not /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn-amd-amdhsa -stress-regalloc=1 -vgpr-regalloc=basic -filetype=null /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll 2>&1 | /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -implicit-check-not=error -check-prefixes=CHECK,BASIC /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll
+ /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/FileCheck -implicit-check-not=error -check-prefixes=CHECK,BASIC /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll
+ not /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/build/bin/llc -mtriple=amdgcn-amd-amdhsa -stress-regalloc=1 -vgpr-regalloc=basic -filetype=null /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll
/home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll:28:10: error: CHECK: expected string not found in input
; CHECK: error: inline assembly requires more registers than available at line 23
         ^
<stdin>:1:113: note: scanning from here
error: <unknown>:0:0: ran out of registers during register allocation in function 'ran_out_of_registers_general'
                                                                                                                ^

Input file: <stdin>
Check file: /home/buildbot/worker/as-builder-4/ramdisk/expensive-checks/llvm-project/llvm/test/CodeGen/AMDGPU/ran-out-of-registers-errors.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: error: <unknown>:0:0: ran out of registers during register allocation in function 'ran_out_of_registers_general' 
check:28                                                                                                                     X error: no match found
          2:  
check:28     ~
          3: # After Basic Register Allocator 
check:28     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          4: ********** INTERVALS ********** 
check:28     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          5: VGPR0_LO16 [0B,32r:0)[140r,144r:1) 0@0B-phi 1@140r 
check:28     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          6: VGPR0_HI16 [0B,32r:0)[140r,144r:1) 0@0B-phi 1@140r 
check:28     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          .
          .
          .
>>>>>>

--

********************


@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 21, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/14149

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
RUN: at line 1: not /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+xnack -amdgpu-use-amdgpu-trackers=1  2>&1  < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=ERR-GCNTRACKERS /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll
+ not /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -mattr=+xnack -amdgpu-use-amdgpu-trackers=1
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck -check-prefixes=ERR-GCNTRACKERS /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll

--

********************


arsenm added a commit that referenced this pull request Feb 21, 2025
This reverts commit 34167f9.

Different set of verifier errors appears after other regalloc failure
tests with EXPENSIVE_CHECKS.
arsenm added a commit that referenced this pull request Feb 23, 2025
This reverts commit 0c50054.

Reapply with more fixes to avoid expensive_checks failures. Make sure to
call splitSeparateComponents after shrinkToUses, and update the VirtRegMap
with the split registers. Also set undef on all physical register aliases to
the assigned register.
arsenm added a commit that referenced this pull request Feb 26, 2025
This reverts commit 0c50054.

Reapply with more fixes to avoid expensive_checks failures. Make sure to
call splitSeparateComponents after shrinkToUses, and update the VirtRegMap
with the split registers. Also set undef on all physical register aliases to
the assigned register.
arsenm added a commit that referenced this pull request Feb 26, 2025
…" (#128400)

Reapply "RegAlloc: Fix verifier error after failed allocation (#119690)"

This reverts commit 0c50054.

Reapply with more fixes to avoid expensive_checks failures. Make sure to
call splitSeparateComponents after shrinkToUses, and update the VirtRegMap
with the split registers. Also set undef on all physical register aliases to
the assigned register.

Move physreg handling. Not sure if necessary

Remove intervals from regunits. Not sure if necessary
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants