[CodeGen] Guard copy propagation in machine CSE against undefs #97413
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking.

Typo "propagation"

Definitely needs a test if we're going ahead with this hack

@llvm/pr-subscribers-backend-amdgpu

Author: Vikram Hegde (vikramRH)

Full diff: https://github.com/llvm/llvm-project/pull/97413.diff

2 Files Affected:
diff --git a/llvm/lib/CodeGen/MachineCSE.cpp b/llvm/lib/CodeGen/MachineCSE.cpp
index 4e6101f875589..e39aae56bf116 100644
--- a/llvm/lib/CodeGen/MachineCSE.cpp
+++ b/llvm/lib/CodeGen/MachineCSE.cpp
@@ -184,7 +184,7 @@ bool MachineCSE::PerformTrivialCopyPropagation(MachineInstr *MI,
continue;
bool OnlyOneUse = MRI->hasOneNonDBGUse(Reg);
MachineInstr *DefMI = MRI->getVRegDef(Reg);
- if (!DefMI->isCopy())
+ if (!DefMI || !DefMI->isCopy())
continue;
Register SrcReg = DefMI->getOperand(1).getReg();
if (!SrcReg.isVirtual())
diff --git a/llvm/test/CodeGen/AMDGPU/machine-cse-copyprop.mir b/llvm/test/CodeGen/AMDGPU/machine-cse-copyprop.mir
new file mode 100644
index 0000000000000..ab759657d5d7a
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/machine-cse-copyprop.mir
@@ -0,0 +1,21 @@
+# RUN: llc -mtriple=amdgcn -run-pass=machine-cse -verify-machineinstrs -o - %s | FileCheck %s
+
+# Test to ensure that this does not crash on undefs
+# CHECK-LABEL: name: machine-cse-copyprop
+# CHECK: IMPLICIT_DEF
+# CHECK-NOT: COPY
+# CHECK: S_ADD_I32
+---
+name: machine-cse-copyprop
+tracksRegLiveness: true
+body: |
+ bb.0:
+ %0:sreg_32 = IMPLICIT_DEF
+ %1:sreg_32 = IMPLICIT_DEF
+ %2:sreg_32 = COPY %0
+ %3:sreg_32 = COPY %1
+ %4:sreg_64 = REG_SEQUENCE undef %10:sreg_32, %subreg.sub0, %2:sreg_32, %subreg.sub1
+ %5:sreg_64 = REG_SEQUENCE undef %11:sreg_32, %subreg.sub0, %3:sreg_32, %subreg.sub1
+ %6:sreg_32 = S_ADD_I32 %4.sub1:sreg_64, %5.sub1:sreg_64, implicit-def $scc
+
+...
How did these get produced in the first place?
Here is a use case (which is also the behaviour seen with the new atomic optimizer changes). DPP combine translates the following code
%20:vreg_64 = V_SET_INACTIVE_B64 killed %106:vreg_64, 0, implicit-def dead $scc, implicit $exec
%187:vgpr_32 = V_MOV_B32_dpp %19.sub0:vreg_64(tied-def 0), %20.sub0:vreg_64, 273, 15, 15, 0, implicit $exec
%188:vgpr_32 = V_MOV_B32_dpp %19.sub1:vreg_64(tied-def 0), %20.sub1:vreg_64, 273, 15, 15, 0, implicit $exec
%21:vreg_64 = REG_SEQUENCE %187:vgpr_32, %subreg.sub0, %188:vgpr_32, %subreg.sub1
%111:vgpr_32, %113:sreg_64_xexec = V_ADD_CO_U32_e64 %20.sub0:vreg_64, %21.sub0:vreg_64, 0, implicit $exec
%112:vgpr_32, dead %114:sreg_64_xexec = V_ADDC_U32_e64 %20.sub1:vreg_64, %21.sub1:vreg_64, killed %113:sreg_64_xexec, 0, implicit $exec
to
%19:vreg_64 = V_SET_INACTIVE_B64 killed %105:vreg_64, 0, implicit-def dead $scc, implicit $exec
%187:vgpr_32 = V_MOV_B32_dpp %18.sub1:vreg_64(tied-def 0), %19.sub1:vreg_64, 273, 15, 15, 0, implicit $exec
%20:vreg_64 = REG_SEQUENCE undef %186:vgpr_32, %subreg.sub0, %187:vgpr_32, %subreg.sub1
%110:vgpr_32, %112:sreg_64_xexec = V_ADD_CO_U32_e64_dpp %188:vgpr_32(tied-def 0), %19.sub0:vreg_64, %19.sub0:vreg_64, 0, 273, 15, 15, 1, implicit $exec
%111:vgpr_32, dead %113:sreg_64_xexec = V_ADDC_U32_e64 %19.sub1:vreg_64, %20.sub1:vreg_64, killed %112:sreg_64_xexec, 0, implicit $exec
I'm not sure "NFC" is appropriate if this avoids a compiler crash.
ping |
lgtm with nit. Also, I would still like to see the sample IR that got here; it probably should not happen for anything well formed.
Force-pushed from 91c6382 to 08bdad3
https://godbolt.org/z/K7fPTq8zc (basically the optimized i64 atomic add via the DPP path, which would be generated after #96934).
…97413) Change-Id: If66c6bbe53a41012b8e814042c14dade0ce2852a