[WIP] Enhance 3D register allocation strategy #442
base: aie-public
Force-pushed from 6143ec2 to 9c52370.
TODO: Verify whether the following change has any negative impact. Implement TRI.getSubClassWithSubReg to avoid generation of …
Rather, force it to generate 7 copies, one for each individual sub-register. This will help super-reg-rewrite split them into new virtual registers with their own live ranges.
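The intent of this commit can be pictured with a small, LLVM-free C++ model: instead of one copy of the whole 3D register, the split emits one COPY per sub-register. This is only a conceptual sketch under stated assumptions; it does not show how TRI.getSubClassWithSubReg is overridden, and the sub-register names and the SubRegCopy / expandToSubRegCopies helpers are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical names for the seven sub-registers of a 3D register, following
// the m / dn / dj / dc / dn / dj / dc layout quoted in this thread
// (e.g. m1 dn1 dj1 dc1 dn5 dj5 dc5 for $d1_3d). Not the backend's real
// sub-register indices.
static const std::array<std::string, 7> k3DSubRegs = {
    "m", "dn_lo", "dj_lo", "dc_lo", "dn_hi", "dj_hi", "dc_hi"};

// One COPY of a single sub-register from SrcVReg into DstVReg (hypothetical).
struct SubRegCopy {
  unsigned DstVReg;
  unsigned SrcVReg;
  std::string SubRegIdx;
};

// Emit one COPY per sub-register instead of a single COPY of the whole 3D
// register, so each piece can later be rewritten into its own virtual
// register with its own live range.
std::vector<SubRegCopy> expandToSubRegCopies(unsigned Dst, unsigned Src) {
  std::vector<SubRegCopy> Copies;
  Copies.reserve(k3DSubRegs.size());
  for (const std::string &Idx : k3DSubRegs)
    Copies.push_back({Dst, Src, Idx});
  assert(Copies.size() == 7 && "a 3D register has seven sub-registers");
  return Copies;
}
```

Each resulting copy touches exactly one sub-register, which is what lets a later rewrite give every piece an independent live range.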
Force-pushed from 9c52370 to f26d771.
TODO:
@@ -66,6 +67,173 @@ void AIE2PPassConfig::addPreRegBankSelect() {
}
}

static bool onlyAllocateLIwith3DInstruction(MachineRegisterInfo &MRI,
nit: onlyAllocateLIWith3DInstruction
We have access to TII, so we could move this switch to that class as an AIE-specific hook. We could also give it a nicer name, something like isHighPriorityLIUseInstruction (not sure about this name).
Yes, the idea of exposing TII was to move the switch into an AIE-specific hook. I chose to delay that until I see some results.
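For reference, a minimal sketch of what moving the opcode switch behind an AIE-specific TII hook could look like. Everything here is hypothetical: InstrInfoBase is a stand-in for TargetInstrInfo, isHighPriorityLIUseInstruction is the name floated above rather than an existing LLVM hook, only PADD_3D comes from this thread (the other opcodes are placeholders), and whether the pass should check any use or all uses of the live interval is an assumption.

```cpp
#include <cassert>
#include <vector>

// Placeholder opcodes. PADD_3D is mentioned in this thread; LDA_3D, ST_3D,
// ADD and MOV are hypothetical stand-ins for generated AIE2P opcodes.
enum Opcode : unsigned { PADD_3D, LDA_3D, ST_3D, ADD, MOV };

// Stand-in for a target-independent instruction-info class (hypothetical).
struct InstrInfoBase {
  virtual ~InstrInfoBase() = default;
  // Default: no instruction makes its live interval high priority.
  virtual bool isHighPriorityLIUseInstruction(unsigned /*Opc*/) const {
    return false;
  }
};

// AIE-specific override that owns the opcode switch currently kept in
// onlyAllocateLIwith3DInstruction (hypothetical name and opcode list).
struct AIEInstrInfoSketch : InstrInfoBase {
  bool isHighPriorityLIUseInstruction(unsigned Opc) const override {
    switch (Opc) {
    case PADD_3D:
    case LDA_3D:
    case ST_3D:
      return true;
    default:
      return false;
    }
  }
};

// The pass-side predicate only asks TII; here it returns true if any use
// of the live interval is a high-priority (3D addressing) instruction.
bool anyUseIsHighPriority(const InstrInfoBase &TII,
                          const std::vector<unsigned> &UseOpcodes) {
  for (unsigned Opc : UseOpcodes)
    if (TII.isHighPriorityLIUseInstruction(Opc))
      return true;
  return false;
}
```

With the switch behind a virtual hook, the pass itself no longer hard-codes AIE2P opcodes, and another AIE target could supply its own list.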
; AIE2P-VREGS-NEXT: [[COPY3:%[0-9]+]]:em = COPY [[LDA_dms_lda_idx_imm]]
; AIE2P-VREGS-NEXT: [[COPY4:%[0-9]+]]:edj = COPY [[LDA_dms_lda_idx_imm2]]
; AIE2P-VREGS-NEXT: [[COPY5:%[0-9]+]]:edc = COPY [[LDA_dms_lda_idx_imm3]]
; AIE2P-VREGS-NEXT: [[COPY2:%[0-9]+]]:edn_as_32bit = COPY [[LDA_dms_lda_idx_imm1]]
Are we mixing AIE2 and AIE2P here? I mean, this is not related to your implementation, but I see no edn_as_32bit for AIE2P; I see spill_eDN_to_eR instead.
Oh, maybe this test is failing.
DebugVars);
}

// Re-write all the collected unassigned VRegs
Maybe include a description in the commit message!
// Re-write all the collected unassigned VRegs
for (auto &VReg : UnAssignedPhysRegs) {
MCRegister DummyPhysReg;
A brief explanation of this part would be nice!
Hi @krishnamtibrewala, I see good potential in your approach! Did you manage to get any QoR numbers?
Thank you @andcarminati for the feedback; I will incorporate all of it.
Force-pushed from f26d771 to 455cec8.
Force-pushed from 03a5a5b to d944642.
; AIE2P-VREGS-NEXT: [[COPY4:%[0-9]+]]:edc_as_32bit = COPY [[MOV_PD_imm11_pseudo]]
; AIE2P-VREGS-NEXT: [[COPY5:%[0-9]+]]:edn_as_32bit = COPY [[COPY2]]
; AIE2P-VREGS-NEXT: [[COPY6:%[0-9]+]]:edj_as_32bit = COPY [[COPY3]]
; AIE2P-VREGS-NEXT: [[COPY7:%[0-9]+]]:em_as_32bit = COPY [[COPY1]]
TODO: Try to understand why a COPY from a physical register leads to a spill_* register class, while a COPY from some [[COPY*]] leads to a *_32bit register class.
…f copies
With the new strategy, live-range splitting ends up creating bundles of copies in which some of the sub-registers are no longer used at all. When we split them in Super-Reg-Rewrite, we end up creating live ranges that start at a slot index and end as dead at that same slot index. However, another register at the same slot index (since we have a MOV bundle) actually has a valid live range.
The new strategy exposes a fundamental problem with how the live-range splitting logic creates bundled sub-register copies (refer to SplitEditor::buildSingleSubRegCopy). From a standard LLVM perspective this is not a problem, but it is for AIE because of what we do in the Super-Reg-Rewrite pass. There we turn each sub-register copy into a copy of a complete register (which we want/need to do), so the bundle now contains COPY instructions where one COPY ends a live range on the bundle and a different COPY in the same bundle starts a new live range, both using the same register class for source and destination. The major issue arises when the register allocator assigns the same register to such COPYs in the same bundle; AFAIK this happens because the whole bundle is assigned one unique slot index. By expanding the copy bundle, we give each COPY MI a unique slot index and its operands a proper LiveInterval.
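The slot-index issue can be illustrated with a toy numbering scheme. This is a simplified model, not LLVM's SlotIndexes implementation: it only captures that instructions inside a bundle share the index of the bundle header, so two COPYs in one bundle cannot be told apart by index, while expanding the bundle gives each COPY its own index. ToyMI, assignSlotIndexes and unbundle are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: a name plus a flag saying it is bundled with the
// instruction before it (similar in spirit to MachineInstr::isBundledWithPred).
struct ToyMI {
  std::string Name;
  bool BundledWithPred;
};

// Toy slot numbering: only the first instruction of a bundle gets a fresh
// index; every instruction bundled with it reuses that index. The real
// SlotIndexes spacing and sub-slots are deliberately left out.
std::vector<unsigned> assignSlotIndexes(const std::vector<ToyMI> &MIs) {
  std::vector<unsigned> Slots;
  unsigned Next = 0;
  for (const ToyMI &MI : MIs) {
    if (MI.BundledWithPred && !Slots.empty())
      Slots.push_back(Slots.back()); // shares the bundle header's index
    else
      Slots.push_back(Next++);       // starts a new index
  }
  return Slots;
}

// "Expanding" the copy bundle: clear the bundle flags so each COPY stands
// alone and receives its own index.
std::vector<ToyMI> unbundle(std::vector<ToyMI> MIs) {
  for (ToyMI &MI : MIs)
    MI.BundledWithPred = false;
  return MIs;
}
```

For a three-COPY bundle, the bundled form maps all three COPYs to index 0, while the expanded form yields indices 0, 1 and 2, so a live range that ends at one COPY and one that starts at the next no longer meet at the same index.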
Force-pushed from d944642 to 6ee4bd0.
I have been investigating Conv2D_bfp16_0 and why it performs so poorly. The issue is that we spill 3D addressing registers frequently.
Upon further investigation of how we have implemented our staged register allocation, and of how the greedy register allocator handles registers (including sub-registers), I believe the problem lies in LLVM's assumptions about how a modification to a sub-register affects its super-register. These assumptions do not hold for AIE 2D/3D registers.
When we split a 2D/3D register, say "Z" (which was used by PADD_3D), we create two new virtual registers, "X" and "Y", which are given the same register class as "Z". "Y" is now used by PADD_3D in place of "Z". The key point is that "X", even though it is considered a "3D" register, does not need to follow the constraint that its sub-registers map onto a fixed tuple such as (m1 dn1 dj1 dc1 dn5 dj5 dc5) for $d1_3d. Yet we end up forcing this constraint on the register allocator.
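A toy model makes the cost of the forced constraint concrete. It assumes that a value like "X", once freed from the tuple constraint, could take any free register of the matching kind for each of its seven sub-registers, and it ignores interference and real register-class details. The $d1_3d tuple is the one quoted above; the $d0_3d tuple and all function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <string>
#include <vector>

using RegSet = std::set<std::string>;
using Tuple3D = std::array<std::string, 7>;

// Legal sub-register tuples for 3D registers. The $d1_3d entry is the tuple
// quoted in this thread; the $d0_3d entry is a hypothetical placeholder.
static const std::vector<Tuple3D> kLegal3DTuples = {
    {"m0", "dn0", "dj0", "dc0", "dn4", "dj4", "dc4"}, // hypothetical $d0_3d
    {"m1", "dn1", "dj1", "dc1", "dn5", "dj5", "dc5"}, // $d1_3d
};

// Constrained view (what is forced on "X" today): all seven sub-registers
// must come from one legal tuple that is entirely free.
bool canAssignAsTuple(const RegSet &Free) {
  for (const Tuple3D &T : kLegal3DTuples) {
    bool AllFree = true;
    for (const std::string &R : T)
      if (Free.count(R) == 0) {
        AllFree = false;
        break;
      }
    if (AllFree)
      return true;
  }
  return false;
}

// Number of free registers whose name starts with Prefix.
static unsigned countWithPrefix(const RegSet &Free, const std::string &Prefix) {
  unsigned N = 0;
  for (const std::string &R : Free)
    if (R.compare(0, Prefix.size(), Prefix) == 0)
      ++N;
  return N;
}

// Relaxed view (assumed): one free m register plus two free dn, dj and dc
// registers each, taken from any tuple.
bool canAssignIndependently(const RegSet &Free) {
  return countWithPrefix(Free, "m") >= 1 && countWithPrefix(Free, "dn") >= 2 &&
         countWithPrefix(Free, "dj") >= 2 && countWithPrefix(Free, "dc") >= 2;
}
```

With m1, dn0, dj0, dc0, dn5, dj5 and dc5 free, no complete tuple is available, so the constrained view fails (forcing another split or a spill) even though enough individual registers exist.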
We started by splitting the live ranges of one 3D register "Z" and ended up with two registers in the same "3D" register class. This has a snowball effect: we keep splitting to satisfy the "3D" register constraints for every new virtual register the greedy allocator creates. In Conv2D_bfp16_0, we start with five 3D virtual registers and end up creating 32. Furthermore, because of how sub-registers are handled, we cannot use getLargestLegalSuperClass(...) to avoid spilling by falling back to other scalar registers. (Note: If spilling has already occurred, which is common in our staged allocation when allocating 3D/2D registers, the aie-superreg-rewrite pass cannot undo its effects; it only helps in the next pass of the greedy register allocator.)
Ideally, what we could do is:
The potential benefits of this approach include reduced spilling and more flexibility for the greedy allocator in managing 2D/3D sub-registers.
To implement this with minimal modifications to the target-independent code: