[AIE2p] Use multi-slot pseudo for const COPY with unique def #454
base: aie-public
Note: The best place to do this would be after register coalescing, since that is where const COPYs are created, and before RA.
How is this different from constant rematerialization by RA?
ece36b2 to b49e586
Hi @martien-de-jong, I do not see that happening in RA. Additionally, the idea here is to convert the scalar COPY instruction into a PseudoImm move so that it can be packed into the same VLIW bundle.
What is stopping you from implementing this before RA? Also, I think that register coalescing removes copies rather than creating them. I think that PHI elimination is the biggest creator of copies.
Hi @martien-de-jong, you are right, PHI elimination is the biggest creator of COPYs. The register coalescing pass helps to clean up these copies to a certain extent, leading to IR like: (Note: when it comes to a COPY between non-matching sub-registers, the coalescing pass does not do a great job for us.) bb.0 bb.1 The only motivation to implement it before RA is that the live range of %1 might shrink, which could help RA (both are big ifs). I saw this help in the Conv2D_bfp16_* test cases.
@krishnamtibrewala yes, everything is related. There are more live-range considerations around REG_SEQUENCE and subreg use, especially across PHI nodes. I have a feeling that a combined PHI elimination + constant materialization + register coalescing might be quite powerful (although rematerialization might be reserved as a repair mechanism in core RA; it might influence coalescing decisions, though).
This will help in linear code. Since a reg-to-reg copy can go on only one slot, using a multi-slot pseudo for a const copy lets us bundle such copies together or place them on different slots for better packing.
PS: Very small optimization.
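To illustrate the packing argument, here is a toy model in Python. It is only a sketch under stated assumptions, not the scheduler or the pass in this PR: the slot names (`mov`, `alu`, `lda`), the instruction names, and the greedy in-order bundler are all hypothetical, and the instructions are assumed to be independent of each other.

```python
from typing import Dict, List, Sequence, Tuple

# (instruction name, slots it may issue on). Slot names are hypothetical.
Instr = Tuple[str, Sequence[str]]


def pack(instrs: Sequence[Instr]) -> List[Dict[str, str]]:
    """Greedily pack independent instructions into bundles, in program order.

    Each bundle maps a slot name to the instruction occupying it; a slot
    holds at most one instruction per bundle.
    """
    bundles: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for name, slots in instrs:
        free = [s for s in slots if s not in current]
        if not free:
            # No allowed slot is left in this bundle: close it, start a new one.
            bundles.append(current)
            current = {}
            free = list(slots)
        current[free[0]] = name
    if current:
        bundles.append(current)
    return bundles


# Three independent constant moves, modelled two ways (assumed slot sets).
as_copies = [(f"COPY_{i}", ("mov",)) for i in range(3)]
as_multi_slot = [(f"PSEUDO_IMM_{i}", ("mov", "alu", "lda")) for i in range(3)]

print(len(pack(as_copies)))      # single-slot copies: one bundle each
print(len(pack(as_multi_slot)))  # multi-slot pseudos: can share a bundle
```

In this model the three single-slot copies each need their own bundle, while the three multi-slot moves fit in one. Real gains depend on which slots are otherwise free and on data dependencies, which the sketch ignores.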