[AIE2p] Use multi-slot pseudo for const COPY with unique def #454

Open
wants to merge 1 commit into
base: aie-public

krishnamtibrewala
Collaborator

This will help in linear (straight-line) code. A reg-to-reg copy can only go on one slot, so using a multi-slot pseudo for a const copy lets us bundle such copies together or place them on a different slot for better packing.
PS: Very small optimization.

@krishnamtibrewala
Collaborator Author

Note: The best place to do this would be after register coalescing, which is where const COPYs are created, and before RA.

@martien-de-jong
Collaborator

How is this different from constant rematerialization by RA?

@krishnamtibrewala
Collaborator Author

> How is this different from constant rematerialization by RA?

Hi @martien-de-jong, I do not see that happening in RA. Additionally, the idea here is to convert the scalar COPY instruction into a PseudoImm move so that it can be packed into the same VLIW bundle.

@martien-de-jong
Collaborator

What is stopping you from implementing this before RA? Also, I think that register coalescing removes copies rather than creating them. I think that PHI elimination is the biggest creator of copies.

@krishnamtibrewala
Collaborator Author

> What is stopping you from implementing this before RA? Also, I think that register coalescing removes copies rather than creating them. I think that PHI elimination is the biggest creator of copies.

Hi @martien-de-jong, you are right: PHI elimination is the biggest creator of COPYs, and the register coalescing pass helps to clean up these copies to a certain extent. (Note: when it comes to a COPY between mismatched sub-regs, the coalescing pass does not do a great job for us.) That leaves IR like:

bb.0
%1 = mov_imm_pseudo 0
%2 = COPY %1
%3 = ADD %1, %x

bb.1
%4 = COPY %1

The only motivation to implement it before RA is that the live range of %1 might shrink, which might in turn help RA (both are big IFs).
The current implementation was chosen more for ease of implementation and to show a working PoC: copyPhysReg(...) picks a mov_imm_pseudo rather than a mov_scl when possible, which helps the scheduler do better bundling.

I found this helpful in the Conv2D_bfp16_* test cases.
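
As a rough illustration of the idea discussed above (not the actual patch), the rewrite can be sketched on a toy instruction list: a COPY whose source register has exactly one def, and that def is an immediate move, is turned into an immediate move itself. The `Instr` struct, the `Opcode` enum, and `rewriteConstCopies` are hypothetical names invented for this sketch; the real change works on LLVM machine instructions through copyPhysReg(...), whose details are not shown in this conversation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not LLVM's MachineInstr).
enum class Opcode { MovImmPseudo, Copy, Add };

struct Instr {
  Opcode op;
  int dst;               // register defined by this instruction
  std::vector<int> srcs; // registers read by this instruction
  std::int64_t imm;      // only meaningful for MovImmPseudo
};

// Rewrite each COPY whose source has a unique def that is a MovImmPseudo
// into a MovImmPseudo of the same immediate. Returns the number of rewrites.
inline int rewriteConstCopies(std::vector<Instr> &code) {
  std::unordered_map<int, int> defCount;       // defs seen per register
  std::unordered_map<int, std::int64_t> immOf; // immediate of an imm def
  for (const Instr &I : code) {
    ++defCount[I.dst];
    if (I.op == Opcode::MovImmPseudo)
      immOf[I.dst] = I.imm;
  }

  int rewritten = 0;
  for (Instr &I : code) {
    if (I.op != Opcode::Copy)
      continue;
    assert(I.srcs.size() == 1 && "a COPY reads exactly one register");
    const int src = I.srcs[0];
    auto it = immOf.find(src);
    // Skip when the source is not a constant or has more than one def.
    if (it == immOf.end() || defCount[src] != 1)
      continue;
    I.op = Opcode::MovImmPseudo; // multi-slot pseudo instead of a one-slot copy
    I.imm = it->second;
    I.srcs.clear();              // the copy no longer reads the source register
    ++rewritten;
  }
  return rewritten;
}
```

Applied to the IR above, both `%2 = COPY %1` and `%4 = COPY %1` would become immediate moves, while the `ADD` is left alone. Since the rewritten copies no longer read %1, this is also where the possible live-range reduction would come from.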

@martien-de-jong
Collaborator

@krishnamtibrewala yes, everything is related. There are more live-range considerations around REG_SEQUENCE and subreg use, especially across PHI nodes. I have a feeling that a combined PHI elimination + constant materialization + register coalescing pass might be quite powerful (although rematerialization might be reserved as a repair mechanism in core RA; it might still influence coalescing decisions).
