Avoid broadcast/extract when implementing memset #416


Open
wants to merge 2 commits into base: aie-public

Conversation

martien-de-jong
Collaborator

@martien-de-jong martien-de-jong commented Mar 31, 2025

The generic G_MEMSET legalizer helper would tweak the alignment of stack objects to make them amenable to vector implementations. However, the vector store that it creates doesn't carry that alignment info, so our legalization scalarizes it. After scalarization it is no better than the original code that uses 32-bit stores.

I have disabled this cleverness, which gives good results, also in stack size, on the reduced example that I have added as a regression test.

DRAFT DRAFT DRAFT
This would be a draft PR, but I want to see whether it passes standard CI testing.

@martien-de-jong martien-de-jong changed the title Avoid broadcat/extract when implementing memset Avoid broadcast/extract when implementing memset Mar 31, 2025
@andcarminati
Collaborator

I think we should extend the target hook to have something like:

 LLT AIEBaseTargetLowering::getOptimalMemOpLLT(
     const MemOp &Op, const AttributeList &FuncAttributes) const {
 
+  bool AllowVectors = AllowVecRegMemOps && Op.isFixedDstAlign();
   if (Subtarget.isAIE2P()) {
-    if (AllowVecRegMemOps && Op.size() >= 64 && Op.isAligned(Align(64)))
+    if (AllowVectors && Op.size() >= 64 && Op.isAligned(Align(64)))
       return LLT::fixed_vector(16, 32);
   }
+  
   if (Subtarget.isAIE2() || Subtarget.isAIE2P()) {
-    if (AllowVecRegMemOps && Op.size() >= 32 && Op.isAligned(Align(32)))
+    if (AllowVectors && Op.size() >= 32 && Op.isAligned(Align(32)))
       return LLT::fixed_vector(8, 32);
-    if (AllowVecRegMemOps && Op.size() >= 16 && Op.isAligned(Align(16)))
+    if (AllowVectors && Op.size() >= 16 && Op.isAligned(Align(16)))
       return LLT::fixed_vector(4, 32);
     if (Op.size() >= 4 && Op.isAligned(Align(4)))
       return LLT::scalar(32);

It is risky to remove all that code.

@andcarminati
Collaborator

Discussed offline: we have an alignment here, we are just spotting the wrong one in the legalizer.

    if (MFI.getObjectAlign(FI) < Alignment)
      MFI.setObjectAlignment(FI, Alignment);
  }
}
Collaborator Author

@martien-de-jong martien-de-jong Apr 3, 2025


Yes, it is a blunt fix: we cut away an optimization, and we don't exercise it in PR testing.
However, this is not the way. Instead, I propose we change the lowering code to get the appropriate alignment into the MMOs of the store instructions that are generated. We need a less lazy constructor for that derived MMO. I think that should be perfectly acceptable to upstream.
