
[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types #92725


Merged
13 commits merged into llvm:main from permlane_generic on Jun 26, 2024

Conversation

@vikramRH (Contributor) commented May 20, 2024

Kindly review only the top commits here (i.e. the commits except the first). These are incremental changes over #89217, with the core logic being the same. The only reason to split these up into a separate PR is ease of review.
This patch, along with #89217 and #91190, should get us ready to enable 64-bit optimizations in the atomic optimizer.
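For context, a minimal usage sketch of what the extended lowering enables, assuming the intrinsics become overloaded on their value type as this patch proposes. The helper name below is illustrative and not code from this PR:

// Hypothetical helper (not from this PR): request llvm.amdgcn.permlane64
// on a 64-bit value and let the backend legalize it into 32-bit pieces.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"

using namespace llvm;

static Value *emitPermlane64(IRBuilder<> &B, Value *V) {
  // The intrinsic is overloaded on the operand type (e.g. i64 or double).
  return B.CreateIntrinsic(Intrinsic::amdgcn_permlane64, {V->getType()}, {V});
}

The same pattern would apply to llvm.amdgcn.permlane16 and llvm.amdgcn.permlanex16 with their additional lane-select and control operands.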

github-actions bot commented May 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@vikramRH force-pushed the permlane_generic branch from db19330 to 881e116 on May 20, 2024 10:02
@arsenm (Contributor) left a comment


On this and the previous patch, can you add a section to AMDGPUUsage for the intrinsics and what types they support?

Register Src1Cast =
    MRI.getType(Src1).isScalar()
        ? Src1
        : B.buildBitcast(LLT::scalar(Size), Src1).getReg(0);
Contributor

Like the other patch, shouldn't need any bitcasts
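For illustration only, a rough sketch of a bitcast-free direction along these lines: instead of bitcasting a wide value to a scalar, unmerge it into 32-bit pieces, apply the lane op per piece, and merge the results back. The function and the EmitPieceOp callback are hypothetical, kept only to make the sketch self-contained; this is not the PR's code:

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"

using namespace llvm;

// Hypothetical sketch: handle s64, v2s32, etc. by splitting into 32-bit
// pieces with G_UNMERGE_VALUES, so no G_BITCAST is required for these cases.
static Register lowerWideLaneOp(
    MachineIRBuilder &B, Register DstReg, Register SrcReg,
    function_ref<Register(MachineIRBuilder &, Register)> EmitPieceOp) {
  const LLT S32 = LLT::scalar(32);
  auto Unmerge = B.buildUnmerge(S32, SrcReg);
  SmallVector<Register, 4> Pieces;
  for (unsigned I = 0, E = Unmerge->getNumDefs(); I != E; ++I)
    Pieces.push_back(EmitPieceOp(B, Unmerge.getReg(I)));
  // Reassemble the per-piece results into DstReg's original wide type.
  B.buildMergeLikeInstr(DstReg, Pieces);
  return DstReg;
}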

Contributor Author

Yes, I will take over the changes from #89217 once they are finalized.

@vikramRH (Contributor Author)

  1. Added/updated tests for permlanex16, permlane64
  2. This needs #89217 ([AMDGPU] Extend readlane, writelane and readfirstlane intrinsic lowering for generic types) to land first so that only the incremental changes need to be reviewed.

@vikramRH (Contributor Author) commented Jun 17, 2024

Updated this PR to be in sync with #89217. However, the plan is still to land this only after the changes in #89217 are accepted.

@vikramRH changed the title from "[AMDGPU][WIP] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types" to "[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types" on Jun 23, 2024
@vikramRH marked this pull request as ready for review on June 23, 2024 17:05
@vikramRH merged commit 35f7b60 into llvm:main on Jun 26, 2024
8 checks passed
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
[AMDGPU] Extend permlane16, permlanex16 and permlane64 intrinsic lowering for generic types (llvm#92725)

These are incremental changes over llvm#89217 , with core logic being the
same. This patch along with llvm#89217 and llvm#91190 should get us ready to enable 64
bit optimizations in atomic optimizer.
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request Aug 16, 2024
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 11, 2024