[AMDGPU] Enable atomic optimizer for 64 bit divergent values #96473

Closed
wants to merge 15 commits into from

vikramRH
Contributor

Kindly review only the top commit here; all the remaining changes are the same as in #89217 and #92725.
This is the final patch in the series.

@vikramRH vikramRH changed the title [AMDGPU] Enable atomic optimizer for 64 bit values [AMDGPU] Enable atomic optimizer for 64 bit divergent values Jun 24, 2024
@arsenm
Contributor

arsenm commented Jun 24, 2024

> Kindly review only the top commit here

If you're going to repost with a pre-commit, it would be better to have all the pieces squashed into one. Also, you could look into using Graphite or SPR for managing dependent pull requests.

@vikramRH
Contributor Author

vikramRH commented Jun 26, 2024

Apologies for the commit spam here; Graphite seems a good option from here on. However, all dependent patches have landed, and the diff here is now up to date.

@@ -402,34 +413,30 @@ Value *AMDGPUAtomicOptimizerImpl::buildReduction(IRBuilder<> &B,

// Reduce within each pair of rows (i.e. 32 lanes).
assert(ST->hasPermLaneX16());
V = B.CreateBitCast(V, IntNTy);
Contributor

Please submit an NFC cleanup patch that just removes unnecessary bitcasting, before adding support for new atomic operations.

case Type::IntegerTyID: {
  if (Ty->getIntegerBitWidth() == 32 || Ty->getIntegerBitWidth() == 64)
    return true;
}
Contributor

Don't forget pointers

@@ -178,6 +178,20 @@ bool AMDGPUAtomicOptimizerImpl::run(Function &F) {
return Changed;
}

static bool shouldOptimize(Type *Ty) {
Contributor

Use a better name that expresses why this type can be handled.

Also, a follow-up really should cover the i16/half/bfloat and 2 x half, 2 x bfloat cases.
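
To make the two review notes above concrete (the missing pointer case and the request for a more descriptive name), here is a minimal, self-contained sketch. It is not code from this PR and does not use LLVM headers: `Kind`, `TypeDesc`, and the name `isSupportedCrossLaneWidth` are hypothetical stand-ins for `llvm::Type` and `shouldOptimize`. It assumes, as the quoted integer case suggests, that only 32-bit and 64-bit values are handled, and that pointers would be accepted under the same width rule.

```cpp
// Hypothetical model of the few type properties the check needs.
// These are NOT LLVM types; they only illustrate the shape of the predicate.
enum class Kind { Integer, Pointer, Other };

struct TypeDesc {
  Kind kind;
  unsigned bits; // integer bit width, or pointer width for Kind::Pointer
};

// Assumed rule: a value is handled only if it is 32 or 64 bits wide.
// The name is an illustration of "say why the type is handleable",
// not the name used in the final patch.
constexpr bool isSupportedCrossLaneWidth(TypeDesc ty) {
  switch (ty.kind) {
  case Kind::Integer:
  case Kind::Pointer: // the case the review asks not to forget
    return ty.bits == 32 || ty.bits == 64;
  default:
    return false; // everything else is rejected explicitly
  }
}

// Compile-time usage checks.
static_assert(isSupportedCrossLaneWidth(TypeDesc{Kind::Integer, 64}),
              "64-bit integers are accepted");
static_assert(isSupportedCrossLaneWidth(TypeDesc{Kind::Pointer, 32}),
              "32-bit pointers are accepted");
static_assert(!isSupportedCrossLaneWidth(TypeDesc{Kind::Integer, 16}),
              "16-bit integers are not covered by this sketch");
```

In real LLVM code the pointer width would presumably be looked up from the module's data layout rather than stored on the type, and the explicit `default` avoids falling off the end of the `switch`. The i16/half/bfloat and two-element vector cases mentioned as a follow-up are deliberately left out.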

@vikramRH
Contributor Author

Closing this in favour of #96933 and #96934.

@vikramRH vikramRH closed this Jun 27, 2024