[AMDGPU] Enable atomic optimizer for 64 bit divergent values #96473
If you're going to repost with a pre-commit, it would be better to have all the pieces squashed into one. Also, you could look into using Graphite or SPR for managing dependent pull requests.
Apologies for the commit spam here; Graphite seems a good option from here on. However, all dependent patches have landed, and the diff here is now up to date.
@@ -402,34 +413,30 @@ Value *AMDGPUAtomicOptimizerImpl::buildReduction(IRBuilder<> &B,

// Reduce within each pair of rows (i.e. 32 lanes).
assert(ST->hasPermLaneX16());
V = B.CreateBitCast(V, IntNTy);
Please submit an NFC cleanup patch that just removes unnecessary bitcasting, before adding support for new atomic operations.
case Type::IntegerTyID: {
if (Ty->getIntegerBitWidth() == 32 || Ty->getIntegerBitWidth() == 64)
return true;
}
Don't forget pointers
@@ -178,6 +178,20 @@ bool AMDGPUAtomicOptimizerImpl::run(Function &F) {
return Changed;
}

static bool shouldOptimize(Type *Ty) {
Better name that expresses why this type is handleable.
Also in a follow up, really should cover the i16/half/bfloat and 2 x half, 2 x bfloat cases
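To make the review suggestions concrete, here is a minimal standalone sketch of what a renamed predicate that also accepts pointers might look like. It deliberately does not use the LLVM API: `Kind`, `TypeDesc`, and the name `isLegalCrossLaneType` are hypothetical stand-ins for illustration, and treating 32-bit and 64-bit pointers like same-width integers is an assumption, not something the patch confirms.

```cpp
// Hypothetical stand-in for llvm::Type; NOT the LLVM API.
enum class Kind { Integer, Pointer, Other };

struct TypeDesc {
  Kind kind;
  unsigned bits; // bit width of the integer, or of the pointer
};

// Hypothetical name: it says why the type is accepted (it can be moved
// across lanes), rather than just that it "should" be optimized.
constexpr bool isLegalCrossLaneType(TypeDesc ty) {
  switch (ty.kind) {
  case Kind::Integer:
  case Kind::Pointer: // assumption: pointers are handled like same-width integers
    return ty.bits == 32 || ty.bits == 64;
  default:
    return false; // e.g. i16/half/bfloat stay unsupported until a follow-up
  }
}

// Compile-time usage checks.
static_assert(isLegalCrossLaneType({Kind::Integer, 64}), "i64 accepted");
static_assert(!isLegalCrossLaneType({Kind::Integer, 16}), "i16 rejected");
```

The explicit `default: return false;` keeps every unlisted type rejected, so widening support later means adding a case rather than auditing fall-through paths.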
Kindly review only the top commit here; all the remaining changes are the same as in #89217 and #92725.
This is the final patch in the series.