[flang][cuda] Using nvvm intrinsics for the syncthread and threadfence families of calls #120020
@llvm/pr-subscribers-flang-fir-hlfir

Author: Renaud Kauffmann (Renaud-K)

Changes

I am trying to get the call to syncthreads1 identified as an intrinsic. I have modelled the changes after ieee_set_rounding_mode.
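For a name like ieee_set_rounding_mode, the "ieee_" prefix is what lets isIntrinsicModuleProcedure accept it. syncthreads1 shares no such prefix, so the diff adds an exact-name clause to that predicate. The following is a minimal standalone sketch of the classification. It assumes std::string_view in place of llvm::StringRef, and isIntrinsicModuleProcedureSketch and startsWith are hypothetical names, not flang code:

```cpp
#include <string_view>

// Hypothetical helper: true if `s` begins with `prefix` (C++17-compatible,
// standing in for llvm::StringRef::starts_with).
static bool startsWith(std::string_view s, std::string_view prefix) {
  return s.size() >= prefix.size() && s.compare(0, prefix.size(), prefix) == 0;
}

// Standalone mirror (not the flang source) of the predicate changed in the
// diff: the known module-procedure prefixes, plus an exact match for the
// new name "syncthreads1".
static bool isIntrinsicModuleProcedureSketch(std::string_view name) {
  return startsWith(name, "c_") || startsWith(name, "compiler_") ||
         startsWith(name, "ieee_") || startsWith(name, "__ppc_") ||
         name == "syncthreads1";
}
```

With this predicate, "syncthreads1" and "ieee_set_rounding_mode" are accepted, but a bare "syncthreads" is not. Only the exact name was added, not a new prefix.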
Full diff: https://github.com/llvm/llvm-project/pull/120020.diff

2 Files Affected:
diff --git a/flang/include/flang/Optimizer/Builder/IntrinsicCall.h b/flang/include/flang/Optimizer/Builder/IntrinsicCall.h
index bc0020e614db24..77683ad4b3c7b1 100644
--- a/flang/include/flang/Optimizer/Builder/IntrinsicCall.h
+++ b/flang/include/flang/Optimizer/Builder/IntrinsicCall.h
@@ -392,6 +392,7 @@ struct IntrinsicLibrary {
fir::ExtendedValue genSum(mlir::Type, llvm::ArrayRef<fir::ExtendedValue>);
void genSignalSubroutine(llvm::ArrayRef<fir::ExtendedValue>);
void genSleep(llvm::ArrayRef<fir::ExtendedValue>);
+ void genSyncThreads(llvm::ArrayRef<fir::ExtendedValue>);
fir::ExtendedValue genSystem(std::optional<mlir::Type>,
mlir::ArrayRef<fir::ExtendedValue> args);
void genSystemClock(llvm::ArrayRef<fir::ExtendedValue>);
diff --git a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
index 547cebefd2df47..c358c492f66a5d 100644
--- a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
@@ -642,6 +642,7 @@ static constexpr IntrinsicHandler handlers[]{
{"dim", asValue},
{"mask", asBox, handleDynamicOptional}}},
/*isElemental=*/false},
+ {"syncthreads1", &I::genSyncThreads},
{"system",
&I::genSystem,
{{{"command", asBox}, {"exitstat", asBox, handleDynamicOptional}}},
@@ -1639,8 +1640,9 @@ mlir::Value toValue(const fir::ExtendedValue &val, fir::FirOpBuilder &builder,
//===----------------------------------------------------------------------===//
static bool isIntrinsicModuleProcedure(llvm::StringRef name) {
+ llvm::errs() << "isIntrinsicModuleProcedure: " << name << "\n";
return name.starts_with("c_") || name.starts_with("compiler_") ||
- name.starts_with("ieee_") || name.starts_with("__ppc_");
+ name.starts_with("ieee_") || name.starts_with("__ppc_") || name == "syncthreads1";
}
static bool isCoarrayIntrinsic(llvm::StringRef name) {
@@ -1684,6 +1686,7 @@ lookupIntrinsicHandler(fir::FirOpBuilder &builder,
llvm::StringRef intrinsicName,
std::optional<mlir::Type> resultType) {
llvm::StringRef name = genericName(intrinsicName);
+ llvm::errs() << "Looking up " << intrinsicName << " with name " << name << "\n";
if (const IntrinsicHandler *handler = findIntrinsicHandler(name))
return std::make_optional<IntrinsicHandlerEntry>(handler);
bool isPPCTarget = fir::getTargetTriple(builder.getModule()).isPPC();
@@ -7290,6 +7293,22 @@ IntrinsicLibrary::genSum(mlir::Type resultType,
resultType, args);
}
+// SYNCTHREADS
+void IntrinsicLibrary::genSyncThreads(llvm::ArrayRef<fir::ExtendedValue> args) {
+ constexpr llvm::StringLiteral funcName = "llvm.nvvm.barrier0";
+ mlir::func::FuncOp funcOp = builder.getNamedFunction(funcName);
+ mlir::MLIRContext *context = builder.getContext();
+ mlir::FunctionType funcType =
+ mlir::FunctionType::get(context, {}, {});
+
+ if (!funcOp)
+ funcOp = builder.createFunction(loc, funcName, funcType);
+
+ llvm::SmallVector<mlir::Value> noArgs;
+ builder.create<fir::CallOp>(loc, funcOp, noArgs);
+
+}
+
// SYSTEM
fir::ExtendedValue
IntrinsicLibrary::genSystem(std::optional<mlir::Type> resultType,
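The genSyncThreads hunk follows a get-or-create pattern. It looks up a declaration of llvm.nvvm.barrier0 in the module and creates one with an empty () -> () function type only if none exists. It then emits a fir.call with no operands, which also leaves `args` unused. The sketch below shows only that control flow. It assumes hypothetical ModuleSketch, FuncDecl, and lowerSyncThreads types in place of the real FirOpBuilder/MLIR API:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for a module's symbol table and emitted calls;
// these are NOT MLIR/FIR types, just enough to show the control flow.
struct FuncDecl {
  std::string name;
};

struct ModuleSketch {
  std::map<std::string, FuncDecl> funcs; // declarations, keyed by name
  std::vector<std::string> calls;        // emitted call targets, in order

  // Plays the role of getNamedFunction: nullptr if not yet declared.
  FuncDecl *lookup(const std::string &name) {
    auto it = funcs.find(name);
    return it == funcs.end() ? nullptr : &it->second;
  }

  // Plays the role of createFunction: adds a declaration and returns it.
  FuncDecl *declare(const std::string &name) {
    return &funcs.emplace(name, FuncDecl{name}).first->second;
  }
};

// Lowers one SYNCTHREADS call: declare "llvm.nvvm.barrier0" on first use,
// then record a call to it with no arguments.
void lowerSyncThreads(ModuleSketch &m) {
  const std::string funcName = "llvm.nvvm.barrier0";
  FuncDecl *f = m.lookup(funcName);
  if (!f)
    f = m.declare(funcName);
  m.calls.push_back(f->name);
}
```

Lowering two calls with this sketch yields one declaration and two calls. The initial lookup exists so that repeated calls share a single declaration instead of redeclaring the function.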
✅ With the latest revision this PR passed the C/C++ code formatter.
Branch updated from 1b0239b to b7a6883.
If it's ready for review, can you update the title and description?
Still WIP.
Looking at it.
Is that the whole threadfence family or just the simple threadfence?
It is just
But for threadfence, I get:
But I am expecting:
Looking at it, it looks like there is something wrong with
EDIT: After looking at it more closely, I found your issue. The threadfence entries are not sorted in the table. You need to move them into their sorted position and it will work.
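The reviewer's diagnosis makes sense if the handler table is searched by binary search over names. In that case a single out-of-order entry can make an existing handler unreachable. Below is a minimal sketch under that assumption. HandlerSketch, findHandler, tableIsSorted, and the table contents are hypothetical and illustrative, not flang's code:

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>

// Hypothetical, trimmed-down stand-in for an intrinsic handler entry.
struct HandlerSketch {
  std::string_view name;
  int id;
};

// Binary search over names; only correct if `table` is sorted by name.
const HandlerSketch *findHandler(const HandlerSketch *table, std::size_t n,
                                 std::string_view name) {
  std::size_t lo = 0, hi = n; // candidate range is [lo, hi)
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    if (table[mid].name < name)
      lo = mid + 1; // a match, if any, lies right of mid
    else
      hi = mid; // a match, if any, is at mid or left of it
  }
  return (lo < n && table[lo].name == name) ? &table[lo] : nullptr;
}

// Cheap check that catches an out-of-order entry.
bool tableIsSorted(const HandlerSketch *table, std::size_t n) {
  return std::is_sorted(table, table + n,
                        [](const HandlerSketch &a, const HandlerSketch &b) {
                          return a.name < b.name;
                        });
}

// Sorted by name: every entry is reachable.
constexpr HandlerSketch sortedTable[] = {
    {"sum", 1}, {"syncthreads", 2}, {"system", 3}, {"threadfence", 4}};

// "threadfence" misplaced at the front: the search steers past it.
constexpr HandlerSketch unsortedTable[] = {
    {"threadfence", 4}, {"sum", 1}, {"syncthreads", 2}, {"system", 3}};
```

Looking up "threadfence" succeeds in sortedTable but returns nullptr in unsortedTable, even though the entry is present. This is consistent with the diagnosis: the handler exists but is never found. Moving the entry into sorted position restores lookup, and a check like tableIsSorted surfaces the mistake early.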
LGTM. Can you add the prefix [flang][cuda] to your commit title?
Thanks, LGTM
No description provided.