Skip to content

[flang][cuda] Using nvvm intrinsics for the syncthread and threadfence families of calls #120020

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 18, 2024

Conversation

Renaud-K
Copy link
Contributor

@Renaud-K Renaud-K commented Dec 15, 2024

No description provided.

@Renaud-K Renaud-K requested a review from jeanPerier December 15, 2024 21:15
@llvmbot llvmbot added flang Flang issues not falling into any other category flang:fir-hlfir labels Dec 15, 2024
@llvmbot
Copy link
Member

llvmbot commented Dec 15, 2024

@llvm/pr-subscribers-flang-fir-hlfir

Author: Renaud Kauffmann (Renaud-K)

Changes

I am trying to get the call to syncthreads1 identified as an intrinsic. I have modelled the changes after ieee_set_rounding_mode.
I am not able to get it to even try to do look-up for a possible intrinsic match. (As the debug messages show)
It works for the ieee call though. I am not sure what I am missing.

module mtests
  integer, device :: n(5) = (/ (I, I = 3,7) /)
  interface syncthreads1
    subroutine syncthreads1_0()
    end subroutine
  end interface
  contains
    attributes(global) subroutine testany() bind(c)
      integer i
      i = threadIdx%x
      if (i < 3) then 
        n(i) = 1
        !call syncthreads1()
      endif
    end subroutine
end module mtests
 
program t
use mtests
use ieee_arithmetic
call testany<<<1,5>>> ()
call syncthreads1()
call ieee_set_rounding_mode(ieee_to_zero)
end

Full diff: https://github.com/llvm/llvm-project/pull/120020.diff

2 Files Affected:

  • (modified) flang/include/flang/Optimizer/Builder/IntrinsicCall.h (+1)
  • (modified) flang/lib/Optimizer/Builder/IntrinsicCall.cpp (+20-1)
diff --git a/flang/include/flang/Optimizer/Builder/IntrinsicCall.h b/flang/include/flang/Optimizer/Builder/IntrinsicCall.h
index bc0020e614db24..77683ad4b3c7b1 100644
--- a/flang/include/flang/Optimizer/Builder/IntrinsicCall.h
+++ b/flang/include/flang/Optimizer/Builder/IntrinsicCall.h
@@ -392,6 +392,7 @@ struct IntrinsicLibrary {
   fir::ExtendedValue genSum(mlir::Type, llvm::ArrayRef<fir::ExtendedValue>);
   void genSignalSubroutine(llvm::ArrayRef<fir::ExtendedValue>);
   void genSleep(llvm::ArrayRef<fir::ExtendedValue>);
+  void genSyncThreads(llvm::ArrayRef<fir::ExtendedValue>);
   fir::ExtendedValue genSystem(std::optional<mlir::Type>,
                                mlir::ArrayRef<fir::ExtendedValue> args);
   void genSystemClock(llvm::ArrayRef<fir::ExtendedValue>);
diff --git a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
index 547cebefd2df47..c358c492f66a5d 100644
--- a/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
+++ b/flang/lib/Optimizer/Builder/IntrinsicCall.cpp
@@ -642,6 +642,7 @@ static constexpr IntrinsicHandler handlers[]{
        {"dim", asValue},
        {"mask", asBox, handleDynamicOptional}}},
      /*isElemental=*/false},
+    {"syncthreads1", &I::genSyncThreads},
     {"system",
      &I::genSystem,
      {{{"command", asBox}, {"exitstat", asBox, handleDynamicOptional}}},
@@ -1639,8 +1640,9 @@ mlir::Value toValue(const fir::ExtendedValue &val, fir::FirOpBuilder &builder,
 //===----------------------------------------------------------------------===//
 
 static bool isIntrinsicModuleProcedure(llvm::StringRef name) {
+  llvm::errs() << "isIntrinsicModuleProcedure: " << name << "\n";
   return name.starts_with("c_") || name.starts_with("compiler_") ||
-         name.starts_with("ieee_") || name.starts_with("__ppc_");
+         name.starts_with("ieee_") || name.starts_with("__ppc_") || name == "syncthreads1";
 }
 
 static bool isCoarrayIntrinsic(llvm::StringRef name) {
@@ -1684,6 +1686,7 @@ lookupIntrinsicHandler(fir::FirOpBuilder &builder,
                        llvm::StringRef intrinsicName,
                        std::optional<mlir::Type> resultType) {
   llvm::StringRef name = genericName(intrinsicName);
+  llvm::errs() << "Looking up " << intrinsicName << " with name " << name << "\n";
   if (const IntrinsicHandler *handler = findIntrinsicHandler(name))
     return std::make_optional<IntrinsicHandlerEntry>(handler);
   bool isPPCTarget = fir::getTargetTriple(builder.getModule()).isPPC();
@@ -7290,6 +7293,22 @@ IntrinsicLibrary::genSum(mlir::Type resultType,
                       resultType, args);
 }
 
+// SYNCTHREADS
+void IntrinsicLibrary::genSyncThreads(llvm::ArrayRef<fir::ExtendedValue> args) {
+  constexpr llvm::StringLiteral funcName = "llvm.nvvm.barrier0";
+  mlir::func::FuncOp funcOp = builder.getNamedFunction(funcName);
+  mlir::MLIRContext *context = builder.getContext();
+  mlir::FunctionType funcType =
+      mlir::FunctionType::get(context, {}, {});
+
+  if (!funcOp) 
+    funcOp = builder.createFunction(loc, funcName, funcType);
+  
+  llvm::SmallVector<mlir::Value> noArgs;
+  builder.create<fir::CallOp>(loc, funcOp, noArgs);
+
+}
+
 // SYSTEM
 fir::ExtendedValue
 IntrinsicLibrary::genSystem(std::optional<mlir::Type> resultType,

Copy link

github-actions bot commented Dec 15, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@clementval
Copy link
Contributor

If it's ready for review, can you update the title and description?

@Renaud-K Renaud-K changed the title WIP: Trying to get a call treated as an intrinsic WIP: Using nvvm intrinsics for the syncthread and threadfence families of calls Dec 18, 2024
@Renaud-K
Copy link
Contributor Author

Still WIP.
Can you spot the issue with threadfence?

@clementval
Copy link
Contributor

Still WIP. Can you spot the issue with threadfence?

Looking at it

@clementval
Copy link
Contributor

Still WIP. Can you spot the issue with threadfence?

Is that all threadfence family or just the simple threadfence?

@Renaud-K
Copy link
Contributor Author

It is just threadfence.
I get this for the others

    fir.call @llvm.nvvm.membar.cta() fastmath<contract> : () -> ()
    fir.call @llvm.nvvm.membar.sys() fastmath<contract> : () -> ()
  func.func private @llvm.nvvm.membar.cta()
  func.func private @llvm.nvvm.membar.sys()

But for threadfence, I get:

    fir.call @_QPthreadfence() fastmath<contract> : () -> ()
  func.func private @_QPthreadfence() attributes {cuf.proc_attr = #cuf.cuda_proc<device>}

But I am expecting:

    fir.call @llvm.nvvm.membar.gl() fastmath<contract> : () -> ()

@clementval
Copy link
Contributor

clementval commented Dec 18, 2024

Looking at it, it looks like there is smth wrong with findIntrinsicHandler. Locally on my machine it returns the handler from trailz instead of what we expect.
The way we look in the table might be the issue.

EDIT: After looking at it more closely, I found your issue. The threadfence entries are not sorted in the table. You need to move them and it will work.

@Renaud-K Renaud-K changed the title WIP: Using nvvm intrinsics for the syncthread and threadfence families of calls Using nvvm intrinsics for the syncthread and threadfence families of calls Dec 18, 2024
Copy link
Contributor

@clementval clementval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you add the prefix [flang][cuda] in you commit title

Copy link
Contributor

@jeanPerier jeanPerier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM

@Renaud-K Renaud-K changed the title Using nvvm intrinsics for the syncthread and threadfence families of calls [flang][cuda] Using nvvm intrinsics for the syncthread and threadfence families of calls Dec 18, 2024
@Renaud-K Renaud-K merged commit cb0effc into llvm:main Dec 18, 2024
8 checks passed
@Renaud-K Renaud-K deleted the wip-syncthreads branch January 28, 2025 01:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flang:fir-hlfir flang Flang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants