
[Clang][OpenCL][AMDGPU] Allow a kernel to call another kernel #115821

Merged

Conversation

Contributor

@lalaniket8 lalaniket8 commented Nov 12, 2024

Calling one kernel from another is currently not supported in the compiler.
To support it, we emit a stub version of each kernel function body under a
different name mangling scheme, and replace the respective kernel call sites
appropriately.

Fixes #60313

D120566 was an earlier attempt to upstream a solution for this issue.
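
For illustration, a minimal sketch of what this enables (the kernel and stub names follow the tests added in this patch; the lowering described in the comments is paraphrased, not verbatim compiler output):

kernel void callee_kern(global int *A) {
  *A = 1;
}

kernel void caller_kern(global int *A) {
  // On the AMDGPU target this call is now lowered to a call to the stub
  // @__clang_ocl_kern_imp_callee_kern instead of the amdgpu_kernel entry
  // point @callee_kern, which cannot be called directly.
  callee_kern(A);
}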

Contributor Author

lalaniket8 commented Nov 12, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@lalaniket8 lalaniket8 force-pushed the users/lalaniket8/emit-device-version-of-openCL-kernel branch 4 times, most recently from 9449cea to f8265de on November 13, 2024 10:15
@lalaniket8 lalaniket8 force-pushed the users/lalaniket8/emit-device-version-of-openCL-kernel branch from f8265de to 61c9004 on November 25, 2024 09:07
Contributor

@jhuber6 jhuber6 left a comment

So, the kernel metadata has a lot of special codegen associated with it. It seems the approach here is to turn the kernels into thin wrappers that call an outlined function?

@shiltian
Contributor

Yes, that's the idea. That way, we will not have any function call to an amdgpu_kernel.

@lalaniket8 lalaniket8 force-pushed the users/lalaniket8/emit-device-version-of-openCL-kernel branch from 61c9004 to 00167a3 on November 27, 2024 08:38
@lalaniket8 lalaniket8 marked this pull request as ready for review November 27, 2024 08:40
@llvmbot llvmbot added the clang, clang:frontend and clang:codegen labels Nov 27, 2024
@llvmbot
Member

llvmbot commented Nov 27, 2024

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Aniket Lal (lalaniket8)

Changes

OpenCL allows a kernel function to call another kernel function.
To support this, we emit a stub version of each kernel function
under a different name mangling scheme and replace the kernel
call sites appropriately.

Fixes #60313


Patch is 23.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115821.diff

15 Files Affected:

  • (modified) clang/include/clang/AST/GlobalDecl.h (+26-11)
  • (modified) clang/lib/AST/Expr.cpp (+2-1)
  • (modified) clang/lib/AST/ItaniumMangle.cpp (+15)
  • (modified) clang/lib/AST/Mangle.cpp (+1-1)
  • (modified) clang/lib/AST/MicrosoftMangle.cpp (+6)
  • (modified) clang/lib/CodeGen/CGBlocks.cpp (+10-6)
  • (modified) clang/lib/CodeGen/CGCall.cpp (+9-2)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+4-1)
  • (modified) clang/lib/CodeGen/CGOpenCLRuntime.cpp (+9-2)
  • (modified) clang/lib/CodeGen/CGOpenCLRuntime.h (+3-1)
  • (modified) clang/lib/CodeGen/CodeGenModule.cpp (+7)
  • (added) clang/test/CodeGenOpenCL/opencl-kernel-call.cl (+43)
  • (modified) clang/test/CodeGenOpenCL/reflect.cl (+1-1)
  • (modified) clang/test/CodeGenOpenCL/spir-calling-conv.cl (+2-2)
  • (modified) clang/test/CodeGenOpenCL/visibility.cl (+40-13)
diff --git a/clang/include/clang/AST/GlobalDecl.h b/clang/include/clang/AST/GlobalDecl.h
index 386693cabb1fbb..8a9f4b4c60e5e5 100644
--- a/clang/include/clang/AST/GlobalDecl.h
+++ b/clang/include/clang/AST/GlobalDecl.h
@@ -71,6 +71,10 @@ class GlobalDecl {
   GlobalDecl(const FunctionDecl *D, unsigned MVIndex = 0)
       : MultiVersionIndex(MVIndex) {
     if (!D->hasAttr<CUDAGlobalAttr>()) {
+      if (D->hasAttr<OpenCLKernelAttr>()) {
+        Value.setPointerAndInt(D, unsigned(KernelReferenceKind::Kernel));
+        return;
+      }
       Init(D);
       return;
     }
@@ -78,7 +82,8 @@ class GlobalDecl {
   }
   GlobalDecl(const FunctionDecl *D, KernelReferenceKind Kind)
       : Value(D, unsigned(Kind)) {
-    assert(D->hasAttr<CUDAGlobalAttr>() && "Decl is not a GPU kernel!");
+    assert((D->hasAttr<CUDAGlobalAttr>() && "Decl is not a GPU kernel!") ||
+           (D->hasAttr<OpenCLKernelAttr>() && "Decl is not a OpenCL kernel!"));
   }
   GlobalDecl(const NamedDecl *D) { Init(D); }
   GlobalDecl(const BlockDecl *D) { Init(D); }
@@ -130,13 +135,15 @@ class GlobalDecl {
   }
 
   KernelReferenceKind getKernelReferenceKind() const {
-    assert(((isa<FunctionDecl>(getDecl()) &&
-             cast<FunctionDecl>(getDecl())->hasAttr<CUDAGlobalAttr>()) ||
-            (isa<FunctionTemplateDecl>(getDecl()) &&
-             cast<FunctionTemplateDecl>(getDecl())
-                 ->getTemplatedDecl()
-                 ->hasAttr<CUDAGlobalAttr>())) &&
-           "Decl is not a GPU kernel!");
+    assert((((isa<FunctionDecl>(getDecl()) &&
+              cast<FunctionDecl>(getDecl())->hasAttr<CUDAGlobalAttr>()) ||
+             (isa<FunctionTemplateDecl>(getDecl()) &&
+              cast<FunctionTemplateDecl>(getDecl())
+                  ->getTemplatedDecl()
+                  ->hasAttr<CUDAGlobalAttr>())) &&
+            "Decl is not a GPU kernel!") ||
+           (isDeclOpenCLKernel() && "Decl is not a OpenCL kernel!"));
+
     return static_cast<KernelReferenceKind>(Value.getInt());
   }
 
@@ -196,13 +203,21 @@ class GlobalDecl {
   }
 
   GlobalDecl getWithKernelReferenceKind(KernelReferenceKind Kind) {
-    assert(isa<FunctionDecl>(getDecl()) &&
-           cast<FunctionDecl>(getDecl())->hasAttr<CUDAGlobalAttr>() &&
-           "Decl is not a GPU kernel!");
+    assert((isa<FunctionDecl>(getDecl()) &&
+            cast<FunctionDecl>(getDecl())->hasAttr<CUDAGlobalAttr>() &&
+            "Decl is not a GPU kernel!") ||
+           (isDeclOpenCLKernel() && "Decl is not a OpenCL kernel!"));
     GlobalDecl Result(*this);
     Result.Value.setInt(unsigned(Kind));
     return Result;
   }
+
+  bool isDeclOpenCLKernel() const {
+    auto FD = dyn_cast<FunctionDecl>(getDecl());
+    if (FD)
+      return FD->hasAttr<OpenCLKernelAttr>();
+    return FD;
+  }
 };
 
 } // namespace clang
diff --git a/clang/lib/AST/Expr.cpp b/clang/lib/AST/Expr.cpp
index a4fb4d5a1f2ec4..ddd88ac6a21050 100644
--- a/clang/lib/AST/Expr.cpp
+++ b/clang/lib/AST/Expr.cpp
@@ -693,7 +693,8 @@ std::string PredefinedExpr::ComputeName(PredefinedIdentKind IK,
           GD = GlobalDecl(CD, Ctor_Base);
         else if (const CXXDestructorDecl *DD = dyn_cast<CXXDestructorDecl>(ND))
           GD = GlobalDecl(DD, Dtor_Base);
-        else if (ND->hasAttr<CUDAGlobalAttr>())
+        else if (ND->hasAttr<CUDAGlobalAttr>() ||
+                 ND->hasAttr<OpenCLKernelAttr>())
           GD = GlobalDecl(cast<FunctionDecl>(ND));
         else
           GD = GlobalDecl(ND);
diff --git a/clang/lib/AST/ItaniumMangle.cpp b/clang/lib/AST/ItaniumMangle.cpp
index 27a993a631dae9..7e46d6c520fb69 100644
--- a/clang/lib/AST/ItaniumMangle.cpp
+++ b/clang/lib/AST/ItaniumMangle.cpp
@@ -526,6 +526,7 @@ class CXXNameMangler {
   void mangleSourceName(const IdentifierInfo *II);
   void mangleRegCallName(const IdentifierInfo *II);
   void mangleDeviceStubName(const IdentifierInfo *II);
+  void mangleOCLDeviceStubName(const IdentifierInfo *II);
   void mangleSourceNameWithAbiTags(
       const NamedDecl *ND, const AbiTagList *AdditionalAbiTags = nullptr);
   void mangleLocalName(GlobalDecl GD,
@@ -1561,8 +1562,13 @@ void CXXNameMangler::mangleUnqualifiedName(
       bool IsDeviceStub =
           FD && FD->hasAttr<CUDAGlobalAttr>() &&
           GD.getKernelReferenceKind() == KernelReferenceKind::Stub;
+      bool IsOCLDeviceStub =
+          FD && FD->hasAttr<OpenCLKernelAttr>() &&
+          GD.getKernelReferenceKind() == KernelReferenceKind::Stub;
       if (IsDeviceStub)
         mangleDeviceStubName(II);
+      else if (IsOCLDeviceStub)
+        mangleOCLDeviceStubName(II);
       else if (IsRegCall)
         mangleRegCallName(II);
       else
@@ -1780,6 +1786,15 @@ void CXXNameMangler::mangleDeviceStubName(const IdentifierInfo *II) {
       << II->getName();
 }
 
+void CXXNameMangler::mangleOCLDeviceStubName(const IdentifierInfo *II) {
+  // <source-name> ::= <positive length number> __clang_ocl_kern_imp_
+  // <identifier> <number> ::= [n] <non-negative decimal integer> <identifier>
+  // ::= <unqualified source code identifier>
+  StringRef OCLDeviceStubNamePrefix = "__clang_ocl_kern_imp_";
+  Out << II->getLength() + OCLDeviceStubNamePrefix.size() - 1
+      << OCLDeviceStubNamePrefix << II->getName();
+}
+
 void CXXNameMangler::mangleSourceName(const IdentifierInfo *II) {
   // <source-name> ::= <positive length number> <identifier>
   // <number> ::= [n] <non-negative decimal integer>
diff --git a/clang/lib/AST/Mangle.cpp b/clang/lib/AST/Mangle.cpp
index 15be9c62bf8880..1d1c4dd0e39b7a 100644
--- a/clang/lib/AST/Mangle.cpp
+++ b/clang/lib/AST/Mangle.cpp
@@ -539,7 +539,7 @@ class ASTNameGenerator::Implementation {
         GD = GlobalDecl(CtorD, Ctor_Complete);
       else if (const auto *DtorD = dyn_cast<CXXDestructorDecl>(D))
         GD = GlobalDecl(DtorD, Dtor_Complete);
-      else if (D->hasAttr<CUDAGlobalAttr>())
+      else if (D->hasAttr<CUDAGlobalAttr>() || D->hasAttr<OpenCLKernelAttr>())
         GD = GlobalDecl(cast<FunctionDecl>(D));
       else
         GD = GlobalDecl(D);
diff --git a/clang/lib/AST/MicrosoftMangle.cpp b/clang/lib/AST/MicrosoftMangle.cpp
index 94a7ce6c1321d3..e439875a2538ba 100644
--- a/clang/lib/AST/MicrosoftMangle.cpp
+++ b/clang/lib/AST/MicrosoftMangle.cpp
@@ -1161,9 +1161,15 @@ void MicrosoftCXXNameMangler::mangleUnqualifiedName(GlobalDecl GD,
                   ->getTemplatedDecl()
                   ->hasAttr<CUDAGlobalAttr>())) &&
             GD.getKernelReferenceKind() == KernelReferenceKind::Stub;
+        bool IsOCLDeviceStub =
+            ND && (isa<FunctionDecl>(ND) && ND->hasAttr<OpenCLKernelAttr>()) &&
+            GD.getKernelReferenceKind() == KernelReferenceKind::Stub;
         if (IsDeviceStub)
           mangleSourceName(
               (llvm::Twine("__device_stub__") + II->getName()).str());
+        else if (IsOCLDeviceStub)
+          mangleSourceName(
+              (llvm::Twine("__clang_ocl_kern_imp_") + II->getName()).str());
         else
           mangleSourceName(II->getName());
         break;
diff --git a/clang/lib/CodeGen/CGBlocks.cpp b/clang/lib/CodeGen/CGBlocks.cpp
index a7584a95c8ca7b..7f98a897c36907 100644
--- a/clang/lib/CodeGen/CGBlocks.cpp
+++ b/clang/lib/CodeGen/CGBlocks.cpp
@@ -48,7 +48,7 @@ CGBlockInfo::CGBlockInfo(const BlockDecl *block, StringRef name)
 BlockByrefHelpers::~BlockByrefHelpers() {}
 
 /// Build the given block as a global block.
-static llvm::Constant *buildGlobalBlock(CodeGenModule &CGM,
+static llvm::Constant *buildGlobalBlock(CodeGenModule &CGM, GlobalDecl GD,
                                         const CGBlockInfo &blockInfo,
                                         llvm::Constant *blockFn);
 
@@ -1085,8 +1085,10 @@ llvm::Value *CodeGenFunction::EmitBlockLiteral(const CGBlockInfo &blockInfo) {
       blockAddr.getPointer(), ConvertType(blockInfo.getBlockExpr()->getType()));
 
   if (IsOpenCL) {
-    CGM.getOpenCLRuntime().recordBlockInfo(blockInfo.BlockExpression, InvokeFn,
-                                           result, blockInfo.StructureType);
+    CGM.getOpenCLRuntime().recordBlockInfo(
+        blockInfo.BlockExpression, InvokeFn, result, blockInfo.StructureType,
+        CurGD && CurGD.isDeclOpenCLKernel() &&
+            (CurGD.getKernelReferenceKind() == KernelReferenceKind::Kernel));
   }
 
   return result;
@@ -1285,7 +1287,7 @@ CodeGenModule::GetAddrOfGlobalBlock(const BlockExpr *BE,
   return getAddrOfGlobalBlockIfEmitted(BE);
 }
 
-static llvm::Constant *buildGlobalBlock(CodeGenModule &CGM,
+static llvm::Constant *buildGlobalBlock(CodeGenModule &CGM, GlobalDecl GD,
                                         const CGBlockInfo &blockInfo,
                                         llvm::Constant *blockFn) {
   assert(blockInfo.CanBeGlobal);
@@ -1378,7 +1380,9 @@ static llvm::Constant *buildGlobalBlock(CodeGenModule &CGM,
     CGM.getOpenCLRuntime().recordBlockInfo(
         blockInfo.BlockExpression,
         cast<llvm::Function>(blockFn->stripPointerCasts()), Result,
-        literal->getValueType());
+        literal->getValueType(),
+        GD && GD.isDeclOpenCLKernel() &&
+            (GD.getKernelReferenceKind() == KernelReferenceKind::Kernel));
   return Result;
 }
 
@@ -1487,7 +1491,7 @@ llvm::Function *CodeGenFunction::GenerateBlockFunction(
     auto GenVoidPtrTy = getContext().getLangOpts().OpenCL
                             ? CGM.getOpenCLRuntime().getGenericVoidPointerType()
                             : VoidPtrTy;
-    buildGlobalBlock(CGM, blockInfo,
+    buildGlobalBlock(CGM, CurGD, blockInfo,
                      llvm::ConstantExpr::getPointerCast(fn, GenVoidPtrTy));
   }
 
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 20455dbb820914..215fe5bd9da8dd 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -2343,6 +2343,15 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
   // Collect function IR attributes from the CC lowering.
   // We'll collect the paramete and result attributes later.
   CallingConv = FI.getEffectiveCallingConvention();
+  GlobalDecl GD = CalleeInfo.getCalleeDecl();
+  const Decl *TargetDecl = CalleeInfo.getCalleeDecl().getDecl();
+  if (TargetDecl) {
+    if (auto FD = dyn_cast<FunctionDecl>(TargetDecl)) {
+      if (FD->hasAttr<OpenCLKernelAttr>() &&
+          GD.getKernelReferenceKind() == KernelReferenceKind::Stub)
+        CallingConv = llvm::CallingConv::C;
+    }
+  }
   if (FI.isNoReturn())
     FuncAttrs.addAttribute(llvm::Attribute::NoReturn);
   if (FI.isCmseNSCall())
@@ -2352,8 +2361,6 @@ void CodeGenModule::ConstructAttributeList(StringRef Name,
   AddAttributesFromFunctionProtoType(getContext(), FuncAttrs,
                                      CalleeInfo.getCalleeFunctionProtoType());
 
-  const Decl *TargetDecl = CalleeInfo.getCalleeDecl().getDecl();
-
   // Attach assumption attributes to the declaration. If this is a call
   // site, attach assumptions from the caller to the call as well.
   AddAttributesFromOMPAssumes(FuncAttrs, TargetDecl);
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 229f0e29f02341..e2222a6393b9b1 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -5692,7 +5692,10 @@ CGCallee CodeGenFunction::EmitCallee(const Expr *E) {
   // Resolve direct calls.
   } else if (auto DRE = dyn_cast<DeclRefExpr>(E)) {
     if (auto FD = dyn_cast<FunctionDecl>(DRE->getDecl())) {
-      return EmitDirectCallee(*this, FD);
+      auto CalleeDecl = FD->hasAttr<OpenCLKernelAttr>()
+                            ? GlobalDecl(FD, KernelReferenceKind::Stub)
+                            : FD;
+      return EmitDirectCallee(*this, CalleeDecl);
     }
   } else if (auto ME = dyn_cast<MemberExpr>(E)) {
     if (auto FD = dyn_cast<FunctionDecl>(ME->getMemberDecl())) {
diff --git a/clang/lib/CodeGen/CGOpenCLRuntime.cpp b/clang/lib/CodeGen/CGOpenCLRuntime.cpp
index 115b618056a445..a78d783831293e 100644
--- a/clang/lib/CodeGen/CGOpenCLRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenCLRuntime.cpp
@@ -126,14 +126,21 @@ static const BlockExpr *getBlockExpr(const Expr *E) {
 /// corresponding block expression.
 void CGOpenCLRuntime::recordBlockInfo(const BlockExpr *E,
                                       llvm::Function *InvokeF,
-                                      llvm::Value *Block, llvm::Type *BlockTy) {
-  assert(!EnqueuedBlockMap.contains(E) && "Block expression emitted twice");
+                                      llvm::Value *Block, llvm::Type *BlockTy,
+                                      bool isBlkExprInOCLKern) {
+
+  // FIXME: Since OpenCL Kernels are emitted twice (kernel version and stub
+  // version), its constituent BlockExpr will also be emitted twice.
+  assert((!EnqueuedBlockMap.contains(E) ||
+          EnqueuedBlockMap[E].isBlkExprInOCLKern != isBlkExprInOCLKern) &&
+         "Block expression emitted twice");
   assert(isa<llvm::Function>(InvokeF) && "Invalid invoke function");
   assert(Block->getType()->isPointerTy() && "Invalid block literal type");
   EnqueuedBlockMap[E].InvokeFunc = InvokeF;
   EnqueuedBlockMap[E].BlockArg = Block;
   EnqueuedBlockMap[E].BlockTy = BlockTy;
   EnqueuedBlockMap[E].KernelHandle = nullptr;
+  EnqueuedBlockMap[E].isBlkExprInOCLKern = isBlkExprInOCLKern;
 }
 
 llvm::Function *CGOpenCLRuntime::getInvokeFunction(const Expr *E) {
diff --git a/clang/lib/CodeGen/CGOpenCLRuntime.h b/clang/lib/CodeGen/CGOpenCLRuntime.h
index 34613c3516f374..78bb5980cd87dc 100644
--- a/clang/lib/CodeGen/CGOpenCLRuntime.h
+++ b/clang/lib/CodeGen/CGOpenCLRuntime.h
@@ -46,6 +46,7 @@ class CGOpenCLRuntime {
     llvm::Value *KernelHandle;  /// Enqueued block kernel reference.
     llvm::Value *BlockArg;      /// The first argument to enqueued block kernel.
     llvm::Type *BlockTy;        /// Type of the block argument.
+    bool isBlkExprInOCLKern; /// Does the BlockExpr reside in an OpenCL Kernel.
   };
   /// Maps block expression to block information.
   llvm::DenseMap<const Expr *, EnqueuedBlockInfo> EnqueuedBlockMap;
@@ -93,7 +94,8 @@ class CGOpenCLRuntime {
   /// \param InvokeF invoke function emitted for the block expression.
   /// \param Block block literal emitted for the block expression.
   void recordBlockInfo(const BlockExpr *E, llvm::Function *InvokeF,
-                       llvm::Value *Block, llvm::Type *BlockTy);
+                       llvm::Value *Block, llvm::Type *BlockTy,
+                       bool isBlkExprInOCLKern);
 
   /// \return LLVM block invoke function emitted for an expression derived from
   /// the block expression.
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 716c43431667cc..df196479994e4b 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -1887,6 +1887,9 @@ static std::string getMangledNameImpl(CodeGenModule &CGM, GlobalDecl GD,
     } else if (FD && FD->hasAttr<CUDAGlobalAttr>() &&
                GD.getKernelReferenceKind() == KernelReferenceKind::Stub) {
       Out << "__device_stub__" << II->getName();
+    } else if (FD && FD->hasAttr<OpenCLKernelAttr>() &&
+               GD.getKernelReferenceKind() == KernelReferenceKind::Stub) {
+      Out << "__clang_ocl_kern_imp_" << II->getName();
     } else {
       Out << II->getName();
     }
@@ -3850,6 +3853,10 @@ void CodeGenModule::EmitGlobal(GlobalDecl GD) {
 
   // Ignore declarations, they will be emitted on their first use.
   if (const auto *FD = dyn_cast<FunctionDecl>(Global)) {
+
+    if (FD->hasAttr<OpenCLKernelAttr>() && FD->doesThisDeclarationHaveABody())
+      addDeferredDeclToEmit(GlobalDecl(FD, KernelReferenceKind::Stub));
+
     // Update deferred annotations with the latest declaration if the function
     // function was already used or defined.
     if (FD->hasAttr<AnnotateAttr>()) {
diff --git a/clang/test/CodeGenOpenCL/opencl-kernel-call.cl b/clang/test/CodeGenOpenCL/opencl-kernel-call.cl
new file mode 100644
index 00000000000000..f575728f237630
--- /dev/null
+++ b/clang/test/CodeGenOpenCL/opencl-kernel-call.cl
@@ -0,0 +1,43 @@
+// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -emit-llvm -o - %s | FileCheck %s
+
+// CHECK: define dso_local amdgpu_kernel void @callee_kern({{.*}})
+__attribute__((noinline)) kernel void callee_kern(global int *A){
+  *A = 1;
+}
+
+__attribute__((noinline)) kernel void ext_callee_kern(global int *A);
+
+// CHECK: define dso_local void @callee_func({{.*}})
+__attribute__((noinline)) void callee_func(global int *A){
+  *A = 2;
+}
+
+// CHECK: define dso_local amdgpu_kernel void @caller_kern({{.*}})
+kernel void caller_kern(global int* A){
+  callee_kern(A);
+  // CHECK: tail call void @__clang_ocl_kern_imp_callee_kern({{.*}})
+  ext_callee_kern(A);
+  // CHECK: tail call void @__clang_ocl_kern_imp_ext_callee_kern({{.*}})
+  callee_func(A);
+  // CHECK: tail call void @callee_func({{.*}})
+
+}
+
+// CHECK: define dso_local void @__clang_ocl_kern_imp_callee_kern({{.*}})
+
+// CHECK: declare void @__clang_ocl_kern_imp_ext_callee_kern({{.*}})
+
+// CHECK: define dso_local void @caller_func({{.*}})
+void caller_func(global int* A){
+  callee_kern(A);
+  // CHECK: tail call void @__clang_ocl_kern_imp_callee_kern({{.*}}) #7
+  ext_callee_kern(A);
+  // CHECK: tail call void @__clang_ocl_kern_imp_ext_callee_kern({{.*}}) #8
+  callee_func(A);
+  // CHECK: tail call void @callee_func({{.*}})
+}
+
+// CHECK: define dso_local void @__clang_ocl_kern_imp_caller_kern({{.*}}) 
+// CHECK: tail call void @__clang_ocl_kern_imp_callee_kern({{.*}}) 
+// CHECK: tail call void @__clang_ocl_kern_imp_ext_callee_kern({{.*}}) 
+// CHECK: tail call void @callee_func({{.*}})
diff --git a/clang/test/CodeGenOpenCL/reflect.cl b/clang/test/CodeGenOpenCL/reflect.cl
index 9ae4a5f027d358..0e3e50be9745eb 100644
--- a/clang/test/CodeGenOpenCL/reflect.cl
+++ b/clang/test/CodeGenOpenCL/reflect.cl
@@ -13,7 +13,7 @@ bool device_function() {
 }
 
 // CHECK-LABEL: define dso_local spir_kernel void @kernel_function(
-// CHECK-SAME: ptr addrspace(1) noundef align 4 [[I:%.*]]) #[[ATTR2:[0-9]+]] !kernel_arg_addr_space !4 !kernel_arg_access_qual !5 !kernel_arg_type !6 !kernel_arg_base_type !6 !kernel_arg_type_qual !7 {
+// CHECK-SAME: ptr addrspace(1) noundef align 4 [[I:%.*]]) #[[ATTR2:[0-9]+]] !kernel_arg_addr_space !5 !kernel_arg_access_qual !6 !kernel_arg_type !7 !kernel_arg_base_type !7 !kernel_arg_type_qual !8 {
 // CHECK-NEXT:  entry:
 // CHECK-NEXT:    [[I_ADDR:%.*]] = alloca ptr addrspace(1), align 4
 // CHECK-NEXT:    store ptr addrspace(1) [[I]], ptr [[I_ADDR]], align 4
diff --git a/clang/test/CodeGenOpenCL/spir-calling-conv.cl b/clang/test/CodeGenOpenCL/spir-calling-conv.cl
index 569ea0cbe1af60..6c8f20511b8bd6 100644
--- a/clang/test/CodeGenOpenCL/spir-calling-conv.cl
+++ b/clang/test/CodeGenOpenCL/spir-calling-conv.cl
@@ -11,8 +11,8 @@ kernel void foo(global int *A)
   // CHECK: %{{[a-z0-9_]+}} = tail call spir_func i32 @get_dummy_id(i32 noundef 0)
   A[id] = id;
   bar(A);
-  // CHECK: tail call spir_kernel void @bar(ptr addrspace(1) noundef align 4 %A)
+  // CHECK: tail call void @__clang_ocl_kern_imp_bar(ptr addrspace(1) noundef align 4 %A)
 }
 
 // CHECK: declare spir_func i32 @get_dummy_id(i32 noundef)
-// CHECK: declare spir_kernel void @bar(ptr addrspace(1) noundef align 4)
+// CHECK: declare void @__clang_ocl_kern_imp_bar(ptr addrspace(1) noundef align 4)
diff --git a/clang/test/CodeGenOpenCL/visibility.cl b/clang/test/CodeGenOpenCL/visibility.cl
index addfe33377f939..e5dc5b29c5140b 100644
--- a/clang/test/CodeGenOpenCL/visibility.cl
+++ b/clang/test/CodeGenOpenCL/visibility.cl
@@ -85,31 +85,42 @@ __attribute__((visibility("default"))) extern void ext_func_default();
 void use() {
     glob = ext + ext_hidden + ext_protected + ext_default;
     ext_kern();
+    // FVIS-DEFAULT: tail call void @__clang_ocl_kern_imp_ext_kern()
+    // FVIS-PROTECTED: tail call void @__clang_ocl_kern_imp_ext_kern()
+    // FVIS-HIDDEN: tail call void @__clang_ocl_kern_imp_ext_kern()
     ext_kern_hidden();
+    // FVIS-DEFAULT: tail call void @__clang_ocl_kern_imp_ext_kern_hidden()
+    // FVIS-PROTECTED: tail call void @__clang_ocl_kern_imp_ext_kern_hidden()
+    // FVIS-HIDDEN: tail call void @__clang_ocl_kern_imp_ext_kern_hidden()
     ext_kern_protected();
+    // FVIS-DEFAULT: tail call void @__clang_ocl_kern_imp_ext_kern_protected()
+    // FVIS-PROTECTED: tail call void @__clang_ocl_kern_imp_ext_kern_protected()
+    // FVIS-HIDDEN: tail call voi...
[truncated]

@llvmbot
Member

llvmbot commented Nov 27, 2024

@llvm/pr-subscribers-clang-codegen


@lalaniket8 lalaniket8 changed the title [Clang] Emit stub version of OpenCL Kernel [Clang][OpenCL][AMDGPU] Allow a kernel to call another kernel Nov 27, 2024
// <source-name> ::= <positive length number> __clang_ocl_kern_imp_
// <identifier> <number> ::= [n] <non-negative decimal integer> <identifier>
// ::= <unqualified source code identifier>
StringRef OCLDeviceStubNamePrefix = "__clang_ocl_kern_imp_";
Contributor

I'd suggest using some symbols that a regular user-defined function can't be named with, to avoid conflicts.

Contributor Author
@lalaniket8 lalaniket8 Dec 2, 2024

Names starting with a double underscore are reserved, so they shouldn't conflict with user-defined names.

Contributor

But you still can't prevent people from doing it, right?

Contributor

Well, it's undefined behavior if they do, the same that you can technically define a function in C called _Z3foov with a completely different signature from what C++ would expect. This sort of thing is generally fine as long as we make some effort to avoid unintentional conflicts.

Contributor

In terms of how the symbols should behave, this might be backwards. Alternatively, we could preserve the as-is name for the callable implicit function and add some mangling to the kernel entry point. However, I bet there's a lot more code expecting the current behavior, where the kernel name is used directly as the kernel symbol, so maybe changing that is best left for later.
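
For reference, a sketch of the symbol layout this patch produces for an assumed kernel foo on the AMDGPU target: the kernel entry point keeps its original name, and the new reserved prefix goes on the callable stub (in C++ mode the prefix is instead folded into the Itanium <source-name>):

kernel void foo(global int *A) { }
// kernel entry point (name unchanged):  define amdgpu_kernel void @foo(...)
// callable implementation stub:         define void @__clang_ocl_kern_imp_foo(...)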

@lalaniket8 lalaniket8 force-pushed the users/lalaniket8/emit-device-version-of-openCL-kernel branch from 00167a3 to 013801b on November 29, 2024 08:36
if (D->hasAttr<OpenCLKernelAttr>()) {
Value.setPointerAndInt(D, unsigned(KernelReferenceKind::Kernel));
return;
}
Contributor

It looks like there's new commonality between OpenCL and CUDA kernels, and this feels like a generalizable feature. Please add a method to FunctionDecl (and, looking at the rest of the patch, also FunctionTemplateDecl?) named something like isReferenceableKernel() that checks for either of these two attributes. Most of the diffs in this patch should end up just changing various bits of existing code that check specifically for CUDAGlobalAttr to call that method instead.

For this change specifically, you'll want to make sure that getDefaultKernelReference does the right thing for OpenCL kernels.

Contributor Author

Added isReferenceableKernel() to FunctionDecl. Since OpenCL does not support template kernel functions, I have not added it to FunctionTemplateDecl.

CGM.getOpenCLRuntime().recordBlockInfo(
blockInfo.BlockExpression, InvokeFn, result, blockInfo.StructureType,
CurGD && CurGD.isDeclOpenCLKernel() &&
(CurGD.getKernelReferenceKind() == KernelReferenceKind::Kernel));
Contributor

This is trying to protect against double-registering the block, right? But that's only necessary if you're double-emitting the kernel body, which you really shouldn't be doing — there are a lot of other places that assume that function bodies are only emitted once, like static locals. Probably the kernel should be emitted by calling the stub or vice-versa?

Contributor Author

Does that mean we replace the function body of the stub with a call to the respective kernel?
This way we emit the function body only once.
So a kernel-to-kernel function call is replaced by a kernel-to-stub-to-kernel call?

For example, the following snippet:

kernel void callee_kern(){

}
kernel void caller_kern(){
	callee_kern();
}

should generate IR that looks like:

define dso_local amdgpu_kernel void @callee_kern({{.*}}){

}
define dso_local void @__clang_ocl_kern_imp_callee_kern({{.*}}){
  entry:
  tail call amdgpu_kernel void @callee_kern({{.*}})
}
define dso_local amdgpu_kernel void @caller_kern({{.*}}){
  tail call void @__clang_ocl_kern_imp_callee_kern();
}
define dso_local void @__clang_ocl_kern_imp_caller_kern({{.*}}){
  tail call void @__clang_ocl_kern_imp_callee_kern();
}

Contributor

That's one of the options, yeah. You'd always emit stubs as directly calling their kernels (so _clang_ocl_kern_imp_caller_kern would actually just call caller_kern, at least coming out of IRGen), and then you'd let LLVM decide when and whether to inline. The main disadvantage is that there might not already be code to emit calls to kernels; I imagine that's normally only done by the OpenCL runtime, and the ABI is probably pretty different from the normal call ABI. But maybe that's not a big deal.

The second option is that you could emit stubs by making sure the kernel is emitted first and then directly cloning the kernel body into the stub. Whether this is materially different from just inlining, I don't know, but it's an option. We do have code for this sort of thing already — we need it for emitting virtual varargs thunks in C++.

The third option is that you could emit kernels as directly calling their stubs. The advantage here is that all the calls are just normal calls that you definitely already have code to handle. The disadvantage is that you'd always be forcing the stub to be emitted, even in the 99% case that there are no other uses of it. It'll probably get reliably inlined away, but still, it's burning some extra compile time.

The last option is that you could emit kernels by emitting the stub and then directly cloning the function body into the kernel.

Up to you; they all have trade-offs. But I do think you need to not double-emit the kernel body.

Contributor Author

a kernel-to-kernel function call is replaced by a kernel-to-stub-to-kernel call

Calls to kernels (from device and kernel functions) are not supported during backend ISel lowering: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp#L1127
So, we can't have a stub calling a kernel go into the backend.
Inlining the callee kernel body into the stub function might not be helpful in cases where the callee kernel is just a function declaration.

kernel is emitted first and then directly cloning the kernel body into the stub

emit kernels by emitting the stub and then directly cloning the function body into the kernel

Some doubts on these points: wouldn't this also result in having two copies of the kernel body? Would this avoid the problems we see during double emission?

Contributor
@rjmccall rjmccall Dec 4, 2024

Calls to kernels (from device and kernel functions) are not supported during backend ISel lowering:

That makes sense. Okay, so you need to call the stub. That's probably fine — LLVM should reliably clean it up regardless — but it'll cause a lot of test case churn unless you force inlining prior to LLVM.

Some doubts on these points: Wouldn't this also result in having two copies of the kernel body ? Would this avoid the problems we see during double emission ?

Having two copies of a function body is not the problem. This is essentially just inlining, so as long as the kernel body doesn't do something that's semantically incompatible with inlining (which there aren't many examples of — it's mostly just the address-of-label extension, which I assume GPUs aren't going to handle well anyway), it's fine to inline.

The problem is that the visitor for emitting the body does not expect to run twice on the same code, and so it will emit code incorrectly for structures that require supporting global variables and functions, such as blocks and static locals. For example, this patch contains code to work around the double-emission of a block.

Contributor Author

I have implemented the emission of kernels as directly calling their respective stubs. The kernel body now consists only of a call to the stub. The stub contains the kernel body and is emitted with the alwaysinline attribute.
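
A sketch of the resulting structure (assumed shape based on the description above, not verbatim IR): the stub carries the real kernel body and is marked alwaysinline, while the kernel entry point is a thin wrapper that simply calls its stub.

kernel void callee_kern(global int *A) {
  *A = 1;
}
// define dso_local amdgpu_kernel void @callee_kern(ptr addrspace(1) %A)
//   call void @__clang_ocl_kern_imp_callee_kern(ptr addrspace(1) %A)   ; wrapper only
//   ret void
//
// define dso_local void @__clang_ocl_kern_imp_callee_kern(ptr addrspace(1) %A) #0
//   store i32 1, ptr addrspace(1) %A                                   ; real body
//   ret void
//
// attributes #0 = { alwaysinline ... }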

if (auto FD = dyn_cast<FunctionDecl>(TargetDecl)) {
if (FD->hasAttr<OpenCLKernelAttr>() &&
GD.getKernelReferenceKind() == KernelReferenceKind::Stub)
CallingConv = llvm::CallingConv::C;
Contributor

This seems like the wrong place to be doing this. If we're calling the stub, the CGFunctionInfo needs to contain the right CC information for the stub, and then we won't need special overrides like this.

Contributor Author

Moved change of calling convention to CodeGenTypes::arrangeFunctionDeclaration() so that we obtain the correct CGFunctionInfo.

auto CalleeDecl = FD->hasAttr<OpenCLKernelAttr>()
                      ? GlobalDecl(FD, KernelReferenceKind::Stub)
                      : FD;
return EmitDirectCallee(*this, CalleeDecl);
Contributor

Hmm. It looks like the CUDA folks had this same problem and came up with an awkward workaround for it in EmitDirectCallee. We should really just be requesting the right GD in the first place. Could you add a getGlobalDeclForDirectCall function that does the right thing for both modes? If it ends up causing complicated behavior/test changes in CUDA mode, you can feel free to exclude CUDA for now and just leave a comment saying that the workaround should be removed in favor of doing the right thing in that function.

Contributor Author

I have added getGlobalDeclForDirectCall(), which returns the correct GD for OpenCL mode; however, I am a bit unclear on what is required for CUDA mode.

CUDA folks had this same problem and came up with an awkward workaround for it in EmitDirectCallee

Are you referring to the reassignment of CalleePtr in EmitDirectCallee()?
Do we want to pass a GD into EmitDirectCallee() which already contains the correct CalleePtr for CUDA mode?

Contributor

Yeah, that's what I meant, and yeah, I feel like we should just pass down the right GD.

Contributor Author

I see. I can get the correct CalleePtr, which is of type llvm::Constant, but I am not able to figure out how to convert it into a GlobalDecl object.

Collaborator

@yxsamliu yxsamliu left a comment

LGTM. Thanks

anikelal added 8 commits April 7, 2025 14:15
This feature is currently not supported in the compiler.
To facilitate this, we emit a stub version of each kernel
function body with a different name-mangling scheme and
replace the respective kernel call sites appropriately.

Fixes #60313

D120566 was an earlier attempt to upstream a solution
for this issue.
Simplifying isDeclOpenCLKernel() and removing resolved comments
Test aggregate parameters passed byref and byval
@lalaniket8 lalaniket8 force-pushed the users/lalaniket8/emit-device-version-of-openCL-kernel branch from 037f952 to 1b8c7cf Compare April 7, 2025 10:39
@lalaniket8 lalaniket8 merged commit 642481a into main Apr 8, 2025
11 checks passed
@lalaniket8 lalaniket8 deleted the users/lalaniket8/emit-device-version-of-openCL-kernel branch April 8, 2025 04:59

github-actions bot commented Apr 8, 2025

@lalaniket8 Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Apr 8, 2025
To facilitate OpenCL kernel-to-kernel calls, we emit a stub version of each
kernel function body with a different name-mangling scheme and
replace the respective kernel call sites appropriately.

Corresponding upstream PR llvm#115821
Fixes https://ontrack-internal.amd.com/browse/SWDEV-245936
shiltian added a commit that referenced this pull request Apr 8, 2025
Due to a previous workaround allowing kernels to be called from other functions,
Clang currently doesn't use the `byref` attribute for aggregate kernel
arguments. The issue was recently resolved in
#115821. With that fix, we can now
enable the use of `byref` consistently across all languages.

Co-authored-by: Matt Arsenault <[email protected]>
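(For context, a minimal sketch of the aggregate-kernel-argument case that commit is about; the struct and kernel names are made up.)

typedef struct { int x; float y[4]; } Params;

// With kernels no longer directly callable from device code, the frontend is
// free to pass an aggregate kernel argument like this with byref rather than
// byval at the IR level.
kernel void compute(Params p, global float *out) {
  out[0] = p.y[0] + (float)p.x;
}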
@MrSidims MrSidims requested a review from svenvh April 9, 2025 09:36
@MrSidims
Contributor

MrSidims commented Apr 9, 2025

@lalaniket8 @arsenm I don't have a strong opinion, but shouldn't this transformation be done during lowering to the target? The current version of the patch causes odd behavior for LLVM IR to SPIR-V lowering of OpenCL kernels: SPIR-V doesn't allow one EntryPoint to refer to another EntryPoint, so during such lowering the translator moves the kernel's body to an impl function (just like this patch does). Together they result in quite odd behavior: KhronosGroup/SPIRV-LLVM-Translator#3115

@yxsamliu
Collaborator

yxsamliu commented Apr 9, 2025

@lalaniket8 @arsenm I don't have a strong opinion, but shouldn't this transformation be done during lowering to the target? The current version of the patch causes odd behavior for LLVM IR to SPIR-V lowering of OpenCL kernels: SPIR-V doesn't allow one EntryPoint to refer to another EntryPoint, so during such lowering the translator moves the kernel's body to an impl function (just like this patch does). Together they result in quite odd behavior: KhronosGroup/SPIRV-LLVM-Translator#3115

My understanding is that for most targets, kernel and non-kernel functions have different calling conventions, which affects how struct-type arguments are passed. Allowing kernel functions to be callable in IR could cause issues in the mid-end; that is why, for any kernel, a non-kernel __clang_ocl_kern_imp* function is created and used where the kernel function is called. This is necessary to have proper IR.

For the LLVM/SPIR-V translator, this change should be transparent: __clang_ocl_kern_imp* should behave like a common non-kernel function called by a kernel function.

@arsenm
Contributor

arsenm commented Apr 13, 2025

@lalaniket8 @arsenm I don't have a strong opinion, but shouldn't this transformation be done during lowering to the target?

This is the lowering to the target.

Current version of the patch brings odd behavior for LLVM IR to SPIR-V lowering for OpenCL kernels. SPIR-V don't allow one EntryPoint to refer another EntryPoint,

Yes, that's the entire point of this change. We've avoided introducing the invalid situation of an entry calling another entry, which requires 2 IR function bodies with different calling conventions. If you see an entry calling another entry, the calling convention is being set incorrectly on the implementation function.

shiltian added a commit that referenced this pull request Apr 13, 2025
…ents (#134892)

Due to a previous workaround allowing kernels to be called from other
functions, Clang currently doesn't use the `byref` attribute for aggregate
kernel arguments. The issue was recently resolved in #115821. With that fix,
we can now enable the use of `byref` consistently across all languages.

Co-authored-by: Matt Arsenault <[email protected]>

Fixes SWDEV-247226.

Co-authored-by: Matt Arsenault <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Apr 13, 2025
…ernel arguments (#134892)

Due to a previous workaround allowing kernels to be called from other
functions, Clang currently doesn't use the `byref` attribute for aggregate
kernel arguments. The issue was recently resolved in
llvm/llvm-project#115821. With that fix, we can now enable the use of
`byref` consistently across all languages.

Co-authored-by: Matt Arsenault <[email protected]>

Fixes SWDEV-247226.

Co-authored-by: Matt Arsenault <[email protected]>
@MrSidims
Contributor

MrSidims commented Apr 14, 2025

This is the lowering to the target.

My glossary might be lacking some definitions, but what I really meant by lowering is: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU or https://github.com/llvm/llvm-project/tree/main/llvm/lib/Target/SPIRV .

I definitely see the value of having this resolved in one place instead of multiple places, and clang is a good candidate for it. On the other hand: (a) some targets don't have such a restriction, and for them this change, while harmless, is still redundant; (b) clang is not the only frontend, especially now in the MLIR world; does this mean that every frontend should make such an adjustment when lowering to AMDGPU or SPIR-V?

Just a note/disclaimer: this change makes sense and it doesn't bring (known to me) regressions in any downstream (the reason I'm here is that it was a surprise to see this during some of my experiments), and adjusting tests in the translator is totally fine for me. What I'm trying to understand is whether it really should be done in clang for OpenCL for every target. Again, some targets might already have such a transformation implemented, and for them this change results in the generation of 3 functions (instead of a single kernel).

Also, do you plan to introduce the same change for OpenMP or Fortran at some point? If so, again, doesn't it make sense to modify https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU instead of clang?

@bwyma
Contributor

bwyma commented Apr 14, 2025

@t-tye Would you mind reviewing this patch from a debug perspective?

Potentially multiple subprograms with the same source name, the same source correlation in different routines, missing artificial or trampoline attributes, whether stepping between functions works as expected, etc. Your input would be appreciated.

@MrSidims
Contributor

this change makes sense and it doesn't bring (known to me) regressions

Actually, there is incorrect behavior in the following test case: https://godbolt.org/z/dc3T7Mo3G; note that __clang_ocl_kern_imp_sample_kernel_float was generated but never called. @lalaniket8 can this be addressed?

@lalaniket8
Contributor Author

this change makes sense and it doesn't bring (known to me) regressions

Actually, there is incorrect behavior in the following test case: https://godbolt.org/z/dc3T7Mo3G; note that __clang_ocl_kern_imp_sample_kernel_float was generated but never called. @lalaniket8 can this be addressed?

The call was placed and then inlined because the -O2 optimization flag was passed; default inlining happens at -O1 and above.
You can change it to -O0 and see the call to __clang_ocl_kern_imp_sample_kernel_float.
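(A suggested way to observe this, not output taken from this thread; the exact flags may need adjusting for your setup, and the kernel signature below is a made-up stand-in for the godbolt example.)

// sample.cl
kernel void sample_kernel_float(global float *out) {
  out[0] = 1.0f;
}

// clang -x cl -S -emit-llvm --target=amdgcn-amd-amdhsa -O0 sample.cl
//   -> the kernel entry contains a visible call to
//      __clang_ocl_kern_imp_sample_kernel_float
// clang -x cl -S -emit-llvm --target=amdgcn-amd-amdhsa -O2 sample.cl
//   -> the always-inline stub is inlined into the entry, so it appears uncalled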

var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
…ents (llvm#134892)

Due to a previous workaround allowing kernels to be called from other
functions, Clang currently doesn't use the `byref` attribute for aggregate
kernel arguments. The issue was recently resolved in llvm#115821. With that
fix, we can now enable the use of `byref` consistently across all languages.

Co-authored-by: Matt Arsenault <[email protected]>

Fixes SWDEV-247226.

Co-authored-by: Matt Arsenault <[email protected]>
@arsenm
Contributor

arsenm commented May 1, 2025

My glossary might be lacking some definitions, but what I really meant by lowering is

The IR is not a direct vehicle for language semantics. This case should explicitly not be representable in the target IR, and the frontend needs to implement the calling convention in terms of valid IR for the target.

What I'm trying to understand is whether it really should be done in clang for OpenCL for every target.

This is implementing a stupid OpenCL feature nobody asked for, so clang should deal with it. Every other language can happily ignore this. AMDGPU, NVPTX, and SPIRV all have the same behavior, where kernels / entry points use a separate calling convention that cannot be called from device code. This call should be unrepresentable in the IR, and forcing it to be representable just makes the backends worse. We cannot move the calling-convention lowering forward without removing this artificial usage.

Labels
backend:AMDGPU backend:X86 clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Successfully merging this pull request may close these issues.

Unsupported calling convention: AMDGPU_KERNEL encountered when codegen some amdgpu (opencl) bitcode
10 participants