
[offload][SYCL] Add SYCL Module splitting. #131347

Open: wants to merge 4 commits into main

maksimsab
Contributor

This patch adds SYCL module splitting, a necessary step in the SYCL compilation pipeline. Only two splitting modes are added in this patch: by kernel and by source.

The previous attempt was #119713. In this patch, TransformUtils has no dependency on "IPO" or on "Printing Passes": module splitting is self-contained and doesn't introduce linking issues.
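The two modes differ only in the key used to group kernels into split modules. As a rough, LLVM-free illustration (not the patch's code), the sketch below assumes a kernel can be reduced to its name plus the value of its sycl-module-id attribute; Kernel, SplitMode, and groupEntryPoints are hypothetical names. Only the grouping step is shown; extracting each group's callgraph is a separate step.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the patch's IRSplitMode and kernel functions.
enum class SplitMode { PerKernel, PerSource };

struct Kernel {
  std::string Name;     // kernel (entry point) name
  std::string ModuleId; // assumed value of the "sycl-module-id" attribute
};

// Groups kernel names by split key: the kernel name itself for PerKernel,
// the module id (i.e. the translation unit) for PerSource. A std::map keeps
// the groups in a deterministic, key-sorted order.
std::map<std::string, std::vector<std::string>>
groupEntryPoints(const std::vector<Kernel> &Kernels, SplitMode Mode) {
  std::map<std::string, std::vector<std::string>> Groups;
  for (const Kernel &K : Kernels) {
    const std::string &Key =
        Mode == SplitMode::PerKernel ? K.Name : K.ModuleId;
    Groups[Key].push_back(K.Name);
  }
  return Groups;
}
```

With kernels kA and kB from a.cpp and kC from b.cpp, grouping by source yields two groups (a.cpp: kA, kB; b.cpp: kC), while grouping by kernel yields three. Using std::map mirrors the patch's choice of an ordered map for a stable group order.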

@llvmbot
Member

llvmbot commented Mar 14, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Maksim Sabianin (maksimsab)

Changes

Patch is 41.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131347.diff

13 Files Affected:

  • (added) llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h (+64)
  • (added) llvm/include/llvm/Transforms/Utils/SYCLUtils.h (+26)
  • (modified) llvm/lib/Transforms/Utils/CMakeLists.txt (+2)
  • (added) llvm/lib/Transforms/Utils/SYCLSplitModule.cpp (+401)
  • (added) llvm/lib/Transforms/Utils/SYCLUtils.cpp (+26)
  • (added) llvm/test/tools/llvm-split/SYCL/device-code-split/amd-kernel-split.ll (+17)
  • (added) llvm/test/tools/llvm-split/SYCL/device-code-split/complex-indirect-call-chain.ll (+75)
  • (added) llvm/test/tools/llvm-split/SYCL/device-code-split/module-split-func-ptr.ll (+43)
  • (added) llvm/test/tools/llvm-split/SYCL/device-code-split/one-kernel-per-module.ll (+108)
  • (added) llvm/test/tools/llvm-split/SYCL/device-code-split/split-by-source.ll (+97)
  • (added) llvm/test/tools/llvm-split/SYCL/device-code-split/split-with-kernel-declarations.ll (+66)
  • (modified) llvm/tools/llvm-split/CMakeLists.txt (+1)
  • (modified) llvm/tools/llvm-split/llvm-split.cpp (+121)
diff --git a/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h b/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h
new file mode 100644
index 0000000000000..a3425d19b9c4b
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h
@@ -0,0 +1,64 @@
+//===-------- SYCLSplitModule.h - module split ------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Functionality to split a module into callgraphs. A callgraph here is a set
+// of entry points with all functions reachable from them via a call. The result
+// of the split is new modules, each containing its corresponding callgraph.
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
+#define LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
+
+#include "llvm/ADT/STLFunctionalExtras.h"
+#include "llvm/ADT/StringRef.h"
+
+#include <memory>
+#include <optional>
+#include <string>
+
+namespace llvm {
+
+class Module;
+
+enum class IRSplitMode {
+  IRSM_PER_TU,     // one module per translation unit
+  IRSM_PER_KERNEL, // one module per kernel
+  IRSM_NONE        // no splitting
+};
+
+/// \returns IRSplitMode value if \p S is recognized. Otherwise, std::nullopt is
+/// returned.
+std::optional<IRSplitMode> convertStringToSplitMode(StringRef S);
+
+/// The structure represents a split LLVM Module accompanied by additional
+/// information. Split modules are stored on disk due to the high RAM
+/// consumption during the whole splitting process.
+struct ModuleAndSYCLMetadata {
+  std::string ModuleFilePath;
+  std::string Symbols;
+
+  ModuleAndSYCLMetadata() = default;
+  ModuleAndSYCLMetadata(const ModuleAndSYCLMetadata &) = default;
+  ModuleAndSYCLMetadata &operator=(const ModuleAndSYCLMetadata &) = default;
+  ModuleAndSYCLMetadata(ModuleAndSYCLMetadata &&) = default;
+  ModuleAndSYCLMetadata &operator=(ModuleAndSYCLMetadata &&) = default;
+
+  ModuleAndSYCLMetadata(std::string_view File, std::string Symbols)
+      : ModuleFilePath(File), Symbols(std::move(Symbols)) {}
+};
+
+using PostSYCLSplitCallbackType =
+    function_ref<void(std::unique_ptr<Module> Part, std::string Symbols)>;
+
+/// Splits the given module \p M according to the given \p Mode.
+/// Every split image is passed to \p Callback.
+void SYCLSplitModule(std::unique_ptr<Module> M, IRSplitMode Mode,
+                     PostSYCLSplitCallbackType Callback);
+
+} // namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
diff --git a/llvm/include/llvm/Transforms/Utils/SYCLUtils.h b/llvm/include/llvm/Transforms/Utils/SYCLUtils.h
new file mode 100644
index 0000000000000..75459eed6ac0f
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/SYCLUtils.h
@@ -0,0 +1,26 @@
+//===------------ SYCLUtils.h - SYCL utility functions --------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Utility functions for SYCL.
+//===----------------------------------------------------------------------===//
+#ifndef LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
+#define LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
+
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/SmallVector.h"
+
+namespace llvm {
+
+class raw_ostream;
+
+using SYCLStringTable = SmallVector<SmallVector<SmallString<64>>>;
+
+void writeSYCLStringTable(const SYCLStringTable &Table, raw_ostream &OS);
+
+} // namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
diff --git a/llvm/lib/Transforms/Utils/CMakeLists.txt b/llvm/lib/Transforms/Utils/CMakeLists.txt
index 78cad0d253be8..0ba46bdadea8d 100644
--- a/llvm/lib/Transforms/Utils/CMakeLists.txt
+++ b/llvm/lib/Transforms/Utils/CMakeLists.txt
@@ -83,6 +83,8 @@ add_llvm_component_library(LLVMTransformUtils
   SizeOpts.cpp
   SplitModule.cpp
   StripNonLineTableDebugInfo.cpp
+  SYCLSplitModule.cpp
+  SYCLUtils.cpp
   SymbolRewriter.cpp
   UnifyFunctionExitNodes.cpp
   UnifyLoopExits.cpp
diff --git a/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp b/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp
new file mode 100644
index 0000000000000..18eca4237c8ae
--- /dev/null
+++ b/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp
@@ -0,0 +1,401 @@
+//===-------- SYCLSplitModule.cpp - Split a module into call graphs -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// See comments in the header.
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Utils/SYCLSplitModule.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Transforms/Utils/Cloning.h"
+#include "llvm/Transforms/Utils/SYCLUtils.h"
+
+#include <map>
+#include <utility>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "sycl-split-module"
+
+static bool isKernel(const Function &F) {
+  return F.getCallingConv() == CallingConv::SPIR_KERNEL ||
+         F.getCallingConv() == CallingConv::AMDGPU_KERNEL;
+}
+
+static bool isEntryPoint(const Function &F) {
+  // Skip declarations, if any: they should not be included into a vector of
+  // entry points groups or otherwise we will end up with incorrectly generated
+  // list of symbols.
+  if (F.isDeclaration())
+    return false;
+
+  // Kernels are always considered to be entry points
+  return isKernel(F);
+}
+
+namespace {
+
+// A vector that contains all entry point functions in a split module.
+using EntryPointSet = SetVector<const Function *>;
+
+/// Represents a named group of entry points.
+struct EntryPointGroup {
+  std::string GroupName;
+  EntryPointSet Functions;
+
+  EntryPointGroup() = default;
+  EntryPointGroup(const EntryPointGroup &) = default;
+  EntryPointGroup &operator=(const EntryPointGroup &) = default;
+  EntryPointGroup(EntryPointGroup &&) = default;
+  EntryPointGroup &operator=(EntryPointGroup &&) = default;
+
+  EntryPointGroup(StringRef GroupName,
+                  EntryPointSet Functions = EntryPointSet())
+      : GroupName(GroupName), Functions(std::move(Functions)) {}
+
+  void clear() {
+    GroupName.clear();
+    Functions.clear();
+  }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  LLVM_DUMP_METHOD void dump() const {
+    constexpr size_t INDENT = 4;
+    dbgs().indent(INDENT) << "ENTRY POINTS"
+                          << " " << GroupName << " {\n";
+    for (const Function *F : Functions)
+      dbgs().indent(INDENT) << "  " << F->getName() << "\n";
+
+    dbgs().indent(INDENT) << "}\n";
+  }
+#endif
+};
+
+/// Annotates an llvm::Module with information necessary to perform and track
+/// the result of device code (llvm::Module instances) splitting:
+/// - entry points group from the module.
+class ModuleDesc {
+  std::unique_ptr<Module> M;
+  EntryPointGroup EntryPoints;
+
+public:
+  ModuleDesc() = delete;
+  ModuleDesc(const ModuleDesc &) = delete;
+  ModuleDesc &operator=(const ModuleDesc &) = delete;
+  ModuleDesc(ModuleDesc &&) = default;
+  ModuleDesc &operator=(ModuleDesc &&) = default;
+
+  ModuleDesc(std::unique_ptr<Module> M,
+             EntryPointGroup EntryPoints = EntryPointGroup())
+      : M(std::move(M)), EntryPoints(std::move(EntryPoints)) {
+    assert(this->M && "Module should be non-null");
+  }
+
+  Module &getModule() { return *M; }
+  const Module &getModule() const { return *M; }
+
+  std::unique_ptr<Module> releaseModule() {
+    EntryPoints.clear();
+    return std::move(M);
+  }
+
+  std::string makeSymbolTable() const {
+    SmallString<0> Data;
+    raw_svector_ostream OS(Data);
+    for (const Function *F : EntryPoints.Functions)
+      OS << F->getName() << '\n';
+
+    return std::string(OS.str());
+  }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  LLVM_DUMP_METHOD void dump() const {
+    dbgs() << "ModuleDesc[" << M->getName() << "] {\n";
+    EntryPoints.dump();
+    dbgs() << "}\n";
+  }
+#endif
+};
+
+// Represents "dependency" or "use" graph of global objects (functions and
+// global variables) in a module. It is used during device code split to
+// understand which global variables and functions (other than entry points)
+// should be included into a split module.
+//
+// Nodes of the graph represent LLVM's GlobalObjects, edges "A" -> "B" represent
+// the fact that if "A" is included into a module, then "B" should be included
+// as well.
+//
+// Examples of dependencies which are represented in this graph:
+// - Function FA calls function FB
+// - Function FA uses global variable GA
+// - Global variable GA references (initialized with) function FB
+// - Function FA stores address of a function FB somewhere
+//
+// The following cases are treated as dependencies between global objects:
+// 1. Global object B is used by global object A in any way (store,
+//    bitcast, phi node, call, etc.): "A" -> "B" edge will be added to the
+//    graph;
+// 2. function A performs an indirect call of a function with signature S and
+//    there is a function B with signature S. "A" -> "B" edge will be added to
+//    the graph;
+class DependencyGraph {
+public:
+  using GlobalSet = SmallPtrSet<const GlobalValue *, 16>;
+
+  DependencyGraph(const Module &M) {
+    // Group functions by their signature to handle case (2) described above
+    DenseMap<const FunctionType *, DependencyGraph::GlobalSet>
+        FuncTypeToFuncsMap;
+    for (const auto &F : M.functions()) {
+      // Kernels can't be called (either directly or indirectly) in SYCL
+      if (isKernel(F))
+        continue;
+
+      FuncTypeToFuncsMap[F.getFunctionType()].insert(&F);
+    }
+
+    for (const auto &F : M.functions()) {
+      // case (1), see comment above the class definition
+      for (const Value *U : F.users())
+        addUserToGraphRecursively(cast<const User>(U), &F);
+
+      // case (2), see comment above the class definition
+      for (const auto &I : instructions(F)) {
+        const auto *CI = dyn_cast<CallInst>(&I);
+        if (!CI || !CI->isIndirectCall()) // Direct calls were handled above
+          continue;
+
+        const FunctionType *Signature = CI->getFunctionType();
+        const auto &PotentialCallees = FuncTypeToFuncsMap[Signature];
+        Graph[&F].insert(PotentialCallees.begin(), PotentialCallees.end());
+      }
+    }
+
+    // And every global variable (but their handling is a bit simpler)
+    for (const auto &GV : M.globals())
+      for (const Value *U : GV.users())
+        addUserToGraphRecursively(cast<const User>(U), &GV);
+  }
+
+  iterator_range<GlobalSet::const_iterator>
+  dependencies(const GlobalValue *Val) const {
+    auto It = Graph.find(Val);
+    return (It == Graph.end())
+               ? make_range(EmptySet.begin(), EmptySet.end())
+               : make_range(It->second.begin(), It->second.end());
+  }
+
+private:
+  void addUserToGraphRecursively(const User *Root, const GlobalValue *V) {
+    SmallVector<const User *, 8> WorkList;
+    WorkList.push_back(Root);
+
+    while (!WorkList.empty()) {
+      const User *U = WorkList.pop_back_val();
+      if (const auto *I = dyn_cast<const Instruction>(U)) {
+        const auto *UFunc = I->getFunction();
+        Graph[UFunc].insert(V);
+      } else if (isa<const Constant>(U)) {
+        if (const auto *GV = dyn_cast<const GlobalVariable>(U))
+          Graph[GV].insert(V);
+        // This could be a global variable or some constant expression (like
+        // bitcast or gep). We trace users of this constant further to reach
+        // global objects they are used by and add them to the graph.
+        for (const auto *UU : U->users())
+          WorkList.push_back(UU);
+      } else
+        llvm_unreachable("Unhandled type of function user");
+    }
+  }
+
+  DenseMap<const GlobalValue *, GlobalSet> Graph;
+  SmallPtrSet<const GlobalValue *, 1> EmptySet;
+};
+
+void collectFunctionsAndGlobalVariablesToExtract(
+    SetVector<const GlobalValue *> &GVs, const Module &M,
+    const EntryPointGroup &ModuleEntryPoints, const DependencyGraph &DG) {
+  // We start with module entry points
+  for (const auto *F : ModuleEntryPoints.Functions)
+    GVs.insert(F);
+
+  // Non-discardable global variables are also included in the initial set
+  for (const auto &GV : M.globals())
+    if (!GV.isDiscardableIfUnused())
+      GVs.insert(&GV);
+
+  // GVs has SetVector type. This type inserts a value only if it is not yet
+  // present there. So, recursion is not expected here.
+  size_t Idx = 0;
+  while (Idx < GVs.size()) {
+    const GlobalValue *Obj = GVs[Idx++];
+
+    for (const GlobalValue *Dep : DG.dependencies(Obj)) {
+      if (const auto *Func = dyn_cast<const Function>(Dep)) {
+        if (!Func->isDeclaration())
+          GVs.insert(Func);
+      } else
+        GVs.insert(Dep); // Global variables are added unconditionally
+    }
+  }
+}
+
+ModuleDesc extractSubModule(const Module &M,
+                            const SetVector<const GlobalValue *> &GVs,
+                            EntryPointGroup ModuleEntryPoints) {
+  // For each group of entry points collect all dependencies.
+  ValueToValueMapTy VMap;
+  // Clone definitions only for needed globals. Others will be added as
+  // declarations and removed later.
+  std::unique_ptr<Module> SubM = CloneModule(
+      M, VMap, [&](const GlobalValue *GV) { return GVs.count(GV); });
+  // Replace entry points with cloned ones.
+  EntryPointSet NewEPs;
+  const EntryPointSet &EPs = ModuleEntryPoints.Functions;
+  std::for_each(EPs.begin(), EPs.end(), [&](const Function *F) {
+    NewEPs.insert(cast<Function>(VMap[F]));
+  });
+  ModuleEntryPoints.Functions = std::move(NewEPs);
+  return ModuleDesc{std::move(SubM), std::move(ModuleEntryPoints)};
+}
+
+// The function produces a copy of input LLVM IR module M with only those
+// functions and globals that can be called from entry points that are specified
+// in ModuleEntryPoints vector, in addition to the entry point functions.
+ModuleDesc extractCallGraph(const Module &M, EntryPointGroup ModuleEntryPoints,
+                            const DependencyGraph &DG) {
+  SetVector<const GlobalValue *> GVs;
+  collectFunctionsAndGlobalVariablesToExtract(GVs, M, ModuleEntryPoints, DG);
+
+  ModuleDesc SplitM = extractSubModule(M, GVs, std::move(ModuleEntryPoints));
+  LLVM_DEBUG(SplitM.dump());
+  return SplitM;
+}
+
+using EntryPointGroupVec = SmallVector<EntryPointGroup, 0>;
+
+/// Module Splitter.
+/// It gets a module (in the form of a module descriptor, to get additional
+/// info) and a collection of entry point groups. Each group specifies a subset
+/// of entry points from the input module to be included in a split module.
+class ModuleSplitter {
+private:
+  ModuleDesc Input;
+  EntryPointGroupVec Groups;
+  DependencyGraph DG;
+
+private:
+  EntryPointGroup drawEntryPointGroup() {
+    assert(Groups.size() > 0 && "Reached end of entry point groups list.");
+    EntryPointGroup Group = std::move(Groups.back());
+    Groups.pop_back();
+    return Group;
+  }
+
+public:
+  ModuleSplitter(ModuleDesc MD, EntryPointGroupVec GroupVec)
+      : Input(std::move(MD)), Groups(std::move(GroupVec)),
+        DG(Input.getModule()) {
+    assert(!Groups.empty() && "Entry points groups collection is empty!");
+  }
+
+  /// Gets next subsequence of entry points in an input module and provides
+  /// split submodule containing these entry points and their dependencies.
+  ModuleDesc getNextSplit() {
+    return extractCallGraph(Input.getModule(), drawEntryPointGroup(), DG);
+  }
+
+  /// Check that there are still submodules to split.
+  bool hasMoreSplits() const { return Groups.size() > 0; }
+};
+
+} // namespace
+
+static EntryPointGroupVec selectEntryPointGroups(const Module &M,
+                                                 IRSplitMode Mode) {
+  // std::map is used here to ensure stable ordering of entry point groups,
+  // which is based on their contents, this greatly helps LIT tests
+  std::map<std::string, EntryPointSet> EntryPointsMap;
+
+  static constexpr char ATTR_SYCL_MODULE_ID[] = "sycl-module-id";
+  for (const auto &F : M.functions()) {
+    if (!isEntryPoint(F))
+      continue;
+
+    std::string Key;
+    switch (Mode) {
+    case IRSplitMode::IRSM_PER_KERNEL:
+      Key = F.getName();
+      break;
+    case IRSplitMode::IRSM_PER_TU:
+      Key = F.getFnAttribute(ATTR_SYCL_MODULE_ID).getValueAsString();
+      break;
+    case IRSplitMode::IRSM_NONE:
+      llvm_unreachable("IRSM_NONE is handled before entry point selection");
+    }
+
+    EntryPointsMap[Key].insert(&F);
+  }
+
+  EntryPointGroupVec Groups;
+  if (EntryPointsMap.empty()) {
+    // No entry points met, record this.
+    Groups.emplace_back("-", EntryPointSet());
+  } else {
+    Groups.reserve(EntryPointsMap.size());
+    // Start with properties of a source module
+    for (auto &[Key, EntryPoints] : EntryPointsMap)
+      Groups.emplace_back(Key, std::move(EntryPoints));
+  }
+
+  return Groups;
+}
+
+namespace llvm {
+
+std::optional<IRSplitMode> convertStringToSplitMode(StringRef S) {
+  static const StringMap<IRSplitMode> Values = {
+      {"source", IRSplitMode::IRSM_PER_TU},
+      {"kernel", IRSplitMode::IRSM_PER_KERNEL},
+      {"none", IRSplitMode::IRSM_NONE}};
+
+  auto It = Values.find(S);
+  if (It == Values.end())
+    return std::nullopt;
+
+  return It->second;
+}
+
+void SYCLSplitModule(std::unique_ptr<Module> M, IRSplitMode Mode,
+                     PostSYCLSplitCallbackType Callback) {
+  SmallVector<ModuleAndSYCLMetadata, 0> OutputImages;
+  if (Mode == IRSplitMode::IRSM_NONE) {
+    auto MD = ModuleDesc(std::move(M));
+    auto Symbols = MD.makeSymbolTable();
+    Callback(std::move(MD.releaseModule()), std::move(Symbols));
+    return;
+  }
+
+  EntryPointGroupVec Groups = selectEntryPointGroups(*M, Mode);
+  ModuleDesc MD = std::move(M);
+  ModuleSplitter Splitter(std::move(MD), std::move(Groups));
+  while (Splitter.hasMoreSplits()) {
+    ModuleDesc MD = Splitter.getNextSplit();
+    auto Symbols = MD.makeSymbolTable();
+    Callback(std::move(MD.releaseModule()), std::move(Symbols));
+  }
+}
+
+} // namespace llvm
diff --git a/llvm/lib/Transforms/Utils/SYCLUtils.cpp b/llvm/lib/Transforms/Utils/SYCLUtils.cpp
new file mode 100644
index 0000000000000..ad9864fadb828
--- /dev/null
+++ b/llvm/lib/Transforms/Utils/SYCLUtils.cpp
@@ -0,0 +1,26 @@
+//===------------ SYCLUtils.cpp - SYCL utility functions ------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// SYCL utility functions.
+//===----------------------------------------------------------------------===//
+#include "llvm/Transforms/Utils/SYCLUtils.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Support/raw_ostream.h"
+
+namespace llvm {
+
+void writeSYCLStringTable(const SYCLStringTable &Table, raw_ostream &OS) {
+  assert(!Table.empty() && "table should contain at least column titles");
+  assert(!Table[0].empty() && "table should be non-empty");
+  OS << '[' << join(Table[0].begin(), Table[0].end(), "|") << "]\n";
+  for (size_t I = 1, E = Table.size(); I != E; ++I) {
+    assert(Table[I].size() == Table[0].size() && "row's size should be equal");
+    OS << join(Table[I].begin(), Table[I].end(), "|") << '\n';
+  }
+}
+
+} // namespace llvm
diff --git a/llvm/test/tools/llvm-split/SYCL/device-code-split/amd-kernel-split.ll b/llvm/test/tools/llvm-split/SYCL/device-code-split/amd-kernel-split.ll
new file mode 100644
index 0000000000000..a40a52107fb0c
--- /...
[truncated]
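The core of collectFunctionsAndGlobalVariablesToExtract in the diff above is a worklist traversal: it starts from the entry points and keeps appending dependencies from DependencyGraph until no new global is found, relying on the set semantics of SetVector to avoid revisiting nodes. Below is a simplified, LLVM-free sketch of that idea, assuming globals are identified by name and the graph is a plain adjacency map; Graph and collectReachable are hypothetical names, and the patch's special cases (skipping function declarations, seeding non-discardable globals) are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Edge "A" -> "B": if A goes into a split module, B must go there too.
using Graph = std::map<std::string, std::vector<std::string>>;

// Returns every global reachable from EntryPoints, in discovery order.
// The vector plays the role of SetVector's ordered storage, and the set
// guarantees each global is appended (and thus processed) only once.
std::vector<std::string>
collectReachable(const Graph &G, const std::vector<std::string> &EntryPoints) {
  std::vector<std::string> Order;
  std::set<std::string> Seen;
  auto Insert = [&](const std::string &Name) {
    if (Seen.insert(Name).second)
      Order.push_back(Name);
  };
  for (const std::string &EP : EntryPoints)
    Insert(EP);
  // Index-based loop: Order may grow while it is being processed.
  for (std::size_t Idx = 0; Idx < Order.size(); ++Idx) {
    auto It = G.find(Order[Idx]);
    if (It == G.end())
      continue;
    for (const std::string &Dep : It->second)
      Insert(Dep);
  }
  return Order;
}
```

For a graph where kernelA uses helper and a global table gvTable that is initialized with callback, starting from kernelA yields kernelA, helper, gvTable, callback, while an unrelated kernelB and its callees stay out of the split module.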

@maksimsab
Contributor Author

Hi @jhuber6 @frasercrmck @bader @asudarsa!
The previous issue with unresolved symbols has been resolved in this patch.

Please share your feedback when convenient.

Contributor

@Pierre-vh Pierre-vh left a comment

I think I'm lacking context here.
Why does this have to be in lib/Transforms? How many targets will use it?
I'm a bit concerned seeing what looks like target-specific files added to Transforms.

@maksimsab
Contributor Author

Hi @Pierre-vh

Currently, our downstream SYCL implementation can compile to SPIR-V, Intel CPU/GPU, AMDGPU, and NVPTX, and we plan to upstream support for all of these. The one feature missing from this patch, which will be added later, is splitting by optional kernel features (spec). I think that justifies my choice of component, but I am interested in your opinion.

A possible use case:

  1. The user writes 2 kernels:
    • kernel A, which uses fp16
    • kernel B, which doesn't use fp16.
  2. If the user wants to JIT-compile kernel B for a device that doesn't support fp16, the two kernels must be separated from each other.

The separation is done by the module splitting algorithm.
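Splitting by optional kernel features is planned for a later patch and is not part of this diff. As a hypothetical sketch of the idea only, assume each kernel is tagged with the set of optional features it uses (modeled here as plain strings such as "fp16"); kernels are then grouped by that set, so A and B land in different modules. KernelInfo, featureKey, and groupByFeatures are illustrative names, not an existing API.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical kernel description: name plus the optional features it uses.
struct KernelInfo {
  std::string Name;
  std::set<std::string> Features; // e.g. "fp16"; empty if none are used
};

// Builds a canonical key from the (already sorted) feature set.
// An empty key means "uses no optional features".
std::string featureKey(const std::set<std::string> &Features) {
  std::string Key;
  for (const std::string &F : Features) {
    if (!Key.empty())
      Key += ',';
    Key += F;
  }
  return Key;
}

// Kernels with identical feature sets share a group; all others are separated.
std::map<std::string, std::vector<std::string>>
groupByFeatures(const std::vector<KernelInfo> &Kernels) {
  std::map<std::string, std::vector<std::string>> Groups;
  for (const KernelInfo &K : Kernels)
    Groups[featureKey(K.Features)].push_back(K.Name);
  return Groups;
}
```

Here kernel A ends up in the "fp16" group and kernel B in the "" (no features) group, so B can be JIT-compiled on its own for a device without fp16 support.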

@Pierre-vh
Contributor

So this SYCL mode can, e.g. use either the SPIRV or AMDGPU target at the same time?

If that's the case then I'm a little bit less concerned, I still wish this was hidden away in some TargetMachine somewhere though but I don't see how we can do that easily.

How does it work? Does it use a different target triple, or is it an LLVM flag that changes the compilation pipeline? Is there a high-level overview somewhere of how this works in LLVM? What attributes would a module using SYCL have and what target triple would it use?

@shiltian
Contributor

@maksimsab I think that is orthogonal to @Pierre-vh's concern. The issue isn’t about whether SYCL can target multiple "targets." This pass is specifically designated for SYCL modules, at least based on its description. If the pass were applicable to SYCL, CUDA, HIP, OpenCL, OpenMP, etc., then placing it in this directory would make sense.

@Pierre-vh
Contributor

Yes exactly. If the pass is only used with a "SYCLTargetMachine" then this SplitModule impl should live right next to that, and use the module splitting override hook from TargetMachine similarly to how AMDGPU does it.

If it can be used in combination with any other TargetMachine then this is the right approach (though I wish we had a better one).

@maksimsab
Contributor Author

High-level RFCs are available on Discourse:

So this SYCL mode can, e.g. use either the SPIRV or AMDGPU target at the same time?

Yes, a user chooses a target like:

clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_90  code.cpp
or
clang++ -fsycl -fsycl-targets=amd_gpu_gfx1201 code.cpp

The list of supported targets in downstream can be observed here: https://intel.github.io/llvm/UsersManual.html#generic-options

There is no such thing as a SYCL Target since we are actually targeting Intel CPU/GPU, AMDGPU, and NVPTX. It is similar to OpenCL and OpenMP.

I would also like to have a specific folder like llvm/Target/SYCL, but that doesn't really seem appropriate.

How does it work? Does it use a different target triple, or is it an LLVM flag that changes the compilation pipeline? Is there a high-level overview somewhere of how this works in LLVM? What attributes would a module using SYCL have and what target triple would it use?

Some information is available in the RFC mentioned above, "Offloading design for SYCL offload kind and SPIR targets". We are going to incorporate module splitting into clang-linker-wrapper, which will be able to recognize incoming SYCL inputs, so the compilation pipeline will be constructed and executed in clang-linker-wrapper. We are also preparing more detailed documentation about SYCL offloading, which we will add to llvm-project.

@jhuber6
Contributor

jhuber6 commented Mar 18, 2025

Yes, a user chooses a target like:

clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_90  code.cpp
or
clang++ -fsycl -fsycl-targets=amd_gpu_gfx1201 code.cpp

The list of supported targets in downstream can be observed here: https://intel.github.io/llvm/UsersManual.html#generic-options

This syntax unnerves me. You can kind of do this in OpenMP but the syntax is really verbose.

clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,x86_64-unknown-linux-gnu -Xopenmp-target=x86_64-unknown-linux-gnu --offload-arch=znver5 -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx940

I want to let -Xarch_ handle this but that requires changing a flag which had some detractors. I also have #125556 to make that generic but it needs to be updated.

@maksimsab
Contributor Author

Would you mind if we introduce a llvm/Transforms/SYCL/ directory and put this functionality in there? We are going to add more SYCL-specific passes anyway.

@shiltian
Contributor

Yes, a user chooses a target like:

clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_90  code.cpp
or
clang++ -fsycl -fsycl-targets=amd_gpu_gfx1201 code.cpp

The list of supported targets in downstream can be observed here: https://intel.github.io/llvm/UsersManual.html#generic-options

This syntax unnerves me. You can kind of do this in OpenMP but the syntax is really verbose.

clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,x86_64-unknown-linux-gnu -Xopenmp-target=x86_64-unknown-linux-gnu --offload-arch=znver5 -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx940

I want to let -Xarch_ handle this but that requires changing a flag which had some detractors. I also have #125556 to make that generic but it needs to be updated.

FWIW, it might be better to adopt the offload bundler id format:

<target triple>:<target id>[:<target feature>]

You probably want to have the full target triple here because the OS part does matter.

For example,

# AMDGPU

-fsycl-targets=amdgcn-amd-amdhsa-gfx1201

-fsycl-targets=amdgcn-amd-amdhsa-gfx1201:xnack+

# NVIDIA GPU

-fsycl-targets=nvptx64-nvidia-unknown-sm_90

This also applies to the --offload-arch flag that @jhuber6 mentioned.

@shiltian
Contributor

shiltian commented Mar 19, 2025

Would you mind if we introduce a llvm/Transforms/SYCL/ directory and put this functionality in there? We are going to add more SYCL-specific passes anyway.

I don't mind it, but I'd do an RFC. I suppose it's gonna be a new LLVM library component?

@bader
Contributor

bader commented Mar 19, 2025

There should be nothing SYCL-specific about the patch. This PR adds a function that distributes the content of one LLVM module into one or more LLVM modules. It's a generic LLVM transformation. The logic for how content is distributed is currently limited to two cases:

  1. Put each "entry point" (you can think of it as "GPU kernel") into separate LLVM module.
  2. Put all functions with specific function attribute to a separate LLVM module.

There are plans to extend this list in future patches.

(1) is GPU-specific, but it can be useful for non-SYCL GPU programming models as well. IIRC, @jdoerfert has been involved in prototyping the GPU code splitting logic in the OpenMP offload compiler.

(2) The name of the attribute is 'sycl-module-id', but its meaning is the same as a C++ compilation unit: if two functions have the same "module-id" value, they were produced from the same compilation unit.
NOTE: In the SYCL case, we get functions produced from different C++ compilation units (CUs) by linking the LLVM modules for each CU. This scenario might be rare for programming models that use the ThinLTO framework to link device code, as ThinLTO naturally keeps the code split by compilation unit.

The primary use of this functionality is to reduce code generation time for GPU code, which is critical for JIT compilation. Another nice property is that splitting an LLVM module allows the compiler to skip code generation entirely for some of the resulting modules. This is useful if an LLVM module has code for different targets and we need to avoid code generation for "unwanted" targets.

The code mentions SYCL because it's currently used only by SYCL compiler. @sarnex, have you thought about using this function for the OpenMP offload compilation?

We can commit this code as a SYCL specific library right now and make it more generic if we find use cases outside of SYCL compilation flow.

@jdoerfert
Member

FWIW, this should never be "SYCL" specific. There is no reason for it and it distracts from the issue this solves.

Any offload language, and even non-offload languages, might want to use this to increase compile-time parallelism. An obvious user is the new "forced contained" thinLTO pipeline for AMD GPUs driven currently by @shiltian. It works great if the TUs come with an even distribution of work, but it won't work well if that isn't the case. This patch would allow us to expose parallelism. That said, we would want to follow up with heuristics later.

Long story short: SYCL needs this for their compute model and "target feature" ideas. AMD GPU wants this for thinLTO. Other targets would likely also be interested in this for their thinLTO. This should be a generic utility without SYCL branding, living in Transforms, exposed via llvm-split (or something similar), and, if necessary, with hooks to the backend for target-specific stuff.

@maksimsab
Contributor Author

I could change the main splitting method from SYCLModuleSplit to:

// Computes the category for the function.
using FunctionCategorizer = function_ref<std::string(const Function &F)>;

// Accepts the split Module for further handling.
using PostSplitCallbackType = function_ref<void(std::unique_ptr<Module>)>;

void SplitModuleByCategory(std::unique_ptr<Module> M, FunctionCategorizer FC, PostSplitCallbackType Callback);

This function would map functions/kernels according to their computed categories. This interface allows reuse for purposes other than SYCL, and it doesn't depend on any SYCL specifics.
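A rough, self-contained sketch of how such a category-based split could work. It assumes a toy IR (`ToyFunction`, `ToyModule`) in place of `llvm::Module` and uses `std::function` in place of `llvm::function_ref` so it compiles without LLVM. The sketch also assumes a convention that an empty category means "not a split root". Under that convention, helpers only land in a part when a root reaches them through calls. None of these names come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: a function and the names of the functions it calls.
struct ToyFunction {
  std::string Name;
  std::vector<std::string> Callees;
};
using ToyModule = std::vector<ToyFunction>;

// Mirrors the proposed FunctionCategorizer / PostSplitCallbackType pair.
using Categorizer = std::function<std::string(const ToyFunction &)>;
using PostSplitCallback = std::function<void(std::unique_ptr<ToyModule>)>;

void splitToyModuleByCategory(const ToyModule &M, const Categorizer &FC,
                              const PostSplitCallback &Callback) {
  std::map<std::string, const ToyFunction *> ByName;
  for (const ToyFunction &F : M)
    ByName[F.Name] = &F;

  // std::map keeps categories sorted, so parts are emitted in a stable order.
  std::map<std::string, std::vector<const ToyFunction *>> Roots;
  for (const ToyFunction &F : M) {
    std::string Cat = FC(F);
    if (!Cat.empty())
      Roots[Cat].push_back(&F);
  }

  for (const auto &Entry : Roots) {
    // Collect everything transitively reachable from this category's roots.
    std::set<std::string> Seen;
    std::vector<const ToyFunction *> Worklist(Entry.second.begin(),
                                              Entry.second.end());
    auto Part = std::make_unique<ToyModule>();
    while (!Worklist.empty()) {
      const ToyFunction *F = Worklist.back();
      Worklist.pop_back();
      if (!Seen.insert(F->Name).second)
        continue;
      Part->push_back(*F); // copied, so a helper may appear in several parts
      for (const std::string &Callee : F->Callees) {
        auto It = ByName.find(Callee);
        if (It != ByName.end())
          Worklist.push_back(It->second);
      }
    }
    Callback(std::move(Part));
  }
}
```

For example, with kernels `kA` (calling `helper`, which calls `leaf`) and `kB`, a categorizer returning the kernel's own name produces two parts: `{kA, helper, leaf}` and `{kB}`.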

What do you think about this?

@jdoerfert
Member

I think that makes sense, though there doesn't seem to be a need for "std::string", IMHO. An integer should be just fine. The user can map strings to integers if they need to.

@maksimsab
Contributor Author

Even in our case, when splitting by source unit, it would be difficult to map module IDs to integers. I could use a hash, but it could lead to collisions.

@jdoerfert
Member

Even in our case in splitting by source unit it would be difficult to map module ids to integers. I could use a hash but it would lead to collisions.

Your API proposal maps the stuff to std::string, right?
Then simply map the string internally to integers.
If you have seen the string, use the old integer, otherwise the number of strings you've seen so far.

@maksimsab
Contributor Author

If you have seen the string, use the old integer, otherwise the number of strings you've seen so far.

This would lead to complications for users, forcing them to come up with stateful functors in order to maintain previously seen functions.

Additionally, it would complicate testing since we currently depend on the sorted order provided by the std::map<std::string, ...>. With your approach, the order will depend on the order of the functions in the module.

@jdoerfert
Member

If you have seen the string, use the old integer, otherwise the number of strings you've seen so far.

This would lead to complications for users, forcing them to come up with stateful functors in order to maintain previously seen functions.

Additionally, it would complicate testing since we currently depend on the sorted order provided by the std::map<std::string, ...>. With your approach, the order will depend on the order of the functions in the module.

I don't think this makes anything more complicated. Please try to see the following point:
If I can use a std::string as a key, and I can assign each unique std::string a number (see below), we can use int as a key by applying the mapping once.

To get integers from std::string, the user would use something like:

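// Not checked for syntax errors
#include <map>
#include <string>

std::map<std::string, int> S2IMap;
int getIntKeyForString(const std::string &S) {
  // insert() is a no-op if S is already present; either way It points at S's entry.
  auto [It, _] = S2IMap.insert({S, static_cast<int>(S2IMap.size())});
  return It->second;
}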

And if the user wants something other than std::string to drive their mapping, they can map that to integers as well.
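To illustrate the ordering concern in this exchange, the sketch below compares first-seen interning (the approach of `getIntKeyForString`) with a variant that ranks the distinct strings in sorted order. Both helpers are hypothetical and use only the standard library. With first-seen interning, the same string can get different keys depending on the order in which functions are visited. The sorted variant is order-independent, but it has to see all categories before assigning any key.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// First-seen interning: a string's key is the number of distinct strings
// seen before it.
std::vector<int> internFirstSeen(const std::vector<std::string> &Cats) {
  std::map<std::string, int> S2I;
  std::vector<int> Keys;
  for (const std::string &S : Cats)
    Keys.push_back(S2I.insert({S, static_cast<int>(S2I.size())}).first->second);
  return Keys;
}

// Order-independent variant: a string's key is its rank among all distinct
// strings in sorted order.
std::vector<int> internSorted(const std::vector<std::string> &Cats) {
  std::set<std::string> Distinct(Cats.begin(), Cats.end());
  std::map<std::string, int> S2I;
  for (const std::string &S : Distinct)
    S2I.emplace(S, static_cast<int>(S2I.size()));
  std::vector<int> Keys;
  for (const std::string &S : Cats)
    Keys.push_back(S2I.at(S));
  return Keys;
}
```

For the visit orders `{"b.cpp", "a.cpp", "b.cpp"}` and `{"a.cpp", "b.cpp", "b.cpp"}`, first-seen interning gives `"b.cpp"` key 0 in the first case and key 1 in the second. The sorted variant gives it key 1 in both.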

@maksimsab
Contributor Author

Hi @jdoerfert,

I am thinking of moving this functionality into the llvm/Frontend/SYCL/ folder, since the splitting has many SYCL-specific settings and it looks difficult to make it generic for all possible users.
Do you have any objections?

@maksimsab
Contributor Author

maksimsab commented Apr 24, 2025

I've moved the functionality into llvm/Frontend/SYCL/. The Transforms area is no longer involved in this patch.

I removed my changes related to the discussed FunctionCategorizer, since the splitting isn't supposed to be generic anymore.

@maksimsab
Contributor Author

@jhuber6 @jdoerfert

Could you please share your opinions on the latest changes?

I am a bit skeptical about the recent FunctionCategorizer design, since other OffloadKind users would rather be interested in reusing the thin-LTO framework. The current algorithm is straightforward and non-parallel. Once we start migrating it to a parallel version, I expect it to have two stages:

  1. Prepare a plan of moving functions/objects from input Modules to output parts.
  2. Execute the plan in parallel.
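A minimal sketch of the plan/execute split described above. It uses hypothetical types and only the standard library, so it is not the patch's code. Stage 1 builds the plan sequentially as plain data. Stage 2 materializes each part on its own thread via `std::async`. With real LLVM modules, each parallel job would presumably need its own `LLVMContext`, since an `LLVMContext` must not be used from several threads at once.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy function: its name and the category it was assigned.
struct PlannedFunction {
  std::string Name;
  std::string Category;
};

// Stage 1 (sequential): decide which output part each function goes to.
// The plan is plain data, so stage 2 needs no shared mutable state.
std::vector<std::vector<std::string>>
makePlan(const std::vector<PlannedFunction> &Funcs) {
  std::map<std::string, std::vector<std::string>> ByCategory;
  for (const PlannedFunction &F : Funcs)
    ByCategory[F.Category].push_back(F.Name);
  std::vector<std::vector<std::string>> Plan;
  for (auto &Entry : ByCategory)
    Plan.push_back(std::move(Entry.second));
  return Plan;
}

// Stage 2 (parallel): materialize each part independently. Here
// "materializing" just joins names; a real splitter would clone the
// planned functions into a fresh module.
std::vector<std::string>
executePlan(const std::vector<std::vector<std::string>> &Plan) {
  std::vector<std::future<std::string>> Jobs;
  for (const auto &Part : Plan)
    Jobs.push_back(std::async(std::launch::async, [&Part] {
      std::string Out;
      for (const std::string &Name : Part)
        Out += Name + ";";
      return Out;
    }));
  std::vector<std::string> Results;
  for (auto &Job : Jobs)
    Results.push_back(Job.get()); // results keep the plan's order
  return Results;
}
```

Because the plan is computed before any job starts, the output order depends only on stage 1, not on thread scheduling.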

We have been discussing the first step in this PR. I think it could be ironed out separately, in an effort dedicated to the thin-LTO migration.

Labels
llvm:transforms SYCL https://registry.khronos.org/SYCL

8 participants