[offload][SYCL] Add SYCL Module splitting. #131347
base: main
This patch adds SYCL module splitting, a necessary step in the SYCL compilation pipeline. Only two splitting modes are added in this patch: by kernel and by source (translation unit). The previous attempt was llvm#119713. In this patch, `TransformUtils` does not depend on "IPO" or on "Printing Passes": module splitting is self-contained and does not introduce linking issues.
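To illustrate the two modes, here is a simplified, LLVM-free model of how entry points could be bucketed before splitting. `FakeFunction`, `SplitMode`, `parseSplitMode`, and `groupEntryPoints` are hypothetical names used only for this sketch. Following the patch, it assumes that entry points are kernel definitions, that per-kernel mode keys on the function name, and that per-source mode keys on the `sycl-module-id` attribute value.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for llvm::Function with only the fields used here.
struct FakeFunction {
  std::string Name;
  bool IsKernel;        // SPIR_KERNEL or AMDGPU_KERNEL calling convention
  bool IsDeclaration;   // declarations are never entry points
  std::string ModuleId; // value of the "sycl-module-id" function attribute
};

enum class SplitMode { PerTU, PerKernel, None };

// Same spellings as convertStringToSplitMode in the patch.
std::optional<SplitMode> parseSplitMode(std::string_view S) {
  if (S == "source")
    return SplitMode::PerTU;
  if (S == "kernel")
    return SplitMode::PerKernel;
  if (S == "none")
    return SplitMode::None;
  return std::nullopt;
}

// Buckets kernel definitions by a mode-dependent key. std::map keeps the
// groups sorted by key, so the resulting order is deterministic.
std::map<std::string, std::vector<std::string>>
groupEntryPoints(const std::vector<FakeFunction> &Fns, SplitMode Mode) {
  assert(Mode != SplitMode::None && "no grouping is needed without splitting");
  std::map<std::string, std::vector<std::string>> Groups;
  for (const FakeFunction &F : Fns) {
    if (F.IsDeclaration || !F.IsKernel)
      continue;
    const std::string &Key =
        Mode == SplitMode::PerKernel ? F.Name : F.ModuleId;
    Groups[Key].push_back(F.Name);
  }
  // Like the patch, emit a single placeholder group named "-" when the
  // module has no entry points at all.
  if (Groups.empty())
    Groups.emplace("-", std::vector<std::string>{});
  return Groups;
}
```

The `std::map` here serves the same purpose the patch gives for it: group order depends only on the keys, which keeps the output stable for LIT tests.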
@llvm/pr-subscribers-llvm-transforms

Author: Maksim Sabianin (maksimsab)

Patch is 41.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131347.diff

13 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h b/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h
new file mode 100644
index 0000000000000..a3425d19b9c4b
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h
@@ -0,0 +1,64 @@
+//===-------- SYCLSplitModule.h - module split ------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Functionality to split a module into callgraphs. A callgraph here is a set
+// of entry points with all functions reachable from them via a call. The result
+// of the split is new modules, each containing the corresponding callgraph.
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
+#define LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
+
+#include "llvm/ADT/STLFunctionalExtras.h"
+#include "llvm/ADT/StringRef.h"
+
+#include <memory>
+#include <optional>
+#include <string>
+
+namespace llvm {
+
+class Module;
+
+enum class IRSplitMode {
+ IRSM_PER_TU, // one module per translation unit
+ IRSM_PER_KERNEL, // one module per kernel
+ IRSM_NONE // no splitting
+};
+
+/// \returns IRSplitMode value if \p S is recognized. Otherwise, std::nullopt is
+/// returned.
+std::optional<IRSplitMode> convertStringToSplitMode(StringRef S);
+
+/// The structure represents a split LLVM Module accompanied by additional
+/// information. Split modules are stored on disk due to the high RAM
+/// consumption during the whole splitting process.
+struct ModuleAndSYCLMetadata {
+ std::string ModuleFilePath;
+ std::string Symbols;
+
+ ModuleAndSYCLMetadata() = default;
+ ModuleAndSYCLMetadata(const ModuleAndSYCLMetadata &) = default;
+ ModuleAndSYCLMetadata &operator=(const ModuleAndSYCLMetadata &) = default;
+ ModuleAndSYCLMetadata(ModuleAndSYCLMetadata &&) = default;
+ ModuleAndSYCLMetadata &operator=(ModuleAndSYCLMetadata &&) = default;
+
+ ModuleAndSYCLMetadata(std::string_view File, std::string Symbols)
+ : ModuleFilePath(File), Symbols(std::move(Symbols)) {}
+};
+
+using PostSYCLSplitCallbackType =
+ function_ref<void(std::unique_ptr<Module> Part, std::string Symbols)>;
+
+/// Splits the given module \p M according to the given split \p Mode.
+/// Every split image is passed to \p Callback.
+void SYCLSplitModule(std::unique_ptr<Module> M, IRSplitMode Mode,
+ PostSYCLSplitCallbackType Callback);
+
+} // namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
diff --git a/llvm/include/llvm/Transforms/Utils/SYCLUtils.h b/llvm/include/llvm/Transforms/Utils/SYCLUtils.h
new file mode 100644
index 0000000000000..75459eed6ac0f
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/SYCLUtils.h
@@ -0,0 +1,26 @@
+//===------------ SYCLUtils.h - SYCL utility functions --------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Utility functions for SYCL.
+//===----------------------------------------------------------------------===//
+#ifndef LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
+#define LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
+
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/SmallVector.h"
+
+namespace llvm {
+
+class raw_ostream;
+
+using SYCLStringTable = SmallVector<SmallVector<SmallString<64>>>;
+
+void writeSYCLStringTable(const SYCLStringTable &Table, raw_ostream &OS);
+
+} // namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
diff --git a/llvm/lib/Transforms/Utils/CMakeLists.txt b/llvm/lib/Transforms/Utils/CMakeLists.txt
index 78cad0d253be8..0ba46bdadea8d 100644
--- a/llvm/lib/Transforms/Utils/CMakeLists.txt
+++ b/llvm/lib/Transforms/Utils/CMakeLists.txt
@@ -83,6 +83,8 @@ add_llvm_component_library(LLVMTransformUtils
SizeOpts.cpp
SplitModule.cpp
StripNonLineTableDebugInfo.cpp
+ SYCLSplitModule.cpp
+ SYCLUtils.cpp
SymbolRewriter.cpp
UnifyFunctionExitNodes.cpp
UnifyLoopExits.cpp
diff --git a/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp b/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp
new file mode 100644
index 0000000000000..18eca4237c8ae
--- /dev/null
+++ b/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp
@@ -0,0 +1,401 @@
+//===-------- SYCLSplitModule.cpp - Split a module into call graphs -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// See comments in the header.
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Utils/SYCLSplitModule.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Transforms/Utils/Cloning.h"
+#include "llvm/Transforms/Utils/SYCLUtils.h"
+
+#include <map>
+#include <utility>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "sycl-split-module"
+
+static bool isKernel(const Function &F) {
+ return F.getCallingConv() == CallingConv::SPIR_KERNEL ||
+ F.getCallingConv() == CallingConv::AMDGPU_KERNEL;
+}
+
+static bool isEntryPoint(const Function &F) {
+ // Skip declarations, if any: they should not be included in a vector of
+ // entry point groups, otherwise we will end up with an incorrectly generated
+ // list of symbols.
+ if (F.isDeclaration())
+ return false;
+
+ // Kernels are always considered to be entry points
+ return isKernel(F);
+}
+
+namespace {
+
+// A vector that contains all entry point functions in a split module.
+using EntryPointSet = SetVector<const Function *>;
+
+/// Represents a named group of entry points.
+struct EntryPointGroup {
+ std::string GroupName;
+ EntryPointSet Functions;
+
+ EntryPointGroup() = default;
+ EntryPointGroup(const EntryPointGroup &) = default;
+ EntryPointGroup &operator=(const EntryPointGroup &) = default;
+ EntryPointGroup(EntryPointGroup &&) = default;
+ EntryPointGroup &operator=(EntryPointGroup &&) = default;
+
+ EntryPointGroup(StringRef GroupName,
+ EntryPointSet Functions = EntryPointSet())
+ : GroupName(GroupName), Functions(std::move(Functions)) {}
+
+ void clear() {
+ GroupName.clear();
+ Functions.clear();
+ }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ LLVM_DUMP_METHOD void dump() const {
+ constexpr size_t INDENT = 4;
+ dbgs().indent(INDENT) << "ENTRY POINTS"
+ << " " << GroupName << " {\n";
+ for (const Function *F : Functions)
+ dbgs().indent(INDENT) << " " << F->getName() << "\n";
+
+ dbgs().indent(INDENT) << "}\n";
+ }
+#endif
+};
+
+/// Annotates an llvm::Module with information necessary to perform and track
+/// the result of device code (llvm::Module instances) splitting:
+/// - entry points group from the module.
+class ModuleDesc {
+ std::unique_ptr<Module> M;
+ EntryPointGroup EntryPoints;
+
+public:
+ ModuleDesc() = delete;
+ ModuleDesc(const ModuleDesc &) = delete;
+ ModuleDesc &operator=(const ModuleDesc &) = delete;
+ ModuleDesc(ModuleDesc &&) = default;
+ ModuleDesc &operator=(ModuleDesc &&) = default;
+
+ ModuleDesc(std::unique_ptr<Module> M,
+ EntryPointGroup EntryPoints = EntryPointGroup())
+ : M(std::move(M)), EntryPoints(std::move(EntryPoints)) {
+ assert(this->M && "Module should be non-null");
+ }
+
+ Module &getModule() { return *M; }
+ const Module &getModule() const { return *M; }
+
+ std::unique_ptr<Module> releaseModule() {
+ EntryPoints.clear();
+ return std::move(M);
+ }
+
+ std::string makeSymbolTable() const {
+ SmallString<0> Data;
+ raw_svector_ostream OS(Data);
+ for (const Function *F : EntryPoints.Functions)
+ OS << F->getName() << '\n';
+
+ return std::string(OS.str());
+ }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+ LLVM_DUMP_METHOD void dump() const {
+ dbgs() << "ModuleDesc[" << M->getName() << "] {\n";
+ EntryPoints.dump();
+ dbgs() << "}\n";
+ }
+#endif
+};
+
+// Represents "dependency" or "use" graph of global objects (functions and
+// global variables) in a module. It is used during device code split to
+// understand which global variables and functions (other than entry points)
+// should be included into a split module.
+//
+// Nodes of the graph represent LLVM's GlobalObjects, edges "A" -> "B" represent
+// the fact that if "A" is included into a module, then "B" should be included
+// as well.
+//
+// Examples of dependencies which are represented in this graph:
+// - Function FA calls function FB
+// - Function FA uses global variable GA
+// - Global variable GA references (initialized with) function FB
+// - Function FA stores address of a function FB somewhere
+//
+// The following cases are treated as dependencies between global objects:
+// 1. Global object A is used by a global object B in any way (store,
+// bitcast, phi node, call, etc.): "B" -> "A" edge will be added to the
+// graph;
+// 2. Function A performs an indirect call of a function with signature S and
+// there is a function B with signature S. "A" -> "B" edge will be added to
+// the graph;
+class DependencyGraph {
+public:
+ using GlobalSet = SmallPtrSet<const GlobalValue *, 16>;
+
+ DependencyGraph(const Module &M) {
+ // Group functions by their signature to handle case (2) described above
+ DenseMap<const FunctionType *, DependencyGraph::GlobalSet>
+ FuncTypeToFuncsMap;
+ for (const auto &F : M.functions()) {
+ // Kernels can't be called (either directly or indirectly) in SYCL
+ if (isKernel(F))
+ continue;
+
+ FuncTypeToFuncsMap[F.getFunctionType()].insert(&F);
+ }
+
+ for (const auto &F : M.functions()) {
+ // case (1), see comment above the class definition
+ for (const Value *U : F.users())
+ addUserToGraphRecursively(cast<const User>(U), &F);
+
+ // case (2), see comment above the class definition
+ for (const auto &I : instructions(F)) {
+ const auto *CI = dyn_cast<CallInst>(&I);
+ if (!CI || !CI->isIndirectCall()) // Direct calls were handled above
+ continue;
+
+ const FunctionType *Signature = CI->getFunctionType();
+ const auto &PotentialCallees = FuncTypeToFuncsMap[Signature];
+ Graph[&F].insert(PotentialCallees.begin(), PotentialCallees.end());
+ }
+ }
+
+ // And every global variable (but their handling is a bit simpler)
+ for (const auto &GV : M.globals())
+ for (const Value *U : GV.users())
+ addUserToGraphRecursively(cast<const User>(U), &GV);
+ }
+
+ iterator_range<GlobalSet::const_iterator>
+ dependencies(const GlobalValue *Val) const {
+ auto It = Graph.find(Val);
+ return (It == Graph.end())
+ ? make_range(EmptySet.begin(), EmptySet.end())
+ : make_range(It->second.begin(), It->second.end());
+ }
+
+private:
+ void addUserToGraphRecursively(const User *Root, const GlobalValue *V) {
+ SmallVector<const User *, 8> WorkList;
+ WorkList.push_back(Root);
+
+ while (!WorkList.empty()) {
+ const User *U = WorkList.pop_back_val();
+ if (const auto *I = dyn_cast<const Instruction>(U)) {
+ const auto *UFunc = I->getFunction();
+ Graph[UFunc].insert(V);
+ } else if (isa<const Constant>(U)) {
+ if (const auto *GV = dyn_cast<const GlobalVariable>(U))
+ Graph[GV].insert(V);
+ // This could be a global variable or some constant expression (like
+ // bitcast or gep). We trace users of this constant further to reach
+ // global objects they are used by and add them to the graph.
+ for (const auto *UU : U->users())
+ WorkList.push_back(UU);
+ } else
+ llvm_unreachable("Unhandled type of function user");
+ }
+ }
+
+ DenseMap<const GlobalValue *, GlobalSet> Graph;
+ SmallPtrSet<const GlobalValue *, 1> EmptySet;
+};
+
+void collectFunctionsAndGlobalVariablesToExtract(
+ SetVector<const GlobalValue *> &GVs, const Module &M,
+ const EntryPointGroup &ModuleEntryPoints, const DependencyGraph &DG) {
+ // We start with module entry points
+ for (const auto *F : ModuleEntryPoints.Functions)
+ GVs.insert(F);
+
+ // Non-discardable global variables are also included in the initial set
+ for (const auto &GV : M.globals())
+ if (!GV.isDiscardableIfUnused())
+ GVs.insert(&GV);
+
+ // GVs has SetVector type, which inserts a value only if it is not yet
+ // present. So cycles in the dependency graph cannot cause endless looping.
+ size_t Idx = 0;
+ while (Idx < GVs.size()) {
+ const GlobalValue *Obj = GVs[Idx++];
+
+ for (const GlobalValue *Dep : DG.dependencies(Obj)) {
+ if (const auto *Func = dyn_cast<const Function>(Dep)) {
+ if (!Func->isDeclaration())
+ GVs.insert(Func);
+ } else
+ GVs.insert(Dep); // Global variables are added unconditionally
+ }
+ }
+}
+
+ModuleDesc extractSubModule(const Module &M,
+ const SetVector<const GlobalValue *> &GVs,
+ EntryPointGroup ModuleEntryPoints) {
+ // For each group of entry points collect all dependencies.
+ ValueToValueMapTy VMap;
+ // Clone definitions only for needed globals. Others will be added as
+ // declarations and removed later.
+ std::unique_ptr<Module> SubM = CloneModule(
+ M, VMap, [&](const GlobalValue *GV) { return GVs.count(GV); });
+ // Replace entry points with cloned ones.
+ EntryPointSet NewEPs;
+ const EntryPointSet &EPs = ModuleEntryPoints.Functions;
+ std::for_each(EPs.begin(), EPs.end(), [&](const Function *F) {
+ NewEPs.insert(cast<Function>(VMap[F]));
+ });
+ ModuleEntryPoints.Functions = std::move(NewEPs);
+ return ModuleDesc{std::move(SubM), std::move(ModuleEntryPoints)};
+}
+
+// The function produces a copy of input LLVM IR module M with only those
+// functions and globals that can be called from entry points that are specified
+// in ModuleEntryPoints vector, in addition to the entry point functions.
+ModuleDesc extractCallGraph(const Module &M, EntryPointGroup ModuleEntryPoints,
+ const DependencyGraph &DG) {
+ SetVector<const GlobalValue *> GVs;
+ collectFunctionsAndGlobalVariablesToExtract(GVs, M, ModuleEntryPoints, DG);
+
+ ModuleDesc SplitM = extractSubModule(M, GVs, std::move(ModuleEntryPoints));
+ LLVM_DEBUG(SplitM.dump());
+ return SplitM;
+}
+
+using EntryPointGroupVec = SmallVector<EntryPointGroup, 0>;
+
+/// Module Splitter.
+/// It gets a module (in a form of module descriptor, to get additional info)
+/// and a collection of entry point groups. Each group specifies a subset of
+/// entry points from the input module to be included in a split module.
+class ModuleSplitter {
+private:
+ ModuleDesc Input;
+ EntryPointGroupVec Groups;
+ DependencyGraph DG;
+
+private:
+ EntryPointGroup drawEntryPointGroup() {
+ assert(Groups.size() > 0 && "Reached end of entry point groups list.");
+ EntryPointGroup Group = std::move(Groups.back());
+ Groups.pop_back();
+ return Group;
+ }
+
+public:
+ ModuleSplitter(ModuleDesc MD, EntryPointGroupVec GroupVec)
+ : Input(std::move(MD)), Groups(std::move(GroupVec)),
+ DG(Input.getModule()) {
+ assert(!Groups.empty() && "Entry points groups collection is empty!");
+ }
+
+ /// Gets next subsequence of entry points in an input module and provides
+ /// split submodule containing these entry points and their dependencies.
+ ModuleDesc getNextSplit() {
+ return extractCallGraph(Input.getModule(), drawEntryPointGroup(), DG);
+ }
+
+ /// Check that there are still submodules to split.
+ bool hasMoreSplits() const { return Groups.size() > 0; }
+};
+
+} // namespace
+
+static EntryPointGroupVec selectEntryPointGroups(const Module &M,
+ IRSplitMode Mode) {
+ // std::map is used here to ensure stable ordering of entry point groups,
+ // which is based on their keys; this greatly helps LIT tests.
+ std::map<std::string, EntryPointSet> EntryPointsMap;
+
+ static constexpr char ATTR_SYCL_MODULE_ID[] = "sycl-module-id";
+ for (const auto &F : M.functions()) {
+ if (!isEntryPoint(F))
+ continue;
+
+ std::string Key;
+ switch (Mode) {
+ case IRSplitMode::IRSM_PER_KERNEL:
+ Key = F.getName();
+ break;
+ case IRSplitMode::IRSM_PER_TU:
+ Key = F.getFnAttribute(ATTR_SYCL_MODULE_ID).getValueAsString();
+ break;
+ case IRSplitMode::IRSM_NONE:
+ llvm_unreachable("IRSM_NONE is handled before entry point selection");
+ }
+
+ EntryPointsMap[Key].insert(&F);
+ }
+
+ EntryPointGroupVec Groups;
+ if (EntryPointsMap.empty()) {
+ // No entry points met, record this.
+ Groups.emplace_back("-", EntryPointSet());
+ } else {
+ Groups.reserve(EntryPointsMap.size());
+ // Start with properties of a source module
+ for (auto &[Key, EntryPoints] : EntryPointsMap)
+ Groups.emplace_back(Key, std::move(EntryPoints));
+ }
+
+ return Groups;
+}
+
+namespace llvm {
+
+std::optional<IRSplitMode> convertStringToSplitMode(StringRef S) {
+ static const StringMap<IRSplitMode> Values = {
+ {"source", IRSplitMode::IRSM_PER_TU},
+ {"kernel", IRSplitMode::IRSM_PER_KERNEL},
+ {"none", IRSplitMode::IRSM_NONE}};
+
+ auto It = Values.find(S);
+ if (It == Values.end())
+ return std::nullopt;
+
+ return It->second;
+}
+
+void SYCLSplitModule(std::unique_ptr<Module> M, IRSplitMode Mode,
+ PostSYCLSplitCallbackType Callback) {
+ SmallVector<ModuleAndSYCLMetadata, 0> OutputImages;
+ if (Mode == IRSplitMode::IRSM_NONE) {
+ auto MD = ModuleDesc(std::move(M));
+ auto Symbols = MD.makeSymbolTable();
+ Callback(MD.releaseModule(), std::move(Symbols));
+ return;
+ }
+
+ EntryPointGroupVec Groups = selectEntryPointGroups(*M, Mode);
+ ModuleDesc MD = std::move(M);
+ ModuleSplitter Splitter(std::move(MD), std::move(Groups));
+ while (Splitter.hasMoreSplits()) {
+ ModuleDesc MD = Splitter.getNextSplit();
+ auto Symbols = MD.makeSymbolTable();
+ Callback(MD.releaseModule(), std::move(Symbols));
+ }
+}
+
+} // namespace llvm
diff --git a/llvm/lib/Transforms/Utils/SYCLUtils.cpp b/llvm/lib/Transforms/Utils/SYCLUtils.cpp
new file mode 100644
index 0000000000000..ad9864fadb828
--- /dev/null
+++ b/llvm/lib/Transforms/Utils/SYCLUtils.cpp
@@ -0,0 +1,26 @@
+//===------------ SYCLUtils.cpp - SYCL utility functions ------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// SYCL utility functions.
+//===----------------------------------------------------------------------===//
+#include "llvm/Transforms/Utils/SYCLUtils.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Support/raw_ostream.h"
+
+namespace llvm {
+
+void writeSYCLStringTable(const SYCLStringTable &Table, raw_ostream &OS) {
+ assert(!Table.empty() && "table should contain at least column titles");
+ assert(!Table[0].empty() && "table should be non-empty");
+ OS << '[' << join(Table[0].begin(), Table[0].end(), "|") << "]\n";
+ for (size_t I = 1, E = Table.size(); I != E; ++I) {
+ assert(Table[I].size() == Table[0].size() && "row size should match header");
+ OS << join(Table[I].begin(), Table[I].end(), "|") << '\n';
+ }
+}
+
+} // namespace llvm
diff --git a/llvm/test/tools/llvm-split/SYCL/device-code-split/amd-kernel-split.ll b/llvm/test/tools/llvm-split/SYCL/device-code-split/amd-kernel-split.ll
new file mode 100644
index 0000000000000..a40a52107fb0c
--- /...
[truncated]
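The core of the splitting step is a worklist closure over the dependency graph. The sketch below models it with plain strings instead of `llvm::GlobalValue` pointers; `Graph`, `addIndirectCallEdges`, and `collectClosure` are hypothetical names, and the edge direction follows the `DependencyGraph` class comment (an edge `A -> B` means that keeping `A` requires keeping `B`).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of DependencyGraph: Graph[A] holds every B such that
// keeping A in a split module requires keeping B as well.
using Graph = std::map<std::string, std::set<std::string>>;

// Case (2): an indirect call with signature Sig may reach any non-kernel
// function with that signature, so the caller conservatively depends on all
// of them. FnSignature is assumed to list only non-kernel functions.
void addIndirectCallEdges(Graph &G, const std::string &Caller,
                          const std::string &Sig,
                          const std::map<std::string, std::string> &FnSignature) {
  for (const auto &[Callee, CalleeSig] : FnSignature)
    if (CalleeSig == Sig)
      G[Caller].insert(Callee);
}

// Worklist closure in the spirit of collectFunctionsAndGlobalVariablesToExtract.
// Roots are the group's entry points plus any globals that must always be kept.
// Each name is appended at most once, so cycles cannot make the loop run forever.
std::vector<std::string>
collectClosure(const Graph &G, const std::vector<std::string> &Roots,
               const std::set<std::string> &Declarations) {
  std::vector<std::string> Order; // insertion order, like SetVector
  std::set<std::string> Seen;     // membership test, like SetVector
  auto Insert = [&](const std::string &Name) {
    if (Seen.insert(Name).second)
      Order.push_back(Name);
  };
  for (const std::string &R : Roots)
    Insert(R);
  for (std::size_t Idx = 0; Idx < Order.size(); ++Idx) {
    auto It = G.find(Order[Idx]);
    if (It == G.end())
      continue;
    for (const std::string &Dep : It->second)
      if (!Declarations.count(Dep)) // function declarations are not pulled in
        Insert(Dep);
  }
  return Order;
}
```

Pairing a `std::set` for membership with a `std::vector` for order plays the role of `SetVector` in the patch: iteration order stays deterministic while repeated insertions are ignored.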
Hi @jhuber6 @frasercrmck @bader @asudarsa! Please share your feedback when convenient.
I think I'm lacking context here. Why does this have to be in `lib/Transforms`? How many targets will use it? I'm a bit concerned seeing what looks like target-specific files added to `Transforms`.
Hi @Pierre-vh, currently, in our downstream, our SYCL extension can be compiled to SPIR-V, Intel CPU/GPU, AMDGPU, and NVPTX. We are planning to upstream support for all of these. The one feature missing from this patch, which will be added later, is splitting by optional kernel features (spec). I think that justifies my choice of component. Anyway, I am interested in your opinion. The possible use case:
The separation is done by the module splitting algorithm.
So this SYCL mode can, e.g., use either the SPIR-V or the AMDGPU target at the same time? If that's the case, then I'm a little less concerned. I still wish this were hidden away in some TargetMachine, but I don't see how we could do that easily. How does it work? Does it use a different target triple, or is it an LLVM flag that changes the compilation pipeline? Is there a high-level overview somewhere of how this works in LLVM? What attributes would a module using SYCL have, and what target triple would it use?
@maksimsab I think that is orthogonal to @Pierre-vh's concern. The issue isn't about whether SYCL can target multiple "targets." This pass is specifically designated for SYCL modules, at least based on its description. If the pass were applicable to SYCL, CUDA, HIP, OpenCL, OpenMP, etc., then placing it in this directory would make sense.
Yes, exactly. If the pass is only used with a "SYCLTargetMachine", then this SplitModule implementation should live right next to that and use the module splitting override hook from TargetMachine, similarly to how AMDGPU does it. If it can be used in combination with any other TargetMachine, then this is the right approach (though I wish we had a better one).
High-level RFCs are available on Discourse:
Yes, a user chooses a target like:
The list of targets supported downstream can be found here: https://intel.github.io/llvm/UsersManual.html#generic-options
There is no such thing as a SYCL target, since we are actually targeting Intel CPU/GPU, AMDGPU, and NVPTX; it is similar to OpenCL and OpenMP in that respect. I would also like to have a dedicated folder like llvm/Target/SYCL, but that doesn't really seem appropriate.
There is some information in the RFC mentioned above, Offloading design for SYCL offload kind and SPIR targets. We are going to incorporate module splitting into clang-linker-wrapper, which will be able to recognize incoming SYCL inputs, so the compilation pipeline will be constructed and executed in clang-linker-wrapper. We are also preparing more detailed documentation about SYCL offloading, which we will add to llvm-project.
This syntax unnerves me. You can kind of do this in OpenMP, but the syntax is really verbose.
I want to let
Would you mind if we introduce a
FWIW, it might be better to adopt the offload bundler id format:
You probably want to have the full target triple here because the OS part does matter. For example,
This also applies to
I don't mind it, but I'd do an RFC. I suppose it's going to be a new LLVM library component?
There should be nothing SYCL-specific about the patch. This PR adds a function that distributes the content of one LLVM module across one or more LLVM modules. It's a generic LLVM transformation. The logic of how content is distributed is currently limited to two cases:
There are plans to extend this list in future patches. (1) is GPU-specific, but it can be useful for non-SYCL GPU programming models as well. IIRC, @jdoerfert has been involved in prototyping GPU code splitting logic in the OpenMP offload compiler. (2) The name of the attribute is 'sycl-module-id', but the meaning is the same as a C++ compilation unit: if two functions have the same "module-id" value, they were produced from the same compilation unit. The primary use of this functionality is to reduce code generation time for GPU code, which is critical for JIT compilation. Another nice property is that splitting an LLVM module allows the compiler to skip code generation entirely where it is not needed. This is useful if an LLVM module has code for different targets and we need to avoid code generation for "unwanted" targets. The code mentions SYCL because it's currently used only by the SYCL compiler. @sarnex, have you thought about using this function for OpenMP offload compilation? We can commit this code as a SYCL-specific library right now and make it more generic if we find use cases outside of the SYCL compilation flow.
FWIW, this should never be "SYCL"-specific. There is no reason for it, and it distracts from the issue this solves. Any offload language, even non-offload languages, might want to use this to increase compile-time parallelism. An obvious user is the new "forced contained" ThinLTO pipeline for AMD GPUs, currently driven by @shiltian. It works great if the TUs come with an even distribution of work, but it won't work well if that isn't the case. This patch would allow us to expose parallelism. That said, we would want to follow up with heuristics later. Long story short: SYCL needs this for its compute model and "target feature" ideas. AMD GPU wants this for ThinLTO. Other targets would likely also be interested in this for their ThinLTO. This should be a generic utility without SYCL branding, living in Transforms, exposed via llvm-split (or something similar), and, if necessary, with hooks to the backend for target-specific stuff.
I could change the main splitting method to the following interface:
using FunctionCategorizer = function_ref<std::string(const Function &F)>; // Computes the category for the function.
using PostSplitCallbackType = function_ref<void(std::unique_ptr<Module>)>; // Accepts the split Module for further handling.
void SplitModuleByCategory(std::unique_ptr<Module> M, FunctionCategorizer FC, PostSplitCallbackType Callback);
This function would distribute functions/kernels into modules according to their computed categories. This interface allows reuse for purposes other than SYCL, and it doesn't require any SYCL specifics. What do you think about this?
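A plain C++ model of how such a category-based interface might behave is sketched below. `std::function` stands in for `llvm::function_ref` and a struct of strings stands in for `llvm::Function`; `Fn`, `Categorizer`, `PartCallback`, and `splitByCategory` are hypothetical names, and the sketch only buckets entry points instead of extracting real sub-modules.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::Function.
struct Fn {
  std::string Name;
  std::string ModuleId; // e.g. the "sycl-module-id" attribute value
  bool IsEntryPoint;
};

// Same roles as the proposed FunctionCategorizer and PostSplitCallbackType,
// but using std::function and plain data so the sketch builds without LLVM.
using Categorizer = std::function<std::string(const Fn &)>;
using PartCallback = std::function<void(
    const std::string &Category, const std::vector<std::string> &EntryPoints)>;

// Buckets entry points by their computed category and reports each bucket in
// sorted category order. A real implementation would extract a sub-module
// (entry points plus dependencies) per bucket and pass that to the callback.
void splitByCategory(const std::vector<Fn> &Fns, const Categorizer &FC,
                     const PartCallback &CB) {
  assert(FC && CB && "categorizer and callback must be set");
  std::map<std::string, std::vector<std::string>> Parts;
  for (const Fn &F : Fns)
    if (F.IsEntryPoint)
      Parts[FC(F)].push_back(F.Name);
  for (const auto &[Category, EntryPoints] : Parts)
    CB(Category, EntryPoints);
}
```

With this shape, per-kernel splitting is `[](const Fn &F) { return F.Name; }` and per-source splitting is `[](const Fn &F) { return F.ModuleId; }`; collecting the buckets in a `std::map` keeps the callback order sorted by category.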
I think that makes sense, though there doesn't seem to be a need for "std::string", IMHO. An integer should be just fine. The user can map strings to integers if they need to.
Even in our case of splitting by source unit, it would be difficult to map module IDs to integers. I could use a hash, but it would lead to collisions.
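For reference, a collision-free alternative to hashing is to intern each distinct key and hand out consecutive integers. The hypothetical `CategoryInterner` below is a minimal sketch of that approach. It also shows the trade-off raised in this discussion: the mapping is stateful, and the resulting IDs follow first-seen order rather than sorted key order.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper: maps each distinct string key to a dense integer ID.
// Unlike a hash, two different keys never share an ID, but the object has to
// remember every key it has seen, and IDs follow first-seen order.
class CategoryInterner {
  std::unordered_map<std::string, unsigned> Ids;

public:
  unsigned getId(const std::string &Key) {
    // The candidate ID is computed before insertion, so it equals the number
    // of keys seen so far; try_emplace leaves an existing entry untouched.
    unsigned NextId = static_cast<unsigned>(Ids.size());
    return Ids.try_emplace(Key, NextId).first->second;
  }

  unsigned numCategories() const { return static_cast<unsigned>(Ids.size()); }
};
```

Interning "b.cpp" before "a.cpp" gives "b.cpp" ID 0, so a splitter that orders parts by integer category would no longer produce the sorted-by-name order a `std::map<std::string, ...>` gives.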
Your API proposal maps the stuff to std::string, right?
This would lead to complications for users, forcing them to come up with stateful functors in order to keep track of previously seen functions. Additionally, it would complicate testing, since we currently depend on the sorted order provided by the
I don't think this makes anything more complicated. Please try to see the following point: To get integers from
And if the user wants something other than |
Hi @jdoerfert. I am thinking of moving this functionality into
I've moved the functionality into … I removed my developments related to the discussed …
Could you please share your opinions on the latest changes? I am a bit skeptical about recent
We have been discussing the first step in this PR. I think it could be ironed out separately in an effort dedicated to