[offload][SYCL] Add Module splitting by categories. #131347

maksimsab · 2025-03-14T15:54:27Z

This patch adds Module splitting by categories. The splitting algorithm is a necessary step in the SYCL compilation pipeline, and it could also be reused for other heterogeneous targets.

The previous attempt was at #119713. In this patch, TransformUtils has no dependency on "IPO" or on "Printing Passes": module splitting is self-contained and doesn't introduce linking issues.

Only 2 splitting modes are being added in this patch: by kernel and by source.

llvmbot · 2025-03-14T15:55:01Z

@llvm/pr-subscribers-llvm-transforms

Author: Maksim Sabianin (maksimsab)

Changes

This patch adds SYCL Module splitting - the necessary step in the SYCL compilation pipeline. Only 2 splitting modes are being added in this patch: by kernel and by source.

The previous attempt was at #119713. In this patch there is no dependency in TransformUtils on "IPO" and on "Printing Passes". In this patch a module splitting is self-contained and it doesn't introduce linking issues.

Patch is 41.25 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/131347.diff

13 Files Affected:

(added) llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h (+64)
(added) llvm/include/llvm/Transforms/Utils/SYCLUtils.h (+26)
(modified) llvm/lib/Transforms/Utils/CMakeLists.txt (+2)
(added) llvm/lib/Transforms/Utils/SYCLSplitModule.cpp (+401)
(added) llvm/lib/Transforms/Utils/SYCLUtils.cpp (+26)
(added) llvm/test/tools/llvm-split/SYCL/device-code-split/amd-kernel-split.ll (+17)
(added) llvm/test/tools/llvm-split/SYCL/device-code-split/complex-indirect-call-chain.ll (+75)
(added) llvm/test/tools/llvm-split/SYCL/device-code-split/module-split-func-ptr.ll (+43)
(added) llvm/test/tools/llvm-split/SYCL/device-code-split/one-kernel-per-module.ll (+108)
(added) llvm/test/tools/llvm-split/SYCL/device-code-split/split-by-source.ll (+97)
(added) llvm/test/tools/llvm-split/SYCL/device-code-split/split-with-kernel-declarations.ll (+66)
(modified) llvm/tools/llvm-split/CMakeLists.txt (+1)
(modified) llvm/tools/llvm-split/llvm-split.cpp (+121)

diff --git a/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h b/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h
new file mode 100644
index 0000000000000..a3425d19b9c4b
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/SYCLSplitModule.h
@@ -0,0 +1,64 @@
+//===-------- SYCLSplitModule.h - module split ------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Functionality to split a module into callgraphs. A callgraph here is a set
+// of entry points with all functions reachable from them via a call. The result
+// of the split is new modules containing corresponding callgraph.
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
+#define LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
+
+#include "llvm/ADT/STLFunctionalExtras.h"
+#include "llvm/ADT/StringRef.h"
+
+#include <memory>
+#include <optional>
+#include <string>
+
+namespace llvm {
+
+class Module;
+
+enum class IRSplitMode {
+  IRSM_PER_TU,     // one module per translation unit
+  IRSM_PER_KERNEL, // one module per kernel
+  IRSM_NONE        // no splitting
+};
+
+/// \returns IRSplitMode value if \p S is recognized. Otherwise, std::nullopt is
+/// returned.
+std::optional<IRSplitMode> convertStringToSplitMode(StringRef S);
+
+/// The structure represents a split LLVM Module accompanied by additional
+/// information. Split Modules are stored on disk because of the high RAM
+/// consumption during the whole splitting process.
+struct ModuleAndSYCLMetadata {
+  std::string ModuleFilePath;
+  std::string Symbols;
+
+  ModuleAndSYCLMetadata() = default;
+  ModuleAndSYCLMetadata(const ModuleAndSYCLMetadata &) = default;
+  ModuleAndSYCLMetadata &operator=(const ModuleAndSYCLMetadata &) = default;
+  ModuleAndSYCLMetadata(ModuleAndSYCLMetadata &&) = default;
+  ModuleAndSYCLMetadata &operator=(ModuleAndSYCLMetadata &&) = default;
+
+  ModuleAndSYCLMetadata(std::string_view File, std::string Symbols)
+      : ModuleFilePath(File), Symbols(std::move(Symbols)) {}
+};
+
+using PostSYCLSplitCallbackType =
+    function_ref<void(std::unique_ptr<Module> Part, std::string Symbols)>;
+
+/// Splits the given module \p M according to the given \p Mode.
+/// Every split image is passed to \p Callback.
+void SYCLSplitModule(std::unique_ptr<Module> M, IRSplitMode Mode,
+                     PostSYCLSplitCallbackType Callback);
+
+} // namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_SYCLSPLITMODULE_H
diff --git a/llvm/include/llvm/Transforms/Utils/SYCLUtils.h b/llvm/include/llvm/Transforms/Utils/SYCLUtils.h
new file mode 100644
index 0000000000000..75459eed6ac0f
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/SYCLUtils.h
@@ -0,0 +1,26 @@
+//===------------ SYCLUtils.h - SYCL utility functions --------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// Utility functions for SYCL.
+//===----------------------------------------------------------------------===//
+#ifndef LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
+#define LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
+
+#include <llvm/ADT/SmallString.h>
+#include <llvm/ADT/SmallVector.h>
+
+namespace llvm {
+
+class raw_ostream;
+
+using SYCLStringTable = SmallVector<SmallVector<SmallString<64>>>;
+
+void writeSYCLStringTable(const SYCLStringTable &Table, raw_ostream &OS);
+
+} // namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_SYCLUTILS_H
diff --git a/llvm/lib/Transforms/Utils/CMakeLists.txt b/llvm/lib/Transforms/Utils/CMakeLists.txt
index 78cad0d253be8..0ba46bdadea8d 100644
--- a/llvm/lib/Transforms/Utils/CMakeLists.txt
+++ b/llvm/lib/Transforms/Utils/CMakeLists.txt
@@ -83,6 +83,8 @@ add_llvm_component_library(LLVMTransformUtils
   SizeOpts.cpp
   SplitModule.cpp
   StripNonLineTableDebugInfo.cpp
+  SYCLSplitModule.cpp
+  SYCLUtils.cpp
   SymbolRewriter.cpp
   UnifyFunctionExitNodes.cpp
   UnifyLoopExits.cpp
diff --git a/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp b/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp
new file mode 100644
index 0000000000000..18eca4237c8ae
--- /dev/null
+++ b/llvm/lib/Transforms/Utils/SYCLSplitModule.cpp
@@ -0,0 +1,401 @@
+//===-------- SYCLSplitModule.cpp - Split a module into call graphs -------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// See comments in the header.
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Utils/SYCLSplitModule.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Module.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Transforms/Utils/Cloning.h"
+#include "llvm/Transforms/Utils/SYCLUtils.h"
+
+#include <map>
+#include <utility>
+
+using namespace llvm;
+
+#define DEBUG_TYPE "sycl-split-module"
+
+static bool isKernel(const Function &F) {
+  return F.getCallingConv() == CallingConv::SPIR_KERNEL ||
+         F.getCallingConv() == CallingConv::AMDGPU_KERNEL;
+}
+
+static bool isEntryPoint(const Function &F) {
+  // Skip declarations, if any: they should not be included into a vector of
+  // entry points groups or otherwise we will end up with incorrectly generated
+  // list of symbols.
+  if (F.isDeclaration())
+    return false;
+
+  // Kernels are always considered to be entry points
+  return isKernel(F);
+}
+
+namespace {
+
+// A vector that contains all entry point functions in a split module.
+using EntryPointSet = SetVector<const Function *>;
+
+/// Represents a named group of entry points.
+struct EntryPointGroup {
+  std::string GroupName;
+  EntryPointSet Functions;
+
+  EntryPointGroup() = default;
+  EntryPointGroup(const EntryPointGroup &) = default;
+  EntryPointGroup &operator=(const EntryPointGroup &) = default;
+  EntryPointGroup(EntryPointGroup &&) = default;
+  EntryPointGroup &operator=(EntryPointGroup &&) = default;
+
+  EntryPointGroup(StringRef GroupName,
+                  EntryPointSet Functions = EntryPointSet())
+      : GroupName(GroupName), Functions(std::move(Functions)) {}
+
+  void clear() {
+    GroupName.clear();
+    Functions.clear();
+  }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  LLVM_DUMP_METHOD void dump() const {
+    constexpr size_t INDENT = 4;
+    dbgs().indent(INDENT) << "ENTRY POINTS"
+                          << " " << GroupName << " {\n";
+    for (const Function *F : Functions)
+      dbgs().indent(INDENT) << "  " << F->getName() << "\n";
+
+    dbgs().indent(INDENT) << "}\n";
+  }
+#endif
+};
+
+/// Annotates an llvm::Module with information necessary to perform and track
+/// the result of device code (llvm::Module instances) splitting:
+/// - entry points group from the module.
+class ModuleDesc {
+  std::unique_ptr<Module> M;
+  EntryPointGroup EntryPoints;
+
+public:
+  ModuleDesc() = delete;
+  ModuleDesc(const ModuleDesc &) = delete;
+  ModuleDesc &operator=(const ModuleDesc &) = delete;
+  ModuleDesc(ModuleDesc &&) = default;
+  ModuleDesc &operator=(ModuleDesc &&) = default;
+
+  ModuleDesc(std::unique_ptr<Module> M,
+             EntryPointGroup EntryPoints = EntryPointGroup())
+      : M(std::move(M)), EntryPoints(std::move(EntryPoints)) {
+    assert(this->M && "Module should be non-null");
+  }
+
+  Module &getModule() { return *M; }
+  const Module &getModule() const { return *M; }
+
+  std::unique_ptr<Module> releaseModule() {
+    EntryPoints.clear();
+    return std::move(M);
+  }
+
+  std::string makeSymbolTable() const {
+    SmallString<0> Data;
+    raw_svector_ostream OS(Data);
+    for (const Function *F : EntryPoints.Functions)
+      OS << F->getName() << '\n';
+
+    return std::string(OS.str());
+  }
+
+#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
+  LLVM_DUMP_METHOD void dump() const {
+    dbgs() << "ModuleDesc[" << M->getName() << "] {\n";
+    EntryPoints.dump();
+    dbgs() << "}\n";
+  }
+#endif
+};
+
+// Represents "dependency" or "use" graph of global objects (functions and
+// global variables) in a module. It is used during device code split to
+// understand which global variables and functions (other than entry points)
+// should be included into a split module.
+//
+// Nodes of the graph represent LLVM's GlobalObjects, edges "A" -> "B" represent
+// the fact that if "A" is included into a module, then "B" should be included
+// as well.
+//
+// Examples of dependencies which are represented in this graph:
+// - Function FA calls function FB
+// - Function FA uses global variable GA
+// - Global variable GA references (initialized with) function FB
+// - Function FA stores address of a function FB somewhere
+//
+// The following cases are treated as dependencies between global objects:
+// 1. Global object A is used by a global object B in any way (store,
+//    bitcast, phi node, call, etc.): "B" -> "A" edge will be added to the
+//    graph;
+// 2. function A performs an indirect call of a function with signature S and
+//    there is a function B with signature S. "A" -> "B" edge will be added to
+//    the graph;
+class DependencyGraph {
+public:
+  using GlobalSet = SmallPtrSet<const GlobalValue *, 16>;
+
+  DependencyGraph(const Module &M) {
+    // Group functions by their signature to handle case (2) described above
+    DenseMap<const FunctionType *, DependencyGraph::GlobalSet>
+        FuncTypeToFuncsMap;
+    for (const auto &F : M.functions()) {
+      // Kernels can't be called (either directly or indirectly) in SYCL
+      if (isKernel(F))
+        continue;
+
+      FuncTypeToFuncsMap[F.getFunctionType()].insert(&F);
+    }
+
+    for (const auto &F : M.functions()) {
+      // case (1), see comment above the class definition
+      for (const Value *U : F.users())
+        addUserToGraphRecursively(cast<const User>(U), &F);
+
+      // case (2), see comment above the class definition
+      for (const auto &I : instructions(F)) {
+        const auto *CI = dyn_cast<CallInst>(&I);
+        if (!CI || !CI->isIndirectCall()) // Direct calls were handled above
+          continue;
+
+        const FunctionType *Signature = CI->getFunctionType();
+        const auto &PotentialCallees = FuncTypeToFuncsMap[Signature];
+        Graph[&F].insert(PotentialCallees.begin(), PotentialCallees.end());
+      }
+    }
+
+    // And every global variable (but their handling is a bit simpler)
+    for (const auto &GV : M.globals())
+      for (const Value *U : GV.users())
+        addUserToGraphRecursively(cast<const User>(U), &GV);
+  }
+
+  iterator_range<GlobalSet::const_iterator>
+  dependencies(const GlobalValue *Val) const {
+    auto It = Graph.find(Val);
+    return (It == Graph.end())
+               ? make_range(EmptySet.begin(), EmptySet.end())
+               : make_range(It->second.begin(), It->second.end());
+  }
+
+private:
+  void addUserToGraphRecursively(const User *Root, const GlobalValue *V) {
+    SmallVector<const User *, 8> WorkList;
+    WorkList.push_back(Root);
+
+    while (!WorkList.empty()) {
+      const User *U = WorkList.pop_back_val();
+      if (const auto *I = dyn_cast<const Instruction>(U)) {
+        const auto *UFunc = I->getFunction();
+        Graph[UFunc].insert(V);
+      } else if (isa<const Constant>(U)) {
+        if (const auto *GV = dyn_cast<const GlobalVariable>(U))
+          Graph[GV].insert(V);
+        // This could be a global variable or some constant expression (like
+        // bitcast or gep). We trace users of this constant further to reach
+        // global objects they are used by and add them to the graph.
+        for (const auto *UU : U->users())
+          WorkList.push_back(UU);
+      } else
+        llvm_unreachable("Unhandled type of function user");
+    }
+  }
+
+  DenseMap<const GlobalValue *, GlobalSet> Graph;
+  SmallPtrSet<const GlobalValue *, 1> EmptySet;
+};
+
+void collectFunctionsAndGlobalVariablesToExtract(
+    SetVector<const GlobalValue *> &GVs, const Module &M,
+    const EntryPointGroup &ModuleEntryPoints, const DependencyGraph &DG) {
+  // We start with module entry points
+  for (const auto *F : ModuleEntryPoints.Functions)
+    GVs.insert(F);
+
+  // Non-discardable global variables are also included in the initial set
+  for (const auto &GV : M.globals())
+    if (!GV.isDiscardableIfUnused())
+      GVs.insert(&GV);
+
+  // GVs has SetVector type. This type inserts a value only if it is not yet
+  // present there. So, recursion is not expected here.
+  size_t Idx = 0;
+  while (Idx < GVs.size()) {
+    const GlobalValue *Obj = GVs[Idx++];
+
+    for (const GlobalValue *Dep : DG.dependencies(Obj)) {
+      if (const auto *Func = dyn_cast<const Function>(Dep)) {
+        if (!Func->isDeclaration())
+          GVs.insert(Func);
+      } else
+        GVs.insert(Dep); // Global variables are added unconditionally
+    }
+  }
+}
+
+ModuleDesc extractSubModule(const Module &M,
+                            const SetVector<const GlobalValue *> &GVs,
+                            EntryPointGroup ModuleEntryPoints) {
+  // For each group of entry points collect all dependencies.
+  ValueToValueMapTy VMap;
+  // Clone definitions only for needed globals. Others will be added as
+  // declarations and removed later.
+  std::unique_ptr<Module> SubM = CloneModule(
+      M, VMap, [&](const GlobalValue *GV) { return GVs.count(GV); });
+  // Replace entry points with cloned ones.
+  EntryPointSet NewEPs;
+  const EntryPointSet &EPs = ModuleEntryPoints.Functions;
+  std::for_each(EPs.begin(), EPs.end(), [&](const Function *F) {
+    NewEPs.insert(cast<Function>(VMap[F]));
+  });
+  ModuleEntryPoints.Functions = std::move(NewEPs);
+  return ModuleDesc{std::move(SubM), std::move(ModuleEntryPoints)};
+}
+
+// The function produces a copy of input LLVM IR module M with only those
+// functions and globals that can be called from entry points that are specified
+// in ModuleEntryPoints vector, in addition to the entry point functions.
+ModuleDesc extractCallGraph(const Module &M, EntryPointGroup ModuleEntryPoints,
+                            const DependencyGraph &DG) {
+  SetVector<const GlobalValue *> GVs;
+  collectFunctionsAndGlobalVariablesToExtract(GVs, M, ModuleEntryPoints, DG);
+
+  ModuleDesc SplitM = extractSubModule(M, GVs, std::move(ModuleEntryPoints));
+  LLVM_DEBUG(SplitM.dump());
+  return SplitM;
+}
+
+using EntryPointGroupVec = SmallVector<EntryPointGroup, 0>;
+
+/// Module Splitter.
+/// It gets a module (in a form of module descriptor, to get additional info)
+/// and a collection of entry point groups. Each group specifies a subset of
+/// entry points from the input module to include in a split module.
+class ModuleSplitter {
+private:
+  ModuleDesc Input;
+  EntryPointGroupVec Groups;
+  DependencyGraph DG;
+
+private:
+  EntryPointGroup drawEntryPointGroup() {
+    assert(Groups.size() > 0 && "Reached end of entry point groups list.");
+    EntryPointGroup Group = std::move(Groups.back());
+    Groups.pop_back();
+    return Group;
+  }
+
+public:
+  ModuleSplitter(ModuleDesc MD, EntryPointGroupVec GroupVec)
+      : Input(std::move(MD)), Groups(std::move(GroupVec)),
+        DG(Input.getModule()) {
+    assert(!Groups.empty() && "Entry points groups collection is empty!");
+  }
+
+  /// Gets next subsequence of entry points in an input module and provides
+  /// split submodule containing these entry points and their dependencies.
+  ModuleDesc getNextSplit() {
+    return extractCallGraph(Input.getModule(), drawEntryPointGroup(), DG);
+  }
+
+  /// Check that there are still submodules to split.
+  bool hasMoreSplits() const { return Groups.size() > 0; }
+};
+
+} // namespace
+
+static EntryPointGroupVec selectEntryPointGroups(const Module &M,
+                                                 IRSplitMode Mode) {
+  // std::map is used here to ensure stable ordering of entry point groups,
+  // which is based on their contents, this greatly helps LIT tests
+  std::map<std::string, EntryPointSet> EntryPointsMap;
+
+  static constexpr char ATTR_SYCL_MODULE_ID[] = "sycl-module-id";
+  for (const auto &F : M.functions()) {
+    if (!isEntryPoint(F))
+      continue;
+
+    std::string Key;
+    switch (Mode) {
+    case IRSplitMode::IRSM_PER_KERNEL:
+      Key = F.getName();
+      break;
+    case IRSplitMode::IRSM_PER_TU:
+      Key = F.getFnAttribute(ATTR_SYCL_MODULE_ID).getValueAsString();
+      break;
+    case IRSplitMode::IRSM_NONE:
+      llvm_unreachable("");
+    }
+
+    EntryPointsMap[Key].insert(&F);
+  }
+
+  EntryPointGroupVec Groups;
+  if (EntryPointsMap.empty()) {
+    // No entry points met, record this.
+    Groups.emplace_back("-", EntryPointSet());
+  } else {
+    Groups.reserve(EntryPointsMap.size());
+    // Start with properties of a source module
+    for (auto &[Key, EntryPoints] : EntryPointsMap)
+      Groups.emplace_back(Key, std::move(EntryPoints));
+  }
+
+  return Groups;
+}
+
+namespace llvm {
+
+std::optional<IRSplitMode> convertStringToSplitMode(StringRef S) {
+  static const StringMap<IRSplitMode> Values = {
+      {"source", IRSplitMode::IRSM_PER_TU},
+      {"kernel", IRSplitMode::IRSM_PER_KERNEL},
+      {"none", IRSplitMode::IRSM_NONE}};
+
+  auto It = Values.find(S);
+  if (It == Values.end())
+    return std::nullopt;
+
+  return It->second;
+}
+
+void SYCLSplitModule(std::unique_ptr<Module> M, IRSplitMode Mode,
+                     PostSYCLSplitCallbackType Callback) {
+  SmallVector<ModuleAndSYCLMetadata, 0> OutputImages;
+  if (Mode == IRSplitMode::IRSM_NONE) {
+    auto MD = ModuleDesc(std::move(M));
+    auto Symbols = MD.makeSymbolTable();
+    Callback(std::move(MD.releaseModule()), std::move(Symbols));
+    return;
+  }
+
+  EntryPointGroupVec Groups = selectEntryPointGroups(*M, Mode);
+  ModuleDesc MD = std::move(M);
+  ModuleSplitter Splitter(std::move(MD), std::move(Groups));
+  while (Splitter.hasMoreSplits()) {
+    ModuleDesc MD = Splitter.getNextSplit();
+    auto Symbols = MD.makeSymbolTable();
+    Callback(std::move(MD.releaseModule()), std::move(Symbols));
+  }
+}
+
+} // namespace llvm
diff --git a/llvm/lib/Transforms/Utils/SYCLUtils.cpp b/llvm/lib/Transforms/Utils/SYCLUtils.cpp
new file mode 100644
index 0000000000000..ad9864fadb828
--- /dev/null
+++ b/llvm/lib/Transforms/Utils/SYCLUtils.cpp
@@ -0,0 +1,26 @@
+//===------------ SYCLUtils.cpp - SYCL utility functions ------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+// SYCL utility functions.
+//===----------------------------------------------------------------------===//
+#include "llvm/Transforms/Utils/SYCLUtils.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Support/raw_ostream.h"
+
+namespace llvm {
+
+void writeSYCLStringTable(const SYCLStringTable &Table, raw_ostream &OS) {
+  assert(!Table.empty() && "table should contain at least column titles");
+  assert(!Table[0].empty() && "table should be non-empty");
+  OS << '[' << join(Table[0].begin(), Table[0].end(), "|") << "]\n";
+  for (size_t I = 1, E = Table.size(); I != E; ++I) {
+    assert(Table[I].size() == Table[0].size() && "row's size should be equal");
+    OS << join(Table[I].begin(), Table[I].end(), "|") << '\n';
+  }
+}
+
+} // namespace llvm
diff --git a/llvm/test/tools/llvm-split/SYCL/device-code-split/amd-kernel-split.ll b/llvm/test/tools/llvm-split/SYCL/device-code-split/amd-kernel-split.ll
new file mode 100644
index 0000000000000..a40a52107fb0c
--- /...
[truncated]
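
For readers skimming the diff: the heart of the split appears to be a transitive closure over a "user -> used" dependency graph, seeded with one group's entry points. The following is a minimal standalone sketch of that traversal using only standard-library containers in place of LLVM's `SetVector` and `DenseMap`. The `Graph` alias and the `collectReachable` name are hypothetical and not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the patch's DependencyGraph: maps a global's name
// to the names of the globals it uses.
using Graph = std::map<std::string, std::set<std::string>>;

// Returns the entry points plus everything reachable from them, in discovery
// order. The vector doubles as the worklist; the set rejects duplicates, which
// is what guarantees termination on cyclic graphs.
std::vector<std::string>
collectReachable(const Graph &G, const std::vector<std::string> &EntryPoints) {
  std::vector<std::string> Order;
  std::set<std::string> Seen;
  for (const std::string &EP : EntryPoints)
    if (Seen.insert(EP).second)
      Order.push_back(EP);

  // Order grows while we iterate, so index-based iteration is required.
  for (std::size_t Idx = 0; Idx < Order.size(); ++Idx) {
    auto It = G.find(Order[Idx]);
    if (It == G.end())
      continue; // This global uses nothing else.
    for (const std::string &Dep : It->second)
      if (Seen.insert(Dep).second)
        Order.push_back(Dep);
  }
  return Order;
}
```

Seeding the traversal with a single kernel yields only that kernel's dependencies, so globals used exclusively by other kernels stay out of the resulting part.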

maksimsab · 2025-03-17T12:42:52Z

Hi @jhuber6 @frasercrmck @bader @asudarsa!
The previous issue with unresolved symbols has been resolved in this patch.

Please share your feedback when convenient.

llvm/include/llvm/Transforms/Utils/SYCLUtils.h

Pierre-vh

I think I'm lacking context here.
Why does this have to be in lib/Transform? How many targets will use it?
I'm a bit concerned about seeing what looks like target-specific files added to Transform.

maksimsab · 2025-03-17T15:48:39Z

Hi @Pierre-vh

Currently, in the downstream, our SYCL extension is capable of being compiled into SPIR-V, Intel CPU/GPU, AMDGPU, and NVPTX. We are planning to upstream support for all of these. The one missing feature of this patch, which will be added later, is the splitting by optional kernel features (spec). I think that justifies my choice of the component. Anyway, I am interested in your opinion.

The possible use case:

User writes 2 kernels:

kernel A, which uses fp16
kernel B, which doesn't use fp16.

If the user wants to JIT compile kernel B for a device that doesn't support fp16, then it is required that these kernels be separated from each other.

The separation is being done in the Module splitting algorithm.
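
As an illustration of the fp16 use case, here is a minimal sketch of grouping kernels by the optional features they require and checking a group against a device. It is an assumption-laden model, not the planned implementation: `KernelInfo`, `groupByFeatures`, and `isCompatible` are hypothetical names, and features are modeled as plain strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a kernel: its name plus the optional device
// features it requires (for example "fp16").
struct KernelInfo {
  std::string Name;
  std::set<std::string> RequiredFeatures;
};

// Groups kernels so that every group shares one exact feature set. Each group
// would become its own device image.
std::map<std::set<std::string>, std::vector<std::string>>
groupByFeatures(const std::vector<KernelInfo> &Kernels) {
  std::map<std::set<std::string>, std::vector<std::string>> Groups;
  for (const KernelInfo &K : Kernels)
    Groups[K.RequiredFeatures].push_back(K.Name);
  return Groups;
}

// A group can be JIT-compiled for a device only if the device supports every
// feature the group requires.
bool isCompatible(const std::set<std::string> &Required,
                  const std::set<std::string> &Supported) {
  for (const std::string &F : Required)
    if (Supported.count(F) == 0)
      return false;
  return true;
}
```

With kernels A (requires fp16) and B (requires nothing), the two land in different groups, and only B's group is compatible with a device that reports no fp16 support.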

Pierre-vh · 2025-03-17T16:04:52Z

So this SYCL mode can, e.g. use either the SPIRV or AMDGPU target at the same time?

If that's the case then I'm a little bit less concerned, I still wish this was hidden away in some TargetMachine somewhere though but I don't see how we can do that easily.

How does it work? Does it use a different target triple, or is it an LLVM flag that changes the compilation pipeline? Is there a high-level overview somewhere of how this works in LLVM? What attributes would a module using SYCL have, and what target triple would it use?

shiltian · 2025-03-17T16:08:50Z

@maksimsab I think that is orthogonal to @Pierre-vh 's concern. The issue isn’t about whether SYCL can target multiple "targets." This pass is specifically designated for SYCL modules, at least based on its description. If the pass were applicable to SYCL, CUDA, HIP, OpenCL, OpenMP, etc., then placing it in this directory would make sense.

Pierre-vh · 2025-03-18T07:11:46Z

Yes exactly. If the pass is only used with a "SYCLTargetMachine" then this SplitModule impl should live right next to that, and use the module splitting override hook from TargetMachine similarly to how AMDGPU does it.

If it can be used in combination with any other TargetMachine then this is the right approach (though I wish we had a better one).

maksimsab · 2025-03-18T13:10:23Z

High-level RFCs are available on Discourse:

> So this SYCL mode can, e.g. use either the SPIRV or AMDGPU target at the same time?

Yes, a user chooses a target like:

clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_90  code.cpp
or
clang++ -fsycl -fsycl-targets=amd_gpu_gfx1201 code.cpp

The list of supported targets in downstream can be observed here: https://intel.github.io/llvm/UsersManual.html#generic-options

There is no such thing as a SYCL Target since we are actually targeting Intel CPU/GPU, AMDGPU, and NVPTX. It is similar to OpenCL and OpenMP.

I would also like to have a specific folder like llvm/Target/SYCL, but that doesn't really seem appropriate.

> How does it work? Does it use a different target triple, or is it an LLVM flag that changes the compilation pipeline? Is there a high-level overview somewhere of how this works in LLVM? What attributes would a module using SYCL have, and what target triple would it use?

There is some information in the mentioned RFC Offloading design for SYCL offload kind and SPIR targets. We are going to incorporate module splitting in the clang-linker-wrapper, which will be able to recognize incoming SYCL inputs. So a compilation pipeline is going to be constructed and executed in clang-linker-wrapper. We are also preparing more detailed documentation about SYCL offloading right now, which we will add to the llvm-project.

jhuber6 · 2025-03-18T13:16:27Z

> Yes, a user chooses a target like:
> clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_90  code.cpp
> or
> clang++ -fsycl -fsycl-targets=amd_gpu_gfx1201 code.cpp
> The list of supported targets in downstream can be observed here: https://intel.github.io/llvm/UsersManual.html#generic-options

This syntax unnerves me. You can kind of do this in OpenMP but the syntax is really verbose.

clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,x86_64-unknown-linux-gnu -Xopenmp-target=x86_64-unknown-linux-gnu --offload-arch=znver5 -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx940

I want to let -Xarch_ handle this but that requires changing a flag which had some detractors. I also have #125556 to make that generic but it needs to be updated.

maksimsab · 2025-03-19T12:56:48Z

Would you mind if we introduce a llvm/Transforms/SYCL/ directory and put this functionality in there? We are going to add more SYCL-specific passes anyway.

shiltian · 2025-03-19T13:07:52Z

> > Yes, a user chooses a target like:
> > clang++ -fsycl -fsycl-targets=nvidia_gpu_sm_90  code.cpp
> > or
> > clang++ -fsycl -fsycl-targets=amd_gpu_gfx1201 code.cpp
> > The list of supported targets in downstream can be observed here: https://intel.github.io/llvm/UsersManual.html#generic-options
>
> This syntax unnerves me. You can kind of do this in OpenMP but the syntax is really verbose.
> clang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa,x86_64-unknown-linux-gnu -Xopenmp-target=x86_64-unknown-linux-gnu --offload-arch=znver5 -Xopenmp-target=amdgcn-amd-amdhsa --offload-arch=gfx940
> I want to let -Xarch_ handle this but that requires changing a flag which had some detractors. I also have #125556 to make that generic but it needs to be updated.

FWIW, it might be better to adopt the offload bundler id format:

<target triple>:<target id>[:<target feature>]

You probably want to have the full target triple here because the OS part does matter.

For example,

# AMDGPU

-fsycl-targets=amdgcn-amd-amdhsa-gfx1201

-fsycl-targets=amdgcn-amd-amdhsa-gfx1201:xnack+

# NVIDIA GPU

-fsycl-targets=nvptx64-nvidia-unknown-sm_90

This also applies to --offload-arch @jhuber6 has mentioned.

shiltian · 2025-03-19T13:08:28Z

> Would you mind if we introduce a llvm/Transforms/SYCL/ directory and put this functionality in there? We are going to add more SYCL-specific passes anyway.

I don't mind it, but I'd do an RFC. I suppose it's gonna be a new LLVM library component?

bader · 2025-03-19T21:47:13Z

There should be nothing SYCL-specific about the patch. This PR adds a function that distributes the content of one LLVM module into one or more LLVM modules. It's a generic LLVM transformation. The logic for how content is distributed is currently limited to two cases:

(1) Put each "entry point" (you can think of it as a "GPU kernel") into a separate LLVM module.
(2) Put all functions with a specific function attribute into a separate LLVM module.

There are plans to extend this list in future patches.

(1) is GPU specific, but it can be useful for non-SYCL GPU programming models as well. IIRC, @jdoerfert has been involved in prototyping the GPU code splitting logic in OpenMP offload compiler.

(2) the name of the attribute is 'sycl-module-id', but the meaning is the same as C++ compilation unit. If two functions have the same "module-id" values, they are produced from the same compilation unit.
NOTE: In SYCL case we get functions produced from different C++ compilation units (CU) by linking LLVM modules for each CU. This scenario might be rare for programming models using thinLTO framework for linking the device code as thinLTO naturally keeps the code split by compilation unit.
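
To make the two cases concrete, here is a minimal sketch of how entry points could be bucketed under each mode. It is an illustration only, not the patch's code: `EntryPoint`, `SplitMode`, and `groupEntryPoints` are hypothetical names, and the module-id attribute is modeled as a plain string field.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of an entry point: its name and the value of its
// "sycl-module-id" attribute (the compilation unit it came from).
struct EntryPoint {
  std::string Name;
  std::string ModuleId;
};

enum class SplitMode { PerKernel, PerSource };

// Groups entry points by a key chosen by the split mode. std::map keeps the
// groups sorted by key, so the output order does not depend on input order.
std::map<std::string, std::vector<std::string>>
groupEntryPoints(const std::vector<EntryPoint> &EPs, SplitMode Mode) {
  std::map<std::string, std::vector<std::string>> Groups;
  for (const EntryPoint &EP : EPs) {
    const std::string &Key =
        Mode == SplitMode::PerKernel ? EP.Name : EP.ModuleId;
    Groups[Key].push_back(EP.Name);
  }
  return Groups;
}
```

Three kernels from two compilation units produce three groups in per-kernel mode and two groups in per-source mode.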

The primary use of this functionality is to reduce the code generation time for GPU code. This is critical for JIT compiling. Another nice property is that splitting an LLVM module allows the compiler to skip code generation altogether. This is useful if the LLVM module has code for different targets and we need to avoid code generation for "unwanted" targets.

The code mentions SYCL because it's currently used only by the SYCL compiler. @sarnex, have you thought about using this function for the OpenMP offload compilation?

We can commit this code as a SYCL specific library right now and make it more generic if we find use cases outside of SYCL compilation flow.

jdoerfert · 2025-03-20T01:03:03Z

FWIW, this should never be "SYCL" specific. There is no reason for it and it distracts from the issue this solves.

Any offload language, even non offload languages, might want to use this to increase compile time parallelism. An obvious user is the new "forced contained" thinLTO pipeline for AMD GPUs driven currently by @shiltian. It works great if the TUs come with an even distribution of work, but it won't work well if that isn't the case. This patch would allow us to expose parallelism. That said, we would want to follow up with heuristics later.

Long story short. SYCL needs this for their compute model and "target feature" ideas. AMD GPU wants this for thinLTO. Other targets would likely also be interested in this for their thinLTO. This should be a generic utility w/o SYCL branding, living in Transform, exposed via llvm-split (or sth), and, if necessary, with hooks to the backend for target specific stuff.

maksimsab · 2025-03-20T14:18:44Z

I could change the main splitting method from SYCLModuleSplit to:

using FunctionCategorizer = function_ref<std::string(const Function &F)>; // Computes the category for the function.

using PostSplitCallbackType = function_ref<void(std::unique_ptr<Module>)>; // Accepts the split Module for further handling.

void SplitModuleByCategory(std::unique_ptr<Module> M, FunctionCategorizer FC, PostSplitCallbackType Callback);

This function would map functions/kernels according to their computed categories. This interface allows reusing for purposes other than SYCL and it doesn't require using SYCL-specifics.

What do you think about this?

jdoerfert · 2025-03-20T16:15:43Z

I think that makes sense, though there doesn't seem to be a need for "std::string", IMHO. An integer should be just fine. The user can map strings to integers if they need to.

maksimsab · 2025-03-21T14:08:03Z

Even in our case in splitting by source unit it would be difficult to map module ids to integers. I could use a hash but it would lead to collisions.

jdoerfert · 2025-03-23T22:03:31Z

> Even in our case in splitting by source unit it would be difficult to map module ids to integers. I could use a hash but it would lead to collisions.

Your API proposal maps the stuff to std::string, right?
Then simply map the string internally to integers.
If you have seen the string, use the old integer, otherwise the number of strings you've seen so far.
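The mapping described above can be sketched as a small interning helper. This is only an illustration, not code from the patch: `CategoryInterner` and `getOrAssign` are hypothetical names, and plain standard-library containers stand in for whatever the implementation would use.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper, not part of the patch: interns string categories
// (e.g. module ids) as dense integers in first-seen order.
class CategoryInterner {
  std::unordered_map<std::string, int> Ids;

public:
  // Returns the integer already assigned to Name, or, for a new string,
  // the number of distinct strings seen so far.
  int getOrAssign(const std::string &Name) {
    // Ids.size() is evaluated before the insertion takes place.
    return Ids.try_emplace(Name, static_cast<int>(Ids.size())).first->second;
  }

  std::size_t size() const { return Ids.size(); }
};
```

Note that the helper is stateful: a categorizer built on it has to keep the interner alive across calls.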

maksimsab · 2025-03-24T12:47:44Z

> If you have seen the string, use the old integer, otherwise the number of strings you've seen so far.

This would lead to complications for users, forcing them to come up with stateful functors in order to maintain previously seen functions.

Additionally, it would complicate testing since we currently depend on the sorted order provided by the std::map<std::string, ...>. With your approach, the order will depend on the order of the functions in the module.
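The ordering concern can be illustrated with a toy comparison. This sketch is an assumption-laden stand-in, not the patch's code: `FuncInfo`, `sortedOrder` and `firstSeenOrder` are hypothetical names, and strings replace `llvm::Function` and its module id attribute.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (function name, module id) pairs: toy stand-ins for llvm::Function and its
// "sycl-module-id" attribute value.
using FuncInfo = std::pair<std::string, std::string>;

// Category order when grouping through std::map: lexicographic by module id,
// independent of the order of functions in the input.
std::vector<std::string> sortedOrder(const std::vector<FuncInfo> &Funcs) {
  std::map<std::string, std::vector<std::string>> Groups;
  for (const FuncInfo &F : Funcs)
    Groups[F.second].push_back(F.first);
  std::vector<std::string> Order;
  for (const auto &Entry : Groups)
    Order.push_back(Entry.first);
  return Order;
}

// Category order when ids are assigned on first sight: follows the order of
// functions in the input.
std::vector<std::string> firstSeenOrder(const std::vector<FuncInfo> &Funcs) {
  std::vector<std::string> Order;
  for (const FuncInfo &F : Funcs)
    if (std::find(Order.begin(), Order.end(), F.second) == Order.end())
      Order.push_back(F.second);
  return Order;
}
```

For an input where a function from `b.cpp` precedes one from `a.cpp`, the first variant still lists `a.cpp` first, while the second lists `b.cpp` first.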

llvm/include/llvm/Transforms/Utils/SYCLUtils.h

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

asudarsa

Hi @maksimsab

Changes look good. Can you please update some of the naming to remove 'SYCL' specificity? I have added a few inline comments as well. Also, please update PR title and description.

Thanks

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

asudarsa

This looks good to me. One minor nit.

Thanks

asudarsa

LGTM.

asudarsa · 2025-06-04T14:56:36Z

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

+/// Every split output is being passed to \p Callback for further possible
+/// processing.
+///
+/// Currently, the supported targets are SPIRV, AMDGPU and NVPTX.


Please update this comment. I am not sure if the targets are restrictive. I think the restriction is whether the input module has recursive calls or not.

Thanks

Agreed. Now that we have callbacks, it should just work for all targets.

Update:

This is probably because of the isKernel function.

> This is probably because of the isKernel function.

Yes, and the algorithm was implemented with the assumption that the input is a heterogeneous program; such programs usually don't have recursion.

jhuber6

I don't have a full understanding of what this is doing yet. Didn't @shiltian need to do something similar for splitting up in thin-LTO?

llvm/lib/Transforms/Utils/SplitModuleByCategory.cpp

shiltian · 2025-06-09T16:02:05Z

llvm/lib/Transforms/Utils/SplitModuleByCategory.cpp

+/// the result of code (llvm::Module instances) splitting:
+/// - entry points group from the module.
+class ModuleDesc {
+  std::unique_ptr<Module> M;


I assume ModuleDesc "owns" a module after splitting?

ModuleDesc owns the initial module and the newly created modules.

llvm/lib/Transforms/Utils/SplitModuleByCategory.cpp

maksimsab · 2025-06-11T14:36:07Z

@jhuber6

> I don't have a full understanding of what this is doing yet.

From the perspective of the interface, the main function produces modules that correspond to some call graphs.
Let's consider the following example: a Module consists of entry points E1, E2, E3, E4 and the functions used by these entry points.
Let's suppose that we want to extract E1 and E2 into one common split module, E3 into its own split module, and we don't want to keep E4.
Then FunctionCategorizer should return the following values:

FC(E1) = 1
FC(E2) = 1
FC(E3) = 2
FC(E4) = std::nullopt.

If some function F is used by both E1 and E3, then it is copied into both output split modules.
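The E1..E4 example can be modelled on a toy call graph. The following is a minimal sketch under stated assumptions, not the implementation in the patch: `CallGraph`, `Categorizer` and the string-based `splitByCategory` are hypothetical stand-ins for the LLVM types, and function names replace `llvm::Function` objects.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> names of the functions it uses.
using CallGraph = std::map<std::string, std::vector<std::string>>;
// Returns a category for entry points and std::nullopt for everything else.
using Categorizer = std::function<std::optional<int>(const std::string &)>;

// For every category, collects its entry points plus everything transitively
// reachable from them. A function used by several categories ends up in
// several parts.
std::map<int, std::set<std::string>> splitByCategory(const CallGraph &CG,
                                                     const Categorizer &FC) {
  std::map<int, std::set<std::string>> Parts;
  for (const auto &Node : CG) {
    std::optional<int> Cat = FC(Node.first);
    if (!Cat)
      continue; // Not an entry point; only pulled in if something uses it.
    std::set<std::string> &Part = Parts[*Cat];
    std::vector<std::string> Worklist{Node.first};
    while (!Worklist.empty()) {
      std::string F = Worklist.back();
      Worklist.pop_back();
      if (!Part.insert(F).second)
        continue; // Already in this part.
      auto It = CG.find(F);
      if (It != CG.end())
        Worklist.insert(Worklist.end(), It->second.begin(), It->second.end());
    }
  }
  return Parts;
}
```

With E1 and E2 in category 1, E3 in category 2 and E4 mapped to `std::nullopt`, a helper F used by E1 and E3 appears in both parts, and E4 together with its private callees appears in neither.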

Probably, the word Category is not the best choice here. I don't have a better alternative for FunctionCategorizer right now.
Also, it is possible to replace the function splitModuleByCategory with a function that accepts a mapping "EntryPoint->Group ID". Like the following:

using GroupMapping = std::unordered_map<Function *, int>; // also it is possible to use a function's string name as a key.
void splitModuleByGroups(std::unique_ptr<Module> M, const GroupMapping &GM, PostSplitCallback Callback);

What do you think?

maksimsab · 2025-06-17T12:37:49Z

@shiltian @jhuber6
Friendly ping.

shiltian · 2025-06-18T01:59:06Z

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

+class Function;
+
+/// Splits the given module \p M using the given \p FunctionCategorizer.
+/// \p FunctionCategorizer returns integer category for an input Function.


A side note: I think it would be more helpful (at least putting on my AMD hat) to be able to determine where a global variable goes as well, if we'd like to make this pass generic enough to support all potential targets. The reason is that, for AMDGPU, we probably need to place all functions that could potentially reference a given global variable in the same module, due to the lowering of LDS (shared) variables.

jdoerfert

Mostly minor suggestions to cleanup some stuff. Overall this is close enough to what I was thinking. I'll wait for another revision or at least responses to some of the comments before accepting.

jdoerfert · 2025-06-24T15:31:43Z

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

+class Function;
+
+/// Splits the given module \p M using the given \p FunctionCategorizer.
+/// \p FunctionCategorizer returns integer category for an input Function.


Leave this for later.
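For reference, one possible way to express the deferred constraint from the side note (functions that reference the same global end up together) is a union-find pass over the uses. This is purely an illustrative sketch and not something the patch implements: `groupBySharedGlobals` is a hypothetical name, and functions are represented by indices with lists of global names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Uses[I] lists the names of the globals referenced by function I.
// Returns a group id per function; functions that share a global, directly or
// through a chain of shared globals, get the same id.
std::vector<int>
groupBySharedGlobals(const std::vector<std::vector<std::string>> &Uses) {
  std::vector<int> Parent(Uses.size());
  std::iota(Parent.begin(), Parent.end(), 0);
  auto Find = [&Parent](int X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // Path halving.
      X = Parent[X];
    }
    return X;
  };
  std::map<std::string, int> FirstUser; // Global name -> first function seen.
  for (std::size_t I = 0; I < Uses.size(); ++I) {
    for (const std::string &G : Uses[I]) {
      auto Res = FirstUser.try_emplace(G, static_cast<int>(I));
      if (Res.second)
        continue; // First function that uses G.
      int A = Find(static_cast<int>(I));
      int B = Find(Res.first->second);
      Parent[A] = B; // Merge the two groups.
    }
  }
  std::vector<int> Group(Uses.size());
  for (std::size_t I = 0; I < Uses.size(); ++I)
    Group[I] = Find(static_cast<int>(I));
  return Group;
}
```

A categorizer could then derive its categories from these group ids, at the cost of less parallelism when many functions share globals.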

jdoerfert · 2025-06-24T15:32:45Z

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

+/// Every split output is being passed to \p Callback for further possible
+/// processing.
+///
+/// Currently, the supported targets are SPIRV, AMDGPU and NVPTX.


> which usually don't have recursion.

FWIW, "usually" is doing the heavy lifting here. Please do not assume anything about GPU codes that is not required. So, recursion should be assumed to happen.
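Handling recursion does not require much: a reachability walk that tracks visited functions terminates on cyclic call graphs. The sketch below is a toy model with hypothetical names (`CallGraph`, `reachableFrom`), not the traversal used in the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Collects Entry and everything reachable from it. The Visited set makes the
// walk terminate even for self-recursion and mutual recursion.
std::set<std::string> reachableFrom(const CallGraph &CG,
                                    const std::string &Entry) {
  std::set<std::string> Visited;
  std::vector<std::string> Worklist{Entry};
  while (!Worklist.empty()) {
    std::string F = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(F).second)
      continue; // Already handled; this is what breaks cycles.
    auto It = CG.find(F);
    if (It == CG.end())
      continue; // Unknown callee: nothing to follow.
    Worklist.insert(Worklist.end(), It->second.begin(), It->second.end());
  }
  return Visited;
}
```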

jdoerfert · 2025-06-24T15:36:35Z

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

+/// Module's functions are being grouped by categories. Every such group
+/// populates a call graph containing group's functions themselves and all
+/// reachable functions and globals. Split outputs are populated from each call
+/// graph associated with some category.


Two comments:

1. I am unsure why this talks about a call graph and such. The entire interface works with an opaque "oraql" and categories, why that "oraql" puts functions in the same or different categories might be call graph related, or it might be because their names have the same prefix. We should not conflate one use with this generic capability. EDIT: Coming back to this after I read the implementation, I believe I understand what is happening. The interface is conflating two things, which is by itself OK. However, given the naming, this is confusing. I believe the easiest fix is to modify the comment and the name a little. As it is now, one could reasonably assume all functions with category X go into the same module, nothing else. That would have been my preferred way of doing this. The difference is that the dependence logic would then be part of the caller, and this would be a stupid splitting interface. I'm OK with keeping it the other way around for now. The interface doesn't splitModuleByCategory though, it's more like splitModuleTransitiveFromEntryPoints. And the comment here needs to define "transitive" and what are considered entry points.

2. I am unsure why this uses an optional, and what that means. If std::nullopt means it is duplicated into every module, or something special like that, I can see how that is useful. However, just given the comment here std::nullopt is an option and it is not clear what that means.

> I am unsure why this uses an optional, and what that means.

In the SYCL case, FunctionCategorizer defines entry points and groups them together by assigning group identifiers (categories). For non-entry functions we need a way to exclude them in the selection step of the algorithm. I chose std::nullopt as an indicator that the corresponding function shouldn't be added to any entry group. The function can still be copied if it is transitively used by some entry point.

We could replace std::optional<int> with a plain int and use the value -1 for functions that should not be placed in entry groups.

nullopt is fine, but include the meaning in the comment. EDIT: you did.

jdoerfert · 2025-06-24T15:37:30Z

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

+void splitModuleByCategory(
+    std::unique_ptr<Module> M,
+    function_ref<std::optional<int>(const Function &F)> FunctionCategorizer,
+    function_ref<void(std::unique_ptr<Module> Part)> Callback);


It might be helpful to pass the category to the callback in addition to the module. But if this is not needed right now, we can do this later.
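A callback that also receives the category could look roughly like the sketch below. This is an assumption about a possible follow-up, not the interface in the patch: `ToyModule`, `CallbackWithCategory` and `emitParts` are hypothetical, and `std::function` stands in for `function_ref`.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for llvm::Module.
struct ToyModule {
  std::string Name;
};

// Hypothetical callback shape: the split part plus the category it came from.
using CallbackWithCategory =
    std::function<void(std::unique_ptr<ToyModule> Part, int Category)>;

// Hands every part to the callback together with its category, in ascending
// category order.
void emitParts(std::map<int, std::unique_ptr<ToyModule>> Parts,
               const CallbackWithCategory &Callback) {
  for (auto &Entry : Parts)
    Callback(std::move(Entry.second), Entry.first);
}
```

With this shape a consumer could, for example, name output files after the category instead of relying on the order in which parts arrive.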

llvm/lib/Transforms/Utils/SplitModuleByCategory.cpp

jdoerfert · 2025-06-24T16:32:59Z

llvm/tools/llvm-split/llvm-split.cpp

+  auto PostSplitCallback = [&](std::unique_ptr<Module> MPart) {
+    if (verifyModule(*MPart)) {
+      errs() << "Broken Module!\n";
+      exit(1);


So this returns an Error but then calls exit?

Yes, I repeated the already used approach below in HandleModulePart.

Not a fan but sure.

jdoerfert

Minor additional comments requested in the threads, but otherwise LGTM.

github-actions · 2025-07-10T13:05:11Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot added the llvm:transforms label Mar 14, 2025

shiltian requested review from arsenm, nikic, bader, frasercrmck, jdoerfert, shiltian, Pierre-vh and jhuber6 March 17, 2025 13:49

arsenm reviewed Mar 17, 2025

llvm/include/llvm/Transforms/Utils/SYCLUtils.h (outdated)

Pierre-vh reviewed Mar 17, 2025

asudarsa reviewed May 27, 2025

llvm/include/llvm/Transforms/Utils/SYCLUtils.h (outdated)

asudarsa reviewed May 27, 2025

llvm/include/llvm/Transforms/Utils/SYCLUtils.h (outdated)

asudarsa reviewed May 27, 2025

llvm/include/llvm/Transforms/Utils/SYCLUtils.h (outdated)

asudarsa reviewed May 27, 2025

llvm/include/llvm/Transforms/Utils/SYCLUtils.h (outdated)

asudarsa reviewed May 27, 2025

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h

asudarsa reviewed May 27, 2025

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h (outdated)

asudarsa reviewed May 27, 2025

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h (outdated)

asudarsa requested changes May 27, 2025

Remove SYCL specialization from the PR.

1729c50

maksimsab changed the title ~~[offload][SYCL] Add SYCL Module splitting.~~ [offload][SYCL] Add Module splitting by categories. Jun 4, 2025

asudarsa reviewed Jun 4, 2025

llvm/include/llvm/Transforms/Utils/SplitModuleByCategory.h (outdated)

asudarsa reviewed Jun 4, 2025

asudarsa approved these changes Jun 4, 2025

asudarsa reviewed Jun 4, 2025

jhuber6 reviewed Jun 4, 2025

llvm/lib/Transforms/Utils/SplitModuleByCategory.cpp (outdated)

llvm/lib/Transforms/Utils/SplitModuleByCategory.cpp (outdated)

shiltian reviewed Jun 9, 2025

maksimsab added 2 commits June 11, 2025 06:43

Merge branch 'main' into split_patch3

0b6f17f

address most of CR feedback

c249af1

shiltian reviewed Jun 18, 2025

jdoerfert reviewed Jun 24, 2025

maksimsab added 2 commits July 1, 2025 06:45

change function's name and improve the documentation

7ad079e

Change some comments

7c96d33

jdoerfert approved these changes Jul 2, 2025

maksimsab added 2 commits July 9, 2025 07:53

Fix mistake with leaking signature

04de3db

fix grammar and use r-value references

877db4b

maksimsab added 2 commits July 10, 2025 06:06

add comment regarding implementation of group selection

4987388

apply clang-format

2e89d50
