- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
[Clang] Rework creating offloading toolchains #125556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang Author: Joseph Huber (jhuber6) ChangesSummary: Patch is 31.18 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/125556.diff 11 Files Affected: 
 diff --git a/clang/include/clang/Basic/DiagnosticDriverKinds.td b/clang/include/clang/Basic/DiagnosticDriverKinds.td
index 8d599c96eb4fbf..94baa35e6a9ec7 100644
--- a/clang/include/clang/Basic/DiagnosticDriverKinds.td
+++ b/clang/include/clang/Basic/DiagnosticDriverKinds.td
@@ -119,6 +119,8 @@ def err_drv_cuda_host_arch : Error<
   "unsupported architecture '%0' for host compilation">;
 def err_drv_mix_cuda_hip : Error<
   "mixed CUDA and HIP compilation is not supported">;
+def err_drv_mix_offload : Error<
+  "mixed %0 and %1 offloading compilation is not supported">;
 def err_drv_bad_target_id : Error<
   "invalid target ID '%0'; format is a processor name followed by an optional "
   "colon-delimited list of features followed by an enable/disable sign (e.g., "
diff --git a/clang/include/clang/Driver/Driver.h b/clang/include/clang/Driver/Driver.h
index f4a52cc529b79c..b463dc2a93550e 100644
--- a/clang/include/clang/Driver/Driver.h
+++ b/clang/include/clang/Driver/Driver.h
@@ -797,22 +797,14 @@ class Driver {
   const ToolChain &getToolChain(const llvm::opt::ArgList &Args,
                                 const llvm::Triple &Target) const;
 
-  /// @}
-
-  /// Retrieves a ToolChain for a particular device \p Target triple
-  ///
-  /// \param[in] HostTC is the host ToolChain paired with the device
-  ///
-  /// \param[in] TargetDeviceOffloadKind (e.g. OFK_Cuda/OFK_OpenMP/OFK_SYCL) is
-  /// an Offloading action that is optionally passed to a ToolChain (used by
-  /// CUDA, to specify if it's used in conjunction with OpenMP)
+  /// Retrieves a ToolChain for a particular \p Target triple for offloading.
   ///
   /// Will cache ToolChains for the life of the driver object, and create them
   /// on-demand.
-  const ToolChain &getOffloadingDeviceToolChain(
-      const llvm::opt::ArgList &Args, const llvm::Triple &Target,
-      const ToolChain &HostTC,
-      const Action::OffloadKind &TargetDeviceOffloadKind) const;
+  const ToolChain &getOffloadToolChain(const llvm::opt::ArgList &Args,
+                                       const Action::OffloadKind Kind,
+                                       const llvm::Triple &Target,
+                                       const llvm::Triple &AuxTarget) const;
 
   /// Get bitmasks for which option flags to include and exclude based on
   /// the driver mode.
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index d8123cc39fdc95..482d264ef37e2c 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -932,7 +932,9 @@ def W_Joined : Joined<["-"], "W">, Group<W_Group>,
 def Xanalyzer : Separate<["-"], "Xanalyzer">,
   HelpText<"Pass <arg> to the static analyzer">, MetaVarName<"<arg>">,
   Group<StaticAnalyzer_Group>;
-def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>;
+def Xarch__ : JoinedAndSeparate<["-"], "Xarch_">, Flags<[NoXarchOption]>,
+  HelpText<"Pass <arg> to the compiliation if the target matches <arch>">,
+  MetaVarName<"<arch> <arg>">;
 def Xarch_host : Separate<["-"], "Xarch_host">, Flags<[NoXarchOption]>,
   HelpText<"Pass <arg> to the CUDA/HIP host compilation">, MetaVarName<"<arg>">;
 def Xarch_device : Separate<["-"], "Xarch_device">, Flags<[NoXarchOption]>,
@@ -1115,14 +1117,17 @@ def fno_convergent_functions : Flag<["-"], "fno-convergent-functions">,
 
 // Common offloading options
 let Group = offload_Group in {
-def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[NoXarchOption]>,
+def offload_targets_EQ : Joined<["--"], "offload-targets=">,
+  Visibility<[ClangOption, FlangOption]>, Flags<[NoXarchOption]>,
+  HelpText<"Specify a list of target architectures to use for offloading.">;
+
+def offload_arch_EQ : Joined<["--"], "offload-arch=">,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Specify an offloading device architecture for CUDA, HIP, or OpenMP. (e.g. sm_35). "
            "If 'native' is used the compiler will detect locally installed architectures. "
            "For HIP offloading, the device architecture can be followed by target ID features "
            "delimited by a colon (e.g. gfx908:xnack+:sramecc-). May be specified more than once.">;
 def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">,
-  Flags<[NoXarchOption]>,
   Visibility<[ClangOption, FlangOption]>,
   HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. "
            "'all' resets the list to its default value.">;
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 912777a9808b4b..cb0f06b7ed4631 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -879,34 +879,100 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
                    }) ||
       C.getInputArgs().hasArg(options::OPT_hip_link) ||
       C.getInputArgs().hasArg(options::OPT_hipstdpar);
+  bool IsOpenMP =
+      C.getInputArgs().hasFlag(options::OPT_fopenmp, options::OPT_fopenmp_EQ,
+                               options::OPT_fno_openmp, false);
+  bool IsSYCL = C.getInputArgs().hasFlag(options::OPT_fsycl,
+                                         options::OPT_fno_sycl, false);
+
   bool UseLLVMOffload = C.getInputArgs().hasArg(
       options::OPT_foffload_via_llvm, options::OPT_fno_offload_via_llvm, false);
+
+  llvm::DenseSet<Action::OffloadKind> Kinds;
+  const std::pair<bool, Action::OffloadKind> ActiveKinds[] = {
+      {IsCuda, Action::OFK_Cuda},
+      {IsHIP, Action::OFK_HIP},
+      {IsOpenMP, Action::OFK_OpenMP},
+      {IsSYCL, Action::OFK_SYCL}};
+  for (const auto &[Active, Kind] : ActiveKinds)
+    if (Active)
+      Kinds.insert(Kind);
+
+  // Build an offloading toolchain for every requested target and kind.
+  for (StringRef Targets :
+       C.getInputArgs().getAllArgValues(options::OPT_offload_targets_EQ)) {
+    // We currently don't support any kind of mixed offloading.
+    if (Kinds.size() > 1) {
+      Diag(clang::diag::err_drv_mix_offload)
+          << Action::GetOffloadKindName(*Kinds.begin())
+          << Action::GetOffloadKindName(*Kinds.end());
+      return;
+    }
+
+    // OpenMP offloading requires a compatible libomp.
+    if (Kinds.contains(Action::OFK_OpenMP)) {
+      OpenMPRuntimeKind RuntimeKind = getOpenMPRuntime(C.getInputArgs());
+      if (RuntimeKind != OMPRT_OMP && RuntimeKind != OMPRT_IOMP5) {
+        Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
+        return;
+      }
+    }
+
+    // Certain options are not allowed when combined with SYCL compilation.
+    if (Kinds.contains(Action::OFK_SYCL)) {
+      for (auto ID :
+           {options::OPT_static_libstdcxx, options::OPT_ffreestanding})
+        if (Arg *IncompatArg = C.getInputArgs().getLastArg(ID))
+          Diag(clang::diag::err_drv_argument_not_allowed_with)
+              << IncompatArg->getSpelling() << "-fsycl";
+    }
+
+    // Create a device toolchain for every specified triple.
+    for (StringRef Target : llvm::split(Targets, ",")) {
+      llvm::Triple TT(Target);
+      for (Action::OffloadKind Kind : Kinds) {
+        auto &TC = getOffloadToolChain(C.getInputArgs(), Kind, TT,
+                                       C.getDefaultToolChain().getTriple());
+
+        // Emit a warning if the detected CUDA version is too new.
+        if (Kind == Action::OFK_Cuda) {
+          auto &CudaInstallation =
+              static_cast<const toolchains::CudaToolChain &>(TC)
+                  .CudaInstallation;
+          if (CudaInstallation.isValid())
+            CudaInstallation.WarnIfUnsupportedVersion();
+        }
+
+        C.addOffloadDeviceToolChain(&TC, Kind);
+      }
+    }
+  }
+
+  // If the user did not specify the toolchains specifically we will infer them
+  // based on the usage of `--offloard-arch=`.
+  if (C.getInputArgs().hasArg(options::OPT_offload_targets_EQ))
+    return;
+
   if (IsCuda && IsHIP) {
     Diag(clang::diag::err_drv_mix_cuda_hip);
     return;
   }
   if (IsCuda && !UseLLVMOffload) {
-    const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
-    const llvm::Triple &HostTriple = HostTC->getTriple();
-    auto OFK = Action::OFK_Cuda;
-    auto CudaTriple =
-        getNVIDIAOffloadTargetTriple(*this, C.getInputArgs(), HostTriple);
+    auto CudaTriple = getNVIDIAOffloadTargetTriple(
+        *this, C.getInputArgs(), C.getDefaultToolChain().getTriple());
     if (!CudaTriple)
       return;
-    // Use the CUDA and host triples as the key into the ToolChains map,
-    // because the device toolchain we create depends on both.
-    auto &CudaTC = ToolChains[CudaTriple->str() + "/" + HostTriple.str()];
-    if (!CudaTC) {
-      CudaTC = std::make_unique<toolchains::CudaToolChain>(
-          *this, *CudaTriple, *HostTC, C.getInputArgs());
-
-      // Emit a warning if the detected CUDA version is too new.
-      CudaInstallationDetector &CudaInstallation =
-          static_cast<toolchains::CudaToolChain &>(*CudaTC).CudaInstallation;
-      if (CudaInstallation.isValid())
-        CudaInstallation.WarnIfUnsupportedVersion();
-    }
-    C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
+
+    auto &TC =
+        getOffloadToolChain(C.getInputArgs(), Action::OFK_Cuda, *CudaTriple,
+                            C.getDefaultToolChain().getTriple());
+
+    // Emit a warning if the detected CUDA version is too new.
+    const CudaInstallationDetector &CudaInstallation =
+        static_cast<const toolchains::CudaToolChain &>(TC).CudaInstallation;
+    if (CudaInstallation.isValid())
+      CudaInstallation.WarnIfUnsupportedVersion();
+    C.addOffloadDeviceToolChain(&TC, Action::OFK_Cuda);
   } else if (IsHIP && !UseLLVMOffload) {
     if (auto *OMPTargetArg =
             C.getInputArgs().getLastArg(options::OPT_fopenmp_targets_EQ)) {
@@ -914,14 +980,15 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
           << OMPTargetArg->getSpelling() << "HIP";
       return;
     }
-    const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
-    auto OFK = Action::OFK_HIP;
+
     auto HIPTriple = getHIPOffloadTargetTriple(*this, C.getInputArgs());
     if (!HIPTriple)
       return;
-    auto *HIPTC = &getOffloadingDeviceToolChain(C.getInputArgs(), *HIPTriple,
-                                                *HostTC, OFK);
-    C.addOffloadDeviceToolChain(HIPTC, OFK);
+
+    auto &TC =
+        getOffloadToolChain(C.getInputArgs(), Action::OFK_HIP, *HIPTriple,
+                            C.getDefaultToolChain().getTriple());
+    C.addOffloadDeviceToolChain(&TC, Action::OFK_HIP);
   }
 
   if (IsCuda || IsHIP)
@@ -934,10 +1001,8 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
   // the -fopenmp-targets option or used --offload-arch with OpenMP enabled.
   bool IsOpenMPOffloading =
       ((IsCuda || IsHIP) && UseLLVMOffload) ||
-      (C.getInputArgs().hasFlag(options::OPT_fopenmp, options::OPT_fopenmp_EQ,
-                                options::OPT_fno_openmp, false) &&
-       (C.getInputArgs().hasArg(options::OPT_fopenmp_targets_EQ) ||
-        C.getInputArgs().hasArg(options::OPT_offload_arch_EQ)));
+      (IsOpenMP && (C.getInputArgs().hasArg(options::OPT_fopenmp_targets_EQ) ||
+                    C.getInputArgs().hasArg(options::OPT_offload_arch_EQ)));
   if (IsOpenMPOffloading) {
     // We expect that -fopenmp-targets is always used in conjunction with the
     // option -fopenmp specifying a valid runtime with offloading support, i.e.
@@ -1038,50 +1103,23 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
       FoundNormalizedTriples[NormalizedName] = Val;
 
       // If the specified target is invalid, emit a diagnostic.
-      if (TT.getArch() == llvm::Triple::UnknownArch)
+      if (TT.getArch() == llvm::Triple::UnknownArch) {
         Diag(clang::diag::err_drv_invalid_omp_target) << Val;
-      else {
-        const ToolChain *TC;
-        // Device toolchains have to be selected differently. They pair host
-        // and device in their implementation.
-        if (TT.isNVPTX() || TT.isAMDGCN() || TT.isSPIRV()) {
-          const ToolChain *HostTC =
-              C.getSingleOffloadToolChain<Action::OFK_Host>();
-          assert(HostTC && "Host toolchain should be always defined.");
-          auto &DeviceTC =
-              ToolChains[TT.str() + "/" + HostTC->getTriple().normalize()];
-          if (!DeviceTC) {
-            if (TT.isNVPTX())
-              DeviceTC = std::make_unique<toolchains::CudaToolChain>(
-                  *this, TT, *HostTC, C.getInputArgs());
-            else if (TT.isAMDGCN())
-              DeviceTC = std::make_unique<toolchains::AMDGPUOpenMPToolChain>(
-                  *this, TT, *HostTC, C.getInputArgs());
-            else if (TT.isSPIRV())
-              DeviceTC = std::make_unique<toolchains::SPIRVOpenMPToolChain>(
-                  *this, TT, *HostTC, C.getInputArgs());
-            else
-              assert(DeviceTC && "Device toolchain not defined.");
-          }
-
-          TC = DeviceTC.get();
-        } else
-          TC = &getToolChain(C.getInputArgs(), TT);
-        C.addOffloadDeviceToolChain(TC, Action::OFK_OpenMP);
-        auto It = DerivedArchs.find(TT.getTriple());
-        if (It != DerivedArchs.end())
-          KnownArchs[TC] = It->second;
+        continue;
       }
+
+      auto &TC = getOffloadToolChain(C.getInputArgs(), Action::OFK_OpenMP, TT,
+                                     C.getDefaultToolChain().getTriple());
+      C.addOffloadDeviceToolChain(&TC, Action::OFK_OpenMP);
+      auto It = DerivedArchs.find(TT.getTriple());
+      if (It != DerivedArchs.end())
+        KnownArchs[&TC] = It->second;
     }
   } else if (C.getInputArgs().hasArg(options::OPT_fopenmp_targets_EQ)) {
     Diag(clang::diag::err_drv_expecting_fopenmp_with_fopenmp_targets);
     return;
   }
 
-  // We need to generate a SYCL toolchain if the user specified -fsycl.
-  bool IsSYCL = C.getInputArgs().hasFlag(options::OPT_fsycl,
-                                         options::OPT_fno_sycl, false);
-
   auto argSYCLIncompatible = [&](OptSpecifier OptId) {
     if (!IsSYCL)
       return;
@@ -1103,9 +1141,9 @@ void Driver::CreateOffloadingDeviceToolChains(Compilation &C,
     // getOffloadingDeviceToolChain, because the device toolchains we're
     // going to create will depend on both.
     const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
-    for (const auto &TargetTriple : UniqueSYCLTriplesVec) {
-      auto SYCLTC = &getOffloadingDeviceToolChain(
-          C.getInputArgs(), TargetTriple, *HostTC, Action::OFK_SYCL);
+    for (const auto &TT : UniqueSYCLTriplesVec) {
+      auto SYCLTC = &getOffloadToolChain(C.getInputArgs(), Action::OFK_SYCL, TT,
+                                         HostTC->getTriple());
       C.addOffloadDeviceToolChain(SYCLTC, Action::OFK_SYCL);
     }
   }
@@ -3409,7 +3447,8 @@ class OffloadingActionBuilder final {
       // Collect all offload arch parameters, removing duplicates.
       std::set<StringRef> GpuArchs;
       bool Error = false;
-      for (Arg *A : Args) {
+      const ToolChain &TC = *ToolChains.front();
+      for (Arg *A : C.getArgsForToolChain(&TC, "", AssociatedOffloadKind)) {
         if (!(A->getOption().matches(options::OPT_offload_arch_EQ) ||
               A->getOption().matches(options::OPT_no_offload_arch_EQ)))
           continue;
@@ -3420,7 +3459,6 @@ class OffloadingActionBuilder final {
               ArchStr == "all") {
             GpuArchs.clear();
           } else if (ArchStr == "native") {
-            const ToolChain &TC = *ToolChains.front();
             auto GPUsOrErr = ToolChains.front()->getSystemGPUArchs(Args);
             if (!GPUsOrErr) {
               TC.getDriver().Diag(diag::err_drv_undetermined_gpu_arch)
@@ -6604,6 +6642,72 @@ std::string Driver::GetClPchPath(Compilation &C, StringRef BaseName) const {
   return std::string(Output);
 }
 
+const ToolChain &Driver::getOffloadToolChain(
+    const llvm::opt::ArgList &Args, const Action::OffloadKind Kind,
+    const llvm::Triple &Target, const llvm::Triple &AuxTarget) const {
+  auto &TC = ToolChains[Target.str() + "/" + AuxTarget.str()];
+  auto &HostTC = ToolChains[AuxTarget.str()];
+
+  assert(HostTC && "Host toolchain for offloading doesn't exit?");
+  if (!TC) {
+    // Detect the toolchain based off of the target operating system.
+    switch (Target.getOS()) {
+    case llvm::Triple::CUDA:
+      TC = std::make_unique<toolchains::CudaToolChain>(*this, Target, *HostTC,
+                                                       Args);
+      break;
+    case llvm::Triple::AMDHSA:
+      if (Kind == Action::OFK_HIP)
+        TC = std::make_unique<toolchains::HIPAMDToolChain>(*this, Target,
+                                                           *HostTC, Args);
+      else if (Kind == Action::OFK_OpenMP)
+        TC = std::make_unique<toolchains::AMDGPUOpenMPToolChain>(*this, Target,
+                                                                 *HostTC, Args);
+      break;
+    default:
+      break;
+    }
+  }
+  if (!TC) {
+    // Detect the toolchain based off of the target architecture if that failed.
+    switch (Target.getArch()) {
+    case llvm::Triple::spir:
+    case llvm::Triple::spir64:
+    case llvm::Triple::spirv:
+    case llvm::Triple::spirv32:
+    case llvm::Triple::spirv64:
+      switch (Kind) {
+      case Action::OFK_SYCL:
+        TC = std::make_unique<toolchains::SYCLToolChain>(*this, Target, *HostTC,
+                                                         Args);
+        break;
+      case Action::OFK_HIP:
+        TC = std::make_unique<toolchains::HIPSPVToolChain>(*this, Target,
+                                                           *HostTC, Args);
+        break;
+      case Action::OFK_OpenMP:
+        TC = std::make_unique<toolchains::SPIRVOpenMPToolChain>(*this, Target,
+                                                                *HostTC, Args);
+        break;
+      case Action::OFK_Cuda:
+        TC = std::make_unique<toolchains::CudaToolChain>(*this, Target, *HostTC,
+                                                         Args);
+        break;
+      default:
+        break;
+      }
+      break;
+    default:
+      break;
+    }
+  }
+
+  // If all else fails, just look up the normal toolchain for the target.
+  if (!TC)
+    return getToolChain(Args, Target);
+  return *TC;
+}
+
 const ToolChain &Driver::getToolChain(const ArgList &Args,
                                       const llvm::Triple &Target) const {
 
@@ -6797,45 +6901,6 @@ const ToolChain &Driver::getToolChain(const ArgList &Args,
   return *TC;
 }
 
-const ToolChain &Driver::getOffloadingDeviceToolChain(
-    const ArgList &Args, const llvm::Triple &Target, const ToolChain &HostTC,
-    const Action::OffloadKind &TargetDeviceOffloadKind) const {
-  // Use device / host triples as the key into the ToolChains map because the
-  // device ToolChain we create depends on both.
-  auto &TC = ToolChains[Target.str() + "/" + HostTC.getTriple().str()];
-  if (!TC) {
-    // Categorized by offload kind > arch rather than OS > arch like
-    // the normal getToolChain call, as it seems a reasonable way to categorize
-    // things.
-    switch (TargetDeviceOffloadKind) {
-    case Action::OFK_HIP: {
-      if (((Target.getArch() == llvm::Triple::amdgcn ||
-            Target.getArch() == llvm::Triple::spirv64) &&
-           Target.getVendor() == llvm::Triple::AMD &&
-           Target.getOS() == llvm::Triple::AMDHSA) ||
-          !Args.hasArgNoClaim(options::OPT_offload_EQ))
-        TC = std::make_unique<toolchains::HIPAMDToolChain>(*this, Target,
-                                                           HostTC, Args);
-      else if (Target.getArch() == llvm::Triple::spirv64 &&
-               Target.getVendor() == llvm::Triple::UnknownVendor &&
-               Target.getOS() == llvm::Triple::UnknownOS)
-        TC = std::make_unique<toolchains::HIPSPVToolChain>(*this, Target,
-                                                           HostTC, Args);
-      break;
-    }
-    case Action::OFK_SYCL:
-      if (Target.isSPIROrSPIRV())
-        TC = std::make_unique<toolchains::SYCLToolChain>(*this, Target, HostTC,
-                                                         Args);
-      break;
-    default:
-      break;
-    }
-  }
-  assert(TC && "Could n...
[truncated]
 | 
| Contains two dependent commits, last one is the patch. Might need to have some additional error handling to reject known broken architectures, also need to correctly handle things like  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, if I am missing it but I don't see a test emitting the diagnostic err_drv_mix_offload anywhere.
| 
 I'll add a test for it, there's also a few other things I need to tweak. | 
| Hi @jhuber6 Just a quick ping to check if this PR is still alive. I can take a look if it is. Thanks | 
| 
 I need to redo it, but I do want it to work. | 
| ✅ With the latest revision this PR passed the C/C++ code formatter. | 
| I've updated this, it should allow creating multiple offloading toolchains. It's still a little iffy with the old driver, and I'll need a follow-up for SPIR-V on the new driver, but it's a reasonable cleanup. | 
074ca1d    to
    1f7de90      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm minus nits, nice cleanup! it looks like we now have a bit more separation between offloading language and offloading target and we could more easily add new cases which is definitely a plus
i'll have to test this out with OpenMP SPIR-V at some point, but i don't see anything that seems like it would be really bad, and i might have to do something to see if I can hook up runtime detection of Intel GPUs, we should have some tool that can do it
d8927c3    to
    447698d      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by style/syntax mostly review. LGTM overall, with a few nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new message sounds odd, as there's no such thing as HIP or CUDA architecture. I guess it should be something along the lines of "determine target architecture for the HIP compilation".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I can reword them a bit. Mostly it just comes from the fact that now it's a generic 'offload-arch' utility instead of two separate ones we can't as easily make a distinction on which architecture it failed on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"HIP compilation can not target GPU architecture..."
Summary: This is a new option that tries to make selecting offloading toolchains more generic. Currently we infer the toolchain from a combination of the kind and the `--offload-arch=` option. Doing this becomes complicated when we want to start supporting multiple targets for a single toolchain, i.e. HIP on SPIR-V or AMDGCN. Currently we hack in a special 'architecture' instead, which isn't extensible. Additionally in the future we may accept compiling CUDA or HIP to some other architecture.
| LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/186/builds/10917 Here is the relevant piece of the build log for the reference | 
| LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/72/builds/13357 Here is the relevant piece of the build log for the reference | 
| LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/64/builds/4943 Here is the relevant piece of the build log for the reference | 
| LLVM Buildbot has detected a new failure on builder  Full details are available at: https://lab.llvm.org/buildbot/#/builders/168/builds/14433 Here is the relevant piece of the build log for the reference | 
Summary: This patch reworks how we create offloading toolchains. Previously we would handle this separately for all the different kinds. This patch instead changes this to use the target triple and the offloading kind to determine the proper toolchain. In the old case where the user only passes --offload-arch we instead infer the triple from the passed arguments. This is a pretty major overhaul but currently passes all the clang tests with only minor changes to error messages. Note: upstream patch authored by jhuber6. amd-staging cherry-pick for PSDB and SPIRV testing llvm#125556
| Failures of the sanitizer builds look related to this https://lab.llvm.org/buildbot/#/builders/169/builds/13161: https://lab.llvm.org/buildbot/#/builders/169/builds/13161:  | 
| Yeah, seems like it's getting corrupt data in a dense map of offloading kinds? Guess I'll need to do a sanitizer build. | 
| Someone else messed up it. The commit has been reverted. | 
| 
 Can you link which one? That's some awfully convenient timing that it caused failures on the offloading path at the same time I landed this. | 
| I don't know which one. My PRs yesterday afternoon have a lot of crash in ADT but after rebase a couple of hours later they were gone. I suppose that would be the one. | 
| Should be fixed by dc87a14. The sentinel value used in this constant map was triggering UBSan. | 
| // incompatible toolchains from the global option. | ||
| Option Opt = C.getDriver().getOpts().getOption(options::OPT_Xarch__); | ||
| unsigned Index = C.getArgs().getBaseArgs().MakeIndex("-Xarch_"); | ||
| Arg *A = new Arg(Opt, C.getArgs().getArgString(Index), Index, | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ASAN build detected memory leaks in ~75 Driver tests.
bin/clang -### --target=x86_64-linux-gnu --cuda-gpu-arch=gfx900 --rocm-path=clang/test/Driver/Inputs/rocm --cuda-device-only clang/test/Driver/hip-wavefront-size.hip
=================================================================
==845==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 88 byte(s) in 1 object(s) allocated from:
#0 0x7f7b99cdb698 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:95
#1 0x7f7b894eee20 in inferOffloadToolchains(clang::driver::Compilation&, clang::driver::Action::OffloadKind)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any guess why? We call MakeArgString here which creates permanent string storage. The argument string and index should be managed by the argument list as well. The new argument we get out is appended, we do similar things elsewhere. Do you know what could be problematic here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if #150142 fixes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, #150142 fixed them. Thanks!
Summary: This patch reworks how we create offloading toolchains. Previously we would handle this separately for all the different kinds. This patch instead changes this to use the target triple and the offloading kind to determine the proper toolchain. In the old case where the user only passes --offload-arch we instead infer the triple from the passed arguments. This is a pretty major overhaul but currently passes all the clang tests with only minor changes to error messages. Note: upstream patch authored by jhuber6. amd-staging cherry-pick for PSDB and SPIRV testing llvm#125556
| This broke compilation with Xcode Clang 12 (roughly corresponding to upstream Clang 10): Plus another few similar cases (on lines 953, 958 and 967). | 
| 
 It's not limited to that, I was seeing it too, it's breaking on 32-bit targets with older compilers that do not implicitly move between non-identical types. I've pushed the obvious fix. | 
| 
 Thanks for fixing that. | 
This makes the bug in PR llvm#125556 which was fixed by dc87a14 into a compile-time error.
Summary: This patch reworks how we create offloading toolchains. Previously we would handle this separately for all the different kinds. This patch instead changes this to use the target triple and the offloading kind to determine the proper toolchain. In the old case where the user only passes `--offload-arch` we instead infer the triple from the passed arguments. This is a pretty major overhaul but currently passes all the clang tests with only minor changes to error messages.
This makes the bug in PR llvm#125556 which was fixed by dc87a14 into a compile-time error. ...and fix a newly-discovered instance of this issue, triggered by a `llvm::MapVector<AccessSpecifier, ...>` in clang-tools-extra/clangd/refactor/tweaks/OverridePureVirtuals.cpp.
…19709) This patch reworks the creation of SYCL offloading toolchains, now basing it on the Offloading Kind and the target triple string, passed via either `--offload-targets` or `--offload-arch`. The following PRs/patches from LLVM community have been included in this patch: llvm/llvm-project#125556 llvm/llvm-project@08ac781 llvm/llvm-project@dc87a14 llvm/llvm-project@efffa42 d65cc97 llvm/llvm-project@34447ef intel-restricted/applications.compilers.llvm-project#30861 llvm/llvm-project@4f58c82 llvm/llvm-project@56ba118
…19709) This patch reworks the creation of SYCL offloading toolchains, now basing it on the Offloading Kind and the target triple string, passed via either `--offload-targets` or `--offload-arch`. The following PRs/patches from LLVM community have been included in this patch: llvm/llvm-project#125556 llvm/llvm-project@08ac781 llvm/llvm-project@dc87a14 llvm/llvm-project@efffa42 d65cc97 llvm/llvm-project@34447ef intel-restricted/applications.compilers.llvm-project#30861 llvm/llvm-project@4f58c82 llvm/llvm-project@56ba118
Summary:
This patch reworks how we create offloading toolchains. Previously we would handle this separately for all the different kinds. This patch instead changes this to use the target triple and the offloading kind to determine the proper toolchain. In the old case where the user only passes
--offload-archwe instead infer the triple from the passed arguments. This is a pretty major overhaul but currently passes all the clang tests with only minor changes to error messages.