Commit 7a9a425

Alexander-Johnston, fwyzard, Ruyk, and steffenlarsen authored
[SYCL][CUDA] Initial CUDA backend support (#1091)
* [SYCL][LIBCLC] Additional libclc builtins to support SYCL work

  Adds builtins to libclc to support the CUDA backend for SYCL.

  Contributors
  Alexander Johnston <[email protected]>
  David Wood <[email protected]>
  Victor Lomuller <[email protected]>

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] CMake and lit support for SYCL CUDA backend

  Adds defines CMake and lit variables used for SYCL CUDA backend development and test

  Contributors
  Alexander Johnston <[email protected]>
  Bjoern Knafla <[email protected]>
  Ruyman Reyes <[email protected]>

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Local Accessor Support for CUDA

  Provides the LocalAccessorToSharedMemory compiler pass required for supporting SYCL local accessors in CUDA.

  Contributors
  Alexander Johnston <[email protected]>
  David Wood <[email protected]>

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Change __spirv_BuiltIn.. to functions

  Changes the following builtins to functions
  __spirv_BuiltInGlobalSize
  __spirv_BuiltInWorkgroupSize
  __spirv_BuiltInNumWorkgroups
  __spirv_BuiltInLocalInvocationId
  __spirv_BuiltInWorkgroupId
  __spirv_BuiltInGlobalOffset

  Contributors
  David Wood <[email protected]>

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Add SYCL CUDA support to clang driver

  Adds CUDA support for sycl compilation in the clang driver

  Contributors
  Alexander Johnston <[email protected]>
  David Wood <[email protected]>
  Victor Lomuller <[email protected]>

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Initial Implementation of the CUDA backend

  Contributors
  Alan Forbes <[email protected]>
  Alexander Johnston <[email protected]>
  Bjoern Knafla <[email protected]>
  Daniel Soutar <[email protected]>
  David Wood <[email protected]>
  Kumudha Narasimhan <[email protected]>
  Mehdi Goli <[email protected]>
  Przemek Malon <[email protected]>
  Ruyman Reyes <[email protected]>
  Stuart Adams <[email protected]>
  Svetlozar Georgiev <[email protected]>
  Steffen Larsen <[email protected]>
  Victor Lomuller <[email protected]>

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Update libclc install rules

  Have libclc install clc-* and libspirv-* to lib and share

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Inline cl namespace to simplify SYCL API usage

  Synchronise the CUDA backend with the general SYCL changes from #974.

  Signed-off-by: Andrea Bocci <[email protected]>

* Added missing flags for device-side builtins

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Removing unnecessary tool from the tree

  Acked-by: Victor Lomuller <[email protected]>
  Signed-off-by: Ruyman <[email protected]>

* [SYCL][PI] Fix kernel group info parameter conversion

  Signed-off-by: Steffen Larsen <[email protected]>

* [SYCL][CUDA] Refactor __SYCL_INLINE macro

  Synchronise the CUDA backend with the general SYCL changes from #1121.

  Signed-off-by: Andrea Bocci <[email protected]>

* [SYCL] Have default_selector consider SYCL_BE

  Have the default_selector consider the env var SYCL_BE when rating device scores to make choosing a backend easier.

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Select GlobalPlugin based on SYCL_BE

  Rather than choose the last found plugin as GlobalPlugin, select it depending on the SYCL_BE env var.

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Improve default device selection checks

  Better checks for CUDA and OpenCL devices to match with SYCL_BE in the default device selection, based on the platform version info.

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Formatting update for device_selector.cpp

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Changed CUDA unit tests to call through plugin

  Signed-off-by: Steffen Larsen <[email protected]>

* [SYCL] Pass SYCL_BE=PI_OPENCL in check-sycl

  To ensure that the check-sycl targets test OpenCL devices, pass SYCL_BE=PI_OPENCL. This mirrors the check-sycl-cuda target which passes SYCL_BE=PI_CUDA. Without this it is nondeterministic which device is tested by check-sycl.

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Remove PI_CUDA specific details from clang

  Removes PI_CUDA specific code paths and tests from clang, opting to always enable them.

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Disable linear_id/opencl-interop.cpp for cuda

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Further fixes to CUDA device selection

  Fix platform string comparison for CUDA platform detection. Fix device info platform query so that it uses the device's plugin, rather than the GlobalPlugin.

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Code style and cleanup to CUDA support

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL] Enable asserts in all buildbot builds

  Signed-off-by: Alexander Johnston <[email protected]>

* [SYCL][CUDA] Minor test and build configuration

  Fix minor test and build configuration issues introduced in the development of the CUDA backend.

  Signed-off-by: Alexander Johnston <[email protected]>

Co-authored-by: Andrea Bocci <[email protected]>
Co-authored-by: Ruyman <[email protected]>
Co-authored-by: Steffen Larsen <[email protected]>
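
The commit message says that default_selector considers the SYCL_BE environment variable when rating device scores, with PI_OPENCL and PI_CUDA as the two values used by the test targets. The following is only an illustrative Python model of that idea, not the runtime's C++ code: the function names, the base scores, and the size of the bonus are all assumptions made for this sketch.

```python
import os

# Hypothetical base scores per device type; the real values are not shown here.
BASE_SCORE = {"gpu": 500, "cpu": 300, "host": 100}
# Assumed bonus, chosen large enough that a backend match outranks device type.
BACKEND_BONUS = 1000


def rate_device(device_type, backend, env=None):
    """Score one device, adding a bonus when its backend matches SYCL_BE."""
    env = os.environ if env is None else env
    score = BASE_SCORE.get(device_type, 0)
    preferred = env.get("SYCL_BE")  # e.g. "PI_OPENCL" or "PI_CUDA"
    if preferred is not None and backend == preferred:
        score += BACKEND_BONUS
    return score


def pick_device(devices, env=None):
    """Return the highest-rated (device_type, backend) pair."""
    return max(devices, key=lambda d: rate_device(d[0], d[1], env))
```

In this model, calling `pick_device` with two otherwise equal GPUs and `SYCL_BE=PI_CUDA` returns the CUDA one, which is the determinism the check-sycl and check-sycl-cuda targets rely on.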
1 parent a0c0e33 commit 7a9a425

820 files changed: +20902 -3437 lines changed

buildbot/configure.py

Lines changed: 39 additions & 19 deletions
@@ -11,30 +11,49 @@ def do_configure(args):
     sycl_dir = os.path.join(args.src_dir, "sycl")
     spirv_dir = os.path.join(args.src_dir, "llvm-spirv")
     ocl_header_dir = os.path.join(args.obj_dir, "OpenCL-Headers")
-    icd_loader_lib = ''
+    icd_loader_lib = os.path.join(args.obj_dir, "OpenCL-ICD-Loader", "build")
+    llvm_targets_to_build = 'X86'
+    llvm_enable_projects = 'clang;llvm-spirv;sycl;opencl-aot'
+    libclc_targets_to_build = ''
+    sycl_build_pi_cuda = 'OFF'
+    llvm_enable_assertions = 'ON'

     if platform.system() == 'Linux':
-        icd_loader_lib = os.path.join(args.obj_dir, "OpenCL-ICD-Loader", "build", "libOpenCL.so")
+        icd_loader_lib = os.path.join(icd_loader_lib, "libOpenCL.so")
     else:
-        icd_loader_lib = os.path.join(args.obj_dir, "OpenCL-ICD-Loader", "build", "OpenCL.lib")
+        icd_loader_lib = os.path.join(icd_loader_lib, "OpenCL.lib")
+
+    if args.cuda:
+        llvm_targets_to_build += ';NVPTX'
+        llvm_enable_projects += ';libclc'
+        libclc_targets_to_build = 'nvptx64--;nvptx64--nvidiacl'
+        sycl_build_pi_cuda = 'ON'
+
+    if args.assertions:
+        llvm_enable_assertions = 'ON'

     install_dir = os.path.join(args.obj_dir, "install")

-    cmake_cmd = ["cmake",
-                 "-G", "Ninja",
-                 "-DCMAKE_BUILD_TYPE={}".format(args.build_type),
-                 "-DLLVM_EXTERNAL_PROJECTS=sycl;llvm-spirv;opencl-aot",
-                 "-DLLVM_EXTERNAL_SYCL_SOURCE_DIR={}".format(sycl_dir),
-                 "-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR={}".format(spirv_dir),
-                 "-DLLVM_ENABLE_PROJECTS=clang;sycl;llvm-spirv;opencl-aot",
-                 "-DOpenCL_INCLUDE_DIR={}".format(ocl_header_dir),
-                 "-DOpenCL_LIBRARY={}".format(icd_loader_lib),
-                 "-DLLVM_BUILD_TOOLS=ON",
-                 "-DSYCL_ENABLE_WERROR=ON",
-                 "-DLLVM_ENABLE_ASSERTIONS=ON",
-                 "-DCMAKE_INSTALL_PREFIX={}".format(install_dir),
-                 "-DSYCL_INCLUDE_TESTS=ON", # Explicitly include all kinds of SYCL tests.
-                 llvm_dir]
+    cmake_cmd = [
+        "cmake",
+        "-G", "Ninja",
+        "-DCMAKE_BUILD_TYPE={}".format(args.build_type),
+        "-DLLVM_ENABLE_ASSERTIONS={}".format(llvm_enable_assertions),
+        "-DLLVM_TARGETS_TO_BUILD={}".format(llvm_targets_to_build),
+        "-DLLVM_EXTERNAL_PROJECTS=sycl;llvm-spirv;opencl-aot",
+        "-DLLVM_EXTERNAL_SYCL_SOURCE_DIR={}".format(sycl_dir),
+        "-DLLVM_EXTERNAL_LLVM_SPIRV_SOURCE_DIR={}".format(spirv_dir),
+        "-DLLVM_ENABLE_PROJECTS={}".format(llvm_enable_projects),
+        "-DLIBCLC_TARGETS_TO_BUILD={}".format(libclc_targets_to_build),
+        "-DOpenCL_INCLUDE_DIR={}".format(ocl_header_dir),
+        "-DOpenCL_LIBRARY={}".format(icd_loader_lib),
+        "-DSYCL_BUILD_PI_CUDA={}".format(sycl_build_pi_cuda),
+        "-DLLVM_BUILD_TOOLS=ON",
+        "-DSYCL_ENABLE_WERROR=ON",
+        "-DCMAKE_INSTALL_PREFIX={}".format(install_dir),
+        "-DSYCL_INCLUDE_TESTS=ON", # Explicitly include all kinds of SYCL tests.
+        llvm_dir
+    ]

     print(cmake_cmd)

@@ -63,6 +82,8 @@ def main():
     parser.add_argument("-o", "--obj-dir", metavar="OBJ_DIR", required=True, help="build directory")
     parser.add_argument("-t", "--build-type",
                         metavar="BUILD_TYPE", required=True, help="build type, debug or release")
+    parser.add_argument("--cuda", action='store_true', help="switch from OpenCL to CUDA")
+    parser.add_argument("--assertions", action='store_true', help="build with assertions")

     args = parser.parse_args()

@@ -74,4 +95,3 @@ def main():
     ret = main()
     exit_code = 0 if ret else 1
     sys.exit(exit_code)
-
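
The effect of the new `--cuda` flag on the generated CMake command can be summarized in a few lines. The sketch below restates only the four variables that the flag changes, with values copied from the diff above; the helper names `cuda_dependent_settings` and `as_cmake_flags` are hypothetical and do not exist in `buildbot/configure.py`.

```python
def cuda_dependent_settings(cuda):
    """Return the CMake variables from the diff that depend on --cuda."""
    settings = {
        "LLVM_TARGETS_TO_BUILD": "X86",
        "LLVM_ENABLE_PROJECTS": "clang;llvm-spirv;sycl;opencl-aot",
        "LIBCLC_TARGETS_TO_BUILD": "",
        "SYCL_BUILD_PI_CUDA": "OFF",
    }
    if cuda:
        # Same adjustments as the `if args.cuda:` block in the diff.
        settings["LLVM_TARGETS_TO_BUILD"] += ";NVPTX"
        settings["LLVM_ENABLE_PROJECTS"] += ";libclc"
        settings["LIBCLC_TARGETS_TO_BUILD"] = "nvptx64--;nvptx64--nvidiacl"
        settings["SYCL_BUILD_PI_CUDA"] = "ON"
    return settings


def as_cmake_flags(settings):
    """Format a settings dict as -DNAME=VALUE arguments (hypothetical helper)."""
    return ["-D{}={}".format(name, value) for name, value in settings.items()]
```

Without `--cuda` the script still passes `-DLIBCLC_TARGETS_TO_BUILD=` with an empty value, since the variable defaults to an empty string. Also note that `llvm_enable_assertions` already defaults to `'ON'` in the diff, so `--assertions` does not change the resulting command.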

clang/include/clang/Basic/DiagnosticDriverKinds.td

Lines changed: 3 additions & 0 deletions
@@ -64,6 +64,9 @@ def warn_drv_unknown_cuda_version: Warning<
   "Unknown CUDA version %0. Assuming the latest supported version %1">,
   InGroup<CudaUnknownVersion>;
 def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;
+def err_drv_no_sycl_libspirv : Error<
+  "cannot find `libspirv-nvptx64--nvidiacl.bc`. Provide path to libspirv library via "
+  "-fsycl-libspirv-path, or pass -fno-sycl-libspirv to build without linking with libspirv.">;
 def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not supported.">;
 def err_drv_invalid_thread_model_for_target : Error<
   "invalid thread model '%0' in '%1' for this target">;

clang/include/clang/Basic/DiagnosticIDs.h

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ namespace clang {
     // Size of each of the diagnostic categories.
     enum {
       DIAG_SIZE_COMMON = 300,
-      DIAG_SIZE_DRIVER = 250, // 200 -> 250 for SYCL related diagnostics
+      DIAG_SIZE_DRIVER = 210,
       DIAG_SIZE_FRONTEND = 150,
       DIAG_SIZE_SERIALIZATION = 120,
       DIAG_SIZE_LEX = 400,

clang/include/clang/Driver/Options.td

Lines changed: 3 additions & 0 deletions
@@ -1872,6 +1872,9 @@ def fsycl_help_EQ : Joined<["-"], "fsycl-help=">,
 def fsycl_help : Flag<["-"], "fsycl-help">, Alias<fsycl_help_EQ>,
   Flags<[DriverOption, CoreOption]>, AliasArgs<["all"]>, HelpText<"Emit help information "
   "from all of the offline compilation tools">;
+def fsycl_libspirv_path_EQ : Joined<["-"], "fsycl-libspirv-path=">,
+  Flags<[CC1Option, CoreOption]>, HelpText<"Path to libspirv library">;
+def fno_sycl_libspirv : Flag<["-"], "fno-sycl-libspirv">, HelpText<"Disable check for libspirv">;
 def fsyntax_only : Flag<["-"], "fsyntax-only">,
   Flags<[DriverOption,CoreOption,CC1Option]>, Group<Action_Group>;
 def ftabstop_EQ : Joined<["-"], "ftabstop=">, Group<f_Group>;
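
The new diagnostic and the two new options together suggest a three-way decision: skip the library check, use an explicit path, or search for `libspirv-nvptx64--nvidiacl.bc` and fail if it is missing. The driver code that implements this is not part of the excerpt shown here, so the following Python model is inferred from the diagnostic text alone. The function name, the `search_dirs` parameter, the precedence between the two options, and the handling of a bad explicit path are all assumptions.

```python
import os

# File name taken from the err_drv_no_sycl_libspirv diagnostic text.
LIBSPIRV_NAME = "libspirv-nvptx64--nvidiacl.bc"


def resolve_libspirv(args, search_dirs):
    """Return the libspirv path to link, or None when the check is disabled.

    Raises FileNotFoundError where the driver would presumably report
    err_drv_no_sycl_libspirv. This is a model, not the clang implementation.
    """
    if "-fno-sycl-libspirv" in args:
        return None  # build without linking with libspirv
    prefix = "-fsycl-libspirv-path="
    for arg in args:
        if arg.startswith(prefix):
            path = arg[len(prefix):]
            if os.path.isfile(path):
                return path
            raise FileNotFoundError("cannot find `{}`".format(LIBSPIRV_NAME))
    # Assumed fallback: look in a list of default directories.
    for directory in search_dirs:
        candidate = os.path.join(directory, LIBSPIRV_NAME)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError("cannot find `{}`".format(LIBSPIRV_NAME))
```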

clang/lib/Basic/Targets/NVPTX.cpp

Lines changed: 2 additions & 1 deletion
@@ -57,7 +57,8 @@ NVPTXTargetInfo::NVPTXTargetInfo(const llvm::Triple &Triple,
             .Default(32);
   }

-  TLSSupported = false;
+  // FIXME: Needed for compiling SYCL to PTX.
+  TLSSupported = Triple.getEnvironment() == llvm::Triple::SYCLDevice;
   VLASupported = false;
   AddrSpaceMap = &NVPTXAddrSpaceMap;
   UseAddrSpaceMapMangling = true;

clang/lib/Basic/Targets/NVPTX.h

Lines changed: 6 additions & 0 deletions
@@ -141,6 +141,12 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
     Opts.support("cl_khr_global_int32_extended_atomics");
     Opts.support("cl_khr_local_int32_base_atomics");
     Opts.support("cl_khr_local_int32_extended_atomics");
+    // PTX actually supports 64 bits operations even if the Nvidia OpenCL
+    // runtime does not report support for it.
+    // This is required for libclc to compile 64 bits atomic functions.
+    // FIXME: maybe we should have a way to control this ?
+    Opts.support("cl_khr_int64_base_atomics");
+    Opts.support("cl_khr_int64_extended_atomics");
   }

   /// \returns If a target requires an address within a target specific address

clang/lib/CodeGen/BackendUtil.cpp

Lines changed: 0 additions & 3 deletions
@@ -842,9 +842,6 @@ void EmitAssemblyHelper::EmitAssembly(BackendAction Action,
   PerFunctionPasses.add(
       createTargetTransformInfoWrapperPass(getTargetIRAnalysis()));

-  if (LangOpts.SYCLIsDevice)
-    PerFunctionPasses.add(createSYCLLowerWGScopePass());
-
   CreatePasses(PerModulePasses, PerFunctionPasses);

   legacy::PassManager CodeGenPasses;

clang/lib/CodeGen/CGCall.cpp

Lines changed: 6 additions & 0 deletions
@@ -755,6 +755,12 @@ CodeGenTypes::arrangeLLVMFunctionInfo(CanQualType resultType,
     return *FI;

   unsigned CC = ClangCallConvToLLVMCallConv(info.getCC());
+  // This is required so SYCL kernels are successfully processed by tools from CUDA. Kernels
+  // with a `spir_kernel` calling convention are ignored otherwise.
+  if (CC == llvm::CallingConv::SPIR_KERNEL && CGM.getTriple().isNVPTX() &&
+      getContext().getLangOpts().SYCLIsDevice) {
+    CC = llvm::CallingConv::C;
+  }

   // Construct the function info. We co-allocate the ArgInfos.
   FI = CGFunctionInfo::create(CC, instanceMethod, chainCall, info,

clang/lib/CodeGen/CodeGenAction.cpp

Lines changed: 13 additions & 0 deletions
@@ -10,6 +10,7 @@
 #include "CodeGenModule.h"
 #include "CoverageMappingGen.h"
 #include "MacroPPCallbacks.h"
+#include "SYCLLowerIR/LowerWGScope.h"
 #include "clang/AST/ASTConsumer.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/AST/DeclCXX.h"
@@ -33,6 +34,7 @@
 #include "llvm/IR/GlobalValue.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/LLVMRemarkStreamer.h"
+#include "llvm/IR/LegacyPassManager.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IRReader/IRReader.h"
 #include "llvm/Linker/Linker.h"
@@ -326,6 +328,17 @@ namespace clang {
           CodeGenOpts.getProfileUse() != CodeGenOptions::ProfileNone)
         Ctx.setDiagnosticsHotnessRequested(true);

+      // The parallel_for_work_group legalization pass can emit calls to
+      // builtins function. Definitions of those builtins can be provided in
+      // LinkModule. We force the pass to legalize the code before the link
+      // happens.
+      if (LangOpts.SYCLIsDevice) {
+        PrettyStackTraceString CrashInfo("Pre-linking SYCL passes");
+        legacy::PassManager PreLinkingSyclPasses;
+        PreLinkingSyclPasses.add(createSYCLLowerWGScopePass());
+        PreLinkingSyclPasses.run(*getModule());
+      }
+
       // Link each LinkModule into our module.
       if (LinkInModules())
         return;

clang/lib/CodeGen/CodeGenModule.cpp

Lines changed: 2 additions & 0 deletions
@@ -240,6 +240,8 @@ void CodeGenModule::createSYCLRuntime() {
   switch (getTriple().getArch()) {
   case llvm::Triple::spir:
   case llvm::Triple::spir64:
+  case llvm::Triple::nvptx:
+  case llvm::Triple::nvptx64:
     SYCLRuntime.reset(new CGSYCLRuntime(*this));
     break;
   default:
