Skip to content
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions clang/lib/Basic/Targets/SPIR.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,17 @@ void SPIR64TargetInfo::getTargetDefines(const LangOptions &Opts,
DefineStd(Builder, "SPIR64", Opts);
}

bool WindowsX86_64_SPIR64TargetInfo::initFeatureMap(
llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef CPU,
const std::vector<std::string> &FeaturesVec) const {
// Mirror X86TargetInfo's "x86_64 always has SSE2" baseline: the matching
// _M_X64 macro makes MSVC STL headers take the x86 intrinsics path, whose
// _mm_* intrinsics require sse/sse2 in the target feature set.
Features["sse"] = true;
Features["sse2"] = true;
return SPIR64TargetInfo::initFeatureMap(Features, Diags, CPU, FeaturesVec);
Comment on lines +90 to +95

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. Why would we want to claim SSE and SSE2 availability for a target for which those are not relevant?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is confusing.
The _M_X64 macro is already defined for this target, which causes MSVC STL headers to take the x86 intrinsics path. ( yay! )
VS2026's <complex> introduces inline functions like _Sqr_error_x86_x64_fma annotated [[__gnu__::__target__("fma")]], whose bodies call _mm_set_sd / _mm_store_sd. The target attribute layers fma on top of the existing baseline, it doesn't replace it, so if the baseline lacks sse/sse2, those mm* builtins fail their feature-compatibility check at parse time, even though the function is never called from device code.
So we add SSE/SSE2 just to get past that check and to get things move forward again.
To my mind, since this target already claims _M_X64 then it's really a matter of being consistent. In for a penny...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the detailed explanation.

}

void BaseSPIRVTargetInfo::getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const {
DefineStd(Builder, "SPIRV", Opts);
Expand Down
5 changes: 5 additions & 0 deletions clang/lib/Basic/Targets/SPIR.h
Original file line number Diff line number Diff line change
Expand Up @@ -395,6 +395,11 @@ class LLVM_LIBRARY_VISIBILITY WindowsX86_64_SPIR64TargetInfo
return (CC == CC_SpirFunction || CC == CC_DeviceKernel) ? CCCR_OK
: CCCR_Warning;
}

bool
initFeatureMap(llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags,
StringRef CPU,
const std::vector<std::string> &FeaturesVec) const override;
};

// x86-64 SPIR64 Windows Visual Studio target
Expand Down
26 changes: 15 additions & 11 deletions clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,24 +2,24 @@
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple amdgcn-amd-amdhsa -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-AMDGCN %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple nvptx-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple nvptx64-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spirv32-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spirv32-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-host -emit-llvm -triple x86_64-pc-windows-msvc -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-HOST,CHECK-HOST-WINDOWS %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple amdgcn-amd-amdhsa -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-AMDGCN %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple nvptx-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple nvptx64-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-host -emit-llvm -triple x86_64-uefi -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-HOST,CHECK-HOST-WINDOWS %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple amdgcn-amd-amdhsa -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-AMDGCN %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple nvptx-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple nvptx64-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spir-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spirv32-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spir-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spirv32-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s

// Test code generation for functions declared with the sycl_kernel_entry_point
// attribute. During host compilation, the bodies of such functions are replaced
Expand Down Expand Up @@ -704,5 +704,9 @@ int main() {
// CHECK-NVPTX: #[[NVPTX_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "sycl-optlevel"="0" }
// CHECK-NVPTX: #[[NVPTX_ATTR1]] = { convergent nounwind "uniform-work-group-size" }
//
// CHECK-SPIR: #[[SPIR_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "sycl-optlevel"="0" }
// WindowsX86_64_SPIR64TargetInfo::initFeatureMap adds +sse/+sse2 to the
// device-target feature baseline; every other SPIR/SPIRV target class used by
// the RUN lines above leaves it empty.
// CHECK-SPIR-NO-SSE2: #[[SPIR_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "sycl-optlevel"="0" }
// CHECK-SPIR-SSE2: #[[SPIR_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "sycl-optlevel"="0" "target-features"="+sse,+sse2" }
// CHECK-SPIR: #[[SPIR_ATTR1]] = { convergent nounwind "uniform-work-group-size" }
20 changes: 20 additions & 0 deletions clang/test/CodeGenSYCL/windows-msvc-spir64-sse2.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
// RUN: %clang_cc1 -triple spir64-unknown-unknown -aux-triple x86_64-pc-windows-msvc \
// RUN: -fsycl-is-device -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s

// When SYCL device code is compiled with a Windows-MSVC host, the device
// target (spir64) defines _M_X64 so that MSVC STL headers take the x86
// intrinsics path. The device target feature set must correspondingly carry
// sse/sse2 so that function-level __target__ attributes (e.g. VS2026
// <complex>'s [[gnu::target("fma")]] on _Sqr_error_x86_x64_fma) don't strip
// the baseline, which would break intrinsic calls like _mm_set_sd / _mm_store_sd.
Comment on lines +4 to +9

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this either. Why would _M_X64 be defined for device compilation when the device target is not x86-64? The issue this PR seeks to address seems to be the result of a prior workaround that might not have been well motivated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. _M_X64 is defined so MSVC's STL headers will compile at all. <yvals_core.h> (transitively included by nearly every STL header) requires one of _M_X64 / _M_IX86 / _M_ARM64 / etc. to be defined. Without it, the header errors out before any SYCL device TU using , , etc. can be parsed.

The MicrosoftX86_64_SPIR64TargetInfo class predates this PR. I'm not sure of the original motivation beyond "STL won't open otherwise." Removing it would require either patching the MSVC STL headers (not really an option) or maintaining a substantial set of SYCL-side wrappers for any STL header a device TU might transitively pull in.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I wasn't aware of the MicrosoftX86_64_SPIR64TargetInfo target. That's ... interesting. It makes sense that certain accommodations are required to use Microsoft's standard library headers given that we're using them for a purpose they weren't at all intended for.

Pondering, I wonder how much success we might have in trying to convince Microsoft to accept a series of PRs that add checks for SYCL_DEVICE_ONLY throughout their headers. That might require some executive level discussion.


#include "Inputs/sycl.hpp"

int main() {
sycl::queue q;
q.submit([&](sycl::handler &h) { h.single_task<class TestK>([=] {}); });
return 0;
}

// CHECK: spir_kernel void @{{.*}}TestK{{.*}}() [[ATTRS:#[0-9]+]]
// CHECK: attributes [[ATTRS]] = {{.*}}"target-features"="+sse,+sse2"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised we don't have an existing test to validate target-features for SPIR targets somewhere. I looked, but I wasn't able to find one. This change isn't specific to SYCL, so this test could be generalized. There are quite a few tests for other targets that validate target-features under clang/test/Driver.

49 changes: 49 additions & 0 deletions sycl/include/sycl/stl_wrappers/__msvc_bit_utils.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
//==---- __msvc_bit_utils.hpp wrapper around MSVC STL ---------------------==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#pragma once

// VS2026 MSVC STL's <__msvc_bit_utils.hpp> declares `__isa_available` (a
// runtime CPU-feature global) and other STL headers (<bit>, <vector>,
// <numeric>, <bitset>, <complex>, <__msvc_int128.hpp>) include this header
// transitively. SYCL device code cannot access host runtime globals, so
// provide a device-side definition before the real header is included.
//
// The VALUE of this variable only steers the STL's runtime feature dispatch
// — both branches compile either way. We pick __ISA_AVAILABLE_X86 (== 0,
// the baseline in <isa_availability.h>), which matches a spir64 device's
// reality (no x86 ISA), and selects the STL's scalar fallback paths if the
// dispatches are ever reached.
//
// Mirror the source structure of MSVC STL's __msvc_bit_utils.hpp, which
// declares the symbol inside `namespace std { extern "C" { ... } }`. The
// `extern "C"` makes namespace placement linkage-neutral, but matching the
// MSVC source layout keeps this wrapper visually aligned with what it shadows.
#if defined(__SYCL_DEVICE_ONLY__) && defined(_MSC_VER)
namespace std {
extern "C" {
int __isa_available __attribute__((sycl_global_var))
__attribute__((weak)) = 0;
}
} // namespace std
#endif // defined(__SYCL_DEVICE_ONLY__) && defined(_MSC_VER)

// Include real STL <__msvc_bit_utils.hpp> header - the next one from the
// include search directories.
#if defined(__has_include_next)
// GCC/clang support go through this path.
#include_next <__msvc_bit_utils.hpp>
#else
// MSVC doesn't support "#include_next", so we have to be creative.
// Our header is located in "stl_wrappers/__msvc_bit_utils.hpp" so it won't be
// picked by the following include. MSVC's installation, on the other hand,
// has the layout where the following would result in the
// <__msvc_bit_utils.hpp> we want. This is obviously hacky, but the best we
// can do...
#include <../include/__msvc_bit_utils.hpp>
#endif
6 changes: 6 additions & 0 deletions sycl/include/sycl/stl_wrappers/complex
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,12 @@

#pragma once

// Note: VS2026 MSVC STL's <complex> transitively includes
// <__msvc_bit_utils.hpp>, which declares `__isa_available`. We provide a
// device-side definition for that symbol via our wrapper at
// stl_wrappers/__msvc_bit_utils.hpp, so no per-header workaround is needed
// here.

// Include real STL <complex> header - the next one from the include search
// directories.
#if defined(__has_include_next)
Expand Down
Loading