Skip to content
Merged
11 changes: 11 additions & 0 deletions clang/lib/Basic/Targets/SPIR.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,17 @@ void SPIR64TargetInfo::getTargetDefines(const LangOptions &Opts,
DefineStd(Builder, "SPIR64", Opts);
}

bool WindowsX86_64_SPIR64TargetInfo::initFeatureMap(
llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags, StringRef CPU,
const std::vector<std::string> &FeaturesVec) const {
// Mirror X86TargetInfo's "x86_64 always has SSE2" baseline: the matching
// _M_X64 macro makes MSVC STL headers take the x86 intrinsics path, whose
// _mm_* intrinsics require sse/sse2 in the target feature set.
Features["sse"] = true;
Features["sse2"] = true;
return SPIR64TargetInfo::initFeatureMap(Features, Diags, CPU, FeaturesVec);
Comment on lines +90 to +95

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. Why would we want to claim SSE and SSE2 availability for a target for which those are not relevant?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is confusing.
The _M_X64 macro is already defined for this target, which causes MSVC STL headers to take the x86 intrinsics path. ( yay! )
VS2026's <complex> introduces inline functions like _Sqr_error_x86_x64_fma annotated [[__gnu__::__target__("fma")]], whose bodies call _mm_set_sd / _mm_store_sd. The target attribute layers fma on top of the existing baseline, it doesn't replace it, so if the baseline lacks sse/sse2, those mm* builtins fail their feature-compatibility check at parse time, even though the function is never called from device code.
So we add SSE/SSE2 just to get past that check and to get things move forward again.
To my mind, since this target already claims _M_X64 then it's really a matter of being consistent. In for a penny...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for the detailed explanation.

}

void BaseSPIRVTargetInfo::getTargetDefines(const LangOptions &Opts,
MacroBuilder &Builder) const {
DefineStd(Builder, "SPIRV", Opts);
Expand Down
5 changes: 5 additions & 0 deletions clang/lib/Basic/Targets/SPIR.h
Original file line number Diff line number Diff line change
Expand Up @@ -395,6 +395,11 @@ class LLVM_LIBRARY_VISIBILITY WindowsX86_64_SPIR64TargetInfo
return (CC == CC_SpirFunction || CC == CC_DeviceKernel) ? CCCR_OK
: CCCR_Warning;
}

bool
initFeatureMap(llvm::StringMap<bool> &Features, DiagnosticsEngine &Diags,
StringRef CPU,
const std::vector<std::string> &FeaturesVec) const override;
};

// x86-64 SPIR64 Windows Visual Studio target
Expand Down
26 changes: 15 additions & 11 deletions clang/test/CodeGenSYCL/kernel-caller-entry-point.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2,24 +2,24 @@
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple amdgcn-amd-amdhsa -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-AMDGCN %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple nvptx-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple nvptx64-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spirv32-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spirv32-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-unknown-linux-gnu -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-host -emit-llvm -triple x86_64-pc-windows-msvc -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-HOST,CHECK-HOST-WINDOWS %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple amdgcn-amd-amdhsa -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-AMDGCN %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple nvptx-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple nvptx64-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-pc-windows-msvc -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-host -emit-llvm -triple x86_64-uefi -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-HOST,CHECK-HOST-WINDOWS %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple amdgcn-amd-amdhsa -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-AMDGCN %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple nvptx-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple nvptx64-nvidia-cuda -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-NVPTX %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spir-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spirv32-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spir-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spir64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRNV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spirv32-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s
// RUN: %clang_cc1 -fsycl-is-device -emit-llvm -aux-triple x86_64-uefi -triple spirv64-unknown-unknown -std=c++17 %s -o - | FileCheck --check-prefixes=CHECK-DEVICE,CHECK-SPIR,CHECK-SPIRV,CHECK-SPIR-NO-SSE2 %s

// Test code generation for functions declared with the sycl_kernel_entry_point
// attribute. During host compilation, the bodies of such functions are replaced
Expand Down Expand Up @@ -704,5 +704,9 @@ int main() {
// CHECK-NVPTX: #[[NVPTX_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "sycl-optlevel"="0" }
// CHECK-NVPTX: #[[NVPTX_ATTR1]] = { convergent nounwind "uniform-work-group-size" }
//
// CHECK-SPIR: #[[SPIR_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "sycl-optlevel"="0" }
// WindowsX86_64_SPIR64TargetInfo::initFeatureMap adds +sse/+sse2 to the
// device-target feature baseline; every other SPIR/SPIRV target class used by
// the RUN lines above leaves it empty.
// CHECK-SPIR-NO-SSE2: #[[SPIR_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "sycl-optlevel"="0" }
// CHECK-SPIR-SSE2: #[[SPIR_ATTR0]] = { convergent mustprogress noinline norecurse nounwind optnone "no-trapping-math"="true" "stack-protector-buffer-size"="8" "sycl-optlevel"="0" "target-features"="+sse,+sse2" }
// CHECK-SPIR: #[[SPIR_ATTR1]] = { convergent nounwind "uniform-work-group-size" }
20 changes: 20 additions & 0 deletions clang/test/CodeGenSYCL/windows-msvc-spir64-sse2.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
// RUN: %clang_cc1 -triple spir64-unknown-unknown -aux-triple x86_64-pc-windows-msvc \
// RUN: -fsycl-is-device -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s

// When SYCL device code is compiled with a Windows-MSVC host, the device
// target (spir64) defines _M_X64 so that MSVC STL headers take the x86
// intrinsics path. The device target feature set must correspondingly carry
// sse/sse2 so that function-level __target__ attributes (e.g. VS2026
// <complex>'s [[gnu::target("fma")]] on _Sqr_error_x86_x64_fma) don't strip
// the baseline, which would break intrinsic calls like _mm_set_sd / _mm_store_sd.
Comment on lines +4 to +9

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this either. Why would _M_X64 be defined for device compilation when the device target is not x86-64? The issue this PR seeks to address seems to be the result of a prior workaround that might not have been well motivated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. _M_X64 is defined so MSVC's STL headers will compile at all. <yvals_core.h> (transitively included by nearly every STL header) requires one of _M_X64 / _M_IX86 / _M_ARM64 / etc. to be defined. Without it, the header errors out before any SYCL device TU using , , etc. can be parsed.

The MicrosoftX86_64_SPIR64TargetInfo class predates this PR. I'm not sure of the original motivation beyond "STL won't open otherwise." Removing it would require either patching the MSVC STL headers (not really an option) or maintaining a substantial set of SYCL-side wrappers for any STL header a device TU might transitively pull in.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I wasn't aware of the MicrosoftX86_64_SPIR64TargetInfo target. That's ... interesting. It makes sense that certain accommodations are required to use Microsoft's standard library headers given that we're using them for a purpose they weren't at all intended for.

Pondering, I wonder how much success we might have in trying to convince Microsoft to accept a series of PRs that add checks for SYCL_DEVICE_ONLY throughout their headers. That might require some executive level discussion.


#include "Inputs/sycl.hpp"

int main() {
sycl::queue q;
q.submit([&](sycl::handler &h) { h.single_task<class TestK>([=] {}); });
return 0;
}

// CHECK: spir_kernel void @{{.*}}TestK{{.*}}() [[ATTRS:#[0-9]+]]
// CHECK: attributes [[ATTRS]] = {{.*}}"target-features"="+sse,+sse2"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised we don't have an existing test to validate target-features for SPIR targets somewhere. I looked, but I wasn't able to find one. This change isn't specific to SYCL, so this test could be generalized. There are quite a few tests for other targets that validate target-features under clang/test/Driver.

15 changes: 15 additions & 0 deletions sycl/include/sycl/stl_wrappers/complex
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,21 @@

#pragma once

// Provide __isa_available for MSVC device code BEFORE including STL headers.
// Must come before #include_next so our definition is seen first.
#if defined(__SYCL_DEVICE_ONLY__) && defined(_MSC_VER)
// VS2026 STL headers use __isa_available (a runtime global variable) to
// detect CPU features: `if (__isa_available >= _Stl_isa_available_avx2) ...`.
// SYCL device code cannot access host runtime globals, so provide a device-
// side definition. The VALUE of this variable only steers the STL's runtime
// feature dispatch — both branches of the dispatch compile either way. We
// pick __ISA_AVAILABLE_X86 (== 0, the baseline in <isa_availability.h>),
// which matches a spir64 device's reality (no x86 ISA), and so selects the
// STL's scalar fallback paths if these dispatches are ever reached.
extern "C" int __isa_available __attribute__((sycl_global_var))
__attribute__((weak)) = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a VS2026 installation available to look at. How is __isa_available declared in <isa_availability.h>? Is its declaration protected by a guard macro? Can we substitute our own header for <isa_availability.h> that defines this variable as constexpr with a value of 0?

@cperkinsintel cperkinsintel May 28, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. __isa_available isn't actually declared in <isa_availability.h>. It's declared inline in stl/inc/__msvc_bit_utils.hpp as:

extern "C" {
extern int __isa_available;
}

with no guard macro. pulls this in. So substituting <isa_availability.h> doesn't intercept the declaration, and shadowing __msvc_bit_utils.hpp to redefine it feels heavier than just providing a definition the way this PR does.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like Microsoft declares __isa_available within the std namespace. We presumably should too.

I imagine this could be a problem outside of just the <complex> header. A quick search of __msvc_bit_utils.hpp indicates it is included by <bit>, <vector>, <numeric>, and <bitset> as well as anything that includes <__msvc_int128.hpp>. Ideally, I think we would predefine this variable (either within the compiler or, preferably, in a pre-included header file. I don't think we have a solution for that at the moment though.

#endif // defined(__SYCL_DEVICE_ONLY__) && defined(_MSC_VER)

// Include real STL <complex> header - the next one from the include search
// directories.
#if defined(__has_include_next)
Expand Down
Loading