ESIMD failures if the SYCL functor isn't decomposed #18317

Open
Naghasan opened this issue May 5, 2025 · 0 comments
Labels: bug (Something isn't working), esimd (Explicit SIMD feature)

Naghasan commented May 5, 2025

Describe the bug

Noticed while working on #18258. Some ESIMD tests fail if the functor isn't decomposed. The only difference in the emitted code is that the USM pointer never goes through an address space cast to generic; it is passed directly as a pointer to generic instead.

The patch implements a flag that forces the functor not to be decomposed, but the behaviour can also be reproduced by simply wrapping the kernel in a lambda.

Affected tests:

SYCL :: ESIMD/lsc/lsc_load_store_2d_smoke.cpp
SYCL :: ESIMD/unified_memory_api/scatter_usm.cpp
SYCL :: ESIMD/usm_gather_scatter_rgba.cpp
SYCL :: ESIMD/usm_gather_scatter_rgba_64.cpp
SYCL :: ESIMD/vadd_raw_send_gen12.cpp

To reproduce

If the patch has not been merged, wrap the kernel in a lambda. For instance, extracting the base test of scatter_usm.cpp:

       auto kernel = [=](sycl::nd_item<1> ndi) SYCL_ESIMD_KERNEL {
         uint16_t GlobalID = ndi.get_global_id(0);
         simd<int32_t, NOffsets> ByteOffsets(GlobalID * N * sizeof(T),
                                             VS * sizeof(T));
         auto ByteOffsetsView = ByteOffsets.template select<NOffsets, 1>();
         simd<T, N> Vals = gather<T, N, VS>(Out, ByteOffsets);
         Vals *= 2;
         auto ValsView = Vals.template select<N, 1>();
         simd_mask<NOffsets> Pred = 0;
         for (int I = 0; I < NOffsets; I++) {
           Pred[I] = (I % MaskStride == 0) ? 1 : 0;
           if (GlobalID % 4 == 0)
             scatter(Out, ByteOffsets, Vals);
           else if (GlobalID % 4 == 1)
             scatter(Out, ByteOffsetsView, Vals);
           else if (GlobalID % 4 == 2)
             scatter<T, N>(Out, ByteOffsets, ValsView);
           else if (GlobalID % 4 == 3)
             scatter<T, N>(Out, ByteOffsetsView, ValsView);
         }
       };
       cgh.parallel_for(Range, [=](sycl::nd_item<1> ndi)
                                   SYCL_ESIMD_KERNEL { kernel(ndi); });

This forces the kernel parameter to be passed as a single packed struct, <{ ptr addrspace(4), i32, [4 x i8] }>, rather than as separate decomposed arguments.
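For readers unfamiliar with that struct type, here is a rough host-only sketch (plain C++, no SYCL or ESIMD) of where such a layout could come from. It assumes the inner lambda captures exactly one pointer and one 32-bit integer; `InnerClosure`, `WrapperClosure`, `tailPadding` and `runWrapped` are hypothetical names used for illustration only, and real closure layouts are up to the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the closure of the inner `kernel` lambda.
// Assumption: it captures one USM pointer and one 32-bit value.
struct InnerClosure {
  int32_t *Out;        // would correspond to `ptr` in the struct type
  uint32_t MaskStride; // would correspond to `i32`
  // The compiler pads the tail so the size is a multiple of the pointer
  // alignment; on a typical 64-bit target that is 4 bytes.
};

// Hypothetical stand-in for the wrapping lambda's closure: its only capture
// is the inner closure, so the argument is the aggregate as a whole.
struct WrapperClosure {
  InnerClosure Kernel;
};

// Number of tail-padding bytes in InnerClosure on the current target.
constexpr std::size_t tailPadding() {
  return sizeof(InnerClosure) - sizeof(int32_t *) - sizeof(uint32_t);
}

// Host analogue of the reproducer: the work is done by an inner lambda that
// is only ever called through a wrapping lambda.
inline int32_t runWrapped(int32_t *Out, uint32_t MaskStride, uint32_t Id) {
  assert(MaskStride != 0 && "MaskStride is used as a divisor");
  auto kernel = [=](uint32_t I) {
    if (I % MaskStride == 0)
      Out[I] *= 2; // doubles the element, like `Vals *= 2` in the test
  };
  auto wrapper = [=](uint32_t I) { kernel(I); }; // captures `kernel` by value
  wrapper(Id);
  return Out[Id];
}
```

On a typical 64-bit target `tailPadding()` evaluates to 4, which would line up with the trailing `[4 x i8]` member. This only illustrates the shape of the argument; it does not reproduce the ESIMD failure itself.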

Environment

  • OS: [e.g Windows/Linux]
  • Target device and vendor: [e.g. Intel GPU]
  • DPC++ version: [e.g. commit hash or output of clang++ --version]
  • Dependencies version: [e.g. the output of sycl-ls --verbose]

Additional context

No response
