[SYCL][Graph] Deadlock when using ext_oneapi_set_external_event to transition queue into recording mode #20563

@mmichel11

Initially reported by @slawekptak

Describe the bug

Calling ext_oneapi_set_external_event on an in-order queue that is not recording, with an event from a queue that is recording, should transition the target queue into recording mode. Instead, it results in a deadlock. The deadlock occurs because the implementation attempts to acquire the queue lock twice. This should be fixed so that graphs are compatible with this extension.
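
As a rough illustration of the failure mode described above, the sketch below uses a hypothetical ToyQueue class. It is not the DPC++ implementation, and every name in it is invented for this example. It assumes the transition into recording mode is triggered from inside a submission that already holds a non-recursive std::mutex. Re-entering a public method that locks the same mutex would hang, so the sketch shows one common way to avoid that: the public method takes the lock once and delegates to a private helper that expects the lock to be held already.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a queue; NOT the DPC++ queue implementation.
class ToyQueue {
public:
  // Marks that the next submission should transition into recording mode.
  void setExternalEvent() {
    std::lock_guard<std::mutex> Guard(Mutex);
    PendingTransition = true;
  }

  // Public entry point: locks, then delegates to the *Locked helper.
  void beginRecording() {
    std::lock_guard<std::mutex> Guard(Mutex);
    beginRecordingLocked();
  }

  // Takes the lock exactly once for the whole submission.
  void submit() {
    std::lock_guard<std::mutex> Guard(Mutex);
    if (PendingTransition) {
      // Calling beginRecording() here would try to lock Mutex again on the
      // same thread; for std::mutex that is undefined behavior and
      // typically hangs. The *Locked helper avoids the second acquisition.
      beginRecordingLocked();
      PendingTransition = false;
    }
    if (Recording)
      ++RecordedNodes;
  }

  bool isRecording() {
    std::lock_guard<std::mutex> Guard(Mutex);
    return Recording;
  }

  int recordedNodes() {
    std::lock_guard<std::mutex> Guard(Mutex);
    return RecordedNodes;
  }

private:
  // Precondition: the caller already holds Mutex.
  void beginRecordingLocked() { Recording = true; }

  std::mutex Mutex;
  bool PendingTransition = false;
  bool Recording = false;
  int RecordedNodes = 0;
};
```

Whether the actual fix should split the locked and unlocked paths like this, or restructure where the lock is taken, depends on the real call chain, which this sketch does not attempt to reproduce.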

To reproduce

The code snippet below reproduces the issue; it hangs during the submission to Q2.

// Compile with: clang++ -fsycl transitive_set_external_event.cpp -o transitive_set_external_event
#include <sycl/sycl.hpp>

#include <cassert>
#include <iostream>
#include <numeric>
#include <vector>

using namespace sycl;
namespace exp_ext = ext::oneapi::experimental;

int main() {
  constexpr size_t Size = 128;

  device Dev = device::get_devices()[0];
  context Ctx{Dev};

  queue Q1{Ctx, Dev, {property::queue::in_order{}}};
  queue Q2{Ctx, Dev, {property::queue::in_order{}}};

  std::vector<int> HostA(Size), HostB(Size);
  std::iota(HostA.begin(), HostA.end(), 1);
  std::iota(HostB.begin(), HostB.end(), 100);

  int *A = malloc_device<int>(Size, Q1);
  int *B = malloc_device<int>(Size, Q1);

  Q1.copy(HostA.data(), A, Size);
  Q1.copy(HostB.data(), B, Size);
  Q1.wait_and_throw();

  exp_ext::command_graph Graph{Ctx, Dev};

  // Begin recording on Q1
  Graph.begin_recording(Q1);

  // Submit a small kernel on Q1 that increments A
  auto E1 = Q1.submit([&](handler &h) {
    h.parallel_for(range<1>{Size}, [=](id<1> i) { A[i] += 1; });
  });
 
  // Set external event to depend on E1 for Q2.
  Q2.ext_oneapi_set_external_event(E1);

  // Because the external event comes from a recording queue, Q2 should
  // transition into recording mode, so submissions to Q2 become part of
  // the same graph. Submit a kernel on Q2 that multiplies B by 2 and
  // implicitly depends on the external event.
  auto E2 = Q2.submit([&](handler &h) {
    h.parallel_for(range<1>{Size}, [=](id<1> i) { B[i] *= 2; });
  });
  
  Graph.end_recording(Q1);

  // Finalize the graph into an executable
  auto Exec = Graph.finalize();

  Q1.ext_oneapi_graph(Exec);
  Q1.wait_and_throw();

  // Copy results back to host for verification
  std::vector<int> OutA(Size), OutB(Size);
  Q1.copy(A, OutA.data(), Size);
  Q1.copy(B, OutB.data(), Size);
  Q1.wait_and_throw();

  // Verify expected result computed on host
  for (size_t i = 0; i < Size; ++i) {
    int expectedA = HostA[i] + 1;
    int expectedB = HostB[i] * 2;
    if (OutA[i] != expectedA || OutB[i] != expectedB) {
      std::cerr << "Mismatch at " << i << ": got (" << OutA[i] << ", "
                << OutB[i] << ") expected (" << expectedA
                << ", " << expectedB << ")\n";
      return 1;
    }
  }

  // Cleanup
  free(A, Q1);
  free(B, Q1);

  std::cout << "PASS\n";
  return 0;
}

Environment

  • DPC++ version: reproduced with 29435fc
  • Other environment details are not relevant to reproducing the deadlock

Additional context

No response
