Description
Initially reported by @slawekptak
Describe the bug
Setting an external event on an in-order queue that is not recording, using an event from a queue that is recording, should transition the target queue into recording mode. Instead, it results in a deadlock. The deadlock occurs because the implementation attempts to acquire the queue lock twice. This should be fixed so that graphs are compatible with this extension.
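To illustrate the class of bug described above, here is a minimal sketch of a double-lock hazard and one common way to avoid it. It is not the DPC++ runtime code: `ToyQueue`, `setExternalEvent`, `beginRecording`, `beginRecordingLocked` and the member names are all hypothetical, and the actual fix in the runtime may look quite different. The idea is that a public method takes the lock once and then calls only helpers that assume the lock is already held.

```cpp
#include <mutex>

// Hypothetical stand-in for a queue implementation; NOT the DPC++ runtime.
class ToyQueue {
public:
  // Public entry point: takes the lock once, then uses only *Locked helpers.
  void setExternalEvent(bool EventIsFromRecordingQueue) {
    std::lock_guard<std::mutex> Guard(MMutex);
    MHasExternalEvent = true;
    if (EventIsFromRecordingQueue) {
      // Calling the public beginRecording() here would lock MMutex a second
      // time on the same thread. std::mutex is not recursive, so that is
      // undefined behavior and in practice usually hangs.
      beginRecordingLocked();
    }
  }

  // Public entry point for callers that do not hold the lock.
  void beginRecording() {
    std::lock_guard<std::mutex> Guard(MMutex);
    beginRecordingLocked();
  }

  bool isRecording() {
    std::lock_guard<std::mutex> Guard(MMutex);
    return MRecording;
  }

private:
  // Precondition: the caller already holds MMutex.
  void beginRecordingLocked() { MRecording = true; }

  std::mutex MMutex;
  bool MRecording = false;
  bool MHasExternalEvent = false;
};
```

With this split, calling `setExternalEvent(true)` on a fresh `ToyQueue` returns normally and `isRecording()` then reports `true`. Switching to `std::recursive_mutex` would also avoid the hang, but the locked-helper split keeps the locking discipline explicit.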
To reproduce
The code snippet below reproduces the issue. It hangs during the submission to Q2.
// Compile with: clang++ -fsycl transitive_set_external_event.cpp -o transitive_set_external_event
#include <sycl/sycl.hpp>
#include <cassert>
#include <iostream>
#include <numeric>
#include <vector>
using namespace sycl;
namespace exp_ext = ext::oneapi::experimental;
int main() {
  constexpr size_t Size = 128;
  device Dev = device::get_devices()[0];
  context Ctx{Dev};
  queue Q1{Ctx, Dev, {property::queue::in_order{}}};
  queue Q2{Ctx, Dev, {property::queue::in_order{}}};
  std::vector<int> HostA(Size), HostB(Size);
  std::iota(HostA.begin(), HostA.end(), 1);
  std::iota(HostB.begin(), HostB.end(), 100);
  int *A = malloc_device<int>(Size, Q1);
  int *B = malloc_device<int>(Size, Q1);
  Q1.copy(HostA.data(), A, Size);
  Q1.copy(HostB.data(), B, Size);
  Q1.wait_and_throw();
  exp_ext::command_graph Graph{Ctx, Dev};
  // Begin recording on Q1
  Graph.begin_recording(Q1);
  // Submit a small kernel on Q1 that increments A
  auto E1 = Q1.submit([&](handler &h) {
    h.parallel_for(range<1>{Size}, [=](id<1> i) { A[i] += 1; });
  });
  // Set external event to depend on E1 for Q2.
  Q2.ext_oneapi_set_external_event(E1);
  // Submissions to Q2 should be considered part of the same graph due to
  // the external event linking into recording mode. Submit a kernel on Q2
  // that multiplies B by 2 and depends implicitly on the external event.
  auto E2 = Q2.submit([&](handler &h) {
    h.parallel_for(range<1>{Size}, [=](id<1> i) { B[i] *= 2; });
  });
  Graph.end_recording(Q1);
  // Finalize the graph into an executable
  auto Exec = Graph.finalize();
  Q1.ext_oneapi_graph(Exec);
  Q1.wait_and_throw();
  // Copy results back to host for verification
  std::vector<int> OutA(Size), OutB(Size);
  Q1.copy(A, OutA.data(), Size);
  Q1.copy(B, OutB.data(), Size);
  Q1.wait_and_throw();
  // Verify expected result computed on host
  for (size_t i = 0; i < Size; ++i) {
    int expectedA = HostA[i] + 1;
    int expectedB = HostB[i] * 2;
    if (OutA[i] != expectedA || OutB[i] != expectedB) {
      std::cerr << "Mismatch at " << i << ": got (" << OutA[i] << ", "
                << OutB[i] << ") expected (" << expectedA
                << ", " << expectedB << ")\n";
      return 1;
    }
  }
  // Cleanup
  free(A, Q1);
  free(B, Q1);
  std::cout << "PASS\n";
  return 0;
}
Environment
- DPC++ version: produced with 29435fc
- Other environment details are not relevant to reproducing the deadlock
Additional context
No response