[GPU][WIP] Add L0 runtime support #30789
base: master
build_jenkins
* Removed OpenCL from the targets openvino_intel_gpu_kernels and openvino_intel_gpu_runtime
* OpenCL will still be linked to the OpenVINO GPU plugin, even with the L0 runtime
    url = https://github.com/herumi/xbyak_riscv.git
[submodule "src/plugins/intel_gpu/thirdparty/l0_onednn_gpu"]
    path = src/plugins/intel_gpu/thirdparty/l0_onednn_gpu
    url = https://github.com/jkasprza/oneDNN.git
For the record: I assume we will have a single copy of onednn_gpu. Please update that before this is merged.
Need to wait for uxlfoundation/oneDNN#4499 to be merged first.
bool has_separate_cache;   ///< Does the target hardware have a separate cache for usm_device and usm_host
bool supports_cp_offload;  ///< [L0] Does the command queue support copy offload
bool supports_cb_events;   ///< [L0] Does the target runtime support counter-based events
nit: cb_events sounds ambiguous because cb usually means callback.
Renamed in 8c9e855
    _profiling_info.clear();
}
// Set event profiling data instead of retrieving it from the event object
void set_profiling(uint64_t duration_nsec) {
What about set_profiling_duration instead? I think the current name is hard to understand.
Renamed in d377c45
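For illustration, a minimal sketch of what the renamed setter could look like, assuming a simplified event class (sketch_event, get_duration_nsec and query_runtime_duration are hypothetical names, not the plugin's API): an explicitly set duration takes precedence over querying the runtime event.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified event used only to illustrate the naming.
class sketch_event {
public:
    // Record a duration measured outside the event object.
    void set_profiling_duration(uint64_t duration_nsec) {
        m_duration_nsec = duration_nsec;
    }

    // Prefer the explicitly set duration; otherwise fall back to the runtime.
    uint64_t get_duration_nsec() const {
        if (m_duration_nsec.has_value())
            return *m_duration_nsec;
        return query_runtime_duration();
    }

private:
    // Placeholder for a real query, e.g. reading kernel timestamps.
    uint64_t query_runtime_duration() const { return 0; }

    std::optional<uint64_t> m_duration_nsec;
};
```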
#include <utility>
#include <functional>
#include <optional>
nit: do we need this header?
Removed in d377c45
QueueTypes stream::detect_queue_type(engine_types engine_type, void* queue_handle) {
    switch (engine_type) {
        case engine_types::sycl:
I think you need create_surfaces_lock for sycl_stream as well?
Here sycl_stream inherits create_surfaces_lock from ocl_stream, so it creates an ocl::ocl_surfaces_lock for sycl_stream the same way it did before these changes.
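A minimal sketch of that inheritance, using hypothetical stand-in types (the real classes have many more members): because sycl_stream does not override create_surfaces_lock, a call through the base stream interface resolves to the ocl_stream implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for the plugin's surfaces-lock types.
struct surfaces_lock {
    virtual ~surfaces_lock() = default;
    virtual std::string kind() const = 0;
};

struct ocl_surfaces_lock : surfaces_lock {
    std::string kind() const override { return "ocl"; }
};

// Hypothetical, trimmed-down stream hierarchy.
struct stream {
    virtual ~stream() = default;
    virtual std::unique_ptr<surfaces_lock> create_surfaces_lock() const = 0;
};

struct ocl_stream : stream {
    std::unique_ptr<surfaces_lock> create_surfaces_lock() const override {
        return std::make_unique<ocl_surfaces_lock>();
    }
};

// No override here: sycl_stream inherits ocl_stream::create_surfaces_lock.
struct sycl_stream : ocl_stream {};
```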
}

void ze_event::wait_impl() {
    OV_ZE_EXPECT(zeEventHostSynchronize(m_event, default_timeout));
What about endless_wait instead of default_timeout, to make the intent explicit?
Renamed in 5ccd52c
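A small sketch of such a named constant, based on the Level Zero semantics of the zeEventHostSynchronize timeout argument (0 only queries the status, UINT64_MAX blocks until the event is signaled or the device is lost). The wait_mode enum and classify_timeout helper are hypothetical and only document that interpretation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Passing this as the timeout to zeEventHostSynchronize waits indefinitely.
constexpr uint64_t endless_wait = std::numeric_limits<uint64_t>::max();

// Hypothetical helper describing how a timeout value is interpreted.
enum class wait_mode { poll, bounded, endless };

constexpr wait_mode classify_timeout(uint64_t timeout_ns) {
    return timeout_ns == 0              ? wait_mode::poll     // status query only
         : timeout_ns == endless_wait   ? wait_mode::endless  // block until signaled
                                        : wait_mode::bounded; // wait up to timeout_ns
}
```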
// Ensure event handle is not null
if (ev == nullptr) {
    OPENVINO_THROW("[GPU] Trying to create event with null handle");
}
nit: what about OPENVINO_ASSERT?
Done in 8621d68
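A self-contained sketch of the assert-style pattern, using a hypothetical SKETCH_ASSERT macro as a stand-in for OPENVINO_ASSERT so it compiles without OpenVINO headers: the condition is checked and an exception carrying a message is thrown on failure, replacing the explicit if/throw block.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an assert-style macro such as OPENVINO_ASSERT.
#define SKETCH_ASSERT(cond, msg)                                \
    do {                                                        \
        if (!(cond)) {                                          \
            std::ostringstream oss;                             \
            oss << "Check '" #cond "' failed: " << msg;         \
            throw std::runtime_error(oss.str());                \
        }                                                       \
    } while (0)

// Usage in the spirit of the reviewed constructor check.
inline void check_event_handle(const void* ev) {
    SKETCH_ASSERT(ev != nullptr, "[GPU] Trying to create event with null handle");
}
```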
void set_impl() override;
bool is_set_impl() override;
// TODO: Implement add_event_handler_impl
// bool add_event_handler_impl(event_handler, void*) override;
Not sure, but is it required?
ze_event_pool_flags_t flags = is_profiling_enabled() ? ZE_EVENT_POOL_FLAG_KERNEL_TIMESTAMP : 0;
flags |= ZE_EVENT_POOL_FLAG_HOST_VISIBLE;
m_current_pool = std::make_shared<ze_event_pool>(m_engine, m_capacity, flags);
}
This assumes single-threaded execution, right? Isn't some synchronization necessary here?
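If events can be created from several threads, one option is to make the "pool is full, create a new pool, take a slot" sequence atomic with a mutex. A minimal sketch under that assumption, with hypothetical sketch_pool and sketch_event_factory types standing in for ze_event_pool and ze_event_factory:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical pool standing in for ze_event_pool: it only records its capacity.
struct sketch_pool {
    explicit sketch_pool(uint32_t capacity) : capacity(capacity) {}
    uint32_t capacity;
};

// Hypothetical factory: each allocation takes the next slot of the current
// pool, and a new pool is created when the current one is full. The mutex
// makes that check-replace-take sequence atomic across threads.
class sketch_event_factory {
public:
    explicit sketch_event_factory(uint32_t capacity) : m_capacity(capacity) {}

    // Returns the pool owning the new event and the event's index in that pool.
    std::pair<std::shared_ptr<sketch_pool>, uint32_t> allocate() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_current_pool || m_next_index == m_capacity) {
            m_current_pool = std::make_shared<sketch_pool>(m_capacity);
            m_next_index = 0;
            ++m_pools_created;
        }
        return {m_current_pool, m_next_index++};
    }

    uint32_t pools_created() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_pools_created;
    }

private:
    mutable std::mutex m_mutex;
    uint32_t m_capacity;
    uint32_t m_next_index = 0;
    uint32_t m_pools_created = 0;
    std::shared_ptr<sketch_pool> m_current_pool;
};
```

Each returned pair keeps its pool alive through the shared_ptr, so replacing m_current_pool does not invalidate events already handed out.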
struct ze_event_factory : public ze_base_event_factory {
public:
    ze_event_factory(const ze_engine &engine, bool enable_profiling, uint32_t capacity = 255);
    event::ptr create_event(uint64_t queue_stamp) override;
What does queue_stamp mean here? Is it related to the queue? Does it mean a unique ID of the event? If so, what about event_stamp?
Here queue_stamp is analogous to the existing queue_stamp in ocl_base_event. It is a value provided by the stream when creating events and is used to order events, for example in ocl_stream::sync_events. It should be unique among events created by the same stream. I do not see a variable named event_stamp anywhere in the code.
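A minimal sketch of how per-stream stamps can order events, assuming stamps come from a monotonically increasing counter in the stream and that a barrier covers all work submitted before it. The types and the needs_barrier logic are hypothetical simplifications, not the actual ocl_stream::sync_events code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event carrying the stamp assigned by the stream that created it.
struct stamped_event {
    uint64_t queue_stamp;
};

// Hypothetical stream: each event gets the next counter value, so stamps are
// unique within the stream and reflect submission order.
class sketch_stream {
public:
    stamped_event create_event() { return stamped_event{++m_queue_counter}; }

    // True if any dependency was submitted after the last barrier.
    bool needs_barrier(const std::vector<stamped_event>& deps) const {
        return std::any_of(deps.begin(), deps.end(), [this](const stamped_event& e) {
            return e.queue_stamp > m_last_barrier;
        });
    }

    // A barrier orders everything submitted so far.
    void enqueue_barrier() { m_last_barrier = m_queue_counter; }

private:
    uint64_t m_queue_counter = 0;
    uint64_t m_last_barrier = 0;
};
```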
ze_event_handle_t m_last_event = nullptr;
std::vector<event::ptr> m_events;
const ze_engine &m_engine;
nit: please use the same naming convention as ocl_event.
auto mem_properties = std::find_if(device_memory_properties.begin(), device_memory_properties.end(), [](const ze_device_memory_properties_t& p) {
    auto name = std::string(p.name);
    return name == "DDR" || name == "HBM";
- Out of curiosity, what names can it report other than DDR and HBM? For example, is there a property like "SRAM"?
- Am I correct that Intel GPUs have only one type of memory?
I am not sure here. So far, I have only encountered a single type, either DDR or HBM.
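A self-contained sketch of the lookup, with a hypothetical sketch_memory_properties struct standing in for ze_device_memory_properties_t (only name and totalSize are kept). Unlike the snippet under review, it returns nullptr when no entry is named DDR or HBM, so the caller can handle devices that report other names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical, trimmed-down stand-in for ze_device_memory_properties_t.
struct sketch_memory_properties {
    char name[256];
    uint64_t totalSize;
};

// Returns the first entry named "DDR" or "HBM", or nullptr if none matches.
inline const sketch_memory_properties* find_main_memory(
        const std::vector<sketch_memory_properties>& props) {
    auto it = std::find_if(props.begin(), props.end(), [](const sketch_memory_properties& p) {
        const std::string name(p.name);
        return name == "DDR" || name == "HBM";
    });
    return it == props.end() ? nullptr : &*it;
}
```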
    }
    _mapped_ptr = nullptr;
}
}
This code is very similar to the code in ocl_memory.cpp. Could you refactor it to reduce duplication?
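One possible shape for such a refactor, as a hypothetical sketch (mapped_region and its callbacks are not existing plugin types): the reference-counted lock/unlock logic lives in one helper, and each backend only supplies its own map and unmap callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper shared by OpenCL- and Level Zero-backed memory objects.
// It reference-counts lock()/unlock() and calls the backend-specific map
// callback on the first lock and the unmap callback on the last unlock.
class mapped_region {
public:
    using map_fn = std::function<void*()>;
    using unmap_fn = std::function<void(void*)>;

    mapped_region(map_fn map, unmap_fn unmap)
        : m_map(std::move(map)), m_unmap(std::move(unmap)) {}

    void* lock() {
        std::lock_guard<std::mutex> guard(m_mutex);
        if (m_lock_count++ == 0)
            m_mapped_ptr = m_map();  // first user maps the memory
        return m_mapped_ptr;
    }

    void unlock() {
        std::lock_guard<std::mutex> guard(m_mutex);
        if (m_lock_count == 0)
            return;  // unbalanced unlock: nothing is mapped
        if (--m_lock_count == 0) {
            m_unmap(m_mapped_ptr);  // last user unmaps the memory
            m_mapped_ptr = nullptr;
        }
    }

private:
    std::mutex m_mutex;
    std::size_t m_lock_count = 0;
    void* m_mapped_ptr = nullptr;
    map_fn m_map;
    unmap_fn m_unmap;
};
```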
        ev_ze,
        ze_dep_events.size(),
        ze_dep_events.data()));
// FIXME: when not blocking pattern goes out of scope
What does this mean?
A pointer to the local variable pattern is passed to zeCommandListAppendMemoryFill. To avoid a use-after-free, the memory fill must finish executing before this variable is destroyed.
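A minimal sketch of the lifetime issue and one way around it, using a hypothetical deferred_queue that runs appended commands later (as a device may read the fill pattern after the append call has returned). Since the L0 fill API takes the pattern by pointer, a stable address is needed; here the pattern is copied into heap storage owned by the queued command instead of pointing at a local. The other option, described in the reply above, is to wait for the fill to complete before the local goes out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical deferred command list: appended commands run only on execute().
class deferred_queue {
public:
    void append(std::function<void()> cmd) { m_cmds.push_back(std::move(cmd)); }

    void execute() {
        for (auto& cmd : m_cmds)
            cmd();
        m_cmds.clear();
    }

private:
    std::vector<std::function<void()>> m_cmds;
};

// Non-blocking fill: the pattern lives in a shared_ptr captured by the queued
// command, so it outlives this function's stack frame. Capturing the address
// of a local `uint8_t pattern` instead would dangle once this function returns.
// Note: dst must still stay alive (and not reallocate) until execute() runs.
inline void fill_async(deferred_queue& q, std::vector<uint8_t>& dst, uint8_t value) {
    if (dst.empty())
        return;
    auto pattern = std::make_shared<uint8_t>(value);
    uint8_t* out = dst.data();
    std::size_t size = dst.size();
    q.append([pattern, out, size]() { std::memset(out, *pattern, size); });
}
```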
I finished the first round of review. Could you check my comments? I see code duplication in many places; could you check whether it can be reduced?
Details:
Use -DGPU_RT_TYPE=L0 with CMake to build with L0 support. The current default is OCL.