Fix NPU segfault with two concurrent inference pipelines #795
Conversation
**walidbarakat** left a comment:
Overall, I think we shouldn't handle the batch size ourselves, as the NPU plugin already takes care of it.

I understand that if the user doesn't reuse the same model-instance-id (which goes against the DLS performance guide), concurrent access to NPU plugin compilation can be an issue. But why do we need the locks in the FPS counter functionality?

Another question: how was this bug discovered? I tried to run these repro pipelines on my ARL machine and got no reproduction.

Note: please keep the commit atomic, focused on the change needed to resolve the bug.
```cpp
if (_app_context) {
    try {
        dlstreamer::D3D11ContextPtr d3d11_ctx = dlstreamer::D3D11Context::create(_app_context);
        std::lock_guard<std::mutex> lock(compile_mutex());
```
Why do we need to create a lock for the GPU remote context here?
```cpp
if (_app_context) {
    try {
        dlstreamer::VAAPIContextPtr vaapi_ctx = dlstreamer::VAAPIContext::create(_app_context);
        std::lock_guard<std::mutex> lock(compile_mutex());
```
Why do we need to create a lock for the GPU remote context here?
```cpp
std::lock_guard<std::mutex> lock(compile_mutex());
if (_openvino_context) {
    _compiled_model = core().compile_model(_model, _openvino_context->remote_context(), ov_params);
} else {
    std::string formatted_device = _device;
    if (_batch_timeout > -1) {
        formatted_device = fmt::format("BATCH:{}({})", _device, _auto_batch_num_requests);
    }
    _compiled_model = core().compile_model(_model, formatted_device, ov_params);
```
Here the lock is taken for every device and mode; please narrow it down to NPU only.
```diff
     _batch_size = core().get_property(config.device(), ov::optimal_batch_size);
 } catch (...) {
-    _batch_size = 1; // Fallback if optimal batch size property is not supported
+    _batch_size = 1;
```
Why delete the comment here?
Serialize ov::Core::compile_model via compile_mutex to prevent NPU plugin init races. Guard gvafpscounter global state with channels_mutex.
Description
When two `gvadetect` pipelines target `device=NPU` simultaneously, the NPU VCL compiler crashes with `ConvertVPUMI37XX2ELF failed: bad optional access` followed by SIGSEGV. The root cause is a race condition in `ov::Core::compile_model()`.

A secondary issue is that `gvafpscounter` accesses its global `fps_counters` map without holding `channels_mutex` in several functions, which is a data race when multiple pipelines run in the same process.

Note:
`batch-size` of 16 or 32 still fails due to an NPU VCL compiler limitation (the `ConvertVPUMI37XX2ELF` pass), but this PR enables support for batch sizes up to 8 for the YOLO model.

Reproduction
Single pipeline:
Double pipeline:
Metrics:
How Has This Been Tested?
Locally on ARL-H with Intel Core Ultra 9 285H, NPU driver v1.30, OpenVINO 2026.1.
Checklist: