
Fix NPU segfault with two concurrent inference pipelines#795

Open
oonyshch wants to merge 2 commits into main from fix/ITEP-88326-npu-concurrent-pipelines

Conversation

@oonyshch
Contributor

@oonyshch oonyshch commented Apr 21, 2026

Description

When two gvadetect pipelines target device=NPU simultaneously, the NPU VCL compiler crashes with ConvertVPUMI37XX2ELF failed: bad optional access, followed by SIGSEGV.
The root cause is a race condition in ov::Core::compile_model().
A secondary issue is that gvafpscounter accesses its global fps_counters map without holding channels_mutex in several functions, which is a data race when multiple pipelines run in the same process.

Note: batch sizes of 16 and 32 still fail due to an NPU VCL compiler limitation (the ConvertVPUMI37XX2ELF pass), but this PR enables batch sizes up to 8 for the YOLO model.

Reproduction

Single pipeline:

MODEL_PATH=/path/to/yolo11s/INT8/yolo11s.xml
VIDEO_PATH=/path/to/1192116-sd_640_360_30fps.mp4

gst-launch-1.0 \
  filesrc location=$VIDEO_PATH \
  ! qtdemux ! h264parse ! vaapidecodebin \
  ! vapostproc ! vapostproc \
  ! 'video/x-raw(memory:VAMemory)' \
  ! vapostproc \
  ! 'video/x-raw(memory:VAMemory)' \
  ! gvadetect model=$MODEL_PATH pre-process-backend=va device=NPU batch-size=8 nireq=6 \
  ! gvafpscounter \
  ! fakesink sync=false

Double pipeline:

MODEL_PATH=/path/to/yolo11s/INT8/yolo11s.xml
VIDEO_PATH=/path/to/1192116-sd_640_360_30fps.mp4

gst-launch-1.0 \
  filesrc location=$VIDEO_PATH \
  ! qtdemux ! h264parse ! vaapidecodebin \
  ! vapostproc ! vapostproc \
  ! 'video/x-raw(memory:VAMemory)' \
  ! vapostproc \
  ! 'video/x-raw(memory:VAMemory)' \
  ! gvadetect model=$MODEL_PATH pre-process-backend=va device=NPU batch-size=8 nireq=6 \
  ! gvafpscounter \
  ! fakesink sync=false \
  filesrc location=$VIDEO_PATH \
  ! qtdemux ! h264parse ! vaapidecodebin \
  ! vapostproc ! vapostproc \
  ! 'video/x-raw(memory:VAMemory)' \
  ! vapostproc \
  ! 'video/x-raw(memory:VAMemory)' \
  ! gvadetect model=$MODEL_PATH pre-process-backend=va device=NPU batch-size=8 nireq=6 \
  ! gvafpscounter \
  ! fakesink sync=false

Metrics:

| batch-size | Single pipeline | Dual in-process |
| ---------- | --------------- | --------------- |
| 1          | 90.49 fps       | 88.95 fps       |
| 2          | 29.61 fps       | 29.21 fps       |
| 4          | 31.18 fps       | 31.05 fps       |
| 8          | 26.22 fps       | 25.77 fps       |

How Has This Been Tested?

Locally on ARL-H with Intel Core Ultra 9 285H, NPU driver v1.30, OpenVINO 2026.1.

Checklist:

  • I agree to use the MIT license for my code changes.
  • I have not introduced any 3rd party components incompatible with MIT.
  • I have not included any company confidential information, trade secret, password or security token.
  • I have performed a self-review of my code.

@oonyshch oonyshch marked this pull request as draft April 21, 2026 23:09
@oonyshch oonyshch force-pushed the fix/ITEP-88326-npu-concurrent-pipelines branch from 0d3d675 to 13d69e0 Compare April 21, 2026 23:29
@oonyshch oonyshch marked this pull request as ready for review April 21, 2026 23:32
@oonyshch oonyshch force-pushed the fix/ITEP-88326-npu-concurrent-pipelines branch 2 times, most recently from ebee3f4 to fb8d164 Compare April 22, 2026 09:29
Comment thread src/gst/elements/gvafpscounter/fpscounter_c.cpp
Contributor

@walidbarakat walidbarakat left a comment


Overall, I think we shouldn't handle the batch size ourselves, since the NPU plugin is already taking care of it:

NPU plugin batching


I understand that when users don't reuse the same model-instance-id (which goes against the DLStreamer performance guide), concurrent access to NPU plugin compilation can be an issue. But why do we need the locks in the FPS counter functionality?

Another question: how was this bug discovered? I tried to run these repro pipelines on my ARL machine and got no reproduction.

Note: please keep the commits atomic and focused on the change needed to resolve the bug.

if (_app_context) {
    try {
        dlstreamer::D3D11ContextPtr d3d11_ctx = dlstreamer::D3D11Context::create(_app_context);
        std::lock_guard<std::mutex> lock(compile_mutex());
Contributor


Why do we need to create a lock for the remote context for GPU?

if (_app_context) {
    try {
        dlstreamer::VAAPIContextPtr vaapi_ctx = dlstreamer::VAAPIContext::create(_app_context);
        std::lock_guard<std::mutex> lock(compile_mutex());
Contributor


Why do we need to create a lock for the remote context for GPU?

Comment on lines +1082 to +1090
std::lock_guard<std::mutex> lock(compile_mutex());
if (_openvino_context) {
    _compiled_model = core().compile_model(_model, _openvino_context->remote_context(), ov_params);
} else {
    std::string formatted_device = _device;
    if (_batch_timeout > -1) {
        formatted_device = fmt::format("BATCH:{}({})", _device, _auto_batch_num_requests);
    }
    _compiled_model = core().compile_model(_model, formatted_device, ov_params);
Contributor


Here the lock is taken for every device and mode; please narrow it down to NPU only.

_batch_size = core().get_property(config.device(), ov::optimal_batch_size);
} catch (...) {
_batch_size = 1; // Fallback if optimal batch size property is not supported
_batch_size = 1;
Contributor


Why delete the comment here?

Serialize ov::Core::compile_model via compile_mutex to prevent NPU
plugin init races. Guard gvafpscounter global state with channels_mutex.