
New D3D11 backend #796

Draft: yunowo wants to merge 3 commits into main from d3d11


@yunowo (Member) commented Apr 22, 2026

Description

Rewrites the D3D11 async preprocessing pipeline to replace the old custom D3D11Context/D3D11ImagePool/D3D11Converter stack with GStreamer's native GstD3D11 library APIs.

  • Architecture:

    • Replaced custom d3d11_converter wrapper with GstD3D11Converter
    • Replaced custom image pool + staging texture pool with GstD3D11Allocator-based texture pools
    • Use a single worker thread and a bounded queue for OpenVINO (OV) submission. A custom thread pool does not improve performance because of contention on the device lock.
  • Cross-device support:

    • Same D3D11 device: converter and textures live on the inference device, with direct texture access.
    • Different D3D11 devices: converter and SHARED-flagged textures live on the source device; the output is shared to the inference device via DXGI handles.
    • This enables the model-sharing scenario in the Python pipelines, where D3D11 devices (decoders) are not shared across pipelines.
  • Thread safety:

    • The D3D11 device lock is held around all GPU operations (convert, map, buffer unref) to prevent crashes caused by concurrent device access from other elements. This fixes the gvawatermark and videorate crashes.
  • inference_impl and base_inference fixes:

    • Prevent FlushInference from being called on EOS signals received at startup, and prevent a deadlock with PushOutput.
    • Fix queue element detection (strcmp replaced with g_str_has_prefix).
  • Test impact:

    • Detection results differ slightly from the old code. The old d3d11_converter had no color space configuration and used automatic processing (auto-brightness, auto-contrast, auto-color enhancement). GstD3D11Converter explicitly configures the input/output color space from GstVideoInfo and has automatic processing disabled. The ground truth (GT) needs to be updated.
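The same-device versus cross-device choice described under "Cross-device support" can be modeled as a small decision function. This is a minimal sketch, not the PR's implementation: `ConversionPlan` and `PlanConversion` are hypothetical names, and devices are represented by opaque pointers instead of `GstD3D11Device*`. It only captures which device owns the converter and textures, and whether the output has to be shared through a DXGI handle.

```cpp
// Hypothetical model of the device-selection rule; not the PR's code.
// Devices are opaque pointers standing in for GstD3D11Device*.
struct ConversionPlan {
    const void *converter_device;  // device that owns the converter and output textures
    bool shared_textures;          // output textures need the SHARED flag
    bool open_via_dxgi_handle;     // inference device opens the output via a DXGI handle
};

// Same device: convert on the inference device and access textures directly.
// Different devices: convert on the source (decoder) device into SHARED textures
// and hand them to the inference device through DXGI handles.
ConversionPlan PlanConversion(const void *source_device, const void *inference_device) {
    if (source_device == inference_device)
        return {inference_device, false, false};
    return {source_device, true, true};
}
```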
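The rule under "Thread safety" amounts to scoping every GPU call inside the device lock. The sketch below shows that pattern with an RAII guard. `Device`, `DeviceLockGuard`, and `WithDeviceLock` are hypothetical names, and a `std::mutex` stands in for the GstD3D11 device lock (which GStreamer exposes as `gst_d3d11_device_lock()` / `gst_d3d11_device_unlock()`).

```cpp
#include <mutex>
#include <utility>

// Hypothetical device wrapper; std::mutex stands in for the GstD3D11 device lock.
class Device {
public:
    void Lock() { mutex_.lock(); }
    void Unlock() { mutex_.unlock(); }
private:
    std::mutex mutex_;
};

// RAII guard: unlocks on scope exit, including when the guarded call throws.
class DeviceLockGuard {
public:
    explicit DeviceLockGuard(Device &device) : device_(device) { device_.Lock(); }
    ~DeviceLockGuard() { device_.Unlock(); }
    DeviceLockGuard(const DeviceLockGuard &) = delete;
    DeviceLockGuard &operator=(const DeviceLockGuard &) = delete;
private:
    Device &device_;
};

// Runs one GPU operation (convert, map, buffer unref, ...) with the lock held,
// so other users of the same device cannot interleave their calls with it.
template <typename Op>
decltype(auto) WithDeviceLock(Device &device, Op &&op) {
    DeviceLockGuard guard(device);
    return std::forward<Op>(op)();
}
```

Wrapping each operation, rather than locking and unlocking by hand, keeps the unlock on every exit path, which matters when a conversion or map call fails partway through.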
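The two inference_impl / base_inference fixes can be illustrated in isolation. This sketch rests on assumptions the PR does not spell out: that queue detection compares element names, which GStreamer auto-generates with a numeric suffix (for example `queue0`), so an exact `strcmp` against `"queue"` misses them while a prefix test like `g_str_has_prefix` matches; and that the EOS fix amounts to skipping the flush until at least one frame has been submitted. `IsQueueElementName` and `FlushGuard` are hypothetical names.

```cpp
#include <atomic>
#include <string_view>

// Prefix test, equivalent in spirit to g_str_has_prefix(name, "queue"):
// matches "queue", "queue0", "queue12", and so on.
bool IsQueueElementName(std::string_view name) {
    constexpr std::string_view kPrefix = "queue";
    return name.substr(0, kPrefix.size()) == kPrefix;
}

// Tracks whether any frame has reached inference. An EOS or flush-stop that
// arrives before that point (e.g. during startup) should not trigger a flush.
class FlushGuard {
public:
    void OnFrameSubmitted() { frames_submitted_.store(true, std::memory_order_release); }
    bool ShouldFlush() const { return frames_submitted_.load(std::memory_order_acquire); }
private:
    std::atomic<bool> frames_submitted_{false};
};
```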

Fixes # (issue)

Any Newly Introduced Dependencies

Please describe any newly introduced 3rd party dependencies in this change. List their name, license information and how they are used in the project.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Checklist:

  • I agree to use the MIT license for my code changes.
  • I have not introduced any 3rd party components incompatible with MIT.
  • I have not included any company confidential information, trade secret, password or security token.
  • I have performed a self-review of my code.

@yunowo added the windows label Apr 22, 2026
@yunowo force-pushed the d3d11 branch 3 times, most recently from a6bc3ae to ea2c39d, April 22, 2026 07:32
@ZiningLi requested a review from Copilot April 22, 2026 08:21
Copilot AI (Contributor) left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Reworks the Windows D3D11 async preprocessing/inference path to use GStreamer’s native GstD3D11 APIs (converter + allocator/texture pools), replacing the prior custom D3D11 wrapper stack and thread pool.

Changes:

  • Replaced custom D3D11 converter/image pool/thread pool stack with a GstD3D11Converter-based converter and GstD3D11Allocator texture pools.
  • Added cross-device conversion + shared-texture handle path and a single worker thread with a bounded submission queue.
  • Adjusted D3D11 image/tensor metadata to include subresource index and store GstD3D11Device* on images.
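The "single worker thread with a bounded submission queue" design can be sketched with standard C++ primitives. `BoundedWorker` is a hypothetical name and the jobs are plain callables; in the PR they would submit preprocessed frames to OpenVINO, which is not modeled here.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// One worker thread draining a bounded FIFO of jobs (hypothetical sketch).
// Producers block in Push() once `capacity` jobs are pending; capacity must be > 0.
class BoundedWorker {
public:
    explicit BoundedWorker(std::size_t capacity)
        : capacity_(capacity), worker_([this] { Run(); }) {}

    // Stops accepting jobs, runs whatever is still queued, then joins the thread.
    ~BoundedWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        not_empty_.notify_all();
        not_full_.notify_all();
        worker_.join();
    }

    BoundedWorker(const BoundedWorker &) = delete;
    BoundedWorker &operator=(const BoundedWorker &) = delete;

    // Blocks while the queue is full. Returns false if the worker is stopping.
    bool Push(std::function<void()> job) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return stopping_ || jobs_.size() < capacity_; });
        if (stopping_)
            return false;
        jobs_.push_back(std::move(job));
        not_empty_.notify_one();
        return true;
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                not_empty_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty())
                    return;  // stopping and fully drained
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            not_full_.notify_one();
            job();  // run outside the lock so producers are not blocked
        }
    }

    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::condition_variable not_full_;
    std::deque<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

The bound applies back-pressure to the producer instead of letting pending work grow without limit, and a single consumer keeps jobs in submission order. Per the description, extra worker threads would not help because they would contend on the device lock.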

Reviewed changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 9 comments.

Per-file summary (file: description)
src/monolithic/inference_backend/include/inference_backend/image.h: Extends D3D11 image metadata (subresource index + GstD3D11Device pointer) and Windows texture typing.
src/monolithic/inference_backend/include/inference_backend/buffer_mapper.h: Maps D3D11 tensors into Image using texture + subresource index + GstD3D11Device.
src/monolithic/inference_backend/image_inference/image_inference.cpp: Switches Windows include to the new unified image_inference_async_d3d11.h.
src/monolithic/inference_backend/image_inference/async_with_d3d11/image_inference_async_d3d11.h: Introduces new async D3D11 inference class (texture pools, cross-device resources, worker queue).
src/monolithic/inference_backend/image_inference/async_with_d3d11/image_inference_async_d3d11.cpp: Implements the new GStreamer-based conversion pipeline + worker thread.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_converter.h: New D3D11Converter wrapper around GstD3D11Converter.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_converter.cpp: Implements conversion + crop/resize/pad configuration via GstD3D11Converter properties.
src/monolithic/inference_backend/image_inference/async_with_d3d11/CMakeLists.txt: Collapses build into a single image_inference_async_d3d11 target using GStreamer D3D11/video.
src/monolithic/gst/inference_elements/base/inference_impl.cpp: Fixes queue element detection and avoids holding a mutex across SubmitImages.
src/monolithic/gst/inference_elements/base/gva_base_inference.cpp: Avoids flushing inference on EOS/flush-stop before any frames are processed.
src/monolithic/gst/inference_elements/CMakeLists.txt: Removes linking against the old d3d11_wrapper target.
include/dlstreamer/gst/mappers/gst_to_d3d11.h: Propagates D3D11 subresource index from GstD3D11Memory into D3D11Tensor.
include/dlstreamer/d3d11/tensor.h: Adds d3d11_subresource_index handle + accessor for D3D11 tensor arrays.
src/monolithic/inference_backend/image_inference/async_with_d3d11/image_inference_async_d3d11/thread_pool.h: Removes old custom thread pool (replaced by single worker thread).
src/monolithic/inference_backend/image_inference/async_with_d3d11/image_inference_async_d3d11/thread_pool.cpp: Removes old custom thread pool implementation.
src/monolithic/inference_backend/image_inference/async_with_d3d11/image_inference_async_d3d11/image_inference_async_d3d11.h: Removes old async D3D11 inference interface built on custom wrappers.
src/monolithic/inference_backend/image_inference/async_with_d3d11/image_inference_async_d3d11/image_inference_async_d3d11.cpp: Removes old async D3D11 inference implementation.
src/monolithic/inference_backend/image_inference/async_with_d3d11/image_inference_async_d3d11/CMakeLists.txt: Removes old per-subdir CMake target for legacy async D3D11 code.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/d3d11_images.h: Removes legacy D3D11 image + pool code.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/d3d11_images.cpp: Removes legacy D3D11 image + pool implementation.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/d3d11_image_map.h: Removes legacy D3D11 map/staging texture pool.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/d3d11_image_map.cpp: Removes legacy D3D11 map/staging texture pool implementation.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/d3d11_converter.h: Removes legacy custom D3D11 video processor converter interface.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/d3d11_converter.cpp: Removes legacy custom D3D11 video processor converter implementation.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/d3d11_context.h: Removes legacy D3D11 context wrapper (locking, processor cache, format probing).
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/d3d11_context.cpp: Removes legacy D3D11 context wrapper implementation.
src/monolithic/inference_backend/image_inference/async_with_d3d11/d3d11_wrapper/CMakeLists.txt: Removes the legacy d3d11_wrapper CMake target.


@mholowni (Contributor)

I like this change; it looks simpler and easier to follow than the previous one. However, I can see that GStreamer's converter implementation does not use the video processor. Could you share the performance difference between these two implementations?

@yunowo (Member, Author) commented Apr 23, 2026


Actually, GStreamer supports two converter backends: GST_D3D11_CONVERTER_BACKEND_VIDEO_PROCESSOR and GST_D3D11_CONVERTER_BACKEND_SHADER. This code is configured to use GST_D3D11_CONVERTER_BACKEND_VIDEO_PROCESSOR only, so performance and hardware resource usage are similar to the old code.
