Use env. allocators for initializers (#25108) #25281

Open
wants to merge 1 commit into
base: main
from

AndreyOrb
Contributor

Description

Pass environment allocators into the session state if the "session.use_env_allocators" flag is set (#25108).

Motivation and Context

Initializers use session-local allocators even when environment allocators are supposed to be used.
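
For context, here is a minimal sketch of how a caller might opt in to environment allocators from Python. The config key is the one named in the description. `enable_env_allocators` is a hypothetical helper, not part of onnxruntime; it assumes only that the options object exposes `add_session_config_entry`, as `onnxruntime.SessionOptions` does. `model.onnx` is a placeholder path. Registering an allocator with the environment is a separate step that is not shown.

```python
USE_ENV_ALLOCATORS_KEY = "session.use_env_allocators"


def enable_env_allocators(session_options):
    """Set the flag on a SessionOptions-like object and return it.

    Hypothetical helper: it relies only on add_session_config_entry(key, value).
    """
    session_options.add_session_config_entry(USE_ENV_ALLOCATORS_KEY, "1")
    return session_options


# Intended usage (assumes the onnxruntime Python package is installed):
#   import onnxruntime as ort
#   so = enable_env_allocators(ort.SessionOptions())
#   session = ort.InferenceSession("model.onnx", sess_options=so)
```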

tianleiwu requested a review from yuslepukhin July 7, 2025 19:59
@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

Azure Pipelines successfully started running 5 pipeline(s).

@AndreyOrb
Contributor Author

@tianleiwu Could you rerun the failed test, please?

@fs-eire
Contributor

fs-eire commented Jul 8, 2025

I triggered the re-run. If the error still occurs, we need to investigate why it happens. The error message suggests it is related to the change:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : WebGPU validation failed. [Buffer (unlabeled)] used in submit while mapped.
 - While calling [Queue].Submit([[CommandBuffer]])

The Debug build passes because the test case onnx_backend_test_series.py does not run on a Debug build.

@AndreyOrb
Contributor Author

@fs-eire @tianleiwu
Is there a way to run this specific test locally without running all tests?

@tianleiwu
Contributor

> @fs-eire @tianleiwu Is there a way to run this specific test locally without running all tests?

You can specify a test name like

python onnx_backend_test_series.py -t test_affine_grid_2d_align_corners_expanded_cpu

@AndreyOrb
Contributor Author

Thanks. I'm still setting up the environment to check the issue.
Do I have to build with --build_wheel for this test, or is --use_webgpu enough?

I'm currently building with
E:\3rdParties\onnxruntime_v1.22.0>.\build.bat --update --config Debug --build_dir ./build_web --parallel --use_binskim_compliant_compile_flags --build_shared_lib --use_webgpu --cmake_generator "Visual Studio 17 2022" --compile_no_warning_as_error --cmake_path E:\3rdParties\cmake-4.0.3\build\bin\Release\cmake.exe --windows_sdk_version 10.0.26100.0

@qjia7
Contributor

qjia7 commented Jul 11, 2025

> I triggered the re-run. If the error still occurs, we need to investigate why it happens. The error message suggests it is related to the change:
>
> onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : WebGPU validation failed. [Buffer (unlabeled)] used in submit while mapped.
>  - While calling [Queue].Submit([[CommandBuffer]])
>
> The Debug build passes because the test case onnx_backend_test_series.py does not run on a Debug build.

I remember hitting a similar error with WebGPU. In my case, execution took the UMA path after session initialization, which is not expected: the UMA path should only be used for uploading weights, since the buffer is in a mapped state there. I ended up on the UMA path because I did not use the session's default allocator, which correctly records whether session initialization has finished. Instead I used a new WebGPU allocator whose session_initialized_ was still false even though the session had finished initializing. After switching to the session's allocator, the issue was resolved. This is just my case, for your reference.
To check whether your issue has the same cause, you can simply comment out https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/webgpu/allocator.cc#L19-L21 and see whether it still occurs.
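
The failure mode described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual onnxruntime code: `ToyGpuAllocator`, its methods, and the `"mapped"`/`"unmapped"` labels are all hypothetical stand-ins. Only the idea of a per-allocator `session_initialized_` flag comes from the comment.

```python
class ToyGpuAllocator:
    """Toy stand-in for a WebGPU allocator (not the real onnxruntime class)."""

    def __init__(self, is_uma: bool):
        self.is_uma = is_uma
        # Mirrors the session_initialized_ flag mentioned in the comment.
        self.session_initialized = False

    def on_session_initialization_end(self) -> None:
        self.session_initialized = True

    def alloc(self, size: int) -> str:
        # Mapped buffers are only appropriate while weights are being uploaded.
        if self.is_uma and not self.session_initialized:
            return "mapped"
        return "unmapped"


# The session's own allocator is told when initialization ends.
session_allocator = ToyGpuAllocator(is_uma=True)
during_init = session_allocator.alloc(64)   # weight upload
session_allocator.on_session_initialization_end()
after_init = session_allocator.alloc(64)    # regular inference buffer

# A freshly created allocator never receives that notification,
# so it keeps handing out mapped buffers after initialization.
fresh_allocator = ToyGpuAllocator(is_uma=True)
stale = fresh_allocator.alloc(64)
```

In this model, a buffer from `fresh_allocator` is still mapped when it reaches a submit, which is consistent with the "used in submit while mapped" validation error quoted above.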

@AndreyOrb
Contributor Author

AndreyOrb commented Jul 11, 2025

@qjia7 Thanks a lot for pointing this out!
@fs-eire @tianleiwu Is there a way to run the failed test in VS in c++ directly? It will help me in debugging the issue.
onnxruntime_test_all.exe --gtest_filter= ???

@AndreyOrb
Contributor Author

I see now that all the C++ tests have passed, but the Python tests have failed.
So onnxruntime_test_all.exe will not help me.

Is there any way to debug the C++ issue when running from onnx_backend_test_series.py?

@yuslepukhin
Member

> I see now that all the C++ tests have passed, but the Python tests have failed. So onnxruntime_test_all.exe will not help me.
>
> Is there any way to debug the C++ issue when running from onnx_backend_test_series.py?

You can do mixed debugging using the Python C++ Debugger extension for VS Code.

@AndreyOrb
Contributor Author

AndreyOrb commented Jul 11, 2025

Thanks, Dmitri, will try.
I'm using VS. As it turns out, VS also has this capability: https://learn.microsoft.com/en-us/visualstudio/python/debugging-mixed-mode-c-cpp-python-in-visual-studio?view=vs-2022
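
As an alternative to launching in mixed mode, a native debugger can also be attached to an already running Python process. A minimal sketch of a pause helper for that workflow follows. `wait_for_native_debugger` is a hypothetical name and is not part of onnxruntime or its test scripts; the idea is to call it near the top of the test script, before any session is created.

```python
import os


def wait_for_native_debugger(prompt=input, out=print):
    """Print this process's PID and block until the user confirms attachment."""
    pid = os.getpid()
    out(f"Attach the native debugger to PID {pid}, then press Enter...")
    prompt()  # blocks here so breakpoints can be set before the test runs
    return pid
```

The `prompt` and `out` parameters exist only so the helper can be exercised without a console.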
