test(autoware_tensorrt_plugins): add reference kernel tests#12561
test(autoware_tensorrt_plugins): add reference kernel tests#12561mojomex wants to merge 7 commits intoautowarefoundation:mainfrom
Conversation
|
Thank you for contributing to the Autoware project! 🚧 If your pull request is in progress, switch it to draft mode. Please ensure:
|
Co-authored-by: Copilot <copilot@github.com> Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
Co-authored-by: Copilot <copilot@github.com> Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
30a2371 to
fd53532
Compare
Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
Co-authored-by: Copilot <copilot@github.com> Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## main #12561 +/- ##
===========================================
- Coverage 18.64% 0.22% -18.42%
===========================================
Files 1918 97 -1821
Lines 131362 3500 -127862
Branches 44502 25 -44477
===========================================
- Hits 24489 8 -24481
+ Misses 86760 3491 -83269
+ Partials 20113 1 -20112
☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| copy_to_device(input_d.get(), input); | ||
|
|
||
| ASSERT_EQ( | ||
| argsort( | ||
| input_d.get(), output_d.get(), workspace_d.get(), input.size(), | ||
| get_argsort_workspace_size(input.size()), stream.get()), | ||
| cudaSuccess); |
There was a problem hiding this comment.
| copy_to_device(input_d.get(), input); | |
| ASSERT_EQ( | |
| argsort( | |
| input_d.get(), output_d.get(), workspace_d.get(), input.size(), | |
| get_argsort_workspace_size(input.size()), stream.get()), | |
| cudaSuccess); | |
| cudaEvent_t copy_event; | |
| cudaEventCreate(©_event); | |
| copy_to_device(input_d.get(), input); | |
| cudaEventRecord(copy_event, 0); // record event on the default stream | |
| cudaStreamWaitEvent(stream.get(), copy_event, cudaEventWaitDefault); | |
| ASSERT_EQ( | |
| argsort( | |
| input_d.get(), output_d.get(), workspace_d.get(), input.size(), | |
| get_argsort_workspace_size(input.size()), stream.get()), | |
| cudaSuccess); |
According to the memcpy synchronous behavior:
For transfers from pageable host memory to device memory, a stream sync is performed before the copy is initiated. The function will return once the pageable buffer has been copied to the staging memory for DMA transfer to device memory, but the DMA to final destination may not have completed.
Since input is pageable host memory, it would be better to insert explicit synchronization before operating on input_d.
Alternatively, we can skip this kind of explicit synchronization if copy_to_device takes a CUDA stream as an argument and performs cudaMemcpyAsync inside.
| DeviceBuffer<std::uint8_t> workspace_d(get_unique_workspace_size(input.size())); | ||
|
|
||
| copy_to_device(input_d.get(), input); | ||
|
|
There was a problem hiding this comment.
same as the the case of argsort-_kernel_test.cpp. better to insert explicit sync
Summary
Adds CUDA kernel vs. known-good CPU reference tests for
autoware_tensorrt_plugins:argsortoutput against a CPU stable-sort referenceuniquevalues, inverse indices, and counts against a CPU referenceCompatibility note
While validating the test, the
uniquecounts check exposed an existing sentinel write bug, so this PR includes the minimal fix needed for the new test to pass: write the final range sentinel torange_ptr + num_out, notrange_ptr + num_out * sizeof(int64_t)(range_ptris anint64_t*).See
for a standalone version of that fix.
Validation
Look out for: