Mcc add perf tests improve performance #3699

Merged

@AleksandrPanov AleksandrPanov commented Mar 15, 2024

Added perf tests to the mcc module.
This PR also adds the following optimizations:

  • added parallel_for_ to performThreshold()
  • removed toL/fromL and added a dst parameter to avoid copying data
  • added parallel_for_ to elementWise() (the "batch" optimization improves performance on Windows; Linux performance is unchanged).
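
The idea behind the dst and parallel_for_ changes can be sketched without OpenCV. The block below is a minimal illustration, not the code from this PR: applyRange and elementWiseSketch are hypothetical names, std::vector stands in for cv::Mat, and the stripes are run sequentially here where OpenCV would dispatch them through cv::parallel_for_. It shows a body that touches only its own element range and writes results directly into a caller-provided dst, so no temporary buffer has to be copied back.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical range body: processes elements [begin, end) only, so
// disjoint ranges could run on different threads without synchronization.
// Results go straight into dst, which is allocated once by the driver.
template <typename Fn>
void applyRange(const std::vector<double>& src, std::vector<double>& dst,
                std::size_t begin, std::size_t end, Fn fn)
{
    for (std::size_t i = begin; i < end; ++i)
        dst[i] = fn(src[i]);
}

// Sequential driver, used only to keep the sketch dependency-free.
// In OpenCV the stripes would be dispatched by cv::parallel_for_ instead.
template <typename Fn>
void elementWiseSketch(const std::vector<double>& src, std::vector<double>& dst,
                       std::size_t stripes, Fn fn)
{
    dst.resize(src.size());
    if (stripes == 0) stripes = 1;  // guard against division by zero
    const std::size_t n = src.size();
    for (std::size_t s = 0; s < stripes; ++s)
        applyRange(src, dst, n * s / stripes, n * (s + 1) / stripes, fn);
}
```

Because each stripe reads and writes a disjoint index range, the order in which stripes run does not affect the result, which is the property a parallel loop relies on.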

Configuration:
Ryzen 5950X, 2x16 GB 3000 MHz DDR4
OS: Windows 10, Ubuntu 20.04.5 LTS

Performance results in milliseconds:

OS and alg version             process, ms   infer, ms
win_default                    63.09         457.57
win_optimized_without_batch    48.69         111.78
win_optimized_batch            48.42         47.28
linux_default                  50.88         300.7
linux_optimized_batch          36.06         41.62
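
To put the timings in relative terms, the short sketch below divides each default infer time by its optimized counterpart. The helper names speedup and printInferSpeedups are hypothetical; the numbers are the ones reported in the results above. The infer stage comes out roughly 9.7x faster on Windows and roughly 7.2x faster on Linux.

```cpp
#include <cstdio>

// Speedup = baseline time / optimized time (both in milliseconds).
double speedup(double baselineMs, double optimizedMs)
{
    return baselineMs / optimizedMs;
}

// Prints the infer-stage speedups computed from the reported timings.
void printInferSpeedups()
{
    std::printf("Windows infer: %.2fx\n", speedup(457.57, 47.28));
    std::printf("Linux infer:   %.2fx\n", speedup(300.7, 41.62));
}
```

The same helper applied to the process column gives about 1.3x on Windows (63.09 / 48.42) and about 1.4x on Linux (50.88 / 36.06).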

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

const int num_elements = (int)src.total()*channel;
const double *psrc = (double*)src.data;
double *pdst = (double*)dst.data;
const int batch = 128;

@AleksandrPanov AleksandrPanov Mar 15, 2024

This "batch" optimization improves performance on Windows.


@dkurt dkurt Mar 18, 2024

What are typical values of num_elements? We could make batch depend on the number of threads:

const int batch = num_elements / max(1, getNumThreads());

or

const int batch = num_elements / (getNumThreads() > 1 ? getNumThreads() * 4 : 1);

Instead of 4, you may choose another constant that gives batch=128 in your configuration.
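
To compare the two suggestions, here is a small sketch that takes the thread count as a parameter in place of getNumThreads(). The function names are hypothetical, and the sample inputs (1,000,000 elements, 32 threads) are assumptions chosen for illustration, not values measured in this PR.

```cpp
#include <algorithm>

// First suggestion: one batch per thread.
int batchPerThread(int numElements, int numThreads)
{
    return numElements / std::max(1, numThreads);
}

// Second suggestion: `factor` batches per thread when running multi-threaded
// (factor is 4 in the comment above), otherwise a single batch.
int batchOversubscribed(int numElements, int numThreads, int factor)
{
    return numElements / (numThreads > 1 ? numThreads * factor : 1);
}
```

With the assumed inputs, batchPerThread(1000000, 32) gives 31250 and batchOversubscribed(1000000, 32, 4) gives 7812. Both expressions use integer division, so either can return 0 when num_elements is smaller than the divisor; a caller would need to clamp the result to at least 1.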


@AleksandrPanov AleksandrPanov Mar 21, 2024

With your second sample I got the same performance (47 ms) using a constant of 1024:
const int batch = std::max(1, getNumThreads() > 1 ? num_elements / (1024*getNumThreads()) : num_elements);
// if getNumThreads() == 1 -> batch = num_elements

With your first sample, const int batch = num_elements / max(1, getNumThreads());, performance regresses (from 47 ms to 57 ms).
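
The clamped expression with the constant 1024 can be checked in isolation. In the sketch below, batchFromThreads is a hypothetical wrapper that takes the thread count as a parameter in place of getNumThreads(); the input sizes used in the note that follows are assumptions for illustration only.

```cpp
#include <algorithm>

// Same expression as the one-liner above, with the thread count and the
// constant passed in. std::max(1, ...) keeps the batch from becoming 0
// when numElements is small relative to factor * numThreads.
int batchFromThreads(int numElements, int numThreads, int factor = 1024)
{
    return std::max(1, numThreads > 1 ? numElements / (factor * numThreads)
                                      : numElements);
}
```

With a single thread the whole array is one batch, for example batchFromThreads(1000000, 1) returns 1000000. With 32 threads, an assumed input of 4,194,304 elements yields 128, and an input of 1000 elements is clamped to 1.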

I would suggest using a batch of 128, but your second sample would also work.

Here, batch is the minimum number of consecutive array elements that a thread processes at one time.
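
One way to read this definition is that the parallel loop runs over batch indices, and each index maps to a run of consecutive elements. The sketch below assumes that interpretation, which this thread does not confirm; numBatches and batchRange are hypothetical helpers, and batch is assumed to be at least 1.

```cpp
#include <algorithm>
#include <utility>

// Number of batches needed to cover numElements (ceiling division).
// Assumes batch >= 1.
int numBatches(int numElements, int batch)
{
    return (numElements + batch - 1) / batch;
}

// Half-open element range [first, second) covered by batch index b.
// The last batch is shortened so it never runs past numElements.
std::pair<int, int> batchRange(int b, int batch, int numElements)
{
    const int begin = b * batch;
    const int end = std::min(numElements, begin + batch);
    return {begin, end};
}
```

For an assumed array of 300 elements and batch = 128, this gives three batches covering [0, 128), [128, 256), and [256, 300), so every element belongs to exactly one batch.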

@AleksandrPanov AleksandrPanov requested a review from dkurt March 18, 2024 09:59
@AleksandrPanov AleksandrPanov force-pushed the mcc_add_perf_tests_improve_performance branch 4 times, most recently from b77f40d to 8ca90eb Compare March 22, 2024 07:43
@AleksandrPanov AleksandrPanov force-pushed the mcc_add_perf_tests_improve_performance branch from 8ca90eb to 5b829da Compare March 22, 2024 07:51
@asmorkalov asmorkalov merged commit 5e592c2 into opencv:4.x Mar 26, 2024
@asmorkalov asmorkalov mentioned this pull request Apr 1, 2024