-
Notifications
You must be signed in to change notification settings - Fork 56
Accelerate the mean operations without axis #589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Accelerate the mean operations without axis #589
Conversation
ThreeMonth03
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yungyuc , could you please review this pull request when you are available?
| template <typename A, typename T> | ||
| class SimpleArrayMixinSum | ||
| { | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move sum operation to a seperate class because of complex optimization.
| value_type sum_contiguous() const | ||
| { | ||
| auto athis = static_cast<A const *>(this); | ||
| value_type result; | ||
| if constexpr (is_complex_v<value_type>) | ||
| { | ||
| result = value_type{}; | ||
| } | ||
| else | ||
| { | ||
| result = 0; | ||
| } | ||
| sum_unrolled_generic(athis->data(), athis->size(), 1, result); | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forget to implement simd for common data type. Would it become a seperate pull request?
| return total; | ||
| } | ||
|
|
||
| void sum_unrolled_generic(const value_type * data_ptr, size_t size, size_t stride, value_type & result) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether it is really unroll the loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to tell. If you are not sure about it, why adding it?
yungyuc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good progress. Points to address:
- Do not use threads at the time being. We need a system for controlling threading from outside the computing kernel and it is outside the scope of speeding up one operation.
- Make functions static when you can.
- Clarify why adding seemingly unrolled loop that you are not sure about.
| private: | ||
| void check_c_contiguous(small_vector<size_t> const & shape, | ||
| small_vector<size_t> const & stride) const | ||
| bool is_c_contiguous(small_vector<size_t> const & shape, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be static.
| return true; | ||
| } | ||
|
|
||
| void check_c_contiguous(small_vector<size_t> const & shape, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be static.
| const size_t prefix_len = ndim - 1; | ||
| const size_t total_combinations = calculate_total_combinations(shape, prefix_len); | ||
|
|
||
| const size_t num_threads = static_cast<size_t>(std::thread::hardware_concurrency()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not ready for using threads. Without a system to control how to use threads from outside the computing kernel here, the performance and resource consumption are not predictable.
| return total; | ||
| } | ||
|
|
||
| void sum_unrolled_generic(const value_type * data_ptr, size_t size, size_t stride, value_type & result) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to tell. If you are not sure about it, why adding it?
In this pull request, I accelerate the mean operations.

For 1D contiguous array(unrolling, wo simd):
For 3D non contiguous array(unrolling + multithread):
