@ThreeMonth03
Collaborator

In this pull request, I accelerate the mean operations.
For a 1D contiguous array (unrolling, without SIMD):
[Figure: performance_1d_mean]

For a 3D non-contiguous array (unrolling + multithreading):
[Figure: performance_3d_non_contiguous_mean_1E7]

@ThreeMonth03 ThreeMonth03 changed the title Accelerate the mean operations. Accelerate the mean operations without axis. Sep 26, 2025
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

@yungyuc, could you please review this pull request when you are available?

Comment on lines +146 to +149
template <typename A, typename T>
class SimpleArrayMixinSum
{

Collaborator Author

Move the sum operation to a separate class because its optimization is complex.

Comment on lines +171 to +185
value_type sum_contiguous() const
{
    auto athis = static_cast<A const *>(this);
    value_type result;
    if constexpr (is_complex_v<value_type>)
    {
        result = value_type{};
    }
    else
    {
        result = 0;
    }
    sum_unrolled_generic(athis->data(), athis->size(), 1, result);
    return result;
}
Collaborator Author

I forgot to implement SIMD for the common data types. Should that go in a separate pull request?

return total;
}

void sum_unrolled_generic(const value_type * data_ptr, size_t size, size_t stride, value_type & result) const
Collaborator Author

I'm not sure whether this really unrolls the loop.

Member

It's hard to tell. If you are not sure about it, why add it?
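For reference, a minimal sketch of what an explicitly unrolled strided sum could look like. This is not the PR's `sum_unrolled_generic`; the name `sum_unrolled4`, the unroll factor of 4, and the element-based stride are assumptions made for illustration. Whether a compiler keeps, widens, or further transforms such a loop can only be confirmed by reading the generated assembly or by benchmarking.

```cpp
#include <cstddef>

// Hypothetical helper (not from the PR): sums `size` elements spaced
// `stride` elements apart, using four independent accumulators.
template <typename T>
T sum_unrolled4(T const * data, std::size_t size, std::size_t stride)
{
    // Value-initialization yields zero for arithmetic types and std::complex.
    T acc0{}, acc1{}, acc2{}, acc3{};
    std::size_t i = 0;
    // Main loop: four elements per iteration, each feeding its own
    // accumulator so the additions do not depend on one another.
    for (; i + 4 <= size; i += 4)
    {
        acc0 += data[(i + 0) * stride];
        acc1 += data[(i + 1) * stride];
        acc2 += data[(i + 2) * stride];
        acc3 += data[(i + 3) * stride];
    }
    // Tail loop: handles the remaining 0 to 3 elements.
    for (; i < size; ++i)
    {
        acc0 += data[i * stride];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that for floating-point types the four-accumulator form changes the order of additions, so the result may differ in the last bits from a plain sequential loop.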

Member

@yungyuc yungyuc left a comment

Good progress. Points to address:

  • Do not use threads for the time being. We need a system for controlling threading from outside the computing kernel, and that is outside the scope of speeding up one operation.
  • Make functions static when you can.
  • Clarify why you are adding a seemingly unrolled loop that you are not sure about.

private:
void check_c_contiguous(small_vector<size_t> const & shape,
small_vector<size_t> const & stride) const
bool is_c_contiguous(small_vector<size_t> const & shape,
Member

This can be static.
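A minimal sketch of the suggestion, assuming the check depends only on its arguments. The class name `ContiguityChecker` is hypothetical, `std::vector` stands in for the project's `small_vector`, and strides are assumed to be counted in elements; the actual body in the PR may differ.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the class under review.
class ContiguityChecker
{
public:
    // static: the result depends only on shape and stride, not on *this,
    // so the function can be called without an object.
    static bool is_c_contiguous(std::vector<std::size_t> const & shape,
                                std::vector<std::size_t> const & stride)
    {
        if (shape.size() != stride.size())
        {
            return false;
        }
        // Walk from the last axis to the first; in C (row-major) order each
        // stride equals the product of the extents of all later axes.
        std::size_t expected = 1;
        for (std::size_t i = shape.size(); i-- > 0;)
        {
            if (stride[i] != expected)
            {
                return false;
            }
            expected *= shape[i];
        }
        return true;
    }
};
```

It would be called as `ContiguityChecker::is_c_contiguous(shape, stride)`. This sketch is strict about axes of extent 1, which some libraries treat as contiguous regardless of stride.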

return true;
}

void check_c_contiguous(small_vector<size_t> const & shape,
Member

This can be static.

const size_t prefix_len = ndim - 1;
const size_t total_combinations = calculate_total_combinations(shape, prefix_len);

const size_t num_threads = static_cast<size_t>(std::thread::hardware_concurrency());
Member

We are not ready to use threads. Without a system for controlling thread usage from outside this computing kernel, performance and resource consumption are unpredictable.

@yungyuc yungyuc added the performance (Profiling, runtime, and memory consumption) and array (Multi-dimensional array implementation) labels Sep 28, 2025
@yungyuc yungyuc changed the title Accelerate the mean operations without axis. Accelerate the mean operations without axis Sep 28, 2025