The prefetch in the ARM implementation of Stream (https://github.com/google/highway/blob/master/hwy/ops/arm_neon-inl.h#L4061) can degrade throughput. On a Jetson Nano, I have a memset-like operation that reaches 11 GB/s with Store but drops to ~3.5 GB/s with Stream unless I remove the prefetch. Can the prefetch be removed or made optional?
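
To illustrate what "made optional" could look like, here is a minimal, self-contained sketch. It does not use Highway itself: `DemoStream`, `DemoFill` and the `DEMO_STREAM_PREFETCH` macro are hypothetical names, and the sketch assumes that Stream amounts to a write prefetch followed by a plain store. The 16-byte `memcpy` stands in for a NEON vector store.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical opt-in switch (not an existing Highway macro).
// Set to 1 to restore the prefetch before each store.
#ifndef DEMO_STREAM_PREFETCH
#define DEMO_STREAM_PREFETCH 0
#endif

// Stand-in for Stream: optional write prefetch, then a 16-byte store.
inline void DemoStream(const uint8_t (&v)[16], uint8_t* to) {
#if DEMO_STREAM_PREFETCH && (defined(__GNUC__) || defined(__clang__))
  // rw=1 (prefetch for write), locality=0 (no temporal locality).
  __builtin_prefetch(to, 1, 0);
#endif
  std::memcpy(to, v, sizeof(v));
}

// Memset-like fill: whole 16-byte blocks go through DemoStream,
// any remaining bytes are written one at a time.
inline void DemoFill(uint8_t* to, size_t bytes, uint8_t value) {
  uint8_t v[16];
  std::memset(v, value, sizeof(v));
  size_t i = 0;
  for (; i + sizeof(v) <= bytes; i += sizeof(v)) {
    DemoStream(v, to + i);
  }
  for (; i < bytes; ++i) {
    to[i] = value;
  }
}
```

Building with `-DDEMO_STREAM_PREFETCH=1` versus the default would allow comparing both variants on the same hardware without editing the source.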