The cuda::aligned_size_t<N> type is currently defined in <cuda/std/barrier>.
As a result, I have to include <cuda/std/barrier> any time I want to use cuda::aligned_size_t. This is especially problematic because merely including <cuda/barrier> prevents compiling for architectures below sm_70.
cuda::aligned_size_t is useful on its own, independent of barrier.
I want to be able to use cuda::aligned_size_t without including <cuda(/std)/barrier>.
I'm thinking we could add it to <cuda/cstddef>?
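
To illustrate the kind of barrier-free use I have in mind, here is a minimal host-only sketch. The `sketch::aligned_size_t` type is a stand-in I wrote to mirror my understanding of the public interface of cuda::aligned_size_t (an `align` constant, a `value` member, an explicit constructor, and a conversion to `size_t`); it is not the library's definition. `copy_bytes` is a hypothetical helper, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {

// Stand-in for cuda::aligned_size_t<N>. Assumption: this mirrors the public
// interface of the real type; it is NOT the library's definition.
template <std::size_t Alignment>
struct aligned_size_t {
    static constexpr std::size_t align = Alignment;
    std::size_t value;

    explicit constexpr aligned_size_t(std::size_t size) : value(size) {}
    constexpr operator std::size_t() const { return value; }
};

// Hypothetical helper: the size argument carries the alignment promise in its
// type, so the callee could pick a wider copy path without any barrier.
template <std::size_t Alignment>
void copy_bytes(void* dst, const void* src, aligned_size_t<Alignment> size) {
    static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");

    // Check the promise made by the caller: the size is a multiple of
    // Alignment and both pointers are aligned to Alignment.
    assert(size.value % Alignment == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % Alignment == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % Alignment == 0);

    // A real implementation might issue Alignment-wide loads and stores here;
    // this sketch just falls back to memcpy.
    std::memcpy(dst, src, size);
}

}  // namespace sketch
```

A call would look like `sketch::copy_bytes(dst, src, sketch::aligned_size_t<16>(32))`. Nothing here depends on barrier, which is why a lighter header seems like a better home for the type.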