Description
The `.round()` function is very slow compared to the platform-native intrinsic on AVX (https://godbolt.org/z/3sdd9jrvW) because it guarantees platform-agnostic behavior. However, there are many use cases where the exact behavior on halfway values, infinities, or NaNs doesn't matter.
I think adding something like a `round_fast` function would be reasonable.
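To illustrate the difference in semantics, here is a minimal scalar sketch in Rust. It assumes the existing `.round()` follows the same rule as `f32::round` (ties round away from zero) and that a native rounding instruction in round-to-nearest mode rounds ties to even, as `f32::round_ties_even` does (stable since Rust 1.77). The names `round_portable` and `round_fast` and the `[f32; 8]` lane layout are hypothetical and only stand in for the real vector type.

```rust
/// Reference behavior (assumed): round to nearest, ties away from zero.
/// This matches `f32::round` and usually needs extra work on x86,
/// because no single SSE/AVX rounding mode implements it.
fn round_portable(v: [f32; 8]) -> [f32; 8] {
    v.map(f32::round)
}

/// Hypothetical `round_fast`: round to nearest, ties to even.
/// This is the rule used by the hardware's round-to-nearest mode,
/// so it can typically be lowered to one rounding instruction
/// when the target supports it.
fn round_fast(v: [f32; 8]) -> [f32; 8] {
    v.map(f32::round_ties_even)
}
```

The two functions agree everywhere except on exact halfway values: for example, `2.5` becomes `3.0` with `round_portable` and `2.0` with `round_fast`, while `1.5` becomes `2.0` with both.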