Description
As discussed in the related Rust issue, -ffast-math can be very useful for speeding up floating-point operations, particularly because it makes vectorization easier. I'm seeing a ~30% runtime reduction for matrix multiplication in Clang when compiling with -ffast-math in this benchmark:
https://github.com/pedrocr/rustc-math-bench
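To illustrate why fast-math helps vectorization, here is a minimal sketch (not taken from the benchmark; the function names are hypothetical). Floating-point addition is not associative, so without permission to reassociate, the compiler has to keep the single-accumulator order of the first function. The second function does by hand the kind of reassociation that fast-math would allow the compiler to do on its own: it keeps four independent partial sums, which map naturally onto SIMD lanes.

```rust
/// Strict order: one accumulator, additions happen left to right.
/// Each addition depends on the previous one, which blocks vectorization
/// unless the compiler is allowed to reassociate.
fn sum_sequential(xs: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Manually reassociated: four independent accumulators ("lanes"),
/// combined only at the end. Leftover elements go into a scalar tail.
fn sum_four_lanes(xs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    let mut chunks = xs.chunks_exact(4);
    for chunk in &mut chunks {
        for i in 0..4 {
            lanes[i] += chunk[i];
        }
    }
    let mut tail = 0.0f32;
    for &x in chunks.remainder() {
        tail += x;
    }
    (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]) + tail
}
```

Both functions compute the same mathematical sum, but they can round differently, for example when large values of opposite sign cancel. That difference is exactly what the compiler is not allowed to introduce by default.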
As mentioned in the Rust issue, the intrinsics already cover part of this, and a wrapper type for f32/f64 can already be implemented. Since SIMD types are already aimed at vectorization, and the cost of wrapping/unwrapping is already there, would it make sense to enable -ffast-math for them anyway? Alternatively, if there are cases where that doesn't make sense, would it be useful to provide both slow and fast versions of all the types for convenience?
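For reference, a wrapper type of the kind mentioned above could look roughly like the sketch below. The name `FastF32` and the `dot` helper are hypothetical, not an existing API. To keep the example compiling on stable Rust, the operator bodies use ordinary arithmetic as a stand-in; on nightly, that is where a fast-math intrinsic such as `std::intrinsics::fadd_fast` or `fmul_fast` would presumably be called instead.

```rust
use std::ops::{Add, Mul};

/// Hypothetical wrapper marking a value as "fast-math allowed".
#[derive(Clone, Copy, Debug, PartialEq)]
struct FastF32(f32);

impl Add for FastF32 {
    type Output = FastF32;
    #[inline]
    fn add(self, rhs: FastF32) -> FastF32 {
        // Stable stand-in: plain IEEE addition.
        // A nightly version would call a fast-math add intrinsic here.
        FastF32(self.0 + rhs.0)
    }
}

impl Mul for FastF32 {
    type Output = FastF32;
    #[inline]
    fn mul(self, rhs: FastF32) -> FastF32 {
        // Stable stand-in: plain IEEE multiplication.
        FastF32(self.0 * rhs.0)
    }
}

/// Dot product written against the wrapper, so the inner loop only
/// uses the wrapper's operators.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = FastF32(0.0);
    for (&x, &y) in a.iter().zip(b) {
        acc = acc + FastF32(x) * FastF32(y);
    }
    acc.0
}
```

The point of the sketch is the wrapping/unwrapping at the boundaries of `dot`: SIMD types already pay a similar cost, which is why enabling fast-math semantics for them directly might be a natural fit.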