Highlights since version 0.0.4.2:
Use __builtin_shuffle for 'generic' SIMD code
Some compilers, including GCC, provide a built-in function, __builtin_shuffle, that shuffles vector elements efficiently without resorting to platform-specific SIMD intrinsics. If it is available, we now use it instead of a hand-written byte-wise implementation in the 'generic' implementations of the SIMD routines. The non-generic implementations are unchanged: for them, the code generated by __builtin_shuffle is slightly more complicated than the hand-written intrinsics code.
See commit 724dddb for more information, including how compiler support is checked in the ./configure script.
See: 724dddb
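As a rough illustration of the idea (not the code from commit 724dddb), the following sketch selects between a byte-wise loop and __builtin_shuffle for a 16-byte table lookup. The names shuffle_bytewise, shuffle_builtin, shuffle16 and the macro HAVE_BUILTIN_SHUFFLE are hypothetical, and the compiler-macro test merely stands in for the real ./configure check.

```c
#include <stdint.h>
#include <string.h>

/* Portable byte-wise shuffle: out[i] = table[idx[i] mod 16]. */
static void shuffle_bytewise(const uint8_t table[16],
                             const uint8_t idx[16],
                             uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        out[i] = table[idx[i] & 15];
    }
}

/* Assumption: stand-in for a configure-time check. Here we simply
 * assume GCC (and not Clang) provides __builtin_shuffle. */
#if defined(__GNUC__) && !defined(__clang__)
#define HAVE_BUILTIN_SHUFFLE 1 /* hypothetical macro name */

typedef uint8_t v16u8 __attribute__((vector_size(16)));

/* Same lookup, expressed with GCC's generic vector shuffle.
 * __builtin_shuffle(vec, mask) yields vec[mask[i] mod 16] per lane. */
static void shuffle_builtin(const uint8_t table[16],
                            const uint8_t idx[16],
                            uint8_t out[16])
{
    v16u8 t, m, r;
    memcpy(&t, table, sizeof t);
    memcpy(&m, idx, sizeof m);
    r = __builtin_shuffle(t, m);
    memcpy(out, &r, sizeof r);
}
#endif

/* Dispatch to the built-in when available, else to the loop. */
static void shuffle16(const uint8_t table[16],
                      const uint8_t idx[16],
                      uint8_t out[16])
{
#ifdef HAVE_BUILTIN_SHUFFLE
    shuffle_builtin(table, idx, out);
#else
    shuffle_bytewise(table, idx, out);
#endif
}
```

Both paths compute the same result, so the choice affects only the generated code, not the behaviour.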
Detect and use system-provided __get_cpuid_count
The x86 system headers shipped with GCC 6.3 now provide a definition of __get_cpuid_count in cpuid.h. We also define this function in a cbits module (for compilers whose headers do not provide an implementation), and the two definitions conflict.
./configure now tests for the declaration, and if the system provides it, the system version of the routine is used instead of ours.
See: 0458a96
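The pattern can be sketched roughly as follows; this is an illustration under stated assumptions, not the code from commit 0458a96. The macro HAVE_DECL___GET_CPUID_COUNT follows the naming an Autoconf declaration check would typically produce, but whether the project uses exactly that name is an assumption, and the wrapper example_cpuid_count is hypothetical. The fallback relies on __get_cpuid_max and __cpuid_count, which cpuid.h already provided before __get_cpuid_count was added.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical wrapper: query CPUID leaf/subleaf.
 * Returns 1 on success, 0 if the leaf is unsupported
 * (or if the target is not x86). */
static int example_cpuid_count(unsigned int leaf, unsigned int subleaf,
                               unsigned int *eax, unsigned int *ebx,
                               unsigned int *ecx, unsigned int *edx)
{
#if defined(__x86_64__) || defined(__i386__)
#if defined(HAVE_DECL___GET_CPUID_COUNT) && HAVE_DECL___GET_CPUID_COUNT
    /* Assumed configure result: the system header declares the
     * routine, so use it rather than defining our own copy. */
    return __get_cpuid_count(leaf, subleaf, eax, ebx, ecx, edx);
#else
    /* Fallback for headers without __get_cpuid_count. */
    unsigned int max_leaf = __get_cpuid_max(leaf & 0x80000000u, 0);
    if (max_leaf == 0 || max_leaf < leaf) {
        return 0;
    }
    __cpuid_count(leaf, subleaf, *eax, *ebx, *ecx, *edx);
    return 1;
#endif
#else
    (void)leaf; (void)subleaf;
    (void)eax; (void)ebx; (void)ecx; (void)edx;
    return 0;
#endif
}
```

Using a differently named wrapper avoids redefining __get_cpuid_count when the header already supplies it, which is the conflict described above.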
Various
- Dependency version bounds of optparse-applicative are widened to support the current Stackage nightly. Related API changes are handled as well. See 495369d.