Try writing an implementation based on ctypes and/or Cython to compare the performance of the bindings on CPython and PyPy. Investigate whether [Numba's CFFI support](http://numba.pydata.org/numba-doc/latest/reference/pysupported.html#numba.cffi_support.register_module) can speed anything up. Document the pain points of each approach.
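
A possible starting point for the ctypes side of the comparison is sketched below. It is only an illustration: it assumes a POSIX system, uses libc's `strlen` as a stand-in for whatever function the real bindings expose, and the `bench` helper and `payload` value are hypothetical names introduced here. Running the same script under CPython and PyPy should give a rough per-call overhead figure for each interpreter.

```python
import ctypes
import ctypes.util
import timeit

# Assumption: POSIX system. libc's strlen is a placeholder for the real
# bound function; swap in the project's shared library and symbol.
# If find_library returns None, CDLL(None) still exposes libc on POSIX.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def bench(func, arg, number=100_000):
    """Return the mean wall-clock time per call, in seconds."""
    return timeit.timeit(lambda: func(arg), number=number) / number


payload = b"hello, world"  # hypothetical test input

# Time the foreign call against a pure-Python baseline.
ctypes_time = bench(libc.strlen, payload)
python_time = bench(len, payload)

print(f"ctypes strlen: {ctypes_time * 1e9:.0f} ns/call")
print(f"builtin len:   {python_time * 1e9:.0f} ns/call")
```

Declaring `argtypes` and `restype` up front keeps the measurement focused on call overhead rather than on ctypes guessing the conversions; a Cython or CFFI variant could reuse `bench` unchanged so the numbers stay comparable.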