Turn NUTS sampler into Theano Op

If we added a NUTS Op to Theano with implementations in JAX (e.g. from numpyro) as well as C (here's an implementation https://github.com/alumbreras/NUTS-Cpp, or we can use the STAN one directly) we could get incredible speeds across different execution backends.