
Turn NUTS sampler into Theano Op #4210


Closed
twiecki opened this issue Nov 7, 2020 · 8 comments


@twiecki
Member

twiecki commented Nov 7, 2020

If we added a NUTS Op to Theano with implementations in JAX (e.g. from numpyro) as well as C (here's an implementation: https://github.com/alumbreras/NUTS-Cpp, or we could use the Stan one directly), we could get incredible speeds across different execution backends.

@twiecki twiecki transferred this issue from aesara-devs/aesara Nov 9, 2020
@twiecki twiecki changed the title Add NUTS Op Turn NUTS sampler into Theano Op Nov 9, 2020
@pymc-devs pymc-devs deleted a comment from brandonwillard Nov 9, 2020
@StephenHogg
Contributor

I'm willing to take this one on if you can provide some guidance. Could be a good Christmas project.

@twiecki
Member Author

twiecki commented Dec 3, 2020

Great! @brandonwillard is the person with the most mature vision on this.

@brandonwillard
Contributor

This is a really important implementation, so, yeah, if you need any input from me, ask away.

Otherwise, here's a template for one approach to converting our samplers to Theano Ops. It's basically just a quick outline of the Op creation process.

The questions that arise within this task are mostly about where/when to "Theano-ize" the NUTS sampler.

For instance, we could make one big NUTS Op that uses all the existing Python sampler code in PyMC3 (e.g. like the template in the link above). That work would mostly be about determining sufficient Theano inputs for such an Op (e.g. the log-likelihood graphs, the sampler parameters, etc., would all need to be Theano objects).
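A rough sketch of what that first approach might look like: a single Op whose perform method drives the existing Python sampler code. The NUTSSamplerOp name, the input/output types, and the logp_dlogp_func argument below are all illustrative assumptions, not existing PyMC3 API.

```python
import numpy as np
import theano.tensor as tt
from theano.gof import Op


class NUTSSamplerOp(Op):
    """Illustrative monolithic Op: `perform` would call the existing Python NUTS code."""

    itypes = [tt.dvector, tt.iscalar]  # initial point, number of draws (assumed inputs)
    otypes = [tt.dmatrix]              # draws, with shape (n_draws, n_dims)

    def __init__(self, logp_dlogp_func):
        # Placeholder for whatever callable the existing sampler machinery needs.
        self.logp_dlogp_func = logp_dlogp_func
        super().__init__()

    def perform(self, node, inputs, output_storage):
        q0, n_draws = inputs
        # A real implementation would loop over pymc3.step_methods.hmc.NUTS steps here;
        # this stub just repeats the initial point so the sketch stays self-contained.
        output_storage[0][0] = np.tile(q0, (int(n_draws), 1))
```

Such an Op could be applied to symbolic inputs and compiled with theano.function like any other node, but everything inside perform would remain opaque to Theano's optimizations.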

The other end of the spectrum would have us rewriting the core functionality in pymc3.step_methods.hmc using as many pre-existing Theano Ops as possible—and perhaps a few new ones when necessary. This approach is the ideal, since it allows us to apply all the existing Theano optimizations and C transpilation. Plus, this is how we could more seamlessly convert our samplers to JAX.

One of the difficulties with this approach is that our samplers use some custom "helper" classes (e.g. pymc3.step_methods.hmc.quadpotential.QuadPotential), and those would need to be refactored into a form that's more convenient for Theano (e.g. as Ops and/or a series of helper functions).
The reason for this is that one cannot easily put something like a QuadPotential instance into a Theano graph; not without creating custom theano.gof.type.Types, at least.

Basically, Theano graphs model lambda expressions only, and Theano doesn't offer the OO abstraction on top of that, so one simply has to reformulate the OO-centric parts of the sampler implementations.
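As an illustration of that reformulation, the diagonal case of QuadPotential could plausibly be reduced to a couple of plain graph-building helpers; the names and factoring below are only a sketch of the idea, not a drop-in replacement for the existing class.

```python
import theano.tensor as tt


def diag_quad_potential(inverse_mass_diag):
    """Express a diagonal mass-matrix potential as plain functions that build
    Theano graphs, instead of a stateful QuadPotential instance.

    `inverse_mass_diag` is a vector TensorVariable holding 1 / diag(M).
    """

    def velocity(p):
        # v = M^{-1} p for a diagonal mass matrix
        return inverse_mass_diag * p

    def kinetic_energy(p):
        # K(p) = 0.5 * p^T M^{-1} p
        return 0.5 * tt.dot(p, velocity(p))

    return velocity, kinetic_energy
```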

@StephenHogg
Contributor

Ok so if we do the core rewrite approach - step one would be to go through pymc3.step_methods.hmc and identify everything that needs to be re-written using a Theano Op, is that right? If I have a first go at this and put that list in here, probably we can look for any holes in it and then start knocking things down.

@brandonwillard
Contributor

Ok so if we do the core rewrite approach - step one would be to go through pymc3.step_methods.hmc and identify everything that needs to be re-written using a Theano Op

Yes, but, before that even, we might want to clarify exactly how this whole thing should work and what the inputs and outputs should be.

For example, we could write a sample function that takes a map of TensorVariables—corresponding to a model's random variables—and their log-likelihood graphs (e.g. as given by a PyMC3 Model object), integer-typed Scalars for the number of samples (e.g. warm-up, burn-in), etc., and returns a list of TensorVariables with dimensions matching each random variable input by the number of requested samples.
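In code, that signature might look roughly like the following; the name sample and the argument names are placeholders, and the point is only to pin down the Theano types involved.

```python
from typing import Dict, List

from theano.tensor import TensorVariable


def sample(
    rv_to_logp: Dict[TensorVariable, TensorVariable],  # random variable -> its log-likelihood graph
    n_samples: TensorVariable,  # integer-typed scalar, e.g. a tt.iscalar
    n_tune: TensorVariable,     # integer-typed scalar for warm-up/burn-in draws
) -> List[TensorVariable]:
    """Return one TensorVariable of draws per input random variable, each with a
    leading dimension of length `n_samples`."""
    raise NotImplementedError
```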

Once those inputs and their Theano types are clear, we can start to walk through the PyMC3 code and get an understanding of exactly what needs to be done.

We can always take a cue from pymc3.sampling, but just remember that it's doing a lot more than we want/need to right now. The same goes for any of the samplers we reimplement in Theano: we only want to shoot for the most basic and important functionality right now. Actually, @eigenfoo's littlemcmc might be a better reference point, since it's a bit more streamlined than PyMC3's.

For that matter, we might be better off implementing a few simple Metropolis samplers first to see where the difficulties are (if any). Here's an old project that started doing just that. There's not much we can borrow from it except perhaps some examples of simple sampler loop logic (e.g. a Metropolis-Hastings sampler), but it's worth a look.
Also, since most people will be primarily interested in the performance of a Theano-based sampler, these simpler samplers should provide a clearer view of any bottlenecks that need to be addressed before moving on to HMC and/or NUTS.
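To make the "simple samplers first" idea concrete, here is a minimal sketch of a random-walk Metropolis loop written with existing Theano Ops only, assuming a standard-normal target purely for illustration; none of these names come from PyMC3.

```python
import numpy as np
import theano
import theano.tensor as tt
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=42)


def logp(q):
    # Stand-in target: an isotropic standard normal (up to an additive constant).
    return -0.5 * tt.sum(q ** 2)


def mh_step(q, step_size):
    # Symmetric random-walk proposal followed by a Metropolis accept/reject.
    proposal = q + step_size * srng.normal(size=q.shape)
    accept = tt.log(srng.uniform(size=(1,))[0]) < logp(proposal) - logp(q)
    return tt.switch(accept, proposal, q)


q0 = tt.dvector("q0")
step_size = tt.dscalar("step_size")
n_draws = tt.iscalar("n_draws")

# The whole sampling loop is one scan, so it is compiled (and optimized) as a single graph.
draws, updates = theano.scan(
    fn=mh_step, outputs_info=q0, non_sequences=step_size, n_steps=n_draws
)

sample_fn = theano.function([q0, step_size, n_draws], draws, updates=updates)
trace = sample_fn(np.zeros(2), 0.5, 1000)  # array of shape (1000, 2)
```

Timing something like this against the equivalent NumPy loop should make any scan or RNG overhead visible before we commit to the same structure for HMC/NUTS.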

@StephenHogg
Contributor

Makes sense - in the absence of the ability to think critically about this (at this stage), let me just get the requirements for the first step clear and I'll start working on it. What you're after here is:

  • a sample function (probably in a throwaway file in a branch) that takes a map of TensorVariables—corresponding to a model's random variables—and their log-likelihood graphs (e.g. as given by a PyMC3 Model object), integer-typed Scalars for the number of samples (e.g. warm-up, burn-in), etc., and returns a list of TensorVariables with dimensions matching each random variable input by the number of requested samples.
  • the function should be largely inspired by an existing MH implementation, whether from littlemcmc or elsewhere - no bells and whistles for now
  • once this is done, identify any clear bottlenecks in the code

So just to be clear, the map that the sample function should take has TensorVariables as keys - what datatype should be used for the log-likelihood graphs (the map values)?

@StephenHogg
Contributor

Having looked through littlemcmc a bit, I think I either need closer help working on this or should be helping someone else.

@ricardoV94
Member
