Option for a functional API? #84

@adamboche

Hello! I'm excited to see all the cool ideas going on in the new PyMC, and I'm looking forward to using it for real. I've been following the development a little, and had an idea I wanted to run by you. I'm still new to PyMC, so please correct me if I get anything wrong.

One of the distinctive features of PyMC is its usage of context managers for building models, like this:

with pm.Model() as model:
    eta = pm.Normal("eta", 0, 1, shape=J)
    mu = pm.Normal("mu", 0, sd=1e6)
    tau = pm.HalfCauchy("tau", 5)
    theta = pm.Deterministic("theta", mu + tau * eta)
    obs = pm.Normal("obs", theta, sd=sigma, observed=y)
    trace_h = pm.sample(1000)

plot_summary(model)

This kind of API is powerful in that it allows users to transparently access the sampling backend without extra work, and it makes common workflows really quick and easy. The decorator-based @pm.model API has similar advantages. The developer guide explains the power and flexibility that comes out of this design.

The design also has some side effects:

  • It relies on hidden global mutable state to manage the context, which can be hard for some users to understand. It's not always clear what must be done inside versus outside the context manager, or what state is attached to which objects.
  • It couples the model to the data -- there's no concept of a model in the absence of its observed data.
  • It requires passing the name of each variable to the variable's constructor. This could be avoided by hacking the AST, but that would be rather less robust, and the Python AST is documented as unstable: "The abstract syntax itself might change with each Python release".
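
To make the first point concrete, here is a minimal sketch of how a context-manager API can end up depending on hidden shared state. This is a toy, not PyMC's actual implementation: the `Model` and `Normal` classes and the `_stack` attribute are hypothetical names chosen only for illustration.

```python
class Model:
    """Toy model that registers itself on a class-level stack (hypothetical)."""

    _stack = []  # hidden, shared, mutable state

    def __init__(self):
        self.variables = {}

    def __enter__(self):
        Model._stack.append(self)
        return self

    def __exit__(self, exc_type, exc, tb):
        Model._stack.pop()
        return False  # do not swallow exceptions

    @classmethod
    def current(cls):
        if not cls._stack:
            raise TypeError("No model on the context stack")
        return cls._stack[-1]


class Normal:
    """Toy distribution that silently attaches itself to the active model."""

    def __init__(self, name, mu, sd):
        self.name, self.mu, self.sd = name, mu, sd
        # The constructor reaches into global state; nothing in the
        # call signature reveals which model receives the variable.
        Model.current().variables[name] = self


with Model() as m:
    Normal("mu", 0, 1)  # registered on m as a side effect

print(sorted(m.variables))
```

In this sketch, calling `Normal("x", 0, 1)` outside a `with` block raises `TypeError`, which is the "inside versus outside" confusion described above. The name also has to be passed as a string, because the object has no other way to learn what it is called.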

I've been wondering about some possible API designs. Some of them may have been discussed and rejected already; please forgive me if I'm being redundant.

One idea that may feel familiar to Python developers is to use a class per model, something like this:

@model
class MyModel:
    J = ConstantInteger()
    eta = Normal(0, 1, shape=J)
    mu = Normal(0, sd=1e6)
    tau = HalfCauchy(5)
    theta = Deterministic(mu + tau * eta)


# Any of these functions could be methods instead.
model = MyModel()
observed = observe(model, data)
trace = sample(observed)
plot_summary(trace)

I'm not 100% sure that it can do everything PyMC needs, but, from my (possibly naive) perspective, having an option like this might have some benefits:

  • All the necessary state can live on the model instance, rather than in a global context or on the distribution objects. Simple functions (or methods) connect the objects of the API, making it composable and easy to use in a library.
  • The model can exist independent of any observed data.
  • No AST hacking is necessary to give each distribution a name. The setup can be done in a class decorator, as in the popular attrs library, or in the attribute initialization through the descriptor protocol, each of which produces plain ol' Python objects without hidden state.
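
As a rough sketch of the last point, the `__set_name__` hook from the descriptor protocol (Python 3.6+) lets each attribute learn its own name when the class is created, and a class decorator can then collect the attributes. Everything below is hypothetical and only illustrates the naming mechanism: `Distribution`, `Normal`, `HalfCauchy`, `model`, and the `variables` attribute are stand-ins, not an existing PyMC API.

```python
class Distribution:
    """Toy distribution placeholder that records its parameters (hypothetical)."""

    def __init__(self, *params, **kwargs):
        self.params = params
        self.kwargs = kwargs
        self.name = None  # filled in by __set_name__

    def __set_name__(self, owner, name):
        # Python calls this automatically while creating the owning class,
        # passing the attribute name the object was assigned to.
        self.name = name


class Normal(Distribution):
    pass


class HalfCauchy(Distribution):
    pass


def model(cls):
    """Class decorator that collects the distributions declared on the class."""
    cls.variables = {
        name: value
        for name, value in vars(cls).items()
        if isinstance(value, Distribution)
    }
    return cls


@model
class MyModel:
    mu = Normal(0, sd=1e6)
    tau = HalfCauchy(5)


print(MyModel.mu.name, list(MyModel.variables))
```

No name strings are passed to the constructors, and all bookkeeping lives on the class itself rather than in a global context. Expressions such as `mu + tau * eta` would need operator overloading on top of this, which the sketch leaves out.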

There's a lot to explore in this design space. If this seems interesting to people, I'm happy to discuss or try out some implementation ideas, to see if something like this could be possible, and if it'd be nice. I'd love to hear your thoughts! 🙂
