Remove initval from Model class and add function to draw one #4942

Closed
twiecki opened this issue Aug 17, 2021 · 19 comments

@twiecki
Member

twiecki commented Aug 17, 2021

In v3 we (ab)used the Theano .tag.test_value to evaluate the model (Model.check_test_point) at a set point as well as to provide a starting point for the sampler; this point was stored in the Model class. In v4 we removed the reliance on test values and instead have an initial_value dictionary in Model that gets updated with a numpy array per RV as the RV gets created. This dict is then referenced by e.g. the sampler. If users want to specify their own initval for an RV, they do so during creation of the RV.

After talking to @brandonwillard about several issues related to this mechanism (#4924, #4911) I propose we change this method in the following way:

  1. initval is no longer stored in the Model object
  2. initvals can no longer be specified at RV instantiation (this is rarely done in practice anyway, though sometimes it's necessary)
  3. There is, however, a function like pm.sample_prior_predictive() (e.g. pm.sample_mean()) to draw an initial value.
  4. This point is stored in a dict mapping RV names to numpy arrays.
  5. Model.check_test_point as well as the sampler draw an initval using this function when they need it.
  6. What about user-specified values? pm.sample takes a point dict of initvals to override the defaults that get drawn. So when pm.sample draws an initval, it conditions the model on the override dict passed by the user. A user could also draw an initval herself, alter it, and pass that in, so that initvals for all RVs are provided this way. The same can work for Model.check_test_point.

I think this should cover all the bases. It solves a few of the problems we've been wrestling with:

  • Conceptually it makes sense to not create initvals directly during model definition, because when a user changes the dimensionality of a model, we would need to take care of updating the Model.initval or the sampler will choke.
  • Having a somewhat arbitrary point associated with Model seems like an odd design choice and it's not clear why that should be the case.
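For concreteness, the override flow of step 6 could look roughly like the following pure-numpy sketch (draw_default_initvals and resolve_initial_point are made-up names for illustration, not PyMC API):

```python
import numpy as np

def draw_default_initvals(rv_means):
    # Hypothetical default strategy: one numpy value per RV (here, the prior mean).
    return {name: np.asarray(mean) for name, mean in rv_means.items()}

def resolve_initial_point(rv_means, user_overrides=None):
    # Step 6 in a nutshell: draw defaults, then let user-supplied values win.
    point = draw_default_initvals(rv_means)
    point.update(user_overrides or {})
    return point

# Example: the user pins only sigma; mu falls back to the drawn default.
point = resolve_initial_point({"mu": 0.0, "sigma": 1.0}, {"sigma": 0.5})
```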
@ricardoV94
Member

ricardoV94 commented Aug 17, 2021

My only concern is with

What about user-specified values? pm.sample takes a point dict of initvals to override the defaults. So when pm.sample draws an initval, it conditions the model on the override dict passed by the user. A user could also draw an initval herself, alter it, and pass that in, so that initvals for all RVs are provided this way. The same can work for Model.check_test_point.

Are we asking users to draw all initvals for all model variables (and remembering to transform them)?

If they want to specify only one initval for a specific variable, it's not so easy to plug it in, because it might require changes in other variables' initvals in order to obtain a set of initial values that is internally consistent. The function model.update_start_vals tries to do this partial plug-in, but it fails to address potential constraints downstream of the updated values.


That has nothing to do with whether initvals are defined during variable instantiation, and what gets stored where.
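For intuition on why partial plug-in is hard, consider a random-walk variable (plain numpy, not PyMC code): overriding only the start value invalidates every downstream initval unless they are re-derived.

```python
import numpy as np

# A random walk's initvals are cumulative: each step builds on the previous one.
rng = np.random.default_rng(42)
innovations = rng.normal(0.0, 0.1, size=10)
walk_init = 0.0 + np.cumsum(innovations)      # defaults, walk starting at 0.0

# Plugging in a new start value for only the first element leaves the rest
# of the initvals inconsistent; they must be re-derived from the new start:
new_start = 5.0
naive = walk_init.copy()
naive[0] = new_start                          # partial plug-in, inconsistent
consistent = new_start + np.cumsum(innovations)
```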

@ricardoV94
Member

This point is stored in a dict mapping RV names to numpy arrays

I guess this is not cached anymore inside the model object right? That's fine, but we need to go through the many methods in the codebase that make use of it, and make sure it's being propagated

@twiecki
Member Author

twiecki commented Aug 17, 2021

If they want to specify only one initval for a specific variable, it's not so easy to plug-it in, because it might require changes in other variable initvals in order to obtain a set of initial values that is internally consistent.

The way I imagine is that we add support to our predictive sampling functions to condition on values and hook into that machinery. And instead of pm.sample_prior_predictive() there could be something like pm.sample_mean() that draws the mean of the RVs.

@michaelosthege
Member

I have a major concern with this: User experience.

I know I know, lately we don't care about user experience as much as we care about chastity, but if initial values can't be set at RV creation (or a line after, which is basically the same), this creates a need to keep that information around all the way until pm.find_MAP or pm.sample is called.

An incredibly powerful modeling workflow is the creation of submodules of a model. Black-boxing parts of a data generating process into a function that creates a sub-graph for it makes it possible to build quite sophisticated multi-level, hybrid models.

Oftentimes it is right in these submodules where setting initial values increases the robustness of the overall modeling workflow.
Taking away that API means modelers need to come up with their own way to build an information link between these submodules (possibly imported from third-party libraries) and their own pm.find_MAP/pm.sample calls.

I don't see why we should sacrifice this powerful workflow without any real advantages over the lazy symbolic evaluation we discussed earlier. The resizeability issues, choking the sampler, and having an arbitrary point associated with a model are all addressed by making Model.initial_values symbolic.

@twiecki
Member Author

twiecki commented Aug 17, 2021

@michaelosthege I agree that it is a downgrade in UX, but I think it is very minor. 99% of the models I've seen do not require an initval to be set by the user. Maybe your experience is different and it's a feature you use more often? And if it is that rarely used, it's not going to be that cumbersome to have another mechanism for doing so that's a bit more general (because it's really just conditioning).

@ricardoV94
Member

ricardoV94 commented Aug 17, 2021

The way I imagine is that we add support to our predictive sampling functions to condition on values and hook into that machinery.

Yeah I like that idea. It's basically what I was suggesting in #4924 with the name of model.compile_initial_point_fn, no?

So then what is the difference between this and #4924:

  1. There is no customization in terms of ["random", "moment"]
  2. User specified (partial) initvals have to be non-symbolic
  3. The initial point is never cached in the model
  4. Customized initvals are not set when the distribution is defined.

Is that it? Or are there other differences?

@fonnesbeck
Member

fonnesbeck commented Aug 17, 2021

To me it makes much more sense, both for users and developers, to have initvals passed to sample. In my experience it's usually just one or two variables (in some more complex models) that can use a manual initval, and passing that as a dict to sample would be great, rather than having it buried in the model specification. Having all the MCMC sampling hyperparameters in one place would be a nice change.

@twiecki
Member Author

twiecki commented Aug 17, 2021

@ricardoV94 That all sounds right to me.

@ricardoV94
Member

ricardoV94 commented Aug 17, 2021

Oftentimes it is right in these submodules where setting initial values increases the robustness of the overall modeling workflow.
Taking away that API means modelers need to come up with their own way to build an information link between these submodules (possibly imported from third-party libraries) and their own pm.find_MAP/pm.sample calls.

Nothing stops a module from implementing a more sophisticated initval creation function. If we see such logic is being replicated across the codebase (which I think we were seeing a bit with the partial updating + transformation of initvals) then we can always refactor into a more general method.

Do you have any example where the proposed method would not suffice?

@michaelosthege
Member

michaelosthege commented Aug 17, 2021

I agree that it is a downgrade in UX, but I think it is very minor. 99% of the models I've seen do not require an initval to be set by the user.

@twiecki that's because in v3 we never drew the initial values from the prior!!! I have been using v4 for multiple projects now and it's significantly less robust right now. Random walks in particular have become very unreliable without initvals.

Do we really want to knowingly downgrade the UX and at the same time remove any chance to get the old behavior back without major changes to user code?

To me it makes much more sense, both for users and developers, to have initvals passed to sample.

You can easily do that with the symbolic Model.initial_values too:

with pm.Model() as pmodel:
    rv = pm.Distribution(..., initval=userspecified)
    pm.sample(
        start_values=pmodel.get_initial_values()    # as in #4924
    )

Under the hood that needs to happen inside pm.sample by default anyway, or are you suggesting to make passing initial values to pm.sample a user responsibility?

Do you have any example where the proposed method would not suffice?

@ricardoV94 this example is a submodule for creating a random walk. It is used by a user-level function where users don't even need to know what PyMC3 is.
The random walk needs to be initialized, otherwise it has terrible convergence.

Much later in their code users may decide to sample the model:

t, y = my_dataset

draft_result = fit_mu_t(t, y, ...)    # builds the PyMC3 model and runs find_MAP

# do some plotting

# now decide to sample the posterior

with draft_result.pmodel:
    # Here I don't have a handle on the initial values.
    # As a user I don't even know - or care - about the variables inside
    pymc3.sample()

The proposal by @twiecki to drop the initval kwarg means that people like me will have to come up with their own mechanisms to keep track of the initvals!

I don't even know what the most elegant solution would be from a user perspective. I would probably just monkey-patch an initial_values dict onto the model, because that has the least architectural implications for my modeling workflow.

@twiecki
Member Author

twiecki commented Aug 17, 2021

@twiecki that's because in v3 we never drew the initial values from the prior!!! I have been using v4 for multiple projects now and it's significantly less robust right now. Random walks in particular have become very unreliable without initvals.

Right, I would actually propose to revert the way the initval is drawn to how we did it in v3, i.e. not draw from the prior. Drawing from the prior seems very brittle and relies heavily on good specification of priors (which many new users are not good at). It also encourages the unwanted behavior of users just calling pm.sample() many times until they happen to find a good draw. We already get some randomness around the mean through the jitter in the NUTS sampler.

@michaelosthege for your example, I can definitely see how attaching a point to the model seems easier here. In principle, you could have your fit_mu_t() also return a point next to the model. Of course you'd then have to tell the user to pass that to pm.sample(), which is a bit less nice, but from my perspective still OK.

Alternatively, there could be a pm.sample_representative() that by default just gets the prior mean but can be customized/overwritten by the API designer like you to e.g. return the MAP instead.
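Such a sample_representative() could be as simple as a strategy hook. In this sketch all names are made up, and the "MAP" result is a hard-coded stand-in rather than a real optimization:

```python
import numpy as np

def prior_mean_strategy(rv_means):
    # Default strategy: just return the prior means as the initial point.
    return {name: np.asarray(m) for name, m in rv_means.items()}

def sample_representative(rv_means, strategy=prior_mean_strategy):
    # The hook: an API designer can swap in any callable, e.g. one running find_MAP.
    return strategy(rv_means)

# Default behaviour: prior means.
default_point = sample_representative({"mu": 0.0, "sigma": 1.0})

# A library author overrides the strategy; the "MAP" here is a fake, fixed dict.
map_strategy = lambda rv_means: {"mu": np.asarray(1.3), "sigma": np.asarray(0.7)}
map_point = sample_representative({"mu": 0.0, "sigma": 1.0}, strategy=map_strategy)
```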

@michaelosthege
Member

Of course you'd then have to tell the user to pass that to pm.sample(), which is a bit less nice, but from my perspective still OK.

Less nice? And what's next then, should we do the same for transform? After all the transforms only become relevant as soon as you start working with an optimization/sampling algorithm, right?

I'm hyperbolizing, of course, but coming back to the points from your proposal: what really are the advantages of ditching the API?

Conceptually it makes sense to not create initvals directly during model definition, because when a user changes the dimensionality of a model, we would need to take care of updating the Model.initval or the sampler will choke.

Conceptually yes, but from a UX perspective it's a nightmare. And technically we don't even create initvals if we keep them symbolic and it also solves the problems with samplers/dimensionality in an elegant "v4" style.

Having a somewhat arbitrary point associated with Model seems like an odd design choice and it's not clear why that should be the case.

Removing an "odd design" choice that has clear UX advantages doesn't sound like a good argument. And again, a symbolic initial_values isn't a point, but more of a promise. (Sorry for the Javascript term.)

@twiecki
Member Author

twiecki commented Aug 18, 2021

I'm hyperbolizing, of course, but coming back to the points from your proposal: what really are the advantages of ditching the API?

  • Cleaner design
  • Hooking into existing functionality
  • More flexibility

Separately, I also talked to @junpenglao about initializing NUTS:
"I would follow Stan, sampling from prior is really unstable

For example, if you have a Cauchy prior and prior sample could be HUGE, and it either takes forever to tune or outright doesn't work

So better to initialize from the U(-1, 1) in the unbounded space like Stan"

So I think we should just do that.
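In numpy terms, the Stan-style scheme amounts to this (illustrative sketch using a log-transformed scale parameter):

```python
import numpy as np

# Stan-style initialization: draw uniformly from (-1, 1) on the unconstrained
# (transformed) scale, then map back through the RV's transform.
rng = np.random.default_rng(123)

# e.g. a scale parameter with a log transform: unconstrained z -> sigma = exp(z)
z = rng.uniform(-1.0, 1.0)
sigma_init = np.exp(z)

# exp of a value in (-1, 1) always lands in (1/e, e), roughly (0.37, 2.72),
# no matter how heavy-tailed the prior on sigma is.
```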

Again, I think with this method, specifying an initval manually will almost never be required, so adding symbolic initial_values with various strategies of setting them seems like overkill to me. We already can draw values using existing functionality.

@fonnesbeck
Member

I agree with @twiecki.

What about using moments like we've done in the past? It seems to have served us well.

@twiecki
Member Author

twiecki commented Aug 18, 2021

@fonnesbeck ok, thanks for chiming in, I'd say we have enough votes then here to go ahead. This came up while talking to @brandonwillard so unless I misrepresented what we discussed I think he agrees too.

I think moments are a good method too. From there, jitter in the NUTS sampler basically has a similar effect to what Stan is doing.

@twiecki twiecki changed the title Proposal: Remove initval from Model class Remove initval from Model class and add function to draw one Aug 18, 2021
@twiecki
Member Author

twiecki commented Aug 18, 2021

Anyone want to take this on? @ricardoV94 @kc611 @Sayam753 ?

@michaelosthege
Member

michaelosthege commented Aug 18, 2021

At the very least please please please implement the needed function to draw initial values without removing the Model.initial_values and Distribution.__new__(..., initval) APIs just yet.
@ricardoV94 has pointed out:

I'm not against cleaning up the v3-leftover Model.update_start_values etc. methods, but there's really no need to break the existing API of dict-collecting initvals at RV creation.
After all, the allegedly cleaner functional evaluation API @twiecki proposed also takes a mix of RVs and ndarrays, which is exactly what Model.initial_values can be.

*I'm assuming we want to continue determining dependent RVs' initial values conditioned on user-provided ndarrays.

@ricardoV94
Member

ricardoV94 commented Aug 18, 2021

Why don't we start with the two features everyone seems to be on board with?

  1. An RV-dispatched method to extract moments / representative points (which @kc611 already started)
  2. A method that extracts initvals for a whole model and allows some variables' initvals to be set to user-defined constants

From there I agree with @michaelosthege that it's mostly a question of housekeeping.
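Feature 1 could plausibly be built on functools.singledispatch; in this sketch the distribution classes are stand-ins, not PyMC's actual classes:

```python
import math
from functools import singledispatch

# Stand-in distribution classes; PyMC's real distributions would be used instead.
class Normal:
    def __init__(self, mu, sigma):
        self.mu, self.sigma = mu, sigma

class HalfNormal:
    def __init__(self, sigma):
        self.sigma = sigma

@singledispatch
def moment(rv):
    raise NotImplementedError(f"No moment registered for {type(rv).__name__}")

@moment.register
def _(rv: Normal):
    return rv.mu                                 # mean of a Normal

@moment.register
def _(rv: HalfNormal):
    return rv.sigma * math.sqrt(2.0 / math.pi)   # mean of a HalfNormal
```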

@twiecki
Member Author

twiecki commented Aug 18, 2021 via email
