rjmcmc stepper and example test case #20

Draft · wants to merge 5 commits into main

Conversation

@TeaWolf commented Mar 15, 2022

Hi,

It was recommended that I open a PR to track progress on this and get help. I'm new to this process, so I apologize in advance for my newbieness.

I'm trying to build an RJMCMC stepper in pymc for a Bayesian model selection project I'm working on.
The structure is currently based largely on CompoundStep, but the pymc API is still a little fuzzy to me.

The basic premise is that a supermodel is specified as a pm.Model() that contains all possible sub-models and distinguishes them with label parameters (delta parameters).
The RJMCMC kernel then has:

  • A mapping between label parameters and their corresponding continuous sub-model parameters.
  • A mapping from (source, destination) label-parameter pairs to a Jump stepper.

It then randomly chooses to stay in the current sub-model's parameter space or to jump.
If staying in the sub-model's parameter space, it delegates to a NUTS sampler on those variables.
Otherwise it randomly chooses a destination space and delegates to the corresponding Jump stepper.

The Jump stepper updates the label parameters to indicate that the current sub-model has changed, as well as the continuous parameters, according to some intuition about how the models should be mapped onto each other.
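The dispatch described above could be sketched like this in pure Python. All names here are hypothetical (this is the control flow being described, not the actual implementation):

```python
import random

class RJMCMCDispatchSketch:
    """Illustrative control flow for the step described above (names are made up)."""

    def __init__(self, nuts_stepper, jump_steppers, jump_probability=0.3):
        self.nuts_stepper = nuts_stepper    # steps the current sub-model's continuous variables
        self.jump_steppers = jump_steppers  # {(source_label, dest_label): Jump stepper}
        self.jump_probability = jump_probability

    def step(self, point, current_label):
        if random.random() < self.jump_probability:
            # Pick a destination sub-model reachable from the current one and jump.
            candidates = [dst for (src, dst) in self.jump_steppers if src == current_label]
            dest = random.choice(candidates)
            return self.jump_steppers[(current_label, dest)](point)
        # Otherwise stay within the current sub-model's parameter space.
        return self.nuts_stepper(point)
```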

The implementation I currently have seems to work for the most part, but I still need to fix stat collection and initial tuning, as well as a problem with the final trace that prevents me from using az.plot_trace(trace, ...), forcing me to use az.plot_trace(trace.posterior, ...) instead. The interface I came up with for creating a Jump is also more than a little awkward...

@michaelosthege (Member) left a comment:

Hi @TeaWolf
I'm sorry my feedback is rather generic and mostly about code organization / OOP, but I think addressing this first will make it easier to read and understand what the stepper is doing.

Comment on lines 126 to 132
# Generate data from the true model
timepoints = np.linspace(0, 5, 1000)
true_deltas = np.array([[0, 1], [0, 0]])
true_ks = np.array([[0, 2], [0, 0]])
initial_values = np.array([5, 0])
true_model = get_vectorized_solution_for_matrix(true_deltas, true_ks, initial_values)
true_data = [true_model[i](timepoints) for i in range(2)]

General advice

One shouldn't put execution code at the 0th indentation level.
That's because the Python interpreter will start executing this code as soon as the module is imported.
It also leads to recursion problems with multiprocessing, because the child processes import the code again.

The typical pattern is this:

# Place this at the end of the file:
if __name__ == "__main__":
    # Now call other functions defined above:
    run_main_example()

The __name__ == "__main__" check is only True if the script is the primary entry point: python my_script.py

You can start adopting this pattern by simply indenting most of your code and putting it into a function.

How you can make this into a test

If you also name the file to start with test_ - for example test_rjmcmc.py - then pytest will recognize it as a file potentially containing tests.
Any top-level function named test_* will then be recognized as a test case.
For example: def test_rjmcmc_example_one():

Instead of python custom_rjmcmc_sketch.py you can then do pytest --verbose test_rjmcmc.py.

...And that's how you can have your first test case.
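Putting the two suggestions together, a minimal file layout might look like this (the function bodies are placeholders, not the real example):

```python
# test_rjmcmc.py -- hypothetical skeleton combining the two patterns above

def run_main_example(draws=1000):
    # Placeholder for building the model and sampling; returns something checkable.
    return {"draws": draws}

def test_rjmcmc_example_one():
    # pytest picks this up because the file and function names start with test_.
    result = run_main_example(draws=10)  # fewer draws to keep the test fast
    assert result["draws"] == 10

# Place this at the end of the file:
if __name__ == "__main__":
    run_main_example()
```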

@TeaWolf (Author) replied:

I wasn't aware of the problem with multiprocessing and 0 indentation, thanks for that!

I wasn't really meaning to make a unit test out of this, since I'm not sure what I can assert at the moment. I meant it more as a usage example to clarify my intention. Is there a better place for me to put this script, then? Or should I just make it assert that there's been no error?

Member replied:

Consider it an integration test. You could run it with fewer iterations to make it faster.
A basic assert could be whether it can sample the prior/posterior correctly.
You probably also have attributes on the step method instance that you could assert.
Also a seeded test that asserts individual posterior samples is an option.
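For the seeded-test idea, the shape could look like this. The sampling call is a numpy stand-in here; in the real test it would be the stepper run with a fixed seed:

```python
import numpy as np

def sample_stub(seed, draws=100):
    # Hypothetical stand-in for running the stepper with a fixed random seed.
    rng = np.random.default_rng(seed)
    return rng.normal(size=draws)

def test_seeded_draws_are_reproducible():
    a = sample_stub(123)
    b = sample_stub(123)
    assert np.array_equal(a, b)  # same seed -> identical draws
    assert np.isfinite(a).all()  # basic sanity check on the trace values
```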

@TeaWolf commented Mar 31, 2022

So I've started doing some better unit testing on the stepper to be sure of its correctness (I'll have to refactor it before putting it here). But I've realized the logp I'm using in the jumps doesn't seem to be correct.
The jump classes have self.logp = self.model.compile_logp() in their __init__ and later simply call self.logp(point). But when I compare this to my analytical posterior, it doesn't match up at all.
They differ by a factor of 0.6 ± 0.2.

I'm trying to base my approach on the way delta_logp was written for the Metropolis stepper, but the aesara API is confusing me. Is compile_logp() not providing the posterior I'm expecting (potential and all prior terms)?

@ricardoV94 commented:

Could there be variable transforms going on? What model are you testing with?

@TeaWolf commented Mar 31, 2022

Yes, there are variable transforms going on. The model parameters have uniform priors, so they're transformed using the interval transform. But when comparing logp to my analytical one, I reverse the transformation using this:

    @classmethod
    def get_backward_transform(cls, rv):
        """Returns a function which performs the backward value_var transform on a scalar"""

        def f(x):
            return rv.tag.value_var.tag.transform.backward(x, *rv.owner.inputs).eval()

        return f

    @classmethod
    def backward_transform_value(cls, rv, value):
        return cls.get_backward_transform(rv)(value)

I think I got this method of obtaining the transformation from some of the pymc test code. Is there a better way now?

@ricardoV94 commented:

I just wondered if the difference you are seeing between your analytical and pymc logp is because you are forgetting the Jacobian of the transformations.

You can ignore them with compile_logp(jacobian=False)
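To see concretely what the flag changes, here is a numpy-only sketch (no pymc) for a Uniform(0, 1) prior under a logistic interval transform. The formulas mirror what pymc's interval transform does, but treat the function names as illustrative:

```python
import numpy as np

def uniform_logp(x, lower=0.0, upper=1.0):
    # Log-density of Uniform(lower, upper) at x.
    return -np.log(upper - lower) if lower <= x <= upper else -np.inf

def interval_backward(y, lower=0.0, upper=1.0):
    # Maps an unconstrained y back into (lower, upper), logistic-style.
    return lower + (upper - lower) / (1.0 + np.exp(-y))

def interval_log_jac_det(y, lower=0.0, upper=1.0):
    # log |d backward / dy| for the interval (logistic) transform.
    return np.log(upper - lower) - y - 2.0 * np.log1p(np.exp(-y))

y = 0.7                                           # a point in the unconstrained space
x = interval_backward(y)                          # its constrained counterpart
without_jac = uniform_logp(x)                     # what jacobian=False would give
with_jac = without_jac + interval_log_jac_det(y)  # what jacobian=True would give
```

The difference between the two is exactly the log-Jacobian term, which is what makes an analytical comparison in the constrained space disagree with the jacobian=True value.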

@TeaWolf commented Apr 1, 2022

Ah, I see; the documentation wasn't clear on which Jacobian terms these were or how they were included. I've been fetching the Jacobian terms myself when I needed them through:

    def transform_log_jac_det(self, name, value):
        rv = self.model.named_vars[name]
        return rv.tag.value_var.tag.transform.log_jac_det(
            value, *rv.owner.inputs
        ).eval()

So if jacobian=True, logp(point) is effectively the jacobian=False logp(point) plus np.sum([transform_log_jac_det(var, point[var]) for var in vars]). Did I get that right?
So if I want to manoeuvre in the transformed space I should leave jacobian=True, but if I define my kernel in the untransformed space (so backward transform -> compute -> forward transform) I should set jacobian=False. Correct?
I'll test again using this.

@ricardoV94 commented Apr 1, 2022

So if jacobian=True, logp(point) is effectively the jacobian=False logp(point) plus np.sum([transform_log_jac_det(var, point[var]) for var in vars]). Did I get that right?

Yes

So if I want to manoeuvre in the transformed space I should leave jacobian=True, but if I define my kernel in the untransformed space (so backward transform -> compute -> forward transform) I should set jacobian=False. Correct?

Usually you always want the jacobian, except in select cases like optimization (such as find_MAP).

@TeaWolf commented Apr 1, 2022

But, for instance, if I were to reverse the value_var transformation in order to make my new point proposal in the original space, the value_var transformation's Jacobian would not appear in the acceptance fraction. So there I would need jacobian=False, right?
I understand it's probably better to make point proposals in the transformed space, but I'm trying to understand what's going on :)

@TeaWolf commented Apr 1, 2022

Somewhat related: I realize that I need multivariate (continuous, discrete) priors, as in P[k0, k1, k2, delta] = P[k0](1 - delta) + P[k1, k2]delta. Is there a pymc distribution I can manage this with?
Otherwise, I figure for the moment I can set sum=False in compile_logp to get all the terms and simply drop the ones I don't need depending on the value of delta. But if I go with the latter, how can I recover which terms correspond to which variables in the array?

@ricardoV94 commented Apr 20, 2022

Sorry for the delay,

But, for instance, if I were to reverse the value_var transformation in order to make my new point proposal in the original space, the value_var transformation's Jacobian would not appear in the acceptance fraction. So there I would need jacobian=False, right? I understand it's probably better to make point proposals in the transformed space, but I'm trying to understand what's going on :)

There are two things: 1) value transformation and 2) jacobian adjustments

  1. All densities in the graph are evaluated in terms of the untransformed (constrained) values. If you have a LogTransformation for x, all densities that depend on x will use exp(x) instead of x.
  2. Jacobian adjustments only matter for the density of the respective value; they don't matter for other densities that may use this value. So the distribution of x needs the jacobian correction for that transformation, but other distributions that rely on x do not. It's one jacobian per value.

The jacobian is there to ensure that sampling in the unconstrained space actually behaves like sampling in the constrained space. If you sample x = pm.Exponential("x", 1) in log-space but don't apply the jacobian correction, the accepted samples will not follow the prescribed prior.

Does this help?
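That last point can be illustrated with a small self-contained experiment: plain numpy random-walk Metropolis (not pymc) on x ~ Exponential(1), sampled in log space with and without the jacobian term:

```python
import numpy as np

rng = np.random.default_rng(42)

def logp_transformed(y, jacobian=True):
    # x = exp(y) with x ~ Exponential(1): logp(x) = -x.
    # The log transform's jacobian term is log|dx/dy| = y.
    lp = -np.exp(y)
    if jacobian:
        lp += y
    return lp

def metropolis_mean(jacobian, n=20000, scale=1.0):
    # Random-walk Metropolis in the unconstrained (log) space.
    y, out = 0.0, np.empty(n)
    for i in range(n):
        prop = y + scale * rng.normal()
        if np.log(rng.random()) < logp_transformed(prop, jacobian) - logp_transformed(y, jacobian):
            y = prop
        out[i] = y
    return np.exp(out).mean()  # mean of the back-transformed (constrained) samples

with_jac = metropolis_mean(jacobian=True)      # close to E[x] = 1 when the jacobian is included
without_jac = metropolis_mean(jacobian=False)  # chain drifts; does not recover the prior
```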

@ricardoV94 commented:

Somewhat related: I realize that I need multivariate (continuous, discrete) priors, as in P[k0, k1, k2, delta] = P[k0](1 - delta) + P[k1, k2]delta. Is there a pymc distribution I can manage this with? Otherwise, I figure for the moment I can set sum=False in compile_logp to get all the terms and simply drop the ones I don't need depending on the value of delta. But if I go with the latter, how can I recover which terms correspond to which variables in the array?

I don't understand this question. We have some multivariate discrete and continuous distributions, but is there something specific you need? Is this for the model specification or for the sampler? For the latter we usually just use scipy/numpy random distribution utilities.

@TeaWolf commented Apr 23, 2022

Thanks for the reply!
Right, so I have a workaround at the moment that looks as follows:

    def logp(self, point):
        """
        Calculate logp by retrieving all the free-RV + potential terms from pymc and summing.

        The prior on all the delta parameters is simply ignored if self.uniform_model_prior is True.
        This way it disappears from the acceptance fraction, making all models equally probable.
        """
        terms = self.logp_terms(point)

        priors = 0
        if not self.uniform_model_prior:
            raise NotImplementedError("TODO find a good way to specify different model priors")
            # Everything could be left to pymc priors on deltas, but I don't think that's feasible on finite spaces.
            # priors = [terms[self.logp_idx_mapping[d]] for d in self.delta_parameters]  # get the model prior
        else:
            # uniform_model_prior simply means we ignore any information given on the delta terms.
            # Select the continuous priors based on the model:
            current_subspace_names = self.configurations_to_subspaces[self.get_configuration_from_point(point)]
            for var in current_subspace_names:
                priors += terms[self.logp_idx_mapping[var]]

            # full logp
            return priors + terms[-1]  # add in the potential

So in this formalism the prior is a joint function of both the model (specified as a vector of deltas) and the model parameters; the space of model parameters has a different dimension depending on the value of the deltas.
So the posterior is proportional to likelihood(x, deltas) * prior(x, deltas), but since the priors are specified as univariate distributions, I need them to be included only when the deltas are right. This is the type of conditional distribution I'm looking for.

The fix I applied above seems to work fine, but is there a better way of getting this done in pymc?
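For the earlier question about matching sum=False terms back to variables, one low-tech option is to fix an explicit ordering once and build an index map from it. Everything below is hypothetical, and the actual term ordering must be checked against your pymc version (e.g. against model.value_vars):

```python
# Hypothetical value-variable names; the real ordering must be taken from the
# model (e.g. the order of model.value_vars in your pymc version).
value_var_names = ["k0_interval__", "k1_interval__", "k2_interval__", "delta"]
logp_idx_mapping = {name: i for i, name in enumerate(value_var_names)}

# Illustrative per-term logp values, in the same order as value_var_names:
terms = [-1.2, -0.7, -0.3, -0.05]

# Keep only the prior terms belonging to the currently active sub-model:
active_subspace = ["k1_interval__", "k2_interval__"]
prior = sum(terms[logp_idx_mapping[name]] for name in active_subspace)
```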

@ricardoV94 ricardoV94 marked this pull request as draft March 30, 2023 15:44