Skip to content

add utility method to plot the pdf of a variable #5032

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
drbenvincent opened this issue Sep 28, 2021 · 22 comments
Open

add utility method to plot the pdf of a variable #5032

drbenvincent opened this issue Sep 28, 2021 · 22 comments

Comments

@drbenvincent
Copy link

drbenvincent commented Sep 28, 2021

In discussions in PyMC Labs, a client asked if there was a simple utility function to plot the pdf of a PyMC variable.

As a quick hack I came up with the following:

Continuous

def plot_cont(self):
    """Plot pdf of continuous dist, with semi clever setting of range"""
    samples = self.random(size=10_000)
    x = np.linspace(np.min(samples), np.max(samples), 1000)
    plt.plot(x, np.exp(self.logp(x)).eval())

# add plot method to abstract class Continuous
pm.Continuous.plot = plot_cont

# example use
pm.Normal.dist(mu=1, sd=2).plot()
pm.Laplace.dist(mu=0, b=1).plot()

which results in the following
Screenshot 2021-09-28 at 17 52 24

Discrete

def plot_disc(self):
    """Plot pdf of discrete dist, with semi clever setting of range"""
    samples = self.random(size=10_000)
    x = np.arange(np.min(samples), np.max(samples)+1)
    plt.plot(x, np.exp(self.logp(x)).eval(), "-o")

# add plot method to abstract class Continuous
pm.Discrete.plot = plot_disc

# example use
pm.Binomial.dist(n=10, p=0.2).plot()
pm.Geometric.dist(p=0.5).plot()

which results in this
Screenshot 2021-09-28 at 17 53 30

So this issue is to see what people think about adding this functionality.

I've contributed example notebooks, but not to core PyMC code before, so it may be naive but... Maybe some more polished version could be added as methods of pm.Continuous and pm.Discrete?

Thoughts and feedback welcome.

@michaelosthege
Copy link
Member

In the past I used np.exp(pm. Normal().logp(x).eval()) but that API is now a hard-to-find functional API of pm.logp.

The bigger question though is what to do with multidimensional RVs?

@drbenvincent
Copy link
Author

The bigger question though is what to do with multidimensional RVs?

In the first instance I think it makes sense to implement for univariate distributions. And override the method in distributions which are inherently multidimensional, potentially with a not implemented exception.

At a later stage you could implement custom plot functions for bivariate distributions, for example.

@michaelosthege
Copy link
Member

If you're looking for a place to put such a plotting function, that'd probably be pm.plots.

@bwengals
Copy link
Contributor

bwengals commented Oct 4, 2021

Can't help but chime in and +1 this, I at least make plots like these just about every single time I use pymc3. Though I call plt.hist on samples and then fiddle with the x limits, so this is much better. I also think it would be extra nice if arviz could be leveraged for visual continuity if possible.

@aloctavodia
Copy link
Member

Somewhat related, in Bambi we have a method to plot prior samples. https://bambinos.github.io/bambi/main/api_reference.html#bambi.models.Model.plot_priors

@drbenvincent
Copy link
Author

I also think it would be extra nice if arviz could be leveraged for visual continuity if possible.

I'm assuming we don't want to PyMC dependent upon arviz? In which case we'd have to just emulate the style? But it would be presumably be pretty easy by optionally doing

import arviz as az
az.style.use("arviz-darkgrid")

Overall I think we could make this pretty minimal. Users may want to compose their own complex and possibly multi-panel plots.

@michaelosthege
Copy link
Member

We have a dependency on ArviZ and since ArviZ no longer depends on PyMC that's fine.

But IMO any change of default style is something the user needs to do. I for example don't like the "arviz-darkgrid" style 😱

@drbenvincent
Copy link
Author

But IMO any change of default style is something the user needs to do. I for example don't like the "arviz-darkgrid" style 😱

I agree. I prefer simpler styles more suitable for publication for example

@ferrine
Copy link
Member

ferrine commented Oct 30, 2021

I like that proposal. We can raise an error on multidimensional variables. Additionally, a more flexible API is needed here. As any plotting function in the plt namespace, this should take optional axes as input and output the used axes

@drbenvincent
Copy link
Author

Yes. Was planning on full axes luxury

@ricardoV94
Copy link
Member

ricardoV94 commented Oct 30, 2021

This should probably not me a method, because we don't really have class instances in V4 anymore, but something that accepts a RV.

rv1 = pm.Normal.dist(0, 1)
with pm.Model() as m:
    rv2 = pm.Normal("rv2", 0, 1)

pm.plot(rv1)
pm.plot(rv2)

Both would yield the same. This has the advantage that you can plot the variables you defined in the model without having to rewrite with the .dist API

In V4, rv1 and rv2 are identical for this purpose.

@lancechua
Copy link
Contributor

Hi, wanted to add to this since I'd find this feature quite useful as well.
The approach I took was to determine the plot domain based on the inverse cdf.
This has the advantage of not needing to sample, but does require implementing the quantile function.
I'd say it's decent by default but needs tweaking for longer tailed and positive only distributions.

Here's a colab demo notebook for proof of concept.
https://colab.research.google.com/drive/1yLczeGk09eeEnrlUoaJ57yQKJgCnPNdg

@aloctavodia
Copy link
Member

To help interpret the pdf we can complement it with statistics such as the mean, HDI and/or quantiles. For example in this picture the dot is the mean, the thick line the IQR and the think line the 94% HDI, this is similar as the information presented in a forestplot.

exp_1

@ricardoV94
Copy link
Member

Should this be an Arviz feature? Prior predictive is pretty fast anyway

@drbenvincent
Copy link
Author

Not sure if you just mean moments/HDI should be part of Arviz or not?

But as a defence of the idea of having distribution plotting in PyMC...

  • While it does make sense to separate plotting, I think the core attraction here would be ultra-rapid one-liner visualisation when building PyMC models.
  • You could also say, why not just do it with plotting SciPy distributions, but they often don't have the same parameterisation, making it a headache.

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 21, 2022

I think it's a slippery slope to re-introducing plots in PyMC xD

@ricardoV94
Copy link
Member

Would these plot utilities need anything else other than draws?

@aloctavodia
Copy link
Member

aloctavodia commented Jan 21, 2022

I am planning on adding the statistic/forest-plot stuff into ArviZ. I guess in az.plot_posterior (which name is a misnomer). But as ArviZ works with samples that is going to return KDEs no PDFs. Of course a user could do something like...

az.plot_posterior(pm.draw(rv, draws=1000))

Having pm.plot() is nicer because the user get THE prior instead of an approximation, but with enough samples, that's general a detail.

@ricardoV94
Copy link
Member

ricardoV94 commented Jan 21, 2022

I see we would be plotting the pdf itself... yeah that is not arvizable :D

@ricardoV94
Copy link
Member

By the way, pm.draw is taken already!

@aloctavodia
Copy link
Member

Sorry I meant pm.plot

@ghost
Copy link

ghost commented Jan 21, 2022

I am thinking similarly to @ricardoV94 -- I'd be hesitant to bring plotting back. However, it is something that I do quite a lot, especially:

sns.distplot(pm.Normal.dist(mu=0, sigma=1, shape=(1000)).eval())

So my heart says yes, but my head says no. What about something for pymc-experimental?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants