Skip to content

Variable selection for pm.model_to_graphviz #5527

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
michaelosthege opened this issue Feb 25, 2022 · 14 comments · Fixed by #5634
Closed

Variable selection for pm.model_to_graphviz #5527

michaelosthege opened this issue Feb 25, 2022 · 14 comments · Fixed by #5634

Comments

@michaelosthege
Copy link
Member

Description of your problem

I have a big big model and would like to make pm.model_to_graphviz plots of only some of the variables inside.
For example after adding gp.conditionals and then only wanting to plot the nodes my new variable depends on.

My model also includes some ConstantData variables I use to store indexing information.
They aren't relevant for model understanding and I'd like to hide them.

Proposal

The pm.model_to_graphviz function could take additional kwargs to customize the plot:

  • var_names (or vars?) to plot only certain variables and their dependencies
  • show_disconnected_data: bool to hide ConstantData/MutableData nodes that don't contribute to (selected) model variables

Implementation

Variable selection should be straightforward since it's already in the constructor, just not accessible via a kwarg:

pymc/pymc/model_graph.py

Lines 33 to 36 in a3bab7d

def __init__(self, model):
self.model = model
self.var_names = get_default_varnames(self.model.named_vars, include_transformed=False)
self.var_list = self.model.named_vars.values()

@soma2000-lang
Copy link
Contributor

@michaelosthege Please can I work on this

@michaelosthege
Copy link
Member Author

Sure, go ahead

@larryshamalama
Copy link
Member

Thinking out loud: when you say plot variables and their dependencies, say that we have a simple hierarchical model x | mu ~ N(mu, 1) with mu ~ N(0, 1). Would we want the graph to include mu if we just specify var_names = ["x"], for instance? In other words, do you mean that var_names as a kwarg would be used to select the graph when we have disjoint graphs under a given model? Which is would be related to show_disconnected_data as you mention

@michaelosthege
Copy link
Member Author

Yes, always include ancestors.

With the other kwarg I just want to exclude data variables that don't have edges

@ricardoV94
Copy link
Member

I would suggest reusing the same API that arviz uses for including / excluding variables, instead of creating new specialized keyword arguments like the disconnected data thing.

@michaelosthege
Copy link
Member Author

michaelosthege commented Mar 20, 2022

Without show_disconnected_data _on addition to an ArviZ-like var_names I can only think of inconvenient ways to hide these nodes:

  • Manually passing a list of ~-prefixed names of each disconnected node
  • positively selecting each downstream node in the model.

The second option is risky - if you forget a downstream variable it will just not show up.
The first option is tedious.

Instead of a boolean kwarg maybe a setting for all data variables is better: show_data: none|connected|mutable|constant|all (default: connected)

@ricardoV94
Copy link
Member

Yes, always include ancestors.

Why?

@ricardoV94
Copy link
Member

ricardoV94 commented Mar 20, 2022

Instead of a boolean kwarg maybe a setting for all data variables is better: show_data: none|connected|mutable|constant|all (default: connected)

Maybe, I would just suggest to start as simple as possible. Always easier to add complexity than to remove it.

@michaelosthege
Copy link
Member Author

I'm never in favor of complex solutions and having the code to filter data nodes in PyMC's model_graph.py is much less complex than me having to do the node filtering outside of model_to_graphviz, in my code where I don't have easy access to all the variables.

The thing is: I need this feature. Soon. As in before next Wednesday, actually.

No pressure, I'm happy that you @larryshamalama want to pick it up.
The var_names filter is definitely the first step and would enable me to print a figure of my model before the deadline.

@larryshamalama
Copy link
Member

I can give it a shot and ask questions along the way since it will be a bit out of my comfort zone :)

My first question is: any insights why model_to_graphviz has a bare * in the signature? Or perhaps more broadly what does it do?

@michaelosthege
Copy link
Member Author

My first question is: any insights why model_to_graphviz has a bare * in the signature? Or perhaps more broadly what does it do?

def some_func(a, b, *, c):
    return

In the above example, a and b can be passed positionally OR as kwargs. c MUST be passed as a kwarg.

var_names should also go behind the * because that makes it easier for us to change the signature in the future.

@larryshamalama
Copy link
Member

Thanks! I'll give it this a shot today when I'll have a bit more time

@larryshamalama
Copy link
Member

Yes, always include ancestors.

I've been able to make some progress. On a related note, should descendants be included? Say we have Z -> Y -> X and we specify var_names = ["Y"]. Do we want X in the graph? My first guess is no but I wanted to hear from both of you.

I have yet to address some of the other discussion points

@michaelosthege
Copy link
Member Author

Yes, always include ancestors.

I've been able to make some progress. On a related note, should descendants be included? Say we have Z -> Y -> X and we specify var_names = ["Y"]. Do we want X in the graph? My first guess is no but I wanted to hear from both of you.

I have yet to address some of the other discussion points

I'd say no. Only ancestors and the node itself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants