Variable selection for `pm.model_to_graphviz` #5527

michaelosthege · 2022-02-25T17:30:30Z

Description of your problem

I have a very large model and would like pm.model_to_graphviz to plot only some of its variables.
For example, after adding gp.conditionals, I only want to plot the nodes my new variable depends on.

My model also includes some ConstantData variables I use to store indexing information.
They aren't relevant for model understanding and I'd like to hide them.

Proposal

The pm.model_to_graphviz function could take additional kwargs to customize the plot:

var_names (or vars?) to plot only certain variables and their dependencies
show_disconnected_data: bool to hide ConstantData/MutableData nodes that don't contribute to (selected) model variables

Implementation

Variable selection should be straightforward since it's already in the constructor, just not accessible via a kwarg:

pymc/pymc/model_graph.py

Lines 33 to 36 in a3bab7d

    
    def __init__(self, model):
        self.model = model
        self.var_names = get_default_varnames(self.model.named_vars, include_transformed=False)
        self.var_list = self.model.named_vars.values()
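
A rough sketch of what exposing that selection through a kwarg might look like. The `ModelGraphSketch` class, its `var_names` keyword and the `SimpleNamespace` stand-in for a model are all assumptions made for illustration, not the actual PyMC code:

```python
from types import SimpleNamespace


class ModelGraphSketch:
    """Toy stand-in for the constructor quoted above (hypothetical)."""

    def __init__(self, model, *, var_names=None):
        self.model = model
        all_names = list(model.named_vars)
        if var_names is None:
            # Default: keep every named variable
            # (the real constructor also drops transformed ones).
            self.var_names = all_names
        else:
            unknown = [name for name in var_names if name not in model.named_vars]
            if unknown:
                raise ValueError(f"{unknown} are not variables of this model")
            self.var_names = list(var_names)
        self.var_list = [model.named_vars[name] for name in self.var_names]


# A fake "model" whose named_vars maps names to placeholder objects.
toy_model = SimpleNamespace(named_vars={"mu": object(), "x": object(), "idx": object()})
graph = ModelGraphSketch(toy_model, var_names=["mu", "x"])
print(graph.var_names)
```

Rejecting unknown names early is one possible design choice here; silently ignoring them would make typos hard to spot.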


soma2000-lang · 2022-02-26T18:03:36Z

@michaelosthege Please can I work on this

michaelosthege · 2022-02-26T18:06:03Z

Sure, go ahead

larryshamalama · 2022-03-20T01:57:51Z

Thinking out loud: when you say plot variables and their dependencies, say that we have a simple hierarchical model x | mu ~ N(mu, 1) with mu ~ N(0, 1). Would we want the graph to include mu if we just specify var_names = ["x"], for instance? In other words, do you mean that var_names as a kwarg would be used to select the graph when we have disjoint graphs under a given model? That would be related to show_disconnected_data, as you mention.

michaelosthege · 2022-03-20T07:02:27Z

Yes, always include ancestors.

With the other kwarg I just want to exclude data variables that don't have edges

ricardoV94 · 2022-03-20T07:57:10Z

I would suggest reusing the same API that arviz uses for including / excluding variables, instead of creating new specialized keyword arguments like the disconnected data thing.

michaelosthege · 2022-03-20T08:47:19Z

Without show_disconnected_data in addition to an ArviZ-like var_names, I can only think of inconvenient ways to hide these nodes:

Manually passing a list of ~-prefixed names of each disconnected node.
Positively selecting each downstream node in the model.

The second option is risky - if you forget a downstream variable it will just not show up.
The first option is tedious.

Instead of a boolean kwarg maybe a setting for all data variables is better: show_data: none|connected|mutable|constant|all (default: connected)
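
To make the proposed `show_data` setting concrete, here is a minimal sketch on a toy graph. The function name `filter_data_nodes`, the node "kind" labels, and the exact meaning given to each mode are assumptions made for illustration; nothing here is PyMC API:

```python
def filter_data_nodes(nodes, edges, show_data="connected"):
    """Return the names of the nodes to draw.

    nodes: dict mapping name -> kind, one of "rv", "constant", "mutable"
    edges: iterable of (parent, child) name pairs
    """
    allowed = {"none", "connected", "mutable", "constant", "all"}
    if show_data not in allowed:
        raise ValueError(f"show_data must be one of {sorted(allowed)}")

    # Every name that appears in at least one edge counts as connected.
    connected = {name for edge in edges for name in edge}

    keep = []
    for name, kind in nodes.items():
        if kind == "rv":
            keep.append(name)  # non-data variables are always drawn
        elif show_data == "all":
            keep.append(name)
        elif show_data == "connected" and name in connected:
            keep.append(name)
        elif show_data == kind:
            keep.append(name)  # "mutable" or "constant" selects by kind
    return keep


nodes = {"mu": "rv", "x": "rv", "obs": "mutable", "idx": "constant"}
edges = [("mu", "x"), ("obs", "x")]
print(filter_data_nodes(nodes, edges))            # ['mu', 'x', 'obs']
print(filter_data_nodes(nodes, edges, "none"))    # ['mu', 'x']
```

With the default `connected` mode, the unused `idx` node disappears without having to be named explicitly.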

ricardoV94 · 2022-03-20T08:51:31Z

> Yes, always include ancestors.

Why?

ricardoV94 · 2022-03-20T08:54:14Z

> Instead of a boolean kwarg maybe a setting for all data variables is better: show_data: none|connected|mutable|constant|all (default: connected)

Maybe, I would just suggest to start as simple as possible. Always easier to add complexity than to remove it.

michaelosthege · 2022-03-20T09:10:48Z

I'm never in favor of complex solutions and having the code to filter data nodes in PyMC's model_graph.py is much less complex than me having to do the node filtering outside of model_to_graphviz, in my code where I don't have easy access to all the variables.

The thing is: I need this feature. Soon. As in before next Wednesday, actually.

No pressure, I'm happy that you @larryshamalama want to pick it up.
The var_names filter is definitely the first step and would enable me to print a figure of my model before the deadline.

larryshamalama · 2022-03-20T09:58:38Z

I can give it a shot and ask questions along the way since it will be a bit out of my comfort zone :)

My first question is: any insights why model_to_graphviz has a bare * in the signature? Or perhaps more broadly what does it do?

michaelosthege · 2022-03-20T10:03:57Z

> My first question is: any insights why model_to_graphviz has a bare * in the signature? Or perhaps more broadly what does it do?

def some_func(a, b, *, c):
    return

In the above example, a and b can be passed positionally OR as kwargs. c MUST be passed as a kwarg.

var_names should also go behind the * because that makes it easier for us to change the signature in the future.
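
A runnable version of the example above shows the behaviour. The function body is changed to return its arguments purely so there is something to print:

```python
def some_func(a, b, *, c):
    # Everything after the bare * is keyword-only.
    return a, b, c


print(some_func(1, 2, c=3))        # (1, 2, 3)
print(some_func(a=1, b=2, c=3))    # (1, 2, 3)

try:
    some_func(1, 2, 3)  # c passed positionally
except TypeError as err:
    print(f"TypeError: {err}")
```

Because callers can never rely on the position of a keyword-only argument, new arguments can later be added or reordered behind the `*` without breaking existing calls.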

larryshamalama · 2022-03-20T10:23:34Z

Thanks! I'll give this a shot today when I have a bit more time.

larryshamalama · 2022-03-20T22:57:16Z

> Yes, always include ancestors.

I've been able to make some progress. On a related note, should descendants be included? Say we have Z -> Y -> X and we specify var_names = ["Y"]. Do we want X in the graph? My first guess is no but I wanted to hear from both of you.

I have yet to address some of the other discussion points

michaelosthege · 2022-03-21T06:36:55Z

> > Yes, always include ancestors.
>
> I've been able to make some progress. On a related note, should descendants be included? Say we have Z -> Y -> X and we specify var_names = ["Y"]. Do we want X in the graph? My first guess is no but I wanted to hear from both of you.
>
> I have yet to address some of the other discussion points

I'd say no. Only ancestors and the node itself
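
The "ancestors and the node itself" rule can be sketched as a plain graph traversal. The helper name `select_with_ancestors` and the `parents` dictionary are hypothetical and only illustrate the selection rule on the Z -> Y -> X example; they are not how PyMC stores or walks its graph:

```python
def select_with_ancestors(parents, var_names):
    """Return var_names plus all of their ancestors.

    parents: dict mapping each node name to a list of its direct parents.
    """
    selected = set()
    stack = list(var_names)
    while stack:
        name = stack.pop()
        if name in selected:
            continue  # already visited via another path
        selected.add(name)
        stack.extend(parents.get(name, []))
    return selected


# Z -> Y -> X: Z is a parent of Y, and Y is a parent of X.
parents = {"Z": [], "Y": ["Z"], "X": ["Y"]}
print(sorted(select_with_ancestors(parents, ["Y"])))  # ['Y', 'Z']
```

Selecting `Y` pulls in its ancestor `Z` but leaves out the descendant `X`, matching the answer above.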

michaelosthege added the enhancements and beginner friendly labels Feb 25, 2022

danhphan mentioned this issue Mar 14, 2022

Filter variables in model_graphviz #5198

Closed

larryshamalama mentioned this issue Mar 21, 2022

Variable selection for graphviz visualization via kwarg #5634

Merged

3 tasks

michaelosthege closed this as completed in #5634 May 22, 2022

larryshamalama mentioned this issue May 23, 2022

Allow exclusion of variables and their descendants with ~ in var_names #5794

Open
