[WIP] Blogpost for do and observe functionality #3


Merged: 41 commits from do-observe-blogpost into main, Jun 23, 2023

Conversation

@drbenvincent (Contributor) commented Jun 6, 2023

There are notable errors here. I have yet to pin down whether this is down to my use of observe, or whether my examples are highlighting bugs that need to be fixed.


@drbenvincent marked this pull request as draft on June 6, 2023 at 12:31
@twiecki (Contributor) commented Jun 7, 2023

Just finished reading through the first draft and holy shit, this is shaping up to be a post for the ages. I think you introduce the new concepts perfectly and take the reader step by step from where they are to where they want to be. Readers will feel like that scene in The Matrix where kung fu skills get uploaded.

@drbenvincent (Contributor, Author)

Committed a ton of updates. But I'd perhaps hold off giving any more feedback for the moment. After some discussion with Ricardo, going through the full workflow (including simulated data generation) creates some problems which may just confuse the core message.

Going to come back to this tomorrow with some perspective and potentially simplify the structure of the post a bit. I am leaning towards extracting the simulated data generation / parameter recovery study approach into a separate blog post. Kind of like, "hey, you know how we told you about do and observe in the last post? Well, we can also use that to update how we do parameter recovery studies."

@review-notebook-app bot commented Jun 8, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-08T18:27:54Z:

"new" appears twice; I think you can remove the second "new", as it's implied by "introducing".

drbenvincent commented on 2023-06-12T17:08:12Z:

fixed

@review-notebook-app bot commented Jun 8, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-08T18:28:40Z:

Maybe also add that they will be included in the main repo after a little while.

drbenvincent commented on 2023-06-12T17:08:22Z:

Done


@juanitorduz

@drbenvincent I just wanna say I am very excited about this post (and subsequent ones)! Amazing team effort to bring PyMC even closer to causal inference 💪!

@ricardoV94 commented Jun 16, 2023

The weird arrows happen when you set multiple nodes to equivalent constants (they get merged). If you set 0.00001, 0.00002, and so on, instead of 0, it looks alright. This hack should fix it for both do and observe if you want to preview in the notebook while I work on the real fix.

It also explains why sampling was still working.
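
For concreteness, a minimal sketch of that workaround, assuming do is imported from pymc_experimental.model_transform.conditioning; the model, variable names, and intervention values here are illustrative, not the notebook's code:

import numpy as np
import pymc as pm
from pymc_experimental.model_transform.conditioning import do

# Illustrative toy model (made-up names)
with pm.Model() as m:
    beta_z = pm.Normal("beta_z")
    beta_y = pm.Normal("beta_y")
    z = pm.Bernoulli("z", p=pm.math.invlogit(beta_z))
    y = pm.Normal("y", mu=beta_y + z)

# Workaround: use slightly different constants (0.00001 vs 0.00002) so the
# resulting constant nodes are not merged and the graphviz arrows stay sane.
m_do = do(m, {"beta_z": 0.00001, "beta_y": 0.00002})
pm.model_to_graphviz(m_do)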

@ricardoV94 commented Jun 16, 2023

Actually, graphviz is the culprit! It triggers a graph rewrite that modifies the variables.

Here is a snippet that triggers it, without any pymc-experimental stuff:

import pymc as pm

with pm.Model(coords_mutable={'i': [0]}) as m:
    # Two ConstantData nodes holding the same value (0): the rewrite that
    # graphviz triggers merges them, producing the weird arrows.
    beta_z = pm.ConstantData("beta_z", 0)
    beta_y = pm.ConstantData("beta_y", 0)
    z = pm.Bernoulli("z", p=pm.invlogit(beta_z), dims="i")
    y = pm.Normal("y", mu=beta_y + z, dims="i")

pm.model_to_graphviz(m)

Any operation will actually do this; pm.draw(y) will also break the name.
PyMC issue: pymc-devs/pymc#6783
PyTensor issue: pymc-devs/pytensor#347

@ricardoV94 commented Jun 16, 2023

There was another similar issue: for instance, if one of the replacement values was 1.0, then because the sigma also had 1.0 as an input, an artificial dependency showed up in the graphviz output. Again, you can fix this by making the replacement values unusual, like 1.000001 or 0.00001 (and all different from each other).

This should allow you to sketch out the blog post until that PyTensor PR is in. We can cut a release soon for the blogpost.

Also, you should be able to use observe in that example instead of recreating the model with observed variables. It was the same kind of problem. Or did you have another problem that was making you not use it?

@drbenvincent (Contributor, Author)

> There was another similar issue: for instance, if one of the replacement values was 1.0, then because the sigma also had 1.0 as an input, an artificial dependency showed up in the graphviz output. Again, you can fix this by making the replacement values unusual, like 1.000001 or 0.00001 (and all different from each other).
>
> This should allow you to sketch out the blog post until that PyTensor PR is in. We can cut a release soon for the blogpost.

👍🏻

> Also, you should be able to use observe in that example instead of recreating the model with observed variables. It was the same kind of problem. Or did you have another problem that was making you not use it?

So I was previously doing this:

# assuming: from pymc_experimental.model_transform.conditioning import observe
model_inference = observe(model_generative, {"c": observed['c'], "y": observed['y'], "z": observed['z']})
model_inference.set_dim("i", N, coord_values=np.arange(N))

That works fine when you are doing inference.

But a problem occurs when you then go on to do something like this:

# P(Y | c, do(z=0))
with do(model_inference, {"z": np.zeros(N, dtype='int32')}, prune_vars=True) as model_control:
    idata_z0 = pm.sample_posterior_predictive(idata, predictions=True, var_names=["y_mu"], random_seed=SEED)

model_control looks like this:

[screenshot of the model_control graph, 2023-06-16 16:42]

The outcome of sample_posterior_predictive is as if the data for c (and maybe y?) is no longer present.

@drbenvincent (Contributor, Author)

> There was another similar issue: for instance, if one of the replacement values was 1.0, then because the sigma also had 1.0 as an input, an artificial dependency showed up in the graphviz output. Again, you can fix this by making the replacement values unusual, like 1.000001 or 0.00001 (and all different from each other).

This approach has improved things, but there is still a connection between beta_cy and c which should not exist: beta_cy should only influence y.

[screenshot of the model graph, 2023-06-16 16:49]

@ricardoV94

Some value must somehow still be repeated, but it worked for me after the fixes, so I wouldn't worry.

@drbenvincent (Contributor, Author)

Sounds like a good idea to wait for the new release before going ahead.

@drbenvincent (Contributor, Author)

The graphviz issue is now fixed with the latest package versions.

@drbenvincent (Contributor, Author)

> Also, you should be able to use observe in that example instead of recreating the model with observed variables. It was the same kind of problem. Or did you have another problem that was making you not use it?

Though this does not work.

@ricardoV94

> > Also, you should be able to use observe in that example instead of recreating the model with observed variables. It was the same kind of problem. Or did you have another problem that was making you not use it?
>
> Though this does not work.

What doesn't work? The issue with sample_posterior_predictive mentioned above is still present (i.e., posterior_predictive will always sample observed variables from the prior). That is not an "observe" issue but the expected behavior of PyMC so far. Is there another problem besides that?

@drbenvincent (Contributor, Author)

At the moment we have this kind of workflow:

  • define model_generative with no data
  • use do to set parameters to particular values to generate model_simulate which we can use to generate a dataset with sample_prior_predictive
  • create a model_inference from model_generative with observe, then run inference with pm.sample
  • then finally do causal inference using the do function on model_inference and sample with sample_posterior_predictive.

This final step is the one that "doesn't work", in the sense that it doesn't give the expected results given this workflow (a sketch of the whole workflow follows below).
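
For concreteness, a minimal sketch of that first workflow, assuming do and observe come from pymc_experimental.model_transform.conditioning. The model is illustrative (a simplified version without the c covariate, with made-up names beta_z and beta_zy); the sampler calls mirror the ones quoted in this thread:

import numpy as np
import pymc as pm
from pymc_experimental.model_transform.conditioning import do, observe

SEED = 42
N = 100

# 1. Generative model: free parameters, no data
with pm.Model(coords_mutable={"i": [0]}) as model_generative:
    beta_z = pm.Normal("beta_z")
    beta_zy = pm.Normal("beta_zy")
    z = pm.Bernoulli("z", p=pm.math.invlogit(beta_z), dims="i")
    y = pm.Normal("y", mu=beta_zy * z, dims="i")

# 2. Fix the parameters with do, then simulate a dataset
model_simulate = do(model_generative, {"beta_z": 0.5, "beta_zy": 1.0})
model_simulate.set_dim("i", N, coord_values=np.arange(N))
with model_simulate:
    sim = pm.sample_prior_predictive(random_seed=SEED)
observed = {name: sim.prior[name].sel(chain=0, draw=0).values for name in ["z", "y"]}

# 3. Condition the generative model on the simulated data with observe
model_inference = observe(model_generative, observed)
model_inference.set_dim("i", N, coord_values=np.arange(N))
with model_inference:
    idata = pm.sample(random_seed=SEED)

# 4. Intervene with do and sample from the posterior predictive
# (this is the step that did not give the expected results)
with do(model_inference, {"z": np.zeros(N, dtype="int32")}, prune_vars=True):
    idata_z0 = pm.sample_posterior_predictive(
        idata, predictions=True, var_names=["y"], random_seed=SEED
    )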

So to get the results I wanted I switched to this workflow:

  • define model_generative with no data
  • use do to set parameters to particular values to generate model_simulate which we can use to generate a dataset with sample_prior_predictive
  • CHANGED: build model_inference from scratch, providing simulated data, then run inference with pm.sample. This is suboptimal from a user workflow perspective because the user is defining two very similar models.
  • then finally do causal inference using the do function on model_inference and sample with sample_posterior_predictive.

So when you wrote

> Also, you should be able to use observe in that example instead of recreating the model with observed variables

I took that to mean that the first workflow would work, but I think I interpreted that wrong. Now I'm not clear on what you did mean.

Am I right in thinking that the first workflow above is not going to be possible? If so, then I think the post should forget about creating an empty model with no data and just start from a 'traditional' pymc model where we do provide simulated data. In which case the workflow would be like this:

  • build model_inference from scratch, providing simulated data, then run inference with pm.sample.
  • then finally do causal inference using the do function on model_inference and sample with sample_posterior_predictive.

Which is fine with me, but then there will be no examples in the post where we use observe.

@ricardoV94 commented Jun 19, 2023

> I took that to mean that the first workflow would work, but I think I interpreted that wrong. Now I'm not clear on what you did mean.

I just meant the arrows issue.

> Am I right in thinking that the first workflow above is not going to be possible?

I still have to discuss that with @lucianopaz; I am unsure whether it makes conceptual sense or not. Technically, there is no challenge to opting for the "new" behavior.

In any case, you don't need to recreate a new model to get the kind of results you want right now. You can use a do intervention to replace the observed variable c by its observed values before calling sample_posterior_predictive. That way, y_mu will be conditioned on the observed value of c instead of on its posterior predictive draws, as it is now. A sketch of this follows below.
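
A minimal sketch of that suggestion, under the same assumptions and illustrative names as before (observed["c"], y_mu, N, SEED, and the do call come from the snippets quoted in this thread; the combined intervention is an assumption, not the notebook's code):

# Pin the intervention variable z AND the observed covariate c, so that
# downstream quantities like y_mu are conditioned on the observed c rather
# than on fresh posterior predictive draws of c.
with do(
    model_inference,
    {"z": np.zeros(N, dtype="int32"), "c": observed["c"]},
    prune_vars=True,
) as model_control:
    idata_z0 = pm.sample_posterior_predictive(
        idata, predictions=True, var_names=["y_mu"], random_seed=SEED
    )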

Even if you don't want to do this, there's no reason to recreate the model for inference, as observe works fine there. You only needed the rebuilt model for the specific posterior_predictive manipulation. This issue would arise any time a user wants to predict quantities that depend on observed values (but not on posterior predictive draws of observed RVs), which may be worth discussing regardless of the solution taken in the blog post.

@review-notebook-app bot commented Jun 22, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-22T22:01:00Z:

I don't find it trivial that set_data and do accomplish the same thing and are somehow related, so maybe acknowledge and explain that.
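
A sketch of the relationship in question, with an illustrative model (the names and values are made up; do is assumed to come from pymc_experimental.model_transform.conditioning). pm.set_data swaps the values held by a mutable data container in place, while do returns a new model in which a named variable is replaced by the given constant:

import numpy as np
import pymc as pm
from pymc_experimental.model_transform.conditioning import do

with pm.Model() as m:
    x = pm.MutableData("x", np.zeros(3))
    mu = pm.Normal("mu")
    y = pm.Normal("y", mu=mu + x, observed=np.ones(3))

# set_data mutates the existing data container of m in place:
with m:
    pm.set_data({"x": np.ones(3)})

# do leaves m untouched and returns a new model in which the random
# variable mu is fixed to a constant, cutting its dependence on its prior:
m_do = do(m, {"mu": 0.5})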


@review-notebook-app bot commented Jun 22, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-22T22:13:15Z:

I think I would move this to the top.


drbenvincent commented on 2023-06-23T12:34:56Z:

Done. I've re-ordered the examples.

@review-notebook-app bot commented Jun 22, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-22T22:13:16Z:

"In this post" sounds like this post just beginning.


drbenvincent commented on 2023-06-23T12:36:21Z:

fixed

@review-notebook-app bot commented Jun 22, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-22T22:13:17Z:

Any story / example you can come up with for this?


drbenvincent commented on 2023-06-23T13:13:37Z:

Done :)

@review-notebook-app bot commented Jun 22, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-22T22:13:17Z:

*known

Maybe also say that usually we wouldn't know that and would want to infer it.


drbenvincent commented on 2023-06-23T13:19:07Z:

Done

@review-notebook-app bot commented Jun 22, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-22T22:13:18Z:

Maybe add that in this model, the parameters are not fixed like before, because here we want to infer them from data to see if we can recover the true values. The workflow is a bit implicit and readers might get lost.


drbenvincent commented on 2023-06-23T13:48:09Z:

Added some clarification. Also added a schematic plot at the end of the example so people can recap what we've done.

@review-notebook-app bot commented Jun 22, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-22T22:13:19Z:

Rather than referring to future blog posts, you could just explain what's going on intuitively, for example with a drug, where we care about the difference between giving and not giving the drug.


drbenvincent commented on 2023-06-23T13:51:26Z:

Done. This was a bit lazy of me.

@review-notebook-app bot commented Jun 22, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-22T22:13:20Z:

the all the values


drbenvincent commented on 2023-06-23T13:53:24Z:

fixed


@review-notebook-app bot commented Jun 23, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-23T14:08:48Z:

We've... ?


@review-notebook-app bot commented Jun 23, 2023 (conversation on ReviewNB)

twiecki commented on 2023-06-23T14:08:49Z:

Maybe example 2 can just go, I don't know what it adds.


@twiecki merged commit ee23281 into main on Jun 23, 2023
@twiecki deleted the do-observe-blogpost branch on June 23, 2023 at 17:04