Regression discontinuity example #308
Merged: 30 commits into pymc-devs:main, Apr 21, 2022

Conversation

drbenvincent (Contributor)

New example notebook! While it is nothing particularly clever in terms of the model, it shows an application to quasi-experimental designs, which are not covered by any other example notebook.

It is also interesting because it touches on causal inference as well as using PyMC to ask counterfactual questions.

I can see utility in adding a number of future notebooks expanding on issues of analysis of quasi-experimental designs and causal inference. But for now, this can potentially be the first 🙂

review-notebook-app (bot)

Check out this pull request on ReviewNB: see visual diffs and provide feedback on Jupyter notebooks.

drbenvincent (Author)

I need to fix the schematic figure. It has a weird blue background, probably something to do with the transparent background setting.

Benjamin T. Vincent and others added 2 commits April 11, 2022 11:05
drbenvincent requested a review from OriolAbril, April 11, 2022 12:50
review-notebook-app (bot) commented Apr 11, 2022

OriolAbril commented on 2022-04-11T21:34:57Z
----------------------------------------------------------------

Line #2.    plt.tight_layout()

Remove tight_layout. The arviz-darkgrid theme uses constrained_layout, which is more or less equivalent, and the two can't be used at the same time.
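As a minimal sketch of the conflict (the style call and the plot here are illustrative, not the notebook's code): the ArviZ style already turns on constrained_layout, so adding tight_layout fights it.

```python
import arviz as az
import matplotlib.pyplot as plt

az.style.use("arviz-darkgrid")  # this style enables constrained_layout

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
# plt.tight_layout()  # omit: it conflicts with the active constrained_layout
plt.show()
```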


drbenvincent commented on 2022-04-13T12:01:27Z
----------------------------------------------------------------

Done, in upcoming commit

# Regression discontinuity design analysis

:::{post} April, 2022
:tags: regression discontinuity, causal inference, quasi experimental design, counterfactuals
Member:

For the tags, maybe regression alone should also be added? I think it would be of interest to people browsing the regression tag.

Contributor Author:

will do

review-notebook-app (bot) commented Apr 11, 2022

canyon289 commented on 2022-04-11T23:49:03Z
----------------------------------------------------------------

Line #10.        .assign(treated=lambda x: x.x > threshold)

Nice use of pandas!
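For context, a minimal sketch of the kind of data-simulation cell being discussed; the sample size, distribution, and threshold value here are assumptions, not the notebook's exact code:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
threshold = 0.0  # assumed cutoff on the pre-test measure

# Pre-test measure x; treatment is assigned purely by the threshold rule,
# via the .assign() line under discussion.
df = pd.DataFrame({"x": rng.normal(size=100)}).assign(
    treated=lambda x: x.x > threshold
)
```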


drbenvincent commented on 2022-04-13T11:59:09Z
----------------------------------------------------------------

Thanks

review-notebook-app (bot) commented Apr 11, 2022

canyon289 commented on 2022-04-11T23:49:04Z
----------------------------------------------------------------

Line #7.        plt.legend()

Nit: stick with ax.legend(). Switching between the stateful and object-oriented APIs in matplotlib can cause issues, and since this is a tutorial document we should show best practice in all aspects.
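The difference between the two interfaces, as a minimal sketch with illustrative data:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1], label="data")

# Stateful API: acts on whatever matplotlib currently considers the
# "current" axes, which may not be the axes you intended.
# plt.legend()

# Object-oriented API: explicit about which axes the legend belongs to.
ax.legend()
plt.show()
```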


drbenvincent commented on 2022-04-13T12:01:08Z
----------------------------------------------------------------

Done in upcoming commit

- change notebook tag
- remove plt.tight_layout()
- plt.legend() -> ax.legend()
drbenvincent (Author)

That's all the comments addressed so far. Thanks for taking a look @OriolAbril + @canyon289 🙏🏻

- all units were exposed to the treatment (orange shaded region).

```{code-cell} ipython3
:tags: [hide-input]
Member:

I think I would show this cell, but after a couple changes, see next comments

Contributor Author:

will do

plt.legend();
```

The blue shaded region (which is very narrow) shows the 95% credible region of the expected value of the post-test measurement for a range of possible pre-test measures. This is actually very interesting because it is an example of counterfactual inference. We did not observe any units that were untreated above the threshold. But assuming our model is a good description of reality, we can ask the counterfactual question of "what if a unit above the threshold was not treated?"
Member:

"shows the 95% credible region of the expected value of the post-test measurement for a range of possible pre-test measures"

This is not right: the plot is of mu, which is the mean of the measurements, not the actual measurements. Otherwise the model would be way off in calibration terms. When plotting the 95% region of the measurements, for the same x as the observations roughly 95% of the observations should fall inside that region. Note the emphasis, because this applies x-wise and might not hold for the whole plotted region with ArviZ defaults, because of the smoothing.

Contributor Author:

I thought this is what I was saying by "95% credible region of the expected value of the post-test measurement"

Maybe it would be clearer by just deleting "measurement"?

Member:

Riight, sorry about that, "expected value" is a synonym of first moment. I always forget this.

Removing "measurement" might help, yes, but I am not sure it will be enough. I find this confusing for two reasons.

First, we are defining a new random variable, E(measurement), and giving confidence intervals on that instead of on the actual measurements, which are the terms in which most people think. I am not sure using this new variable instead of y directly helps illustrate the point that comes later, which, if I understand correctly, is about raw values.

Second, "expected value" doesn't read like a technical term. I always get it wrong, and I think the same happens to many other non-native speakers: confusion between credible regions of a value and of its mean is a common question I get. Using "expectation" or "expectancy", for example, might help on that end.

# posterior prediction
with model:
    pm.set_data({"x": _x, "treated": _treated})
    ppc = pm.sample_posterior_predictive(idata, var_names=["mu", "y"])
Member:

Side note that I hope helps with the other comments. I try to distinguish between posterior predictive and pushforward posterior variables: pushforward posterior variables are deterministic transformations of the posterior parameters, while posterior predictive variables need an extra sampling step.

In this case, mu = x + delta * treated: once the posterior values are fixed, mu is also fixed. Here sample_posterior_predictive is being used as a convenience function to recalculate mu with the modified data, but even though sample_... is called, no sampling is involved. There is sampling in computing y, because values of y are draws from the normal distribution with mean mu and std sigma. Multiple calls to sample_posterior_predictive with different seeds will return different values of y but always the same values of mu.
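This distinction can be made concrete with a sketch. Assuming the posterior holds variables named delta and sigma and that _x and _treated are NumPy arrays (names taken from the surrounding code), mu can be recomputed for new data with no random number generation at all, while y requires a fresh draw:

```python
import numpy as np

# Pushforward: mu is a deterministic transform of the posterior draws of
# delta, so recomputing it for new data involves no sampling.
delta = idata.posterior["delta"].values[..., None]  # shape (chain, draw, 1)
sigma = idata.posterior["sigma"].values[..., None]
mu = _x + delta * _treated                          # shape (chain, draw, n_points)

# Posterior predictive: y adds observation noise, so it needs an extra
# sampling step; different seeds give different y, but always the same mu.
rng = np.random.default_rng()
y = rng.normal(loc=mu, scale=sigma)
```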

Contributor Author:

Understood. I have added a bit of text to clarify this. I would not want to change the code to manually calculate mu for new values of x and treated - I think the current code is very clear and convenient as you say. Hopefully this works for you?

Member:

No need to change the code here at all, this was mostly for context


# plotting
ax = plot_data(df)
_y = ppc.posterior_predictive.mu.mean(dim=["chain", "draw"])
Member:

these _y variables are not being used

Contributor Author:

Well spotted


:::{post} April, 2022
:tags: regression, causal inference, quasi experimental design, counterfactuals
:category: beginner
Member:

I would add the explanation type here. There is some code showing how to build this model in PyMC, but I think the main goal and content of the notebook is answering explanation-type questions like "what are regression discontinuities?", "when are they useful?", and "what are the differences between their results and regular regression results?"

Suggested change:
- :category: beginner
+ :category: beginner, explanation

Contributor Author:

done

drbenvincent (Author)

Thanks @OriolAbril. I think I've addressed all your points now.

drbenvincent requested a review from OriolAbril, April 17, 2022 15:37
review-notebook-app (bot) commented Apr 20, 2022

lucianopaz commented on 2022-04-20T08:27:43Z
----------------------------------------------------------------

Line #10.        .assign(treated=lambda x: x.x > threshold)

Use x.x < threshold instead. That way, the simulated data will look more like the schematic figure at the top of the notebook


drbenvincent commented on 2022-04-21T14:49:05Z
----------------------------------------------------------------

Done, in upcoming commit

review-notebook-app (bot) commented Apr 20, 2022

lucianopaz commented on 2022-04-20T08:27:43Z
----------------------------------------------------------------

I have two comments regarding the first note.

The first, minor comment is that you say "... post-test ($y$) measures where", but it should read "were" instead.

My second comment is that I find the note hard to understand. The pre-test x and post-test y measures are never the same, because $y_{i}$ is sampled from a normal distribution. What I think you meant is that, in general, you could have written $\mu_i = \alpha + \beta x_{i} + \Delta \, \mathrm{treated}_{i}$, but you chose to fix $\alpha=0$ and $\beta=1$. My suggestion is to rephrase the note: instead of saying that the x and y measures are the same, say that the expected value of y is taken to be the pre-test measure x plus some treatment effect.
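In model terms, a minimal PyMC sketch of what this describes, with $\alpha=0$ and $\beta=1$ fixed; the priors and data-container names here are assumptions, not the notebook's actual code:

```python
import pymc as pm

with pm.Model() as model:
    x = pm.Data("x", df.x.to_numpy())
    treated = pm.Data("treated", df.treated.to_numpy().astype(float))
    delta = pm.Normal("delta", mu=0, sigma=1)  # treatment effect (assumed prior)
    sigma = pm.HalfNormal("sigma", sigma=1)    # observation noise (assumed prior)
    # E[y] is the pre-test measure itself plus the treatment effect
    mu = pm.Deterministic("mu", x + delta * treated)
    pm.Normal("y", mu=mu, sigma=sigma, observed=df.y.to_numpy(), shape=x.shape)
```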


drbenvincent commented on 2022-04-21T14:56:09Z
----------------------------------------------------------------

Ah, I meant that the measures were the same, not the values. As in, we measured height pre-test and also measured height post-test. I'll make this clearer.

review-notebook-app (bot) commented Apr 20, 2022

lucianopaz commented on 2022-04-20T08:27:44Z
----------------------------------------------------------------

Line #2.        idata = pm.sample(random_seed=123)

Use the global RANDOM_SEED instead


review-notebook-app (bot) commented Apr 20, 2022

lucianopaz commented on 2022-04-20T08:27:45Z
----------------------------------------------------------------

Line #13.    az.plot_hdi(_x, ppc.posterior_predictive["mu"], color="C0", hdi_prob=0.95)

To keep in line with what the other reviewers said about not mixing plt and OOP interfaces, you should pass ax=ax to this call. It would also be nice to make the HDI plot have a label. I believe you can do that by passing backend_kwargs={"label": "$mu$ untreated"}


drbenvincent commented on 2022-04-21T15:20:41Z
----------------------------------------------------------------

Good idea. Turns out it's fill_kwargs.
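The resulting call, as a sketch of the fix (label text per the reviewer's suggestion):

```python
import arviz as az

# Stay in the object-oriented interface by passing ax=ax, and label the
# HDI band via fill_kwargs (not backend_kwargs, as it turned out).
az.plot_hdi(
    _x,
    ppc.posterior_predictive["mu"],
    color="C0",
    hdi_prob=0.95,
    ax=ax,
    fill_kwargs={"label": r"$\mu$ untreated"},
)
ax.legend()
```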

review-notebook-app (bot) commented Apr 20, 2022

lucianopaz commented on 2022-04-20T08:27:46Z
----------------------------------------------------------------

Line #26.    az.plot_hdi(_x, ppc.posterior_predictive["mu"], color="C1", hdi_prob=0.95)

The same as my previous comment: add the kwargs: ax=ax, backend_kwargs={"label": "$mu$ untreated"}


review-notebook-app (bot) commented Apr 20, 2022

lucianopaz commented on 2022-04-20T08:27:46Z
----------------------------------------------------------------

Very minor nitpick. You assumed that mu = x for the untreated subjects, so it makes sense that the HDI is the identity line and shows no dispersion. Do you want to mention it? I don't really think it's necessary, since you've already said this many times in the intro part of the notebook.


drbenvincent commented on 2022-04-21T15:42:03Z
----------------------------------------------------------------

Good point, I have made this clearer now

lucianopaz (Member) left a comment

@drbenvincent, very nice notebook! I left a few comments requesting changes. Nevertheless, I'll approve the PR so that you can merge once you've addressed them.


drbenvincent merged commit 0c66575 into pymc-devs:main, Apr 21, 2022
drbenvincent deleted the regression-discontinuity branch, April 21, 2022 16:54