# Regression discontinuity example #308
**Merged:** drbenvincent merged 30 commits into `pymc-devs:main` from `drbenvincent:regression-discontinuity` on Apr 21, 2022.
## Commits (30)
- `a35fddb` create truncated regression example
- `bc3d659` delete truncated regression example from main branch
- `3aba79b` Merge branch 'pymc-devs:main' into main
- `d84d852` create truncated regression example
- `d3dabca` delete truncated regression example from main branch
- `664ab97` create truncated regression example
- `cc6693f` delete truncated regression example from main branch
- `612abc4` Merge branch 'main' of https://github.com/drbenvincent/pymc-examples
- `f0812aa` Merge branch 'main' of https://github.com/drbenvincent/pymc-examples
- `7372a46` Merge remote-tracking branch 'upstream/main' into main
- `22c9935` Merge remote-tracking branch 'upstream/main'
- `89c5af5` Merge remote-tracking branch 'upstream/main'
- `11347c9` Merge remote-tracking branch 'upstream/main'
- `4e368ec` Merge remote-tracking branch 'upstream/main'
- `d8b04b8` Merge remote-tracking branch 'upstream/main'
- `3783728` initial commit
- `1d5b48d` obey the law in terms of numpy random number generation + az style
- `f336722` improvements to schematic figure
- `8ad5f55` replace image with non transparent background
- `4f5ea1e` fix embedded image
- `40c6bb1` fix embedded image - not sure why this didn't register before
- `bbbbaa6` address all review comments
- `9b8f0e3` make code cell visible
- `4a5c342` remove 2 unused lines
- `d5dae4a` add explanation tag
- `fc5297c` add technical note re posterior predictive sampling
- `63b1d39` addressing Luciano's suggested edits
- `4d17af5` increase clarity of sentence in introduction
- `346664f` add another explanatory markdown cell on the effect posterior
- `5aad0ec` fix typo
---
jupytext:
  notebook_metadata_filter: substitutions
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.13.7
kernelspec:
  display_name: pymc-dev-py39
  language: python
  name: pymc-dev-py39
---

(regression_discontinuity)=
# Regression discontinuity design analysis

:::{post} April, 2022
:tags: regression, causal inference, quasi experimental design, counterfactuals
:category: beginner, explanation
:author: Benjamin T. Vincent
:::

[Quasi experiments](https://en.wikipedia.org/wiki/Quasi-experiment) involve experimental interventions and quantitative measures. However, quasi-experiments do _not_ involve random assignment of units (e.g. cells, people, companies, schools, states) to test or control groups. This inability to conduct random assignment poses problems when making causal claims, as it makes it harder to argue that any difference between a control and test group is because of an intervention and not because of a confounding factor.

The [regression discontinuity design](https://en.wikipedia.org/wiki/Regression_discontinuity_design) is a particular form of quasi-experimental design. It consists of a control and test group, but assignment of units to conditions is chosen based upon a threshold criterion, not randomly.

:::{figure-md} fig-target

A schematic diagram of the regression discontinuity design. The dashed green line shows where we would have expected the post-test scores of the test group to be if they had not received the treatment. Image taken from [https://conjointly.com/kb/regression-discontinuity-design/](https://conjointly.com/kb/regression-discontinuity-design/).
:::

Units with very low scores are likely to differ systematically along some dimensions from units with very high scores. For example, if we look at students who achieve the highest and students who achieve the lowest scores, in all likelihood there are confounding variables. Students with high scores are likely to have come from more privileged backgrounds than those with the lowest scores.

If we gave extra tuition (our experimental intervention) to students scoring in the lowest half of scores, then we can easily imagine that we have large differences in some measure of privilege between test and control groups. At first glance, this would seem to make the regression discontinuity design useless - the whole point of random assignment is to reduce or eliminate systematic biases between control and test groups, yet use of a threshold would seem to maximise the differences in confounding variables between groups. Isn't this an odd thing to do?

The key point, however, is that it is much less likely that students scoring just below and just above the threshold systematically differ in their degree of privilege. And so _if_ we find evidence of a meaningful discontinuity in post-test scores between those just above and just below the threshold, then it is much more plausible that the intervention (applied according to the threshold criterion) was causally responsible.

## Sharp vs. fuzzy regression discontinuity designs
Regression discontinuity designs fall into two categories. This notebook focuses on _sharp_ regression discontinuity designs, but it is important to understand both sharp and fuzzy variants:

- **Sharp:** Here, the assignment to control or treatment groups is purely dictated by the threshold. There is no uncertainty about which units are in which group.
- **Fuzzy:** In some situations there may not be a sharp boundary between control and treatment based upon the threshold. This could happen, for example, if experimenters are not strict in assigning units to groups based on the threshold. Alternatively, there could be non-compliance on the side of the actual units being studied. For example, patients may not always be fully compliant in taking medication, so some unknown proportion of patients assigned to the test group may effectively be in the control group because of non-compliance. A simulation of this kind of fuzzy assignment is sketched after this list.

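The following is a minimal sketch (not part of the original notebook) of how fuzzy assignment could be simulated. It assumes a hypothetical logistic compliance curve; the steepness parameter `k` is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(42)

threshold = 0.0
k = 10.0  # hypothetical steepness of the compliance curve

x = rng.normal(size=1000)  # pre-test scores
# Probability of actually receiving treatment: near 1 far below the
# threshold, near 0 far above it, and fuzzy in between.
p_treated = 1 / (1 + np.exp(k * (x - threshold)))
treated = rng.random(size=x.shape) < p_treated  # stochastic (fuzzy) assignment
```

In the sharp design simulated in the rest of this notebook, `p_treated` would instead be exactly 1 below the threshold and exactly 0 above it.
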
```{code-cell} ipython3
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
```

```{code-cell} ipython3
RANDOM_SEED = 123
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")
%config InlineBackend.figure_format = 'retina'
```

## Generate simulated data
Note that here we assume that there is negligible/zero measurement noise, but that there is some variability in the true values from pre- to post-test. It is possible to take into account measurement noise on the pre- and post-test results, but we do not engage with that in this notebook.

```{code-cell} ipython3
:tags: [hide-input]

# define true parameters
threshold = 0.0
treatment_effect = 0.7
N = 1000
sd = 0.3  # represents change between pre and post test with zero measurement error

# No measurement error, but random change from pre to post
df = (
    pd.DataFrame.from_dict({"x": rng.normal(size=N)})
    .assign(treated=lambda x: x.x < threshold)
    .assign(y=lambda x: x.x + rng.normal(loc=0, scale=sd, size=N) + treatment_effect * x.treated)
)

df
```

```{code-cell} ipython3
:tags: [hide-input]

def plot_data(df):
    fig, ax = plt.subplots(figsize=(12, 7))
    ax.plot(df.x[~df.treated], df.y[~df.treated], "o", alpha=0.3, label="untreated")
    ax.plot(df.x[df.treated], df.y[df.treated], "o", alpha=0.3, label="treated")
    ax.axvline(x=threshold, color="k", ls="--", lw=3, label="treatment threshold")
    ax.set(xlabel=r"observed $x$ (pre-test)", ylabel=r"observed $y$ (post-test)")
    ax.legend()
    return ax


plot_data(df);
```

+++ {"tags": []}

## Sharp regression discontinuity model

We can define our Bayesian regression discontinuity model as:

$$
\begin{aligned}
\Delta & \sim \text{Cauchy}(0, 1) \\
\sigma & \sim \text{HalfNormal}(1) \\
\mu_i & = x_i + \underbrace{\Delta \cdot treated_i}_{\text{treatment effect}} \\
y_i & \sim \text{Normal}(\mu_i, \sigma)
\end{aligned}
$$

where:
- $\Delta$ is the size of the discontinuity,
- $\sigma$ is the standard deviation of change in the pre- to post-test scores,
- $x_i$ and $y_i$ are observed pre- and post-test measures for unit $i$, and
- $treated_i$ is an observed indicator variable (0 for control group, 1 for test group).

Notes:
- We make the simplifying assumption of no measurement error.
- Here, we confine ourselves to the situation where we use the same measure (e.g. heart rate, educational attainment, upper arm circumference) for both the pre-test ($x$) and post-test ($y$). So the _untreated_ post-test measure can be modelled as $y \sim \text{Normal}(\mu=x, \sigma)$.
- In the case that the pre- and post-test measuring instruments were not identical, we would want to build slope and intercept parameters into $\mu$ to capture the 'exchange rate' between the pre- and post-test measures (a hypothetical sketch of this extension is given after the model code below).
- We assume we have accurately observed whether a unit has been treated or not. That is, this model assumes a sharp discontinuity with no uncertainty.

```{code-cell} ipython3
with pm.Model() as model:
    x = pm.MutableData("x", df.x, dims="obs_id")
    treated = pm.MutableData("treated", df.treated, dims="obs_id")
    sigma = pm.HalfNormal("sigma", 1)
    delta = pm.Cauchy("effect", alpha=0, beta=1)
    mu = pm.Deterministic("mu", x + (delta * treated), dims="obs_id")
    pm.Normal("y", mu=mu, sigma=sigma, observed=df.y, dims="obs_id")

pm.model_to_graphviz(model)
```
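
As an aside, the note above about non-identical pre- and post-test instruments could be implemented roughly as follows. This is a hypothetical sketch, not part of the original notebook, and the intercept and slope priors are illustrative choices only:

```python
# Hypothetical extension: allow an 'exchange rate' between pre- and post-test
# scales via intercept (beta0) and slope (beta1) parameters.
with pm.Model() as model_different_scales:
    x = pm.MutableData("x", df.x, dims="obs_id")
    treated = pm.MutableData("treated", df.treated, dims="obs_id")
    beta0 = pm.Normal("beta0", mu=0, sigma=1)  # intercept of the exchange rate
    beta1 = pm.Normal("beta1", mu=1, sigma=1)  # slope of the exchange rate
    sigma = pm.HalfNormal("sigma", 1)
    delta = pm.Cauchy("effect", alpha=0, beta=1)
    mu = pm.Deterministic("mu", beta0 + beta1 * x + delta * treated, dims="obs_id")
    pm.Normal("y", mu=mu, sigma=sigma, observed=df.y, dims="obs_id")
```
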
## Inference

```{code-cell} ipython3
with model:
    idata = pm.sample(random_seed=RANDOM_SEED)
```

We can see that we get no sampling warnings, and that plotting the MCMC traces shows no issues.

```{code-cell} ipython3
az.plot_trace(idata, var_names=["effect", "sigma"]);
```
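
We could also confirm convergence numerically; here is a quick check (not in the original notebook) using ArviZ's summary table:

```python
# r_hat should be close to 1.0 and the effective sample sizes
# (ess_bulk, ess_tail) should be comfortably large.
az.summary(idata, var_names=["effect", "sigma"], round_to=3)
```
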
We can also see that we are able to accurately recover the true discontinuity magnitude (left) and the standard deviation of the change in units between pre- and post-test (right).

```{code-cell} ipython3
az.plot_posterior(
    idata, var_names=["effect", "sigma"], ref_val=[treatment_effect, sd], hdi_prob=0.95
);
```

The most important thing is the posterior over the `effect` parameter. We can use that as the basis for a judgement about the strength of the effect (e.g. through the 95% credible interval) or the presence/absence of an effect (e.g. through a Bayes Factor or ROPE).
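
For instance, a ROPE analysis could be sketched as follows; the ±0.1 region of practical equivalence is an arbitrary choice for illustration, not something specified in the original notebook:

```python
# Hypothetical ROPE of [-0.1, 0.1] around a zero effect
effect_samples = idata.posterior["effect"].values.flatten()
p_in_rope = np.mean(np.abs(effect_samples) < 0.1)
print(f"Posterior mass inside ROPE: {p_in_rope:.3f}")

# ArviZ can overlay the ROPE on the posterior plot
az.plot_posterior(idata, var_names=["effect"], rope=[-0.1, 0.1]);
```
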
+++ {"tags": []} | ||
|
||
## Counterfactual questions | ||
|
||
We can use posterior prediction to ask what would we expect to see if: | ||
- no units were exposed to the treatment (blue shaded region, which is very narrow) | ||
- all units were exposed to the treatment (orange shaded region). | ||
|
||
_Technical note:_ Formally we are doing posterior prediction of `y`. Running `pm.sample_posterior_predictive` multiple times with different random seeds will result in new and different samples of `y` each time. However, this is not the case (we are not formally doing posterior prediction) for `mu`. This is because `mu` is a deterministic function (`mu = x + delta*treated`), so for our already obtained posterior samples of `delta`, the values of `mu` will be entirely determined by the values of `x` and `treated` data). | ||
|
||
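
To make that concrete, here is a small verification sketch (not part of the original notebook): calling `pm.sample_posterior_predictive` twice with different seeds should give identical draws of `mu` but different draws of `y`.

```python
# `mu` is a pushforward (deterministic) quantity, while `y` needs an
# extra sampling step, so only `y` changes with the random seed.
with model:
    ppc_a = pm.sample_posterior_predictive(idata, var_names=["mu", "y"], random_seed=1)
    ppc_b = pm.sample_posterior_predictive(idata, var_names=["mu", "y"], random_seed=2)

print(np.allclose(ppc_a.posterior_predictive["mu"], ppc_b.posterior_predictive["mu"]))  # True
print(np.allclose(ppc_a.posterior_predictive["y"], ppc_b.posterior_predictive["y"]))  # False
```
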
```{code-cell} ipython3
:tags: []

# MODEL EXPECTATION WITHOUT TREATMENT ------------------------------------
# probe data
_x = np.linspace(np.min(df.x), np.max(df.x), 500)
_treated = np.zeros(_x.shape)

# posterior prediction (see technical note above)
with model:
    pm.set_data({"x": _x, "treated": _treated})
    ppc = pm.sample_posterior_predictive(idata, var_names=["mu", "y"])

# plotting
ax = plot_data(df)
az.plot_hdi(
    _x,
    ppc.posterior_predictive["mu"],
    color="C0",
    hdi_prob=0.95,
    ax=ax,
    fill_kwargs={"label": r"$\mu$ untreated"},
)

# MODEL EXPECTATION WITH TREATMENT ---------------------------------------
# probe data
_x = np.linspace(np.min(df.x), np.max(df.x), 500)
_treated = np.ones(_x.shape)

# posterior prediction (see technical note above)
with model:
    pm.set_data({"x": _x, "treated": _treated})
    ppc = pm.sample_posterior_predictive(idata, var_names=["mu", "y"])

# plotting
az.plot_hdi(
    _x,
    ppc.posterior_predictive["mu"],
    color="C1",
    hdi_prob=0.95,
    ax=ax,
    fill_kwargs={"label": r"$\mu$ treated"},
)
ax.legend()
```

The blue shaded region shows the 95% credible region of the expected value of the post-test measurement for a range of possible pre-test measures, in the case of no treatment. This is actually infinitely narrow because this particular model assumes $\mu=x$ (see above).

The orange shaded region shows the 95% credible region of the expected value of the post-test measurement for a range of possible pre-test measures, in the case of treatment.

Both are actually very interesting as examples of counterfactual inference. We did not observe any units that were untreated below the threshold, nor any treated units above the threshold. But assuming our model is a good description of reality, we can ask the counterfactual questions "What if a unit above the threshold was treated?" and "What if a unit below the threshold was untreated?"
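
As a sanity check on the "infinitely narrow" claim above, one could verify (hypothetical snippet, not in the original notebook) that every posterior draw of `mu` equals `x` exactly when no unit is treated:

```python
# With treated = 0, mu = x + delta * 0 = x for every posterior draw,
# so the untreated credible region for mu has zero width.
with model:
    pm.set_data({"x": _x, "treated": np.zeros(_x.shape)})
    ppc_untreated = pm.sample_posterior_predictive(idata, var_names=["mu"])

print(np.allclose(ppc_untreated.posterior_predictive["mu"], _x))  # True
```
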
+++

## Summary
In this notebook we have merely scratched the surface of how to analyse data from regression discontinuity designs. Arguably, we have restricted our focus to almost the simplest possible case so that we can concentrate upon the core properties of the approach that allow causal claims to be made.

+++

## Authors
- Authored by [Benjamin T. Vincent](https://github.com/drbenvincent) in April 2022

+++

## Watermark

```{code-cell} ipython3
%load_ext watermark
%watermark -n -u -v -iv -w -p aesara,aeppl,xarray
```

:::{include} ../page_footer.md
:::
## Review discussion

**Reviewer:** Side note I hope helps with the other comments. I try to distinguish between posterior predictive and pushforward posterior variables: pushforward posterior variables are deterministic transformations of the posterior parameters, while posterior predictive variables need an extra sampling step. In this case, `mu = x + delta*treated`; once the posterior values are fixed, `mu` is also fixed. Here `sample_posterior_predictive` is being used as a convenience function to recalculate `mu` with the modified data, but even if `sample_...` is called, there is no sampling involved. There is sampling in computing `y`, because values of `y` are draws from the normal distribution of mean `mu` and std `sigma`. Multiple calls to `sample_posterior_predictive` with different seeds will return different values of `y` but always the same values for `mu`.
**Author (drbenvincent):** Understood. I have added a bit of text to clarify this. I would not want to change the code to manually calculate `mu` for new values of `x` and `treated` - I think the current code is very clear and convenient, as you say. Hopefully this works for you?
**Reviewer:** No need to change the code here at all, this was mostly for context.