Description
When I was looking at the multi-cell geolift notebook, I started wondering whether the calculate_impact method of the PyMCModel class was correct. In the post-intervention period in the top plot, you can see that the posterior expectation has narrow HDIs and the data points are far away from it. Yet when we look at the causal impact plot, which simply looks at the difference between the data and the posterior mu distribution, we see larger HDIs which sometimes overlap with zero.

This was confirmed: the current implementation calculates the causal impact as the difference between the data and the posterior predictive distribution.
CausalPy/causalpy/pymc_models.py
Lines 169 to 173 in 714c48b
def calculate_impact(
    self, y_true: xr.DataArray, y_pred: az.InferenceData
) -> xr.DataArray:
    impact = y_true - y_pred["posterior_predictive"]["y_hat"]
    return impact.transpose(..., "obs_ind")
I think this should instead be a comparison between the data and the posterior expectation. So instead of y_pred["posterior_predictive"]["y_hat"], should we have y_pred["posterior_predictive"]["mu"]?
If so, the implication is that our estimates of the causal impact will become more precise. In other words, if the current code is in error, our estimates are not biased; they are just less precise than they should be.
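To illustrate the precision-versus-bias point, here is a minimal simulation sketch using only the Python standard library. It does not touch CausalPy. The numbers (y_obs, sigma, the spread of the mu draws) are made up, and central_interval is a hypothetical helper that computes an equal-tailed interval as a rough stand-in for an HDI.

```python
import random
import statistics

random.seed(0)

n_draws = 4000
y_obs = 12.0   # one observed post-intervention data point (made up)
sigma = 2.0    # observation noise standard deviation (made up)

# Hypothetical posterior draws of the expectation mu: tightly concentrated.
mu_draws = [random.gauss(10.0, 0.2) for _ in range(n_draws)]
# Posterior predictive draws add observation noise on top of each mu draw.
y_hat_draws = [random.gauss(mu, sigma) for mu in mu_draws]

# Impact computed against the expectation vs. against the predictive draws.
impact_from_mu = [y_obs - mu for mu in mu_draws]
impact_from_y_hat = [y_obs - y_hat for y_hat in y_hat_draws]


def central_interval(draws, mass=0.94):
    """Equal-tailed interval; a simple stand-in for an HDI."""
    s = sorted(draws)
    lower = s[int((1 - mass) / 2 * len(s))]
    upper = s[int((1 + mass) / 2 * len(s)) - 1]
    return lower, upper


for name, draws in [("mu", impact_from_mu), ("y_hat", impact_from_y_hat)]:
    lower, upper = central_interval(draws)
    mean = statistics.fmean(draws)
    print(f"impact vs {name}: mean={mean:.2f}, 94% interval=({lower:.2f}, {upper:.2f})")
```

Under these assumptions, both versions are centred on roughly the same impact, but the interval based on y_hat is much wider because it also carries the observation noise.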
Making this change results in this...
