
Is calculate_impact correct? #496

@drbenvincent


When I was looking at the multi-cell geolift notebook, I started wondering whether the calculate_impact method of the PyMCModel class was correct. In the post-intervention period of the top plot, the posterior expectation has narrow HDIs and the data points lie far away from it. Yet the causal impact plot, which should simply show the difference between the data and the posterior distribution of mu, has wider HDIs that sometimes overlap with zero.

[Image: multi-cell geolift notebook, model fit (top plot) and causal impact plot]

This was confirmed: the current implementation calculates the causal impact as the difference between the data and the posterior predictive distribution.

    def calculate_impact(
        self, y_true: xr.DataArray, y_pred: az.InferenceData
    ) -> xr.DataArray:
        impact = y_true - y_pred["posterior_predictive"]["y_hat"]
        return impact.transpose(..., "obs_ind")

I think this should instead be a comparison between the data and the posterior expectation. So instead of y_pred["posterior_predictive"]["y_hat"], should we have y_pred["posterior_predictive"]["mu"]?
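A minimal NumPy sketch of the proposed change, for illustration only. It assumes the posterior predictive group can be represented as a plain dict of arrays with shape (chain, draw, obs_ind), and that a deterministic mu is stored alongside y_hat. The function calculate_impact_sketch and its var_name parameter are hypothetical and not part of CausalPy; the real method works on xarray/ArviZ objects.

```python
import numpy as np


def calculate_impact_sketch(
    y_true: np.ndarray,
    posterior_predictive: dict[str, np.ndarray],
    var_name: str = "mu",
) -> np.ndarray:
    """Return draws of the causal impact: y_true minus the chosen posterior variable.

    Hypothetical helper: `posterior_predictive` maps variable names to arrays of
    shape (chain, draw, obs_ind). var_name="mu" compares against the posterior
    expectation (the proposal); var_name="y_hat" reproduces the current behaviour.
    """
    draws = posterior_predictive[var_name]
    # y_true has shape (obs_ind,) and broadcasts over the chain and draw axes,
    # so obs_ind stays last, like transpose(..., "obs_ind") in the original.
    return y_true - draws


# Usage with made-up draws: 2 chains, 500 draws, 3 post-intervention observations.
rng = np.random.default_rng(0)
mu = rng.normal(10.0, 0.1, size=(2, 500, 3))
y_hat = mu + rng.normal(0.0, 1.0, size=mu.shape)
y_true = np.array([11.5, 11.0, 12.0])
impact = calculate_impact_sketch(y_true, {"mu": mu, "y_hat": y_hat})
print(impact.shape)  # (2, 500, 3)
```

Only the variable being subtracted changes; the shape and dimension order of the result stay the same.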

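If so, our estimates of the causal impact will become more precise. It also means that, if the current code is in error, we are not getting biased estimates; we are just getting less precise estimates than we should. This is because y_hat is mu plus zero-mean observation noise: the two have the same posterior mean, but y_hat also carries the variance of the observation noise, which widens the HDIs.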
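A quick Monte Carlo check of this reasoning, under assumed values: a Gaussian likelihood y_hat ~ Normal(mu, sigma), posterior draws of mu centred at 10 with standard deviation 0.1, sigma = 1, and a single observed value of 11.5. All numbers are made up for illustration, and hdi here is a simple hand-rolled shortest-interval estimator, not arviz.hdi.

```python
import numpy as np


def hdi(samples: np.ndarray, prob: float = 0.94) -> tuple[float, float]:
    """Shortest interval containing a fraction `prob` of the 1-D samples."""
    x = np.sort(samples)
    n = x.size
    k = int(np.floor(prob * n))  # index span of each candidate interval
    widths = x[k:] - x[: n - k]  # width of every interval holding k + 1 samples
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k])


rng = np.random.default_rng(42)
n_draws = 40_000
sigma = 1.0  # assumed observation noise
y_true = 11.5  # assumed observed outcome in the post-intervention period

mu = rng.normal(10.0, 0.1, size=n_draws)  # posterior draws of the expectation
y_hat = mu + rng.normal(0.0, sigma, size=n_draws)  # posterior predictive draws

impact_mu = y_true - mu  # proposed: data minus posterior expectation
impact_y_hat = y_true - y_hat  # current: data minus posterior predictive

# Same centre (no bias), very different spread.
print(impact_mu.mean(), impact_y_hat.mean())
print(hdi(impact_mu), hdi(impact_y_hat))
```

With these settings the mu-based 94% HDI should come out roughly 0.4 wide and exclude zero, while the y_hat-based one is roughly 3.8 wide and straddles zero, mirroring the pattern in the plots.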

Making this change results in this...

[Image: causal impact plot after computing the impact against mu]
