Add Staggered Difference-in-Differences functionality #621
Codecov Report

❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main     #621      +/-   ##
==========================================
+ Coverage   93.27%   93.62%   +0.34%
==========================================
  Files          37       39       +2
  Lines        5632     6411     +779
  Branches      367      434      +67
==========================================
+ Hits         5253     6002     +749
- Misses        248      258      +10
- Partials      131      151      +20
```
PR Summary: Adds an imputation-based Staggered Difference-in-Differences estimator with full reporting, data simulation, and documentation.
Adds hdi_prob parameter to Bayesian aggregation in StaggeredDifferenceInDifferences and stores it for accurate interval reporting. Updates reporting to use the actual HDI probability used in computation, ensuring effect summaries match the computed intervals. Includes a test to verify correct storage and reporting of hdi_prob.
Previously, get_plot_data_bayesian always returned pre-computed 94% HDI intervals, ignoring the hdi_prob argument. This update recomputes the intervals when a different hdi_prob is requested. Added an integration test to verify that the method now returns intervals matching the requested hdi_prob.
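The recomputation described above can be sketched independently of CausalPy: given raw posterior draws, the narrowest interval containing the requested probability mass is found by scanning fixed-width windows over the sorted samples. This is a minimal numpy-only stand-in for what `arviz.hdi` provides; `hdi_from_samples` is a hypothetical helper, not part of the PR.

```python
import numpy as np

def hdi_from_samples(samples: np.ndarray, hdi_prob: float = 0.94) -> tuple[float, float]:
    """Narrowest interval containing `hdi_prob` mass of the samples."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(hdi_prob * n))      # number of samples the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]  # width of every candidate window of k samples
    i = int(np.argmin(widths))           # narrowest window wins
    return float(x[i]), float(x[i + k - 1])

rng = np.random.default_rng(0)
draws = rng.normal(0.0, 1.0, size=20_000)
lo94, hi94 = hdi_from_samples(draws, 0.94)
lo50, hi50 = hdi_from_samples(draws, 0.50)
# A 50% HDI from the same draws is strictly narrower than the 94% HDI,
# which is why the interval must be recomputed when hdi_prob changes.
```

Recomputing from the stored posterior (rather than reusing cached 94% bounds) is exactly what makes the returned intervals match the requested `hdi_prob`.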
The staggered DiD estimator now computes and reports pre-treatment (event_time < 0) placebo effects for eventually-treated units, in addition to post-treatment ATTs. This provides a diagnostic for the parallel trends assumption. Plots and printouts distinguish placebo and ATT estimates, and tests are updated to verify both are present and placebo effects are near zero.
Added detailed markdown explanations to the notebook for the `att_event_time_` and `att_group_time_` attributes. The new content clarifies the purpose, use cases, and structure of each table, helping users understand when and how to use these outputs for reporting, analysis, and diagnostics.
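Assuming the two tables relate in the usual way for staggered designs (this is a sketch with illustrative values, not the PR's actual table schema), event-time ATTs collapse the group-time table by averaging cells that share the same relative time `event_time = period - cohort`:

```python
import pandas as pd

# Hypothetical group-time ATT table: one row per (cohort, period) cell.
att_group_time = pd.DataFrame({
    "cohort": [3, 3, 3, 5, 5],
    "period": [3, 4, 5, 5, 6],
    "att":    [1.0, 1.5, 2.0, 0.8, 1.3],
})
# Relative time since adoption for each cell.
att_group_time["event_time"] = att_group_time["period"] - att_group_time["cohort"]
# Collapse to event-time ATTs by averaging cells with the same event_time.
att_event_time = att_group_time.groupby("event_time")["att"].mean().reset_index()
# event_time 0 averages cohorts 3 and 5 at adoption: (1.0 + 0.8) / 2 = 0.9
```

The group-time table is the finer-grained diagnostic (per-cohort dynamics); the event-time table is the headline dynamic-effects summary.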
Pull request overview
This PR adds comprehensive support for Staggered Difference-in-Differences (DiD) analysis to CausalPy, enabling researchers to analyze causal effects when treatment is adopted at different times across units. The implementation follows the imputation-based approach of Borusyak et al. (2024), fitting models on untreated observations only and using predictions to estimate counterfactual outcomes for treated units.
Key Changes
- Implements `StaggeredDifferenceInDifferences` experiment class with support for both PyMC (Bayesian) and sklearn (OLS) models
- Adds `generate_staggered_did_data()` simulation function for creating synthetic panel data with staggered treatment adoption and dynamic treatment effects
- Extends effect summary reporting to recognize and summarize staggered DiD results, including event-time ATT estimates, pre-treatment placebo checks, and cohort information
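The imputation approach the PR follows can be illustrated with a self-contained numpy sketch (the simulated data and variable names here are illustrative, not the output of the PR's `generate_staggered_did_data`): fit two-way fixed effects on untreated observations only, then impute the untreated counterfactual for each treated cell and average the gaps.

```python
import numpy as np

rng = np.random.default_rng(42)
n_units, n_periods, tau = 30, 10, 2.0
unit_fe = rng.normal(0, 1, n_units)
time_fe = np.linspace(0, 1, n_periods)
# Half the units adopt treatment at period 5; the rest are never treated
# (sentinel adoption time beyond the panel).
adopt = np.where(np.arange(n_units) < 15, 5, 10_000)

u = np.repeat(np.arange(n_units), n_periods)   # unit index per observation
t = np.tile(np.arange(n_periods), n_units)     # period index per observation
treated = t >= adopt[u]
y = unit_fe[u] + time_fe[t] + tau * treated + rng.normal(0, 0.1, len(u))

# Two-way fixed-effects design: intercept + unit dummies + time dummies
# (first unit and first period dropped to avoid collinearity).
X = np.column_stack([
    np.ones(len(u)),
    (u[:, None] == np.arange(1, n_units)).astype(float),
    (t[:, None] == np.arange(1, n_periods)).astype(float),
])
# Step 1: fit on untreated observations only.
beta, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)
# Step 2: impute Y(0) for treated cells; the average gap is the ATT.
att = float(np.mean(y[treated] - X[treated] @ beta))
# att recovers tau = 2.0 up to simulation noise.
```

Because the model never sees treated observations, the treatment effect cannot contaminate the counterfactual fit, which is the core of the Borusyak et al. (2024) imputation estimator.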
Reviewed changes
Copilot reviewed 9 out of 13 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| causalpy/experiments/staggered_did.py | New 915-line implementation of StaggeredDifferenceInDifferences class with data validation, model fitting, effect aggregation, and plotting capabilities |
| causalpy/tests/test_staggered_did.py | Comprehensive test suite with 1622 lines covering integration tests, input validation, core functionality, edge cases, and recovery tests |
| causalpy/reporting.py | Adds experiment type detection for staggered DiD and implements _effect_summary_staggered_did() function to generate prose and tabular summaries |
| causalpy/data/simulate_data.py | Adds generate_staggered_did_data() function to generate synthetic panel data with configurable cohorts, treatment effects, and noise |
| causalpy/experiments/base.py | Integrates staggered DiD effect summary into the base experiment class's effect_summary() method |
| causalpy/experiments/`__init__.py` | Exports StaggeredDifferenceInDifferences class in the experiments module |
| causalpy/`__init__.py` | Exports StaggeredDifferenceInDifferences class at the top-level package |
| docs/source/references.bib | Adds two academic references for staggered DiD methodology (Borusyak et al. 2024, Goodman-Bacon 2021) |
| docs/source/notebooks/index.md | Adds staggered_did_pymc.ipynb to the documentation notebook index |
| docs/source/_static/interrogate_badge.svg | Updates documentation coverage badge from 96.3% to 96.8% |
```python
if (
    abs(avg_pre_att) < 0.1 * abs(avg_post_att)
    if len(post_treatment) > 0
    else True
):
```
Copilot AI (Dec 30, 2025):
The placebo check logic has a potential issue when `avg_post_att` is zero. The condition `abs(avg_pre_att) < 0.1 * abs(avg_post_att)` will always be `False` when `avg_post_att` is 0, even if `avg_pre_att` is also 0. This could incorrectly flag a parallel trends violation when there is simply no treatment effect. Consider using an absolute threshold or handling the zero case explicitly.
Suggested change:

```diff
-if (
-    abs(avg_pre_att) < 0.1 * abs(avg_post_att)
-    if len(post_treatment) > 0
-    else True
-):
+# When post-treatment effects exist and are non-zero, use a relative threshold.
+# When the average post-treatment effect is (near) zero, fall back to a small
+# absolute threshold for the placebo to avoid spuriously flagging violations.
+if len(post_treatment) > 0:
+    if abs(avg_post_att) > 0:
+        placebo_ok = abs(avg_pre_att) < 0.1 * abs(avg_post_att)
+    else:
+        # No detectable average treatment effect; treat very small pre-treatment
+        # effects as consistent with parallel trends.
+        placebo_ok = abs(avg_pre_att) < 1e-6
+else:
+    placebo_ok = True
+if placebo_ok:
```
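The fallback logic can be exercised as a standalone function to confirm the zero-effect edge case no longer flags a violation (`placebo_ok` here is an illustrative extraction of the suggestion, not the PR's actual code path):

```python
def placebo_ok(avg_pre_att: float, avg_post_att: float, n_post: int) -> bool:
    """Relative threshold when a post-treatment effect exists,
    absolute fallback when the average post-treatment effect is zero."""
    if n_post > 0:
        if abs(avg_post_att) > 0:
            return abs(avg_pre_att) < 0.1 * abs(avg_post_att)
        # No detectable treatment effect: tiny placebo effects are fine.
        return abs(avg_pre_att) < 1e-6
    return True

# Original expression: abs(0.0) < 0.1 * abs(0.0) evaluates to False,
# spuriously flagging a violation when there is simply no effect.
# With the fallback, a zero effect with a zero placebo passes.
```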
```python
    Tuple (min_event_time, max_event_time) to restrict event-time aggregation.
    If None, uses all available event-times.
reference_event_time : int, optional
    Event-time to use as reference (normalized to zero effect) in plots.
```
Copilot AI (Dec 30, 2025):
The docstring states that `reference_event_time` is used as "reference (normalized to zero effect) in plots," but this parameter is not actually used anywhere in the plotting methods (`_bayesian_plot` or `_ols_plot`). Either implement this functionality or remove the parameter and its documentation.
Suggested change:

```diff
-    Event-time to use as reference (normalized to zero effect) in plots.
+    Event-time index associated with plots (reserved for future use).
```
```python
    event_window: tuple[int, int] | None = None,
    reference_event_time: int = -1,
    **kwargs: dict,
) -> None:
```
Copilot AI (Dec 30, 2025):
The `kwargs` parameter in the `__init__` method is not used anywhere in the constructor. If it's not needed for API consistency with other experiments, it should be removed. If it is needed for consistency, add a comment explaining this.
Suggested change:

```diff
 ) -> None:
+    # NOTE: kwargs is accepted for API compatibility with other experiment classes
+    # and is intentionally not used inside this constructor.
```
```python
markersize=7,
color="gray",
alpha=0.7,
label="Placebo estimate (94% HDI)",
```
Copilot AI (Dec 30, 2025):
The plot labels hardcode "94% HDI" in the legend, but the actual HDI probability used can be configured via the `hdi_prob` parameter in `aggregate_effects_bayesian` (line 415). The labels should use the stored `hdi_prob` value to reflect the actual interval probability being displayed. Consider using `f"Placebo estimate ({int(self.hdi_prob_*100)}% HDI)"` instead of the hardcoded string.
```python
capthick=2,
markersize=8,
color="C0",
label="ATT estimate (94% HDI)",
```
Copilot AI (Dec 30, 2025):
The plot labels hardcode "94% HDI" in the legend, but the actual HDI probability used can be configured via the `hdi_prob` parameter in `aggregate_effects_bayesian` (line 415). The labels should use the stored `hdi_prob` value to reflect the actual interval probability being displayed. Consider using `f"ATT estimate ({int(self.hdi_prob_*100)}% HDI)"` instead of the hardcoded string.
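Both legend labels can be built from the stored probability with one small helper. This is a sketch: the attribute name `hdi_prob_` comes from the review comments above, and `round()` is used instead of the suggested `int()` to guard against float artifacts such as `0.89 * 100 == 88.999...`, which `int()` would truncate to 88.

```python
def hdi_label(prefix: str, hdi_prob: float) -> str:
    """Legend label reflecting the HDI probability actually used."""
    # round() avoids off-by-one truncation from binary float representation.
    return f"{prefix} ({round(hdi_prob * 100)}% HDI)"

# e.g. hdi_label("ATT estimate", self.hdi_prob_) inside the plotting methods
```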
```python
def _bayesian_plot(
    self, round_to: int | None = None, **kwargs: dict
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Plot event-study results for Bayesian model.

    Parameters
    ----------
    round_to : int, optional
        Number of decimals for rounding in plot titles.
```
Copilot AI (Dec 30, 2025):
The `round_to` parameter is documented but never used in the `_bayesian_plot` method. Either remove this parameter and its documentation, or implement rounding functionality for the plot if needed.
```python
    self, round_to: int | None = None, **kwargs: dict
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Plot event-study results for OLS model.

    Parameters
    ----------
    round_to : int, optional
        Number of decimals for rounding in plot titles.
```
Copilot AI (Dec 30, 2025):
The `round_to` parameter is documented but never used in the `_ols_plot` method. Either remove this parameter and its documentation, or implement rounding functionality for the plot if needed.
Suggested change:

```diff
-    self, round_to: int | None = None, **kwargs: dict
-) -> tuple[plt.Figure, list[plt.Axes]]:
-    """Plot event-study results for OLS model.
-
-    Parameters
-    ----------
-    round_to : int, optional
-        Number of decimals for rounding in plot titles.
+    self, **kwargs: dict
+) -> tuple[plt.Figure, list[plt.Axes]]:
+    """Plot event-study results for OLS model.
```
Closes #620
This pull request adds support for Staggered Difference-in-Differences (DiD) analysis to the codebase, including new data simulation utilities, experiment classes, effect summary reporting, and documentation updates. These changes make it possible to analyze and summarize causal effects in settings where treatment is adopted at different times across units.
Staggered DiD Support
- Adds the `StaggeredDifferenceInDifferences` experiment class in both `causalpy/__init__.py` and `causalpy/experiments/__init__.py`, making it available as a public API.
- Adds `generate_staggered_did_data` to `causalpy/data/simulate_data.py` for creating synthetic panel data with staggered treatment adoption and dynamic treatment effects.
- Updates `causalpy/experiments/base.py` and `causalpy/reporting.py` to recognize and summarize staggered DiD results, including prose and table outputs for event-time average treatment effects (ATTs).

Documentation and References
- Adds `staggered_did_pymc.ipynb` to the documentation notebook index.

📚 Documentation preview 📚: https://causalpy--621.org.readthedocs.build/en/621/