
Conversation

@drbenvincent
Collaborator

@drbenvincent drbenvincent commented Dec 29, 2025

Closes #620

This pull request adds support for Staggered Difference-in-Differences (DiD) analysis to the codebase, including new data simulation utilities, experiment classes, effect summary reporting, and documentation updates. These changes make it possible to analyze and summarize causal effects in settings where treatment is adopted at different times across units.

Staggered DiD Support

  • Added import and export of the new StaggeredDifferenceInDifferences experiment class in both causalpy/__init__.py and causalpy/experiments/__init__.py, making it available as part of the public API.
  • Added a new data simulation function generate_staggered_did_data to causalpy/data/simulate_data.py for creating synthetic panel data with staggered treatment adoption and dynamic treatment effects (a usage sketch follows this list).
  • Updated experiment type detection and effect summary logic in causalpy/experiments/base.py and causalpy/reporting.py to recognize and summarize staggered DiD results, including prose and table outputs for event-time average treatment effects (ATTs).
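
As a quick orientation, here is a minimal end-to-end sketch. The constructor arguments and formula below are assumptions for illustration, not the confirmed signature; the staggered_did_pymc.ipynb notebook shows the real API.

    import causalpy as cp
    from causalpy.data.simulate_data import generate_staggered_did_data

    # Simulate a staggered-adoption panel (default arguments assumed).
    df = generate_staggered_did_data()

    # Fit the experiment; these keyword names are hypothetical.
    result = cp.StaggeredDifferenceInDifferences(
        df,
        formula="y ~ 1 + C(unit) + C(time)",  # assumed formula interface
        model=cp.pymc_models.LinearRegression(),
    )
    result.effect_summary()  # prose plus event-time ATT table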

Documentation and References

  • Added a new Jupyter notebook example staggered_did_pymc.ipynb to the documentation index.
  • Added a key literature reference (Borusyak et al., 2024) on robust event-study designs to the bibliography.

📚 Documentation preview 📚: https://causalpy--621.org.readthedocs.build/en/621/

@drbenvincent drbenvincent added the enhancement (New feature or request) and major labels Dec 29, 2025
@review-notebook-app

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.


Powered by ReviewNB

@codecov

codecov bot commented Dec 29, 2025

Codecov Report

❌ Patch coverage is 96.14891% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.62%. Comparing base (fbf8a61) to head (deae3dc).

Files with missing lines               Patch %   Lines
causalpy/experiments/staggered_did.py  93.37%    7 missing, 13 partials ⚠️
causalpy/reporting.py                  76.92%    2 missing, 4 partials ⚠️
causalpy/data/simulate_data.py         94.59%    1 missing, 1 partial ⚠️
causalpy/tests/test_staggered_did.py   99.51%    0 missing, 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #621      +/-   ##
==========================================
+ Coverage   93.27%   93.62%   +0.34%     
==========================================
  Files          37       39       +2     
  Lines        5632     6411     +779     
  Branches      367      434      +67     
==========================================
+ Hits         5253     6002     +749     
- Misses        248      258      +10     
- Partials      131      151      +20     

☔ View full report in Codecov by Sentry.

@drbenvincent
Collaborator Author

bugbot review

@cursor

cursor bot commented Dec 30, 2025

PR Summary

Adds an imputation-based Staggered Difference-in-Differences estimator with full reporting, data simulation utilities, and docs.

  • New StaggeredDifferenceInDifferences experiment: trains on untreated observations, predicts counterfactuals, computes group-time and event-time ATTs (supports PyMC and OLS), and provides plotting/get_plot_data (see the sketch after this list)
  • New generate_staggered_did_data utility to simulate staggered-adoption panel data with dynamic effects
  • Reporting: detects staggered_did via the att_event_time_ attribute and produces event-time ATT tables/prose (effect_summary path wired into BaseExperiment)
  • Public API exports added; extensive unit/integration tests; docs index updated with staggered_did_pymc.ipynb; badge refreshed
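
For intuition, here is a self-contained sketch of the imputation idea in plain pandas/statsmodels. It is not the CausalPy implementation, and the column names are assumptions.

    import pandas as pd
    import statsmodels.formula.api as smf

    def imputation_att(df: pd.DataFrame) -> pd.Series:
        """Event-time ATTs via an imputation approach (Borusyak et al., 2024).

        Assumes columns: 'y' (outcome), 'unit', 'time', and 'first_treated'
        (adoption period; NaN for never-treated units).
        """
        df = df.copy()
        treated = df["first_treated"].notna() & (df["time"] >= df["first_treated"])
        # 1. Fit unit and time fixed effects on untreated observations only.
        fit = smf.ols("y ~ C(unit) + C(time)", data=df[~treated]).fit()
        # 2. Impute the untreated counterfactual for each treated observation.
        df.loc[treated, "y0_hat"] = fit.predict(df.loc[treated])
        # 3. Unit-time effects are observed minus imputed outcomes.
        df.loc[treated, "effect"] = df.loc[treated, "y"] - df.loc[treated, "y0_hat"]
        # 4. Average by event time (periods since adoption).
        event_time = df.loc[treated, "time"] - df.loc[treated, "first_treated"]
        return df.loc[treated, "effect"].groupby(event_time).mean()

The CausalPy class layers model choice (Bayesian or OLS), uncertainty intervals, and reporting on top of this same fit-on-untreated, predict, difference, aggregate recipe.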

Written by Cursor Bugbot for commit 20ef5fd.

Adds hdi_prob parameter to Bayesian aggregation in StaggeredDifferenceInDifferences and stores it for accurate interval reporting. Updates reporting to use the actual HDI probability used in computation, ensuring effect summaries match the computed intervals. Includes a test to verify correct storage and reporting of hdi_prob.
Previously, get_plot_data_bayesian always returned pre-computed 94% HDI intervals, ignoring the hdi_prob argument. This update recomputes the intervals when a different hdi_prob is requested. Added an integration test to verify that the method now returns intervals matching the requested hdi_prob.
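
Recomputing an interval at a requested probability from posterior draws is a one-liner in ArviZ; a generic sketch, not the PR's code:

    import arviz as az
    import numpy as np

    # Toy (chain, draw) samples standing in for event-time ATT draws.
    draws = np.random.default_rng(0).normal(size=(4, 1000))
    hdi_90 = az.hdi(draws, hdi_prob=0.90)  # interval at the requested probability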
The staggered DiD estimator now computes and reports pre-treatment (event_time < 0) placebo effects for eventually-treated units, in addition to post-treatment ATTs. This provides a diagnostic for the parallel trends assumption. Plots and printouts distinguish placebo and ATT estimates, and tests are updated to verify both are present and placebo effects are near zero.
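
Conceptually, the placebo diagnostic just splits the event-time ATT series at zero. A sketch with illustrative numbers and an assumed tolerance:

    import pandas as pd

    # Event-time ATT estimates indexed by event time; values are illustrative.
    att = pd.Series([0.02, -0.01, 0.55, 1.02], index=[-2, -1, 0, 1])

    placebo = att[att.index < 0]   # pre-treatment: should be near zero
    post = att[att.index >= 0]     # post-treatment: the dynamic ATT path
    tolerance = 0.1                # a modeling choice, not a CausalPy default
    if placebo.abs().max() >= tolerance:
        print("Pre-treatment effects differ from zero; parallel trends may be violated.")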
Added detailed markdown explanations to the notebook for the `att_event_time_` and `att_group_time_` attributes. The new content clarifies the purpose, use cases, and structure of each table, helping users understand when and how to use these outputs for reporting, analysis, and diagnostics.
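
To illustrate how the two tables relate (the column names here are assumptions, not the confirmed schema):

    import pandas as pd

    # Toy group-time table: one row per (treatment cohort, calendar period) cell.
    att_group_time = pd.DataFrame(
        {"group": [4, 4, 6], "time": [4, 5, 6], "att": [0.5, 1.0, 0.5]}
    )
    # Event-time table: group-time cells averaged by time since adoption.
    att_event_time = (
        att_group_time.assign(event_time=lambda d: d["time"] - d["group"])
        .groupby("event_time")["att"]
        .mean()
    )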
@drbenvincent drbenvincent requested a review from Copilot December 30, 2025 09:17
@drbenvincent drbenvincent marked this pull request as ready for review December 30, 2025 09:19
Contributor

Copilot AI left a comment


Pull request overview

This PR adds comprehensive support for Staggered Difference-in-Differences (DiD) analysis to CausalPy, enabling researchers to analyze causal effects when treatment is adopted at different times across units. The implementation follows the imputation-based approach of Borusyak et al. (2024), fitting models on untreated observations only and using predictions to estimate counterfactual outcomes for treated units.

Key Changes

  • Implements StaggeredDifferenceInDifferences experiment class with support for both PyMC (Bayesian) and sklearn (OLS) models
  • Adds generate_staggered_did_data() simulation function for creating synthetic panel data with staggered treatment adoption and dynamic treatment effects (a DIY sketch of this kind of data follows this list)
  • Extends effect summary reporting to recognize and summarize staggered DiD results, including event-time ATT estimates, pre-treatment placebo checks, and cohort information
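
To make "staggered adoption with dynamic effects" concrete, here is a tiny hand-rolled simulation. It is a sketch, not generate_staggered_did_data() itself, and every parameter is invented:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    n_units, n_periods = 30, 12
    # Each unit adopts treatment at period 4, 6, or 8, or never (NaN).
    first_treated = rng.choice([4.0, 6.0, 8.0, np.nan], size=n_units)

    rows = []
    for u in range(n_units):
        for t in range(n_periods):
            event_time = t - first_treated[u]  # NaN for never-treated units
            # Dynamic effect: zero before adoption, growing afterwards.
            effect = 0.5 * (event_time + 1) if event_time >= 0 else 0.0
            y = 0.1 * u + 0.2 * t + effect + rng.normal(0, 0.1)
            rows.append((u, t, first_treated[u], y))

    df = pd.DataFrame(rows, columns=["unit", "time", "first_treated", "y"])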

Reviewed changes

Copilot reviewed 9 out of 13 changed files in this pull request and generated 7 comments.

Summary per file:
  • causalpy/experiments/staggered_did.py: New 915-line implementation of the StaggeredDifferenceInDifferences class with data validation, model fitting, effect aggregation, and plotting capabilities
  • causalpy/tests/test_staggered_did.py: Comprehensive 1622-line test suite covering integration tests, input validation, core functionality, edge cases, and recovery tests
  • causalpy/reporting.py: Adds experiment type detection for staggered DiD and implements _effect_summary_staggered_did() to generate prose and tabular summaries
  • causalpy/data/simulate_data.py: Adds generate_staggered_did_data() to generate synthetic panel data with configurable cohorts, treatment effects, and noise
  • causalpy/experiments/base.py: Integrates the staggered DiD effect summary into the base experiment class's effect_summary() method
  • causalpy/experiments/__init__.py: Exports the StaggeredDifferenceInDifferences class in the experiments module
  • causalpy/__init__.py: Exports the StaggeredDifferenceInDifferences class at the top-level package
  • docs/source/references.bib: Adds two academic references for staggered DiD methodology (Borusyak et al. 2024, Goodman-Bacon 2021)
  • docs/source/notebooks/index.md: Adds staggered_did_pymc.ipynb to the documentation notebook index
  • docs/source/_static/interrogate_badge.svg: Updates the documentation coverage badge from 96.3% to 96.8%


Comment on lines +412 to +416
if (
    abs(avg_pre_att) < 0.1 * abs(avg_post_att)
    if len(post_treatment) > 0
    else True
):
Copilot AI Dec 30, 2025

The placebo check logic has a potential issue when avg_post_att is zero. The condition abs(avg_pre_att) < 0.1 * abs(avg_post_att) will always be False when avg_post_att is 0, even if avg_pre_att is also 0. This could incorrectly flag a parallel trends violation when there's simply no treatment effect. Consider using an absolute threshold or handling the zero case explicitly.

Suggested change
-    if (
-        abs(avg_pre_att) < 0.1 * abs(avg_post_att)
-        if len(post_treatment) > 0
-        else True
-    ):
+    # When post-treatment effects exist and are non-zero, use a relative threshold.
+    # When the average post-treatment effect is (near) zero, fall back to a small
+    # absolute threshold for the placebo to avoid spuriously flagging violations.
+    if len(post_treatment) > 0:
+        if abs(avg_post_att) > 0:
+            placebo_ok = abs(avg_pre_att) < 0.1 * abs(avg_post_att)
+        else:
+            # No detectable average treatment effect; treat very small pre-treatment
+            # effects as consistent with parallel trends.
+            placebo_ok = abs(avg_pre_att) < 1e-6
+    else:
+        placebo_ok = True
+    if placebo_ok:

    Tuple (min_event_time, max_event_time) to restrict event-time aggregation.
    If None, uses all available event-times.
reference_event_time : int, optional
    Event-time to use as reference (normalized to zero effect) in plots.
Copilot AI Dec 30, 2025

The docstring states that reference_event_time is used as "reference (normalized to zero effect) in plots," but this parameter is not actually used anywhere in the plotting methods (_bayesian_plot or _ols_plot). Either implement this functionality or remove the parameter and its documentation.

Suggested change
-    Event-time to use as reference (normalized to zero effect) in plots.
+    Event-time index associated with plots (reserved for future use).

    event_window: tuple[int, int] | None = None,
    reference_event_time: int = -1,
    **kwargs: dict,
) -> None:
Copilot AI Dec 30, 2025

The kwargs parameter in the __init__ method is not used anywhere in the constructor. If it's not needed for API consistency with other experiments, it should be removed. If it is needed for consistency, add a comment explaining this.

Suggested change
-) -> None:
+) -> None:
+    # NOTE: kwargs is accepted for API compatibility with other experiment classes
+    # and is intentionally not used inside this constructor.

    markersize=7,
    color="gray",
    alpha=0.7,
    label="Placebo estimate (94% HDI)",
Copilot AI Dec 30, 2025

The plot labels hardcode "94% HDI" in the legend, but the actual HDI probability used can be configured via the hdi_prob parameter in aggregate_effects_bayesian (line 415). The labels should use the stored hdi_prob value to reflect the actual interval probability being displayed. Consider using f"Placebo estimate ({int(self.hdi_prob_*100)}% HDI)" instead of the hardcoded string.

    capthick=2,
    markersize=8,
    color="C0",
    label="ATT estimate (94% HDI)",
Copilot AI Dec 30, 2025

The plot labels hardcode "94% HDI" in the legend, but the actual HDI probability used can be configured via the hdi_prob parameter in aggregate_effects_bayesian (line 415). The labels should use the stored hdi_prob value to reflect the actual interval probability being displayed. Consider using f"ATT estimate ({int(self.hdi_prob_*100)}% HDI)" instead of the hardcoded string.

Copilot uses AI. Check for mistakes.
Comment on lines +615 to +623
def _bayesian_plot(
    self, round_to: int | None = None, **kwargs: dict
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Plot event-study results for Bayesian model.

    Parameters
    ----------
    round_to : int, optional
        Number of decimals for rounding in plot titles.
Copilot AI Dec 30, 2025

The round_to parameter is documented but never used in the _bayesian_plot method. Either remove this parameter and its documentation, or implement rounding functionality for the plot if needed.

Comment on lines +701 to +709
    self, round_to: int | None = None, **kwargs: dict
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Plot event-study results for OLS model.

    Parameters
    ----------
    round_to : int, optional
        Number of decimals for rounding in plot titles.
Copilot AI Dec 30, 2025

The round_to parameter is documented but never used in the _ols_plot method. Either remove this parameter and its documentation, or implement rounding functionality for the plot if needed.

Suggested change
-    self, round_to: int | None = None, **kwargs: dict
-) -> tuple[plt.Figure, list[plt.Axes]]:
-    """Plot event-study results for OLS model.
-
-    Parameters
-    ----------
-    round_to : int, optional
-        Number of decimals for rounding in plot titles.
+    self, **kwargs: dict
+) -> tuple[plt.Figure, list[plt.Axes]]:
+    """Plot event-study results for OLS model.


Labels

enhancement (New feature or request), major

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: Staggered adoption Difference-in-Differences / Event Study support (imputation-based)

2 participants