
Conversation

@drbenvincent
Collaborator

@drbenvincent drbenvincent commented Dec 29, 2025

Closes #620

This pull request adds support for Staggered Difference-in-Differences (DiD) analysis to the codebase, including new data simulation utilities, experiment classes, effect summary reporting, and documentation updates. These changes make it possible to analyze and summarize causal effects in settings where treatment is adopted at different times across units.

Staggered DiD Support

  • Added import and export of the new StaggeredDifferenceInDifferences experiment class in both causalpy/__init__.py and causalpy/experiments/__init__.py, making it available as part of the public API.
  • Added a new data simulation function generate_staggered_did_data to causalpy/data/simulate_data.py for creating synthetic panel data with staggered treatment adoption and dynamic treatment effects (a usage sketch follows this list).
  • Updated experiment type detection and effect summary logic in causalpy/experiments/base.py and causalpy/reporting.py to recognize and summarize staggered DiD results, including prose and table outputs for event-time average treatment effects (ATTs).
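
As a quick orientation, here is a minimal end-to-end sketch. The constructor arguments and formula below are assumptions for illustration, not the confirmed signature; the staggered_did_pymc.ipynb notebook shows the real API.

    import causalpy as cp
    from causalpy.data.simulate_data import generate_staggered_did_data

    # Simulate a staggered-adoption panel (default arguments assumed).
    df = generate_staggered_did_data()

    # Fit the experiment; these keyword names are hypothetical.
    result = cp.StaggeredDifferenceInDifferences(
        df,
        formula="y ~ 1 + C(unit) + C(time)",  # assumed formula interface
        model=cp.pymc_models.LinearRegression(),
    )
    result.effect_summary()  # prose plus event-time ATT table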

Documentation and References

  • Added a new Jupyter notebook example staggered_did_pymc.ipynb to the documentation index.
  • Added a key literature reference (Borusyak et al., 2024) on robust event-study designs to the bibliography.

📚 Documentation preview 📚: https://causalpy--621.org.readthedocs.build/en/621/

@drbenvincent drbenvincent added the enhancement (New feature or request) and major labels Dec 29, 2025
@review-notebook-app

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks.


Powered by ReviewNB

@codecov

codecov bot commented Dec 29, 2025

Codecov Report

❌ Patch coverage is 96.14891% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.62%. Comparing base (fbf8a61) to head (deae3dc).

Files with missing lines               Patch %   Lines
causalpy/experiments/staggered_did.py  93.37%    7 missing, 13 partials ⚠️
causalpy/reporting.py                  76.92%    2 missing, 4 partials ⚠️
causalpy/data/simulate_data.py         94.59%    1 missing, 1 partial ⚠️
causalpy/tests/test_staggered_did.py   99.51%    0 missing, 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #621      +/-   ##
==========================================
+ Coverage   93.27%   93.62%   +0.34%     
==========================================
  Files          37       39       +2     
  Lines        5632     6411     +779     
  Branches      367      434      +67     
==========================================
+ Hits         5253     6002     +749     
- Misses        248      258      +10     
- Partials      131      151      +20     

☔ View full report in Codecov by Sentry.

@drbenvincent
Collaborator Author

bugbot review

@cursor

cursor bot commented Dec 30, 2025

PR Summary

Adds an imputation-based Staggered Difference-in-Differences estimator with full reporting, data simulation utilities, and docs.

  • New StaggeredDifferenceInDifferences experiment: trains on untreated observations, predicts counterfactuals, computes group-time and event-time ATTs (supports PyMC and OLS), and provides plotting/get_plot_data (see the sketch after this list)
  • New generate_staggered_did_data utility to simulate staggered-adoption panel data with dynamic effects
  • Reporting: detects staggered_did via the att_event_time_ attribute and produces event-time ATT tables/prose (effect_summary path wired into BaseExperiment)
  • Public API exports added; extensive unit/integration tests; docs index updated with staggered_did_pymc.ipynb; badge refreshed
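
For intuition, here is a self-contained sketch of the imputation idea in plain pandas/statsmodels. It is not the CausalPy implementation, and the column names are assumptions.

    import pandas as pd
    import statsmodels.formula.api as smf

    def imputation_att(df: pd.DataFrame) -> pd.Series:
        """Event-time ATTs via an imputation approach (Borusyak et al., 2024).

        Assumes columns: 'y' (outcome), 'unit', 'time', and 'first_treated'
        (adoption period; NaN for never-treated units).
        """
        df = df.copy()
        treated = df["first_treated"].notna() & (df["time"] >= df["first_treated"])
        # 1. Fit unit and time fixed effects on untreated observations only.
        fit = smf.ols("y ~ C(unit) + C(time)", data=df[~treated]).fit()
        # 2. Impute the untreated counterfactual for each treated observation.
        df.loc[treated, "y0_hat"] = fit.predict(df.loc[treated])
        # 3. Unit-time effects are observed minus imputed outcomes.
        df.loc[treated, "effect"] = df.loc[treated, "y"] - df.loc[treated, "y0_hat"]
        # 4. Average by event time (periods since adoption).
        event_time = df.loc[treated, "time"] - df.loc[treated, "first_treated"]
        return df.loc[treated, "effect"].groupby(event_time).mean()

The CausalPy class layers model choice (Bayesian or OLS), uncertainty intervals, and reporting on top of this same fit-on-untreated, predict, difference, aggregate recipe.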

Written by Cursor Bugbot for commit 20ef5fd.

Adds hdi_prob parameter to Bayesian aggregation in StaggeredDifferenceInDifferences and stores it for accurate interval reporting. Updates reporting to use the actual HDI probability used in computation, ensuring effect summaries match the computed intervals. Includes a test to verify correct storage and reporting of hdi_prob.
Previously, get_plot_data_bayesian always returned pre-computed 94% HDI intervals, ignoring the hdi_prob argument. This update recomputes the intervals when a different hdi_prob is requested. Added an integration test to verify that the method now returns intervals matching the requested hdi_prob.
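
Recomputing an interval at a requested probability from posterior draws is a one-liner in ArviZ; a generic sketch, not the PR's code:

    import arviz as az
    import numpy as np

    # Toy (chain, draw) samples standing in for event-time ATT draws.
    draws = np.random.default_rng(0).normal(size=(4, 1000))
    hdi_90 = az.hdi(draws, hdi_prob=0.90)  # interval at the requested probability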
The staggered DiD estimator now computes and reports pre-treatment (event_time < 0) placebo effects for eventually-treated units, in addition to post-treatment ATTs. This provides a diagnostic for the parallel trends assumption. Plots and printouts distinguish placebo and ATT estimates, and tests are updated to verify both are present and placebo effects are near zero.
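
Conceptually, the placebo diagnostic just splits the event-time ATT series at zero. A sketch with illustrative numbers and an assumed tolerance:

    import pandas as pd

    # Event-time ATT estimates indexed by event time; values are illustrative.
    att = pd.Series([0.02, -0.01, 0.55, 1.02], index=[-2, -1, 0, 1])

    placebo = att[att.index < 0]   # pre-treatment: should be near zero
    post = att[att.index >= 0]     # post-treatment: the dynamic ATT path
    tolerance = 0.1                # a modeling choice, not a CausalPy default
    if placebo.abs().max() >= tolerance:
        print("Pre-treatment effects differ from zero; parallel trends may be violated.")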
Added detailed markdown explanations to the notebook for the `att_event_time_` and `att_group_time_` attributes. The new content clarifies the purpose, use cases, and structure of each table, helping users understand when and how to use these outputs for reporting, analysis, and diagnostics.
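
To illustrate how the two tables relate (the column names here are assumptions, not the confirmed schema):

    import pandas as pd

    # Toy group-time table: one row per (treatment cohort, calendar period) cell.
    att_group_time = pd.DataFrame(
        {"group": [4, 4, 6], "time": [4, 5, 6], "att": [0.5, 1.0, 0.5]}
    )
    # Event-time table: group-time cells averaged by time since adoption.
    att_event_time = (
        att_group_time.assign(event_time=lambda d: d["time"] - d["group"])
        .groupby("event_time")["att"]
        .mean()
    )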
@drbenvincent drbenvincent requested a review from Copilot December 30, 2025 09:17
@drbenvincent drbenvincent marked this pull request as ready for review December 30, 2025 09:19
Contributor

Copilot AI left a comment


Pull request overview

This PR adds comprehensive support for Staggered Difference-in-Differences (DiD) analysis to CausalPy, enabling researchers to analyze causal effects when treatment is adopted at different times across units. The implementation follows the imputation-based approach of Borusyak et al. (2024), fitting models on untreated observations only and using predictions to estimate counterfactual outcomes for treated units.

Key Changes

  • Implements StaggeredDifferenceInDifferences experiment class with support for both PyMC (Bayesian) and sklearn (OLS) models
  • Adds generate_staggered_did_data() simulation function for creating synthetic panel data with staggered treatment adoption and dynamic treatment effects (a DIY sketch of this kind of data follows this list)
  • Extends effect summary reporting to recognize and summarize staggered DiD results, including event-time ATT estimates, pre-treatment placebo checks, and cohort information
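
To make "staggered adoption with dynamic effects" concrete, here is a tiny hand-rolled simulation. It is a sketch, not generate_staggered_did_data() itself, and every parameter is invented:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(42)
    n_units, n_periods = 30, 12
    # Each unit adopts treatment at period 4, 6, or 8, or never (NaN).
    first_treated = rng.choice([4.0, 6.0, 8.0, np.nan], size=n_units)

    rows = []
    for u in range(n_units):
        for t in range(n_periods):
            event_time = t - first_treated[u]  # NaN for never-treated units
            # Dynamic effect: zero before adoption, growing afterwards.
            effect = 0.5 * (event_time + 1) if event_time >= 0 else 0.0
            y = 0.1 * u + 0.2 * t + effect + rng.normal(0, 0.1)
            rows.append((u, t, first_treated[u], y))

    df = pd.DataFrame(rows, columns=["unit", "time", "first_treated", "y"])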

Reviewed changes

Copilot reviewed 9 out of 13 changed files in this pull request and generated 7 comments.

Summary per file:
  • causalpy/experiments/staggered_did.py: New 915-line implementation of the StaggeredDifferenceInDifferences class with data validation, model fitting, effect aggregation, and plotting capabilities
  • causalpy/tests/test_staggered_did.py: Comprehensive 1622-line test suite covering integration tests, input validation, core functionality, edge cases, and recovery tests
  • causalpy/reporting.py: Adds experiment type detection for staggered DiD and implements _effect_summary_staggered_did() to generate prose and tabular summaries
  • causalpy/data/simulate_data.py: Adds generate_staggered_did_data() to generate synthetic panel data with configurable cohorts, treatment effects, and noise
  • causalpy/experiments/base.py: Integrates the staggered DiD effect summary into the base experiment class's effect_summary() method
  • causalpy/experiments/__init__.py: Exports the StaggeredDifferenceInDifferences class in the experiments module
  • causalpy/__init__.py: Exports the StaggeredDifferenceInDifferences class at the top-level package
  • docs/source/references.bib: Adds two academic references for staggered DiD methodology (Borusyak et al. 2024, Goodman-Bacon 2021)
  • docs/source/notebooks/index.md: Adds staggered_did_pymc.ipynb to the documentation notebook index
  • docs/source/_static/interrogate_badge.svg: Updates the documentation coverage badge from 96.3% to 96.8%


Comment on lines +412 to +416
if (
    abs(avg_pre_att) < 0.1 * abs(avg_post_att)
    if len(post_treatment) > 0
    else True
):
Copilot AI Dec 30, 2025

The placebo check logic has a potential issue when avg_post_att is zero. The condition abs(avg_pre_att) < 0.1 * abs(avg_post_att) will always be False when avg_post_att is 0, even if avg_pre_att is also 0. This could incorrectly flag a parallel trends violation when there's simply no treatment effect. Consider using an absolute threshold or handling the zero case explicitly.

Suggested change
-    if (
-        abs(avg_pre_att) < 0.1 * abs(avg_post_att)
-        if len(post_treatment) > 0
-        else True
-    ):
+    # When post-treatment effects exist and are non-zero, use a relative threshold.
+    # When the average post-treatment effect is (near) zero, fall back to a small
+    # absolute threshold for the placebo to avoid spuriously flagging violations.
+    if len(post_treatment) > 0:
+        if abs(avg_post_att) > 0:
+            placebo_ok = abs(avg_pre_att) < 0.1 * abs(avg_post_att)
+        else:
+            # No detectable average treatment effect; treat very small pre-treatment
+            # effects as consistent with parallel trends.
+            placebo_ok = abs(avg_pre_att) < 1e-6
+    else:
+        placebo_ok = True
+    if placebo_ok:

    Tuple (min_event_time, max_event_time) to restrict event-time aggregation.
    If None, uses all available event-times.
reference_event_time : int, optional
    Event-time to use as reference (normalized to zero effect) in plots.
Copilot AI Dec 30, 2025

The docstring states that reference_event_time is used as "reference (normalized to zero effect) in plots," but this parameter is not actually used anywhere in the plotting methods (_bayesian_plot or _ols_plot). Either implement this functionality or remove the parameter and its documentation.

Suggested change
-    Event-time to use as reference (normalized to zero effect) in plots.
+    Event-time index associated with plots (reserved for future use).

    event_window: tuple[int, int] | None = None,
    reference_event_time: int = -1,
    **kwargs: dict,
) -> None:
Copilot AI Dec 30, 2025

The kwargs parameter in the __init__ method is not used anywhere in the constructor. If it's not needed for API consistency with other experiments, it should be removed. If it is needed for consistency, add a comment explaining this.

Suggested change
-) -> None:
+) -> None:
+    # NOTE: kwargs is accepted for API compatibility with other experiment classes
+    # and is intentionally not used inside this constructor.

    markersize=7,
    color="gray",
    alpha=0.7,
    label="Placebo estimate (94% HDI)",
Copilot AI Dec 30, 2025

The plot labels hardcode "94% HDI" in the legend, but the actual HDI probability used can be configured via the hdi_prob parameter in aggregate_effects_bayesian (line 415). The labels should use the stored hdi_prob value to reflect the actual interval probability being displayed. Consider using f"Placebo estimate ({int(self.hdi_prob_*100)}% HDI)" instead of the hardcoded string.

    capthick=2,
    markersize=8,
    color="C0",
    label="ATT estimate (94% HDI)",
Copilot AI Dec 30, 2025

The plot labels hardcode "94% HDI" in the legend, but the actual HDI probability used can be configured via the hdi_prob parameter in aggregate_effects_bayesian (line 415). The labels should use the stored hdi_prob value to reflect the actual interval probability being displayed. Consider using f"ATT estimate ({int(self.hdi_prob_*100)}% HDI)" instead of the hardcoded string.

Copilot uses AI. Check for mistakes.
Comment on lines +615 to +623
def _bayesian_plot(
    self, round_to: int | None = None, **kwargs: dict
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Plot event-study results for Bayesian model.

    Parameters
    ----------
    round_to : int, optional
        Number of decimals for rounding in plot titles.
Copilot AI Dec 30, 2025

The round_to parameter is documented but never used in the _bayesian_plot method. Either remove this parameter and its documentation, or implement rounding functionality for the plot if needed.

Comment on lines +701 to +709
    self, round_to: int | None = None, **kwargs: dict
) -> tuple[plt.Figure, list[plt.Axes]]:
    """Plot event-study results for OLS model.

    Parameters
    ----------
    round_to : int, optional
        Number of decimals for rounding in plot titles.
Copilot AI Dec 30, 2025

The round_to parameter is documented but never used in the _ols_plot method. Either remove this parameter and its documentation, or implement rounding functionality for the plot if needed.

Suggested change
-    self, round_to: int | None = None, **kwargs: dict
-) -> tuple[plt.Figure, list[plt.Axes]]:
-    """Plot event-study results for OLS model.
-
-    Parameters
-    ----------
-    round_to : int, optional
-        Number of decimals for rounding in plot titles.
+    self, **kwargs: dict
+) -> tuple[plt.Figure, list[plt.Axes]]:
+    """Plot event-study results for OLS model.


Labels

enhancement (New feature or request), major

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: Staggered adoption Difference-in-Differences / Event Study support (imputation-based)

2 participants