
Refactor effect_summary to experiment-owned implementations#606

Merged
drbenvincent merged 11 commits into pymc-labs:main from JeanVanDyk:refactor_summary
Jan 8, 2026

Conversation

JeanVanDyk (Contributor) commented Dec 23, 2025

Hi everyone,

I removed the override of effect_summary in interrupted_time_series.py, which eliminated ~100 lines of duplicated logic in total.

While reviewing the current implementation, I have three architectural concerns that I’d like to discuss before we merge:

  1. Relocation of _comparison_period_summary: The _comparison_period_summary method currently lives in interrupted_time_series.py. Given that its primary responsibility is formatting data for output, it may be a better fit for reporting.py. Moving it there would centralize our reporting logic alongside the _effect_summary_{experiment type} functions we already find there, and make it easier to locate in the future.

  2. Refactoring base.py (Lines 200–375): The logic currently sitting between lines 200 and 375 appears to be exclusively experiment-specific:

    if experiment_type == "XXX":
        return _effect_summary_XXX(
            self,
            direction=direction,
            alpha=alpha,
            min_effect=min_effect,
        )

To keep base.py clean and maintainable, I believe it would be better to move this code into its corresponding experiment-specific class. This ensures our base class remains generic and doesn't "leak" logic from individual experiments.

We could, however, extract lines 314–375 into a separate method called by both SyntheticControl and InterruptedTimeSeries to avoid duplicating the logic.
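As a minimal sketch of that extraction (the helper name and bodies here are hypothetical placeholders, not the actual CausalPy code), the shared time-series logic could live in one protected method that both subclasses call:

```python
# Illustrative sketch only: _time_series_effect_summary is a hypothetical
# name for the shared logic currently at base.py lines 314-375.
class BaseExperiment:
    def _time_series_effect_summary(self, direction, alpha, min_effect):
        # The shared post-period aggregation would live here; a stub result
        # stands in for it in this sketch.
        return {"direction": direction, "alpha": alpha, "min_effect": min_effect}


class InterruptedTimeSeries(BaseExperiment):
    def effect_summary(self, direction="increase", alpha=0.05, min_effect=None):
        # ITS-specific pre/post handling would go here before delegating
        return self._time_series_effect_summary(direction, alpha, min_effect)


class SyntheticControl(BaseExperiment):
    def effect_summary(self, direction="increase", alpha=0.05, min_effect=None):
        # SC-specific multi-unit handling would go here before delegating
        return self._time_series_effect_summary(direction, alpha, min_effect)
```

Both experiments keep their own public effect_summary, so the base class never needs to know which subclass is calling.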

  3. Test Organization: I am adding extensive tests for these changes. Following the patterns I've seen in the repo, I've placed these in the tests/ folder. However, to prevent this folder from growing out of control as we scale, I'd like to propose we organize the tests/ directory to mirror our source directory structure (e.g., tests/models/ and tests/utils/).

Let me know what you think! I’m happy to handle the refactoring if we’re in agreement on the direction.


📚 Documentation preview 📚: https://causalpy--606.org.readthedocs.build/en/606/

cursor bot commented Dec 23, 2025

PR Summary

Streamlines effect summarization by decentralizing logic and expanding ITS capabilities.

  • Removes monolithic BaseExperiment.effect_summary and redundant imports; keeps only EffectSummary contract
  • Implements effect_summary per experiment: DiD/PrePostNEGD, RD, RKink, Staggered DiD, Synthetic Control; InstrumentalVariable and InversePropensityWeighting return NotImplemented
  • Rewrites InterruptedTimeSeries.effect_summary with period support ("intervention", "post", "comparison") and prefix-based prose; adds _comparison_period_summary
  • Adds extensive ITS tests (tests/test_its_effect_summary.py) covering PyMC/OLS, datetime/integer indices, windows, parameters, validation, and consistency
  • Updates docs badge SVG (interrogate coverage 96.2% → 96.4%)

Written by Cursor Bugbot for commit d7b0fdc.

codecov bot commented Dec 23, 2025

Codecov Report

❌ Patch coverage is 97.50567% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.08%. Comparing base (9ddf58c) to head (be9c69b).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
causalpy/experiments/synthetic_control.py 75.00% 3 Missing and 3 partials ⚠️
causalpy/experiments/interrupted_time_series.py 87.17% 2 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #606      +/-   ##
==========================================
+ Coverage   93.74%   94.08%   +0.34%     
==========================================
  Files          41       42       +1     
  Lines        6827     7190     +363     
  Branches      458      458              
==========================================
+ Hits         6400     6765     +365     
+ Misses        267      262       -5     
- Partials      160      163       +3     

☔ View full report in Codecov by Sentry.

drbenvincent (Collaborator) left a comment:

Thanks for this contribution! The removal of ~175 lines of duplicated code is a win, and the test coverage you've added is excellent.

I also appreciate you raising the architectural concerns proactively. You've identified the same issues I've been thinking about: the base class is starting to accumulate experiment-specific logic, and _comparison_period_summary feels misplaced.

My main concern with the current approach: The base class now accesses ITS-specific attributes like datapost, treatment_time, and treatment_end_time - note the # type: ignore[attr-defined] comments, which are a signal that we're reaching into subclass territory. This creates an awkward coupling where BaseExperiment knows about concepts that only exist in ITS.

This feels like a good opportunity to rethink the reporting architecture more holistically rather than patching incrementally.
I see two promising directions:

Option 1: Experiment-owned effect_summary() with shared utilities
Each experiment class owns its effect_summary() implementation entirely. The base class either has no implementation or declares it abstract. reporting.py becomes a pure utility module—stateless functions that take data and return results, with no knowledge of experiment objects. ITS handles its three-period logic; SC handles multi-unit logic; scalar experiments (DiD, RD, RKink) handle theirs. Code reuse happens through utility function calls, not inheritance.

Option 2: Strategy/Generator pattern
Create an EffectSummaryGenerator hierarchy in reporting.py. Each experiment instantiates the appropriate generator (e.g., TimeSeriesEffectGenerator, ThreePeriodEffectGenerator). This keeps reporting logic centralized while avoiding base class bloat.

Both approaches keep the base class clean and make experiment-specific logic explicit. I'd lean toward Option 1 for its simplicity—it's easier to understand where logic lives, and pure utility functions are straightforward to test.

What do you think? Happy to discuss further or pair on a refactor if you're interested in tackling this. Your three-period work and tests would slot in nicely once the architecture is settled.

@drbenvincent drbenvincent added the refactor Refactor, clean up, or improvement with no visible changes to the user label Dec 25, 2025
JeanVanDyk (Contributor, Author) commented:

Thanks for the detailed feedback! I’m glad to see the reduction in code duplication.

I completely agree with your point regarding the # type: ignore[attr-defined] signals. It’s clear that BaseExperiment shouldn't be burdened with knowing the internals of treatment_time or the specific three-period logic of an ITS.

Decision: Moving forward with Option 1
I’ve decided to go with Option 1. It feels like the most "Pythonic" and straightforward path forward. By making effect_summary() experiment-owned and treating reporting.py as a stateless utility module, we gain a much clearer mental model of where logic lives. It effectively removes the "reach-back" coupling where the base class has to guess what the subclasses are doing.

Future Refinements:

  • Logic Deduplication: If we identify shared logic across different experiment types (like standard formatting for confidence intervals), we'll have the option to move those into the reporting.py utility file to keep the experiment classes focused on their specific data structures.
  • Summary Generator: While I'm starting with Option 1 for its simplicity, having this cleaner separation now leaves the door open for a more formal Generator pattern later if the reporting logic grows significantly more complex.
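For instance, a shared confidence-interval formatter in reporting.py might look like this (the function name, signature, and output format are hypothetical, shown only to illustrate the kind of stateless utility meant above):

```python
# Hypothetical stateless helper: standard CI prose shared across
# experiment classes, with no knowledge of experiment objects.
def format_interval(point, lower, upper, ci=94):
    """Render a point estimate with its credible/confidence interval."""
    return f"{point:.2f} ({ci}% CI [{lower:.2f}, {upper:.2f}])"
```

Each experiment would call such helpers with plain numbers, keeping the formatting consistent without coupling reporting.py to any experiment class.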

drbenvincent (Collaborator) commented:

bugbot review

drbenvincent (Collaborator) commented:

Inclined to merge soon. Looks like a good improvement. Just dropping in an auto-generated summary to document the changes in prose form, not just code :)


TL;DR

This PR moves summary logic into the experiment classes.

Before: BaseExperiment.effect_summary() contained a big if/elif chain that detected the experiment type and dispatched to the appropriate logic.

After: BaseExperiment.effect_summary() is abstract, and each experiment class (InterruptedTimeSeries, DifferenceInDifferences, SyntheticControl, etc.) implements its own effect_summary() method. The base class no longer needs to "know" about its subclasses.


Before: Centralized Dispatch in Base Class

┌─────────────────────────────────────────────────────────────┐
│                      BaseExperiment                          │
│                                                              │
│  effect_summary():                                           │
│    experiment_type = _detect_experiment_type(self)           │
│    is_pymc = isinstance(self.model, PyMCModel)               │
│                                                              │
│    if experiment_type == "rd":                               │
│        return _effect_summary_rd(...)                        │
│    elif experiment_type == "rkink":                          │
│        return _effect_summary_rkink(...)                     │
│    elif experiment_type == "did":                            │
│        if is_pymc:                                           │
│            return _effect_summary_did(...)                   │
│        else:                                                 │
│            # OLS DiD logic...                                │
│    else:  # ITS or Synthetic Control                         │
│        # 80+ lines of shared time-series logic...            │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Problems:

  • Base class "knows" about all experiment types
  • 132 lines of dispatch logic in base.py
  • Mixing experiment-type dispatch with model-type dispatch
  • Hard to extend or modify individual experiments

After: Experiment-Owned Implementation

┌─────────────────────────────────────────────────────────────┐
│                      BaseExperiment                          │
│                                                              │
│  @abstractmethod                                             │
│  effect_summary(...) -> EffectSummary:                       │
│      """Each experiment implements its own summary."""       │
│      raise NotImplementedError                               │
│                                                              │
└─────────────────────────────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
          ▼                   ▼                   ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│       ITS       │ │  SyntheticCtrl  │ │       DiD       │
│                 │ │                 │ │                 │
│ effect_summary: │ │ effect_summary: │ │ effect_summary: │
│  - 3-period     │ │  - windows      │ │  - PyMC path    │
│  - intervention │ │  - cumulative   │ │  - OLS path     │
│  - comparison   │ │  - relative     │ │                 │
└─────────────────┘ └─────────────────┘ └─────────────────┘
          │                   │                   │
          └───────────────────┼───────────────────┘
                              ▼
                    ┌─────────────────┐
                    │   reporting.py  │
                    │  (utilities)    │
                    │                 │
                    │ _compute_stats  │
                    │ _generate_table │
                    │ _generate_prose │
                    └─────────────────┘

Benefits:

  • Each experiment owns its summary logic
  • Base class is generic and clean
  • reporting.py is a stateless utility module
  • Easy to extend individual experiments
  • Clear separation of concerns

drbenvincent (Collaborator) left a comment:

@JeanVanDyk if you can update from main and resolve the conflicts then I'm happy to merge

@drbenvincent drbenvincent changed the title Refactoring ITS summary method Refactor effect_summary to experiment-owned implementations Jan 8, 2026
@drbenvincent drbenvincent merged commit 096608c into pymc-labs:main Jan 8, 2026
10 checks passed
@JeanVanDyk JeanVanDyk deleted the refactor_summary branch February 17, 2026 20:08