feat: add % of Stores (numeric distribution) metric by murray-ds · Pull Request #457 · Data-Simply/openretailscience

murray-ds · 2026-04-05T13:25:34Z

Summary

Add PctOfStores class in pyretailscience/metrics/distribution/pct_of_stores.py that computes the percentage of stores selling each product (numeric distribution)
Add ratio_metric utility in pyretailscience/metrics/base.py for safe division with NaN on zero denominator
Refactor Acv to follow updated conventions: group_by → group_col, keyword-only params, unconditional validate_columns, input handling before parameter validation
Add ColumnHelper guideline to CLAUDE.md and add product_id to ColumnHelper
Add documentation entries in docs/metrics.md and docs/api/metrics/distribution.md
Consolidate duplicate tests in test_date.py and test_options.py using pytest.mark.parametrize
Adopt ColumnHelper in test_acv.py for consistency with new guideline

Test plan

17 PctOfStores tests pass covering: basic calculation, 100% edge cases (all stores, single store), empty data, missing columns, invalid types, group_col, within_group, both pandas/ibis inputs, custom product_col (str and list), deduplication
All ACV tests updated and pass with ColumnHelper and group_col
Full test suite passes
Lint and format clean

🤖 Generated with Claude Code

Add PctOfStores class that computes the percentage of stores selling each product. Includes ratio_metric utility in metrics/base.py for safe division with NaN on zero denominator. Also refactors ACV to use the updated conventions: group_by → group_col, keyword-only params, unconditional validate_columns, and input handling before parameter validation. Updates docs, CLAUDE.md, ColumnHelper, and consolidates duplicate tests across date and options modules. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

claude · 2026-04-05T13:25:48Z

Claude finished @murray-ds's task in 2m 21s —— View job

PR Review: feat: add % of Stores (numeric distribution) metric

Gather context and understand request
Read changed files
Review code quality, bugs, performance, security, and test coverage
Post review feedback

Overall this is a clean, well-structured PR. The PctOfStores implementation is correct, tests are thorough, and the refactoring work is consistent with project conventions. A few findings worth addressing:

`pyretailscience/metrics/distribution/pct_of_stores.py`

count() vs nunique() (minor clarity issue)

pct_of_stores.py:88-90

per_group = store_product.group_by(group_cols).aggregate(
    **{agg_stores_col: _[store_id_col].count()},
)

.count() is functionally correct here because store_product is already .distinct()-ed on [store_id_col, *group_cols], making each row unique. However, .nunique() would be more semantically explicit — a reader shouldn't have to trace back to line 85 to understand why .count() is safe. The within_group total-stores calculation on line 95 already uses .nunique(), so this inconsistency is a bit surprising at first glance.

Fix this →

Temp column name collision risk (low probability, worth noting)

pct_of_stores.py:17

_TEMP_TOTAL_STORES = "__prs_temp_total_stores__"

The double-underscore + prefix convention is a reasonable sentinel, but if a user's dataset happens to have a column named __prs_temp_total_stores__, it would cause a conflict when joining or dropping. This is very unlikely but worth a brief comment explaining the naming choice.

`pyretailscience/metrics/base.py`

scale type annotation could be more precise

base.py:16

def ratio_metric(
    numerator: ir.NumericValue,
    denominator: ir.NumericValue,
    scale: float = PERCENTAGE_SCALE,
) -> ir.FloatingValue:

The denominator type annotation says ir.NumericValue, but the function also works (and is used in pct_of_stores.py) with a scalar expression returned by .nunique(). ir.NumericValue is the correct base class for both column and scalar numeric expressions in ibis, so this is technically fine — just worth confirming the annotation is broad enough intentionally.

`pyretailscience/utils/date.py`

Behavioural breaking change: mixed timezone inputs now raise TypeError

date.py:193-196

if start_is_naive != end_is_naive:
    msg = "start_date and end_date must have matching timezone awareness. Got naive and aware (or vice versa)."
    raise TypeError(msg)

Previously, mixed inputs (e.g., a naive start_date + tz-aware end_date) were silently accepted and treated as naive. Now they raise a TypeError. This is an improvement, but it's a breaking change for any caller passing mismatched types. There are no tests covering this new error path. If this function is part of the public API, a test like:

def test_mismatched_timezone_raises(self):
    with pytest.raises(TypeError, match="matching timezone awareness"):
        find_overlapping_periods(
            datetime.datetime(2022, 1, 1),
            datetime.datetime(2023, 1, 1, tzinfo=datetime.timezone.utc),
        )

would document and protect this new behaviour. Fix this →

np import removal

The removal of import numpy as np / import pandas as pd and the switch from np.arange to range is a nice cleanup. range is semantically correct here since only integer years are iterated.

Codecov partial branch

Codecov reports 1 partial branch in pct_of_stores.py (98.03% patch coverage). Partial coverage typically means one side of a boolean short-circuit or a conditional expression isn't exercised. Given the test suite is otherwise exhaustive, this is probably a minor branch in the within_group logic. Worth identifying and filling — but not a blocker.

Tests

The test suite is excellent overall:

All public constructor branches are covered (pandas, ibis, invalid type, missing columns, overlap validation)
within_group=True/False semantics are clearly separated with inline comments explaining the expected arithmetic
test_duplicate_store_product_not_double_counted directly validates the .distinct() deduplication, which is the most important correctness invariant
Parametrize is used appropriately throughout (replacing the old duplicated test methods in test_date.py and test_options.py)

One minor note: test_empty_dataframe creates a DataFrame without a unit_spend column — this is intentional and correct since PctOfStores doesn't require it, but a brief comment would help future readers.

Summary

Area	Status
Core logic correctness	✅
Edge cases (empty, single store, dedupe, within_group)	✅
API consistency with `Acv` conventions	✅
Test quality	✅
Missing test for new `TypeError` in `date.py`	⚠️
`.count()` vs `.nunique()` semantic clarity	Minor
Codecov partial branch	Minor

github-actions · 2026-04-05T13:26:47Z

🚀 Cloudflare Pages Deployment

✅ Preview deployed successfully!

Preview URL: https://52253f13.pyretailscience-docs.pages.dev
Deployment ID: 52253f13-c8b9-48be-8022-fe207ae22cad

This preview will be updated automatically when you push new changes to this PR.

codecov · 2026-04-05T13:30:04Z

Codecov Report

❌ Patch coverage is 98.85057% with 1 line in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...etailscience/metrics/distribution/pct_of_stores.py	98.03%	0 Missing and 1 partial ⚠️

Files with missing lines	Coverage Δ
pyretailscience/metrics/base.py	`100.00% <100.00%> (ø)`
pyretailscience/metrics/distribution/acv.py	`96.77% <100.00%> (ø)`
pyretailscience/options.py	`98.23% <100.00%> (+<0.01%)`	⬆️
pyretailscience/utils/date.py	`96.10% <100.00%> (+0.71%)`	⬆️
...etailscience/metrics/distribution/pct_of_stores.py	`98.03% <98.03%> (ø)`

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- Add full Google-style docstring to _normalize_datetime (#10) - Adopt ColumnHelper in test_acv.py for consistency (#7) - Rename misleading test_within_group_with_user_column_named_total_stores to test_within_group_ignores_unrelated_extra_columns (#4) - Add group_col/within_group examples to docs/metrics.md (#9) - Fix PR description: product_col is str | None, not list (#2) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

product_col now accepts str | list[str] | None as specified in the metric spec. Adds test_custom_product_col_list to verify multi-column product granularity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Assume Ibis datasets could be 1B–10B rows; avoid redundant operations on already-deduplicated data and prefer the cheapest correct operation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot deployed to pyretailscience-docs (Preview) April 5, 2026 13:26 View deployment

github-actions Bot deployed to pyretailscience-docs (Preview) April 5, 2026 13:35 View deployment

feat: add list support for product_col in PctOfStores

c69436a

product_col now accepts str | list[str] | None as specified in the metric spec. Adds test_custom_product_col_list to verify multi-column product granularity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot deployed to pyretailscience-docs (Preview) April 5, 2026 14:24 View deployment

docs: add large-dataset performance guideline to CLAUDE.md

11ca893

Assume Ibis datasets could be 1B–10B rows; avoid redundant operations on already-deduplicated data and prefer the cheapest correct operation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions Bot deployed to pyretailscience-docs (Preview) April 5, 2026 14:34 View deployment

mvanwyk merged commit fa285a1 into main Apr 5, 2026
5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add % of Stores (numeric distribution) metric#457

feat: add % of Stores (numeric distribution) metric#457
mvanwyk merged 4 commits into
mainfrom
feature/metric-pct-of-stores

murray-ds commented Apr 5, 2026 •

edited

Loading

Uh oh!

claude Bot commented Apr 5, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 5, 2026 •

edited

Loading

Uh oh!

codecov Bot commented Apr 5, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

murray-ds commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

claude Bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: feat: add % of Stores (numeric distribution) metric

pyretailscience/metrics/distribution/pct_of_stores.py

pyretailscience/metrics/base.py

pyretailscience/utils/date.py

Codecov partial branch

Tests

Summary

Uh oh!

github-actions Bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🚀 Cloudflare Pages Deployment

Uh oh!

codecov Bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

murray-ds commented Apr 5, 2026 •

edited

Loading

claude Bot commented Apr 5, 2026 •

edited

Loading

`pyretailscience/metrics/distribution/pct_of_stores.py`

`pyretailscience/metrics/base.py`

`pyretailscience/utils/date.py`

github-actions Bot commented Apr 5, 2026 •

edited

Loading

codecov Bot commented Apr 5, 2026 •

edited

Loading