All else being equal, simpler is better. Weigh every change as improvement magnitude vs. complexity cost: a marginal gain that drags in a new abstraction, an extra parameter, or another layer of indirection is rarely worth it. Removing code while preserving or improving behavior is one of the best outcomes available — a net-negative diff is a feature, not a sign of cut corners.
Apply this lens to every change:
- Prefer deletion over addition when both achieve the goal. If a requirement can be met by removing a code path rather than guarding it, do that.
- Reject sunk-cost momentum. If a refactor is getting uglier as you go — more special cases, more wrappers, more comments needed to justify each step — stop and reconsider. The right answer is often a smaller change, or a different shape of change entirely.
- No premature abstraction. A helper for one caller has to be read every time too. Three similar lines beat a wrapper introduced "just in case" a second use site appears.
- Default to the boring shape. Flat functions beat dispatch tables for two cases; direct checks beat extension points; explicit code beats clever code.
When in doubt about whether a piece of complexity is earning its keep, leave it out. Adding it back when a real second use case demands it is cheap; removing an abstraction once code depends on its shape is not.
- Install dependencies:
uv sync - Install pre-commit hooks:
uv run pre-commit install— run this before your first commit. The hooks (ruff, conventional-commit, pytest, etc.) gate every commit and mirror CI; setting them up first catches failures locally instead of in the pipeline. - Run all tests:
uv run pytest - Run specific test:
uv run pytest tests/test_file.py::test_function_name - Run tests with pattern:
uv run pytest -k "pattern" - Run tests with coverage:
uv run pytest --cov=openretailscience - Lint code:
uv run ruff check . - Format code:
uv run ruff format . - Run Python scripts:
uv run python script.py - Run notebook:
uv run jupyter nbconvert --to notebook --execute my_notebook.ipynb - When running a specific test, use
pytest -k "test_name"if you don't already know the full node path (class name, parameterization, etc.). Don't guess full paths.
Note: Always use uv run to execute Python commands to ensure the correct virtual environment and
dependencies are used.
- Follow Google Python Style Guide
- Use Google-style docstrings with Args, Returns, and Raises sections
- Import order: standard library → third-party → internal packages
- Use explicit type annotations for all functions and parameters
- Class names use CamelCase; functions, methods, variables use snake_case
- Private methods/functions should be prefixed with underscore (_)
- Max line length: 120 characters
- Error handling: validate inputs at public API boundaries with specific
exceptions and descriptive messages. Private functions should not
re-validate inputs, but must include an explicit
raisefor unhandled cases in branching logic (e.g., if/elif chains dispatching on a type parameter) to prevent implicitNonereturns. Every code path must either return a value or raise an exception. - File naming: Module names should be lowercase with underscores
- Using double quotes when quoting strings
- When checking for None values use
if my_var is not Nonerather thanif my_var: - When checking for empty lists values use
if len(my_list) > 0rather thanif my_list: - Always create "conventional commit" commit messages
- Do not stage files or commit unless asked to
- Docstrings should specify argument and return types
- Type annotations should use Python 3.10-compatible formats (e.g., use
from __future__ import annotationsand import newer typing constructs likeSelfinsideif TYPE_CHECKING:blocks) - Remove unnecessary trailing whitespace
- Assume datasets processed via Ibis could be in the 1B–10B row range. Avoid redundant operations (e.g., a
.nunique()on already-deduplicated data) and prefer the cheapest correct operation for a given context. - Use vectorized operations for numpy arrays and pandas DataFrames/Series wherever possible instead of loops or .apply() to keep code clean and performant. Avoid converting Series to lists for iteration or using .iterrows() - use vectorized operations directly on the DataFrame/Series
- Extract magic numbers as named constants with descriptive names (e.g.,
GREEN_THRESHOLD = 1.0instead of hardcoded1.0in conditionals) - Add input validation for public functions and class methods: validate types, ranges, and constraints early with clear error messages
- Use ternary operators for simple conditional assignments (e.g.,
status = "active" if count > 0 else "inactive"instead of multi-line if/else blocks) - Avoid using
Anyorobjecttypes in annotations; determine and use the correct specific type whenever possible - Use
TYPE_CHECKINGblocks for type-only imports to avoid runtime import overhead. Do not usenoqa: TC003to suppress these warnings; instead, properly move type-only imports intoif TYPE_CHECKING:blocks - In LaTeX math blocks (
$$ ... $$), do not escape underscores inside\text{}commands. Use\text{unit_spend}, not\text{unit\_spend}— the backslash-escaped form renders a visible backslash in the output. - Shared validation helpers belong in
openretailscience/core/validation.py. Do not place validation helpers in metric-specific modules. - No nested function definitions. A
def(orasync def) must not appear inside another function's body. Lift inline helpers to module scope. If the helper genuinely needs to bind outer-scope state, take that state as an explicit parameter rather than relying on a closure. Lambdas are exempt; the rule is specifically aboutdef/async definside anotherdef. (Ruff has no built-in rule for this; enforce by review.)
Strongly prefer red/green TDD wherever possible. When implementing new functionality or fixing bugs:
- Red — Write a failing test first that captures the expected behavior
- Green — Write the minimum code to make the test pass
- Refactor — Clean up while keeping tests green
This applies to new features, bug fixes, and significant refactoring. Skip TDD only for trivial changes (typo fixes, comment updates, etc.) where writing a test first would be purely mechanical.
Confirming red/green status requires running the tests. Pyright diagnostics (type errors, missing parameters, etc.)
are not a substitute for actually executing uv run pytest on the relevant test file. A test is only "red" when it
fails at runtime, and only "green" when it passes at runtime.
- Test package functionality, not libraries: Every test must import and exercise OpenRetailScience code. Do not test pandas, Python built-ins, or third-party libraries in isolation.
- Verify behavior, not execution: Tests must have meaningful assertions that validate the actual behavior claimed by the test name. Checking return types or "not None" is insufficient.
- Test outputs, not implementation: Focus on public API behavior and results. Avoid testing internal state, private
attributes, or "how" code works internally. Testing private method outputs/behavior is acceptable (e.g.,
obj._calculate(5) == 10), but testing internal state is not (e.g.,obj._cache == {...}). - Use realistic retail data: Test data should reflect real retail scenarios (customer_id, store_id, transaction amounts) rather than generic placeholders ("A", "B", "test", "data").
- Prefer test classes for organizing related tests (group tests by module or class being tested)
- Always put imports at the top of the module (never import inside test methods)
- Where possible prefer pandas
assert_frame_equalto asserting individual values of the dataframe- This may require creating an expected dataframe to compare against
- Avoid
check_dtype=False; match the expected dtype instead (it lazily hides real dtype regressions)
- Use
pytest.mark.parametrizewhen testing multiple input variations of the same behavior - Test files should follow pattern:
test_*.pywith test functionstest_* - Test names must accurately describe what is being tested
- Each test should have at least one meaningful assertion that validates the claimed behavior
- Include boundary/edge case tests for threshold values, limits, and special cases
- When testing against expected values (colors, formats, etc.), reference package constants rather than hardcoding values in tests
- Use
ColumnHelperfor column names in test DataFrames (e.g.,cols.store_id,cols.customer_id) instead of hardcoding string literals like"store_id". This keeps tests decoupled from the current option defaults. - Use pytest fixtures for shared test data setup to improve readability and reduce duplication
- Don't test without package imports: If no OpenRetailScience code is imported, the test is invalid
- Don't test Python/library basics: Avoid tests that verify fundamental Python or standard library behavior (set operations, dict.get(), list methods, string operations) without exercising OpenRetailScience functionality. These tests belong in Python's own test suite.
- Don't write assertion-free tests: Every test must have assertions that can actually fail. Avoid:
- Tests with no assertions at all
- Only
assert obj is not Noneafter object creation (always true) - Only
assert hasattr()orcallable()checks without behavior verification - Only
assert isinstance()type checks without validating actual behavior - Assertions that can never fail (e.g.,
assert x >= 0 or x < 0) - Vague comparison assertions that only check "something changed" (e.g.,
assert filtered_df.shape != unfiltered_df.shape). Better: verify the actual filtering behavior (e.g.,assert len(filtered_df) == 5andassert (filtered_df['price'] > 100).all()). This includes count-existence checks likeassert len(x) > 0or>= 1when the expected count is knowable (e.g., one label per bar, one legend entry per category). Pin the count to its source:assert len(bar_labels) == len(bars). A regression that drops all but one item should fail the test.
- Don't duplicate tests: If you are about to write a new test method whose body follows the same pattern as an
existing method in the class (same assertion, same exception, same setup) but with different input values, stop and
add a
pytest.mark.parametrizeentry to the existing method instead. This applies even when the methods test "different parameters" — if the structure is identical, they belong in one parametrized test. - Don't over-mock: Mock external dependencies only (databases, APIs, file I/O, network calls). Never mock internal OpenRetailScience logic or calculations. Test real package behavior.
- Don't write trivial tests: Avoid testing constants equal their definitions, class names, or tautological assertions (True == True)
- Don't test implementation details: Avoid asserting internal state values or data structures. Testing private method behavior/outputs is OK; testing how data is stored internally is not.
- Don't test configuration defaults: Avoid tests that only verify default values match their definitions (e.g.,
assert get_option("foo") == "bar"when you just set"foo": "bar"in the defaults dictionary). These are tautological and can only fail from typos noticed immediately. Instead, test that configured values are actually used by package functionality through integration tests. - Don't use generic test data: Use domain-specific examples (retail customer data, not "A", "B", "C")
- Don't mismatch test names and assertions: If test claims to verify sorting, actually assert sort order. If it claims validation, verify validation logic works.
Before committing test code, verify each test meets these criteria:
- Imports OpenRetailScience code: At least one import from
openretailscience.* - Calls package functionality: Test actually invokes package functions/classes/methods
- Has meaningful assertions: Assertions verify actual behavior, not just types or existence
- Test name matches behavior: Assertions validate what the test name claims to test
- No substantial duplication: If two or more test methods share the same structure (same assertion pattern, same
exception, same setup) and differ only in one or two input values, they must be combined with
pytest.mark.parametrize. Look for this across the entire test class, not just adjacent methods. - Uses realistic data: Test data reflects retail domain (customer IDs, prices, stores, etc.)
- Minimal mocking: Only external dependencies are mocked, not internal package logic