Commit 82cff11

docs: refine CrossShop matrix entry and column-table notes
- Drop group_2_col from CrossShop's required columns (it defaults to group_1_col); add a note explaining the group_n_col defaulting.
- Note that the standard column-names table lists recognised names, not a per-dataset required set (e.g. unit_price is absent from the sample).
- Use singular "product" for the CustomerDecisionHierarchy level for consistency with the other rows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Ei3maqUvpytVcemk9cJYcx
1 parent 5224c98 commit 82cff11

1 file changed

Lines changed: 7 additions & 2 deletions

docs/getting_started/data_structures.md

@@ -138,6 +138,9 @@ OpenRetailScience reads columns by a fixed set of standard names. The most commo
 | `unit_cost` | Cost per unit |
 | `store_id` | Store identifier |
 
+Not every dataset contains every column — this is the set of names OpenRetailScience *recognises*, and each
+analysis needs only the ones it uses (the sample data, for example, has no `unit_price`).
+
 If your warehouse uses different names, **do not rename every column**. Map your names onto the standard ones once,
 through the options system (a `openretailscience.toml` file, `option_context()`, or `set_option()`). See the
 [Options & configuration guide](options_guide.md) for the three approaches and when to use each.
@@ -260,10 +263,10 @@ function aggregates to.
 | RFMSegmentation | customer_id, transaction_id, transaction_date, unit_spend | pandas, Ibis | customer |
 | NLRSegmentation | customer_id, period_col, value_col | pandas, Ibis | customer |
 | GainLoss | customer_id, value_col | pandas | customer |
-| CrossShop | group_col, value_col, group_1_col, group_2_col | pandas, Ibis | customer |
+| CrossShop | group_col, value_col, group_1_col | pandas, Ibis | customer |
 | RevenueTree | customer_id, transaction_id, unit_spend, period_col | pandas, Ibis | period |
 | ProductAssociation | value_col, group_col | pandas, Ibis | customer |
-| CustomerDecisionHierarchy | customer_id, transaction_id, product_col | pandas | products |
+| CustomerDecisionHierarchy | customer_id, transaction_id, product_col | pandas | product |
 | CohortAnalysis | customer_id, transaction_date, aggregation_column | pandas, Ibis | cohort |
 | PurchasesPerCustomer | customer_id, transaction_id | pandas, Ibis | customer |
 | DaysBetweenPurchases | customer_id, transaction_date | pandas, Ibis | customer |
@@ -278,6 +281,8 @@ Notes on optional, behaviour-enhancing columns:
 - For `ProductAssociation`, `value_col` is the product identifier whose co-occurrence is measured.
 - `CrossShop` and `ProductAssociation` group by `group_col`, which defaults to `customer_id`; pass
 `group_col` (for example `transaction_id`) to analyse per basket or transaction instead.
+- `CrossShop` compares groups defined by `(group_n_col, group_n_val)` pairs; `group_2_col` and `group_3_col`
+default to `group_1_col`, so comparing values within one column needs only `group_1_col`.
 
 !!! note "pandas-only analyses"
 Most functions accept either backend, but `GainLoss` and `CustomerDecisionHierarchy` operate on pandas
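
The defaulting rule described in the new `CrossShop` note can be illustrated with a small, library-independent sketch. The helper `resolve_group_cols` and the column names `category` and `brand` are hypothetical and are not part of OpenRetailScience; the sketch only mirrors the rule stated in the note (an omitted `group_2_col` or `group_3_col` falls back to `group_1_col`) and says nothing about how the library implements it.

```python
from typing import Optional


def resolve_group_cols(
    group_1_col: str,
    group_2_col: Optional[str] = None,
    group_3_col: Optional[str] = None,
) -> tuple[str, str, str]:
    """Hypothetical helper: an omitted later column falls back to group_1_col."""
    return (
        group_1_col,
        group_2_col if group_2_col is not None else group_1_col,
        group_3_col if group_3_col is not None else group_1_col,
    )


# Comparing values within one column: only group_1_col is supplied.
same_column = resolve_group_cols("category")

# Comparing across columns: the second column must be named explicitly.
mixed_columns = resolve_group_cols("category", group_2_col="brand")

print(same_column)
print(mixed_columns)
```

Under this rule, listing only `group_1_col` as required in the matrix row is consistent with the single-column case, which is what the diff's change to the `CrossShop` row reflects.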
