Commit 8f733d4

docs: added hml seg example to analysis modules page
1 parent 9b5a92f commit 8f733d4

2 files changed

Lines changed: 1671 additions & 3 deletions

docs/analysis_modules.md

Lines changed: 42 additions & 3 deletions
@@ -461,16 +461,55 @@ rev_tree = revenue_tree.RevenueTree(

<div class="clear" markdown>

![HML Segmentation Distribution](assets/images/analysis_modules/hml_segmentation.svg){ align=right loading=lazy width="50%"}

Heavy, Medium, Light (HML) is a segmentation that places customers into groups based on their percentile rank by spend
or by number of products bought. Heavy customers are the top 20% of customers, Medium customers are the next 30%, and
Light customers are the bottom 50%. These cut-offs reflect the Pareto principle: purchase behavior often follows a Pareto
distribution, typified by the expression "20% of your customers generate 80% of your sales."
HML segmentation helps answer questions such as:

- How much more are your best customers worth?
- How much more could you spend to acquire your best customers?
- How concentrated are your sales among your top (Heavy) customers?

The module also handles customers with zero spend via the `zero_value_customers` parameter: they can be included with
the Light customers, excluded entirely, or placed in a separate "Zero" segment.

</div>
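
The percentile cut-offs and zero-spend options described above can be illustrated with plain pandas. The sketch below is not PyRetailScience's implementation: the function `hml_sketch` and its `zero_handling` values are hypothetical. It assumes segments are assigned on total spend per customer, that ties are broken by order of appearance, and that zero spenders included with Light customers still count toward the percentile ranks.

```python
import numpy as np
import pandas as pd


def hml_sketch(
    df: pd.DataFrame,
    value_col: str = "unit_spend",
    zero_handling: str = "include_with_light",
) -> pd.DataFrame:
    """Hypothetical HML sketch: Heavy (top 20%), Medium (next 30%), Light (bottom 50%)."""
    # Aggregate transactions to one total per customer
    totals = df.groupby("customer_id")[[value_col]].sum()
    is_zero = totals[value_col] == 0

    if zero_handling == "include_with_light":
        ranked = totals.copy()
    elif zero_handling in ("exclude", "separate_segment"):
        ranked = totals[~is_zero].copy()
    else:
        raise ValueError(f"Unknown zero_handling: {zero_handling!r}")

    # Percentile rank in (0, 1]: the biggest spender gets 1.0; ties broken by order of appearance
    pct = ranked[value_col].rank(method="first", pct=True)
    ranked["segment_name"] = np.select([pct > 0.8, pct > 0.5], ["Heavy", "Medium"], default="Light")

    if zero_handling == "include_with_light":
        # Assumption: zero spenders count toward the percentiles but are always labelled Light
        ranked.loc[is_zero, "segment_name"] = "Light"
    elif zero_handling == "separate_segment":
        # Zero spenders were left out of the ranking; add them back as their own segment
        zeros = totals[is_zero].copy()
        zeros["segment_name"] = "Zero"
        ranked = pd.concat([ranked, zeros])

    return ranked


# Usage: ten customers spending 10..100, plus two customers with zero spend
demo = pd.DataFrame(
    {
        "customer_id": list(range(1, 13)),
        "unit_spend": [10 * i for i in range(1, 11)] + [0, 0],
    }
)
print(hml_sketch(demo, zero_handling="separate_segment")["segment_name"].value_counts())
```

On the demo data, `separate_segment` yields 2 Heavy, 3 Medium, 5 Light and 2 Zero customers. Ranking with `pct=True` keeps the thresholds in percentile terms, so the same 0.8 and 0.5 cut-offs apply regardless of how many customers there are.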

Example:

```python
import numpy as np
import pandas as pd

from pyretailscience.plots import bar
from pyretailscience.segmentation import HMLSegmentation

# Create sample transaction data
rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "customer_id": np.repeat(range(1, 51), 3),  # 50 customers with 3 transactions each
        "unit_spend": rng.pareto(a=1.5, size=150) * 20,  # Pareto distribution to mimic real spending
    },
)

# Create HML segmentation
seg = HMLSegmentation(df, zero_value_customers="include_with_light")

# Visualize spend by segment
bar.plot(
    seg.df.groupby("segment_name")["unit_spend"].sum(),
    value_col="unit_spend",
    source_text="Source: PyRetailScience",
    sort_order="descending",
    x_label="",
    y_label="Segment Spend",
    title="What's the value of a Heavy customer?",
    rot=0,
)
```
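
To answer the questions listed earlier (how much more Heavy customers are worth, and how concentrated sales are among them), a per-customer result can be summarized by segment. This minimal sketch uses a small made-up table in place of a real segmentation result; the `segment_name` and `unit_spend` columns mirror those used in the example, while the `sales_share` and `value_vs_light` names are hypothetical.

```python
import pandas as pd

# Made-up per-customer result: total spend and HML label for each of 10 customers
customers = pd.DataFrame(
    {
        "unit_spend": [500.0, 300.0, 80.0, 60.0, 40.0, 10.0, 5.0, 3.0, 1.0, 1.0],
        "segment_name": ["Heavy", "Heavy", "Medium", "Medium", "Medium",
                         "Light", "Light", "Light", "Light", "Light"],
    },
    index=pd.Index(range(1, 11), name="customer_id"),
)

# Total spend, average spend per customer and customer count for each segment
summary = customers.groupby("segment_name")["unit_spend"].agg(["sum", "mean", "count"])
# Concentration: each segment's share of total sales
summary["sales_share"] = summary["sum"] / summary["sum"].sum()
# Worth: average spend per customer relative to an average Light customer
summary["value_vs_light"] = summary["mean"] / summary.loc["Light", "mean"]
print(summary.round(2))
```

With these toy numbers, Heavy customers hold 80% of sales and average 100 times the spend of a Light customer. The `seg.df` attribute in the example appears to carry the same two columns, so a similar aggregation should apply to it, assuming it holds one row per customer.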

### Threshold Segmentation
