@@ -461,16 +461,55 @@ rev_tree = revenue_tree.RevenueTree(
461461
462462<div class="clear" markdown>
463463
464- ![ Image title ] ( https://placehold.co/600x400/EEE/31343C ) { align=right loading=lazy width="50%"}
464+ ![HML Segmentation Distribution](assets/images/analysis_modules/hml_segmentation.svg){ align=right loading=lazy width="50%"}
465465
466- PASTE TEXT HERE
466+ Heavy, Medium, Light (HML) segmentation groups customers by their percentile of spend or of the number of products
467+ they bought. Heavy customers are the top 20%, Medium the next 30%, and Light the bottom 50%. These cutoffs follow the
468+ proportions of the Pareto distribution, which purchase behavior often resembles, as typified by the expression
469+ "20% of your customers generate 80% of your sales."
470+ HML segmentation helps answer questions such as:
471+
472+ - How much more are your best customers worth?
473+ - How much more could you spend acquiring your best customers?
474+ - What is the concentration of sales with your top (heavy) customers?
475+
476+ The module also handles customers with zero spend, with options to include them with light customers, exclude them
477+ entirely, or place them in a separate "Zero" segment.
467478
468479</div>
469480
470481Example:
471482
472483``` python
473- PASTE CODE HERE
484+ import numpy as np
485+ import pandas as pd
486+
487+ from pyretailscience.plots import bar
488+ from pyretailscience.segmentation import HMLSegmentation
489+
490+ # Create sample transaction data
491+ rng = np.random.default_rng(42)
492+ df = pd.DataFrame(
493+     {
494+         "customer_id": np.repeat(range(1, 51), 3),  # 50 customers with 3 transactions each
495+         "unit_spend": rng.pareto(a=1.5, size=150) * 20,  # Pareto distribution to mimic real spending
496+     },
497+ )
498+
499+ # Create HML segmentation
500+ seg = HMLSegmentation(df, zero_value_customers="include_with_light")
501+
502+ # Visualize spend by segment
503+ bar.plot(
504+     seg.df.groupby("segment_name")["unit_spend"].sum(),
505+     value_col="unit_spend",
506+     source_text="Source: PyRetailScience",
507+     sort_order="descending",
508+     x_label="",
509+     y_label="Segment Spend",
510+     title="What's the value of a Heavy customer?",
511+     rot=0,
512+ )
474513```
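
To make the 20/30/50 split concrete, the following is a minimal sketch of how the cutoffs could be computed with plain pandas. It is not the `HMLSegmentation` implementation: the helper `assign_hml` is hypothetical, it uses `pd.qcut` as one reasonable way to bin by percentile, and it assumes there are no zero-spend customers, so the zero-handling options are left out.

```python
import numpy as np
import pandas as pd


def assign_hml(spend: pd.Series) -> pd.Series:
    """Label each customer Light, Medium, or Heavy from total spend.

    Hypothetical helper for illustration only; not part of PyRetailScience.
    """
    # Quantile edges at 50% and 80% give bottom 50% / next 30% / top 20%
    labels = pd.qcut(
        spend,
        q=[0, 0.5, 0.8, 1.0],
        labels=["Light", "Medium", "Heavy"],
    )
    return labels.rename("segment_name")


# Same style of sample data as the example above
rng = np.random.default_rng(42)
transactions = pd.DataFrame(
    {
        "customer_id": np.repeat(range(1, 51), 3),
        "unit_spend": rng.pareto(a=1.5, size=150) * 20,
    },
)

# Segment on total spend per customer, not on individual transactions
customer_spend = transactions.groupby("customer_id")["unit_spend"].sum()
segments = assign_hml(customer_spend)

# Number of customers per segment, and each segment's share of total spend
segment_counts = segments.value_counts()
spend_share = customer_spend.groupby(segments, observed=True).sum() / customer_spend.sum()
print(segment_counts)
print(spend_share.round(3))
```

With 50 customers and no tied spend values, this yields 25 Light, 15 Medium, and 10 Heavy customers. The `spend_share` series speaks to the "concentration of sales" question directly: it shows what fraction of total spend each segment contributes.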
475514
476515### Threshold Segmentation