Skip to content

Add an RFM segmentation #135

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
7 tasks
mvanwyk opened this issue Mar 13, 2025 · 0 comments
Closed
7 tasks

Add an RFM segmentation #135

mvanwyk opened this issue Mar 13, 2025 · 0 comments
Assignees

Comments

@mvanwyk
Copy link
Contributor

mvanwyk commented Mar 13, 2025

Implement Ibis query to calculate customer RFM (Recency, Frequency, Monetary) segmentation scores using a simple method.

SQL Code to implement in Ibis

WITH customer_metrics AS (
  SELECT
    customer_id,
    DATEDIFF(DAY, MAX(transaction_date), CURRENT_DATE) AS recency_days,
    COUNT(DISTINCT transaction_id) AS frequency,
    SUM(unit_spend) AS monetary
  FROM my_table
  GROUP BY customer_id
),

rfm_scores AS (
  SELECT
    customer_id,
    NTILE(10) OVER (ORDER BY recency_days DESC) AS r_score,
    NTILE(10) OVER (ORDER BY frequency DESC) AS f_score,
    NTILE(10) OVER (ORDER BY monetary DESC) AS m_score
  FROM customer_metrics
)

SELECT
  customer_id,
  r_score,
  f_score,
  m_score,
  (r_score * 100) + (f_score * 10) + m_score AS rfm_segment
FROM rfm_scores
ORDER BY rfm_segment DESC;
  • The function should allow the user to specify the "current date". ie pass a python date to the Ibis, rather than using the CURRENT_DATE provide by the database.
  • Use the column helper (or get_option) for column naming
  • Takes in a pandas dataframe or Ibis table
  • Add unit tests
  • Place the code in segmentation.py for now
  • Add an example in the analysis_modules.md like the HML Segmentation
  • Use the following description in the function docstring
    Calculates customer RFM segments based on transaction history.
    
    Segments customers using the RFM (Recency, Frequency, Monetary) methodology.
    Each dimension is scored 1-10 (10=best) and combined into a 3-digit segment.
    Higher segment values indicate more valuable customers.
    
    Args:
        { Replace with whatever args you add }
        
    Returns:
        DataFrame with columns:
            - customer_id: Unique customer identifier
            - r_score: Recency score (1-10)
            - f_score: Frequency score (1-10)
            - m_score: Monetary score (1-10)
            - rfm_segment: Combined segment score (range: 111-999)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants