Description
See here for a description of Index Plots
The get_indexes function calculates index values. It is typically used through the index plot rather than called directly. The function currently calculates indexes with Pandas, which is slow and requires the data to be loaded into memory. By using Ibis, we can push these calculations to the database.
The index plot works by comparing a subgroup against the total group. For instance, if we broke customers into Heavy, Medium and Light segments, we might ask: what does the Light group buy more (or less) of than the other groups? The typical way to answer this is to compare the Light group's % of spend on a category (e.g. Music) with the % of spend on that category for all customers. For instance, the Light group might spend 10% of their spend on the Music category versus an average across all customers of 5%. To get the index, we then take (10% / 5%) * 100 and get an index of 200. Typically, an index >= 120 is considered significantly overindexed, and an index <= 80 is considered significantly underindexed.
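As a quick check of the arithmetic above, here is a minimal sketch in plain Python. The helper name index_value and the spend figures are made up for illustration; they are not part of pyretailscience.

```python
def index_value(subgroup_share: float, overall_share: float) -> float:
    """Return the subgroup's share relative to the overall share (100 = parity)."""
    if overall_share == 0:
        raise ValueError("overall_share must be non-zero")
    return subgroup_share / overall_share * 100


# Illustrative figures: the Light segment spends 10 of 100 on Music,
# while all customers together spend 50 of 1000 on Music.
light_music_share = 10 / 100  # 10%
all_music_share = 50 / 1000   # 5%

music_index = index_value(light_music_share, all_music_share)
print(round(music_index, 1))  # 200.0
```

An index of 200 is above the 120 threshold, so Music would count as overindexed for the Light segment in this example.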
Presently, to identify the "Light" segment, you pass in a Pandas boolean filter selecting the rows where a customer has been segmented as "Light" (see df_index_filter=df["segment_name"] == "Light" in the code example below). This won't work with Ibis, because Ibis works with database table-like objects that have no concept of an index, so we will have to change it.
from pyretailscience.standard_graphs import index_plot

index_plot(
    df,
    df_index_filter=df["segment_name"] == "Light",
    value_col="unit_price",
    group_col="category_0_name",
)
My thinking is to split it into two parameters, index_col and value_to_index, e.g. as below. Let me know if you think the naming is confusing.
from pyretailscience.standard_graphs import index_plot

index_plot(
    df,
    index_col="segment_name",
    value_to_index="Light",
    value_col="unit_price",
    group_col="category_0_name",
)
I think the rest should be relatively straightforward.
Notes
- The user should be able to pass in a data frame or an Ibis table. If they pass in a data frame, then convert it to an Ibis table via ibis.memtable(df)
- If necessary, extend the unit tests to handle any edge cases that are not currently covered
- Please update the index plots section of analysis_modules.md with the new version of the code
- Update the index_plot function to make it compatible with the updated get_indexes function