Convert the get_indexes feature to use Ibis #92

mvanwyk · 2025-02-10T19:33:56Z

See here for a description of Index Plots

The get_indexes function calculates index values and is typically used through the index plot rather than called directly. It currently calculates indexes with Pandas, which is slow and requires the data to be loaded into memory. By using Ibis, we can push these calculations to the database instead.

The index plot works by comparing a subgroup against the total group. For instance, if we split customers into Heavy, Medium and Light segments, we might ask: what does the Light group buy more (or less) of than customers overall? The typical way to answer this is to compare the Light group's % of spend on a category (e.g. Music) with the % of spend on that category across all customers. For instance, the Light group might put 10% of its spend into the Music category versus an average of 5% across all customers. To get the index, we take (10% / 5%) * 100 and get an index of 200. Typically, an index >= 120 is considered significantly overindexed, and an index <= 80 is considered significantly underindexed.
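To make the arithmetic concrete, here is a minimal plain-Python sketch of the calculation described above. The helper names (category_indexes, classify) and the spend figures are made up for illustration and are not part of pyretailscience.

```python
def category_indexes(subgroup_spend, total_spend):
    """Index each category's share of subgroup spend against its share of total spend."""
    subgroup_total = sum(subgroup_spend.values())
    overall_total = sum(total_spend.values())
    return {
        category: (subgroup_spend.get(category, 0.0) / subgroup_total)
        / (spend / overall_total)
        * 100
        for category, spend in total_spend.items()
    }


def classify(index):
    """Apply the rule-of-thumb thresholds: >= 120 over, <= 80 under."""
    if index >= 120:
        return "overindexed"
    if index <= 80:
        return "underindexed"
    return "in line"


# Made-up spend figures: Light customers vs. all customers.
# Light spends 10 of 100 (10%) on Music; all customers spend 50 of 1000 (5%).
light_spend = {"Music": 10.0, "Books": 90.0}
all_spend = {"Music": 50.0, "Books": 950.0}

indexes = category_indexes(light_spend, all_spend)
for category, index in indexes.items():
    print(f"{category}: {index:.0f} ({classify(index)})")
```

With these figures, Music reproduces the 10% versus 5% example and lands at an index of 200.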

Presently, to identify the "Light" segment, you pass in a Pandas boolean Series, aligned on the DataFrame index, that marks the rows where a customer has been segmented as "Light" (see df_index_filter=df["segment_name"] == "Light" in the code example below). This won't work with Ibis, because Ibis works with database table-like objects that have no concept of an index, so we will have to change it.

from pyretailscience.standard_graphs import index_plot

index_plot(
    df,
    df_index_filter=df["segment_name"] == "Light",
    value_col="unit_price",
    group_col="category_0_name",
)

My thinking is to split it into two parameters, index_col and value_to_index, e.g. as below. Let me know if you think the naming is confusing.

from pyretailscience.standard_graphs import index_plot

index_plot(
    df,
    index_col="segment_name",
    value_to_index="Light",
    value_col="unit_price",
    group_col="category_0_name",
)

I think the rest should be relatively straightforward.

Notes

The user should be able to pass in a data frame or an Ibis table. If they pass in a data frame, convert it to an Ibis table via ibis.memtable(df).
If necessary, extend the unit tests to handle any edge cases that are not currently covered
Please update the index plots section of analysis_modules.md with the new version of the code
Update the index_plot function to make it compatible with the updated get_indexes function

mvanwyk assigned mayurkmmt Feb 10, 2025

coderabbitai bot mentioned this issue Feb 18, 2025

refactor with ibis #95

Merged

mvanwyk closed this as completed Feb 24, 2025
