See here for a description of Index Plots
The get_indexes function calculates index values. It is typically used through the index plot rather than called directly. It currently calculates indexes with Pandas, which is slow and requires the data to be loaded into memory. By using Ibis, we can push these calculations down to the database instead.
The index plot works by comparing a subgroup with the total group. For instance, if we broke customers into Heavy, Medium and Light segments, we might ask: what does the Light group buy more (or less) of than the other groups? The typical way to answer this is to compare the Light group's % of spend on a category (e.g. Music) with the % of spend on that category across all customers. For instance, the Light group might spend 10% of their spend on the Music category versus an average of 5% across all customers. To get the index, we take (10% / 5%) * 100, giving an index of 200. Typically, an index >= 120 is considered significantly overindexed, and an index <= 80 significantly underindexed.
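As a minimal sketch of the calculation described above, the snippet below computes a per-category index for one segment from plain (segment_name, category, spend) tuples. The data and the helper names `spend_shares`, `category_index` and `classify_index` are hypothetical, chosen only to reproduce the 10% vs 5% example; the 120/80 thresholds are the rule of thumb from the text.

```python
from collections import defaultdict

# Hypothetical spend records: (segment_name, category, spend).
# Light spends 10 of 100 on Music (10%); all customers spend 25 of 500 (5%).
ROWS = [
    ("Light", "Music", 10.0), ("Light", "Books", 90.0),
    ("Medium", "Music", 5.0), ("Medium", "Books", 95.0),
    ("Heavy", "Music", 10.0), ("Heavy", "Books", 290.0),
]


def spend_shares(rows):
    """Return each category's share of total spend for the given rows."""
    by_category = defaultdict(float)
    for _, category, spend in rows:
        by_category[category] += spend
    total = sum(by_category.values())
    return {category: value / total for category, value in by_category.items()}


def category_index(rows, segment):
    """Index = (segment's share of spend / all customers' share) * 100."""
    segment_shares = spend_shares([r for r in rows if r[0] == segment])
    overall_shares = spend_shares(rows)
    # Categories the segment never bought get a share of 0, i.e. index 0.
    return {
        category: segment_shares.get(category, 0.0) / share * 100
        for category, share in overall_shares.items()
    }


def classify_index(value, over=120, under=80):
    """Label an index using the >= 120 / <= 80 rule of thumb."""
    if value >= over:
        return "overindexed"
    if value <= under:
        return "underindexed"
    return "in line"


indexes = category_index(ROWS, "Light")
for category, value in indexes.items():
    print(f"{category}: {value:.1f} ({classify_index(value)})")
```

Running it prints an index of 200.0 for Music (overindexed) and 94.7 for Books (in line), matching the worked example.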
Presently, to identify the "Light" segment, you pass in a filter on the Pandas index (a boolean Series marking the rows where a customer has been segmented as "Light"), as in the code example below:

df_index_filter=df["segment_name"] == "Light"

This won't work with Ibis: Ibis works with database table-like objects, which don't have the concept of an index, so we will have to change it.

My thinking is to split it into two parameters, index_col and value_to_index, e.g. the below. Let me know if you think the naming is confusing.

I think the rest should be relatively straightforward.
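To illustrate the proposed index_col / value_to_index split, here is a minimal sketch over plain dict records. `get_indexes_by_value`, `group_col` and `value_col` are hypothetical names for illustration only, not an existing function or signature. The point is that the subgroup is selected by a column name plus a value, so no row index is needed; in Ibis the same selection could be written as `table.filter(table[index_col] == value_to_index)`, which the backend runs as a WHERE clause.

```python
from collections import defaultdict


def _shares(rows, group_col, value_col):
    """Share of the total value_col contributed by each group_col value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    grand_total = sum(totals.values())
    return {key: value / grand_total for key, value in totals.items()}


def get_indexes_by_value(rows, index_col, value_to_index, group_col, value_col):
    """Hypothetical get_indexes variant taking index_col/value_to_index.

    The subgroup is every row whose index_col equals value_to_index, so the
    caller passes a column name and a value instead of a Pandas index filter.
    """
    subgroup = [row for row in rows if row[index_col] == value_to_index]
    if not subgroup:
        raise ValueError(f"no rows where {index_col} == {value_to_index!r}")
    sub_shares = _shares(subgroup, group_col, value_col)
    all_shares = _shares(rows, group_col, value_col)
    return {
        key: sub_shares.get(key, 0.0) / share * 100
        for key, share in all_shares.items()
    }


rows = [
    {"segment_name": "Light", "category": "Music", "spend": 10.0},
    {"segment_name": "Light", "category": "Books", "spend": 90.0},
    {"segment_name": "Heavy", "category": "Music", "spend": 15.0},
    {"segment_name": "Heavy", "category": "Books", "spend": 385.0},
]

# Selects the same rows as the Pandas filter df["segment_name"] == "Light"
result = get_indexes_by_value(
    rows,
    index_col="segment_name",
    value_to_index="Light",
    group_col="category",
    value_col="spend",
)
print(result)
```

Because the filter is just "column equals value", it maps directly onto a database predicate, whereas a Pandas index filter only makes sense for data already loaded into memory.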
Notes
ibis.memtable(df)