cross-shop #112
Walkthrough
The changes update the
Changes
Sequence Diagram(s)
sequenceDiagram
actor Client as "Caller"
participant CS as "CrossShop"
participant Converter as "Table Converter"
participant Aggregator as "Aggregator"
Client->>CS: Instantiate with DataFrame/ibis.Table and group parameters
CS->>Converter: Check input type (pd.DataFrame?)
alt Input is Pandas DataFrame
Converter-->>CS: Convert to ibis.Table
else Already an ibis.Table
CS-->>CS: Proceed without conversion
end
CS->>Aggregator: Apply group filters using column names and values
Aggregator-->>CS: Return aggregated result
CS-->>Client: Return final DataFrame
Codecov Report
Attention: Patch coverage is
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/test_cross_shop.py (1)
18-31: Consider adding a test for 'Hats' or removing it if not needed.
The newly introduced "Hats" category is unused in the group definitions for test cases. Including it without any corresponding test scenario can create confusion or leave potential coverage gaps.
pyretailscience/cross_shop.py (1)
20-26: Constructor signature successfully abstracts groups via column-value pairs.
This approach is clean and flexible. In the future, if you need additional groups beyond three, consider generalizing to a list of (column, value) pairs.
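As an illustration of the generalization suggested above, here is a minimal pandas sketch. It is not the library's implementation: the helper name `group_flags`, the sample data, and the "Jeans" and "Shoes" values are hypothetical, and only `category_1_name` and "Hats" appear in this PR.

```python
import pandas as pd


def group_flags(df, groups):
    """Return one 0/1 indicator column per (column, value) pair.

    `groups` is a list of (column, value) tuples of any length, so the
    caller is not limited to three groups. (Hypothetical helper.)
    """
    flags = pd.DataFrame(index=df.index)
    for i, (col, val) in enumerate(groups, start=1):
        # Exact equality match, cast to a compact integer indicator.
        flags[f"group_{i}"] = (df[col] == val).astype("int32")
    return flags


# Made-up sample data for illustration only.
df = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3],
        "category_1_name": ["Jeans", "Shoes", "Shoes", "Hats"],
    }
)

flags = group_flags(
    df,
    [
        ("category_1_name", "Jeans"),
        ("category_1_name", "Shoes"),
        ("category_1_name", "Hats"),
    ],
)
```

Adding a fourth group under this design would only mean appending another pair to the list, with no change to the function signature.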
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pyretailscience/cross_shop.py (6 hunks)
tests/test_cross_shop.py (6 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pyretailscience/cross_shop.py
[warning] 58-58: pyretailscience/cross_shop.py#L58
Added line #L58 was not covered by tests
🔇 Additional comments (9)
tests/test_cross_shop.py (5)
41-44: Usage of the new group-based parameters looks good.
This relies on the new approach for defining groups by column-value pairs and is consistent with the refactored CrossShop logic.
63-68: Three-group setup is correctly aligned with the refactored CrossShop class.
The updated parameters for group_3 match the new constructor signature and help ensure thorough multi-group coverage.
99-104: Parameter usage for group-based filtering is correct.
No issues found with the logic for specifying group columns and values, and it properly tests the `customer_id` aggregation scenario.
138-143: Parameters for the three-group scenario remain consistent.
The new column-value definitions match the refactoring in the CrossShop class, ensuring relevant coverage of multi-group logic.
176-181: Refactored approach for three groups plus aggregation continues to look good.
These lines closely follow the new pattern of passing group column-value pairs and appear correct.
pyretailscience/cross_shop.py (4)
3-3: Ensure ibis is installed and properly managed.
Importing `ibis` directly here can lead to import errors if the environment is missing the library. Verify that `ibis` is included in the deployment or environment specifications.
34-40: Updated docstrings accurately reflect the changes.
They provide clear guidance on how to supply groups via column-value pairs.
58-58: Line is not covered by tests.
Coverage tools indicate this line remains untested. Consider adding a scenario where no `group_3_col` is provided to ensure complete coverage.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 58-58: pyretailscience/cross_shop.py#L58
Added line #L58 was not covered by tests
83-115: Conversion logic and group membership checks appear correct.
Casting the expression to int64 for group indicators is straightforward and aligns with the ibis-based flow. Ensure that matching exact values (rather than partial matches) is the intended behavior.
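To make the exact-versus-partial distinction concrete, here is a small pandas sketch. The PR itself builds the comparison in ibis, and the sample values below are made up, so treat this only as an illustration of the behavioral difference.

```python
import pandas as pd

# Hypothetical category values; not taken from the PR's test data.
categories = pd.Series(["Shoes", "Running Shoes", "shoes", "Hats"])

# Exact match: only rows whose value equals "Shoes" character for character.
exact = categories == "Shoes"

# Partial match: any row containing the substring "Shoes" (case-sensitive).
partial = categories.str.contains("Shoes", regex=False)

print(exact.tolist())    # [True, False, False, False]
print(partial.tolist())  # [True, True, False, False]
```

With exact matching, "Running Shoes" and the lower-case "shoes" fall outside the group, which is worth confirming as the intended behavior.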
eb6a16b to 3d6fed1
3d6fed1 to 3b26d38
Actionable comments posted: 0
🧹 Nitpick comments (2)
pyretailscience/cross_shop.py (2)
134-142: Complex Ibis query could benefit from additional comments.
While the Ibis query is functionally correct, the multiple operations (select, group_by, aggregate, order_by) could benefit from a brief comment explaining the overall data transformation being performed.
Consider adding a comment before this block explaining the purpose of these operations in layman's terms.
144-144: Consider simplifying the tuple conversion.
Relying on `apply(lambda x: tuple(x))` works, but the lambda is redundant. Consider passing Python's built-in `tuple` directly for clarity.
-cs_df["groups"] = cs_df[group_cols].apply(lambda x: tuple(x), axis=1)
+cs_df["groups"] = cs_df[group_cols].apply(tuple, axis=1)
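The following sketch, using made-up indicator data, checks that the two spellings in the diff produce the same tuples. It also shows `itertuples` as one alternative that avoids a row-wise `apply`; that third variant is not part of the suggestion and is included only for comparison.

```python
import pandas as pd

# Hypothetical group indicator columns, named after the pattern in the diff.
cs_df = pd.DataFrame({"group_1": [1, 0, 1], "group_2": [0, 1, 1]})
group_cols = ["group_1", "group_2"]

# Original form: lambda wrapping the built-in.
via_lambda = cs_df[group_cols].apply(lambda x: tuple(x), axis=1)

# Suggested form: pass the built-in directly.
via_builtin = cs_df[group_cols].apply(tuple, axis=1)

# Alternative without row-wise apply: plain tuples from itertuples.
via_itertuples = pd.Series(
    list(cs_df[group_cols].itertuples(index=False, name=None)),
    index=cs_df.index,
)

print(via_builtin.tolist())  # [(1, 0), (0, 1), (1, 1)]
```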
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pyretailscience/cross_shop.py (5 hunks)
tests/test_cross_shop.py (7 hunks)
🔇 Additional comments (9)
tests/test_cross_shop.py (4)
18-31: Good data structure change to categorical values.
The refactoring from boolean indexes to a categorical column `category_1_name` with string values makes the test data more realistic and aligns with the new implementation approach.
92-92: Consider the implications of disabling dtype checking.
Using `pd.testing.assert_frame_equal` with `check_dtype=False` prevents catching type-related issues. While this allows the test to pass despite minor type differences, it might hide potential bugs related to data types in production.
Consider if explicit type casting (like you did in line 130) would be more appropriate here to ensure consistent data types.
130-130: Good explicit type casting for expected results.
Explicitly setting the data types for group columns to `int32` aligns with the implementation and helps reduce memory usage with large datasets.
200-220: Good validation test for incomplete group 3 parameters.
This test properly validates that an error is raised when only one of `group_3_col` or `group_3_val` is provided, ensuring API consistency.
pyretailscience/cross_shop.py (5)
3-4: Import of Ibis aligns with refactoring goal.
Adding the Ibis import supports the PR objective of refactoring cross-shop code to utilize Ibis.
19-28: Good API design with explicit column-value pairs.
Refactoring from boolean indices to column-value pairs improves the API's usability and clarity. Using explicit column names and values makes the code more self-documenting and allows for more flexible filtering.
117-118: Good validation for incomplete group 3 parameters.
This validation ensures API consistency by requiring that if one of `group_3_col` or `group_3_val` is provided, the other must be provided as well.
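A both-or-neither check of this kind could look like the sketch below. The function name `validate_group_3`, the `ValueError` type, and the message text are assumptions made for illustration; the PR's actual code at lines 117-118 is not shown on this page.

```python
def validate_group_3(group_3_col=None, group_3_val=None):
    """Return True if a third group is configured, False if it is absent.

    Raises ValueError when exactly one of the two parameters is given.
    (Hypothetical helper; the exception type is an assumption.)
    """
    # The two "is None" results differ only when exactly one value is missing.
    if (group_3_col is None) != (group_3_val is None):
        raise ValueError(
            "group_3_col and group_3_val must be provided together or not at all"
        )
    return group_3_col is not None


def raises_value_error(**kwargs):
    """Small test helper: True if validate_group_3 rejects the arguments."""
    try:
        validate_group_3(**kwargs)
    except ValueError:
        return True
    return False
```

Comparing against `None` rather than relying on truthiness keeps values such as `0` or an empty string usable as a legitimate `group_3_val`.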
120-122: Good use of temporary column with explanatory comment.
The comment explaining why a temporary value column is needed provides valuable context for future developers and addresses the previous review comment.
124-126: Consistent use of int32 casting for group indicators.
Casting boolean results to `int32` follows best practices for reducing memory usage with large datasets.
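For a sense of scale, this pandas sketch compares the storage of the same indicator values at different widths. The data is made up, and the PR performs the cast in ibis rather than pandas, so this only illustrates the per-element sizes.

```python
import pandas as pd

# Hypothetical membership flags for four rows.
flags = pd.Series([True, False, True, True])

as_int32 = flags.astype("int32")
as_int64 = flags.astype("int64")

# memory_usage(index=False) reports the bytes used by the values alone.
bool_bytes = flags.memory_usage(index=False)      # 1 byte per element
int32_bytes = as_int32.memory_usage(index=False)  # 4 bytes per element
int64_bytes = as_int64.memory_usage(index=False)  # 8 bytes per element

print(bool_bytes, int32_bytes, int64_bytes)  # 4 16 32
```

The saving from `int32` is therefore relative to `int64`: half the bytes per indicator, while still allowing the flags to be summed or aggregated as integers.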
LGTM!
feat: refactor cross-shop code to ibis
Summary by CodeRabbit
New Features
Refactor
Tests