cross-shop #112

Merged
merged 1 commit into from
Feb 27, 2025
Conversation

mayurkmmt
Collaborator

@mayurkmmt mayurkmmt commented Feb 26, 2025

feat: refactor cross-shop code to ibis

Summary by CodeRabbit

  • New Features

    • Enhanced data grouping now allows filters based on descriptive category values, providing a more intuitive grouping experience.
  • Refactor

    • Streamlined data aggregation logic for clearer and more flexible processing.
  • Tests

    • Updated test cases to align with the new grouping approach, ensuring robust validation of the improved functionality.
    • Removed tests for overlapping group indices, reflecting the shift to a categorical grouping system.

coderabbitai bot commented Feb 26, 2025

Walkthrough

The changes update the CrossShop class to accept either a Pandas DataFrame or an ibis.Table and replace boolean index parameters with explicit column names and values for group definitions. The _calc_cross_shop method now converts a Pandas DataFrame to an ibis.Table when needed and performs group filtering and aggregation using the provided column-value pairs. Test cases have been refactored accordingly: sample data now includes a categorical column, and overlapping group error tests have been removed.

Changes

File(s) | Change Summary
pyretailscience/cross_shop.py | Updated the CrossShop class: the constructor now accepts a DataFrame or an ibis.Table along with group parameters defined by column names and values. The _calc_cross_shop method has been modified to convert Pandas DataFrames to an ibis.Table, apply group filters by comparing column values, and adjust aggregation operations using ibis functions.
tests/test_cross_shop.py | Refactored tests for _calc_cross_shop by replacing boolean index columns with a categorical column (e.g., category_1_name). Removed tests for overlapping group indices and updated test cases to align with the new parameter signatures.

Sequence Diagram(s)

sequenceDiagram
    actor Client as "Caller"
    participant CS as "CrossShop"
    participant Converter as "Table Converter"
    participant Aggregator as "Aggregator"
    
    Client->>CS: Instantiate with DataFrame/ibis.Table and group parameters
    CS->>Converter: Check input type (pd.DataFrame?)
    alt Input is Pandas DataFrame
        Converter-->>CS: Convert to ibis.Table
    else Already an ibis.Table
        CS-->>CS: Proceed without conversion
    end
    CS->>Aggregator: Apply group filters using column names and values
    Aggregator-->>CS: Return aggregated result
    CS-->>Client: Return final DataFrame

Poem

I’m a rabbit in a code-filled glen,
Hopping over changes again and again.
Boolean trails replaced with clear, smart lines,
Column names and values now perfectly align.
In fields of data, I skip with glee,
Celebrating fresh logic as free as can be! 🐇🌼


codecov bot commented Feb 26, 2025

Codecov Report

Attention: Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
pyretailscience/cross_shop.py | 90.90% | 1 Missing and 1 partial ⚠️

Files with missing lines | Coverage Δ
pyretailscience/cross_shop.py | 46.66% <90.90%> (-0.53%) ⬇️

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
tests/test_cross_shop.py (1)

18-31: Consider adding a test for 'Hats' or removing it if not needed.
The newly introduced "Hats" category is unused in the group definitions for test cases. Including it without any corresponding test scenario can create confusion or leave potential coverage gaps.

pyretailscience/cross_shop.py (1)

20-26: Constructor signature successfully abstracts groups via column-value pairs.
This approach is clean and flexible. In the future, if you need additional groups beyond three, consider generalizing to a list of (column, value) pairs.
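One way the suggested generalization could look is sketched below in plain pandas. The function name `membership_flags`, the `groups` parameter, and the sample values are hypothetical; the PR itself keeps the fixed group_1/group_2/group_3 parameters.

```python
import pandas as pd


def membership_flags(df, groups, customer_col="customer_id"):
    """Build one 0/1 column per (column, value) pair, aggregated per customer.

    Hypothetical generalization; `groups` is a list of (column, value) tuples.
    """
    if len(groups) < 2:
        raise ValueError("At least two (column, value) pairs are required.")

    # One indicator column per pair, named group_1, group_2, ... in list order.
    flags = pd.DataFrame(
        {
            f"group_{i}": df[col].eq(val).astype("int32")
            for i, (col, val) in enumerate(groups, start=1)
        }
    )
    flags[customer_col] = df[customer_col].to_numpy()

    # A customer belongs to a group if any of their rows matched.
    return flags.groupby(customer_col).max().reset_index()


# Assumed sample data for illustration.
sales = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3, 3],
        "category_1_name": ["Jeans", "Shoes", "Jeans", "Hats", "Shoes"],
    }
)
result = membership_flags(
    sales,
    [
        ("category_1_name", "Jeans"),
        ("category_1_name", "Shoes"),
        ("category_1_name", "Hats"),
    ],
)
```

Accepting a list removes the hard limit of three groups, at the cost of a less self-describing call signature than named parameters.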

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ce2eec and eb6a16b.

📒 Files selected for processing (2)
  • pyretailscience/cross_shop.py (6 hunks)
  • tests/test_cross_shop.py (6 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pyretailscience/cross_shop.py

[warning] 58-58: pyretailscience/cross_shop.py#L58
Added line #L58 was not covered by tests

🔇 Additional comments (9)
tests/test_cross_shop.py (5)

41-44: Usage of the new group-based parameters looks good.
This relies on the new approach for defining groups by column-value pairs and is consistent with the refactored CrossShop logic.


63-68: Three-group setup is correctly aligned with the refactored CrossShop class.
The updated parameters for group_3 match the new constructor signature and help ensure thorough multi-group coverage.


99-104: Parameter usage for group-based filtering is correct.
No issues found with the logic for specifying group columns and values, and it properly tests the customer_id aggregation scenario.


138-143: Parameters for the three-group scenario remain consistent.
The new column-value definitions match the refactoring in the CrossShop class, ensuring relevant coverage of multi-group logic.


176-181: Refactored approach for three groups plus aggregation continues to look good.
These lines closely follow the new pattern of passing group column-value pairs and appear correct.

pyretailscience/cross_shop.py (4)

3-3: Ensure ibis is installed and properly managed.
Importing ibis directly here can lead to import errors if the environment is missing the library. Verify that ibis is included in the deployment or environment specifications.
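Declaring the dependency in the project's packaging metadata is the usual fix. As a complementary illustration, a missing module can also be detected up front with the standard library; `require_module` is a hypothetical helper, not part of pyretailscience.

```python
from importlib.util import find_spec


def require_module(module_name: str, install_hint: str) -> None:
    """Raise a descriptive ImportError if `module_name` cannot be found."""
    if find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is required but not installed. Try: {install_hint}"
        )


# "json" ships with Python, so this check passes silently.
require_module("json", "n/a (standard library)")
```

A call such as `require_module("ibis", "pip install ibis-framework")` before the import would turn a bare ModuleNotFoundError into a message that tells the user what to install.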


34-40: Updated docstrings accurately reflect the changes.
They provide clear guidance on how to supply groups via column-value pairs.


58-58: Line is not covered by tests.
Coverage tools indicate this line remains untested. Consider adding a scenario where no group_3_col is provided to ensure complete coverage.


83-115: Conversion logic and group membership checks appear correct.
Casting the expression to int64 for group indicators is straightforward and aligns with the ibis-based flow. Ensure that matching exact values (rather than partial matches) is the intended behavior.
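The difference between exact and partial matching can be seen in a small pandas sketch. The category strings here are made up for illustration.

```python
import pandas as pd

# Assumed category values, including near-misses of "Jeans".
categories = pd.Series(["Jeans", "Skinny Jeans", "jeans", "Shoes"])

# Exact equality: only the value that is identical to "Jeans" matches.
exact = categories.eq("Jeans")

# Substring match: also picks up "Skinny Jeans" (still case-sensitive).
partial = categories.str.contains("Jeans", regex=False)

print(exact.tolist())    # [True, False, False, False]
print(partial.tolist())  # [True, True, False, False]
```

With equality-based filters, values that differ in case or carry a prefix are excluded, so group values need to be passed exactly as they appear in the data.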

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
pyretailscience/cross_shop.py (2)

134-142: Complex Ibis query could benefit from additional comments.

While the Ibis query is functionally correct, the multiple operations (select, group_by, aggregate, order_by) could benefit from a brief comment explaining the overall data transformation being performed.

Consider adding a comment before this block explaining the purpose of these operations in layman's terms.


144-144: Consider simplifying the tuple conversion.

apply(lambda x: tuple(x), axis=1) works, but the lambda wrapper is redundant. Passing the built-in tuple directly is equivalent and slightly clearer.

-cs_df["groups"] = cs_df[group_cols].apply(lambda x: tuple(x), axis=1)
+cs_df["groups"] = cs_df[group_cols].apply(tuple, axis=1)
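A quick check that the two spellings agree, using made-up flag data; the `itertuples` variant is shown only as another option and is not something the review proposed.

```python
import pandas as pd

# Assumed stand-in for the aggregated frame and its group columns.
cs_df = pd.DataFrame({"group_1": [1, 0, 1], "group_2": [0, 1, 1]})
group_cols = ["group_1", "group_2"]

# Original spelling from the diff and the suggested simplification.
with_lambda = cs_df[group_cols].apply(lambda x: tuple(x), axis=1)
with_builtin = cs_df[group_cols].apply(tuple, axis=1)

# Alternative that avoids the row-wise apply altogether.
with_itertuples = list(cs_df[group_cols].itertuples(index=False, name=None))

print(with_builtin.tolist())  # [(1, 0), (0, 1), (1, 1)]
```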
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d6fed1 and 3b26d38.

📒 Files selected for processing (2)
  • pyretailscience/cross_shop.py (5 hunks)
  • tests/test_cross_shop.py (7 hunks)
🔇 Additional comments (9)
tests/test_cross_shop.py (4)

18-31: Good data structure change to categorical values.

The refactoring from boolean indexes to a categorical column category_1_name with string values makes the test data more realistic and aligns with the new implementation approach.


92-92: Consider the implications of disabling dtype checking.

Using pd.testing.assert_frame_equal with check_dtype=False prevents catching type-related issues. While this allows the test to pass despite minor type differences, it might hide potential bugs related to data types in production.

Consider if explicit type casting (like you did in line 130) would be more appropriate here to ensure consistent data types.
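The trade-off can be illustrated with a small pandas sketch. The frames are invented for the example and are not the fixtures from tests/test_cross_shop.py.

```python
import pandas as pd

# Expected values built as int64; the "actual" result carries int32 flags.
expected = pd.DataFrame({"customer_id": [1, 2], "group_1": [1, 0]}, dtype="int64")
actual = expected.astype({"group_1": "int32"})

# Lenient comparison: passes even though group_1 is int32 vs int64.
pd.testing.assert_frame_equal(actual, expected, check_dtype=False)

# Strict comparison: the dtype mismatch is reported as an AssertionError.
try:
    pd.testing.assert_frame_equal(actual, expected)
    strict_passed = True
except AssertionError:
    strict_passed = False

# Casting the expected frame explicitly keeps the strict check usable.
expected_cast = expected.astype({"group_1": "int32"})
pd.testing.assert_frame_equal(actual, expected_cast)
```

The explicit cast documents the intended dtype in the test itself, so an accidental change of dtype in the implementation would still fail the test.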


130-130: Good explicit type casting for expected results.

Explicitly setting the data types for group columns to int32 aligns with the implementation and helps reduce memory usage with large datasets.


200-220: Good validation test for incomplete group 3 parameters.

This test properly validates that an error is raised when only one of group_3_col or group_3_val is provided, ensuring API consistency.

pyretailscience/cross_shop.py (5)

3-4: Import of Ibis aligns with refactoring goal.

Adding the Ibis import supports the PR objective of refactoring cross-shop code to utilize Ibis.


19-28: Good API design with explicit column-value pairs.

Refactoring from boolean indices to column-value pairs improves the API's usability and clarity. Using explicit column names and values makes the code more self-documenting and allows for more flexible filtering.


117-118: Good validation for incomplete group 3 parameters.

This validation ensures API consistency by requiring that if one of group_3_col or group_3_val is provided, the other must be provided as well.
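A both-or-neither check of this kind can be sketched in a few lines. The helper name `resolve_group_3`, the error message, and the use of None as the "not provided" marker are assumptions; the actual code on lines 117-118 may differ.

```python
def resolve_group_3(group_3_col=None, group_3_val=None):
    """Return the (column, value) pair for group 3, or None if it is unused.

    Hypothetical helper: raises if only one of the two arguments is given.
    """
    # True when exactly one of the two is missing.
    if (group_3_col is None) != (group_3_val is None):
        raise ValueError("group_3_col and group_3_val must be provided together.")
    return None if group_3_col is None else (group_3_col, group_3_val)


print(resolve_group_3())                           # None
print(resolve_group_3("category_1_name", "Hats"))  # ('category_1_name', 'Hats')
```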


120-122: Good use of temporary column with explanatory comment.

The comment explaining why a temporary value column is needed provides valuable context for future developers and addresses the previous review comment.


124-126: Consistent use of int32 casting for group indicators.

Casting boolean results to int32 follows best practices for reducing memory usage with large datasets.

Copy link
Contributor

@mvanwyk mvanwyk left a comment

LGTM!

@mayurkmmt mayurkmmt merged commit fd3847f into main Feb 27, 2025
3 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Mar 11, 2025