refactor with ibis #95

mayurkmmt · 2025-02-13T10:20:40Z

User description

feat: refactor get_index function with ibis

PR Type

Enhancement, Tests

Description

Refactored get_indexes function to use Ibis for scalable computation.
Enhanced get_indexes to support both pandas DataFrame and Ibis Table.
Added robust error handling for invalid filters and aggregation functions.
Updated and expanded test cases for get_indexes with various scenarios.

Changes walkthrough 📝

Relevant files

Enhancement

index.py: Refactor `get_indexes` to use Ibis and enhance functionality
pyretailscience/plots/index.py (+45/-14)
- Refactored `get_indexes` to use Ibis for efficient computation.
- Added support for both pandas DataFrame and Ibis Table.
- Improved error handling for invalid filters and unsupported aggregation functions.
- Updated logic for calculating proportions and indices.

Tests

test_index.py: Update and expand test cases for `get_indexes`
tests/plots/test_index.py (+67/-85)
- Replaced outdated test cases with new ones for `get_indexes`.
- Added tests for invalid filters and unsupported aggregation functions.
- Included tests for various aggregation functions and offset handling.
- Simplified and clarified test data and assertions.

Summary by CodeRabbit

New Features
- Updated plotting functionality with new parameters for enhanced index calculation.
- Improved versatility of index retrieval by supporting both DataFrame and Ibis Table inputs.
Refactor
- Enhanced data processing for faster and more reliable visualizations, improving performance on larger datasets.
- Updated aggregation logic now delivers a more comprehensive view of grouped data.
Tests
- Expanded test coverage to ensure robust handling of various input conditions and edge cases, including validation for invalid aggregation functions and improved clarity in test cases.

coderabbitai · 2025-02-13T10:20:48Z

Walkthrough

This pull request updates the data processing logic by modifying the get_indexes function in the plotting module. The function now accepts either a pd.DataFrame or an ibis.Table and leverages Ibis for grouping and aggregation, returning a pd.DataFrame instead of a pd.Series. Corresponding adjustments have been made in the plot function. The test suite has been extended and refactored to cover new input formats, error handling for invalid filter configurations, and unsupported aggregation functions.

Changes

File(s)	Summary of Changes
`pyretailscience/…/index.py`	Updated `get_indexes` to accept `pd.DataFrame
`tests/…/test_index.py`	Renamed and restructured test functions to reflect new column names and structures; added tests for invalid filters, unsupported aggregation functions, and offset functionality.

Sequence Diagram(s)

sequenceDiagram
    participant C as Client
    participant P as Plot Function
    participant GI as get_indexes
    participant I as Ibis Engine

    C->>P: Call plot(data)
    P->>GI: Send data (DataFrame/ibis.Table) and parameters
    GI->>I: Execute group-by and aggregation via Ibis
    I-->>GI: Return aggregated results
    GI-->>P: Return pd.DataFrame with indexes
    P->>C: Render updated plot

Possibly related PRs

feat: convert seg stats to use Ibis #90: The changes in the main PR modify the get_indexes function to utilize Ibis for data manipulation, which is directly related to transitioning the SegTransactionStats class to use Ibis.
feat: changed threshold seg to use ibis #89: The modifications to the get_indexes function to utilize Ibis are closely related to refactoring logic to use Ibis in the segmentation.py module.
Convert the get_indexes feature to use Ibis #92: The main issue directly modifies the get_indexes function to utilize Ibis, aligning with the goal of converting get_indexes to use Ibis.

Suggested labels

documentation, Review effort [1-5]: 3

Poem

I hop through data fields with delight,
Dancing with Ibis in the coding light.
Old Pandas steps now gently fade,
New indexes and plots beautifully made.
A rabbit cheers for code well-played! 🐰

qodo-merge-pro · 2025-02-13T10:21:05Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Performance Impact
The conversion from pandas DataFrame to Ibis table using memtable could have performance implications for large datasets. Need to validate performance with production-scale data.

    if isinstance(df, pd.DataFrame):
        df = df.copy()
        df["_filter"] = df_index_filter
        table = ibis.memtable(df)
    else:
        table = df.mutate(_filter=ibis.literal(df_index_filter))

Error Handling
The lambda functions used for aggregation could potentially fail silently if the column types are incompatible with the requested operation. Additional type checking may be needed.

    agg_fn = lambda x: getattr(x, agg_func)()

Memory Usage
Multiple intermediate tables are created during the index calculation process which could lead to high memory usage with large datasets. Consider optimizing the number of intermediate operations.

    overall_agg = table.group_by(group_cols).aggregate(value=agg_fn(table[value_col]))

    if index_subgroup_col is None:
        overall_total = overall_agg.value.sum().execute()
        overall_props = overall_agg.mutate(proportion=overall_agg.value / overall_total)
    else:
        overall_total = overall_agg.group_by(index_subgroup_col).aggregate(total=lambda t: t.value.sum())
        overall_props = (
            overall_agg.join(overall_total, index_subgroup_col)
            .mutate(proportion=lambda t: t.value / t.total)
            .drop("total")
        )

    overall_props = overall_props.mutate(proportion_overall=overall_props.proportion).drop("proportion")

    subset_agg = table.filter(table._filter).group_by(group_cols).aggregate(value=agg_fn(table[value_col]))

    if index_subgroup_col is None:
        subset_total = subset_agg.value.sum().name("total")
        subset_props = subset_agg.mutate(proportion=subset_agg.value / subset_total)
    else:
        subset_total = subset_agg.group_by(index_subgroup_col).aggregate(total=lambda t: t.value.sum())
        subset_props = (
            subset_agg.join(subset_total, index_subgroup_col)
            .mutate(proportion=lambda t: t.value / t.total)
            .drop("total")
        )

    result = subset_props.join(overall_props, group_cols).mutate(
        index=lambda t: (t.proportion / t.proportion_overall * 100) - offset,
    )
    return result.execute()
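For readers following the index calculation quoted in the reviewer guide, here is a minimal plain-Python sketch of the same arithmetic: each group's share within the filtered subset, divided by its share overall, times 100, minus the offset. It is not the PR's Ibis implementation. The helper names `proportions` and `index_values`, the `segment` column, and the sample rows are all hypothetical, and only a sum aggregation with no subgroup column is assumed.

```python
from collections import defaultdict


def proportions(rows, group_key, value_key):
    """Return each group's share of the summed values (sum aggregation only)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    grand_total = sum(totals.values())
    return {group: total / grand_total for group, total in totals.items()}


def index_values(rows, index_filter, group_key, value_key, offset=0):
    """Index = subset share / overall share * 100, minus an optional offset."""
    if len(index_filter) != len(rows):
        raise ValueError("index_filter must have one entry per row")
    subset_rows = [row for row, keep in zip(rows, index_filter) if keep]
    overall = proportions(rows, group_key, value_key)
    subset = proportions(subset_rows, group_key, value_key)
    return {group: subset[group] / overall[group] * 100 - offset for group in subset}


# Hypothetical data: two segments buying from two categories.
rows = [
    {"segment": "X", "category": "A", "value": 30},
    {"segment": "X", "category": "B", "value": 10},
    {"segment": "Y", "category": "A", "value": 20},
    {"segment": "Y", "category": "B", "value": 40},
]
index_filter = [row["segment"] == "X" for row in rows]

result = index_values(rows, index_filter, "category", "value")
print(result)  # {'A': 150.0, 'B': 50.0}
```

Segment X spends 75% of its total on category A against 50% overall, hence an index of 150. Category B is 25% against 50%, hence 50. An empty subset would divide by zero in this sketch, which is the situation the later review comments about zero totals address.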

qodo-merge-pro · 2025-02-13T10:21:39Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue (Impact: High)
Suggestion: Validate filter length matches data

Add input validation to ensure the length of `df_index_filter` matches the number of rows in the input DataFrame/Table to prevent index misalignment issues.

pyretailscience/plots/index.py [268-273]

     if isinstance(df, pd.DataFrame):
+        if len(df_index_filter) != len(df):
+            raise ValueError("Length of df_index_filter must match number of rows in DataFrame")
         df = df.copy()
         df["_filter"] = df_index_filter
         table = ibis.memtable(df)
     else:
+        if len(df_index_filter) != df.count().execute():
+            raise ValueError("Length of df_index_filter must match number of rows in Table")
         table = df.mutate(_filter=ibis.literal(df_index_filter))

Suggestion importance [1-10]: 9
Why: This is a critical validation that prevents silent errors and data corruption by ensuring the filter array matches the data size. Missing this check could lead to incorrect index calculations or runtime errors.

Category: Possible issue (Impact: Medium)
Suggestion: Validate column existence before processing

Add validation to ensure value_col and index_col exist in the DataFrame/Table before processing to prevent runtime errors.

pyretailscience/plots/index.py [242-250]

 def get_indexes(
     df: pd.DataFrame | ibis.Table,
     df_index_filter: list[bool],
     index_col: str,
     value_col: str,
     index_subgroup_col: str | None = None,
     agg_func: str = "sum",
     offset: int = 0,
 ) -> pd.DataFrame:
+    columns = df.columns if isinstance(df, pd.DataFrame) else df.columns
+    if index_col not in columns or value_col not in columns:
+        raise ValueError(f"Columns {index_col} and {value_col} must exist in the data")
+    if index_subgroup_col and index_subgroup_col not in columns:
+        raise ValueError(f"Subgroup column {index_subgroup_col} not found in data")

Suggestion importance [1-10]: 8
Why: This validation is essential for preventing runtime errors and providing clear error messages when required columns are missing. It helps catch configuration errors early in the process.

codecov · 2025-02-13T10:21:54Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Files with missing lines	Coverage Δ
pyretailscience/plots/index.py	`81.08% <100.00%> (-4.64%)`	⬇️

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

pyretailscience/plots/index.py (1)
275-277: Good aggregator validation.

The aggregator check avoids invalid strings. As a minor extension, consider implementing a dictionary-based mapping instead of relying on getattr for each aggregator to improve explicitness and reduce potential runtime edge cases.
- agg_fn = lambda x: getattr(x, agg_func)()
+ aggregations = {
+     "sum": lambda x: x.sum(),
+     "mean": lambda x: x.mean(),
+     "max": lambda x: x.max(),
+     "min": lambda x: x.min(),
+     "nunique": lambda x: x.nunique(),
+ }
+ agg_fn = aggregations[agg_func]
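A runnable sketch of the dictionary-based lookup suggested above may help show the benefit: an unsupported name fails with an explicit `ValueError` instead of an `AttributeError` from `getattr`. Plain Python lists stand in for Ibis column expressions here, and the names `AGGREGATIONS` and `resolve_aggregation` are hypothetical, not taken from the PR.

```python
from statistics import mean

# Explicit whitelist of supported aggregations. In the PR these would be
# methods on Ibis columns; plain callables over lists are used here instead.
AGGREGATIONS = {
    "sum": sum,
    "mean": mean,
    "max": max,
    "min": min,
    "nunique": lambda values: len(set(values)),
}


def resolve_aggregation(agg_func):
    """Look up an aggregation by name, rejecting anything not whitelisted."""
    try:
        return AGGREGATIONS[agg_func]
    except KeyError:
        supported = ", ".join(sorted(AGGREGATIONS))
        raise ValueError(
            f"Unsupported aggregation function '{agg_func}'. Supported: {supported}"
        ) from None


values = [10, 20, 20]
print(resolve_aggregation("nunique")(values))  # 2
```

The mapping also documents the supported set in one place, which the error message can reuse.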
tests/plots/test_index.py (4)

11-11: Clarify purpose of OFFSET_THRESHOLD.

In test code, OFFSET_THRESHOLD = -5 is used to check that values exceed -5. Consider adding a brief comment or docstring to explain how or why -5 is chosen for your offset validation.

14-28: Basic test scenario covers essential columns and index computation.

The test confirms output is a non-empty DataFrame with expected columns. Consider adding an assertion verifying correctness of the index values to ensure the logic (including offset) is accurate.

77-93: Comprehensive aggregator coverage.

Testing multiple aggregation functions strengthens robustness. Consider verifying the numeric correctness for each aggregator’s output as well (e.g., verifying sums or means are correct).

95-110: Offset test ensures final index is above the threshold.

This ensures offset logic is partially tested. You might expand coverage by explicitly comparing a known input and checking the final exact index result.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 111401b and 17f178b.

📒 Files selected for processing (2)

pyretailscience/plots/index.py (3 hunks)
tests/plots/test_index.py (1 hunks)

🔇 Additional comments (8)

pyretailscience/plots/index.py (5)

41-41: Consider checking Ibis availability or version constraints.

While importing Ibis is crucial for this solution, ensure that your setup and environment have a compatible Ibis version. If your library users cannot install Ibis (e.g., due to environment constraints), you might provide a fallback or clear error messages explaining how to enable Ibis features.

Would you like me to generate a shell script that checks for Ibis installation and prints the installed version?

243-250: Nice extension to support both DataFrame and ibis.Table.

Accepting either a Pandas DataFrame or an Ibis table improves flexibility. Confirm all downstream code can handle both input types, especially if any type-specific assumptions remain (e.g., .copy() usage, indexing, etc.).

268-274: Validate _filter alignment between DataFrame and filter list lengths.

When creating _filter with df_index_filter, ensure the list matches the length of the DataFrame. Otherwise, you may encounter misalignment or exceptions. Consider adding a proactive check to raise an error if lengths differ.

281-282: Group columns ordering may be confusing.

When a subgroup is provided, the group order is [index_subgroup_col, index_col]. Ensure this reversed ordering is intentional and that the subsequent logic (sorting, final indexing) aligns well with the reversed grouping.

311-313: Potential divide-by-zero scenario.

If totals in either the overall props or subset props are zero, (t.proportion / t.proportion_overall) might raise or produce NaNs or infinities. Consider adding a quick check to avoid unexpected results.
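One way to make the zero case explicit is to guard the ratio before dividing. The sketch below is an assumption about how such a check could look in plain Python, not what the PR does. The function name `safe_index` is hypothetical, and returning NaN rather than raising is a design choice that would need agreeing on.

```python
import math


def safe_index(proportion, proportion_overall, offset=0):
    """Return the index value, or NaN when the overall proportion is zero."""
    if proportion_overall == 0:
        # No overall volume for this group, so the ratio is undefined.
        return math.nan
    return proportion / proportion_overall * 100 - offset


print(safe_index(0.75, 0.5))  # 150.0
```

Returning NaN keeps the row visible so the caller can decide whether to drop it. Filtering such rows out before the division is the alternative.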

tests/plots/test_index.py (3)

31-47: Subgroup test properly verifies multi-level grouping.

This test confirms that subgroup columns are included in the result. If feasible, add a check for correctness of the numeric results beyond just non-emptiness.

49-60: Valid test for all-True filter scenario.

Ensuring a ValueError is raised if df_index_filter is all True or all False is a critical correctness check.

63-74: Validates invalid aggregator gracefully.

This test effectively ensures the function raises a ValueError for unsupported aggregators. Coverage is good; no further concerns.

pyretailscience/plots/index.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

pyretailscience/plots/index.py (2)
54-74: Consider refactoring to reduce complexity.

The function is marked with noqa to ignore complexity warnings. Consider breaking it down into smaller, more focused functions to improve maintainability.

Here's a suggested approach:

Extract the data preparation logic (lines 141-156) into a separate function.

Extract the series handling logic (lines 157-187) into a separate function.

Extract the plotting logic (lines 189-240) into a separate function.

This would make the code more modular and easier to test.

280-280: Consider using a dictionary for aggregation functions.

The lambda function could be replaced with a dictionary mapping for better readability and maintainability.
-    agg_fn = lambda x: getattr(x, agg_func)()
+    AGG_FUNCTIONS = {
+        "sum": lambda x: x.sum(),
+        "mean": lambda x: x.mean(),
+        "max": lambda x: x.max(),
+        "min": lambda x: x.min(),
+        "nunique": lambda x: x.nunique(),
+    }
+    agg_fn = AGG_FUNCTIONS[agg_func]
tests/plots/test_index.py (1)
14-28: Consider adding more specific assertions.

While the test verifies the basic structure of the result, it could benefit from more specific assertions about the calculated index values.
     assert isinstance(result, pd.DataFrame)
     assert "category" in result.columns
     assert "index" in result.columns
     assert not result.empty
+    # Add specific assertions
+    assert len(result) == 3  # Expected number of categories
+    assert all(result["index"] >= 0)  # Index values should be non-negative

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17f178b and 2b173c9.

📒 Files selected for processing (2)

pyretailscience/plots/index.py (5 hunks)
tests/plots/test_index.py (2 hunks)

🔇 Additional comments (6)

pyretailscience/plots/index.py (4)

141-150: LGTM! Parameter changes align with requirements.

The changes to use index_col and value_to_index instead of df_index_filter align with the request to split the index concept into parameters for better scalability.

244-253: LGTM! Type hints and parameters are well-defined.

The function now accepts both pd.DataFrame and ibis.Table, making it more versatile. The parameter names are clear and descriptive.

269-275: LGTM! Ibis table conversion is handled correctly.

The code properly handles both pandas DataFrame and Ibis Table inputs, with appropriate conversion to Ibis for consistent processing.

284-316: LGTM! Ibis operations are well-structured.

The Ibis operations for grouping, aggregation, and joining are correctly implemented. The code handles both single-group and subgroup cases appropriately.

tests/plots/test_index.py (2)

74-95: LGTM! Comprehensive test coverage for aggregation functions.

The test case thoroughly verifies all supported aggregation functions.

54-71: LGTM! Error handling is well-tested.

The test case properly verifies that an invalid aggregation function raises a ValueError with the correct error message.

mvanwyk · 2025-02-14T15:21:56Z

pyretailscience/plots/index.py

+        df["_filter"] = value_to_index
+        table = ibis.memtable(df)
+    else:
+        table = df.mutate(_filter=ibis.literal(value_to_index))


Where is the _filter column used? I don't see it referenced below and if df is a 3+ billion row table that could add a lot of data to the query.

mvanwyk · 2025-02-14T15:30:33Z

pyretailscience/plots/index.py

+            .drop("total")
+        )
+
+    overall_props = overall_props.mutate(proportion_overall=overall_props.proportion).drop("proportion")


Is this just renaming proportion to proportion_overall?

Can we just name it proportion_overall in the first place and remove this line?

mvanwyk · 2025-02-17T12:06:07Z

pyretailscience/plots/index.py

+            .drop("total")
+        )
+
+    overall_props = overall_props.mutate(proportion_overall=overall_props.proportion).drop("proportion")


Can we just name it proportion_overall in the first place and remove this line?

mvanwyk · 2025-02-17T12:10:35Z

tests/plots/test_index.py

+    assert "category" in result.columns
+    assert "index" in result.columns
+    assert not result.empty
+    assert all(result["index"] >= OFFSET_THRESHOLD)


Since OFFSET_THRESHOLD is only used in this function, can you move its instantiation here, please?

mvanwyk · 2025-02-17T12:11:04Z

tests/plots/test_index.py

+        index_col="category",
+        value_col="value",
+        group_col="category",
+        offset=5,


Since this is tied to the value of OFFSET_THRESHOLD, I think we should change this line.

Suggested change

offset=5,

offset=-OFFSET_THRESHOLD,
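The reasoning behind tying the two values together can be checked with a few lines of arithmetic. The PR computes the index as `proportion / proportion_overall * 100 - offset`, so, assuming non-negative values, the smallest possible index is `-offset`, which equals `OFFSET_THRESHOLD` when `offset=-OFFSET_THRESHOLD`. The raw index values below are made up for illustration.

```python
OFFSET_THRESHOLD = -5
offset = -OFFSET_THRESHOLD  # 5, derived from the threshold rather than hard-coded

# Hypothetical index values before the offset is subtracted; 0.0 is the
# smallest value a ratio of non-negative proportions can produce.
raw_indexes = [0.0, 100.0, 250.0]
shifted = [value - offset for value in raw_indexes]

print(shifted)  # [-5.0, 95.0, 245.0]
assert all(value >= OFFSET_THRESHOLD for value in shifted)
```

If the threshold constant changes later, the test's offset follows it automatically instead of silently drifting out of sync.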

mvanwyk · 2025-02-17T12:16:44Z

tests/plots/test_index.py

+    expected_output = pd.DataFrame(
+        {
+            "group_col": ["A"],
+            "value": [3],
+            "proportion": [1.0],
+            "value_right": [3],
+            "proportion_overall": [0.142857],
+            "index": [700.0],
+        },
+    )
+


This isn't right. See the previous tests. There should be a value created for each of the values of group_col, i.e. A, B, and C. Also, the other columns should be removed. Only group_col and index should be there.

https://github.com/Data-Simply/pyretailscience/blob/main/tests/plots/test_index.py#L22

mvanwyk · 2025-02-17T12:18:19Z

tests/plots/test_index.py

    expected_output = pd.DataFrame(
        {
-            "group_col1": ["A", "B", "C"],
-            "index": [140, 140, 46.6666667],
+            "group_col1": ["A", "A"],
+            "group_col2": ["D", "E"],
+            "value": [3, 15],
+            "proportion": [0.166667, 0.833333],
+            "value_right": [3, 15],
+            "proportion_overall": [0.166667, 0.833333],
+            "index": [100.0, 100.0],
        },
    )


This also isn't right. See the previous tests. There should be a value created for each of the values of group_col1 and group_col2. Also, the other columns should be removed. Only group_col1, group_col2 and index should be there.

https://github.com/Data-Simply/pyretailscience/blob/main/tests/plots/test_index.py#L42

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

tests/plots/test_index.py (1)

14-28: ⚠️ Potential issue

Add tests to verify index value calculations.

The current test only verifies the structure of the output DataFrame but doesn't validate that the index values are calculated correctly. This was present in the previous tests and should be added back.

Add assertions to verify the actual index values:

     result = get_indexes(df, value_to_index="A", index_col="category", value_col="value", group_col="category")
     assert isinstance(result, pd.DataFrame)
     assert "category" in result.columns
     assert "index" in result.columns
     assert not result.empty
+    # Verify index values
+    expected_values = {
+        "A": 100.0,  # Base category
+        "B": 200.0,  # (30+40)/(10+20) * 100
+        "C": 300.0,  # (50+60)/(10+20) * 100
+    }
+    for category, expected_index in expected_values.items():
+        actual_index = result[result["category"] == category]["index"].iloc[0]
+        assert np.isclose(actual_index, expected_index, rtol=1e-5)

🧹 Nitpick comments (2)

pyretailscience/plots/index.py (1)
279-279: Consider these improvements for better code clarity and safety.

Simplify the lambda function for aggregation:
-    agg_fn = lambda x: getattr(x, agg_func)()
+    agg_fn = getattr(ibis.expr.types.generic.Value, agg_func)
Move the division by zero check earlier to prevent unnecessary computation:
     subset_total = subset_agg.group_by(index_subgroup_col).aggregate(total=lambda t: t.value.sum())
+    subset_total = subset_total.filter(lambda t: t.total != 0)
     subset_props = (
         subset_agg.join(subset_total, index_subgroup_col)
-        .filter(lambda t: t.total != 0)
         .mutate(proportion=lambda t: t.value / t.total)
         .drop("total")
     )
Also applies to: 307-307
tests/plots/test_index.py (1)
76-81: Use more realistic test data.

The current test data uses a simple arithmetic progression. Consider using more realistic data that better represents actual retail scenarios:
     df = pd.DataFrame(
         {
-            "category": ["A", "A", "B", "B", "C", "C"],
-            "value": [10, 20, 30, 40, 50, 60],
+            "category": ["Electronics", "Electronics", "Clothing", "Clothing", "Food", "Food"],
+            "value": [1200, 1500, 800, 950, 250, 300],
         },
     )

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8d6e353 and 26459e9.

📒 Files selected for processing (2)

pyretailscience/plots/index.py (5 hunks)
tests/plots/test_index.py (2 hunks)

🔇 Additional comments (2)

pyretailscience/plots/index.py (2)

54-241: LGTM! The plot function has been successfully refactored.

The changes align with the PR objectives, removing the df_index_filter parameter and introducing index_col and value_to_index parameters for better scalability. The implementation maintains all the existing functionality while improving the interface.

244-321: LGTM! Successfully refactored to use Ibis for scalable computation.

The implementation effectively uses Ibis for efficient computation at scale, with proper handling of both pandas DataFrame and Ibis Table inputs. The error handling for unsupported aggregation functions is comprehensive.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

pyretailscience/plots/index.py (1)

244-320: Optimize zero total filtering.

Consider moving the zero total filter to the overall aggregation step to prevent unnecessary computations on rows that will be filtered out later.

Apply this diff to optimize the filtering:

     if index_subgroup_col is None:
         overall_total = overall_agg.value.sum().execute()
         overall_props = overall_agg.mutate(proportion_overall=overall_agg.value / overall_total)
     else:
         overall_total = overall_agg.group_by(index_subgroup_col).aggregate(total=lambda t: t.value.sum())
+        overall_total = overall_total.filter(lambda t: t.total != 0)
         overall_props = (
             overall_agg.join(overall_total, index_subgroup_col)
             .mutate(proportion_overall=lambda t: t.value / t.total)
             .drop("total")
         )

     table = table.filter(table[index_col] == value_to_index)
     subset_agg = table.group_by(group_cols).aggregate(value=agg_fn(table[value_col]))

     if index_subgroup_col is None:
         subset_total = subset_agg.value.sum().name("total")
         subset_props = subset_agg.mutate(proportion=subset_agg.value / subset_total)
     else:
         subset_total = subset_agg.group_by(index_subgroup_col).aggregate(total=lambda t: t.value.sum())
         subset_props = (
             subset_agg.join(subset_total, index_subgroup_col)
-            .filter(lambda t: t.total != 0)
             .mutate(proportion=lambda t: t.value / t.total)
             .drop("total")
         )
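The effect of filtering zero totals before the join can be illustrated without Ibis. In this sketch, tuples of `(subgroup, group, value)` stand in for the already-aggregated table, and subgroups whose total is zero are dropped before any proportion is computed. The function name `subgroup_proportions` and the sample data are hypothetical.

```python
from collections import defaultdict


def subgroup_proportions(rows):
    """Return value / subgroup total for each (subgroup, group) pair.

    `rows` is a list of (subgroup, group, value) tuples, assumed to be
    aggregated already. Subgroups with a zero total are removed first, so
    no division by zero can occur.
    """
    totals = defaultdict(float)
    for subgroup, _group, value in rows:
        totals[subgroup] += value
    nonzero = {subgroup: total for subgroup, total in totals.items() if total != 0}
    return {
        (subgroup, group): value / nonzero[subgroup]
        for subgroup, group, value in rows
        if subgroup in nonzero
    }


rows = [("D", "A", 3), ("D", "B", 1), ("E", "A", 0), ("E", "B", 0)]
print(subgroup_proportions(rows))  # {('D', 'A'): 0.75, ('D', 'B'): 0.25}
```

Subgroup E sums to zero and disappears from the result, which mirrors what filtering `total != 0` on the totals table does before an inner join.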

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 26459e9 and 4505b7a.

📒 Files selected for processing (2)

pyretailscience/plots/index.py (5 hunks)
tests/plots/test_index.py (3 hunks)

🔇 Additional comments (10)

tests/plots/test_index.py (9)

15-29: Add assertions for index value calculations.

The test verifies the DataFrame structure but doesn't validate that the index values are calculated correctly. Based on past review feedback, please add assertions to verify the actual index values.

31-53: Add assertions for index value calculations with subgroups.

The test verifies the DataFrame structure but doesn't validate that the index values are calculated correctly for each subgroup. Please add assertions to verify the actual index values.

55-73: LGTM!

The test effectively validates error handling for invalid aggregation functions.

75-97: Add assertions for index values with different aggregations.

The test verifies the DataFrame structure but doesn't validate that the index values are calculated correctly for each aggregation function. Please add assertions to verify the actual index values.

99-121: LGTM!

The test effectively validates that index values respect the offset threshold.

123-141: LGTM!

The test effectively validates index calculations by comparing with expected output values.

143-170: LGTM!

The test effectively validates index calculations for two columns by comparing with expected output values.

172-187: Add assertions for index values with Ibis table input.

The test verifies the DataFrame structure but doesn't validate that the index values are calculated correctly when using an Ibis table input. Please add assertions to verify the actual index values.

189-328: LGTM!

The plot tests effectively validate:

Plot generation with default parameters

Custom title handling

Highlight range functionality

Group filtering

Error handling for invalid parameters

Source text and custom label handling

🧰 Tools

🪛 Ruff (0.8.2)

192-192: Use @pytest.fixture over @pytest.fixture()

Remove parentheses

(PT001)

pyretailscience/plots/index.py (1)

54-241: LGTM!

The plot function has been effectively updated to:

Use new parameters index_col and value_to_index

Maintain comprehensive documentation

Handle all plotting scenarios correctly

mvanwyk

LGTM!

feat: refactor get_index function with ibis (17f178b)
qodo-merge-pro bot added the Review effort 4/5 label Feb 13, 2025
coderabbitai bot reviewed Feb 13, 2025
mvanwyk reviewed Feb 13, 2025
fix: remove df_filter and refactor the code and test cases (2b173c9)
coderabbitai bot reviewed Feb 14, 2025
mvanwyk reviewed Feb 17, 2025
fix: revert the test cases of expected output (8d6e353)
mvanwyk reviewed Feb 17, 2025
fix: change the test cases of get index (26459e9)
coderabbitai bot reviewed Feb 18, 2025
fix: refactor the code (4505b7a)
coderabbitai bot reviewed Feb 18, 2025
mvanwyk approved these changes Feb 18, 2025
mayurkmmt merged commit 61591f9 into main Feb 19, 2025
3 checks passed
