-
Notifications
You must be signed in to change notification settings - Fork 1
feat: convert seg stats to use Ibis #90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis pull request makes several modifications across multiple files. In Changes
Possibly related PRs
Suggested labels
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here. PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here. PR Code Suggestions ✨Explore these optional code suggestions:
|
Codecov ReportAttention: Patch coverage is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
pyretailscience/segmentation.py (3)
222-247
: Column ordering approach works, but consider consistent naming.Inserting a literal "units" can occasionally conflict with aggregated column naming. If you prefer uniform naming conventions, consider using a more descriptive or dynamic reference akin to self.agg_unit_qty.
250-302
: ibis-based aggregation logic is sound, but check performance for large data.Switching to ibis for grouping and union is correct. For very large datasets, confirm that combining segment_metrics and total_metrics with ibis.union does not cause excessive overhead. You might want to test with scaled data to ensure acceptable performance.
366-368
: Filtering out “Total” is fine, but maintain alignment with aggregator.If the label for the aggregated row changes later, this filtering might break. For future-proofing, consider referencing a configurable label or a known constant rather than a literal string.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
pyproject.toml
(1 hunks)pyretailscience/options.py
(2 hunks)pyretailscience/segmentation.py
(4 hunks)tests/test_segmentation.py
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Pre-Commit
🔇 Additional comments (18)
pyretailscience/segmentation.py (9)
156-156
: Allowing both DataFrame and ibis.Table is good.Updating the constructor to accept either a pandas DataFrame or an ibis Table is appropriate for broader flexibility. Just ensure the docstring and usage references this extended input type clearly throughout.
190-191
: Typed attribute annotation looks solid.The explicit type hint (_df: pd.DataFrame | None) helps with clarity and IDE assistance.
192-192
: Constructor aligns with ibis-based approach.Replacing DuckDB references with (pd.DataFrame | ibis.Table) in the constructor is consistent with the PR objective to use ibis.
196-196
: Docstring updated accordingly.The docstring now reflects the new accepted data types, which improves clarity.
202-202
: No issues noted here.Calling ColumnHelper at initialization remains a clean approach to manage column naming.
204-206
: Good check for required columns.Including the segment_col in this list ensures consistent coverage of necessary columns in the data source.
209-210
: Optional column handling is clear.Appending unit_qty only if present in the data’s columns promotes flexible usage.
219-219
: Storing the table at initialization is appropriate.Assigning self.table in the constructor centralizes the aggregated structure for later property access.
305-315
: Column reordering is straightforward and clear.Executing the ibis table and selecting columns in a defined order is well organized.
tests/test_segmentation.py (6)
21-21
: Floating spend values improve accuracy.Replacing integer spend values with floats is a valid step for monetary amounts.
42-42
: Verifying percentage column for segments.The new "customers_pct" usage aligns with the updated code. Values (0.6, 0.4, 1.0) seem correct.
44-47
: Minor structural changes.Refactoring these lines to set up and instantiate SegTransactionStats is neat. No functional concerns.
55-55
: Float-based spend values remain consistent.Ensures test data aligns with the new float usage throughout the module.
72-74
: Creating SegTransactionStats instance and verifying output.Switching from direct static calls to instance-based usage is correct. Sorting by segment_name before comparing is good practice.
94-98
: Validating total coverage for single-segment scenarios.Testing the scenario with 1.0 for the entire population across two rows is consistent with the logic for “segment” plus “Total.”
pyretailscience/options.py (2)
240-240
: Comment removal is inconsequential.No functional impact from dropping the “# noqa: SLF001” comment; code remains clear.
395-395
: New attribute customers_pct is coherent.This addition helps standardize referencing the customer percentage column. Nicely integrates with existing naming patterns.
pyproject.toml (1)
71-71
: SLF001 Ignore Rule Addition Approved
The addition of "SLF001" to the ignore list is appropriate given Ibis’s frequent use of internal/private attributes (e.g., ibis._[column]) that would otherwise trigger this lint warning. The inline comment clearly explains the rationale.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (3)
pyretailscience/segmentation.py:314
- The column ordering in the
df
property method should ensure that all required columns are present and in the correct order.
self._df = self.table.execute()[col_order]
pyretailscience/segmentation.py:368
- The
hide_total
check is case-sensitive. Ensure consistency in how 'Total' is handled throughout the code.
val_s = val_s[val_s.index != "Total"]
pyproject.toml:71
- The comment has a minor grammatical issue. It should be: 'Ibis makes extensive use of ibis._[column], which triggers this.'
"SLF001", # Ibis makes a lot of use of the ibis._[column] which triggers this
cols.calc_spend_per_cust: ibis._[cols.agg_unit_spend] / ibis._[cols.agg_customer_id], | ||
cols.calc_spend_per_trans: ibis._[cols.agg_unit_spend] / ibis._[cols.agg_transaction_id], | ||
cols.calc_trans_per_cust: ibis._[cols.agg_transaction_id] / ibis._[cols.agg_customer_id], | ||
cols.customers_pct: ibis._[cols.agg_customer_id] / total_customers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usage of ibis._
seems incorrect. It should be replaced with the proper syntax for accessing columns in Ibis expressions.
cols.calc_spend_per_cust: ibis._[cols.agg_unit_spend] / ibis._[cols.agg_customer_id], | |
cols.calc_spend_per_trans: ibis._[cols.agg_unit_spend] / ibis._[cols.agg_transaction_id], | |
cols.calc_trans_per_cust: ibis._[cols.agg_transaction_id] / ibis._[cols.agg_customer_id], | |
cols.customers_pct: ibis._[cols.agg_customer_id] / total_customers, | |
cols.calc_spend_per_cust: final_metrics[cols.agg_unit_spend] / final_metrics[cols.agg_customer_id], | |
cols.calc_spend_per_trans: final_metrics[cols.agg_unit_spend] / final_metrics[cols.agg_transaction_id], | |
cols.calc_trans_per_cust: final_metrics[cols.agg_transaction_id] / final_metrics[cols.agg_customer_id], | |
cols.customers_pct: final_metrics[cols.agg_customer_id] / total_customers, |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
pyretailscience/segmentation.py (1)
287-290
:⚠️ Potential issueFix incorrect Ibis column access syntax.
The usage of
ibis._
is incorrect. Use direct column access instead.Apply this diff:
- cols.calc_spend_per_cust: ibis._[cols.agg_unit_spend] / ibis._[cols.agg_customer_id], - cols.calc_spend_per_trans: ibis._[cols.agg_unit_spend] / ibis._[cols.agg_transaction_id], - cols.calc_trans_per_cust: ibis._[cols.agg_transaction_id] / ibis._[cols.agg_customer_id], - cols.customers_pct: ibis._[cols.agg_customer_id] / total_customers, + cols.calc_spend_per_cust: final_metrics[cols.agg_unit_spend] / final_metrics[cols.agg_customer_id], + cols.calc_spend_per_trans: final_metrics[cols.agg_unit_spend] / final_metrics[cols.agg_transaction_id], + cols.calc_trans_per_cust: final_metrics[cols.agg_transaction_id] / final_metrics[cols.agg_customer_id], + cols.customers_pct: final_metrics[cols.agg_customer_id] / total_customers,
🧹 Nitpick comments (2)
pyretailscience/segmentation.py (2)
156-156
: Update docstring to reflect new type.The docstring should be updated to match the new type annotation:
- df (pd.DataFrame): A dataframe with the transaction data. The dataframe must contain a customer_id column. + df (pd.DataFrame | ibis.Table): A dataframe or Ibis table with the transaction data. The data must contain a customer_id column.Also applies to: 169-169
190-219
: Update docstring and improve type handling.
- Update the docstring to reflect the new type:
- data (pd.DataFrame | ibis.Table): The transaction data. The dataframe must contain the columns + data (pd.DataFrame | ibis.Table): The transaction data. The data must contain the columns
- Consider adding type hints for the class variable:
- _df: pd.DataFrame | None = None + _df: pd.DataFrame | None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pyretailscience/segmentation.py
(4 hunks)tests/test_segmentation.py
(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Pre-Commit
🔇 Additional comments (7)
tests/test_segmentation.py (4)
22-22
: LGTM! The changes improve data type precision and code structure.The changes enhance the test by:
- Using float values for monetary amounts, which is more appropriate for financial calculations.
- Adopting an object-oriented approach that aligns with the new implementation.
Also applies to: 46-48
56-56
: LGTM! The changes improve data type precision and code structure.The changes enhance the test by:
- Using float values for monetary amounts, which is more appropriate for financial calculations.
- Adopting an object-oriented approach that aligns with the new implementation.
Also applies to: 75-75
99-99
: LGTM! The code structure is improved.The change to object-oriented instantiation aligns with the new implementation.
102-124
: LGTM! Good addition of edge case testing.The new test function effectively verifies the handling of zero net units, ensuring that:
- Price per unit calculations correctly handle division by zero.
- Other metrics remain accurate even when some segments have zero units.
pyretailscience/segmentation.py (3)
221-249
: LGTM! Good addition of column order management.The new static method effectively:
- Centralizes column order management
- Handles both cases with and without quantity columns
- Improves maintainability by providing a single source of truth
250-302
: LGTM! Good migration to Ibis.The changes effectively:
- Handle both DataFrame and Ibis Table inputs
- Use Ibis for aggregations and calculations
- Maintain the same functionality while leveraging Ibis features
303-313
: LGTM! Good use of column order management.The property effectively:
- Uses the new _get_col_order method for consistent column ordering
- Handles the conversion from Ibis Table to DataFrame
- Caches the result for better performance
PR Type
Enhancement, Tests
Description
Migrated segmentation statistics computation from DuckDB to Ibis framework.
Enhanced
SegTransactionStats
class with Ibis-based aggregation and metrics.Updated tests to validate Ibis-based implementation and ensure correctness.
Added new column options and adjusted TOML file handling logic.
Changes walkthrough 📝
segmentation.py
Transition segmentation statistics to Ibis framework
pyretailscience/segmentation.py
SegTransactionStats
to use Ibis for metrics calculation._get_col_order
method for consistent column ordering._calc_seg_stats
to support Ibis-based aggregation.options.py
Add new column option and refine TOML handling
pyretailscience/options.py
customers_pct
column option for percentage calculations.test_segmentation.py
Update tests for Ibis-based segmentation stats
tests/test_segmentation.py
SegTransactionStats
implementation.
pyproject.toml
Update linting rules for Ibis compatibility
pyproject.toml
SLF001
to ignored linting rules for Ibis-specific syntax.Summary by CodeRabbit
New Features:
Refactor:
Chores: