feat: convert column names to use the options class #91

mvanwyk · 2025-02-09T11:59:00Z

PR Type

Enhancement, Tests, Documentation

Description

Introduced ColumnHelper and get_option for column name standardization.
Updated tests and core logic to use ColumnHelper for column references.
Replaced hardcoded column names with dynamic options across modules.
Updated documentation and examples to reflect column name changes.

Changes walkthrough 📝

Relevant files

Tests

4 files

test_cross_shop.py Updated tests to use `ColumnHelper` for column references.	+20/-17
test_product_association.py Refactored tests to use `ColumnHelper` for transaction IDs.	+22/-19
test_range_planning.py Updated range planning tests for dynamic column names.	+22/-9
test_segmentation.py Adjusted segmentation tests for column name standardization.	+5/-5

Enhancement

8 files

customer.py Standardized column names using `ColumnHelper` in customer logic.	+28/-23
cross_shop.py Integrated dynamic column options in cross-shop logic.	+12/-10
gain_loss.py Refactored gain/loss logic to use column options.	+10/-7
range_planning.py Added dynamic column handling in range planning module.	+18/-12
product_association.py Standardized product association logic with column options.	+7/-5
standard_graphs.py Updated time plot logic to use dynamic column names.	+2/-1
segmentation.py Refactored segmentation logic for column name standardization.	+2/-2
options.py Added new column options for transaction date and time.	+3/-0

Documentation

4 files

segmentation.ipynb Updated segmentation example to reflect column changes.	+42/-42
product_association.ipynb Adjusted product association example for column standardization.	+21/-27
revenue_tree.ipynb Updated revenue tree example with dynamic column names.	+10/-10
analysis_modules.md Updated analysis module documentation for column standardization.	+3/-3

Additional files

4 files

transactions.parquet
cross_shop.ipynb	+34/-40
gain_loss.ipynb	+139/-142
retention.ipynb	+25/-32


Summary by CodeRabbit

New Features
- Visualizations now consistently display updated labels for transaction dates and spending metrics.
Documentation
- Examples have been refreshed to reflect the new naming conventions, with placeholders added for upcoming content.
Refactor
- Unified naming across modules improves consistency in data presentations and enhances overall maintainability.
- Enhanced flexibility in column name management through the introduction of a ColumnHelper class.

coderabbitai · 2025-02-09T11:59:08Z

Walkthrough

This change set primarily renames the variable and column identifier from "transaction_datetime" to "transaction_date" across multiple documentation files, notebooks, core modules, and tests. It also updates metric columns in one notebook and replaces hardcoded column strings with dynamic references via get_option and the ColumnHelper class. In addition, documentation placeholders and a new parameter in one class have been added to allow future enhancements.

Changes

File(s)	Change Summary
docs/analysis_modules.md, docs/examples/*	Renamed "transaction_datetime" to "transaction_date" in timeline plots, product association, revenue tree, and segmentation examples; updated metric columns (e.g. "unit_price" → "unit_spend", "quantity" → "unit_quantity") and added placeholder sections for future content.
pyretailscience/*	Refactored modules (CrossShop, Customer, GainLoss, ProductAssociation, CustomerDecisionHierarchy, ThresholdSegmentation, standard_graphs) to replace hardcoded strings with dynamic values via `get_option` and `ColumnHelper`; updated method signatures, required column lists, and error messages accordingly.
tests/*	Updated tests to use `ColumnHelper` for column references; standardized DataFrame setups and assertions to align with the renamed columns and updated method parameters.

Possibly related PRs

feat: standard bar plot #86: Both PRs rename transaction_datetime to transaction_date across documentation and code files.
docs: updated analysis module docs #80: Both PRs change transaction_datetime to transaction_date in documentation and code examples.
feat: changed threshold seg to use ibis #89: That PR renames the same variable in the ThresholdSegmentation class to match the new naming convention.

Suggested labels

enhancement, documentation, Tests

Poem

I'm a hopping rabbit with lines of code so neat,
Skipping through changes in data and columns complete,
Renaming dates and adjusting tests with flair,
Options now dynamic, floating in the air,
With placeholders for future dreams and a gentle debug tune,
I celebrate these updates under the coding moon!


qodo-merge-pro · 2025-02-09T11:59:34Z


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing Parameter
The _get_pairs() method takes a product_col parameter, but the call in __init__() does not pass it. This could cause a runtime error.

self.pairs_df = self._get_pairs(df, exclude_same_transaction_products)

Incomplete Tests
The test_init_invalid_dataframe() test case does not validate the new product_col parameter behavior. Additional test coverage is needed.

def test_init_invalid_dataframe(self):
    """Test that the function raises a ValueError when the dataframe is invalid."""
    df = pd.DataFrame(
        {cols.customer_id: [1, 2, 3], cols.transaction_id: [1, 2, 3], "product_name": ["A", "B", "C"]},
    )
    exclude_same_transaction_products = True
    random_state = 42
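Python checks argument counts only when a call executes, so the missing argument surfaces as a TypeError when the object is constructed, not at import. A minimal sketch with a hypothetical `get_pairs` function that mirrors the three-argument call from the fix (its body is invented for illustration and is not the library's `_get_pairs`):

```python
import pandas as pd


def get_pairs(df: pd.DataFrame, exclude_same_transaction_products: bool, product_col: str) -> pd.DataFrame:
    """Hypothetical stand-in with the same three-argument signature as the fixed call.

    Illustrative body only: unique customer/product pairs. The flag is accepted but ignored.
    """
    return df[["customer_id", product_col]].drop_duplicates().reset_index(drop=True)


df = pd.DataFrame({"customer_id": [1, 1, 2], "product_name": ["A", "A", "B"]})

# The corrected call passes product_col and succeeds.
pairs_df = get_pairs(df, True, "product_name")

# The original call omits product_col; Python only detects this when the line runs.
try:
    get_pairs(df, True)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```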

qodo-merge-pro · 2025-02-09T11:59:55Z


PR Code Suggestions ✨

Explore these optional code suggestions:

Possible issue: Initialize variables before merge operation (Impact: High)
The _calc_cross_shop method is missing initialization of cs_df and kpi_df variables before they are used in the merge operation. Initialize these variables before using them.
pyretailscience/cross_shop.py [124-125]
+cs_df = cs_df.groupby(cols.customer_id)[group_cols].max()
 cs_df["groups"] = cs_df[group_cols].apply(lambda x: tuple(x), axis=1)
-
+kpi_df = df.groupby(cols.customer_id)[value_col].agg(agg_func)
 return cs_df.merge(kpi_df, left_index=True, right_index=True)
[To ensure code accuracy, apply this suggestion manually]
Suggestion importance[1-10]: 10
Why: The suggestion fixes a critical bug where cs_df and kpi_df variables are used in a merge operation without being properly initialized, which would cause a runtime error.

Possible issue: ✅ Fix missing method argument (Impact: High)
Suggestion Impact: The commit implemented exactly the suggested change by adding the product_col argument to the _get_pairs method call:
- self.pairs_df = self._get_pairs(df, exclude_same_transaction_products)
+ self.pairs_df = self._get_pairs(df, exclude_same_transaction_products, product_col)
The _get_pairs method is called with the wrong number of arguments: the product_col parameter is defined but not passed when calling the method.
pyretailscience/range_planning.py [57]
-self.pairs_df = self._get_pairs(df, exclude_same_transaction_products)
+self.pairs_df = self._get_pairs(df, exclude_same_transaction_products, self.product_col)
Suggestion importance[1-10]: 10
Why: The suggestion fixes a critical bug where the _get_pairs method is called with missing required argument product_col, which would cause a runtime error.

General: ✅ Fix inconsistent parameter documentation (Impact: Medium)
Suggestion Impact: The docstring was updated to fix the inconsistency, though with a different default value (column.column_id instead of column.customer_id):
 group_col (str, optional): The name of the column that identifies unique
-    transactions or customers. Defaults to option column.unit_spend.
+    transactions or customers. Defaults to option column.column_id.
The docstring description for `group_col` parameter is inconsistent with the default value. The docstring states it defaults to `option column.unit_spend` but the code uses `get_option("column.customer_id")`. Update either the docstring or the default value to match.
pyretailscience/product_association.py [59-60]
 group_col (str, optional): The name of the column that identifies unique
-    transactions or customers. Defaults to option column.unit_spend.
+    transactions or customers. Defaults to option column.customer_id.
Suggestion importance[1-10]: 7
Why: The suggestion identifies an important inconsistency between the docstring and actual code implementation that could mislead users. Accurate documentation is crucial for proper API usage.

General: Preserve unit price information (Impact: Medium)
Consider keeping both 'unit_quantity' and 'unit_spend' columns, but also include 'unit_price' as it provides valuable per-unit price information that can be useful for analysis. The unit price can be derived as unit_spend/unit_quantity.
docs/examples/product_association.ipynb [80-81]
 " <th>unit_quantity</th>\n",
 " <th>unit_spend</th>\n",
+" <th>unit_price</th>\n",
Suggestion importance[1-10]: 7
Why: Having unit price information is valuable for price analysis and comparisons, and it's better to keep it as an explicit column rather than requiring recalculation. This enhances data usability and analysis capabilities.
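The corrected flow from the first suggestion (build cs_df, build kpi_df, then merge on the shared customer index) can be illustrated with a self-contained pandas sketch. The input layout is an assumption for illustration: pre-computed 0/1 group flag columns `group_1` and `group_2`, `unit_spend` as the value column, and `sum` as the aggregation. The real `_calc_cross_shop` may derive its group columns differently.

```python
import pandas as pd

# Hypothetical input: one row per transaction line, with 0/1 flags marking
# which group (e.g. shop) the line belongs to. Column names are assumptions.
df = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 3, 3],
        "group_1": [1, 0, 1, 0, 0],
        "group_2": [0, 1, 0, 1, 1],
        "unit_spend": [10.0, 5.0, 7.0, 3.0, 4.0],
    }
)
group_cols = ["group_1", "group_2"]
value_col = "unit_spend"
agg_func = "sum"

# Initialise cs_df explicitly: a customer is in a group if any of their rows is.
cs_df = df.groupby("customer_id")[group_cols].max()
# Combine the per-group flags into one hashable membership key per customer.
cs_df["groups"] = cs_df[group_cols].apply(lambda x: tuple(x), axis=1)

# Initialise kpi_df explicitly: the aggregated value per customer (a named Series).
kpi_df = df.groupby("customer_id")[value_col].agg(agg_func)

# Both objects are indexed by customer_id, so merge on the index.
result = cs_df.merge(kpi_df, left_index=True, right_index=True)
print(result)
```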

Copilot

Copilot reviewed 9 out of 20 changed files in this pull request and generated 1 comment.

Files not reviewed (11)

docs/examples/product_association.ipynb: Evaluated as low risk
docs/examples/revenue_tree.ipynb: Evaluated as low risk
docs/examples/segmentation.ipynb: Evaluated as low risk
docs/analysis_modules.md: Evaluated as low risk
tests/test_product_association.py: Evaluated as low risk
pyretailscience/range_planning.py: Evaluated as low risk
pyretailscience/segmentation.py: Evaluated as low risk
pyretailscience/product_association.py: Evaluated as low risk
pyretailscience/cross_shop.py: Evaluated as low risk
pyretailscience/standard_graphs.py: Evaluated as low risk
pyretailscience/gain_loss.py: Evaluated as low risk

Copilot · 2025-02-09T12:00:27Z

pyretailscience/customer.py

+        self.cust_purchases_s = df.groupby(cols.customer_id)[cols.customer_id].nunique()



The unique count should be calculated on 'transaction_id' instead of 'customer_id'. Replace with: self.cust_purchases_s = df.groupby(cols.customer_id)[cols.transaction_id].nunique()

Suggested change

-        self.cust_purchases_s = df.groupby(cols.customer_id)[cols.customer_id].nunique()
+        self.cust_purchases_s = df.groupby(cols.customer_id)[cols.transaction_id].nunique()
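Copilot's point can be checked on toy data: every row in a customer's group shares the same customer_id, so nunique() on the grouping key itself returns 1 for every customer, while nunique() on transaction_id counts distinct purchases. The column names mirror the defaults used in this PR; the data is made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 2, 2],
        "transaction_id": [100, 100, 101, 200, 201, 202],
    }
)

# Counting the grouping key within its own group is always 1 per customer.
wrong = df.groupby("customer_id")["customer_id"].nunique()

# Counting distinct transactions per customer gives purchases per customer.
right = df.groupby("customer_id")["transaction_id"].nunique()

print(wrong.tolist(), right.tolist())  # [1, 1] [2, 3]
```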

codecov · 2025-02-09T12:00:52Z

Codecov Report

Attention: Patch coverage is 37.50000% with 30 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
pyretailscience/customer.py	0.00%	22 Missing ⚠️
pyretailscience/gain_loss.py	20.00%	4 Missing ⚠️
pyretailscience/range_planning.py	80.00%	2 Missing ⚠️
pyretailscience/cross_shop.py	83.33%	1 Missing ⚠️
pyretailscience/standard_graphs.py	50.00%	1 Missing ⚠️

Files with missing lines	Coverage Δ
pyretailscience/options.py	`97.53% <100.00%> (+1.57%)`	⬆️
pyretailscience/product_association.py	`85.93% <100.00%> (+0.22%)`	⬆️
pyretailscience/segmentation.py	`69.01% <ø> (+8.85%)`	⬆️
pyretailscience/cross_shop.py	`47.19% <83.33%> (+1.21%)`	⬆️
pyretailscience/standard_graphs.py	`40.45% <50.00%> (+0.45%)`	⬆️
pyretailscience/range_planning.py	`42.00% <80.00%> (-0.71%)`	⬇️
pyretailscience/gain_loss.py	`32.58% <20.00%> (+0.40%)`	⬆️
pyretailscience/customer.py	`0.00% <0.00%> (ø)`
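The headline figure is consistent with the per-file table: the missing-line counts sum to 30, and a patch coverage of p with m uncovered lines implies m / (1 - p) changed lines in total, so 30 / (1 - 0.375) = 48 patch lines, 18 of them covered. A quick check (the per-file totals are inferred from the reported percentages, not reported by Codecov):

```python
# Patch coverage figures from the Codecov report: file -> (patch %, missing lines).
report = {
    "pyretailscience/customer.py": (0.0, 22),
    "pyretailscience/gain_loss.py": (20.0, 4),
    "pyretailscience/range_planning.py": (80.0, 2),
    "pyretailscience/cross_shop.py": (83.33, 1),
    "pyretailscience/standard_graphs.py": (50.0, 1),
}


def implied_patch_lines(patch_pct: float, missing: int) -> int:
    """Total changed lines implied by coverage p and m uncovered lines: m / (1 - p)."""
    return round(missing / (1 - patch_pct / 100))


per_file = {name: implied_patch_lines(pct, m) for name, (pct, m) in report.items()}
total_missing = sum(m for _, m in report.values())
total_patch_lines = implied_patch_lines(37.5, total_missing)
covered = total_patch_lines - total_missing

print(per_file)
print(total_missing, total_patch_lines, covered)
```

The five listed files account for an implied 45 of the 48 lines; the remaining 3 would presumably fall in options.py and product_association.py, which the second table lists at 100% patch coverage.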

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

docs/examples/segmentation.ipynb (1)

715-804: Consider adding data validation before export

The segment activation code should validate data before export:

def export_segment(df, segment_name, filename):
    # Validate inputs
    if segment_name not in df["segment_name"].unique():
        raise ValueError(f"Invalid segment: {segment_name}")
        
    # Validate we have customer IDs
    segment_customers = df[df["segment_name"] == segment_name].index
    if len(segment_customers) == 0:
        raise ValueError(f"No customers found in segment: {segment_name}")
        
    # Export
    segment_customers.to_series().to_csv(filename, index=False)
    print(f"Exported {len(segment_customers)} customers to {filename}")

🧹 Nitpick comments (6)

pyretailscience/gain_loss.py (2)
54-54: Improve parameter documentation.

The docstring for value_col parameter should specify that it defaults to the value from column.unit_spend option.
-            value_col (str, optional): The column to calculate the gain loss from. Defaults to option column.unit_spend.
+            value_col (str, optional): The column to calculate the gain loss from. Defaults to the value from `column.unit_spend` option.
290-290: Address the TODO comment.

The TODO comment suggests that there might be a performance optimization opportunity by avoiding DataFrame construction.

Would you like me to help implement a solution that avoids constructing a pandas DataFrame or open an issue to track this task?
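Docstrings such as "Defaults to option column.unit_spend" describe a default that comes from the options registry, and where that lookup happens matters if a user changes the option after import. A minimal sketch, using a hypothetical dict-based registry in place of pyretailscience.options (the real get_option and set_option may be implemented differently), contrasting a default evaluated in the function signature with one resolved at call time:

```python
from __future__ import annotations

# Hypothetical in-memory registry standing in for pyretailscience.options.
_OPTIONS = {"column.unit_spend": "unit_spend"}


def get_option(key: str) -> str:
    """Look up a column name by option key (illustrative stand-in)."""
    return _OPTIONS[key]


def set_option(key: str, value: str) -> None:
    """Override a column name (illustrative stand-in)."""
    _OPTIONS[key] = value


def bound_at_definition(value_col: str = get_option("column.unit_spend")) -> str:
    # This default was computed once, when the def statement ran.
    return value_col


def resolved_at_call(value_col: str | None = None) -> str:
    # Looking the option up in the body uses its value at call time.
    if value_col is None:
        value_col = get_option("column.unit_spend")
    return value_col


set_option("column.unit_spend", "spend")
print(bound_at_definition(), resolved_at_call())  # unit_spend spend
```

With the first form, a docstring saying the default "comes from the option" is accurate only for the option's value at import time; the second form follows later overrides.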
docs/examples/segmentation.ipynb (2)
324-508: Consider adding input validation for segmentation parameters

The segmentation code works but could benefit from parameter validation:
def validate_segment_params(zero_value_customers):
    valid_options = ["include_with_light", "exclude"] 
    if zero_value_customers not in valid_options:
        raise ValueError(f"zero_value_customers must be one of {valid_options}")

# Use before segmentation
validate_segment_params(zero_value_customers="include_with_light")
509-714: Add error handling for visualization

The visualization code should handle potential errors:
try:
    ax = seg_stats.plot(
        figsize=(10, 5),
        value_col="spend",
        source_text="Source: Transaction data financial year 2023", 
        sort_order="descending",
        title="What's the value of a Heavy customer?",
        rot=0,
    )
except Exception as e:
    print(f"Error creating plot: {e}")
    raise
docs/analysis_modules.md (2)

261-269: Consistent Column Renaming in Timeline Plot Example
The update replacing the hardcoded "transaction_datetime" with "transaction_date" in the DataFrame construction is correctly applied, aligning with the PR objective. Consider, as a future improvement, using a dynamic reference from the options class (if available) for greater maintainability.

448-454: Consistent Column Renaming in Revenue Tree Example
The change in the revenue tree example—where the filtering indices now use "transaction_date" instead of "transaction_datetime"—is correct. In line with the PR objective, if an options class is intended to centralize column name definitions, consider updating this snippet to reference that class instead of using a literal string.
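Filtering by a column name held in a variable, rather than a literal, looks like this. A small sketch on made-up data; in pyretailscience the name would come from `get_option("column.transaction_date")` (the option used in standard_graphs.py), but here it is a plain string so the block runs on its own, and the actual revenue tree snippet may filter differently.

```python
import pandas as pd

# Stand-in for get_option("column.transaction_date").
date_col = "transaction_date"

df = pd.DataFrame(
    {
        date_col: pd.to_datetime(["2023-01-15", "2023-06-30", "2024-02-01"]),
        "unit_spend": [10.0, 20.0, 30.0],
    }
)

# Split into two periods through the variable, so renaming the option
# does not require editing the filter.
p1 = df[df[date_col] < "2023-06-01"]
p2 = df[df[date_col] >= "2023-06-01"]

print(p1["unit_spend"].sum(), p2["unit_spend"].sum())
```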

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68de169 and 5f24832.

⛔ Files ignored due to path filters (1)

data/transactions.parquet is excluded by !**/*.parquet

📒 Files selected for processing (16)

docs/analysis_modules.md (2 hunks)
docs/examples/product_association.ipynb (4 hunks)
docs/examples/revenue_tree.ipynb (4 hunks)
docs/examples/segmentation.ipynb (8 hunks)
pyretailscience/cross_shop.py (8 hunks)
pyretailscience/customer.py (6 hunks)
pyretailscience/gain_loss.py (7 hunks)
pyretailscience/options.py (1 hunks)
pyretailscience/product_association.py (6 hunks)
pyretailscience/range_planning.py (6 hunks)
pyretailscience/segmentation.py (1 hunks)
pyretailscience/standard_graphs.py (2 hunks)
tests/test_cross_shop.py (9 hunks)
tests/test_product_association.py (17 hunks)
tests/test_range_planning.py (3 hunks)
tests/test_segmentation.py (4 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

tests/test_cross_shop.py

12-12: Use @pytest.fixture over @pytest.fixture()

Remove parentheses

(PT001)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Pre-Commit

🔇 Additional comments (34)

tests/test_range_planning.py (4)

8-10: LGTM!

The imports and initialization of ColumnHelper are correctly placed at the top of the file.

58-60: LGTM!

The DataFrame creation correctly uses ColumnHelper for column references, improving maintainability.

71-72: LGTM!

The changes correctly:

Use ColumnHelper for column references

Add the required product_col parameter to _get_pairs

Update the expected DataFrame to use ColumnHelper

Also applies to: 78-82, 84-84

92-93: LGTM!

The changes correctly:

Use ColumnHelper for column references

Add the required product_col parameter to _get_pairs

Update the expected DataFrame to use ColumnHelper

Also applies to: 99-103, 106-106

tests/test_cross_shop.py (7)

7-9: LGTM!

The imports and initialization of ColumnHelper are correctly placed at the top of the file.

17-17: LGTM!

The DataFrame creation correctly uses ColumnHelper for column references.

Also applies to: 21-21

35-35: LGTM!

The DataFrame creation and index setting correctly use ColumnHelper for column references.

Also applies to: 39-39, 41-41

56-56: LGTM!

The DataFrame creation and index setting correctly use ColumnHelper for column references.

Also applies to: 72-72, 74-74

109-109: LGTM!

The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.

Also applies to: 129-129, 133-133

145-145: LGTM!

The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.

Also applies to: 149-149, 161-161

180-180: LGTM!

The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.

Also applies to: 185-185, 190-190

pyretailscience/cross_shop.py (4)

8-8: LGTM!

The imports for ColumnHelper and get_option are correctly placed at the top of the file.

24-24: LGTM!

The default value_col correctly uses get_option for dynamic column reference.

80-80: LGTM!

The method correctly uses get_option and ColumnHelper for dynamic column references.

Also applies to: 100-100, 108-108, 124-124, 127-127

134-134: LGTM!

The default value_col correctly uses get_option for dynamic column reference.

pyretailscience/range_planning.py (5)

12-12: LGTM!

The imports for ColumnHelper and get_option are correctly placed at the top of the file.

22-22: LGTM!

The changes correctly:

Add the product_col parameter with proper documentation

Use ColumnHelper for dynamic column references

Also applies to: 33-33, 48-49, 56-56

61-62: LGTM!

The method correctly uses ColumnHelper for dynamic column references.

Also applies to: 64-67, 73-73, 77-77, 79-79

99-100: LGTM!

The method correctly uses self.product_col and get_option for dynamic column references.

168-169: LGTM!

The method correctly uses self.product_col and get_option for dynamic column references.

tests/test_product_association.py (1)

6-10: LGTM!

The changes consistently replace hardcoded column names with ColumnHelper, improving maintainability.

Also applies to: 20-20

tests/test_segmentation.py (1)

7-10: LGTM!

The changes consistently replace hardcoded column names with ColumnHelper, improving maintainability.

Also applies to: 387-388

pyretailscience/segmentation.py (1)

93-94: LGTM! Error message updated to use dynamic column reference.

The error message has been updated to use option column.customer_id instead of hardcoded string, which aligns with the PR objective of using the options class for column names.

pyretailscience/customer.py (3)

34-35: LGTM! PurchasesPerCustomer updated to use ColumnHelper.

The class now uses ColumnHelper for column references, which improves maintainability by centralizing column name management.

Also applies to: 41-41

185-186: LGTM! DaysBetweenPurchases updated to use ColumnHelper.

The class now uses ColumnHelper for column references in both the constructor and the _calculate_days_between_purchases method. The changes are consistent and improve code maintainability.

Also applies to: 212-219

335-337: LGTM! TransactionChurn updated to use ColumnHelper.

The class now uses ColumnHelper for column references in both the constructor and the transaction processing logic. The changes are consistent with other classes.

Also applies to: 342-347

pyretailscience/standard_graphs.py (1)

68-68: LGTM! Updated to use dynamic column reference.

The code now uses get_option("column.transaction_date") instead of a hardcoded column name, which aligns with the PR objective.

pyretailscience/options.py (1)

387-390: LGTM! Added date/time attributes to ColumnHelper.

The addition of transaction_date and transaction_time attributes enhances the ColumnHelper class by providing direct access to these commonly used column names. This is consistent with the PR objective of using the options class for column names.
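The attribute pattern the reviews describe (cols.customer_id, cols.transaction_id, and now cols.transaction_date and cols.transaction_time) can be sketched as follows. This is a hypothetical stand-in, not the library's ColumnHelper: the registry is a plain dict, column.customer_id, column.unit_spend and column.transaction_date are option keys seen in this PR, and the keys for transaction_id and transaction_time are assumed.

```python
import pandas as pd

# Hypothetical registry; some keys appear in this PR, others are assumed.
_OPTIONS = {
    "column.customer_id": "customer_id",
    "column.transaction_id": "transaction_id",
    "column.transaction_date": "transaction_date",
    "column.transaction_time": "transaction_time",
    "column.unit_spend": "unit_spend",
}


def get_option(key: str) -> str:
    """Illustrative stand-in for pyretailscience's get_option."""
    return _OPTIONS[key]


class ColumnHelperSketch:
    """Hypothetical ColumnHelper-style object: one attribute per configured column name."""

    def __init__(self) -> None:
        self.customer_id = get_option("column.customer_id")
        self.transaction_id = get_option("column.transaction_id")
        self.transaction_date = get_option("column.transaction_date")
        self.transaction_time = get_option("column.transaction_time")
        self.unit_spend = get_option("column.unit_spend")


cols = ColumnHelperSketch()

# Frames built through attributes pick up whatever names the options hold.
df = pd.DataFrame(
    {
        cols.customer_id: [1, 2],
        cols.transaction_id: [10, 11],
        cols.transaction_date: pd.to_datetime(["2023-01-01", "2023-01-02"]),
        cols.unit_spend: [5.0, 7.5],
    }
)
```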

docs/examples/product_association.ipynb (1)

70-81: LGTM! Column names have been updated for better clarity.

The renaming of columns improves consistency and better reflects their purpose:

transaction_datetime → transaction_date

unit_price → unit_spend

quantity → unit_quantity

docs/examples/revenue_tree.ipynb (2)

45-45: LGTM! Column name updated for consistency.

The renaming of transaction_datetime to transaction_date maintains consistency with other files.

337-338: LGTM! Date filtering updated to use new column name.

The date filtering logic has been correctly updated to use the renamed column.

docs/examples/segmentation.ipynb (3)

1-18: LGTM! Clear introduction and setup

The markdown introduction provides a good overview of customer segmentation and its business value.

221-323: LGTM! Clear explanation of HML segmentation

Good explanation of Heavy/Medium/Light segmentation methodology and business use cases.


coderabbitai · 2025-02-09T12:02:38Z

pyretailscience/product_association.py

@@ -55,7 +57,7 @@ class ProductAssociation:
        value_col (str): The name of the column in the input DataFrame that contains
            the product identifiers.
        group_col (str, optional): The name of the column that identifies unique
-            transactions or customers. Defaults to "customer_id".
+            transactions or customers. Defaults to option column.unit_spend.


⚠️ Potential issue

Fix incorrect docstring defaults.

The docstring incorrectly states that group_col defaults to option column.unit_spend when it actually defaults to column.customer_id.

Apply this diff to fix the docstrings:

-            transactions or customers. Defaults to option column.unit_spend.
+            transactions or customers. Defaults to the value from `column.customer_id` option.

Also applies to: 178-178

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

pyretailscience/range_planning.py (1)
249-249: Fix hardcoded column name in plot method.

The plot method still uses hardcoded "product_name" instead of the dynamic product_col.
-        labels = self.pairs_df["product_name"].cat.categories
+        labels = self.pairs_df[self.product_col].cat.categories
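Why the hardcoded key matters: if a caller passes a different product_col, the plot's `pairs_df["product_name"]` lookup raises KeyError. A sketch on toy data; the column name `sku` and the helper `category_labels` are hypothetical, and it assumes pairs_df stores the product column as a pandas categorical, as the `.cat` accessor in the diff implies.

```python
import pandas as pd


def category_labels(pairs_df: pd.DataFrame, product_col: str) -> pd.Index:
    """Hypothetical helper: the product labels used for the plot axes."""
    return pairs_df[product_col].cat.categories


# Toy pairs data where the product column is not called "product_name".
pairs_df = pd.DataFrame({"customer_id": [1, 1, 2], "sku": ["B", "A", "B"]})
pairs_df["sku"] = pairs_df["sku"].astype("category")

# Looking the column up through the variable works for any name.
labels = category_labels(pairs_df, "sku")

# The hardcoded lookup fails for the same data.
try:
    _ = pairs_df["product_name"].cat.categories
    hardcoded_failed = False
except KeyError:
    hardcoded_failed = True

print(list(labels), hardcoded_failed)
```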

🧹 Nitpick comments (3)

pyretailscience/range_planning.py (3)
31-32: Update docstring to reflect dynamic column requirements.

The docstring still mentions hardcoded column names. Update it to reflect that product_name is now specified via product_col.
-                customer_id, transaction_id, product_name.
+                customer_id, transaction_id, and the column specified by product_col.
232-232: Improve type hint for kwargs parameter.

The current type hint dict[str, any] could be more specific to dendrogram parameters.
-    **kwargs: dict[str, any],
+    **kwargs: dict[str, float | str | bool | int],
162-173: Consider optimizing sparse matrix creation for large datasets.

For large datasets, creating the sparse matrix could be memory-intensive. Consider adding a warning or implementing batch processing for very large datasets.
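A note on both type hints in the kwargs nitpick: under PEP 484, the annotation on `**kwargs` describes each keyword value, not the dict, so `**kwargs: dict[str, ...]` declares that every value is itself a dict. Lowercase `any` is also the builtin function rather than `typing.Any`. A sketch of the two conventional forms; the keyword names are borrowed from `scipy.cluster.hierarchy.dendrogram` for illustration, and the functions are hypothetical.

```python
from __future__ import annotations

from typing import Any


# The annotation on **kwargs gives the type of each keyword value;
# inside the function, kwargs itself is a dict[str, <that type>].
def with_any(**kwargs: Any) -> dict[str, Any]:
    """Accept arbitrary keyword values (typing.Any, not the builtin any())."""
    return kwargs


def with_union(**kwargs: float | str | bool | int) -> dict[str, float | str | bool | int]:
    """Narrow each keyword value to a scalar type for a type checker."""
    return kwargs


# Dendrogram-style options, forwarded unchanged.
opts = with_union(orientation="left", color_threshold=0.7, no_labels=False)
print(opts)
```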

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f24832 and ce9690e.

📒 Files selected for processing (3)

pyretailscience/customer.py (6 hunks)
pyretailscience/product_association.py (6 hunks)
pyretailscience/range_planning.py (6 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

pyretailscience/product_association.py
pyretailscience/customer.py

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Pre-Commit

🔇 Additional comments (3)

pyretailscience/range_planning.py (3)

12-12: LGTM! Import changes align with PR objectives.

The addition of ColumnHelper and get_option imports supports the standardization of column names.

19-57: LGTM! Implementation changes improve flexibility.

The addition of product_col parameter and use of ColumnHelper for column validation enhances code maintainability.

61-81: LGTM! Method changes consistently use dynamic column names.

The _get_pairs method implementation correctly uses the dynamic product_col parameter and ColumnHelper for column references.

feat: convert column names to use the options class

5f24832

mvanwyk requested a review from Copilot February 9, 2025 11:59

mvanwyk self-assigned this Feb 9, 2025

qodo-merge-pro bot added the Review effort [1-5]: 3 label Feb 9, 2025

Copilot AI reviewed Feb 9, 2025


coderabbitai bot reviewed Feb 9, 2025


chore: fix several bugs from refactoring

ce9690e

coderabbitai bot reviewed Feb 9, 2025


mvanwyk merged commit c03dbcf into main Feb 9, 2025
3 of 4 checks passed

mvanwyk deleted the change_transaction_datetime branch February 9, 2025 12:13
