-
Notifications
You must be signed in to change notification settings - Fork 1
feat: convert column names to use the options class #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change set primarily renames the variable and column identifier from "transaction_datetime" to "transaction_date" across multiple documentation files, notebooks, core modules, and tests. It also updates metric columns in one notebook and replaces hardcoded column strings with dynamic references via Changes
Possibly related PRs
Suggested labels
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here. PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here. PR Code Suggestions ✨Explore these optional code suggestions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 9 out of 20 changed files in this pull request and generated 1 comment.
Files not reviewed (11)
- docs/examples/product_association.ipynb: Evaluated as low risk
- docs/examples/revenue_tree.ipynb: Evaluated as low risk
- docs/examples/segmentation.ipynb: Evaluated as low risk
- docs/analysis_modules.md: Evaluated as low risk
- tests/test_product_association.py: Evaluated as low risk
- pyretailscience/range_planning.py: Evaluated as low risk
- pyretailscience/segmentation.py: Evaluated as low risk
- pyretailscience/product_association.py: Evaluated as low risk
- pyretailscience/cross_shop.py: Evaluated as low risk
- pyretailscience/standard_graphs.py: Evaluated as low risk
- pyretailscience/gain_loss.py: Evaluated as low risk
pyretailscience/customer.py
Outdated
self.cust_purchases_s = df.groupby(cols.customer_id)[cols.customer_id].nunique() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unique count should be calculated on 'transaction_id' instead of 'customer_id'. Replace with: self.cust_purchases_s = df.groupby(cols.customer_id)[cols.transaction_id].nunique()
self.cust_purchases_s = df.groupby(cols.customer_id)[cols.customer_id].nunique() | |
self.cust_purchases_s = df.groupby(cols.customer_id)[cols.transaction_id].nunique() |
Copilot uses AI. Check for mistakes.
Codecov ReportAttention: Patch coverage is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
docs/examples/segmentation.ipynb (1)
715-804
: Consider adding data validation before exportThe segment activation code should validate data before export:
def export_segment(df, segment_name, filename): # Validate inputs if segment_name not in df["segment_name"].unique(): raise ValueError(f"Invalid segment: {segment_name}") # Validate we have customer IDs segment_customers = df[df["segment_name"] == segment_name].index if len(segment_customers) == 0: raise ValueError(f"No customers found in segment: {segment_name}") # Export segment_customers.to_series().to_csv(filename, index=False) print(f"Exported {len(segment_customers)} customers to {filename}")
🧹 Nitpick comments (6)
pyretailscience/gain_loss.py (2)
54-54
: Improve parameter documentation.The docstring for
value_col
parameter should specify that it defaults to the value fromcolumn.unit_spend
option.- value_col (str, optional): The column to calculate the gain loss from. Defaults to option column.unit_spend. + value_col (str, optional): The column to calculate the gain loss from. Defaults to the value from `column.unit_spend` option.
290-290
: Address the TODO comment.The TODO comment suggests that there might be a performance optimization opportunity by avoiding DataFrame construction.
Would you like me to help implement a solution that avoids constructing a pandas DataFrame or open an issue to track this task?
docs/examples/segmentation.ipynb (2)
324-508
: Consider adding input validation for segmentation parametersThe segmentation code works but could benefit from parameter validation:
def validate_segment_params(zero_value_customers): valid_options = ["include_with_light", "exclude"] if zero_value_customers not in valid_options: raise ValueError(f"zero_value_customers must be one of {valid_options}") # Use before segmentation validate_segment_params(zero_value_customers="include_with_light")
509-714
: Add error handling for visualizationThe visualization code should handle potential errors:
try: ax = seg_stats.plot( figsize=(10, 5), value_col="spend", source_text="Source: Transaction data financial year 2023", sort_order="descending", title="What's the value of a Heavy customer?", rot=0, ) except Exception as e: print(f"Error creating plot: {e}") raisedocs/analysis_modules.md (2)
261-269
: Consistent Column Renaming in Timeline Plot Example
The update replacing the hardcoded "transaction_datetime" with "transaction_date" in the DataFrame construction is correctly applied, aligning with the PR objective. Consider, as a future improvement, using a dynamic reference from the options class (if available) for greater maintainability.
448-454
: Consistent Column Renaming in Revenue Tree Example
The change in the revenue tree example—where the filtering indices now use "transaction_date" instead of "transaction_datetime"—is correct. In line with the PR objective, if an options class is intended to centralize column name definitions, consider updating this snippet to reference that class instead of using a literal string.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
data/transactions.parquet
is excluded by!**/*.parquet
📒 Files selected for processing (16)
docs/analysis_modules.md
(2 hunks)docs/examples/product_association.ipynb
(4 hunks)docs/examples/revenue_tree.ipynb
(4 hunks)docs/examples/segmentation.ipynb
(8 hunks)pyretailscience/cross_shop.py
(8 hunks)pyretailscience/customer.py
(6 hunks)pyretailscience/gain_loss.py
(7 hunks)pyretailscience/options.py
(1 hunks)pyretailscience/product_association.py
(6 hunks)pyretailscience/range_planning.py
(6 hunks)pyretailscience/segmentation.py
(1 hunks)pyretailscience/standard_graphs.py
(2 hunks)tests/test_cross_shop.py
(9 hunks)tests/test_product_association.py
(17 hunks)tests/test_range_planning.py
(3 hunks)tests/test_segmentation.py
(4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/test_cross_shop.py
12-12: Use @pytest.fixture
over @pytest.fixture()
Remove parentheses
(PT001)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Pre-Commit
🔇 Additional comments (34)
tests/test_range_planning.py (4)
8-10
: LGTM!The imports and initialization of ColumnHelper are correctly placed at the top of the file.
58-60
: LGTM!The DataFrame creation correctly uses ColumnHelper for column references, improving maintainability.
71-72
: LGTM!The changes correctly:
- Use ColumnHelper for column references
- Add the required product_col parameter to _get_pairs
- Update the expected DataFrame to use ColumnHelper
Also applies to: 78-82, 84-84
92-93
: LGTM!The changes correctly:
- Use ColumnHelper for column references
- Add the required product_col parameter to _get_pairs
- Update the expected DataFrame to use ColumnHelper
Also applies to: 99-103, 106-106
tests/test_cross_shop.py (7)
7-9
: LGTM!The imports and initialization of ColumnHelper are correctly placed at the top of the file.
17-17
: LGTM!The DataFrame creation correctly uses ColumnHelper for column references.
Also applies to: 21-21
35-35
: LGTM!The DataFrame creation and index setting correctly use ColumnHelper for column references.
Also applies to: 39-39, 41-41
56-56
: LGTM!The DataFrame creation and index setting correctly use ColumnHelper for column references.
Also applies to: 72-72, 74-74
109-109
: LGTM!The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.
Also applies to: 129-129, 133-133
145-145
: LGTM!The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.
Also applies to: 149-149, 161-161
180-180
: LGTM!The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.
Also applies to: 185-185, 190-190
pyretailscience/cross_shop.py (4)
8-8
: LGTM!The imports for ColumnHelper and get_option are correctly placed at the top of the file.
24-24
: LGTM!The default value_col correctly uses get_option for dynamic column reference.
80-80
: LGTM!The method correctly uses get_option and ColumnHelper for dynamic column references.
Also applies to: 100-100, 108-108, 124-124, 127-127
134-134
: LGTM!The default value_col correctly uses get_option for dynamic column reference.
pyretailscience/range_planning.py (5)
12-12
: LGTM!The imports for ColumnHelper and get_option are correctly placed at the top of the file.
22-22
: LGTM!The changes correctly:
- Add the product_col parameter with proper documentation
- Use ColumnHelper for dynamic column references
Also applies to: 33-33, 48-49, 56-56
61-62
: LGTM!The method correctly uses ColumnHelper for dynamic column references.
Also applies to: 64-67, 73-73, 77-77, 79-79
99-100
: LGTM!The method correctly uses self.product_col and get_option for dynamic column references.
168-169
: LGTM!The method correctly uses self.product_col and get_option for dynamic column references.
tests/test_product_association.py (1)
6-10
: LGTM!The changes consistently replace hardcoded column names with ColumnHelper, improving maintainability.
Also applies to: 20-20
tests/test_segmentation.py (1)
7-10
: LGTM!The changes consistently replace hardcoded column names with ColumnHelper, improving maintainability.
Also applies to: 387-388
pyretailscience/segmentation.py (1)
93-94
: LGTM! Error message updated to use dynamic column reference.The error message has been updated to use
option column.customer_id
instead of hardcoded string, which aligns with the PR objective of using the options class for column names.pyretailscience/customer.py (3)
34-35
: LGTM! PurchasesPerCustomer updated to use ColumnHelper.The class now uses
ColumnHelper
for column references, which improves maintainability by centralizing column name management.Also applies to: 41-41
185-186
: LGTM! DaysBetweenPurchases updated to use ColumnHelper.The class now uses
ColumnHelper
for column references in both the constructor and the_calculate_days_between_purchases
method. The changes are consistent and improve code maintainability.Also applies to: 212-219
335-337
: LGTM! TransactionChurn updated to use ColumnHelper.The class now uses
ColumnHelper
for column references in both the constructor and the transaction processing logic. The changes are consistent with other classes.Also applies to: 342-347
pyretailscience/standard_graphs.py (1)
68-68
: LGTM! Updated to use dynamic column reference.The code now uses
get_option("column.transaction_date")
instead of a hardcoded column name, which aligns with the PR objective.pyretailscience/options.py (1)
387-390
: LGTM! Added date/time attributes to ColumnHelper.The addition of
transaction_date
andtransaction_time
attributes enhances theColumnHelper
class by providing direct access to these commonly used column names. This is consistent with the PR objective of using the options class for column names.docs/examples/product_association.ipynb (1)
70-81
: LGTM! Column names have been updated for better clarity.The renaming of columns improves consistency and better reflects their purpose:
transaction_datetime
→transaction_date
unit_price
→unit_spend
quantity
→unit_quantity
docs/examples/revenue_tree.ipynb (2)
45-45
: LGTM! Column name updated for consistency.The renaming of
transaction_datetime
totransaction_date
maintains consistency with other files.
337-338
: LGTM! Date filtering updated to use new column name.The date filtering logic has been correctly updated to use the renamed column.
docs/examples/segmentation.ipynb (3)
1-18
: LGTM! Clear introduction and setupThe markdown introduction provides a good overview of customer segmentation and its business value.
221-323
: LGTM! Clear explanation of HML segmentationGood explanation of Heavy/Medium/Light segmentation methodology and business use cases.
20-193
: undefined
@@ -55,7 +57,7 @@ class ProductAssociation: | |||
value_col (str): The name of the column in the input DataFrame that contains | |||
the product identifiers. | |||
group_col (str, optional): The name of the column that identifies unique | |||
transactions or customers. Defaults to "customer_id". | |||
transactions or customers. Defaults to option column.unit_spend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix incorrect docstring defaults.
The docstring incorrectly states that group_col
defaults to option column.unit_spend
when it actually defaults to column.customer_id
.
Apply this diff to fix the docstrings:
- transactions or customers. Defaults to option column.unit_spend.
+ transactions or customers. Defaults to the value from `column.customer_id` option.
Also applies to: 178-178
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
pyretailscience/range_planning.py (1)
249-249
: Fix hardcoded column name in plot method.The plot method still uses hardcoded "product_name" instead of the dynamic
product_col
.- labels = self.pairs_df["product_name"].cat.categories + labels = self.pairs_df[self.product_col].cat.categories
🧹 Nitpick comments (3)
pyretailscience/range_planning.py (3)
31-32
: Update docstring to reflect dynamic column requirements.The docstring still mentions hardcoded column names. Update it to reflect that
product_name
is now specified viaproduct_col
.- customer_id, transaction_id, product_name. + customer_id, transaction_id, and the column specified by product_col.
232-232
: Improve type hint for kwargs parameter.The current type hint
dict[str, any]
could be more specific to dendrogram parameters.- **kwargs: dict[str, any], + **kwargs: dict[str, float | str | bool | int],
162-173
: Consider optimizing sparse matrix creation for large datasets.For large datasets, creating the sparse matrix could be memory-intensive. Consider adding a warning or implementing batch processing for very large datasets.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pyretailscience/customer.py
(6 hunks)pyretailscience/product_association.py
(6 hunks)pyretailscience/range_planning.py
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pyretailscience/product_association.py
- pyretailscience/customer.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Pre-Commit
🔇 Additional comments (3)
pyretailscience/range_planning.py (3)
12-12
: LGTM! Import changes align with PR objectives.The addition of
ColumnHelper
andget_option
imports supports the standardization of column names.
19-57
: LGTM! Implementation changes improve flexibility.The addition of
product_col
parameter and use ofColumnHelper
for column validation enhances code maintainability.
61-81
: LGTM! Method changes consistently use dynamic column names.The
_get_pairs
method implementation correctly uses the dynamicproduct_col
parameter andColumnHelper
for column references.
PR Type
Enhancement, Tests, Documentation
Description
Introduced
ColumnHelper
andget_option
for column name standardization.Updated tests and core logic to use
ColumnHelper
for column references.Replaced hardcoded column names with dynamic options across modules.
Updated documentation and examples to reflect column name changes.
Changes walkthrough 📝
4 files
Updated tests to use `ColumnHelper` for column references.
Refactored tests to use `ColumnHelper` for transaction IDs.
Updated range planning tests for dynamic column names.
Adjusted segmentation tests for column name standardization.
8 files
Standardized column names using `ColumnHelper` in customer logic.
Integrated dynamic column options in cross-shop logic.
Refactored gain/loss logic to use column options.
Added dynamic column handling in range planning module.
Standardized product association logic with column options.
Updated time plot logic to use dynamic column names.
Refactored segmentation logic for column name standardization.
Added new column options for transaction date and time.
4 files
Updated segmentation example to reflect column changes.
Adjusted product association example for column standardization.
Updated revenue tree example with dynamic column names.
Updated analysis module documentation for column standardization.
4 files
Summary by CodeRabbit
ColumnHelper
class.