
feat: convert column names to use the options class #91


Merged

merged 2 commits into main from change_transaction_datetime on Feb 9, 2025

Conversation

@mvanwyk mvanwyk (Contributor) commented Feb 9, 2025

PR Type

Enhancement, Tests, Documentation


Description

  • Introduced ColumnHelper and get_option for column name standardization.

  • Updated tests and core logic to use ColumnHelper for column references.

  • Replaced hardcoded column names with dynamic options across modules.

  • Updated documentation and examples to reflect column name changes.
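The options pattern this PR adopts can be sketched as follows. This is an illustrative stand-in, not the real `pyretailscience/options.py`: a registry maps option keys to column names, `get_option` looks them up, and a `ColumnHelper` exposes them as attributes (the `cols.customer_id`-style references seen in the tests below).

```python
# Minimal sketch of the options pattern (illustrative only; the real
# implementation in pyretailscience/options.py is more featureful).
_OPTIONS = {
    "column.customer_id": "customer_id",
    "column.transaction_id": "transaction_id",
    "column.transaction_date": "transaction_date",
    "column.transaction_time": "transaction_time",
    "column.unit_spend": "unit_spend",
}


def get_option(key: str) -> str:
    """Return the configured column name for an option key."""
    try:
        return _OPTIONS[key]
    except KeyError:
        raise ValueError(f"Unknown option: {key}") from None


class ColumnHelper:
    """Expose column options as attributes, e.g. cols.customer_id."""

    def __init__(self) -> None:
        for key, value in _OPTIONS.items():
            setattr(self, key.removeprefix("column."), value)


cols = ColumnHelper()
print(get_option("column.transaction_date"))  # transaction_date
print(cols.customer_id)  # customer_id
```

Centralizing the names this way means a rename like `transaction_datetime` → `transaction_date` touches one registry entry instead of every hardcoded string.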


Changes walkthrough 📝

Relevant files

Tests (4 files)
• test_cross_shop.py: Updated tests to use `ColumnHelper` for column references. (+20/-17)
• test_product_association.py: Refactored tests to use `ColumnHelper` for transaction IDs. (+22/-19)
• test_range_planning.py: Updated range planning tests for dynamic column names. (+22/-9)
• test_segmentation.py: Adjusted segmentation tests for column name standardization. (+5/-5)

Enhancement (8 files)
• customer.py: Standardized column names using `ColumnHelper` in customer logic. (+28/-23)
• cross_shop.py: Integrated dynamic column options in cross-shop logic. (+12/-10)
• gain_loss.py: Refactored gain/loss logic to use column options. (+10/-7)
• range_planning.py: Added dynamic column handling in the range planning module. (+18/-12)
• product_association.py: Standardized product association logic with column options. (+7/-5)
• standard_graphs.py: Updated time plot logic to use dynamic column names. (+2/-1)
• segmentation.py: Refactored segmentation logic for column name standardization. (+2/-2)
• options.py: Added new column options for transaction date and time. (+3/-0)

Documentation (4 files)
• segmentation.ipynb: Updated segmentation example to reflect column changes. (+42/-42)
• product_association.ipynb: Adjusted product association example for column standardization. (+21/-27)
• revenue_tree.ipynb: Updated revenue tree example with dynamic column names. (+10/-10)
• analysis_modules.md: Updated analysis module documentation for column standardization. (+3/-3)

Additional files (4 files)
• transactions.parquet [link]
• cross_shop.ipynb (+34/-40)
• gain_loss.ipynb (+139/-142)
• retention.ipynb (+25/-32)

Summary by CodeRabbit

    • New Features
      • Visualizations now consistently display updated labels for transaction dates and spending metrics.
    • Documentation
      • Examples have been refreshed to reflect the new naming conventions, with placeholders added for upcoming content.
    • Refactor
      • Unified naming across modules improves consistency in data presentations and enhances overall maintainability.
      • Enhanced flexibility in column name management through the introduction of a ColumnHelper class.

    @mvanwyk mvanwyk requested a review from Copilot February 9, 2025 11:59
    @mvanwyk mvanwyk self-assigned this Feb 9, 2025

    coderabbitai bot commented Feb 9, 2025

    Walkthrough

    This change set primarily renames the variable and column identifier from "transaction_datetime" to "transaction_date" across multiple documentation files, notebooks, core modules, and tests. It also updates metric columns in one notebook and replaces hardcoded column strings with dynamic references via get_option and the ColumnHelper class. In addition, documentation placeholders and a new parameter in one class have been added to allow future enhancements.

    Changes

File(s) and change summary:
• docs/analysis_modules.md, docs/examples/*: Renamed "transaction_datetime" to "transaction_date" in timeline plots, product association, revenue tree, and segmentation examples; updated metric columns (e.g. "unit_price" → "unit_spend", "quantity" → "unit_quantity") and added placeholder sections for future content.
• pyretailscience/*: Refactored modules (CrossShop, Customer, GainLoss, ProductAssociation, CustomerDecisionHierarchy, ThresholdSegmentation, standard_graphs) to replace hardcoded strings with dynamic values via get_option and ColumnHelper; updated method signatures, required column lists, and error messages accordingly.
• tests/*: Updated tests to use ColumnHelper for column references; standardized DataFrame setups and assertions to align with the renamed columns and updated method parameters.
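The documentation-side renames summarized above amount to a plain pandas column rename. A toy sketch (column names from the walkthrough, invented data):

```python
import pandas as pd

# Toy transactions frame using the old column names.
df = pd.DataFrame({
    "transaction_datetime": pd.to_datetime(["2023-05-01 10:30", "2023-05-02 14:00"]),
    "unit_price": [9.99, 4.50],
    "quantity": [2, 1],
})

# Apply the renames described in the change summary.
df = df.rename(columns={
    "transaction_datetime": "transaction_date",
    "unit_price": "unit_spend",
    "quantity": "unit_quantity",
})
print(list(df.columns))  # ['transaction_date', 'unit_spend', 'unit_quantity']
```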

    Possibly related PRs

    • feat: standard bar plot #86 – The changes in the main PR, which involve renaming the variable transaction_datetime to transaction_date, are directly related to the changes in the retrieved PR, where the same variable is also renamed in the context of various documentation and code files.
    • docs: updated analysis module docs #80 – The changes in the main PR and the retrieved PR are related through the modification of the transaction_datetime variable to transaction_date, which is a consistent update across both documentation and code examples.
    • feat: changed threshold seg to use ibis #89 – The changes in the main PR, which involve renaming the transaction_datetime variable to transaction_date, are directly related to the changes in the retrieved PR, where the same variable is updated in the ThresholdSegmentation class to align with new naming conventions.

    Suggested labels

    enhancement, documentation, Tests

    Poem

    I'm a hopping rabbit with lines of code so neat,
    Skipping through changes in data and columns complete,
    Renaming dates and adjusting tests with flair,
    Options now dynamic, floating in the air,
    With placeholders for future dreams and a gentle debug tune,
    I celebrate these updates under the coding moon!



qodo-merge-pro bot (Contributor) commented Feb 9, 2025

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Missing Parameter

The _get_pairs() method defines a product_col parameter, but the argument is not passed at the call site in __init__(). This could cause runtime errors.

    self.pairs_df = self._get_pairs(df, exclude_same_transaction_products)
    Incomplete Tests

    The test_init_invalid_dataframe() test case doesn't validate the new product_col parameter behavior. Additional test coverage needed.

    def test_init_invalid_dataframe(self):
        """Test that the function raises a ValueError when the dataframe is invalid."""
        df = pd.DataFrame(
            {cols.customer_id: [1, 2, 3], cols.transaction_id: [1, 2, 3], "product_name": ["A", "B", "C"]},
        )
        exclude_same_transaction_products = True
        random_state = 42
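The missing coverage could look roughly like the sketch below, which substitutes a hypothetical `validate_columns` helper for the real class constructor (the helper name and required-column set are assumptions for illustration):

```python
import pandas as pd


def validate_columns(df: pd.DataFrame, product_col: str = "product_name") -> None:
    """Hypothetical stand-in for the constructor's column validation."""
    required = {"customer_id", "transaction_id", product_col}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")


df = pd.DataFrame(
    {"customer_id": [1, 2], "transaction_id": [1, 2], "product_name": ["A", "B"]},
)
validate_columns(df)  # passes: the default product_col is present

try:
    validate_columns(df, product_col="sku")  # "sku" is not a column
except ValueError as err:
    print(err)  # Missing required columns: ['sku']
```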

qodo-merge-pro bot (Contributor) commented Feb 9, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible issue
    Initialize variables before merge operation

    The _calc_cross_shop method is missing initialization of cs_df and kpi_df
    variables before they are used in the merge operation. Initialize these
    variables before using them.

    pyretailscience/cross_shop.py [124-125]

    +cs_df = cs_df.groupby(cols.customer_id)[group_cols].max()
     cs_df["groups"] = cs_df[group_cols].apply(lambda x: tuple(x), axis=1)
    -
    +kpi_df = df.groupby(cols.customer_id)[value_col].agg(agg_func)
     return cs_df.merge(kpi_df, left_index=True, right_index=True)

    [To ensure code accuracy, apply this suggestion manually]

Suggestion importance [1-10]: 10 (High)

Why: The suggestion fixes a critical bug where the cs_df and kpi_df variables are used in a merge operation without being properly initialized, which would cause a runtime error.
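The initialization the suggestion restores is a standard pandas groupby-then-merge pattern. A self-contained sketch with toy data and illustrative column names (not the actual CrossShop code):

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "group_1": [True, False, True, False],
    "group_2": [False, True, False, False],
    "unit_spend": [10.0, 5.0, 7.5, 2.5],
})
group_cols = ["group_1", "group_2"]

# One boolean per customer per group, then a tuple key per customer.
cs_df = df.groupby("customer_id")[group_cols].max()
cs_df["groups"] = cs_df[group_cols].apply(tuple, axis=1)

# Aggregate the KPI per customer and join on the shared customer index.
kpi_df = df.groupby("customer_id")["unit_spend"].agg("sum")
result = cs_df.merge(kpi_df, left_index=True, right_index=True)
print(result)
```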
    Fix missing method argument
Suggestion impact: The commit implemented exactly the suggested change by adding the product_col argument to the _get_pairs method call.

    code diff:

    -        self.pairs_df = self._get_pairs(df, exclude_same_transaction_products)
    +        self.pairs_df = self._get_pairs(df, exclude_same_transaction_products, product_col)

The _get_pairs method is called with an incorrect number of arguments: the product_col parameter is defined but not passed when calling the method.

    pyretailscience/range_planning.py [57]

    -self.pairs_df = self._get_pairs(df, exclude_same_transaction_products)
    +self.pairs_df = self._get_pairs(df, exclude_same_transaction_products, self.product_col)
Suggestion importance [1-10]: 10 (High)

Why: The suggestion fixes a critical bug where the _get_pairs method is called with the required argument product_col missing, which would cause a runtime error.
    General
    Fix inconsistent parameter documentation
Suggestion impact: The docstring was updated to fix the inconsistency, though with a different default value (column.column_id instead of column.customer_id).

    code diff:

             group_col (str, optional): The name of the column that identifies unique
    -            transactions or customers. Defaults to option column.unit_spend.
    +            transactions or customers. Defaults to option column.column_id.

    The docstring description for group_col parameter is inconsistent with the
    default value. The docstring states it defaults to option column.unit_spend but
    the code uses get_option("column.customer_id"). Update either the docstring or
    the default value to match.

    pyretailscience/product_association.py [59-60]

     group_col (str, optional): The name of the column that identifies unique
    -    transactions or customers. Defaults to option column.unit_spend.
    +    transactions or customers. Defaults to option column.customer_id.
Suggestion importance [1-10]: 7 (Medium)

Why: The suggestion identifies an important inconsistency between the docstring and the actual implementation that could mislead users. Accurate documentation is crucial for proper API usage.
    Preserve unit price information

    Consider keeping both 'unit_quantity' and 'unit_spend' columns, but also include
    'unit_price' as it provides valuable per-unit price information that can be
    useful for analysis. The unit price can be derived as unit_spend/unit_quantity.

    docs/examples/product_association.ipynb [80-81]

     "      <th>unit_quantity</th>\n",
     "      <th>unit_spend</th>\n",
    +"      <th>unit_price</th>\n",
Suggestion importance [1-10]: 7 (Medium)

Why: Having unit price information is valuable for price analysis and comparisons, and it is better to keep it as an explicit column than to require recalculation. This enhances data usability and analysis capabilities.
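The derivation the suggestion mentions is a one-liner. A toy sketch under the renamed columns (invented data):

```python
import pandas as pd

# Recover unit_price from the renamed metric columns.
df = pd.DataFrame({"unit_spend": [19.98, 4.50], "unit_quantity": [2, 1]})
df["unit_price"] = df["unit_spend"] / df["unit_quantity"]
print(df["unit_price"].tolist())  # [9.99, 4.5]
```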

@Copilot Copilot AI left a comment

Copilot reviewed 9 out of 20 changed files in this pull request and generated 1 comment.

    Files not reviewed (11)
    • docs/examples/product_association.ipynb: Evaluated as low risk
    • docs/examples/revenue_tree.ipynb: Evaluated as low risk
    • docs/examples/segmentation.ipynb: Evaluated as low risk
    • docs/analysis_modules.md: Evaluated as low risk
    • tests/test_product_association.py: Evaluated as low risk
    • pyretailscience/range_planning.py: Evaluated as low risk
    • pyretailscience/segmentation.py: Evaluated as low risk
    • pyretailscience/product_association.py: Evaluated as low risk
    • pyretailscience/cross_shop.py: Evaluated as low risk
    • pyretailscience/standard_graphs.py: Evaluated as low risk
    • pyretailscience/gain_loss.py: Evaluated as low risk

    Comment on lines 41 to 42
    self.cust_purchases_s = df.groupby(cols.customer_id)[cols.customer_id].nunique()

Copilot AI commented Feb 9, 2025
    The unique count should be calculated on 'transaction_id' instead of 'customer_id'. Replace with: self.cust_purchases_s = df.groupby(cols.customer_id)[cols.transaction_id].nunique()

Suggested change:
- self.cust_purchases_s = df.groupby(cols.customer_id)[cols.customer_id].nunique()
+ self.cust_purchases_s = df.groupby(cols.customer_id)[cols.transaction_id].nunique()
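As a quick check of the corrected pattern, counting distinct transactions per customer on toy data:

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "transaction_id": [10, 10, 11, 12],  # customer 1 has two items in txn 10
})
cust_purchases = df.groupby("customer_id")["transaction_id"].nunique()
print(cust_purchases.to_dict())  # {1: 2, 2: 1}
```

Grouping on customer_id and taking nunique of the same customer_id column would return 1 for every customer, which is why the original line was flagged.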



    codecov bot commented Feb 9, 2025

    Codecov Report

    Attention: Patch coverage is 37.50000% with 30 lines in your changes missing coverage. Please review.

Files with missing lines (patch % / missing lines):
• pyretailscience/customer.py: 0.00% (22 missing ⚠️)
• pyretailscience/gain_loss.py: 20.00% (4 missing ⚠️)
• pyretailscience/range_planning.py: 80.00% (2 missing ⚠️)
• pyretailscience/cross_shop.py: 83.33% (1 missing ⚠️)
• pyretailscience/standard_graphs.py: 50.00% (1 missing ⚠️)

Coverage by file (patch % / total coverage Δ):
• pyretailscience/options.py: <100.00%> 97.53% (+1.57%) ⬆️
• pyretailscience/product_association.py: <100.00%> 85.93% (+0.22%) ⬆️
• pyretailscience/segmentation.py: <ø> 69.01% (+8.85%) ⬆️
• pyretailscience/cross_shop.py: <83.33%> 47.19% (+1.21%) ⬆️
• pyretailscience/standard_graphs.py: <50.00%> 40.45% (+0.45%) ⬆️
• pyretailscience/range_planning.py: <80.00%> 42.00% (-0.71%) ⬇️
• pyretailscience/gain_loss.py: <20.00%> 32.58% (+0.40%) ⬆️
• pyretailscience/customer.py: <0.00%> 0.00% (ø)

@coderabbitai coderabbitai bot left a comment
    Actionable comments posted: 1

    🔭 Outside diff range comments (1)
    docs/examples/segmentation.ipynb (1)

    715-804: Consider adding data validation before export

    The segment activation code should validate data before export:

    def export_segment(df, segment_name, filename):
        # Validate inputs
        if segment_name not in df["segment_name"].unique():
            raise ValueError(f"Invalid segment: {segment_name}")
            
        # Validate we have customer IDs
        segment_customers = df[df["segment_name"] == segment_name].index
        if len(segment_customers) == 0:
            raise ValueError(f"No customers found in segment: {segment_name}")
            
        # Export
        segment_customers.to_series().to_csv(filename, index=False)
    print(f"Exported {len(segment_customers)} customers to {filename}")
    🧹 Nitpick comments (6)
    pyretailscience/gain_loss.py (2)

    54-54: Improve parameter documentation.

    The docstring for value_col parameter should specify that it defaults to the value from column.unit_spend option.

    -            value_col (str, optional): The column to calculate the gain loss from. Defaults to option column.unit_spend.
    +            value_col (str, optional): The column to calculate the gain loss from. Defaults to the value from `column.unit_spend` option.

    290-290: Address the TODO comment.

    The TODO comment suggests that there might be a performance optimization opportunity by avoiding DataFrame construction.

    Would you like me to help implement a solution that avoids constructing a pandas DataFrame or open an issue to track this task?

    docs/examples/segmentation.ipynb (2)

    324-508: Consider adding input validation for segmentation parameters

    The segmentation code works but could benefit from parameter validation:

    def validate_segment_params(zero_value_customers):
        valid_options = ["include_with_light", "exclude"] 
        if zero_value_customers not in valid_options:
            raise ValueError(f"zero_value_customers must be one of {valid_options}")
    
    # Use before segmentation
    validate_segment_params(zero_value_customers="include_with_light")

    509-714: Add error handling for visualization

    The visualization code should handle potential errors:

    try:
        ax = seg_stats.plot(
            figsize=(10, 5),
            value_col="spend",
            source_text="Source: Transaction data financial year 2023", 
            sort_order="descending",
            title="What's the value of a Heavy customer?",
            rot=0,
        )
    except Exception as e:
        print(f"Error creating plot: {e}")
        raise
    docs/analysis_modules.md (2)

    261-269: Consistent Column Renaming in Timeline Plot Example
    The update replacing the hardcoded "transaction_datetime" with "transaction_date" in the DataFrame construction is correctly applied, aligning with the PR objective. Consider, as a future improvement, using a dynamic reference from the options class (if available) for greater maintainability.


    448-454: Consistent Column Renaming in Revenue Tree Example
    The change in the revenue tree example—where the filtering indices now use "transaction_date" instead of "transaction_datetime"—is correct. In line with the PR objective, if an options class is intended to centralize column name definitions, consider updating this snippet to reference that class instead of using a literal string.

    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between 68de169 and 5f24832.

    ⛔ Files ignored due to path filters (1)
    • data/transactions.parquet is excluded by !**/*.parquet
    📒 Files selected for processing (16)
    • docs/analysis_modules.md (2 hunks)
    • docs/examples/product_association.ipynb (4 hunks)
    • docs/examples/revenue_tree.ipynb (4 hunks)
    • docs/examples/segmentation.ipynb (8 hunks)
    • pyretailscience/cross_shop.py (8 hunks)
    • pyretailscience/customer.py (6 hunks)
    • pyretailscience/gain_loss.py (7 hunks)
    • pyretailscience/options.py (1 hunks)
    • pyretailscience/product_association.py (6 hunks)
    • pyretailscience/range_planning.py (6 hunks)
    • pyretailscience/segmentation.py (1 hunks)
    • pyretailscience/standard_graphs.py (2 hunks)
    • tests/test_cross_shop.py (9 hunks)
    • tests/test_product_association.py (17 hunks)
    • tests/test_range_planning.py (3 hunks)
    • tests/test_segmentation.py (4 hunks)
    🧰 Additional context used
    🪛 Ruff (0.8.2)
    tests/test_cross_shop.py

    12-12: Use @pytest.fixture over @pytest.fixture()

    Remove parentheses

    (PT001)

    ⏰ Context from checks skipped due to timeout of 90000ms (1)
    • GitHub Check: Pre-Commit
    🔇 Additional comments (34)
    tests/test_range_planning.py (4)

    8-10: LGTM!

    The imports and initialization of ColumnHelper are correctly placed at the top of the file.


    58-60: LGTM!

    The DataFrame creation correctly uses ColumnHelper for column references, improving maintainability.


    71-72: LGTM!

    The changes correctly:

    • Use ColumnHelper for column references
    • Add the required product_col parameter to _get_pairs
    • Update the expected DataFrame to use ColumnHelper

    Also applies to: 78-82, 84-84


    92-93: LGTM!

    The changes correctly:

    • Use ColumnHelper for column references
    • Add the required product_col parameter to _get_pairs
    • Update the expected DataFrame to use ColumnHelper

    Also applies to: 99-103, 106-106

    tests/test_cross_shop.py (7)

    7-9: LGTM!

    The imports and initialization of ColumnHelper are correctly placed at the top of the file.


    17-17: LGTM!

    The DataFrame creation correctly uses ColumnHelper for column references.

    Also applies to: 21-21


    35-35: LGTM!

    The DataFrame creation and index setting correctly use ColumnHelper for column references.

    Also applies to: 39-39, 41-41


    56-56: LGTM!

    The DataFrame creation and index setting correctly use ColumnHelper for column references.

    Also applies to: 72-72, 74-74


    109-109: LGTM!

    The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.

    Also applies to: 129-129, 133-133


    145-145: LGTM!

    The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.

    Also applies to: 149-149, 161-161


    180-180: LGTM!

    The value_col parameter and DataFrame creation correctly use ColumnHelper for column references.

    Also applies to: 185-185, 190-190

    pyretailscience/cross_shop.py (4)

    8-8: LGTM!

    The imports for ColumnHelper and get_option are correctly placed at the top of the file.


    24-24: LGTM!

    The default value_col correctly uses get_option for dynamic column reference.


    80-80: LGTM!

    The method correctly uses get_option and ColumnHelper for dynamic column references.

    Also applies to: 100-100, 108-108, 124-124, 127-127


    134-134: LGTM!

    The default value_col correctly uses get_option for dynamic column reference.

    pyretailscience/range_planning.py (5)

    12-12: LGTM!

    The imports for ColumnHelper and get_option are correctly placed at the top of the file.


    22-22: LGTM!

    The changes correctly:

    • Add the product_col parameter with proper documentation
    • Use ColumnHelper for dynamic column references

    Also applies to: 33-33, 48-49, 56-56


    61-62: LGTM!

    The method correctly uses ColumnHelper for dynamic column references.

    Also applies to: 64-67, 73-73, 77-77, 79-79


    99-100: LGTM!

    The method correctly uses self.product_col and get_option for dynamic column references.


    168-169: LGTM!

    The method correctly uses self.product_col and get_option for dynamic column references.

    tests/test_product_association.py (1)

    6-10: LGTM!

    The changes consistently replace hardcoded column names with ColumnHelper, improving maintainability.

    Also applies to: 20-20

    tests/test_segmentation.py (1)

    7-10: LGTM!

    The changes consistently replace hardcoded column names with ColumnHelper, improving maintainability.

    Also applies to: 387-388

    pyretailscience/segmentation.py (1)

    93-94: LGTM! Error message updated to use dynamic column reference.

    The error message has been updated to use option column.customer_id instead of hardcoded string, which aligns with the PR objective of using the options class for column names.

    pyretailscience/customer.py (3)

    34-35: LGTM! PurchasesPerCustomer updated to use ColumnHelper.

    The class now uses ColumnHelper for column references, which improves maintainability by centralizing column name management.

    Also applies to: 41-41


    185-186: LGTM! DaysBetweenPurchases updated to use ColumnHelper.

    The class now uses ColumnHelper for column references in both the constructor and the _calculate_days_between_purchases method. The changes are consistent and improve code maintainability.

    Also applies to: 212-219


    335-337: LGTM! TransactionChurn updated to use ColumnHelper.

    The class now uses ColumnHelper for column references in both the constructor and the transaction processing logic. The changes are consistent with other classes.

    Also applies to: 342-347

    pyretailscience/standard_graphs.py (1)

    68-68: LGTM! Updated to use dynamic column reference.

    The code now uses get_option("column.transaction_date") instead of a hardcoded column name, which aligns with the PR objective.

    pyretailscience/options.py (1)

    387-390: LGTM! Added date/time attributes to ColumnHelper.

    The addition of transaction_date and transaction_time attributes enhances the ColumnHelper class by providing direct access to these commonly used column names. This is consistent with the PR objective of using the options class for column names.

    docs/examples/product_association.ipynb (1)

    70-81: LGTM! Column names have been updated for better clarity.

    The renaming of columns improves consistency and better reflects their purpose:

• transaction_datetime → transaction_date
• unit_price → unit_spend
• quantity → unit_quantity
    docs/examples/revenue_tree.ipynb (2)

    45-45: LGTM! Column name updated for consistency.

    The renaming of transaction_datetime to transaction_date maintains consistency with other files.


    337-338: LGTM! Date filtering updated to use new column name.

    The date filtering logic has been correctly updated to use the renamed column.

    docs/examples/segmentation.ipynb (3)

    1-18: LGTM! Clear introduction and setup

    The markdown introduction provides a good overview of customer segmentation and its business value.


    221-323: LGTM! Clear explanation of HML segmentation

    Good explanation of Heavy/Medium/Light segmentation methodology and business use cases.



@@ -55,7 +57,7 @@ class ProductAssociation:
     value_col (str): The name of the column in the input DataFrame that contains
         the product identifiers.
     group_col (str, optional): The name of the column that identifies unique
-        transactions or customers. Defaults to "customer_id".
+        transactions or customers. Defaults to option column.unit_spend.

    ⚠️ Potential issue

    Fix incorrect docstring defaults.

    The docstring incorrectly states that group_col defaults to option column.unit_spend when it actually defaults to column.customer_id.

    Apply this diff to fix the docstrings:

    -            transactions or customers. Defaults to option column.unit_spend.
    +            transactions or customers. Defaults to the value from `column.customer_id` option.

    Also applies to: 178-178

@coderabbitai coderabbitai bot left a comment
    Actionable comments posted: 0

    🔭 Outside diff range comments (1)
    pyretailscience/range_planning.py (1)

    249-249: Fix hardcoded column name in plot method.

    The plot method still uses hardcoded "product_name" instead of the dynamic product_col.

    -        labels = self.pairs_df["product_name"].cat.categories
    +        labels = self.pairs_df[self.product_col].cat.categories
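For reference, the `.cat.categories` accessor that the fix relies on returns the sorted unique categories of a categorical series (toy example, invented values):

```python
import pandas as pd

# A categorical column like the product pairs column in the plot method.
pairs = pd.Series(["shampoo", "conditioner", "shampoo"], dtype="category")
labels = pairs.cat.categories
print(list(labels))  # ['conditioner', 'shampoo']  (categories are sorted)
```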
    🧹 Nitpick comments (3)
    pyretailscience/range_planning.py (3)

    31-32: Update docstring to reflect dynamic column requirements.

    The docstring still mentions hardcoded column names. Update it to reflect that product_name is now specified via product_col.

    -                customer_id, transaction_id, product_name.
    +                customer_id, transaction_id, and the column specified by product_col.

    232-232: Improve type hint for kwargs parameter.

    The current type hint dict[str, any] could be more specific to dendrogram parameters.

    -    **kwargs: dict[str, any],
    +    **kwargs: dict[str, float | str | bool | int],

    162-173: Consider optimizing sparse matrix creation for large datasets.

    For large datasets, creating the sparse matrix could be memory-intensive. Consider adding a warning or implementing batch processing for very large datasets.

    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL
    Plan: Pro

    📥 Commits

    Reviewing files that changed from the base of the PR and between 5f24832 and ce9690e.

    📒 Files selected for processing (3)
    • pyretailscience/customer.py (6 hunks)
    • pyretailscience/product_association.py (6 hunks)
    • pyretailscience/range_planning.py (6 hunks)
    🚧 Files skipped from review as they are similar to previous changes (2)
    • pyretailscience/product_association.py
    • pyretailscience/customer.py
    ⏰ Context from checks skipped due to timeout of 90000ms (1)
    • GitHub Check: Pre-Commit
    🔇 Additional comments (3)
    pyretailscience/range_planning.py (3)

    12-12: LGTM! Import changes align with PR objectives.

    The addition of ColumnHelper and get_option imports supports the standardization of column names.


    19-57: LGTM! Implementation changes improve flexibility.

    The addition of product_col parameter and use of ColumnHelper for column validation enhances code maintainability.


    61-81: LGTM! Method changes consistently use dynamic column names.

    The _get_pairs method implementation correctly uses the dynamic product_col parameter and ColumnHelper for column references.

    @mvanwyk mvanwyk merged commit c03dbcf into main Feb 9, 2025
    3 of 4 checks passed
    @mvanwyk mvanwyk deleted the change_transaction_datetime branch February 9, 2025 12:13