Split Segmentation #154
Conversation
Walkthrough
The changes reorganize the segmentation modules in the package. Import paths for segmentation classes (HML, Threshold, SegTransactionStats, RFMSegmentation) are updated, and several new modules are introduced. Documentation is enhanced with new API files and navigation restructuring in the mkdocs configuration. Additionally, new test suites replace the deprecated segmentation tests, and unused segmentation classes are removed from the segmentation statistics module.
Sequence Diagram(s)

sequenceDiagram
participant Client
participant ThresholdSegmentation
participant DataFrame
Client->>ThresholdSegmentation: __init__(df, thresholds, segments, ...)
ThresholdSegmentation->>DataFrame: Validate required columns
ThresholdSegmentation->>DataFrame: Aggregate data and calculate percentiles
ThresholdSegmentation->>Client: Return segmented DataFrame
sequenceDiagram
participant Client
participant RFMSegmentation
participant DataFrame
Client->>RFMSegmentation: __init__(df, current_date)
RFMSegmentation->>DataFrame: Validate input and compute RFM metrics
RFMSegmentation->>RFMSegmentation: Calculate RFM scores and segments
RFMSegmentation->>Client: Provide segmented results via df and ibis_table properties
Actionable comments posted: 3
🧹 Nitpick comments (12)
pyretailscience/segmentation/threshold.py (3)
31-31: Change segment keys to numeric types for clarity
The parameter segments is typed as dict[any, str]. If your threshold keys are numeric quantiles, using dict[float, str] can improve clarity and readability.
50-52: Mismatch in docstring regarding null value checks
The docstring states a ValueError is raised if columns contain null values, but the code only checks for missing columns. Consider adding explicit checks for null values to match the documented behavior.
100-106: Potential repeated execution of Ibis query
Each time the df property is accessed, table.execute() is called, re-running the query. For large datasets, consider caching results or separating data retrieval from downstream operations to optimize performance.
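As an illustration of the caching idea, here is a minimal sketch; the mixin name and the _df_cache attribute are hypothetical and not part of the package:

```python
import ibis
import pandas as pd


class CachedResultMixin:
    """Sketch: execute the Ibis expression once and reuse the materialized frame."""

    table: ibis.Table  # assumed to be set by the concrete segmentation class

    @property
    def df(self) -> pd.DataFrame:
        # Only call .execute() the first time; later accesses reuse the cached result.
        if getattr(self, "_df_cache", None) is None:
            self._df_cache = self.table.execute()
        return self._df_cache
```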
pyretailscience/segmentation/base.py (2)
8-10: Consider expanding the class docstring.
The class docstring could provide more information about how the class should be used, particularly explaining the expected structure of self.df that will be referenced in the add_segment method.

```diff
 class BaseSegmentation:
-    """A base class for customer segmentation."""
+    """A base class for customer segmentation.
+
+    This class expects subclasses to initialize self.df with customer data,
+    including a 'segment_name' column that maps customer IDs to their segments.
+    """
```
11-33: ❓ Verification inconclusive
Add initialization validation and test coverage for error path.
The add_segment method is well-documented, but there are a few improvements that could be made:
- There's no validation that self.df exists and has the required "segment_name" column.
- The error case (row count mismatch) is not covered by tests.

```diff
     def add_segment(self, df: pd.DataFrame) -> pd.DataFrame:
         """Adds the segment to the dataframe based on the customer_id column.

         Args:
             df (pd.DataFrame): The dataframe to add the segment to. The dataframe must have a customer_id column.

         Returns:
             pd.DataFrame: The dataframe with the segment added.

         Raises:
             ValueError: If the number of rows before and after the merge do not match.
+            AttributeError: If self.df is not initialized or missing the segment_name column.
         """
+        if not hasattr(self, 'df') or 'segment_name' not in self.df.columns:
+            raise AttributeError("self.df must be initialized with a 'segment_name' column.")
+
+        customer_id_col = get_option("column.customer_id")
+        if customer_id_col not in df.columns:
+            raise ValueError(f"DataFrame must contain the customer ID column: {customer_id_col}")
+
         rows_before = len(df)
         df = df.merge(
             self.df["segment_name"],
             how="left",
             left_on=get_option("column.customer_id"),
             right_index=True,
         )
         rows_after = len(df)

         if rows_before != rows_after:
             raise ValueError("The number of rows before and after the merge do not match. This should not happen.")

         return df
```

To verify test coverage for the error path, let's check if there are tests for this class:

🏁 Script executed:

```bash
#!/bin/bash
# Look for tests that cover the BaseSegmentation class
rg "BaseSegmentation" -g "test_*.py" --files-with-matches
```

Length of output: 57
Initialization Validation Added – Please Add Tests for Error Paths
The updated changes correctly include validation to ensure that both self.df and its "segment_name" column, as well as the customer ID column in df, are present before proceeding with the merge. However, our search for tests covering the error cases (such as when these validations fail or the row counts don't match) didn't yield any results.
Ensure that tests are added for the following scenarios:
- When self.df is not initialized or is missing the "segment_name" column.
- When the input DataFrame does not include the expected customer ID column.
- When the row count mismatch triggers the ValueError during the merge.
Please add the necessary tests to cover these error paths to maintain robust error handling in the code.
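For example, a hedged sketch of a test that forces the row-count mismatch; the _StubSegmentation subclass is hypothetical, and it assumes the default customer ID column option resolves to "customer_id":

```python
import pandas as pd
import pytest

from pyretailscience.segmentation.base import BaseSegmentation


class _StubSegmentation(BaseSegmentation):
    """Hypothetical subclass that skips real segmentation so only add_segment is exercised."""

    def __init__(self, seg_df: pd.DataFrame) -> None:
        self.df = seg_df


def test_add_segment_raises_on_row_count_mismatch():
    # Duplicate customer_id values in the segment index make the left merge fan out,
    # so the row count changes and add_segment should raise a ValueError.
    seg_df = pd.DataFrame(
        {"segment_name": ["Heavy", "Light"]},
        index=pd.Index([1, 1], name="customer_id"),
    )
    transactions = pd.DataFrame({"customer_id": [1]})

    with pytest.raises(ValueError):
        _StubSegmentation(seg_df).add_segment(transactions)
```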
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 32-32: pyretailscience/segmentation/base.py#L32
Added line #L32 was not covered by tests

tests/segmentation/test_threshold.py (1)
170-188: Additional threshold validation tests.
These tests complement the previous validation tests by checking the case where there are too few thresholds for the number of segments.
I notice a minor typo in the test name test_thresholds_too_too_few_thresholds (duplicated "too"), but this doesn't affect functionality.

```diff
-    def test_thresholds_too_too_few_thresholds(self):
+    def test_thresholds_too_few_thresholds(self):
```

tests/segmentation/test_segstats.py (1)
51-77: Consider adding edge case coverage with no transactions.
While these tests correctly confirm transaction-level behavior, it could be valuable to add a scenario where one or more segments have zero rows (e.g., empty segment or no matching transactions) to ensure the code gracefully handles entirely missing segment data.
pyretailscience/segmentation/hml.py (1)
3-6: Fix minor grammatical issue in docstring.
"commonly know as the 80/20 rule" should be "commonly known as the 80/20 rule."

```diff
-commonly know as the 80/20 rule
+commonly known as the 80/20 rule
```

pyretailscience/segmentation/segstats.py (2)
137-140: Be mindful of memory usage for large DataFrames.
Converting a large pd.DataFrame to an Ibis memtable can increase memory consumption. If the data is substantial, you might consider an alternative Ibis backend or a partitioned approach to avoid potential out-of-memory issues.
250-251: Support for multi-column plots.
Currently, the plot method raises an error if multiple segment columns exist. If multi-level plotting becomes a requirement, refactoring to handle multiple segment levels (e.g., stacked bar charts) may be beneficial.
pyretailscience/segmentation/rfm.py (2)
91-132: Consider making the bin size configurable.
The current implementation of ibis.ntile(10) is hardcoded to 10 bins. Some users may want more or fewer bins for their RFM segmentation use cases. Making this a parameter could increase flexibility and reusability.
Below is a possible change illustrating how to introduce a bins parameter:

```diff
-def _compute_rfm(self, df: ibis.Table, current_date: datetime.date) -> ibis.Table:
+def _compute_rfm(self, df: ibis.Table, current_date: datetime.date, bins: int = 10) -> ibis.Table:
     ...
     rfm_scores = customer_metrics.mutate(
-        r_score=(ibis.ntile(10).over(window_recency)),
-        f_score=(ibis.ntile(10).over(window_frequency)),
-        m_score=(ibis.ntile(10).over(window_monetary)),
+        r_score=(ibis.ntile(bins).over(window_recency)),
+        f_score=(ibis.ntile(bins).over(window_frequency)),
+        m_score=(ibis.ntile(bins).over(window_monetary)),
     )
     ...
```

133-144: Minor doc alignment suggestion.
The docstring for the df property mentions "segment names," but the method actually returns the entire table with RFM metrics and indexes on customer_id. Consider clarifying that the DataFrame includes R, F, M, and segment columns. Similarly, you may wish to document the auxiliary fm_segment column or remove it if it's not needed.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- docs/analysis_modules.md (4 hunks)
- docs/api/segmentation/hml.md (1 hunks)
- docs/api/segmentation/rfm.md (1 hunks)
- docs/api/segmentation/segstats.md (1 hunks)
- docs/api/segmentation/threshold.md (1 hunks)
- mkdocs.yml (1 hunks)
- pyretailscience/segmentation/base.py (1 hunks)
- pyretailscience/segmentation/hml.py (1 hunks)
- pyretailscience/segmentation/rfm.py (1 hunks)
- pyretailscience/segmentation/segstats.py (1 hunks)
- pyretailscience/segmentation/threshold.py (1 hunks)
- tests/analysis/test_segmentation.py (0 hunks)
- tests/segmentation/test_hml.py (1 hunks)
- tests/segmentation/test_rfm.py (1 hunks)
- tests/segmentation/test_segstats.py (1 hunks)
- tests/segmentation/test_threshold.py (1 hunks)
💤 Files with no reviewable changes (1)
- tests/analysis/test_segmentation.py
🧰 Additional context used
🧬 Code Definitions (6)
tests/segmentation/test_rfm.py (1)
  pyretailscience/segmentation/rfm.py (3): RFMSegmentation (34-143), df (134-138), ibis_table (141-143)
tests/segmentation/test_hml.py (6)
  pyretailscience/segmentation/hml.py (1): HMLSegmentation (16-49)
  tests/segmentation/test_segstats.py (1): base_df (17-27)
  tests/segmentation/test_rfm.py (1): base_df (18-33)
  pyretailscience/segmentation/rfm.py (1): df (134-138)
  pyretailscience/segmentation/threshold.py (1): df (101-105)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
tests/segmentation/test_segstats.py (1)
  pyretailscience/segmentation/segstats.py (3): SegTransactionStats (23-299), df (190-204), plot (206-299)
pyretailscience/segmentation/hml.py (3)
  pyretailscience/segmentation/threshold.py (2): ThresholdSegmentation (22-105), df (101-105)
  pyretailscience/segmentation/rfm.py (1): df (134-138)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
pyretailscience/segmentation/threshold.py (3)
  pyretailscience/segmentation/base.py (1): BaseSegmentation (8-34)
  pyretailscience/segmentation/rfm.py (1): df (134-138)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
pyretailscience/segmentation/rfm.py (2)
  pyretailscience/segmentation/threshold.py (1): df (101-105)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
🪛 GitHub Check: codecov/patch
pyretailscience/segmentation/base.py
[warning] 32-32: pyretailscience/segmentation/base.py#L32
Added line #L32 was not covered by tests
🔇 Additional comments (38)
pyretailscience/segmentation/threshold.py (1)
75-82: Clarify behavior for "include_with_light" zero-spend customers
Currently there is no specific branch handling zero-spend customers when zero_value_customers="include_with_light". Verify if allowing them to remain in the main dataset and receive a normal percentile rank is the intended behavior for your business logic.

docs/api/segmentation/hml.md (1)
1-3: Documentation references updated successfully
The file properly references the pyretailscience.segmentation.hml module, which aligns with the new segmentation structure.

docs/api/segmentation/rfm.md (1)
1-3: Documentation looks good.
The documentation for the RFM Segmentation module follows the project's documentation pattern with a concise header and proper module reference.
docs/api/segmentation/threshold.md (1)
1-3: Documentation looks good.
The documentation for the Threshold Segmentation module is consistent with other documentation patterns in the project.
docs/api/segmentation/segstats.md (1)
1-3: Documentation looks good.
The documentation for the SegTransactionStats Segmentation module maintains consistency with other segmentation documentation patterns.
pyretailscience/segmentation/base.py (1)
1-7: Well-structured imports and module docstring.
The module docstring clearly explains the purpose of this file, and the imports are organized appropriately.
tests/segmentation/test_hml.py (7)
1-24: LGTM: Well-structured test fixture and setup.
The imports, class declaration, and fixture setup look clean and follow good testing practices.
25-38: Comprehensive test for "exclude" option.
This test effectively verifies that zero-spend customers are excluded from the result DataFrame when the zero_value_customers parameter is set to "exclude".
39-50: Comprehensive test for "include_with_light" option.
The test appropriately verifies that zero-spend customers are included in the "Light" segment when the zero_value_customers parameter is set to "include_with_light".
51-62: Comprehensive test for "separate_segment" option.
The test correctly verifies that zero-spend customers are assigned to a "Zero" segment when the zero_value_customers parameter is set to "separate_segment".
63-68: Good validation test for missing columns.
This test ensures the class properly raises a ValueError when required columns are missing, which is essential for early error detection.
69-78: Input data integrity check.
Good practice to verify that the original DataFrame is not modified during processing, ensuring the method is non-destructive.
79-89: Effective test for alternate value column.
This test demonstrates the flexibility of the class by showing it can work with different column names for the value parameter.
tests/segmentation/test_threshold.py (6)
1-37: LGTM: Well-structured basic segmentation test.
The initial test case effectively verifies that customers are correctly segmented based on provided thresholds. The structure follows good testing practices.
38-49: Error handling test for single customer scenario.
Good test to verify that the class raises an appropriate error when there's only one customer in the DataFrame. This prevents segmentation issues with too few data points.
50-78: Thorough testing of custom aggregation function.
This test properly verifies that the aggregation function works correctly with a non-standard value column. The use of pandas testing utilities for DataFrame comparison is a good practice.
79-115: Good validation of segment data merging.
Comprehensive test to verify that segment information is correctly merged back into the original DataFrame using the add_segment method.
116-136: Important test for duplicate customer IDs.
This test verifies that the segmentation handles DataFrames with duplicate customer_id entries correctly, which is a common real-world scenario.
137-169: Thorough error handling for threshold and segment validation.
These tests ensure that the class properly validates the thresholds and segments parameters, raising appropriate errors when they don't meet requirements.
tests/segmentation/test_rfm.py (8)
1-50: LGTM: Well-structured test fixtures for RFM segmentation.
The test fixtures provide comprehensive test data and expected results for RFM segmentation. The use of explicit expected values for RFM scores is good practice.
51-64: Comprehensive RFM segmentation test with current date.
This test correctly verifies that RFM scores and segments are calculated properly based on a specified current date.
65-77: Good error handling test for missing columns.
This test ensures the class properly validates required columns and raises an appropriate error when they're missing.
78-92: Testing single customer scenario.
Important edge case test to verify that RFM segmentation works correctly with a single customer DataFrame.
93-114: Multiple transactions handling test.
This test verifies that the class correctly aggregates multiple transactions per customer when calculating RFM scores.
115-124: Verification of complete customer processing.
Good test to ensure that all customers are processed and receive RFM segments.
125-138: Clever use of freezegun for date testing.
This test effectively uses freezegun to mock the current date, allowing for predictable testing of the automatic date handling feature.
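For readers unfamiliar with the pattern, a minimal, self-contained illustration of freezing "today" with freezegun (the date chosen here is arbitrary):

```python
import datetime

from freezegun import freeze_time


@freeze_time("2025-03-19")
def test_current_date_defaults_to_frozen_today():
    # While frozen, "today" is deterministic, so date-dependent defaults are testable.
    assert datetime.date.today() == datetime.date(2025, 3, 19)
```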
139-161: Type validation tests.
Thorough tests to verify that the class properly validates input types and raises appropriate errors for invalid inputs.
tests/segmentation/test_segstats.py (2)
29-49: Comprehensive scenario validation.
These tests effectively validate various aggregation outputs (revenue, transactions, and customers) for multi-segment data, including the "Total" row. Great work ensuring robust coverage of common scenarios.
153-321: Impressive multiple-parameter coverage.
These tests thoroughly exercise segment columns, plotting constraints, and extra aggregations. They validate correct error handling and proper data shape for advanced usage. No significant issues noted.
pyretailscience/segmentation/hml.py (1)
19-49: Enhance test coverage for HMLSegmentation.
Although this class extends ThresholdSegmentation and appears logically sound, consider adding dedicated tests to verify that Light, Medium, Heavy, and Zero spend segments are computed correctly in line with the 50/30/20 splits.

pyretailscience/segmentation/segstats.py (1)
1-10: Well-structured documentation updates.
The updated docstring clearly communicates the purpose and capabilities of SegTransactionStats. The emphasis on both Pandas DataFrames and Ibis Tables is concise and aligns with the underlying logic.

pyretailscience/segmentation/rfm.py (4)
1-24: Great descriptive module docstring.
The in-depth explanation of RFM segmentation logic, scoring tier details, and business implications is well-written and thorough. This provides valuable clarity to users and maintainers.
26-33: Verify Python version compatibility.
The line datetime.datetime.now(datetime.UTC).date() (line 87) relies on datetime.UTC, which is only available in Python 3.11+. If this library must support older Python versions, consider switching to datetime.timezone.utc.
Do you need to support Python versions earlier than 3.11? If so, please confirm and update accordingly.
34-46: Well-structured class docstring.
The class docstring neatly outlines the RFM approach and clearly communicates the scoring system. The explanation aligns well with the module docstring. No issues found here.
48-90: Robust initialization checks.
The constructor correctly enforces required columns and handles both pandas DataFrame and Ibis Table inputs. The early validation steps (lines 79-82) and type checks (lines 77, 89) help prevent runtime errors. Nicely done!
docs/analysis_modules.md (3)
684-684: Approve HMLSegmentation Import Update
The import for HMLSegmentation has been correctly updated to use the new module path (pyretailscience.segmentation.hml). This aligns with the intended reorganization of the segmentation modules.
727-727: Approve ThresholdSegmentation Import Update
The import for ThresholdSegmentation is now correctly pointing to pyretailscience.segmentation.threshold, which reflects the updated module structure.
822-822: Approve RFMSegmentation Import Update
The import for RFMSegmentation has been updated to pyretailscience.segmentation.rfm, in line with the new module organization.
```python
for quantile, segment in zip(thresholds, segments, strict=True):
    case = case.when(df["ptile"] <= quantile, segment)

case = case.end()

df = df.mutate(segment_name=case).drop(["ptile"])
```
🛠️ Refactor suggestion
No fallback for percentile values above highest threshold
This block only handles cases where the percentile is <= each threshold in turn. If the highest threshold is below 1.0, some customers may not be assigned to any segment. Consider adding a fallback or requiring 1.0 as the final threshold.
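To make the fallback idea concrete, here is a minimal, self-contained sketch on a toy table; the threshold values, segment names, and the use of ibis.case() as the builder are illustrative assumptions, not the module's actual code:

```python
import ibis

# Toy data: percentile ranks for three customers.
df = ibis.memtable({"customer_id": [1, 2, 3], "ptile": [0.2, 0.7, 0.95]})
thresholds = [0.5, 0.8]          # note: the highest threshold is below 1.0
segments = ["Light", "Medium"]

case = ibis.case()
for quantile, segment in zip(thresholds, segments, strict=True):
    case = case.when(df["ptile"] <= quantile, segment)

# The else_ branch catches percentiles above the highest threshold,
# so customer 3 (ptile 0.95) still receives a segment.
df = df.mutate(segment_name=case.else_("Heavy").end())
```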
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/segmentation/test_threshold.py (1)
170-170: Fix typo in test method name
There's a duplicate "too" in the method name.

```diff
-    def test_thresholds_too_too_few_thresholds(self):
+    def test_thresholds_too_few_thresholds(self):
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- docs/analysis_modules.md (4 hunks)
- docs/api/segmentation/hml.md (1 hunks)
- docs/api/segmentation/rfm.md (1 hunks)
- docs/api/segmentation/segstats.md (1 hunks)
- docs/api/segmentation/threshold.md (1 hunks)
- mkdocs.yml (1 hunks)
- pyretailscience/segmentation/base.py (1 hunks)
- pyretailscience/segmentation/hml.py (1 hunks)
- pyretailscience/segmentation/rfm.py (1 hunks)
- pyretailscience/segmentation/segstats.py (1 hunks)
- pyretailscience/segmentation/threshold.py (1 hunks)
- tests/analysis/test_segmentation.py (0 hunks)
- tests/segmentation/test_hml.py (1 hunks)
- tests/segmentation/test_rfm.py (1 hunks)
- tests/segmentation/test_segstats.py (1 hunks)
- tests/segmentation/test_threshold.py (1 hunks)
💤 Files with no reviewable changes (1)
- tests/analysis/test_segmentation.py
🚧 Files skipped from review as they are similar to previous changes (6)
- docs/api/segmentation/hml.md
- docs/api/segmentation/rfm.md
- docs/api/segmentation/segstats.md
- pyretailscience/segmentation/hml.py
- docs/analysis_modules.md
- docs/api/segmentation/threshold.md
🧰 Additional context used
🧬 Code Definitions (3)
tests/segmentation/test_hml.py (6)
  pyretailscience/segmentation/hml.py (1): HMLSegmentation (16-49)
  tests/segmentation/test_rfm.py (1): base_df (18-33)
  tests/segmentation/test_segstats.py (1): base_df (17-27)
  pyretailscience/segmentation/rfm.py (1): df (134-138)
  pyretailscience/segmentation/threshold.py (1): df (101-105)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
tests/segmentation/test_segstats.py (1)
  pyretailscience/segmentation/segstats.py (3): SegTransactionStats (23-299), df (190-204), plot (206-299)
pyretailscience/segmentation/threshold.py (3)
  pyretailscience/segmentation/base.py (1): BaseSegmentation (8-34)
  pyretailscience/segmentation/rfm.py (1): df (134-138)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
🪛 GitHub Check: codecov/patch
pyretailscience/segmentation/base.py
[warning] 32-32: pyretailscience/segmentation/base.py#L32
Added line #L32 was not covered by tests
🔇 Additional comments (32)
mkdocs.yml (1)
29-33: ✅ Verification successful
Consider verifying the new doc file references.
The newly introduced documentation paths for HML, RFM, SegTransactionStats, and Threshold Segmentation look consistent. However, please confirm that these files (hml.md, rfm.md, segstats.md, and threshold.md) exist and are properly recognized by MkDocs.
Here is a script to check for these files:

🏁 Script executed:

```bash
#!/bin/bash
# Verify the existence of new doc files under docs/api/segmentation
fd hml.md docs/api/segmentation
fd rfm.md docs/api/segmentation
fd segstats.md docs/api/segmentation
fd threshold.md docs/api/segmentation
```

Length of output: 273

Doc file references have been verified.
The new documentation paths for HML, RFM, SegTransactionStats, and Threshold Segmentation have all been confirmed to exist under the docs/api/segmentation directory:
- docs/api/segmentation/hml.md
- docs/api/segmentation/rfm.md
- docs/api/segmentation/segstats.md
- docs/api/segmentation/threshold.md

No further changes are required.
pyretailscience/segmentation/threshold.py (1)
88-91: No fallback for customers above the highest threshold.
If the highest threshold is below 1.0, customers with a percentile rank above that threshold have no assigned segment. A fallback or enforced final threshold of 1.0 is recommended to handle these customers correctly.

pyretailscience/segmentation/base.py (1)
8-34: Add unit test coverage for line 32.
Static analysis indicates that this error-handling line (raise ValueError) is untested. Adding a negative test scenario that triggers a row mismatch after merging will help verify this logic and improve overall coverage.
Would you like me to provide a sample test case that forces a mismatch scenario for demonstration?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 32-32: pyretailscience/segmentation/base.py#L32
Added line #L32 was not covered by tests

tests/segmentation/test_hml.py (5)
1-89: Well-structured test suite for HMLSegmentation
The test coverage for the HMLSegmentation class is comprehensive, effectively testing all three options for handling zero-value customers: "exclude", "include_with_light", and "separate_segment". The tests properly verify the segmentation logic and edge cases.
25-37: Test case properly verifies zero customer exclusion
Good test for ensuring zero-spend customers are completely excluded from the result DataFrame when the "exclude" option is used.
64-67: Thorough validation of error handling
The test properly verifies that a ValueError is raised when required columns are missing. This helps ensure the class fails fast with clear error messages rather than producing unexpected results.
70-77: Important verification of non-mutation
This test is critical for ensuring that the segmentation process doesn't modify the original DataFrame, which is essential for maintaining data integrity when the same DataFrame is used in multiple operations.
79-88: Good test for alternate column functionality
Testing with an alternate value column ensures the flexibility of the implementation to work with different column names, which is important for adapting to various data structures.
tests/segmentation/test_threshold.py (3)
1-188: Comprehensive test suite for ThresholdSegmentation
The test suite covers the core functionality of ThresholdSegmentation, including proper segmentation, aggregation functions, error handling, and edge cases. The tests are well-structured and verify both expected behavior and proper error handling.
38-48: Correctly tests single customer limitation
This test verifies that attempting to use ThresholdSegmentation with a DataFrame containing only one customer raises a ValueError. This is an important limitation to test for, as segmentation generally requires multiple data points to be meaningful.
50-77: Thorough test of custom aggregation functions
This test effectively validates that the ThresholdSegmentation class correctly applies custom aggregation functions (in this case 'nunique') for segmenting on product_id. The test covers a complex case with multiple products per customer and verifies the expected segmentation outcome.
tests/segmentation/test_rfm.py (4)
1-161: Comprehensive test suite for RFMSegmentation
The test suite thoroughly covers various aspects of RFM segmentation, including correct calculation of scores, handling of edge cases, date handling, and validation of input types. The use of pytest fixtures and freezegun for consistent testing is particularly effective.
35-49: Well-structured fixture for expected values
The expected_df fixture provides a clear reference point for RFM segmentation tests, making the assertions more readable and maintainable. This is a good practice for tests that need to compare complex DataFrame structures.
125-137: Good use of freezegun for date testing
Using freezegun to control the current date ensures consistent test results, regardless of when the tests are run. This is crucial for date-dependent functionality like RFM segmentation.
154-160: Important test for ibis.Table compatibility
This test verifies that the RFMSegmentation class properly handles ibis.Table objects, which is an important feature for users working with large datasets where Ibis can provide performance benefits.
tests/segmentation/test_segstats.py (5)
13-151: Well-structured tests for segment statistics calculation
The TestCalcSegStats class thoroughly tests the segment statistics calculation logic, including revenue, transactions, and customers per segment. The tests cover various scenarios like single segments, zero net units, and excluding total rows.
153-320: Comprehensive testing of SegTransactionStats functionality
The TestSegTransactionStats class provides good coverage of the class's functionality, including error handling, multiple segment columns, plotting limitations, and custom aggregations. The tests for the extra_aggs parameter are particularly thorough.
210-227: Important validation of plotting limitations
This test verifies that attempting to plot with multiple segment columns raises an appropriate ValueError, which is essential for providing clear error messages to users who might attempt this unsupported operation.
229-288: Thorough testing of extra aggregations feature
The tests for the extra_aggs parameter cover both single and multiple custom aggregations, verifying that the results match expected values. This is crucial for ensuring that this powerful feature works correctly.
290-320: Good error handling tests for invalid aggregations
These tests verify that appropriate errors are raised for invalid columns and functions in extra_aggs, ensuring that users receive clear error messages rather than unexpected results.
pyretailscience/segmentation/segstats.py (7)
1-10: Clear and focused module documentation.
The updated docstring effectively communicates the module's purpose, focusing solely on transaction statistics by segment. This aligns well with the modular restructuring objectives of the PR.
19-19: Import statements have been appropriately streamlined.
The import statements have been simplified to focus only on dependencies needed for SegTransactionStats, removing references to the segmentation classes that have been moved to separate modules.
23-85: The SegTransactionStats class implementation is well-structured.
The class initialization and validation logic are robust, with proper error handling for missing columns and invalid input formats. The class effectively supports both Pandas DataFrames and Ibis Tables.
86-114: Column order management is well-implemented.
The static method for determining column order based on data availability is clean and maintainable, with good documentation explaining its purpose.
115-187: Segment statistics calculation is robust and well-implemented.
The implementation handles various edge cases appropriately:
- Converting DataFrame to Ibis table when needed
- Validating input types
- Handling optional quantity columns
- Supporting custom aggregations
- Including total metrics when requested
- Computing derived metrics correctly
189-204: The df property implementation is efficient and flexible.
The property correctly handles column ordering, including both standard metrics and any custom aggregations specified during initialization.
206-299: The plot method is comprehensive with good parameter validation.
The plotting functionality provides flexible options for visualization while maintaining appropriate parameter validation. The method effectively handles orientation, sorting, axis labels, and formatting.
pyretailscience/segmentation/rfm.py (5)
1-24: Excellent comprehensive module documentation.
The docstring provides a thorough explanation of RFM segmentation, including its purpose, benefits, and the scoring methodology. This helps users understand the module's functionality and how to interpret the results.
26-32: Import statements are appropriately minimal.
The imports include only the necessary modules and libraries needed for the RFM segmentation functionality.
34-47: Clear class-level documentation for RFMSegmentation.
The class docstring effectively explains how customers are scored on the three dimensions (recency, frequency, monetary) and how the RFM segment is calculated.
93-132: RFM computation logic is well-implemented.
The _compute_rfm method effectively:
- Calculates the base metrics (recency, frequency, monetary)
- Creates appropriate window functions for ranking
- Computes percentile scores using ntile
- Constructs the final RFM and FM segments
Using secondary sorting by customer_id in window functions is a good practice to ensure deterministic results when there are ties in the primary metric.
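As a side note, combined RFM/FM segment codes like the ones described above typically follow the common convention of concatenating the three scores. A hedged illustration of that convention (the helper name is hypothetical, and the module's actual column names may differ):

```python
def combine_rfm(r_score: int, f_score: int, m_score: int) -> tuple[int, int]:
    """Combine individual scores into rfm_segment and fm_segment codes."""
    rfm_segment = r_score * 100 + f_score * 10 + m_score
    fm_segment = f_score * 10 + m_score
    return rfm_segment, fm_segment


# Example: r=5, f=3, m=1 -> rfm_segment=531, fm_segment=31
print(combine_rfm(5, 3, 1))
```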
133-144: Accessor properties are appropriately implemented.
The class provides convenient properties to access the segmentation results in both DataFrame and Ibis table formats. The DataFrame is lazily executed only when needed, which is efficient.
```python
segments: dict[any, str],
value_col: str | None = None,
```
🛠️ Refactor suggestion
Clarify the usage of the segments dictionary.
The docstring states that “keys are segment IDs and values are segment names,” but the loop iterates over the dictionary keys as if they are the displayed segment names. This may cause confusion or unintended labeling.
Consider either:
- Changing the loop to use the dictionary values as segment labels (see the sketch below), or
- Updating the docstring to reflect that the dictionary keys will be used as the segment labels.
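A minimal illustration of the ambiguity, using hypothetical threshold and segment values:

```python
# Keys are segment IDs, values are the display names (per the docstring).
thresholds = [0.5, 0.8, 1.0]
segments = {1: "Light", 2: "Medium", 3: "Heavy"}

# Iterating over the dict directly yields the keys (1, 2, 3), so the numeric IDs
# would become the labels; iterating over .values() yields the display names.
for quantile, label in zip(thresholds, segments.values(), strict=True):
    print(f"ptile <= {quantile} -> {label}")
```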
```python
def __init__(self, df: pd.DataFrame | ibis.Table, current_date: str | datetime.date | None = None) -> None:
    """Initializes the RFM segmentation process.

    Args:
        df (pd.DataFrame | ibis.Table): A DataFrame or Ibis table containing transaction data.
            Must include the following columns:
            - customer_id
            - transaction_date
            - unit_spend
            - transaction_id
        current_date (Optional[Union[str, datetime.date]]): The reference date for calculating recency.
            Can be a string (format: "YYYY-MM-DD"), a date object, or None (defaults to the current system date).

    Raises:
        ValueError: If the dataframe is missing required columns.
        TypeError: If the input data is not a pandas DataFrame or an Ibis Table.
    """
    cols = ColumnHelper()
    required_cols = [
        cols.customer_id,
        cols.transaction_date,
        cols.unit_spend,
        cols.transaction_id,
    ]
    if isinstance(df, pd.DataFrame):
        df = ibis.memtable(df)
    elif not isinstance(df, ibis.Table):
        raise TypeError("df must be either a pandas DataFrame or an Ibis Table")

    missing_cols = set(required_cols) - set(df.columns)
    if missing_cols:
        error_message = f"Missing required columns: {missing_cols}"
        raise ValueError(error_message)

    if isinstance(current_date, str):
        current_date = datetime.date.fromisoformat(current_date)
    elif current_date is None:
        current_date = datetime.datetime.now(datetime.UTC).date()
    elif not isinstance(current_date, datetime.date):
        raise TypeError("current_date must be a string in 'YYYY-MM-DD' format, a datetime.date object, or None")

    self.table = self._compute_rfm(df, current_date)
```
💡 Verification agent
🧩 Analysis chain
Initialization method has thorough validation.
The __init__ method includes comprehensive validation of input parameters and handles different input types appropriately.
However, there's a potential Python version compatibility issue:
🏁 Script executed:

```bash
#!/bin/bash
# Description: Check Python version requirement for datetime.UTC usage

# Check if the repository specifies a minimum Python version
grep -r "python_requires\|python_version" setup.py pyproject.toml requirements*.txt | head -5

# Check if the code uses datetime.UTC elsewhere
rg "datetime\.UTC"
```

Length of output: 317
Attention: Python Version Compatibility for datetime.UTC
- The initialization method's validations are thorough and correctly handle alternative input types.
- However, the use of datetime.UTC in pyretailscience/segmentation/rfm.py (line 58) may lead to compatibility issues on Python versions earlier than 3.11.
- The repository configuration (e.g., in pyproject.toml) does not enforce a minimum Python version, so consider either:
  - Updating the project's Python version requirement to 3.11 or higher, or
  - Replacing datetime.UTC with the more broadly supported datetime.timezone.utc for compatibility with older Python versions.
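For reference, the backwards-compatible spelling of the same call (datetime.timezone.utc works on much older Python versions, while datetime.UTC is an alias added in 3.11):

```python
import datetime

# Equivalent to datetime.datetime.now(datetime.UTC).date() on Python 3.11+.
current_date = datetime.datetime.now(datetime.timezone.utc).date()
print(current_date)
```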
Actionable comments posted: 1
🧹 Nitpick comments (1)
pyretailscience/segmentation/segstats.py (1)
23-24: Consider expanding the class-level docstring.
While the implementation and __init__ docstring are detailed, the class-level docstring is quite brief. Consider expanding it to match the comprehensiveness of the module-level docstring.

```diff
-    """Calculates transaction statistics by segment."""
+    """Calculates and visualizes transaction statistics by segment.
+
+    This class computes transaction-based statistics grouped by segment columns,
+    including total spend, unique customers, transactions per customer, and more.
+    It supports both Pandas DataFrames and Ibis Tables as input data formats and
+    provides visualization capabilities for segment-based statistics.
+    """
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- docs/analysis_modules.md (4 hunks)
- docs/api/segmentation/hml.md (1 hunks)
- docs/api/segmentation/rfm.md (1 hunks)
- docs/api/segmentation/segstats.md (1 hunks)
- docs/api/segmentation/threshold.md (1 hunks)
- mkdocs.yml (1 hunks)
- pyretailscience/segmentation/base.py (1 hunks)
- pyretailscience/segmentation/hml.py (1 hunks)
- pyretailscience/segmentation/rfm.py (1 hunks)
- pyretailscience/segmentation/segstats.py (1 hunks)
- pyretailscience/segmentation/threshold.py (1 hunks)
- tests/analysis/test_segmentation.py (0 hunks)
- tests/segmentation/test_hml.py (1 hunks)
- tests/segmentation/test_rfm.py (1 hunks)
- tests/segmentation/test_segstats.py (1 hunks)
- tests/segmentation/test_threshold.py (1 hunks)
💤 Files with no reviewable changes (1)
- tests/analysis/test_segmentation.py
🚧 Files skipped from review as they are similar to previous changes (10)
- docs/api/segmentation/rfm.md
- docs/api/segmentation/threshold.md
- docs/api/segmentation/hml.md
- docs/api/segmentation/segstats.md
- tests/segmentation/test_threshold.py
- tests/segmentation/test_hml.py
- pyretailscience/segmentation/hml.py
- tests/segmentation/test_rfm.py
- docs/analysis_modules.md
- mkdocs.yml
🧰 Additional context used
🧬 Code Definitions (2)
tests/segmentation/test_segstats.py (1)
  pyretailscience/segmentation/segstats.py (3): SegTransactionStats (23-299), df (190-204), plot (206-299)
pyretailscience/segmentation/threshold.py (3)
  pyretailscience/segmentation/base.py (1): BaseSegmentation (8-34)
  pyretailscience/segmentation/rfm.py (1): df (134-138)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
🪛 GitHub Check: codecov/patch
pyretailscience/segmentation/base.py
[warning] 32-32: pyretailscience/segmentation/base.py#L32
Added line #L32 was not covered by tests
🔇 Additional comments (12)
pyretailscience/segmentation/segstats.py (2)
1-10: Excellent docstring improvement!
The updated module docstring clearly explains the purpose of the module and the functionality provided by the SegTransactionStats class. It effectively communicates the key capabilities, such as computing transaction-based statistics by segment and supporting both Pandas DataFrames and Ibis Tables.
19-19: Import path update looks good.
The updated import for ColumnHelper from pyretailscience.options is correctly implemented.

tests/segmentation/test_segstats.py (5)
1-321: Well-structured comprehensive test suite!
The test file provides excellent coverage of the SegTransactionStats class functionality. The tests are well-organized into two classes that separately test the internal _calc_seg_stats method and the public-facing class functionality.
13-151: Thorough testing of the calculation logic.
The TestCalcSegStats class thoroughly tests the calculation functionality with various scenarios:
- Transaction-item level calculations
- Transaction level calculations
- Single segment handling
- Zero net units edge case
- Total row exclusion option
This comprehensive approach ensures the core calculation logic is robust.
153-321: Effective validation of class behavior and error handling.
The TestSegTransactionStats class effectively tests:
- Input validation by checking errors on missing required columns
- Multi-segment column functionality
- Plot method restrictions
- Extra aggregations functionality and error handling
These tests provide confidence in the class's robustness when handling various inputs and edge cases.
102-125: Excellent edge case handling for zero net units.
The test for handling zero net units is particularly valuable as it ensures the code correctly handles division-by-zero scenarios, which could otherwise lead to runtime errors.
229-289: Well-structured test for extra aggregations.
The test_extra_aggs_functionality method effectively validates both single and multiple extra aggregations. The use of descriptive constant variables for expected values enhances readability and makes the test purpose clear.

pyretailscience/segmentation/base.py (1)
11-22: Consider verifying the existence of "segment_name"
This method assumes self.df["segment_name"] always exists. If self.df is uninitialized or missing the "segment_name" column, the merge will fail. Ensure that self.df is properly populated with a "segment_name" column before calling add_segment.
Would you like to verify the presence of "segment_name" in self.df before merging and add a safeguard for potential KeyError?
87-87
: Compatibility warning for Python versions < 3.11
Usingdatetime.UTC
can cause compatibility issues for those on older Python versions. To maintain broader support, consider usingdatetime.timezone.utc
or specifying a minimum Python version of 3.11.
112-126
:❓ Verification inconclusive
Verify ranking logic for recency
The docstring says “9 represents the best customer” for Recency, yet the code orders by ascending recency (lowest recency first). This might place the most recent purchasers in the lowest ntile value. Confirm the intended behavior and adjust if necessary (e.g., by reversing the ordering or reinterpreting the ntile scale).
🏁 Script executed:
#!/bin/bash # We can verify how ibis.ntile(10) assigns ranks with ascending vs descending order. # This script searches for usage of ntile in the codebase to confirm no contradictory logic. rg 'ibis\.ntile\(10\)'Length of output: 312
Clarify Recency Ranking Intent
After verifying the ntile usage in the code, it’s clear that the three metrics (recency, frequency, monetary) are partitioned using ascending sort orders. Given that a lower recency (i.e., fewer days since the last purchase) should intuitively indicate a better customer, this implementation would assign lower ntile values—contrary to the docstring which states “9 represents the best customer.” Please confirm whether the intended behavior is to have customers with the smallest recency values receive a higher ntile score (perhaps by reversing the ordering for recency or by adjusting the ntile scale) or if the docstring should be modified to reflect the current implementation.
- File: pyretailscience/segmentation/rfm.py
- Lines: 112-126
- Action Needed: Verify and reconcile the mismatch between the documented best-customer score for recency and the current ascending order logic.
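If the reversed-ordering option mentioned above is chosen, a minimal sketch of what it could look like; the toy table and the days_since_last_purchase column name are illustrative, not the module's actual internals:

```python
import ibis

# Toy customer metrics table.
customer_metrics = ibis.memtable(
    {"customer_id": [1, 2, 3], "days_since_last_purchase": [3, 40, 10]},
)

# Ordering recency descending puts the most recent purchasers (smallest value)
# into the highest ntile bucket, so a higher score means a better customer.
window_recency = ibis.window(
    order_by=[
        customer_metrics.days_since_last_purchase.desc(),
        customer_metrics.customer_id,  # secondary sort keeps ties deterministic
    ],
)
scored = customer_metrics.mutate(r_score=ibis.ntile(10).over(window_recency))
print(scored.execute())
```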
pyretailscience/segmentation/threshold.py (2)
35-47: Implement "include_with_light" zero-value option
The code handles the "exclude" and "separate_segment" cases but not "include_with_light". This may lead to unexpected behavior if a user passes "include_with_light".
88-90: No fallback segment for percentile values above the highest threshold
If the highest threshold is below 1.0, customers above that threshold remain unassigned. Consider adding a final .else_() branch or requiring 1.0 as the last threshold.
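A small sketch of the second alternative, enforcing that the final threshold covers everyone (the helper name is hypothetical):

```python
def _validate_thresholds(thresholds: list[float]) -> None:
    """Require an exhaustive threshold list so every customer gets a segment."""
    if not thresholds or thresholds[-1] != 1.0:
        raise ValueError("The final threshold must be 1.0 so all customers are assigned a segment.")


_validate_thresholds([0.5, 0.8, 1.0])  # passes silently
```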
```
@@ -26,7 +26,12 @@ nav:
      - Haversine Distance: api/analysis/haversine.md
      - Product Association: api/analysis/product_association.md
      - Revenue Tree: api/analysis/revenue_tree.md
      - Segmentation: api/analysis/segmentation.md
      - Segmentation:
        - Base Segmentation: api/segmentation/base.md
```
Does this file exist? I don't see it in the PR
Actionable comments posted: 1
🧹 Nitpick comments (1)
pyretailscience/segmentation/threshold.py (1)
53-54: No fallback for percentile values above the highest threshold
If the largest threshold is below 1.0, some customers may not be assigned to any segment. Consider ensuring 1.0 is the final threshold or adding a final catch-all.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- docs/analysis_modules.md (4 hunks)
- docs/api/segmentation/base.md (1 hunks)
- docs/api/segmentation/hml.md (1 hunks)
- docs/api/segmentation/rfm.md (1 hunks)
- docs/api/segmentation/segstats.md (1 hunks)
- docs/api/segmentation/threshold.md (1 hunks)
- mkdocs.yml (1 hunks)
- pyretailscience/segmentation/base.py (1 hunks)
- pyretailscience/segmentation/hml.py (1 hunks)
- pyretailscience/segmentation/rfm.py (1 hunks)
- pyretailscience/segmentation/segstats.py (1 hunks)
- pyretailscience/segmentation/threshold.py (1 hunks)
- tests/analysis/test_segmentation.py (0 hunks)
- tests/segmentation/test_hml.py (1 hunks)
- tests/segmentation/test_rfm.py (1 hunks)
- tests/segmentation/test_segstats.py (1 hunks)
- tests/segmentation/test_threshold.py (1 hunks)
💤 Files with no reviewable changes (1)
- tests/analysis/test_segmentation.py
✅ Files skipped from review due to trivial changes (1)
- docs/api/segmentation/base.md
🚧 Files skipped from review as they are similar to previous changes (10)
- docs/api/segmentation/threshold.md
- docs/api/segmentation/segstats.md
- docs/api/segmentation/rfm.md
- docs/api/segmentation/hml.md
- tests/segmentation/test_threshold.py
- tests/segmentation/test_hml.py
- pyretailscience/segmentation/hml.py
- mkdocs.yml
- tests/segmentation/test_rfm.py
- docs/analysis_modules.md
🧰 Additional context used
🧬 Code Definitions (3)
tests/segmentation/test_segstats.py (1)
  pyretailscience/segmentation/segstats.py (3): SegTransactionStats (23-299), df (190-204), plot (206-299)
pyretailscience/segmentation/threshold.py (3)
  pyretailscience/segmentation/base.py (1): BaseSegmentation (8-34)
  pyretailscience/segmentation/rfm.py (1): df (134-138)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
pyretailscience/segmentation/rfm.py (2)
  pyretailscience/segmentation/threshold.py (1): df (101-105)
  pyretailscience/segmentation/segstats.py (1): df (190-204)
🪛 GitHub Check: codecov/patch
pyretailscience/segmentation/base.py
[warning] 32-32: pyretailscience/segmentation/base.py#L32
Added line #L32 was not covered by tests
🔇 Additional comments (22)
pyretailscience/segmentation/base.py (3)
8-9
: Well-structured base class
Your docstring and class structure provide a concise and user-friendly foundation for specialized segmentation classes.
11-22: Merge-based logic verification
The logic for merging segment names appears clear. However, ensure that the caller sets self.df with a valid segment_name column before invoking add_segment.
Would you confirm that the codebase has tests ensuring a ValueError is raised if self.df lacks segment_name?
32-32: Unit test coverage recommendation
The line raising ValueError is not covered by tests.

🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 32-32: pyretailscience/segmentation/base.py#L32
Added line #L32 was not covered by tests

pyretailscience/segmentation/rfm.py (3)
1-24
: Comprehensive documentation
The docstring provides a thorough overview of RFM segmentation, covering methodology and benefits clearly.
87-87
: Python version compatibility concern
datetime.UTC
is only available in Python 3.11 and later. Consider usingdatetime.timezone.utc
for broader compatibility.If you intend to support Python versions below 3.11, please confirm that this usage won't break in those environments.
133-139
: Lazy-loading the DataFrame
Your property-based approach to executing the data and caching the result is efficient. Just be mindful of memory usage for large datasets.pyretailscience/segmentation/threshold.py (1)
95-96
: Ensure union schema alignment
When unitingdf
andzero_df
, confirm both tables share the same schema. A mismatch in columns or data types might cause errors in future changes.pyretailscience/segmentation/segstats.py (3)
1-10
: Improved module documentationThe module now has a clear, comprehensive docstring that accurately describes its purpose and capabilities. The documentation clearly explains that this module focuses on transaction statistics by segment, supporting both Pandas DataFrames and Ibis Tables.
19-19
: Import path updated correctlyThe import path for
ColumnHelper
has been updated as part of the segmentation module restructuring.
23-24
: SegTransactionStats class is properly maintainedThis class has been correctly maintained as the sole focus of this module, aligning with the PR objective of splitting segmentation logic into modular files.
tests/segmentation/test_segstats.py (12)
1-11
: Well-structured test moduleThe test module has clear imports and appropriate organization. Using
ColumnHelper
for consistent column access is a good practice for maintainability.
13-28
: Good test fixture setupThe
TestCalcSegStats
class with thebase_df
fixture provides a solid foundation for testing with sample data that includes all required columns. This approach promotes code reuse across multiple tests.
29-50
: Comprehensive basic functionality testThis test thoroughly verifies the core calculation logic, ensuring that revenue, transactions, and customers are correctly calculated per segment. The expected values are well-defined and the verification uses proper assertion methods.
51-77
: Additional test for dataframes without quantity columnThis test appropriately verifies that the class works correctly when the input dataframe doesn't include the unit quantity column, which is an important edge case to cover.
78-101
: Single segment handling testGood test for verifying behavior when all rows belong to the same segment, an important edge case that could otherwise lead to unexpected behavior.
102-125
: Zero net units handling testThis test verifies correct behavior when a segment has zero net units, properly checking that:
- The price per unit is set to NaN (as division by zero would be undefined)
- The units per transaction is calculated as zero
- Other metrics remain unaffected
This ensures robust handling of edge cases in real-world data.
126-151
: Total row exclusion testGood verification of the
calc_total
parameter functionality, ensuring the total row is excluded when specified and that customer percentages are adjusted correctly.
153-164
: Error handling test for missing required columnsThis test correctly verifies that the class raises an appropriate error when required columns are missing, which is crucial for providing clear feedback to users.
165-209
: Multiple segment columns testThis test thoroughly verifies the class's ability to handle multiple segment columns, which is an important feature for complex segmentation analysis. The test correctly:
- Verifies that both segment columns are present in the result
- Checks that only actual combinations (not all possible combinations) are included
- Validates the aggregation calculations across the segment combinations
210-228
: Plot limitations testGood test to validate that the plotting functionality correctly raises an error when multiple segment columns are provided, as this is a documented limitation of the implementation.
229-289
: Extra aggregations functionality testComprehensive test for the
extra_aggs
parameter, verifying both single and multiple additional aggregations. The test uses well-defined expected values and thorough assertions to ensure the feature works correctly.
290-320
: Error handling tests for invalid extra aggregationsThese tests verify appropriate error handling for:
- Invalid column names in extra aggregations
- Invalid aggregation functions
Both tests ensure users receive clear error messages when configuration issues occur, improving the developer experience.
```python
    zero_value_customers: Literal["separate_segment", "exclude", "include_with_light"] = "separate_segment",
) -> None:
    """Segments customers based on user-defined thresholds and segments.

    Args:
        df (pd.DataFrame | ibis.Table): A dataframe with the transaction data. The dataframe must contain a customer_id column.
        thresholds (List[float]): The percentile thresholds for segmentation.
        segments (List[str]): A list of segment names for each threshold.
        value_col (str, optional): The column to use for the segmentation. Defaults to get_option("column.unit_spend").
        agg_func (str, optional): The aggregation function to use when grouping by customer_id. Defaults to "sum".
        zero_segment_name (str, optional): The name of the segment for customers with zero spend. Defaults to "Zero".
        zero_value_customers (Literal["separate_segment", "exclude", "include_with_light"], optional): How to handle
            customers with zero spend. Defaults to "separate_segment".
```
🛠️ Refactor suggestion
Missing handling for "include_with_light"
The code does not handle the include_with_light option for zero spend customers. This could lead to inconsistent outcomes or errors.
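One possible shape for handling all three options, shown as a hedged, self-contained helper; the function, its parameters, and the column handling are illustrative assumptions, not the module's actual implementation:

```python
import ibis


def split_zero_spend(
    df: ibis.Table,
    value_col: str,
    zero_value_customers: str,
    zero_segment_name: str = "Zero",
) -> tuple[ibis.Table, ibis.Table | None]:
    """Return (rows to rank into segments, rows pre-assigned to the zero segment)."""
    if zero_value_customers == "exclude":
        return df.filter(df[value_col] > 0), None
    if zero_value_customers == "separate_segment":
        zero_df = df.filter(df[value_col] == 0).mutate(segment_name=ibis.literal(zero_segment_name))
        return df.filter(df[value_col] > 0), zero_df
    if zero_value_customers == "include_with_light":
        # Keep zero-spend customers in the main dataset so they receive a normal
        # percentile rank and naturally fall into the lightest segment.
        return df, None
    raise ValueError(f"Unknown zero_value_customers option: {zero_value_customers}")
```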
User description
feat: split segmentations.py file
PR Type
Enhancement, Documentation, Tests
Description
Refactored segmentation logic into modular files for better organization.
- base.py for base segmentation functionality.
- hml.py for HML segmentation logic.
- rfm.py for RFM segmentation logic.
- threshold.py for threshold-based segmentation.
- segstats.py for transaction statistics by segment.
Updated documentation to reflect new segmentation module structure.
- HMLSegmentation, RFMSegmentation, ThresholdSegmentation, and SegTransactionStats.
- analysis_modules.md to use new imports.
Added comprehensive unit tests for all segmentation modules.
- HMLSegmentation, RFMSegmentation, ThresholdSegmentation, and SegTransactionStats.
Updated mkdocs.yml to include new segmentation API documentation.

Changes walkthrough 📝
5 files
Added base class for segmentation logic
Added HML segmentation logic
Added RFM segmentation logic
Added threshold-based segmentation logic
Added transaction statistics by segment logic
4 files
Added unit tests for HML segmentation
Added unit tests for RFM segmentation
Added unit tests for threshold segmentation
Added unit tests for transaction statistics by segment
6 files
Updated examples to reflect new segmentation module structure
Added API documentation for HML segmentation
Added API documentation for RFM segmentation
Added API documentation for SegTransactionStats
Added API documentation for threshold segmentation
Updated navigation to include new segmentation API documentation
2 files
Summary by CodeRabbit
New Features
Documentation
Refactor
Tests