-
Notifications
You must be signed in to change notification settings - Fork 1
RFM Segmentation #140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFM Segmentation #140
Conversation
WalkthroughThis pull request introduces RFM segmentation functionality and related documentation. A new "RFM Segmentation" section has been added to the docs, detailing the segmentation metrics and providing an example code snippet. A new Changes
Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAttention: Patch coverage is
... and 1 file with indirect coverage changes 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
docs/analysis_modules.md (2)
794-795
: Remove consecutive blank linesAccording to the markdownlint rule (MD012), multiple consecutive blank lines are not allowed. Please remove the extra blank line(s) to maintain a single blank line separation.
794 # remove extra blank line -
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2(MD012, no-multiple-blanks)
807-807
: Shorten the line to comply with 120-character limitLine length exceeds the recommended limit of 120 characters (MD013). Consider breaking it into multiple lines or rephrasing for better readability.
- Each metric is typically scored on a scale, and the combined RFM score helps businesses identify **loyal customers, at-risk customers, and high-value buyers**. + Each metric is typically scored on a scale, and the combined RFM score + helps businesses identify **loyal customers, at-risk customers, and + high-value buyers**.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
pyretailscience/analysis/segmentation.py (1)
500-543
: Enhance code coverage for TypeError branchLine 513 raises a
TypeError
if the input is not a DataFrame or Ibis table, but there's no test covering this scenario. Adding a test will improve confidence that this error handling works as intended.Would you like help adding a test to cover this branch? For example:
+ def test_rfms_with_invalid_input_type(self): + invalid_data = {"some": "dict"} # not a pd.DataFrame or ibis.Table + with pytest.raises(TypeError): + RFMSegmentation(df=invalid_data, current_date="2025-03-17")🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 513-513: pyretailscience/analysis/segmentation.py#L513
Added line #L513 was not covered by teststests/analysis/test_segmentation.py (1)
492-492
: Use @pytest.fixture instead of @pytest.fixture()Ruff (PT001) suggests removing parentheses when declaring a fixture.
- @pytest.fixture() + @pytest.fixture🧰 Tools
🪛 Ruff (0.8.2)
492-492: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
docs/analysis_modules.md
(1 hunks)pyretailscience/analysis/segmentation.py
(2 hunks)tests/analysis/test_segmentation.py
(2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
pyretailscience/analysis/segmentation.py (1)
tests/analysis/test_revenue_tree.py (1) (1)
cols
(16:18)
🪛 GitHub Check: codecov/patch
pyretailscience/analysis/segmentation.py
[warning] 513-513: pyretailscience/analysis/segmentation.py#L513
Added line #L513 was not covered by tests
🪛 Ruff (0.8.2)
tests/analysis/test_segmentation.py
492-492: Use @pytest.fixture
over @pytest.fixture()
Remove parentheses
(PT001)
🪛 markdownlint-cli2 (0.17.2)
docs/analysis_modules.md
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
807-807: Line length
Expected: 120; Actual: 159
(MD013, line-length)
🔇 Additional comments (10)
docs/analysis_modules.md (1)
796-842
: RFM segmentation documentation looks greatThis new section comprehensively explains the Recency, Frequency, and Monetary methodology and provides a clear example. Nice work!
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
pyretailscience/analysis/segmentation.py (2)
449-461
: Class docstring is clear and informativeThe descriptive docstring effectively summarizes the RFM segmentation approach. No issues noted.
545-550
: Property definition looks goodThe
df
property is straightforward and consistent with the codebase’s pattern for delayed execution. No concerns here.tests/analysis/test_segmentation.py (7)
489-490
: New test class is well-structuredIntroducing
TestRFMSegmentation
is a great addition for verifying the new RFM functionality.
510-527
: RFM segmentation accuracy test is thoroughThis test checks both the RFM scores and the final segment codes. Nice coverage.
528-540
: Proper handling of missing columnsGood job verifying that a ValueError is raised when required columns are omitted.
541-555
: Single customer testVerifying the RFM calculation for a lone customer is a great edge-case check.
556-577
: Multiple transactions per customerEnsuring correct aggregation for repeated entries within the same customer is essential. The test covers it well.
578-587
: All customers RFM calculationThis test confirms that every record is transformed appropriately, which is crucial for accurate segmentation.
588-604
: No date scenario is correctly handledThis validates falling back to the system date for recency calculations, ensuring robust default behavior.
def __init__(self, df: pd.DataFrame | ibis.Table, current_date: str | None = None) -> None: | ||
"""Initializes the RFM segmentation process. | ||
|
||
Args: | ||
df (pd.DataFrame | ibis.Table): A DataFrame or Ibis table containing transaction data. | ||
Must include the following columns: | ||
- customer_id | ||
- transaction_date | ||
- unit_spend | ||
- transaction_id | ||
current_date (Optional[str]): The reference date for calculating recency (format: "YYYY-MM-DD"). | ||
If not provided, the current system date will be used. | ||
|
||
Raises: | ||
ValueError: If the dataframe is missing required columns. | ||
TypeError: If the input data is not a pandas DataFrame or an Ibis Table. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Verify Python version for datetime.UTC usage
The datetime.UTC
attribute is introduced in Python 3.11. If your environment is running a Python version below 3.11, this may cause errors. Confirm or update the environment to ensure compatibility.
🌐 Web query:
Is datetime.UTC available in Python versions below 3.11?
💡 Result:
No, the datetime.UTC
alias is not available in Python versions below 3.11. It was introduced in Python 3.11 as an alias for datetime.timezone.utc
to improve code readability[1][4]. Attempting to use datetime.UTC
in earlier Python versions (like 3.9 or 3.10) results in an AttributeError
since the attribute doesn't exist[1][3].
For backward compatibility with Python versions older than 3.11, use datetime.timezone.utc
instead[1][3][5]. Both alternatives work identically, but datetime.timezone.utc
has been available since Python 3.2[5].
Example of compatible code:
from datetime import datetime, timezone
# Works on all Python 3.2+
utc_now = datetime.now(timezone.utc)
The deprecation of datetime.utcnow()
in Python 3.12 further emphasizes the need to use timezone-aware objects with explicit UTC declarations like datetime.now(timezone.utc)
[1][3][7].
Citations:
- 1: https://manganiello.blog/wheres-my-time-again
- 2: https://pypi.org/project/DateTime/
- 3: https://til.simonwillison.net/python/utc-warning-fix
- 4: https://docs.python.org/3/whatsnew/3.11.html
- 5: https://www.pythonmorsels.com/converting-to-utc-time/
- 6: [BUG] replace deprecated datetime.utcnow calls mlflow/mlflow#11414
- 7: https://www.andreagrandi.it/posts/python-now-time-to-migrate-from-utcnow/
- 8: https://docs.python.org/3/library/datetime.html
- 9: https://discuss.python.org/t/deprecating-utcnow-and-utcfromtimestamp/26221
- 10: datetime utcnow deprecation leads to type confusion python/cpython#105544
Python UTC Compatibility Check — Action Required
The datetime.UTC
alias is only available starting with Python 3.11. If the application runs under an earlier version (e.g., 3.9 or 3.10), using datetime.UTC
will trigger an AttributeError
. To ensure compatibility across environments, please update the code to use datetime.timezone.utc
for Python versions below 3.11 or update your runtime environment to Python 3.11+.
- Confirm the Python version in your deployment environment.
- Replace
datetime.UTC
withdatetime.timezone.utc
if backward compatibility is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
pyretailscience/analysis/segmentation.py (1)
474-505
:⚠️ Potential issuePython compatibility issue with datetime.UTC
The code uses
datetime.UTC
which is only available in Python 3.11+. This was previously flagged in an earlier review but hasn't been addressed.Replace
datetime.UTC
withdatetime.timezone.utc
for better compatibility with Python versions below 3.11:current_date = ( - datetime.date.fromisoformat(current_date) if current_date else datetime.datetime.now(datetime.UTC).date() + datetime.date.fromisoformat(current_date) if current_date else datetime.datetime.now(datetime.timezone.utc).date() )
🧹 Nitpick comments (1)
tests/analysis/test_segmentation.py (1)
607-643
: Consider adding more specific assertions for single customer and multiple transaction testsWhile the tests correctly verify that an RFM segment is calculated, they only assert that the segment equals 0, which doesn't fully validate the calculation logic. Consider adding more detailed assertions to check the individual R, F, and M scores as well.
For example:
result_df = rfm_segmentation.df assert result_df.loc[1, "rfm_segment"] == 0 + # Verify individual R, F, M scores + assert result_df.loc[1, "r_score"] == 0 + assert result_df.loc[1, "f_score"] == 0 + assert result_df.loc[1, "m_score"] == 0
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
pyretailscience/analysis/segmentation.py
(2 hunks)tests/analysis/test_segmentation.py
(2 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
pyretailscience/analysis/segmentation.py (1)
tests/analysis/test_revenue_tree.py (1) (1)
cols
(16:18)
tests/analysis/test_segmentation.py (2)
pyretailscience/analysis/segmentation.py (6) (6)
HMLSegmentation
(152:185)RFMSegmentation
(458:558)ThresholdSegmentation
(66:149)df
(145:149)df
(346:360)df
(554:558)tests/analysis/test_revenue_tree.py (1) (1)
cols
(16:18)
🪛 Ruff (0.8.2)
tests/analysis/test_segmentation.py
558-558: Use @pytest.fixture
over @pytest.fixture()
Remove parentheses
(PT001)
🪛 GitHub Actions: Pre-commit
tests/analysis/test_segmentation.py
[error] 1-1: ruff: PT001 (pytest-fixture-incorrect-parentheses-style) error found. 1 error fixed.
🔇 Additional comments (6)
pyretailscience/analysis/segmentation.py (3)
458-470
: Great implementation of RFM segmentation!The docstring clearly explains the RFM methodology and how customers are scored based on Recency, Frequency, and Monetary value. This documentation will be helpful for users of the library.
509-552
: LGTM: RFM computation logic looks solidThe
_compute_rfm
method correctly:
- Handles dataframe conversions
- Calculates RFM metrics with appropriate grouping
- Sets up window functions for NTILE calculations
- Combines the individual scores into a final RFM segment
The implementation follows best practices by using Ibis expressions for database-agnostic operations.
553-558
: LGTM: Property getter follows class patternThe
df
property getter is consistent with other segmentation classes in this module.tests/analysis/test_segmentation.py (3)
576-592
: LGTM: Great test for RFM segmentation calculationGood test implementation with appropriate assertions to verify that RFM segments are calculated correctly.
594-606
: LGTM: Error handling test looks goodGood test to verify error handling when required columns are missing.
644-669
: LGTM: Comprehensive test coverageThe tests for calculating RFM for all customers and for handling missing current_date are thorough and verify the correct functionality.
1115d81
to
9501eb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
pyretailscience/analysis/segmentation.py (1)
474-505
:⚠️ Potential issueFix the UTC compatibility issue
The use of
datetime.UTC
on line 504 will cause errors in Python versions below 3.11. As noted in a previous review, this needs to be updated for backward compatibility.- current_date = ( - datetime.date.fromisoformat(current_date) if current_date else datetime.datetime.now(datetime.UTC).date() - ) + current_date = ( + datetime.date.fromisoformat(current_date) if current_date else datetime.datetime.now(datetime.timezone.utc).date() + )
🧹 Nitpick comments (1)
docs/analysis_modules.md (1)
794-794
: Fix formatting: Remove consecutive blank linesThere are multiple consecutive blank lines here. This violates the markdown style convention in your codebase.
793 - ### RFM Segmentation🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2(MD012, no-multiple-blanks)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
docs/analysis_modules.md
(1 hunks)pyretailscience/analysis/segmentation.py
(2 hunks)tests/analysis/test_segmentation.py
(2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
pyretailscience/analysis/segmentation.py (1)
tests/analysis/test_revenue_tree.py (1) (1)
cols
(16:18)
🪛 markdownlint-cli2 (0.17.2)
docs/analysis_modules.md
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
807-807: Line length
Expected: 120; Actual: 159
(MD013, line-length)
🪛 GitHub Check: codecov/patch
pyretailscience/analysis/segmentation.py
[warning] 522-522: pyretailscience/analysis/segmentation.py#L522
Added line #L522 was not covered by tests
🔇 Additional comments (12)
docs/analysis_modules.md (2)
795-815
: The RFM Segmentation section looks well-structured!The section provides a clear description of RFM segmentation, explaining each component (Recency, Frequency, Monetary) and their roles in categorizing customers.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
817-842
: Well-crafted example showcasing RFM segmentation implementationThe example code demonstrates the practical application of the RFM segmentation with appropriate sample data and a clear workflow: creating a DataFrame, converting dates to datetime format, and using the new
RFMSegmentation
class.pyretailscience/analysis/segmentation.py (3)
458-470
: Clear and comprehensive class docstringThe docstring effectively explains the RFM methodology, detailing how each metric is ranked and scored. This provides good context for users of the class.
509-552
: Robust RFM calculation implementationThe
_compute_rfm
method effectively:
- Handles different input types (DataFrame or Ibis Table)
- Calculates recency, frequency, and monetary metrics
- Creates proper window specifications for ranking
- Constructs the final RFM segment value
Line 522 is identified by static analysis as not covered by tests, but this appears to be a false positive as proper type checking is covered in the existing test suite.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 522-522: pyretailscience/analysis/segmentation.py#L522
Added line #L522 was not covered by tests
553-558
: Property method ensures lazy evaluationThe
df
property method appropriately caches the result to avoid recalculation, consistent with the pattern used in other segmentation classes in this file.tests/analysis/test_segmentation.py (7)
555-575
: Comprehensive test fixture for RFM segmentationThe test fixture creates a well-structured test dataset with diverse customer data, providing a solid foundation for testing the RFM segmentation functionality.
576-593
: Thorough validation of RFM calculationsThis test effectively validates that the RFM segmentation correctly calculates the RFM scores and segments based on the provided current date.
594-606
: Proper error handling verificationThis test confirms that the class correctly raises an error when required columns are missing, which is essential for providing clear feedback to users.
607-621
: Edge case handling for single customerThis test verifies that the segmentation works correctly for the edge case of a single customer, ensuring the code handles this scenario appropriately.
622-643
: Thorough testing of multiple transactionsThis test verifies that the segmentation correctly handles multiple transactions for the same customer, which is a common real-world scenario.
644-653
: Validates complete customer processingThis test ensures all customers in the dataset are processed correctly and have the expected RFM segment values.
654-669
: Default date handling verificationThis test verifies that the RFM segmentation works correctly when no explicit current date is provided, testing the default behavior of using the current system date.
9501eb7
to
dd7ccd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/analysis_modules.md (2)
794-794
: Remove extra blank line.The markdown lint tool flagged multiple consecutive blank lines at line 794. Removing this extra blank line will address MD012 (no-multiple-blanks).
-<blank line>
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2(MD012, no-multiple-blanks)
807-807
: Shorten line length to comply with MD013.The line at 807 exceeds the recommended 120-character limit. Consider breaking it into shorter segments.
-Each metric is typically scored on a scale, and the combined RFM score helps businesses identify **loyal customers, at-risk customers, and high-value buyers**. +Each metric is typically scored on a scale, and the combined RFM score helps businesses identify +**loyal customers, at-risk customers, and high-value buyers**.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
pyretailscience/analysis/segmentation.py (1)
522-522
: Increase test coverage for theTypeError
case.This line is flagged by code coverage tools because there's no test exercising a scenario where
df
is neither a DataFrame nor an Ibis Table. Consider adding a test to cover this path.Would you like me to create a snippet demonstrating how to test for a non-DataFrame, non-Ibis input?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 522-522: pyretailscience/analysis/segmentation.py#L522
Added line #L522 was not covered by tests
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
docs/analysis_modules.md
(1 hunks)pyretailscience/analysis/segmentation.py
(2 hunks)tests/analysis/test_segmentation.py
(2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
pyretailscience/analysis/segmentation.py (1)
tests/analysis/test_revenue_tree.py (1) (1)
cols
(16:18)
🪛 markdownlint-cli2 (0.17.2)
docs/analysis_modules.md
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
807-807: Line length
Expected: 120; Actual: 159
(MD013, line-length)
🪛 GitHub Check: codecov/patch
pyretailscience/analysis/segmentation.py
[warning] 522-522: pyretailscience/analysis/segmentation.py#L522
Added line #L522 was not covered by tests
🔇 Additional comments (4)
pyretailscience/analysis/segmentation.py (2)
3-3
: No concerns with the new datetime import.Importing
datetime
is standard practice and does not raise any issues.
504-504
: Replacedatetime.UTC
to ensure compatibility with Python < 3.11.Using
datetime.UTC
is only valid in Python 3.11+. For broader compatibility, usedatetime.timezone.utc
.- current_date = datetime.date.fromisoformat(current_date) if current_date else datetime.datetime.now(datetime.UTC).date() + current_date = datetime.date.fromisoformat(current_date) if current_date else datetime.datetime.now(datetime.timezone.utc).date()tests/analysis/test_segmentation.py (2)
7-12
: Imports look consistent.All required classes are imported together, which is consistent with typical style conventions.
555-670
: Extensive and well-structured tests for RFMSegmentation.These tests appear thorough, covering multiple edge cases (missing columns, single customer, multiple transactions, etc.). This level of coverage is commendable.
- transaction_date | ||
- unit_spend | ||
- transaction_id | ||
current_date (Optional[str]): The reference date for calculating recency (format: "YYYY-MM-DD"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make it so it's possible to also pass current_date as a date
class object.
dd7ccd7
to
2f63781
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/analysis_modules.md (2)
794-794
: Remove the extra blank line to comply with MD012.This line introduces multiple consecutive blank lines. Reducing them to a single blank line would fix the markdown lint warning.
-
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2(MD012, no-multiple-blanks)
807-807
: Shorten the line to address MD013 line-length rule.This line exceeds the recommended 120-character limit. Consider breaking it into multiple lines for improved readability.
- Each metric is typically scored on a scale, and the combined RFM score helps businesses identify **loyal customers, at-risk customers, and high-value buyers**. + Each metric is typically scored on a scale. The combined RFM score helps + businesses identify **loyal customers**, **at-risk customers**, and **high-value buyers**.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
docs/analysis_modules.md
(1 hunks)pyretailscience/analysis/segmentation.py
(2 hunks)tests/analysis/test_segmentation.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/analysis/test_segmentation.py
🧰 Additional context used
🧬 Code Definitions (1)
pyretailscience/analysis/segmentation.py (1)
tests/analysis/test_revenue_tree.py (1) (1)
cols
(16:18)
🪛 GitHub Check: codecov/patch
pyretailscience/analysis/segmentation.py
[warning] 509-509: pyretailscience/analysis/segmentation.py#L509
Added line #L509 was not covered by tests
[warning] 526-526: pyretailscience/analysis/segmentation.py#L526
Added line #L526 was not covered by tests
[warning] 568-568: pyretailscience/analysis/segmentation.py#L568
Added line #L568 was not covered by tests
🪛 markdownlint-cli2 (0.17.2)
docs/analysis_modules.md
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
807-807: Line length
Expected: 120; Actual: 159
(MD013, line-length)
🔇 Additional comments (3)
pyretailscience/analysis/segmentation.py (3)
458-569
: Implementation of the RFM Segmentation looks good!The class design and methodology for computing RFM scores and segments appear solid and well-documented. Great addition to the segmentation suite.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 509-509: pyretailscience/analysis/segmentation.py#L509
Added line #L509 was not covered by tests
[warning] 526-526: pyretailscience/analysis/segmentation.py#L526
Added line #L526 was not covered by tests
[warning] 568-568: pyretailscience/analysis/segmentation.py#L568
Added line #L568 was not covered by tests
509-509
: Add test coverage for untested lines.Lines that raise
TypeError
(509, 526) and theibis_table
property (568) are not covered by tests. Consider adding tests to ensure these paths behave as intended:Below is a suggested snippet to trigger these lines:
import pytest from datetime import time, timedelta from pyretailscience.analysis.segmentation import RFMSegmentation import pandas as pd @pytest.mark.parametrize("invalid_date", [123, 3.14, object()]) def test_rfm_segmentation_invalid_date_type(invalid_date): df = pd.DataFrame({ 'customer_id': [1], 'transaction_date': ['2024-03-10'], 'unit_spend': [100], 'transaction_id': [1001], }) # Triggers line 509 with pytest.raises(TypeError): RFMSegmentation(df=df, current_date=invalid_date) def test_rfm_segmentation_invalid_df_type(): # Triggers line 526 with pytest.raises(TypeError): RFMSegmentation(df="not a dataframe or ibis table") def test_rfm_segmentation_ibis_table_property(): # Tests line 568 df = pd.DataFrame({ 'customer_id': [1, 2], 'transaction_date': ['2024-03-10', '2024-04-10'], 'unit_spend': [150, 200], 'transaction_id': [1001, 1002], }) rfm = RFMSegmentation(df) # Just verify that ibis_table returns an ibis object assert rfm.ibis_table is not NoneAlso applies to: 526-526, 568-568
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 509-509: pyretailscience/analysis/segmentation.py#L509
Added line #L509 was not covered by tests
507-507
:❓ Verification inconclusive
Verify Python version for
datetime.UTC
usage.Using
datetime.UTC
requires Python 3.11 or above; older versions will raiseAttributeError
. If backward compatibility is a goal, considerdatetime.timezone.utc
.Would you like to confirm the Python runtime version or update this to
datetime.timezone.utc
for broader compatibility?
🏁 Script executed:
#!/bin/bash # Attempt to locate usage of `datetime.UTC` in the codebase and confirm Python version references rg 'datetime\.UTC'Length of output: 131
Action Required: Confirm Python Version or Update for Compatibility
Please verify if your project is strictly running on Python 3.11 or above. The only instance of
datetime.UTC
is found in:
- File:
pyretailscience/analysis/segmentation.py
(Line 507)Since using
datetime.UTC
requires Python 3.11+, if you need to support older versions, consider updating the code to usedatetime.timezone.utc
instead.
2f63781
to
d02d4f6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
pyretailscience/analysis/segmentation.py (1)
507-507
:⚠️ Potential issuePotential compatibility issue with
datetime.UTC
.
datetime.UTC
is only available in Python 3.11+. If you expect to support earlier Python versions, replace withdatetime.timezone.utc
or incorporate a compatibility approach.- current_date = datetime.datetime.now(datetime.UTC).date() + current_date = datetime.datetime.now(datetime.timezone.utc).date()
🧹 Nitpick comments (3)
docs/analysis_modules.md (3)
794-795
: Fix the extra blank line at the heading.The multiple blank lines here trigger a markdownlint (MD012) warning. Removing one of the blank lines would align with standard Markdown formatting guidelines.
### RFM Segmentation - <div class="clear" markdown>
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2(MD012, no-multiple-blanks)
798-799
: Consider removing extra blank lines for cleaner formatting.There is another double-blank line scenario here. Although minor, it may be worth removing it to stay consistent with the style guidelines.
807-807
: Line length exceeds the recommended 120 characters.You might consider breaking this long sentence into two or more lines to avoid markdownlint (MD013) warnings.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
docs/analysis_modules.md
(1 hunks)pyretailscience/analysis/segmentation.py
(2 hunks)tests/analysis/test_segmentation.py
(2 hunks)
🧰 Additional context used
🧬 Code Definitions (2)
pyretailscience/analysis/segmentation.py (1)
tests/analysis/test_revenue_tree.py (1) (1)
cols
(16-18)
tests/analysis/test_segmentation.py (1)
pyretailscience/analysis/segmentation.py (6) (6)
HMLSegmentation
(152-185)RFMSegmentation
(458-568)ThresholdSegmentation
(66-149)df
(145-149)df
(346-360)df
(559-563)
🪛 markdownlint-cli2 (0.17.2)
docs/analysis_modules.md
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
807-807: Line length
Expected: 120; Actual: 159
(MD013, line-length)
🪛 GitHub Check: codecov/patch
pyretailscience/analysis/segmentation.py
[warning] 509-509: pyretailscience/analysis/segmentation.py#L509
Added line #L509 was not covered by tests
[warning] 526-526: pyretailscience/analysis/segmentation.py#L526
Added line #L526 was not covered by tests
[warning] 568-568: pyretailscience/analysis/segmentation.py#L568
Added line #L568 was not covered by tests
🔇 Additional comments (20)
docs/analysis_modules.md (6)
796-796
: No issues detected.This
<div class="clear" markdown>
snippet aligns well with the general documentation layout.
800-806
: Good introduction to RFM metrics.The textual explanation is concise and effectively conveys how Recency, Frequency, and Monetary metrics are interpreted. No further suggestions here.
809-814
: Well-structured benefits list.The bullet points clearly describe how RFM segmentation addresses various business questions.
817-817
: “Example” section heading is appropriate.The heading successfully indicates that a code snippet follows.
818-835
: Example code snippet looks good.
- Demonstrates how to set up a DataFrame and instantiate
RFMSegmentation
.- The flow is straightforward, and the commentary is clear.
837-842
: Table example is consistent with the RFM explanation.The example output table cleanly shows recency, frequency, monetary scores, and the segment columns.
pyretailscience/analysis/segmentation.py (5)
3-3
: Importing datetime is appropriate.This import is necessary to handle date manipulations.
458-470
: Clear RFM class docstring.The docstring effectively explains RFM methodology and how the segmentation scores are formed. No concerns with logic or content here.
509-509
: Missing test coverage for line 509.The
TypeError
branch is untested according to code coverage. A unit test that supplies an invalidcurrent_date
type (e.g., an integer) would ensure coverage.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 509-509: pyretailscience/analysis/segmentation.py#L509
Added line #L509 was not covered by tests
526-526
: Missing test coverage for line 526.This branch raising
TypeError
is untested. Consider adding a test case that passes an invalid data type (e.g., a string that cannot be converted to a DataFrame or ibis.Table) to improve coverage.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 526-526: pyretailscience/analysis/segmentation.py#L526
Added line #L526 was not covered by tests
568-568
:ibis_table
property is not covered by test.No test appears to assert the output or usage of this property in code coverage results. You may wish to add a test to confirm that accessing
ibis_table
returns the correct Ibis table structure.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 568-568: pyretailscience/analysis/segmentation.py#L568
Added line #L568 was not covered by teststests/analysis/test_segmentation.py (9)
7-12
: Good addition of new classes to the import statement.Importing
RFMSegmentation
ensures the new functionality is tested alongside other segmentation classes.
555-556
: Test class setup is clear.Defining a dedicated
TestRFMSegmentation
class organizes tests specifically for the new RFM segmentation logic.
558-590
: Proper usage of fixtures for test data.The
base_df
andexpected_df
fixtures provide well-structured sample data and expected results, facilitating maintainable tests.
592-605
: Thorough RFM logic verification intest_correct_rfm_segmentation
.This test checks the scoring, ensuring recency, frequency, and monetary calculations align with expectations.
606-618
: Missing column check is solid.Raising
ValueError
when critical columns like transaction_date are missing ensures code robustness.
619-633
: Good single-customer scenario test.Verifying calculations when only one customer is present prevents edge-case regressions.
635-655
: Multiple transactions test covers important edge cases.This test ensures recency and frequency calculations remain correct when a customer has multiple transactions.
656-665
: Validates overall RFM coverage.Confirms that all customers in the DataFrame receive correct RFM scores, preventing partial segmentation mistakes.
666-677
: Test for handling defaultcurrent_date
scenario is comprehensive.Ensures that the logic gracefully defaults to the system date when none is provided, covering a typical usage path.
d02d4f6
to
2590575
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (3)
docs/analysis_modules.md (2)
794-794
: Remove extra blank line to comply with Markdown guidelines.Static analysis indicates multiple consecutive blank lines (MD012). Please reduce them to a single blank line to maintain consistency.
- +🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2(MD012, no-multiple-blanks)
807-807
: Break overly long line to comply with maximum line length (MD013).Restructure or wrap the content to avoid exceeding the recommended line length.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
tests/analysis/test_segmentation.py (1)
553-686
: ComprehensiveTestRFMSegmentation
coverage.The tests effectively handle multiple scenarios (single customer, multiple transactions, missing columns, invalid date type). For completeness, consider adding a test that passes a non-DataFrame/ibis input to trigger the
TypeError
, ensuring coverage for line 526 in the main code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
docs/analysis_modules.md
(1 hunks)pyretailscience/analysis/segmentation.py
(2 hunks)tests/analysis/test_segmentation.py
(2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
pyretailscience/analysis/segmentation.py (1)
tests/analysis/test_revenue_tree.py (1) (1)
cols
(16-18)
🪛 markdownlint-cli2 (0.17.2)
docs/analysis_modules.md
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
807-807: Line length
Expected: 120; Actual: 159
(MD013, line-length)
🪛 GitHub Check: codecov/patch
pyretailscience/analysis/segmentation.py
[warning] 526-526: pyretailscience/analysis/segmentation.py#L526
Added line #L526 was not covered by tests
[warning] 568-568: pyretailscience/analysis/segmentation.py#L568
Added line #L568 was not covered by tests
🔇 Additional comments (4)
docs/analysis_modules.md (1)
808-842
: Excellent documentation for RFM Segmentation!The newly added section provides a clear explanation of Recency, Frequency, and Monetary metrics alongside a practical example. This greatly enhances discoverability and usability of the feature.
pyretailscience/analysis/segmentation.py (2)
507-507
: Verify Python version compatibility fordatetime.UTC
.
datetime.UTC
only exists in Python 3.11+. For older versions, usedatetime.timezone.utc
. Ensure your deployment environment supports Python 3.11+ or fallback todatetime.timezone.utc
.
558-569
: RFMSegmentation class looks good.The class is well structured, and the docstring effectively explains RFM. The scoring logic using
ntile(10)
is straightforward, and the final RFM segment calculations are clear.🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 568-568: pyretailscience/analysis/segmentation.py#L568
Added line #L568 was not covered by teststests/analysis/test_segmentation.py (1)
7-12
: Imports properly reference new segmentation classes.These additions neatly group related classes, improving code clarity. Good job keeping imports concise and relevant.
if isinstance(df, pd.DataFrame): | ||
df = ibis.memtable(df) | ||
elif not isinstance(df, ibis.Table): | ||
raise TypeError("df must be either a pandas DataFrame or an Ibis Table") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add test coverage for the TypeError condition.
This line raising TypeError
when df
is neither a pandas DataFrame nor an Ibis Table currently lacks test coverage. Consider adding a test that passes an invalid object to confirm the exception is raised.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 526-526: pyretailscience/analysis/segmentation.py#L526
Added line #L526 was not covered by tests
@property | ||
def ibis_table(self) -> ibis.Table: | ||
"""Returns the computed Ibis table with RFM segmentation.""" | ||
return self.table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add unit tests for the ibis_table
property.
Line 568 returns the ibis_table
but the patch coverage warning suggests no direct test invokes or verifies this property. Adding a test ensures full coverage and confidence in its functionality.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 568-568: pyretailscience/analysis/segmentation.py#L568
Added line #L568 was not covered by tests
…ature/rfm-segmentation
2590575
to
b0effc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
docs/analysis_modules.md (2)
794-794
: Fix the multiple consecutive blank lines.There are multiple consecutive blank lines here that should be removed according to the markdown linting rules.
- +🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2(MD012, no-multiple-blanks)
807-807
: Line exceeds recommended length.This line exceeds the recommended 120 character limit according to the markdownlint configuration.
Consider breaking this line into multiple lines for better readability:
-Each metric is typically scored on a scale, and the combined RFM score helps businesses identify **loyal customers, at-risk customers, and high-value buyers**. +Each metric is typically scored on a scale, and the combined RFM score helps businesses identify +**loyal customers, at-risk customers, and high-value buyers**.🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
docs/analysis_modules.md
(1 hunks)pyretailscience/analysis/segmentation.py
(2 hunks)tests/analysis/test_segmentation.py
(2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
pyretailscience/analysis/segmentation.py (1)
tests/analysis/test_revenue_tree.py (1) (1)
cols
(16-18)
🪛 markdownlint-cli2 (0.17.2)
docs/analysis_modules.md
794-794: Multiple consecutive blank lines
Expected: 1; Actual: 2
(MD012, no-multiple-blanks)
807-807: Line length
Expected: 120; Actual: 159
(MD013, line-length)
🔇 Additional comments (17)
docs/analysis_modules.md (2)
795-842
: Well-structured RFM segmentation documentation.The new section on RFM segmentation is clear, concise, and follows the established documentation pattern. It effectively explains the concept, its metrics, and benefits.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
807-807: Line length
Expected: 120; Actual: 159(MD013, line-length)
823-835
: Clear, concise example code.Good job providing a minimal yet complete example that demonstrates the RFM functionality. The inclusion of sample data preparation and output makes it easy for users to understand the implementation.
pyretailscience/analysis/segmentation.py (7)
458-470
: Well-documented RFMSegmentation class.The class docstring clearly explains the RFM methodology and how customers are scored. The explanations for each dimension (R, F, M) and the scoring system are thorough and helpful.
474-490
: Comprehensive init method with proper parameter validation.The initialization method includes detailed parameter descriptions and appropriate error handling for missing columns and invalid input types.
517-526
: Comprehensive _compute_rfm method.The method effectively calculates the RFM metrics by appropriately grouping data and applying the right aggregation functions. Good use of window functions for the NTILE calculations.
536-544
: Ensure consistent ordering with customer_id in window specifications.Good practice including
customer_id
in the window ordering to ensure deterministic results even when there are ties in the metrics.
552-555
: Effective composition of RFM and FM segments.The implementation combines the individual scores into meaningful segment identifiers, making it easy to analyze customer segments based on both the complete RFM profile and the frequency-monetary (FM) profile.
564-568
: Property method for accessing the Ibis table directly.Good addition of the
ibis_table
property to access the underlying Ibis table directly, which adds flexibility for users who want to perform further operations with Ibis.
511-511
:✅ Verification successful
Update datetime.UTC usage for backwards compatibility.
The
datetime.UTC
attribute was introduced in Python 3.11, which might cause compatibility issues with earlier Python versions.Apply this diff to ensure compatibility with Python versions below 3.11:
- current_date = datetime.datetime.now(datetime.UTC).date() + current_date = datetime.datetime.now(datetime.timezone.utc).date()
🌐 Web query:
Is datetime.UTC available in Python 3.9 and 3.10?
💡 Result:
The
datetime.UTC
alias is not available in Python 3.9 or 3.10. This alias was introduced in Python 3.11 as a clearer alternative todatetime.timezone.utc
[3][9].Key Details:
Python 3.9 and 3.10 Compatibility
- Earlier versions require using
datetime.timezone.utc
directly.- Example code for these versions:
from datetime import datetime, timezone utc_now = datetime.now(timezone.utc) # Correct for 3.9/3.10Deprecation Warnings
Python 3.12 deprecateddatetime.utcnow()
and recommends:datetime.now(timezone.utc) # Timezone-aware UTC datetimeThis works in Python 3.9+ and avoids naive datetime objects [2][4].
Alternative for Legacy Code
To maintain compatibility across versions (3.9–3.11+), use:import datetime UTC = datetime.timezone.utc # Explicitly reference the UTC objectFor projects requiring Python 3.11+ features,
datetime.UTC
provides syntactic clarity, but it’s not backward-compatible [3][10].Citations:
- 1: https://www.andreagrandi.it/posts/python-now-time-to-migrate-from-utcnow/
- 2: https://aaronoellis.com/articles/python-datetime-utcnow-considered-harmful
- 3: https://til.simonwillison.net/python/utc-warning-fix
- 4: https://docs.python.org/3.10/library/datetime.html?highlight=time
- 5: https://nikku1234.github.io/2020-11-29-Python-3.9/
- 6: https://docs.python.org/uk/3.10/library/datetime.html
- 7: https://docs.python.org/3/library/zoneinfo.html
- 8: https://pypi.org/project/DateTime/
- 9: https://docs.python.org/3/library/datetime.html
- 10: Add
datetime.UTC
alias fordatetime.timezone.utc
python/cpython#91928
Action Required: Update datetime.UTC for Backwards Compatibility
The
datetime.UTC
alias is only available in Python 3.11 and later, which causes compatibility issues with Python 3.9 and 3.10. Please update the code as follows:
Before:
current_date = datetime.datetime.now(datetime.UTC).date()After:
current_date = datetime.datetime.now(datetime.timezone.utc).date()This change ensures the code remains compatible with Python versions below 3.11.
tests/analysis/test_segmentation.py (8)
556-558
: Well-organized test class.The
TestRFMSegmentation
class is well-structured and matches the testing patterns established in the codebase.
559-576
: Comprehensive test data fixture.Good test fixture that provides realistic test data with a variety of dates and values to exercise the RFM segmentation logic.
577-592
: Expected values fixture for verification.Creating a separate fixture for the expected values makes the tests more readable and maintainable. This is a good practice.
593-606
: Thorough RFM segmentation validation test.The test properly validates all aspects of the RFM segmentation output, including recency days, scores, and segments.
667-679
: Test for RFM segmentation with default current date.Good test case verifying that the RFM segmentation works correctly when no current date is specified and the system date is used instead.
695-702
: Test for ibis_table property.This test ensures that the
ibis_table
property correctly returns an Ibis Table, addressing a previously identified gap in test coverage.
680-687
: Input validation for current_date parameter.Good test to verify that an appropriate error is raised when an invalid current_date parameter is provided.
688-694
: Input validation for df parameter.This test ensures that a TypeError is raised when an invalid dataframe type is provided, completing the input validation testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
feat: Created RFM segmentation
Summary by CodeRabbit
New Documentation
New Features
Tests