-
Notifications
You must be signed in to change notification settings - Fork 1
feat: add standard histogram plot #85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (2)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including You can disable this status message by setting the WalkthroughThe pull request introduces a "Histogram Plot" feature in the Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Reviewer Guide 🔍
|
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Code Suggestions ✨Explore these optional code suggestions:
|
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. CI Failure Feedback 🧐
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (13)
docs/api/plots/histogram.md (1)
1-3
: LGTM, but consider enhancing the documentation.The basic structure for documenting the Histogram Plot is in place. However, to make it more useful for users, consider the following improvements:
- Add a brief introduction explaining what a histogram plot is and its common use cases in retail science.
- Include a basic example of how to create a histogram plot using the
pyretailscience.plots.histogram
module.- Mention any key parameters or options available for customizing the histogram.
Here's a suggested expansion of the documentation:
# Histogram Plot A histogram plot is a graphical representation of the distribution of a dataset. In retail science, it's commonly used to visualize the frequency distribution of various metrics such as sales, customer age, or product prices. ## Basic Usage To create a histogram plot using `pyretailscience.plots.histogram`: ```python import pyretailscience.plots as plots data = [1, 2, 2, 3, 3, 3, 4, 4, 5] plots.histogram(data, bins=5, title="Sample Histogram")Module Documentation
::: pyretailscience.plots.histogram
This expanded version provides more context and a basic example, making it more helpful for users. </blockquote></details> <details> <summary>mkdocs.yml (1)</summary><blockquote> `27-28`: **LGTM! Consider alphabetizing plot entries for improved navigation.** The addition of the "Histogram Plot" entry is consistent with the existing structure and aligns with the PR objective. It enhances the documentation by providing easy access to information about the new plot type. For improved navigation, consider alphabetizing the plot entries. You could reorder them as follows: ```yaml - Histogram Plot: api/plots/histogram.md - Line Plot: api/plots/line.md
This alphabetical order would make it easier for users to locate specific plot types as the number of available plots grows.
pyretailscience/style/graph_utils.py (1)
90-91
: LGTM! Consider removing redundant comment.The changes improve the graph's readability by setting a white background and placing grid lines behind the plot elements. These are good practices for graph styling.
Consider removing the comment on line 90 as the code is self-explanatory:
- ax.set_facecolor("w") # set background colour to white + ax.set_facecolor("w")tests/plots/test_histogram.py (3)
13-13
: Minor style improvement: Remove empty parentheses from pytest.fixture decoratorsFor better readability and consistency with pytest best practices, remove the empty parentheses from the
@pytest.fixture()
decorators.Apply this change to the affected lines:
-@pytest.fixture() +@pytest.fixtureAlso applies to: 22-22, 29-29, 42-42
🧰 Tools
🪛 Ruff
13-13: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
261-267
: LGTM with a minor suggestion: Good error handling testThis test case effectively verifies that the plot function raises a ValueError when the required value_col parameter is missing. It's a crucial check for proper error handling.
For consistency with other test cases, consider using a fixture for the ax parameter:
def test_plot_missing_value_col_raises_error(sample_dataframe): """Test the plot function raises an error when value_col is missing.""" _, ax = plt.subplots() with pytest.raises(ValueError, match="Please provide a value column to plot."): plot(df=sample_dataframe, value_col=None, ax=ax)This change aligns the test more closely with how the plot function is typically called in other tests.
37-37
: Style improvement: Add trailing commas to multi-line structuresFor better readability and easier version control diffs, consider adding trailing commas to multi-line lists, dictionaries, and function calls throughout the file. This practice makes it easier to add or remove items in the future without affecting surrounding lines in version control.
Here's an example of how to apply this change:
return pd.DataFrame( { "value": [1, 2, 3, 4, 5], - "category": ["A", "A", "B", "B", "B"] + "category": ["A", "A", "B", "B", "B"], } )Apply this change consistently throughout the file for all multi-line structures.
Also applies to: 64-64, 85-85, 106-106, 127-127, 148-148, 169-169, 182-182, 190-190, 200-200, 208-208, 218-218, 226-226, 236-236, 248-248, 256-256
🧰 Tools
🪛 Ruff
37-37: Trailing comma missing
Add trailing comma
(COM812)
docs/analysis_modules.md (3)
53-53
: Add a blank line before the section header.To improve readability and consistency with other sections, add a blank line before the "Histogram Plot" header.
Apply this change:
+ ### Histogram Plot
🧰 Tools
🪛 Markdownlint
53-53: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines(MD022, blanks-around-headings)
56-56
: Replace the image placeholder with the actual histogram plot image.The image placeholder needs to be updated with the actual histogram plot image to provide visual context for the section.
Update the image source once the actual image is available:
{ align=right loading=lazy width="50%"}
58-58
: Consider breaking long lines for better readability.Two lines in this section exceed the recommended line length of 120 characters. Consider breaking them into multiple lines for improved readability.
Apply these changes:
-Histograms are particularly useful for visualizing the distribution of data, allowing you to see how values in one or more metrics are spread across different ranges. This module also supports grouping by categories, enabling you to compare the distributions across different groups. When grouping by a category, multiple histograms are generated on the same plot, allowing for easy comparison across categories. +Histograms are particularly useful for visualizing the distribution of data, allowing you to see how values in one or +more metrics are spread across different ranges. This module also supports grouping by categories, enabling you to +compare the distributions across different groups. When grouping by a category, multiple histograms are generated on +the same plot, allowing for easy comparison across categories. -This module allows you to customize legends, axes, and other visual elements, as well as apply clipping or filtering on the data values to focus on specific ranges. +This module allows you to customize legends, axes, and other visual elements, as well as apply clipping or filtering +on the data values to focus on specific ranges.Also applies to: 66-66
🧰 Tools
🪛 Markdownlint
58-58: Expected: 120; Actual: 412
Line length(MD013, line-length)
pyretailscience/plots/histogram.py (4)
97-97
: Add a trailing comma after the last argument in function call.In the function call to
apply_range_clipping
, the last argumentrange_method=range_method
is missing a trailing comma. Adding a trailing comma improves readability and follows PEP8 style guidelines.Apply this diff to fix the issue:
df = apply_range_clipping( df=df, value_col=value_col, range_lower=range_lower, range_upper=range_upper, - range_method=range_method + range_method=range_method, )🧰 Tools
🪛 Ruff
97-97: Trailing comma missing
Add trailing comma
(COM812)
105-105
: Add a trailing comma after the last argument in function call.The function call to
_plot_histogram
is missing a trailing comma after**kwargs
. This enhances code readability and maintains consistency.Apply this diff:
ax = _plot_histogram( df=df, value_col=value_col, group_col=group_col, ax=ax, cmap=cmap, num_histograms=num_histograms, - **kwargs + **kwargs, )🧰 Tools
🪛 Ruff
105-105: Trailing comma missing
Add trailing comma
(COM812)
146-146
: Add a trailing comma after the last parameter in function definition.In the function
_prepare_dataframe
, adding a trailing comma after the last parameter improves readability and aligns with PEP8 guidelines.Apply this diff:
def _prepare_dataframe( - data: pd.DataFrame | pd.Series, value_col: str | list[str] | None, group_col: str | None + data: pd.DataFrame | pd.Series, value_col: str | list[str] | None, group_col: str | None, ) -> pd.DataFrame:🧰 Tools
🪛 Ruff
146-146: Trailing comma missing
Add trailing comma
(COM812)
43-52
: Organize imports according to PEP8 recommendations.The import statements can be organized into standard library imports, third-party imports, and local application imports to enhance readability.
Apply this diff:
from typing import Literal +import numpy as np +import pandas as pd from matplotlib.axes import Axes, SubplotBase from matplotlib.colors import ListedColormap -import numpy as np -import pandas as pd import pyretailscience.style.graph_utils as gu from pyretailscience.style.tailwind import get_base_cmapThis groups the imports logically: standard library (
typing
), third-party (numpy
,pandas
,matplotlib
), and local modules (pyretailscience
).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
docs/assets/images/analysis_modules/plots/histogram_plot.svg
is excluded by!**/*.svg
📒 Files selected for processing (7)
- docs/analysis_modules.md (1 hunks)
- docs/api/plots/histogram.md (1 hunks)
- mkdocs.yml (1 hunks)
- pyretailscience/plots/histogram.py (1 hunks)
- pyretailscience/plots/line.py (1 hunks)
- pyretailscience/style/graph_utils.py (2 hunks)
- tests/plots/test_histogram.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- pyretailscience/plots/line.py
🧰 Additional context used
🪛 Markdownlint
docs/analysis_modules.md
58-58: Expected: 120; Actual: 412
Line length(MD013, line-length)
66-66: Expected: 120; Actual: 164
Line length(MD013, line-length)
53-53: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines(MD022, blanks-around-headings)
🪛 Ruff
pyretailscience/plots/histogram.py
97-97: Trailing comma missing
Add trailing comma
(COM812)
105-105: Trailing comma missing
Add trailing comma
(COM812)
146-146: Trailing comma missing
Add trailing comma
(COM812)
200-200: Trailing comma missing
Add trailing comma
(COM812)
203-203: Trailing comma missing
Add trailing comma
(COM812)
tests/plots/test_histogram.py
13-13: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
22-22: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
29-29: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
37-37: Trailing comma missing
Add trailing comma
(COM812)
42-42: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
64-64: Trailing comma missing
Add trailing comma
(COM812)
85-85: Trailing comma missing
Add trailing comma
(COM812)
106-106: Trailing comma missing
Add trailing comma
(COM812)
127-127: Trailing comma missing
Add trailing comma
(COM812)
148-148: Trailing comma missing
Add trailing comma
(COM812)
169-169: Trailing comma missing
Add trailing comma
(COM812)
182-182: Trailing comma missing
Add trailing comma
(COM812)
190-190: Trailing comma missing
Add trailing comma
(COM812)
200-200: Trailing comma missing
Add trailing comma
(COM812)
208-208: Trailing comma missing
Add trailing comma
(COM812)
218-218: Trailing comma missing
Add trailing comma
(COM812)
226-226: Trailing comma missing
Add trailing comma
(COM812)
236-236: Trailing comma missing
Add trailing comma
(COM812)
248-248: Trailing comma missing
Add trailing comma
(COM812)
256-256: Trailing comma missing
Add trailing comma
(COM812)
🔇 Additional comments (9)
docs/api/plots/histogram.md (1)
3-3
: Verify the documentation syntax.The syntax
::: pyretailscience.plots.histogram
is used to include the module documentation. This appears to be using MkDocs-style syntax, specifically the "mkdocstrings" plugin.To ensure this syntax is correct and the module is properly documented, please run the following verification:
This script will help verify that the module exists, is properly placed, and contains docstrings. It also checks for the presence of mkdocs configuration with the mkdocstrings plugin.
✅ Verification successful
pyretailscience/style/graph_utils.py
Outdated
# Add the legend only if legend_title is provided or move_legend_outside is True | ||
if legend_title or move_legend_outside: | ||
legend = ax.legend() | ||
if move_legend_outside and legend: | ||
legend.set_bbox_to_anchor((1.05, 1)) | ||
if legend_title: | ||
if legend_title and legend: | ||
legend.set_title(legend_title) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Fix indentation on line 125.
The changes improve legend handling by simplifying the conditions for adding a legend and ensuring the legend title is set only when the legend exists. This aligns well with the PR objective of adding a standard histogram plot.
There's a minor indentation issue on line 125. Please apply the following fix:
if legend_title or move_legend_outside:
legend = ax.legend()
if move_legend_outside and legend:
legend.set_bbox_to_anchor((1.05, 1))
- if legend_title and legend:
- legend.set_title(legend_title)
+ if legend_title and legend:
+ legend.set_title(legend_title)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Add the legend only if legend_title is provided or move_legend_outside is True | |
if legend_title or move_legend_outside: | |
legend = ax.legend() | |
if move_legend_outside and legend: | |
legend.set_bbox_to_anchor((1.05, 1)) | |
if legend_title: | |
if legend_title and legend: | |
legend.set_title(legend_title) | |
# Add the legend only if legend_title is provided or move_legend_outside is True | |
if legend_title or move_legend_outside: | |
legend = ax.legend() | |
if move_legend_outside and legend: | |
legend.set_bbox_to_anchor((1.05, 1)) | |
if legend_title and legend: | |
legend.set_title(legend_title) |
def _prepare_value_col(df: pd.DataFrame | pd.Series, value_col: str | list[str] | None) -> list[str]: | ||
"""Ensures that value_col is properly handled and returned as a list. | ||
|
||
Args: | ||
df (pd.DataFrame | pd.Series): The input dataframe or series. | ||
value_col (str or list of str, optional): The column(s) to plot. If a single string, it is converted to a list. | ||
|
||
Returns: | ||
list[str]: The processed value_col as a list of strings. | ||
""" | ||
if isinstance(df, pd.Series): | ||
return ["value"] if value_col is None else [value_col] | ||
|
||
if value_col is None: | ||
raise ValueError("Please provide a value column to plot.") | ||
|
||
if isinstance(value_col, str): | ||
value_col = [value_col] | ||
|
||
return value_col | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor to eliminate code duplication between _prepare_value_col
and _prepare_dataframe
.
The functions _prepare_value_col
and _prepare_dataframe
contain overlapping logic for handling value_col
, group_col
, and data input types. Refactoring these functions can reduce code duplication and improve maintainability.
Consider combining these functions or restructuring them to share common preprocessing steps.
Also applies to: 145-171
pyretailscience/plots/histogram.py
Outdated
col: df[col].apply( | ||
lambda x: np.nan | ||
if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper) | ||
else x | ||
) | ||
for col in value_col | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Refactor to use vectorized operations instead of apply
for better performance.
In the fillna
method of apply_range_clipping
, using df[col].apply(lambda x: ...)
can be inefficient for large DataFrames. Refactoring to use vectorized operations enhances performance and code clarity.
Apply this diff to refactor the code:
return df.assign(
**{
- col: df[col].apply(
- lambda x: np.nan
- if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper)
- else x
- )
+ col: df[col].where(
+ (
+ (range_lower is None) | (df[col] >= range_lower)
+ ) & (
+ (range_upper is None) | (df[col] <= range_upper)
+ ),
+ np.nan
+ )
for col in value_col
}
)
This refactoring uses where
with boolean indexing, which is more efficient than apply
with a lambda function.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
col: df[col].apply( | |
lambda x: np.nan | |
if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper) | |
else x | |
) | |
for col in value_col | |
} | |
) | |
col: df[col].where( | |
( | |
(range_lower is None) | (df[col] >= range_lower) | |
) & ( | |
(range_upper is None) | (df[col] <= range_upper) | |
), | |
np.nan | |
) | |
for col in value_col | |
} | |
) |
🧰 Tools
🪛 Ruff
200-200: Trailing comma missing
Add trailing comma
(COM812)
203-203: Trailing comma missing
Add trailing comma
(COM812)
pyretailscience/plots/histogram.py
Outdated
if range_lower is not None or range_upper is not None: | ||
if range_method == "clip": | ||
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col}) | ||
return df.assign( | ||
**{ | ||
col: df[col].apply( | ||
lambda x: np.nan | ||
if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper) | ||
else x | ||
) | ||
for col in value_col | ||
} | ||
) | ||
|
||
return df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add error handling for invalid range_method
values.
Currently, if range_method
is set to a value other than "clip"
or "fillna"
, the function returns the unmodified DataFrame without any notification. It's important to handle invalid values to prevent unintended behavior.
Consider adding a check and raising a ValueError
for invalid range_method
values:
if range_lower is not None or range_upper is not None:
if range_method == "clip":
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col})
elif range_method == "fillna":
return df.assign(
**{
col: df[col].where(
(
(range_lower is None) | (df[col] >= range_lower)
) & (
(range_upper is None) | (df[col] <= range_upper)
),
np.nan
)
for col in value_col
}
)
+ else:
+ raise ValueError(f"Invalid range_method: {range_method}. Expected 'clip' or 'fillna'.")
return df
Committable suggestion was skipped due to low confidence.
🧰 Tools
🪛 Ruff
200-200: Trailing comma missing
Add trailing comma
(COM812)
203-203: Trailing comma missing
Add trailing comma
(COM812)
Codecov ReportAttention: Patch coverage is
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (4)
tests/plots/test_histogram.py (4)
13-13
: Simplify fixture decorator by removing parenthesesYou can remove the parentheses from the
@pytest.fixture()
decorator when no parameters are required.Apply this diff to simplify the decorator:
-@pytest.fixture() +@pytest.fixture🧰 Tools
🪛 Ruff
13-13: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
22-22
: Simplify fixture decorator by removing parenthesesYou can remove the parentheses from the
@pytest.fixture()
decorator when no parameters are required.Apply this diff to simplify the decorator:
-@pytest.fixture() +@pytest.fixture🧰 Tools
🪛 Ruff
22-22: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
29-29
: Simplify fixture decorator by removing parenthesesYou can remove the parentheses from the
@pytest.fixture()
decorator when no parameters are required.Apply this diff to simplify the decorator:
-@pytest.fixture() +@pytest.fixture🧰 Tools
🪛 Ruff
29-29: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
42-42
: Simplify fixture decorator by removing parenthesesYou can remove the parentheses from the
@pytest.fixture()
decorator when no parameters are required.Apply this diff to simplify the decorator:
-@pytest.fixture() +@pytest.fixture🧰 Tools
🪛 Ruff
42-42: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- pyretailscience/plots/histogram.py (1 hunks)
- tests/plots/test_histogram.py (1 hunks)
🧰 Additional context used
🪛 Ruff
tests/plots/test_histogram.py
13-13: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
22-22: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
29-29: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
42-42: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
pyretailscience/plots/histogram.py
Outdated
return df[value_col].plot(kind="hist", ax=ax, alpha=0.5, legend=add_legend, color=cmap.colors[0], **kwargs) | ||
|
||
df_pivot = df.pivot(columns=group_col, values=value_col[0]) | ||
|
||
# Plot all columns at once | ||
return df_pivot.plot( | ||
kind="hist", | ||
ax=ax, | ||
alpha=0.5, | ||
legend=add_legend, | ||
color=cmap.colors[: len(df_pivot.columns)], # Use the appropriate number of colors | ||
**kwargs, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: Inconsistent handling of multiple value_col
with group_col
When group_col
is provided, the _plot_histogram
function only plots the first value column (value_col[0]
), as seen on line 268. If value_col
contains multiple columns, the additional columns are ignored, which may not be the intended behavior.
Consider updating the function to either support plotting multiple value_col
when group_col
is provided, or explicitly raise a ValueError
when multiple value_col
are provided along with a group_col
, to prevent unexpected behavior.
pyretailscience/plots/histogram.py
Outdated
return df[value_col].plot(kind="hist", ax=ax, alpha=0.5, legend=add_legend, color=cmap.colors[0], **kwargs) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: All histograms plotted with the same color when multiple value_col
are provided
When plotting multiple value_col
without a group_col
, the code assigns the same color to all histograms by using color=cmap.colors[0]
at line 266. This results in all histograms appearing in the same color, making it difficult to distinguish between them.
Consider modifying the color
parameter to use a list of colors corresponding to each value column:
- return df[value_col].plot(kind="hist", ax=ax, alpha=0.5, legend=add_legend, color=cmap.colors[0], **kwargs)
+ return df[value_col].plot(
+ kind="hist",
+ ax=ax,
+ alpha=0.5,
+ legend=add_legend,
+ color=cmap.colors[:len(value_col)],
+ **kwargs,
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
return df[value_col].plot(kind="hist", ax=ax, alpha=0.5, legend=add_legend, color=cmap.colors[0], **kwargs) | |
return df[value_col].plot( | |
kind="hist", | |
ax=ax, | |
alpha=0.5, | |
legend=add_legend, | |
color=cmap.colors[:len(value_col)], | |
**kwargs, | |
) |
tests/plots/test_histogram.py
Outdated
resulted_ax = plot(df=sample_dataframe, value_col="value", ax=ax) | ||
|
||
# Verify that df.plot was called with correct parameters | ||
pd.DataFrame.plot.assert_called_once_with(kind="hist", ax=ax, alpha=0.5, legend=False, color=mocker.ANY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect assertion on the mocked DataFrame.plot
method
When mocking instance methods like DataFrame.plot
, asserting directly on pd.DataFrame.plot
may not capture the calls made by the DataFrame instances in your tests. Instead, you should mock or spy on the plot
method of the specific DataFrame or Series instances you're using.
Solution: Spy on the plot
method of the DataFrame or Series instances
Adjust your test functions by adding a spy on the plot
method of the DataFrame or Series instance and then assert on that.
For example, in your test function at line 51:
def test_plot_single_histogram_no_legend(sample_dataframe, mocker):
"""Test the plot function with a single histogram and no legend."""
# Create the plot axis using plt.subplots()
_, ax = plt.subplots()
+ # Spy on the plot method of the DataFrame instance
+ mocker.spy(sample_dataframe, 'plot')
# Call the plot function
resulted_ax = plot(df=sample_dataframe, value_col="value", ax=ax)
# Verify that df.plot was called with correct parameters
- pd.DataFrame.plot.assert_called_once_with(kind="hist", ax=ax, alpha=0.5, legend=False, color=mocker.ANY)
+ sample_dataframe.plot.assert_called_once_with(kind="hist", ax=ax, alpha=0.5, legend=False, color=mocker.ANY)
Repeat this adjustment for the other test functions at lines 77, 103, 129, 155, and 181 where you are asserting on pd.DataFrame.plot
or pd.Series.plot
.
Also applies to: 86-86, 112-112, 138-138, 164-164, 190-190
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. CI Failure Feedback 🧐
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- pyretailscience/plots/histogram.py (1 hunks)
🔇 Additional comments (1)
pyretailscience/plots/histogram.py (1)
1-41
: Module docstring is clear and comprehensiveThe module's docstring provides a detailed and well-structured overview of the functionality, features, use cases, limitations, and additional features. This enhances the readability and maintainability of the code.
pyretailscience/plots/histogram.py
Outdated
clipping them or filling values outside the range with **NaN**. This is particularly useful when visualizing specific | ||
data ranges. | ||
- **Comprehensive Customization**: Customize plot titles, axis labels, and legends, with the option to move the legend outside the plot. | ||
- **Support for Pre-Aggregated Data**: The module assumes that the data has already been aggregated. No aggregation occurs during the plotting process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this line please. I don't think it's necessary in this case
- **Support for Pre-Aggregated Data**: The module assumes that the data has already been aggregated. No aggregation occurs during the plotting process. |
pyretailscience/plots/histogram.py
Outdated
if range_lower is not None or range_upper is not None: | ||
if range_method == "clip": | ||
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col}) | ||
return df.assign( | ||
**{ | ||
col: df[col].apply( | ||
lambda x: np.nan | ||
if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper) | ||
else x, | ||
) | ||
for col in value_col | ||
}, | ||
) | ||
|
||
return df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flow removes the double negatives and simplifies the code. Not sure about the formatting thought. It's a bit tough to do in github.
if range_lower is not None or range_upper is not None: | |
if range_method == "clip": | |
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col}) | |
return df.assign( | |
**{ | |
col: df[col].apply( | |
lambda x: np.nan | |
if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper) | |
else x, | |
) | |
for col in value_col | |
}, | |
) | |
return df | |
if range_lower is None and range_upper is None: | |
return df | |
if range_method == "clip": | |
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col}) | |
return df.assign( | |
**{ | |
col: df[col].apply( | |
lambda x: np.nan | |
if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper) | |
else x, | |
) | |
for col in value_col | |
}, | |
) |
pyretailscience/plots/histogram.py
Outdated
return df[value_col].plot( | ||
kind="hist", | ||
ax=ax, | ||
alpha=0.5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make it so that you check kwargs for a key with alpha in it and use that, otherwise use a default value for alpha. Basically, make it so alpha can get overridden.
Also do you think alpha 0.5 is too transparent?
pyretailscience/plots/histogram.py
Outdated
return df_pivot.plot( | ||
kind="hist", | ||
ax=ax, | ||
alpha=0.5, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same override comment here
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. CI Failure Feedback 🧐
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. CI Failure Feedback 🧐
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (8)
tests/plots/test_line.py (3)
28-32
: LGTM: Improved color mocking for tests.The changes to the
_mock_get_base_cmap
fixture are well-implemented. Using a cycling color generator instead of aListedColormap
provides more flexibility for testing scenarios with an unknown number of color assignments.Consider defining the
colors
list as a constant at the module level or as a parameter to the fixture for easier maintenance and reusability. For example:MOCK_COLORS = ["#FF0000", "#00FF00", "#0000FF"] @pytest.fixture() def _mock_get_base_cmap(mocker, colors=MOCK_COLORS): color_gen = cycle(colors) mocker.patch("pyretailscience.style.tailwind.get_base_cmap", return_value=color_gen)This approach would make it easier to modify the mock colors if needed in the future.
Line range hint
213-241
: LGTM: Comprehensive test case for combined legend title and positioning.The new test case
test_plot_with_legend_title_and_move_outside
is a valuable addition to the test suite. It effectively verifies the plot function's ability to handle both a custom legend title and moving the legend outside the plot. The test is well-structured and ensures that both features work correctly in combination.For consistency with other test cases, consider adding an assertion to check the number of lines in the plot. This would ensure that the basic plotting functionality is not affected by the legend modifications. For example:
assert len(result_ax.get_lines()) == 2 # Assuming two groups in the sample_dataframeThis additional assertion would make the test even more comprehensive.
Line range hint
244-276
: LGTM: Valuable test case for datetime index warning.The new test case
test_plot_with_datetime_index_warns
is an excellent addition to the test suite. It effectively verifies that the appropriate warning is raised when plotting with a datetime index and no x_col specified. This test enhances the robustness of the plotting function by ensuring proper guidance is provided to users in specific scenarios.To improve clarity and make the test more self-contained, consider adding a comment or docstring explaining why this warning is important. For example:
def test_plot_with_datetime_index_warns(sample_dataframe, mocker): """ Test that a warning is raised when plotting with a datetime index and no x_col. This is important because time-based plots may require special handling, and users should be directed to the appropriate module. """ # ... (rest of the test case)This additional context would make the purpose and importance of the test more evident to other developers.
pyretailscience/style/graph_utils.py (2)
14-17
: LGTM: New_hatches_gen
function is well-implemented.The
_hatches_gen
function is correctly implemented and efficiently usesitertools.cycle
to create a generator of hatch patterns. The return type annotation is accurate.Consider adding a brief explanation of the hatch patterns in the docstring:
def _hatches_gen() -> Generator[str, None, None]: """Returns a generator that cycles through the hatch patterns. Hatch patterns: '/', '\', '|', '-', '+', 'x', 'o', 'O', '.', '*' """ _hatches = ["/", "\\", "|", "-", "+", "x", "o", "O", ".", "*"] return cycle(_hatches)
70-87
: LGTM: New_add_plot_legend
function improves legend handling.The
_add_plot_legend
function effectively centralizes legend handling logic, aligning with the PR objective. It correctly manages different scenarios for legend placement and titling.For improved clarity, consider simplifying the condition for adding a legend:
def _add_plot_legend(ax: Axes, legend_title: str | None, move_legend_outside: bool) -> Axes: """Add a legend to a Matplotlib graph.""" if ax.get_legend() is not None or legend_title or move_legend_outside: legend = ax.legend( bbox_to_anchor=(1.05, 1), loc="upper left", frameon=False ) if move_legend_outside else ax.legend(frameon=False) if legend_title and legend: legend.set_title(legend_title) return axThis change removes the
has_legend
variable and simplifies the condition, making the code more concise.docs/analysis_modules.md (3)
53-98
: Great addition of the Histogram Plot section!The new section on Histogram Plot is well-structured and informative. It provides a clear explanation of the purpose and functionality of histograms in data visualization. The example code effectively demonstrates how to use the new histogram plot feature.
Here are a few minor suggestions for improvement:
- Consider adding a blank line before the "### Histogram Plot" heading (line 53) for consistency with other sections.
- The explanation paragraph (lines 58-66) is quite long. Consider breaking it into smaller paragraphs for better readability.
- In the example code, consider adding a comment explaining the purpose of the
np.concatenate
and the creation of two normal distributions (lines 80-83) to help users understand the data generation process.🧰 Tools
🪛 Markdownlint
58-58: Expected: 120; Actual: 412
Line length(MD013, line-length)
66-66: Expected: 120; Actual: 164
Line length(MD013, line-length)
53-53: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines(MD022, blanks-around-headings)
Line range hint
284-341
: Excellent expansion of the Product Association Rules section!The expanded content in the Product Association Rules section greatly enhances the documentation. It provides a comprehensive explanation of product association rules, their applications in retail analytics, and key metrics used to quantify associations.
The example code snippet effectively demonstrates how to use the ProductAssociation class. However, to further improve this section, consider adding a brief explanation of the output table (lines 336-341) to help users interpret the results.
Would you like me to draft a short explanation for the output table?
🧰 Tools
🪛 Markdownlint
58-58: Expected: 120; Actual: 412
Line length(MD013, line-length)
66-66: Expected: 120; Actual: 164
Line length(MD013, line-length)
53-53: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines(MD022, blanks-around-headings)
Line range hint
1-598
: Address placeholder sections and minor formatting issues
Placeholder sections: Several sections (e.g., Cross Shop, Gain Loss, Revenue Tree, etc.) contain placeholder text ("PASTE TEXT HERE" and "PASTE CODE HERE"). These sections should be completed with actual content or removed if not ready for this PR.
Formatting:
- Consider breaking long lines (e.g., lines 58-66) into shorter paragraphs for better readability.
- Add a blank line before the "### Histogram Plot" heading (line 53) for consistency with other sections.
- Review and adjust line lengths exceeding 120 characters, particularly in the explanation paragraphs.
Consistency: Ensure all sections follow the same structure: explanation, image (if applicable), and code example.
Would you like assistance in drafting content for any of the placeholder sections?
🧰 Tools
🪛 Markdownlint
58-58: Expected: 120; Actual: 412
Line length(MD013, line-length)
66-66: Expected: 120; Actual: 164
Line length(MD013, line-length)
53-53: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines(MD022, blanks-around-headings)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (2)
docs/assets/images/analysis_modules/plots/histogram_plot.svg
is excluded by!**/*.svg
docs/assets/images/analysis_modules/plots/line_plot.svg
is excluded by!**/*.svg
📒 Files selected for processing (8)
- docs/analysis_modules.md (1 hunks)
- pyretailscience/plots/histogram.py (1 hunks)
- pyretailscience/plots/line.py (3 hunks)
- pyretailscience/style/graph_utils.py (6 hunks)
- pyretailscience/style/tailwind.py (2 hunks)
- tests/plots/test_histogram.py (1 hunks)
- tests/plots/test_line.py (2 hunks)
- tests/test_standard_graphs.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyretailscience/plots/line.py
🧰 Additional context used
🪛 Markdownlint
docs/analysis_modules.md
58-58: Expected: 120; Actual: 412
Line length(MD013, line-length)
66-66: Expected: 120; Actual: 164
Line length(MD013, line-length)
53-53: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines(MD022, blanks-around-headings)
🪛 Ruff
tests/plots/test_histogram.py
15-15: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
24-24: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
30-30: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
40-40: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
54-54: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
🔇 Additional comments (22)
tests/plots/test_line.py (3)
4-4
: LGTM: Import statement addition is appropriate.The addition of
from iterators import cycle
is necessary for the changes made in the_mock_get_base_cmap
fixture. This import allows for the creation of an infinite iterator for color cycling in the tests.
Line range hint
186-210
: LGTM: Well-structured test case for legend title functionality.The new test case
test_plot_with_legend_title
is a valuable addition to the test suite. It effectively verifies the plot function's ability to handle a custom legend title. The test is well-structured, following the Arrange-Act-Assert pattern, and checks that thestandard_graph_styles
function is called with the correct parameters, including the provided legend title.This test enhances the overall coverage of the plotting functionality and helps ensure that the legend title feature works as expected.
Line range hint
1-276
: Overall assessment: Well-implemented changes with improved test coverage.The modifications to
tests/plots/test_line.py
significantly enhance the test suite for the plotting functionality. Key improvements include:
- More flexible color mocking using a cycling color generator.
- New test cases covering legend title handling, legend positioning, and datetime index warnings.
- Improved coverage of edge cases and combined feature usage.
These changes contribute to a more robust and comprehensive test suite, which will help maintain the quality and reliability of the plotting module. The suggestions provided are minor and aimed at further improving clarity and consistency.
Great job on these enhancements!
pyretailscience/style/graph_utils.py (3)
4-5
: LGTM: Import statement addition is correct.The addition of
from collections.abc import Generator
is appropriate for type hinting purposes, likely used in the new_hatches_gen
function. This follows Python best practices for type annotations.
Line range hint
89-148
: LGTM: Updatedstandard_graph_styles
function improves code organization.The changes to
standard_graph_styles
effectively utilize the new_add_plot_legend
function, centralizing legend handling logic. This update aligns with the PR objective of improving legend handling while maintaining backward compatibility.
Line range hint
1-265
: Overall assessment: Excellent additions and improvements to graph styling capabilities.The changes in this file successfully implement the PR objectives, introducing new functions for hatch pattern generation and application, improving legend handling, and enhancing overall graph styling capabilities. The code is well-structured and follows good practices, with only minor suggestions for improvements.
Key points:
- New
_hatches_gen
andapply_hatches
functions add valuable styling options.- Improved legend handling with the new
_add_plot_legend
function.- Updated
standard_graph_styles
function for better code organization.The suggested minor improvements (docstring addition, legend condition simplification, and import relocation) will further enhance the code's clarity and maintainability.
tests/test_standard_graphs.py (7)
132-142
: LGTM: Well-structured test for 3 colors scenarioThe test effectively validates the
get_base_cmap
function for the case of exactly 3 colors, which is a crucial boundary condition. The use ofislice
to extract the first three colors from the generator is appropriate and efficient.
145-155
: LGTM: Comprehensive test for 2 colors scenarioThis test effectively covers the case of 2 colors, which is another important scenario for the
get_base_cmap
function. The implementation is consistent with the previous test, maintaining good code quality.
158-168
: LGTM: Edge case test for single color scenarioThis test effectively covers the edge case of a single color for the
get_base_cmap
function. The implementation is consistent with the previous tests and properly validates the function's behavior for this minimal input scenario.
171-186
: LGTM: Thorough test for more than three colorsThis test effectively validates the
get_base_cmap
function's behavior when requesting more than three colors. It ensures that the function correctly cycles through different color shades as expected. The implementation is consistent with previous tests and covers an important scenario beyond the basic use case.
189-209
: LGTM: Comprehensive test for color cycling behaviorThis test effectively validates the
get_base_cmap
function's behavior when requesting more colors than are distinctly available. It ensures that the function correctly cycles through the available colors and shades as expected. The implementation is thorough and covers an important edge case for the function's behavior.
212-229
: LGTM: Excellent test for color cycling consistencyThis test effectively validates the consistent cycling behavior of the
get_base_cmap
function. By requesting more colors than initially specified, it ensures that the function correctly repeats the color pattern as expected. This is crucial for understanding and verifying the function's behavior over extended use.
132-229
: Excellent addition of comprehensive tests forget_base_cmap
The new test suite for the
get_base_cmap
function is well-designed and thorough. It covers a wide range of scenarios including:
- Exactly three colors
- Two colors
- Single color
- More than three colors
- More colors than available (testing cycling)
- Specific cycling behavior
These tests significantly enhance the reliability and robustness of the
get_base_cmap
function by validating its behavior under various conditions. The consistent use ofislice
and clear assertions make the tests easy to understand and maintain.Great job on improving the test coverage!
docs/analysis_modules.md (1)
Line range hint
1-598
: Overall, great improvements to the documentation!The additions and expansions in this file significantly enhance the documentation for the analysis modules. The new Histogram Plot section and the expanded Product Association Rules section provide valuable information and clear examples for users.
To further improve the document:
- Complete or remove the placeholder sections.
- Address the minor formatting and consistency issues mentioned in previous comments.
- Consider adding brief explanations for the output of code examples where applicable.
These changes will result in a more comprehensive and polished documentation for the analysis modules.
🧰 Tools
🪛 Markdownlint
58-58: Expected: 120; Actual: 412
Line length(MD013, line-length)
66-66: Expected: 120; Actual: 164
Line length(MD013, line-length)
53-53: Expected: 1; Actual: 0; Below
Headings should be surrounded by blank lines(MD022, blanks-around-headings)
pyretailscience/style/tailwind.py (3)
11-12
: Necessary imports addedThe imports of
Generator
fromcollections.abc
andcycle
fromitertools
are appropriate for the new generator functions introduced.
377-385
:get_single_color_cmap
function implemented correctlyThe
get_single_color_cmap
function provides a generator for green shades and is correctly implemented. The use ofcycle
ensures continuous looping over the specified shades.
351-370
: Ensure expandedcolor_order
does not affect existing usageThe
color_order
inget_base_cmap
has been expanded to include more colors, enhancing the variety in the base colormap. Please verify that this change does not adversely impact any existing code or visualizations that rely on the previous color order or specific color sequencing.You can run the following script to identify where
get_base_cmap
is used in the codebase:pyretailscience/plots/histogram.py (4)
1-40
: Comprehensive and Clear Module DocumentationThe module docstring provides a comprehensive overview of the functionality, features, and limitations. It is well-structured and aids in understanding the purpose and usage of the module.
91-93
: Good handling of conflicting parametersThe function now correctly raises a
ValueError
whenvalue_col
is a list andgroup_col
is provided, preventing unexpected behavior.
241-242
: Flexible alpha parameter handlingGreat job allowing the
alpha
parameter to be overridden viakwargs
. This enhances the flexibility of the plotting function.
94-98
: 🛠️ Refactor suggestionConsider validating that
value_col
columns exist indf
Currently, there is no check to ensure that the columns specified in
value_col
actually exist indf
. Adding a validation step can prevent errors downstream when non-existent columns are referenced.Apply this diff to add the validation:
value_col = _prepare_value_col(df=df, value_col=value_col) +missing_cols = [col for col in value_col if col not in df.columns] +if missing_cols: + raise ValueError(f"The following value columns are not present in the DataFrame: {missing_cols}") if isinstance(df, pd.Series): df = df.to_frame(name=value_col[0])Likely invalid or redundant comment.
tests/plots/test_histogram.py (1)
72-72
: Previous comment about mockingDataFrame.plot
still appliesThe issue regarding incorrect assertions on the mocked
DataFrame.plot
method is still present. Please refer to the previous review comment for details on how to address this.Also applies to: 98-98, 124-124, 150-150, 176-176, 202-202
pyretailscience/style/graph_utils.py
Outdated
def apply_hatches(ax: Axes, num_segments: int) -> Axes: | ||
"""Apply hatches to a plot. Useful for bars/histograms/area plots.""" | ||
import numpy as np | ||
|
||
hatches = _hatches_gen() | ||
patch_groups = np.array_split(ax.patches, num_segments) | ||
for patch_group in patch_groups: | ||
hatch = next(hatches) | ||
for patch in patch_group: | ||
patch.set_hatch(hatch) | ||
return ax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
LGTM: New apply_hatches
function enhances graph styling capabilities.
The apply_hatches
function effectively uses the _hatches_gen
generator to apply hatch patterns to plot elements, supporting multiple segments for grouped plots. This aligns well with the PR objective of enhancing graph styling capabilities.
Consider moving the numpy
import to the top of the file with other imports:
import numpy as np
# ... (other imports)
def apply_hatches(ax: Axes, num_segments: int) -> Axes:
"""Apply hatches to a plot. Useful for bars/histograms/area plots."""
hatches = _hatches_gen()
patch_groups = np.array_split(ax.patches, num_segments)
for patch_group in patch_groups:
hatch = next(hatches)
for patch in patch_group:
patch.set_hatch(hatch)
return ax
This change follows the common practice of placing all imports at the beginning of the file, improving code organization and readability.
return df[value_col].plot( | ||
kind="hist", | ||
ax=ax, | ||
legend=is_multi_histogram, | ||
color=colors, | ||
alpha=alpha, | ||
**kwargs, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance support for multiple value_col
Currently, when group_col
is not provided, and multiple value_col
are given, all histograms are plotted on the same axes, which might overlap and cause readability issues.
Consider allowing the option to plot multiple value_col
histograms in separate subplots or adjust transparency and stacking to improve clarity.
df_pivot = df.pivot(columns=group_col, values=value_col[0]) | ||
|
||
# Plot all columns at once | ||
return df_pivot.plot( | ||
kind="hist", | ||
ax=ax, | ||
legend=is_multi_histogram, | ||
alpha=alpha, | ||
color=colors, | ||
**kwargs, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Extend functionality to support multiple value_col
with group_col
The function currently raises an error when both value_col
is a list and group_col
is provided. Enhancing the function to handle multiple value columns with grouping would provide more flexibility.
Consider updating the code to plot multiple value_col
with group_col
. This would involve reshaping the DataFrame appropriately and adjusting the plotting logic.
pyretailscience/plots/histogram.py
Outdated
range_upper: float | None = None, | ||
range_method: Literal["clip", "fillna"] = "clip", | ||
use_hatch: bool = False, | ||
**kwargs: dict[str, any], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Any
from typing
instead of any
for type annotations
In the type annotation for **kwargs
, any
should be replaced with Any
from the typing
module for correct type hinting.
Apply this diff to fix the type annotation:
-from typing import Literal
+from typing import Any, Literal
...
def plot(
df: pd.DataFrame | pd.Series,
...
- **kwargs: dict[str, any],
+ **kwargs: dict[str, Any],
) -> SubplotBase:
Committable suggestion was skipped due to low confidence.
|
||
|
||
# Fixture for mocking graph_utils functions | ||
@pytest.fixture() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify fixture decorators by removing parentheses
When defining fixtures without parameters, the parentheses in @pytest.fixture()
are unnecessary. Removing them can make the code cleaner and adhere to pytest best practices.
Apply this diff to fix the issue:
-@pytest.fixture()
+@pytest.fixture
This change should be applied to lines 15, 24, 30, 40, and 54.
Also applies to: 24-24, 30-30, 40-40, 54-54
🧰 Tools
🪛 Ruff
15-15: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mrglglglglgl Tiny change here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. CI Failure Feedback 🧐
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
pyretailscience/plots/histogram.py (2)
52-90
: Consider adding parameter validation and error handling.While the function signature is well-defined with type hints, consider adding explicit parameter validation for critical inputs. For example:
- Validate that
df
is not empty.- Ensure
range_lower
is less thanrange_upper
when both are provided.- Validate that
value_col
exists in the DataFrame when provided.Adding these checks can prevent potential runtime errors and provide more informative error messages to users.
140-160
: Consider improving error handling in_prepare_value_col
.The function handles most cases well, but consider adding a check to ensure that when
value_col
is provided as a list, all elements are strings. This can prevent potential issues later in the code.You could add something like:
if isinstance(value_col, list) and not all(isinstance(col, str) for col in value_col): raise ValueError("All elements in value_col must be strings when provided as a list.")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- pyretailscience/plots/histogram.py (1 hunks)
- pyretailscience/style/graph_utils.py (5 hunks)
🔇 Additional comments (11)
pyretailscience/style/graph_utils.py (7)
4-5
: LGTM: New imports support the hatching functionality.The addition of
Generator
fromcollections.abc
andcycle
fromitertools
is appropriate for implementing the new hatching feature. These imports are correctly placed at the top of the file with other imports.
8-8
: LGTM: numpy import added as suggested.The
numpy
import has been correctly added at the top of the file with other imports, addressing a previous review comment. This import supports the newapply_hatches
function.
15-22
: LGTM: Well-implemented _hatches_gen function.The new
_hatches_gen
function is a well-designed generator that efficiently cycles through predefined hatch patterns. It's properly documented and follows Python conventions for internal use (leading underscore). This function aligns with the PR objective of enhancing graph styling capabilities.
75-92
: LGTM: Excellent addition of _add_plot_legend function.The new
_add_plot_legend
function effectively centralizes and improves legend handling logic. It aligns perfectly with the PR objective of refining legend handling for better clarity. The function is well-documented and handles various scenarios, including existing legends, legend titles, and positioning options.
153-153
: LGTM: Improved legend handling in standard_graph_styles.The
standard_graph_styles
function now delegates legend handling to the new_add_plot_legend
helper function. This change enhances modularity and aligns with the PR objective of improving graph styling and legend handling. It simplifies the main function while maintaining clear functionality.
173-192
: LGTM: Excellent implementation of apply_hatches function.The new
apply_hatches
function is a valuable addition that enhances graph styling capabilities, aligning perfectly with the PR objectives. It efficiently applies hatch patterns to plot elements, supporting multiple segments for grouped plots. The function is well-documented and makes good use of numpy for performance optimization.
Line range hint
1-194
: Overall excellent enhancements to graph styling capabilities.The changes in this file significantly improve graph styling functionality and code organization. Key improvements include:
- New
_hatches_gen
andapply_hatches
functions for enhanced plot element styling.- Improved legend handling with the new
_add_plot_legend
function.- Refactored
standard_graph_styles
for better modularity.- Proper placement of imports and adherence to coding standards.
These changes align well with the PR objectives and address previous review comments. The new features and refactoring enhance the overall quality and capabilities of the graph utils module.
pyretailscience/plots/histogram.py (4)
1-40
: Comprehensive module-level documentation.The module-level docstring provides a thorough explanation of the module's functionality, core features, use cases, limitations, and additional features. This level of detail is excellent for users and maintainers of the code.
91-92
: Good error handling for incompatible inputs.The check for incompatible
value_col
andgroup_col
inputs is a good practice. It prevents potential issues early in the function execution.
199-216
: Well-implemented histogram count calculation.The
_get_num_histograms
function correctly handles bothvalue_col
andgroup_col
cases to determine the number of histograms. The implementation is concise and effective.
1-266
: Overall, well-implemented histogram plotting functionality with room for minor improvements.The
histogram.py
module provides a comprehensive and flexible solution for plotting histograms from pandas DataFrames or Series. Strengths of the implementation include:
- Detailed documentation at both module and function levels.
- Proper error handling and parameter validation in most cases.
- Support for both single and multiple histograms, with optional grouping.
- Range clipping and filling functionality.
Areas for potential improvement:
- Optimize performance in
apply_range_clipping
by using vectorized operations.- Extend support for multiple
value_col
whengroup_col
is provided in_plot_histogram
.- Add more comprehensive parameter validation in the main
plot
function.These improvements would enhance the module's efficiency and flexibility without compromising its current functionality.
pyretailscience/plots/histogram.py
Outdated
def apply_range_clipping( | ||
df: pd.DataFrame, | ||
value_col: list[str], | ||
range_lower: float | None, | ||
range_upper: float | None, | ||
range_method: Literal["clip", "fillna"], | ||
) -> pd.DataFrame: | ||
"""Applies range clipping or filling based on the provided method and returns the modified dataframe. | ||
|
||
Args: | ||
df (pd.DataFrame): The dataframe to apply range clipping to. | ||
value_col (list of str): The column(s) to apply clipping or filling to. | ||
range_lower (float, optional): Lower bound for clipping or filling NA values. | ||
range_upper (float, optional): Upper bound for clipping or filling NA values. | ||
range_method (Literal, optional): Whether to "clip" values outside the range or "fillna". Defaults to "clip". | ||
|
||
Returns: | ||
pd.DataFrame: The modified dataframe with the clipping or filling applied. | ||
""" | ||
if range_lower is None and range_upper is None: | ||
return df | ||
|
||
if range_method == "clip": | ||
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col}) | ||
|
||
return df.assign( | ||
**{ | ||
col: df[col].apply( | ||
lambda x: np.nan | ||
if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper) | ||
else x, | ||
) | ||
for col in value_col | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimize performance using vectorized operations.
The current implementation uses apply
with a lambda function, which can be inefficient for large DataFrames. Consider using vectorized operations for better performance:
if range_method == "fillna":
return df.assign(
**{
col: df[col].where(
((range_lower is None) | (df[col] >= range_lower)) &
((range_upper is None) | (df[col] <= range_upper)),
np.nan
)
for col in value_col
}
)
This vectorized approach should be significantly faster, especially for large datasets.
pyretailscience/plots/histogram.py
Outdated
def _plot_histogram( | ||
df: pd.DataFrame, | ||
value_col: list[str], | ||
group_col: str | None, | ||
ax: Axes | None, | ||
colors: list[str], | ||
num_histograms: int, | ||
**kwargs: dict, | ||
) -> Axes: | ||
"""Plots histograms for the provided dataframe. | ||
|
||
Args: | ||
df (pd.DataFrame): The dataframe to plot. | ||
value_col (list of str): The column(s) to plot. | ||
group_col (str, optional): The column used to group data into multiple histograms. | ||
ax (Axes, optional): Matplotlib axes object to plot on. | ||
colors: The list of colors use for the plot. | ||
num_histograms (int): The number of histograms being plotted. | ||
**kwargs: Additional keyword arguments for Pandas' `plot` function. | ||
|
||
Returns: | ||
Axes: The matplotlib axes object with the plotted histogram. | ||
""" | ||
alpha = kwargs.pop("alpha", 0.5) | ||
|
||
is_multi_histogram = num_histograms > 1 | ||
|
||
if group_col is None: | ||
return df[value_col].plot( | ||
kind="hist", | ||
ax=ax, | ||
legend=is_multi_histogram, | ||
color=colors, | ||
alpha=alpha, | ||
**kwargs, | ||
) | ||
|
||
# if group_col is provided, only use a single value_col | ||
df_pivot = df.pivot(columns=group_col, values=value_col[0]) | ||
|
||
# Plot all columns at once | ||
return df_pivot.plot( | ||
kind="hist", | ||
ax=ax, | ||
legend=is_multi_histogram, | ||
alpha=alpha, | ||
color=colors, | ||
**kwargs, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider supporting multiple value_col
with group_col
.
Currently, when group_col
is provided, the function only uses the first value_col
. Consider extending the functionality to support multiple value_col
with group_col
. This could involve creating subplots for each value_col
or using a more complex pivoting strategy.
For example:
if group_col is not None:
fig, axes = plt.subplots(len(value_col), 1, figsize=(10, 5*len(value_col)))
for i, col in enumerate(value_col):
df_pivot = df.pivot(columns=group_col, values=col)
df_pivot.plot(kind="hist", ax=axes[i] if len(value_col) > 1 else axes,
legend=is_multi_histogram, alpha=alpha, color=colors, **kwargs)
axes[i].set_title(f'Histogram for {col}')
return axes
This would allow users to visualize multiple value columns grouped by a categorical variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason the legend isn't hatched? https://matplotlib.org/stable/gallery/lines_bars_and_markers/filled_step.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (21)
pyretailscience/style/graph_utils.py (4)
15-22
: LGTM: Well-implemented _hatches_gen function. Consider adding unit tests.The
_hatches_gen
function is well-implemented and documented. The use of a generator withcycle
is an efficient approach for repeating hatch patterns. The underscore prefix correctly indicates it's for internal use.To address the codecov/patch warning, consider adding unit tests for this function. Would you like assistance in generating test cases?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 21-22: pyretailscience/style/graph_utils.py#L21-L22
Added lines #L21 - L22 were not covered by tests
75-94
: LGTM: Improved legend handling. Consider minor enhancements and adding tests.The
_add_legend
function effectively centralizes legend-related logic, improving modularity and addressing the PR objective of refining legend handling. The underscore prefix correctly indicates it's for internal use.Consider the following minor enhancements:
- Use a more descriptive variable name instead of
has_legend
(e.g.,legend_exists
).- Consider using a ternary operator for the legend creation to reduce nesting:
legend = ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left", frameon=False) if move_legend_outside else ax.legend(frameon=False)To address the codecov/patch warning, consider adding unit tests for this function. Would you like assistance in generating test cases?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 85-85: pyretailscience/style/graph_utils.py#L85
Added line #L85 was not covered by tests
[warning] 91-91: pyretailscience/style/graph_utils.py#L91
Added line #L91 was not covered by tests
174-202
: LGTM: Well-implemented apply_hatches function. Consider minor enhancement and adding tests.The
apply_hatches
function is a valuable addition that aligns with the PR objective of enhancing graph styling capabilities. It provides a flexible way to apply hatch patterns to different segments of a plot and includes proper handling for legend patches.Consider the following minor enhancement:
- Use
enumerate
in the loop to avoid callingnext(available_hatches)
separately:for i, patch_group in enumerate(patch_groups): hatch = list(available_hatches)[i % len(list(available_hatches))] for patch in patch_group: patch.set_hatch(hatch)To address the codecov/patch warning, consider adding unit tests for this function. Would you like assistance in generating test cases?
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 187-188: pyretailscience/style/graph_utils.py#L187-L188
Added lines #L187 - L188 were not covered by tests
[warning] 190-190: pyretailscience/style/graph_utils.py#L190
Added line #L190 was not covered by tests
[warning] 192-192: pyretailscience/style/graph_utils.py#L192
Added line #L192 was not covered by tests
[warning] 194-194: pyretailscience/style/graph_utils.py#L194
Added line #L194 was not covered by tests
[warning] 199-199: pyretailscience/style/graph_utils.py#L199
Added line #L199 was not covered by tests
[warning] 201-201: pyretailscience/style/graph_utils.py#L201
Added line #L201 was not covered by tests
Line range hint
1-292
: Overall: Excellent enhancements to graph styling capabilities. Consider improving test coverage.The changes in this file significantly improve the graph styling capabilities, aligning well with the PR objectives. The new functions (
_hatches_gen
,_add_legend
, andapply_hatches
) and modifications to existing functions enhance the modularity and flexibility of the graph styling code.To further improve the quality of this PR:
- Add unit tests for the new functions to address the codecov/patch warnings and ensure robust functionality.
- Consider implementing the minor enhancements suggested in the individual review comments.
Great work on these improvements! Let me know if you need any assistance with writing the additional tests or implementing the suggested enhancements.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 187-188: pyretailscience/style/graph_utils.py#L187-L188
Added lines #L187 - L188 were not covered by tests
[warning] 190-190: pyretailscience/style/graph_utils.py#L190
Added line #L190 was not covered by tests
[warning] 192-192: pyretailscience/style/graph_utils.py#L192
Added line #L192 was not covered by tests
[warning] 194-194: pyretailscience/style/graph_utils.py#L194
Added line #L194 was not covered by tests
[warning] 199-199: pyretailscience/style/graph_utils.py#L199
Added line #L199 was not covered by tests
[warning] 201-201: pyretailscience/style/graph_utils.py#L201
Added line #L201 was not covered by teststests/plots/test_line.py (4)
27-33
: LGTM: Improved color mocking fixture.The new
_mock_color_generators
fixture is a good improvement over the previous_mock_get_base_cmap
. It provides more flexibility and realism in color mocking for both single and multi-color scenarios.Consider adding a comment explaining the purpose of using
cycle
for color generation, as it might not be immediately obvious to all developers why this approach was chosen over a simple list.
277-302
: LGTM: New test for single value column plot.The
test_line_plot_single_value_col_calls_dataframe_plot
function is a valuable addition to the test suite. It verifies that theDataFrame.plot
method is called with the correct arguments for single line plots, which is crucial for ensuring the correct behavior of the plotting function.Consider adding an assertion to verify that the
x
parameter is passed correctly to theDataFrame.plot
method, as it's an important aspect of the plot configuration.
304-329
: LGTM: New test for grouped series plot.The
test_line_plot_grouped_series_calls_dataframe_plot
function is an excellent addition to the test suite. It verifies that theDataFrame.plot
method is called with the correct arguments for grouped line plots, ensuring the correct behavior of the plotting function when dealing with multiple series.Consider adding an assertion to verify that the
x
parameter is passed correctly to theDataFrame.plot
method, similar to the suggestion for the single value column test. Also, it might be beneficial to assert that thegroup_col
is used correctly in the plot configuration.
Line range hint
1-329
: Great improvements to the line plot test suite!The changes made to this test file significantly enhance the coverage and robustness of the line plotting functionality tests. The improvements include:
- A more flexible and realistic color mocking approach.
- Additional tests for edge cases (e.g., datetime index warnings).
- New tests to verify correct DataFrame.plot method calls for both single and grouped series plots.
These enhancements align well with the PR objectives and will help ensure the reliability of the line plotting feature.
Consider adding integration tests that verify the actual output of the plotting function, complementing these unit tests. This could involve comparing generated plot images to expected results, ensuring that the visual output meets the desired specifications.
tests/plots/test_histogram.py (11)
15-23
: Simplify fixture decoratorRemove the parentheses from the
@pytest.fixture()
decorator to align with pytest best practices.Apply this change:
-@pytest.fixture() +@pytest.fixtureThis change should also be applied to the other fixtures in this file (lines 26, 32, and 39).
🧰 Tools
🪛 Ruff
15-15: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
49-62
: Enhance test assertions fortest_plot_single_histogram
While the current test checks for the basics, consider adding more specific assertions to ensure the histogram is plotted correctly.
Consider adding the following assertions:
Check if the title is set correctly:
assert result_ax.get_title() == "Test Single Histogram"Verify the number of bins (assuming a default of 10 bins):
assert len(result_ax.patches) == 10Check if the x-axis label is set to the value column name:
assert result_ax.get_xlabel() == "value_1"These additional assertions will make the test more robust and informative.
65-78
: Enhance test assertions fortest_plot_grouped_histogram
The current test checks the basics, but adding more specific assertions would improve its effectiveness.
Consider adding the following assertions:
Check if the title is set correctly:
assert result_ax.get_title() == "Test Grouped Histogram"Verify that a legend is present:
assert result_ax.get_legend() is not NoneCheck if the legend has the correct number of entries (should be 2 for groups A and B):
assert len(result_ax.get_legend().get_texts()) == 2Verify the x-axis label is set to the value column name:
assert result_ax.get_xlabel() == "value_1"These additional assertions will provide a more comprehensive test of the grouped histogram functionality.
82-104
: Improve range clipping test assertionsThe current test checks if all values are within the specified range, but it could be more precise by checking both lower and upper bounds separately.
Consider modifying the assertion as follows:
# Check lower bound assert all(val >= range_lower - np.finfo(np.float64).eps for val in clipped_values), "Some values are below the lower bound" # Check upper bound assert all(val <= range_upper + np.finfo(np.float64).eps for val in clipped_values), "Some values are above the upper bound" # Check that at least one value is at or near each bound assert any(abs(val - range_lower) < np.finfo(np.float64).eps for val in clipped_values), "No values near the lower bound" assert any(abs(val - range_upper) < np.finfo(np.float64).eps for val in clipped_values), "No values near the upper bound"This modification will provide more detailed information about any potential issues with the range clipping.
108-130
: Improve range fillna test assertionsSimilar to the range clipping test, this test could be more precise by checking both lower and upper bounds separately and ensuring that the fillna method is correctly applied.
Consider modifying the assertions as follows:
# Check lower bound assert all(val >= range_lower - np.finfo(np.float64).eps for val in clipped_values), "Some values are below the lower bound" # Check upper bound assert all(val <= range_upper + np.finfo(np.float64).eps for val in clipped_values), "Some values are above the upper bound" # Check that values outside the range are replaced with NaN original_values = sample_dataframe["value_1"] assert np.isnan(original_values[original_values < range_lower]).all(), "Values below lower bound not replaced with NaN" assert np.isnan(original_values[original_values > range_upper]).all(), "Values above upper bound not replaced with NaN" # Check that values within the range are unchanged assert np.array_equal( original_values[(original_values >= range_lower) & (original_values <= range_upper)], sample_dataframe["value_1"][(original_values >= range_lower) & (original_values <= range_upper)], equal_nan=True ), "Values within range were changed"These modifications will provide a more comprehensive test of the range fillna functionality.
134-145
: Enhance test assertions fortest_plot_single_histogram_series
The current test checks the basics for plotting a histogram from a Series, but it could be improved with more specific assertions.
Consider adding the following assertions:
Check if the title is set correctly:
assert result_ax.get_title() == "Test Single Histogram (Series)"Verify the number of bins (assuming a default of 10 bins):
assert len(result_ax.patches) == 10Check if the x-axis label is set to the Series name (if applicable):
if sample_series.name: assert result_ax.get_xlabel() == sample_series.nameVerify that the y-axis label is set to "Count" or "Frequency":
assert result_ax.get_ylabel() in ["Count", "Frequency"]These additional assertions will provide a more comprehensive test of the Series histogram functionality.
149-161
: Improve hatching test assertionsWhile the current test verifies that
apply_hatches
is called, it doesn't check if the hatching is actually applied to the plot.Consider adding the following assertions:
Check if hatching is applied to the patches:
assert all(patch.get_hatch() is not None for patch in result_ax.patches), "Not all patches have hatching applied"Verify that all patches have the same hatch pattern (assuming a single hatch pattern is used):
hatch_patterns = set(patch.get_hatch() for patch in result_ax.patches) assert len(hatch_patterns) == 1, "Multiple hatch patterns found"These additional checks will ensure that the hatching is correctly applied to the histogram.
180-200
: Enhance legend position testThe current test verifies that
standard_graph_styles
is called with the correct parameters, but it doesn't check the actual position of the legend on the plot.Consider adding the following assertions:
Check if the legend is present:
assert result_ax.get_legend() is not None, "Legend is not present"Verify that the legend is outside the plot area:
legend = result_ax.get_legend() assert legend._loc not in [1, 2, 3, 4], "Legend is not outside the plot area"Check if the legend position is to the right of the plot (assuming that's the desired location):
bbox = legend.get_bbox_to_anchor() assert bbox.x0 >= 1.0, "Legend is not positioned to the right of the plot"These additional checks will ensure that the legend is correctly positioned outside the plot area.
204-217
: Improve source text testThe current test verifies that
add_source_text
is called with the correct parameters, but it doesn't check if the text is actually present on the plot.Consider adding the following assertion:
# Check if the source text is present in the plot's text elements assert any(text.get_text() == source_text for text in result_ax.texts), "Source text not found in the plot"This additional check will ensure that the source text is actually added to the plot.
221-233
: Enhance multiple histograms testThe current test checks if patches are plotted, but it could be more specific about the number of histograms and their properties.
Consider adding the following assertions:
Check if two histograms are plotted:
assert len(set(patch.get_facecolor() for patch in result_ax.patches)) == 2, "Expected two different histogram colors"Verify that a legend is present with two entries:
legend = result_ax.get_legend() assert legend is not None, "Legend is missing" assert len(legend.get_texts()) == 2, "Expected two legend entries"Check if the legend labels match the column names:
legend_labels = [text.get_text() for text in legend.get_texts()] assert set(legend_labels) == set(["value_1", "value_2"]), "Legend labels do not match column names"Verify that the x-axis label is not set (as it's ambiguous for multiple histograms):
assert result_ax.get_xlabel() == "", "X-axis label should not be set for multiple histograms"These additional checks will provide a more comprehensive test of the multiple histograms functionality.
237-299
: Enhance DataFrame.plot call testsThe current tests verify that
DataFrame.plot
is called with correct arguments, but they could be more specific about the exact values passed.For both
test_plot_single_histogram_calls_dataframe_plot
andtest_plot_grouped_histogram_calls_dataframe_plot
, consider adding the following improvements:
Use
assert_called_once_with
instead ofassert_called_once
and then checking arguments separately. This ensures all arguments are checked in a single assertion.For the single histogram test:
mock_df_plot.assert_called_once_with( kind="hist", y=value_col, ax=mocker.ANY, legend=False, color=mocker.ANY, alpha=None, bins=mocker.ANY, # Add this to check if bins argument is passed )For the grouped histogram test:
mock_df_plot.assert_called_once_with( kind="hist", y=value_col, ax=mocker.ANY, legend=True, color=mocker.ANY, alpha=0.7, bins=mocker.ANY, # Add this to check if bins argument is passed by=group_col, # Add this to check if the grouping column is correctly passed )Add assertions to check if the range clipping is applied correctly:
# Check if the data was clipped before plotting assert (sample_dataframe[value_col] >= range_lower).all() and (sample_dataframe[value_col] <= range_upper).all(), "Data was not clipped correctly"These modifications will provide a more comprehensive test of the plotting function's behavior.
tests/test_standard_graphs.py (2)
187-211
: LGTM: Comprehensive test for color cycling inget_multi_color_cmap
.This test effectively verifies the cycling behavior of the
get_multi_color_cmap
function. It ensures that the color sequence repeats correctly after the seventh color and that the second cycle uses lighter shades (300 instead of 500).Consider adding a comment explaining the significance of the color shade change in the second cycle (from 500 to 300) to improve the test's readability and document this important behavior.
132-244
: Great addition of comprehensive tests for color mapping functions.The new test functions significantly enhance the coverage of
get_single_color_cmap
andget_multi_color_cmap
. They effectively verify various aspects including basic functionality, cycling behavior, and edge cases. The tests are well-structured and use appropriate assertions.To further improve the test suite, consider adding a test case for
get_multi_color_cmap
with more than 14 colors to ensure it continues cycling correctly beyond two full cycles.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (2)
docs/assets/images/analysis_modules/plots/histogram_plot.svg
is excluded by!**/*.svg
docs/assets/images/analysis_modules/plots/line_plot.svg
is excluded by!**/*.svg
📒 Files selected for processing (6)
- pyretailscience/plots/histogram.py (1 hunks)
- pyretailscience/plots/line.py (3 hunks)
- pyretailscience/style/graph_utils.py (5 hunks)
- tests/plots/test_histogram.py (1 hunks)
- tests/plots/test_line.py (12 hunks)
- tests/test_standard_graphs.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pyretailscience/plots/line.py
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pyretailscience/plots/histogram.py
[warning] 155-155: pyretailscience/plots/histogram.py#L155
Added line #L155 was not covered by testspyretailscience/style/graph_utils.py
[warning] 21-22: pyretailscience/style/graph_utils.py#L21-L22
Added lines #L21 - L22 were not covered by tests
[warning] 85-85: pyretailscience/style/graph_utils.py#L85
Added line #L85 was not covered by tests
[warning] 91-91: pyretailscience/style/graph_utils.py#L91
Added line #L91 was not covered by tests
[warning] 187-188: pyretailscience/style/graph_utils.py#L187-L188
Added lines #L187 - L188 were not covered by tests
[warning] 190-190: pyretailscience/style/graph_utils.py#L190
Added line #L190 was not covered by tests
[warning] 192-192: pyretailscience/style/graph_utils.py#L192
Added line #L192 was not covered by tests
[warning] 194-194: pyretailscience/style/graph_utils.py#L194
Added line #L194 was not covered by tests
[warning] 199-199: pyretailscience/style/graph_utils.py#L199
Added line #L199 was not covered by tests
[warning] 201-201: pyretailscience/style/graph_utils.py#L201
Added line #L201 was not covered by tests
🪛 Ruff
tests/plots/test_histogram.py
15-15: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
26-26: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
32-32: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
39-39: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
🔇 Additional comments (18)
pyretailscience/style/graph_utils.py (4)
4-5
: LGTM: New imports support the added functionality.The addition of
Generator
fromcollections.abc
andcycle
fromitertools
is appropriate for the new_hatches_gen
function. These imports are correctly placed at the top of the file, following Python's import conventions.
8-8
: LGTM: numpy import added as suggested.The addition of the
numpy
import at the top of the file aligns with the previous review suggestion and follows Python's import conventions. This import is necessary for the newapply_hatches
function.
124-125
: LGTM: Enhanced graph styling for improved visual clarity.The changes to set the background color to white and position grid lines behind the plot align with the PR objective of improving graph styling. These modifications enhance the overall visual clarity of the graphs.
154-154
: LGTM: Integrated new legend handling into standard_graph_styles.The updated return statement effectively integrates the new
_add_legend
function into thestandard_graph_styles
workflow. This change maintains the function's return type (Axes) while incorporating the improved legend handling logic.tests/plots/test_line.py (3)
4-4
: LGTM: Import addition for color cycling.The addition of
from itertools import cycle
is appropriate for the new color generation mocking approach.
43-43
: LGTM: Updated fixture usage in existing tests.The test functions have been correctly updated to use the new
_mock_color_generators
fixture, maintaining consistency with the new color mocking approach.Also applies to: 64-64
Line range hint
247-274
: LGTM: New test for datetime index warning.This new test function
test_plot_with_datetime_index_warns
is a valuable addition. It ensures that users are properly warned when using datetime-like indices, guiding them towards the more appropriateplots.time_line
module. The test is well-implemented and covers an important edge case.tests/plots/test_histogram.py (1)
1-299
: Overall assessment of the test suiteThe test suite for the histogram plotting functionality is comprehensive and well-structured. It covers various scenarios including single and grouped histograms, range clipping, and different input types (DataFrame and Series). The use of fixtures and mocks is appropriate and helps in isolating the tests.
However, there are opportunities to enhance the test coverage and specificity:
- Many tests could benefit from more detailed assertions to verify specific aspects of the plotted histograms.
- Some tests could be improved by checking the actual visual elements of the plots, not just the function calls.
- The range clipping and fillna tests could be more precise in their checks.
- Tests for multiple histograms and legend positioning could be more comprehensive.
Implementing the suggested improvements will result in a more robust and informative test suite, increasing confidence in the histogram plotting functionality.
🧰 Tools
🪛 Ruff
15-15: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
26-26: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
32-32: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
39-39: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
pyretailscience/plots/histogram.py (5)
1-40
: Excellent module documentation!The comprehensive docstring provides a clear overview of the module's functionality, including core features, use cases, limitations, and additional features. This level of detail will greatly assist users in understanding and effectively utilizing the histogram plotting capabilities.
42-49
: Clean and appropriate importsThe imports are concise, relevant, and well-organized. The use of type hinting imports and local imports from the pyretailscience package demonstrates good practices in module organization.
52-90
: Well-structured and documentedplot
functionThe
plot
function is comprehensive, with a flexible signature that accommodates various plotting scenarios. The use of type hints and a detailed docstring enhances code readability and usability.
141-264
: Well-structured and documented helper functionsThe helper functions (_prepare_value_col, _apply_range_clipping, _get_num_histograms, and _plot_histogram) are well-organized, with clear purposes, appropriate type hints, and informative docstrings. This modular approach enhances code readability and maintainability.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 155-155: pyretailscience/plots/histogram.py#L155
Added line #L155 was not covered by tests
1-264
: Excellent implementation of histogram plotting functionalityOverall, this module provides a comprehensive and well-implemented solution for creating histograms from pandas DataFrames or Series. The code is well-structured, thoroughly documented, and follows good programming practices. The modular approach with helper functions enhances readability and maintainability.
Key strengths:
- Comprehensive module-level documentation
- Flexible
plot
function with many customization options- Well-structured helper functions with clear purposes
- Proper use of type hints and informative docstrings
Areas for minor improvement:
- Add error handling for invalid
range_method
values in theplot
function- Optimize the
_apply_range_clipping
function using vectorized operations for better performanceThese improvements will further enhance the robustness and efficiency of the module. Great work on implementing this feature!
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 155-155: pyretailscience/plots/histogram.py#L155
Added line #L155 was not covered by teststests/test_standard_graphs.py (5)
132-145
: LGTM: Well-structured test forget_single_color_cmap
.The test effectively validates the first three colors returned by the
get_single_color_cmap
function. The use ofislice
is appropriate for obtaining a specific number of items from the generator.
148-164
: LGTM: Comprehensive test for color cycling inget_single_color_cmap
.This test effectively verifies the cycling behavior of the
get_single_color_cmap
function. It ensures that the color sequence repeats correctly after the third color, which is crucial for the function's intended behavior.
167-184
: LGTM: Thorough test forget_multi_color_cmap
with seven colors.This test effectively verifies that the
get_multi_color_cmap
function returns the expected sequence of seven colors. It ensures that the function is correctly implementing the multi-color scheme as intended.
214-226
: LGTM: Good edge case test forget_single_color_cmap
.This test effectively verifies the behavior of
get_single_color_cmap
when only two colors are requested. It's a valuable addition to the test suite, complementing the three-color test and helping to ensure the function works correctly for different numbers of colors.
229-244
: LGTM: Valuable edge case test forget_multi_color_cmap
.This test effectively verifies the behavior of
get_multi_color_cmap
when five colors are requested. It's a useful addition to the test suite, complementing the seven-color test and helping to ensure the function works correctly for different numbers of colors within its cycle.
pyretailscience/plots/histogram.py
Outdated
if range_lower is not None and range_upper is not None: | ||
df = _apply_range_clipping( | ||
df=df, | ||
value_col=value_col, | ||
range_lower=range_lower, | ||
range_upper=range_upper, | ||
range_method=range_method, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling for invalid range_method
values
While the function handles various input scenarios well, it doesn't explicitly handle cases where an invalid range_method
is provided. Consider adding a check to raise a ValueError
for invalid range_method
values:
if range_method not in ["clip", "fillna"]:
raise ValueError(f"Invalid range_method: {range_method}. Expected 'clip' or 'fillna'.")
This will prevent unexpected behavior and provide clear feedback to users.
pyretailscience/plots/histogram.py
Outdated
def _apply_range_clipping( | ||
df: pd.DataFrame, | ||
value_col: list[str], | ||
range_lower: float, | ||
range_upper: float, | ||
range_method: Literal["clip", "fillna"], | ||
) -> pd.DataFrame: | ||
"""Applies range clipping or filling based on the provided method and returns the modified dataframe. | ||
|
||
Args: | ||
df (pd.DataFrame): The dataframe to apply range clipping to. | ||
value_col (list of str): The column(s) to apply clipping or filling to. | ||
range_lower (float): Lower bound for clipping or filling NA values. | ||
range_upper (float): Upper bound for clipping or filling NA values. | ||
range_method (Literal, optional): Whether to "clip" values outside the range or "fillna". Defaults to "clip". | ||
|
||
Returns: | ||
pd.DataFrame: The modified dataframe with the clipping or filling applied. | ||
""" | ||
if range_method == "clip": | ||
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col}) | ||
|
||
return df.assign( | ||
**{ | ||
col: df[col].apply( | ||
lambda x: np.nan | ||
if (range_lower is not None and x < range_lower) or (range_upper is not None and x > range_upper) | ||
else x, | ||
) | ||
for col in value_col | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimize performance in _apply_range_clipping
function
The current implementation of _apply_range_clipping
uses apply
with a lambda function, which can be inefficient for large DataFrames. Consider using vectorized operations for better performance:
def _apply_range_clipping(
df: pd.DataFrame,
value_col: list[str],
range_lower: float,
range_upper: float,
range_method: Literal["clip", "fillna"],
) -> pd.DataFrame:
if range_method == "clip":
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col})
return df.assign(
**{
col: df[col].where(
((range_lower is None) | (df[col] >= range_lower)) &
((range_upper is None) | (df[col] <= range_upper)),
np.nan
)
for col in value_col
}
)
This vectorized approach should be significantly faster, especially for large datasets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- pyretailscience/plots/histogram.py (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
pyretailscience/plots/histogram.py
[warning] 155-155: pyretailscience/plots/histogram.py#L155
Added line #L155 was not covered by tests
🔇 Additional comments (2)
pyretailscience/plots/histogram.py (2)
1-40
: Well-structured and comprehensive module-level docstring.The module-level docstring provides an excellent overview of the histogram plotting functionality. It clearly outlines core features, use cases, limitations, and additional features, which will greatly assist users in understanding and utilizing the module effectively.
1-263
: Overall, excellent implementation with room for minor enhancements.The
histogram.py
module provides a comprehensive and well-structured solution for creating histograms from pandas DataFrames or Series. The code is well-documented, handles various scenarios, and includes appropriate error checking.Key strengths:
- Comprehensive module-level docstring
- Flexible functionality supporting both single and grouped histograms
- Well-organized helper functions
- Proper type hinting and error handling
Suggested enhancements:
- Use
typing.Union
for better compatibility with older Python versions- Add error handling for invalid
range_method
values- Optimize the
_apply_range_clipping
function for better performance- Extend support for multiple
value_col
withgroup_col
These enhancements will further improve the module's robustness, performance, and flexibility.
🧰 Tools
🪛 GitHub Check: codecov/patch
[warning] 155-155: pyretailscience/plots/histogram.py#L155
Added line #L155 was not covered by tests
pyretailscience/plots/histogram.py
Outdated
def plot( | ||
df: pd.DataFrame | pd.Series, | ||
value_col: str | list[str] | None = None, | ||
group_col: str | None = None, | ||
title: str | None = None, | ||
x_label: str | None = None, | ||
y_label: str | None = None, | ||
legend_title: str | None = None, | ||
ax: Axes | None = None, | ||
source_text: str | None = None, | ||
move_legend_outside: bool = False, | ||
range_lower: float | None = None, | ||
range_upper: float | None = None, | ||
range_method: Literal["clip", "fillna"] = "clip", | ||
use_hatch: bool = False, | ||
**kwargs: dict[str, Any], | ||
) -> SubplotBase: | ||
"""Plots a histogram of `value_col`, optionally split by `group_col`. | ||
|
||
Args: | ||
df (pd.DataFrame | pd.Series): The dataframe (or series) to plot. | ||
value_col (str or list of str, optional): The column(s) to plot. Can be a list of columns for multiple histograms. | ||
group_col (str, optional): The column used to define different histograms. | ||
title (str, optional): The title of the plot. | ||
x_label (str, optional): The x-axis label. | ||
y_label (str, optional): The y-axis label. | ||
legend_title (str, optional): The title of the legend. | ||
ax (Axes, optional): Matplotlib axes object to plot on. | ||
source_text (str, optional): The source text to add to the plot. | ||
move_legend_outside (bool, optional): Move the legend outside the plot. | ||
range_lower (float, optional): Lower bound for clipping or filling NA values. | ||
range_upper (float, optional): Upper bound for clipping or filling NA values. | ||
range_method (str, optional): Whether to "clip" values outside the range or "fillna". Defaults to "clip". | ||
use_hatch (bool, optional): Whether to use hatching for the bars. | ||
**kwargs: Additional keyword arguments for Pandas' `plot` function. | ||
|
||
Returns: | ||
SubplotBase: The matplotlib axes object. | ||
""" | ||
if isinstance(value_col, list) and group_col is not None: | ||
raise ValueError("`value_col` cannot be a list when `group_col` is provided. Please choose one or the other.") | ||
|
||
value_col = _prepare_value_col(df=df, value_col=value_col) | ||
|
||
if isinstance(df, pd.Series): | ||
df = df.to_frame(name=value_col[0]) | ||
|
||
if range_lower is not None and range_upper is not None: | ||
df = _apply_range_clipping( | ||
df=df, | ||
value_col=value_col, | ||
range_lower=range_lower, | ||
range_upper=range_upper, | ||
range_method=range_method, | ||
) | ||
|
||
num_histograms = _get_num_histograms(df=df, value_col=value_col, group_col=group_col) | ||
|
||
color_gen = get_multi_color_cmap() | ||
colors = [next(color_gen) for _ in range(num_histograms)] | ||
|
||
ax = _plot_histogram( | ||
df=df, | ||
value_col=value_col, | ||
group_col=group_col, | ||
ax=ax, | ||
colors=colors, | ||
num_histograms=num_histograms, | ||
**kwargs, | ||
) | ||
|
||
ax = gu.standard_graph_styles( | ||
ax=ax, | ||
title=title, | ||
x_label=x_label, | ||
y_label=y_label, | ||
legend_title=legend_title, | ||
move_legend_outside=move_legend_outside, | ||
) | ||
|
||
if use_hatch: | ||
ax = gu.apply_hatches(ax=ax, num_segments=num_histograms) | ||
|
||
if source_text: | ||
gu.add_source_text(ax=ax, source_text=source_text) | ||
|
||
return gu.standard_tick_styles(ax=ax) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for invalid range_method
values.
The plot
function handles various scenarios well, but it doesn't explicitly check for invalid range_method
values. Consider adding a check to raise a ValueError
for invalid range_method
values:
Add the following check at the beginning of the function:
if range_method not in ["clip", "fillna"]:
raise ValueError(f"Invalid range_method: {range_method}. Expected 'clip' or 'fillna'.")
This will prevent unexpected behavior and provide clear feedback to users.
pyretailscience/plots/histogram.py
Outdated
def _apply_range_clipping( | ||
df: pd.DataFrame, | ||
value_col: list[str], | ||
range_lower: float, | ||
range_upper: float, | ||
range_method: Literal["clip", "fillna"], | ||
) -> pd.DataFrame: | ||
"""Applies range clipping or filling based on the provided method and returns the modified dataframe. | ||
|
||
Args: | ||
df (pd.DataFrame): The dataframe to apply range clipping to. | ||
value_col (list of str): The column(s) to apply clipping or filling to. | ||
range_lower (float): Lower bound for clipping or filling NA values. | ||
range_upper (float): Upper bound for clipping or filling NA values. | ||
range_method (Literal, optional): Whether to "clip" values outside the range or "fillna". Defaults to "clip". | ||
|
||
Returns: | ||
pd.DataFrame: The modified dataframe with the clipping or filling applied. | ||
""" | ||
if range_method == "clip": | ||
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col}) | ||
|
||
return df.assign( | ||
**{ | ||
col: df[col].where( | ||
((range_lower is None) | (df[col] >= range_lower)) & ((range_upper is None) | (df[col] <= range_upper)), | ||
np.nan, | ||
) | ||
for col in value_col | ||
}, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimize performance in _apply_range_clipping
function.
The current implementation of _apply_range_clipping
uses a list comprehension with where
method, which is good. However, we can further optimize it by using a single where
operation for all columns:
Consider refactoring the function as follows:
def _apply_range_clipping(
df: pd.DataFrame,
value_col: list[str],
range_lower: float,
range_upper: float,
range_method: Literal["clip", "fillna"],
) -> pd.DataFrame:
if range_method == "clip":
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col})
mask = ((range_lower is None) | (df[value_col] >= range_lower)) & ((range_upper is None) | (df[value_col] <= range_upper))
return df.assign(**{col: df[col].where(mask[col], np.nan) for col in value_col})
This approach creates a single boolean mask for all columns at once, which can be more efficient for large DataFrames with multiple value columns.
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. CI Failure Feedback 🧐
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
pyretailscience/plots/histogram.py
Outdated
range_upper=range_upper, | ||
range_method=range_method, | ||
) | ||
if range_lower is not None and range_upper is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the range values could be none. For instance if you want to only upper or lower clip the values.
pyretailscience/plots/histogram.py
Outdated
range_lower: float | None, | ||
range_upper: float | None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar hear. Either of the values could be none
|
||
|
||
# Fixture for mocking graph_utils functions | ||
@pytest.fixture() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Mrglglglglgl Tiny change here
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. CI Failure Feedback 🧐
✨ CI feedback usage guide:The CI feedback tool (
In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:
where Configuration options
See more information about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (2)
tests/plots/test_histogram.py (2)
15-15
: Consider simplifying the fixture decoratorThe
@pytest.fixture()
decorator can be simplified to@pytest.fixture
without parentheses when no parameters are needed.Apply this change:
-@pytest.fixture() +@pytest.fixtureThis change should also be applied to the fixtures on lines 26, 32, and 39.
🧰 Tools
🪛 Ruff
15-15: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
104-104
: Consider using a larger epsilon value for floating-point comparisonsThe current epsilon value
np.finfo(np.float64).eps
is extremely small and might lead to false negatives in floating-point comparisons. A slightly larger epsilon value could provide more robust comparisons while still maintaining accuracy.Consider using a larger epsilon value, such as
1e-9
:-assert all(range_lower <= val + np.finfo(np.float64).eps <= range_upper for val in clipped_values) +assert all(range_lower <= val + 1e-9 <= range_upper for val in clipped_values)This change should also be applied to similar assertions in other test functions (e.g., lines 130, 154, 178, 203).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- pyretailscience/plots/histogram.py (1 hunks)
- tests/plots/test_histogram.py (1 hunks)
🧰 Additional context used
🪛 Ruff
tests/plots/test_histogram.py
15-15: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
26-26: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
32-32: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
39-39: Use
@pytest.fixture
over@pytest.fixture()
Remove parentheses
(PT001)
🔇 Additional comments (6)
pyretailscience/plots/histogram.py (5)
1-40
: Comprehensive module-level docstring.The module-level docstring provides a clear and detailed explanation of the module's functionality, features, use cases, and limitations. This is excellent documentation practice.
141-161
: Well-implemented value column preparation.The
_prepare_value_col
function effectively handles different input types and edge cases, ensuring thatvalue_col
is always a list of strings.
203-220
: Correct implementation of histogram count calculation.The
_get_num_histograms
function correctly calculates the number of histograms based on the input parameters and handles both grouped and non-grouped cases.
1-270
: Overall, well-implemented histogram plotting module with room for minor improvements.The
histogram.py
module provides comprehensive functionality for creating histograms from pandas DataFrames or Series. The code is well-documented, handles various input types and scenarios, and generally follows good programming practices.Key strengths:
- Comprehensive module-level and function-level docstrings.
- Flexible handling of input types (DataFrame and Series).
- Support for range clipping and filling.
- Efficient use of vectorized operations.
Areas for potential improvement:
- Use
typing.Union
for better compatibility with older Python versions.- Add error handling for invalid
range_method
values in the mainplot
function.- Optimize the
_apply_range_clipping
function for the "fillna" method.- Extend functionality to support multiple value columns with grouping in
_plot_histogram
.These improvements would enhance the module's robustness, performance, and flexibility.
42-50
: 🛠️ Refactor suggestionConsider using
typing.Union
for better compatibility.The use of the
|
operator in type hints (e.g.,pd.DataFrame | pd.Series
) is only supported in Python 3.10 and newer. To maintain compatibility with earlier Python versions, consider usingtyping.Union
.Apply this diff to update the imports:
-from typing import Any, Literal +from typing import Any, Literal, UnionThen update type hints throughout the file, for example:
-def plot(df: pd.DataFrame | pd.Series, ... +def plot(df: Union[pd.DataFrame, pd.Series], ...Likely invalid or redundant comment.
tests/plots/test_histogram.py (1)
309-371
: Good job addressing the mocking issueThe implementation of
test_plot_single_histogram_calls_dataframe_plot
andtest_plot_grouped_histogram_calls_dataframe_plot
correctly addresses the issue mentioned in the past review comments. By mockingpandas.DataFrame.plot
directly and asserting on the mock, you've ensured that the tests accurately capture the calls made to the plot method.
def plot( | ||
df: pd.DataFrame | pd.Series, | ||
value_col: str | list[str] | None = None, | ||
group_col: str | None = None, | ||
title: str | None = None, | ||
x_label: str | None = None, | ||
y_label: str | None = None, | ||
legend_title: str | None = None, | ||
ax: Axes | None = None, | ||
source_text: str | None = None, | ||
move_legend_outside: bool = False, | ||
range_lower: float | None = None, | ||
range_upper: float | None = None, | ||
range_method: Literal["clip", "fillna"] = "clip", | ||
use_hatch: bool = False, | ||
**kwargs: dict[str, Any], | ||
) -> SubplotBase: | ||
"""Plots a histogram of `value_col`, optionally split by `group_col`. | ||
|
||
Args: | ||
df (pd.DataFrame | pd.Series): The dataframe (or series) to plot. | ||
value_col (str or list of str, optional): The column(s) to plot. Can be a list of columns for multiple histograms. | ||
group_col (str, optional): The column used to define different histograms. | ||
title (str, optional): The title of the plot. | ||
x_label (str, optional): The x-axis label. | ||
y_label (str, optional): The y-axis label. | ||
legend_title (str, optional): The title of the legend. | ||
ax (Axes, optional): Matplotlib axes object to plot on. | ||
source_text (str, optional): The source text to add to the plot. | ||
move_legend_outside (bool, optional): Move the legend outside the plot. | ||
range_lower (float, optional): Lower bound for clipping or filling NA values. | ||
range_upper (float, optional): Upper bound for clipping or filling NA values. | ||
range_method (str, optional): Whether to "clip" values outside the range or "fillna". Defaults to "clip". | ||
use_hatch (bool, optional): Whether to use hatching for the bars. | ||
**kwargs: Additional keyword arguments for Pandas' `plot` function. | ||
|
||
Returns: | ||
SubplotBase: The matplotlib axes object. | ||
""" | ||
if isinstance(value_col, list) and group_col is not None: | ||
raise ValueError("`value_col` cannot be a list when `group_col` is provided. Please choose one or the other.") | ||
|
||
value_col = _prepare_value_col(df=df, value_col=value_col) | ||
|
||
if isinstance(df, pd.Series): | ||
df = df.to_frame(name=value_col[0]) | ||
|
||
if (range_lower is not None) or (range_upper is not None): | ||
df = _apply_range_clipping( | ||
df=df, | ||
value_col=value_col, | ||
range_lower=range_lower, | ||
range_upper=range_upper, | ||
range_method=range_method, | ||
) | ||
|
||
num_histograms = _get_num_histograms(df=df, value_col=value_col, group_col=group_col) | ||
|
||
color_gen = get_multi_color_cmap() | ||
colors = [next(color_gen) for _ in range(num_histograms)] | ||
|
||
ax = _plot_histogram( | ||
df=df, | ||
value_col=value_col, | ||
group_col=group_col, | ||
ax=ax, | ||
colors=colors, | ||
num_histograms=num_histograms, | ||
**kwargs, | ||
) | ||
|
||
ax = gu.standard_graph_styles( | ||
ax=ax, | ||
title=title, | ||
x_label=x_label, | ||
y_label=y_label, | ||
legend_title=legend_title, | ||
move_legend_outside=move_legend_outside, | ||
) | ||
|
||
if use_hatch: | ||
ax = gu.apply_hatches(ax=ax, num_segments=num_histograms) | ||
|
||
if source_text: | ||
gu.add_source_text(ax=ax, source_text=source_text) | ||
|
||
return gu.standard_tick_styles(ax=ax) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling for invalid range_method
values.
The plot
function handles various scenarios well, but it doesn't explicitly check for invalid range_method
values. Consider adding a check to raise a ValueError
for invalid range_method
values.
Add the following check at the beginning of the function:
if range_method not in ["clip", "fillna"]:
raise ValueError(f"Invalid range_method: {range_method}. Expected 'clip' or 'fillna'.")
This will prevent unexpected behavior and provide clear feedback to users.
def _apply_range_clipping( | ||
df: pd.DataFrame, | ||
value_col: list[str], | ||
range_lower: float | None = None, | ||
range_upper: float | None = None, | ||
range_method: Literal["clip", "fillna"] = "fillna", | ||
) -> pd.DataFrame: | ||
"""Applies range clipping or filling based on the provided method and returns the modified dataframe. | ||
|
||
Args: | ||
df (pd.DataFrame): The dataframe to apply range clipping to. | ||
value_col (list of str): The column(s) to apply clipping or filling to. | ||
range_lower (float | None, optional): Lower bound for clipping or filling NA values. | ||
range_upper (float | None, optional): Upper bound for clipping or filling NA values. | ||
range_method (Literal, optional): Whether to "clip" values outside the range or "fillna". Defaults to "fillna". | ||
|
||
Returns: | ||
pd.DataFrame: The modified dataframe with the clipping or filling applied. | ||
""" | ||
if range_method not in ["clip", "fillna"]: | ||
error_msg = f"Invalid range_method: {range_method}. Expected 'clip' or 'fillna'." | ||
raise ValueError(error_msg) | ||
|
||
if range_method == "clip": | ||
# Clip values based on the provided lower and upper bounds | ||
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col}) | ||
|
||
# For the "fillna" method, we will create a mask for the valid range and replace out-of-range values with NaN | ||
def apply_mask(col: str) -> pd.Series: | ||
mask = pd.Series([True] * len(df)) | ||
if range_lower is not None: | ||
mask &= df[col] >= range_lower | ||
if range_upper is not None: | ||
mask &= df[col] <= range_upper | ||
return df[col].where(mask, np.nan) | ||
|
||
# Apply the mask to each column | ||
return df.assign(**{col: apply_mask(col) for col in value_col}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Optimize performance in _apply_range_clipping
function.
The current implementation of _apply_range_clipping
is good, but we can further optimize it by using a single where
operation for all columns when using the "fillna" method.
Consider refactoring the function as follows:
def _apply_range_clipping(
df: pd.DataFrame,
value_col: list[str],
range_lower: float | None = None,
range_upper: float | None = None,
range_method: Literal["clip", "fillna"] = "fillna",
) -> pd.DataFrame:
if range_method == "clip":
return df.assign(**{col: df[col].clip(lower=range_lower, upper=range_upper) for col in value_col})
mask = pd.DataFrame(True, index=df.index, columns=value_col)
if range_lower is not None:
mask &= df[value_col] >= range_lower
if range_upper is not None:
mask &= df[value_col] <= range_upper
return df.assign(**{col: df[col].where(mask[col], np.nan) for col in value_col})
This approach creates a single boolean mask for all columns at once, which can be more efficient for large DataFrames with multiple value columns.
def _plot_histogram( | ||
df: pd.DataFrame, | ||
value_col: list[str], | ||
group_col: str | None, | ||
ax: Axes | None, | ||
colors: list[str], | ||
num_histograms: int, | ||
**kwargs: dict, | ||
) -> Axes: | ||
"""Plots histograms for the provided dataframe. | ||
|
||
Args: | ||
df (pd.DataFrame): The dataframe to plot. | ||
value_col (list of str): The column(s) to plot. | ||
group_col (str, optional): The column used to group data into multiple histograms. | ||
ax (Axes, optional): Matplotlib axes object to plot on. | ||
colors: The list of colors use for the plot. | ||
num_histograms (int): The number of histograms being plotted. | ||
**kwargs: Additional keyword arguments for Pandas' `plot` function. | ||
|
||
Returns: | ||
Axes: The matplotlib axes object with the plotted histogram. | ||
""" | ||
is_multi_histogram = num_histograms > 1 | ||
|
||
alpha = kwargs.pop("alpha", 0.7) if is_multi_histogram else kwargs.pop("alpha", None) | ||
|
||
if group_col is None: | ||
return df[value_col].plot( | ||
kind="hist", | ||
ax=ax, | ||
legend=is_multi_histogram, | ||
color=colors, | ||
alpha=alpha, | ||
**kwargs, | ||
) | ||
|
||
# if group_col is provided, only use a single value_col | ||
df_pivot = df.pivot(columns=group_col, values=value_col[0]) | ||
|
||
# Plot all columns at once | ||
return df_pivot.plot( | ||
kind="hist", | ||
ax=ax, | ||
legend=is_multi_histogram, | ||
alpha=alpha, | ||
color=colors, | ||
**kwargs, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider supporting multiple value_col
with group_col
.
Currently, when group_col
is provided, the function only uses the first value_col
. This limits the functionality when users want to plot multiple value columns grouped by a categorical variable.
Consider extending the functionality to support multiple value_col
with group_col
. Here's a possible approach:
if group_col is not None:
fig, axes = plt.subplots(len(value_col), 1, figsize=(10, 5*len(value_col)))
axes = [axes] if len(value_col) == 1 else axes
for i, col in enumerate(value_col):
df_pivot = df.pivot(columns=group_col, values=col)
df_pivot.plot(
kind="hist",
ax=axes[i],
legend=is_multi_histogram,
alpha=alpha,
color=colors,
**kwargs
)
axes[i].set_title(f'Histogram for {col}')
return axes if len(value_col) > 1 else axes[0]
This approach creates a subplot for each value column when a group column is provided, allowing users to visualize multiple value columns grouped by a categorical variable.
@Mrglglglglgl lgtm. Feel free to merge once you've fixed the pre-commit issues |
PR Type
Enhancement, Tests, Documentation
Description
Changes walkthrough 📝
histogram.py
Add histogram plotting functionality with customization options
pyretailscience/plots/histogram.py
Series.
categorical column.
options.
graph_utils.py
Enhance graph styling for better visualization
pyretailscience/style/graph_utils.py
test_histogram.py
Add tests for histogram plot functionality
tests/plots/test_histogram.py
analysis_modules.md
Document histogram plot usage and features
docs/analysis_modules.md
histogram.md
Add API documentation for histogram plot
docs/api/plots/histogram.md
mkdocs.yml
Update documentation navigation for histogram plot
mkdocs.yml
Summary by CodeRabbit
New Features
Documentation
Tests