Add Plot Gallery Documentation #371

Description

@murray-ds

Plot Gallery Documentation

Problem Statement

The current PyRetailScience documentation includes API reference pages for individual plots, but lacks a visual gallery that helps users quickly discover and understand plotting capabilities. Users need to see what each plot type looks like and understand the major configuration options available, similar to how the Matplotlib plot types gallery presents visualizations.

Proposed Solution

Create a comprehensive plot gallery section in the documentation with:

  1. Root "Plots" page - Overview gallery showing all available plot types with thumbnail examples
  2. Individual plot pages - Detailed pages for each plot type showing major configuration examples

The gallery should focus on demonstrating built-in features (e.g., group_col, Series vs DataFrame input, orientation options) rather than general matplotlib customizations (e.g., tick label sizes, kwargs passthrough).

Structure

Navigation Hierarchy

docs/
└── gallery/
    ├── index.md              # Root gallery page
    └── plots/
        ├── area.ipynb
        ├── bar.ipynb
        ├── broken_timeline.ipynb
        ├── cohort.ipynb
        ├── heatmap.ipynb
        ├── histogram.ipynb
        ├── index_plot.ipynb   # Named to avoid conflict with index.md
        ├── line.ipynb
        ├── period_on_period.ipynb
        ├── price.ipynb
        ├── scatter.ipynb
        ├── time.ipynb
        ├── venn.ipynb
        └── waterfall.ipynb

mkdocs.yml Changes

Add new navigation section:

nav:
  - Home: index.md
  - Getting Started:
      - Installation: getting_started/installation.md
      - Options & Configuration: getting_started/options_guide.md
  - Analysis Modules:
      - analysis_modules.md
  - Plot Gallery:                          # NEW SECTION
      - Overview: gallery/index.md
      - Area Plot: gallery/plots/area.ipynb
      - Bar Plot: gallery/plots/bar.ipynb
      - Broken Timeline: gallery/plots/broken_timeline.ipynb
      - Cohort Plot: gallery/plots/cohort.ipynb
      - Heatmap Plot: gallery/plots/heatmap.ipynb
      - Histogram Plot: gallery/plots/histogram.ipynb
      - Index Plot: gallery/plots/index_plot.ipynb
      - Line Plot: gallery/plots/line.ipynb
      - Period on Period: gallery/plots/period_on_period.ipynb
      - Price Plot: gallery/plots/price.ipynb
      - Scatter Plot: gallery/plots/scatter.ipynb
      - Time Plot: gallery/plots/time.ipynb
      - Venn Diagram: gallery/plots/venn.ipynb
      - Waterfall Plot: gallery/plots/waterfall.ipynb
  - Examples:
      - Customer Retention: examples/retention.ipynb
      # ... rest of examples
  - Reference:
      # ... existing reference sections
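
The Plot Gallery entries follow a regular pattern, so they could be generated rather than typed by hand. The sketch below is only an illustration: `GALLERY_PLOTS` and `render_gallery_nav` are hypothetical names, not part of PyRetailScience or MkDocs, and the labels and file stems are copied from the nav section above.

```python
# Hypothetical helper: renders the "Plot Gallery" nav block proposed above.
GALLERY_PLOTS = [
    ("Area Plot", "area"),
    ("Bar Plot", "bar"),
    ("Broken Timeline", "broken_timeline"),
    ("Cohort Plot", "cohort"),
    ("Heatmap Plot", "heatmap"),
    ("Histogram Plot", "histogram"),
    ("Index Plot", "index_plot"),
    ("Line Plot", "line"),
    ("Period on Period", "period_on_period"),
    ("Price Plot", "price"),
    ("Scatter Plot", "scatter"),
    ("Time Plot", "time"),
    ("Venn Diagram", "venn"),
    ("Waterfall Plot", "waterfall"),
]


def render_gallery_nav(plots=GALLERY_PLOTS):
    """Return the YAML lines for the Plot Gallery nav section as one string."""
    lines = ["  - Plot Gallery:", "      - Overview: gallery/index.md"]
    for label, stem in plots:
        lines.append(f"      - {label}: gallery/plots/{stem}.ipynb")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_gallery_nav())
```

The printed block can be pasted into mkdocs.yml between the Analysis Modules and Examples sections.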

Content Format

Root Gallery Page (gallery/index.md)

A markdown page with:

  • Brief introduction to PyRetailScience plotting capabilities
  • Grid layout showing all plot types with thumbnail images and brief descriptions
  • Links to detailed plot pages

Format:

# Plot Gallery

PyRetailScience provides a comprehensive set of plotting functions designed specifically for retail analytics. All plots use a consistent API and come pre-styled with retail-friendly color schemes.

## Plot Types

### Basic Plots

#### [Line Plot](plots/line.ipynb)
![Line plot example](../assets/gallery/line_thumbnail.png)

Visualize sequential data like daily trends or event impact analysis.

#### [Bar Plot](plots/bar.ipynb)
![Bar plot example](../assets/gallery/bar_thumbnail.png)

Compare categorical data with vertical or horizontal bars.

[Continue for all plot types...]

Individual Plot Pages (gallery/plots/*.ipynb)

Each plot page should be a Jupyter notebook containing:

  1. Title and description - What the plot is used for
  2. Basic example - Simplest usage
  3. Major configuration examples - Each in its own section with markdown headers

Standard format for each example:

## [Feature Name]

Brief description of what this configuration does.

# Python code to generate the plot
import pandas as pd
from pyretailscience.plots import line

# Create example data
df = pd.DataFrame({...})

# Generate plot
ax = line.plot(df, value_col="sales", ...)

[Output cell showing the plot image]

Plot-Specific Requirements

Below are the ACTUAL features from the codebase for each plot. Demonstrations should focus on these real capabilities.

Line Plot (gallery/plots/line.ipynb)

Demonstrate:

  1. Basic line plot with DataFrame (with x_col and value_col)
  2. Plotting a pandas Series (no value_col needed)
  3. Using group_col for multiple lines (creates separate line per group)
  4. Multiple value columns (value_col as list) - note: cannot combine with group_col
  5. Index-based plotting (omit x_col, uses DataFrame index)
  6. Using fill_na_value parameter when pivoting with group_col
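
Item 6 is easier to demonstrate once it is clear why a pivot produces gaps. The sketch below is a pure-Python stand-in, not library code: `pivot_long_to_wide` is a hypothetical function, and it assumes that `fill_na_value` replaces the missing x/group combinations left by the pivot, as the wording of item 6 suggests.

```python
def pivot_long_to_wide(rows, x_key, group_key, value_key, fill_na_value=None):
    """Pivot long-format rows into {x: {group: value}}, filling missing combinations."""
    groups = sorted({row[group_key] for row in rows})
    wide = {}
    for row in rows:
        wide.setdefault(row[x_key], {})[row[group_key]] = row[value_key]
    # Every x value gets an entry for every group; absent pairs take the fill value.
    return {
        x: {group: cells.get(group, fill_na_value) for group in groups}
        for x, cells in sorted(wide.items())
    }


rows = [
    {"day": 1, "category": "Electronics", "revenue": 8000},
    {"day": 1, "category": "Apparel", "revenue": 4000},
    {"day": 2, "category": "Electronics", "revenue": 10000},  # no Apparel row for day 2
]

wide = pivot_long_to_wide(rows, "day", "category", "revenue", fill_na_value=0)
print(wide[2])
```

A gallery example for this feature would therefore need data where at least one group has no row for some x value.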

Bar Plot (gallery/plots/bar.ipynb)

Demonstrate:

  1. Basic vertical bar plot
  2. Horizontal bar plot (orientation="horizontal" or "h")
  3. Grouped bars using x_col parameter
  4. Multiple value columns (value_col as list)
  5. Sorting (show one example: sort_order="descending" or "ascending")
  6. Data labels: data_label_format="absolute", "percentage_by_bar_group", or "percentage_by_series"
  7. Hatching patterns (use_hatch=True)
  8. Stacked bars (via stacked=True kwarg)

Heatmap Plot (gallery/plots/heatmap.ipynb)

Demonstrate:

  1. Basic heatmap from DataFrame (index=rows, columns=columns)
  2. Custom colorbar label (cbar_label)
  3. Custom colorbar format (cbar_format string)
  4. Cell text annotations (automatically added with auto-contrast black/white text)

Waterfall Plot (gallery/plots/waterfall.ipynb)

Demonstrate:

  1. Basic waterfall from amounts list and labels list
  2. Data label formats: data_label_format="absolute", "percentage", or "both"
  3. Net bar display (display_net_bar=True)
  4. Net line display (display_net_line=True)
  5. Removing zero amounts (remove_zero_amounts=True)
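
When choosing example data for the waterfall notebook, it helps to see the arithmetic behind the bars. The following is a minimal sketch of that arithmetic, not the library's implementation: `waterfall_bars` is a hypothetical function, and its default of `remove_zero_amounts=True` is an assumption made for the illustration.

```python
from itertools import accumulate


def waterfall_bars(amounts, labels, remove_zero_amounts=True):
    """Return ([(label, bottom, height), ...], net_total) for a waterfall chart."""
    pairs = list(zip(labels, amounts))
    if remove_zero_amounts:
        pairs = [(label, amount) for label, amount in pairs if amount != 0]
    # Each bar starts where the running total stood before its amount was applied.
    running = list(accumulate(amount for _, amount in pairs))
    bottoms = [0] + running[:-1]
    bars = [(label, bottom, amount) for (label, amount), bottom in zip(pairs, bottoms)]
    net = running[-1] if running else 0
    return bars, net


bars, net = waterfall_bars(
    amounts=[50000, -12000, 0, 8000],
    labels=["Opening sales", "Returns", "Adjustments", "Promotions"],
)
print(bars)
print(net)
```

Including one zero amount in the example data, as above, makes the effect of item 5 visible.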

Scatter Plot (gallery/plots/scatter.ipynb)

Demonstrate:

  1. Basic scatter plot (single value_col)
  2. Multiple scatter series using group_col
  3. Multiple value columns (value_col as list) - note: cannot combine with group_col
  4. Point labels using label_col parameter (only works with single value_col)
  5. Customizing label appearance with label_kwargs

Time Plot (gallery/plots/time.ipynb)

Demonstrate:

  1. Basic time series (requires transaction_date column and aggregates by period)
  2. Different aggregation periods: period="D" (daily), "W" (weekly), "M" (monthly)
  3. Different aggregation functions: agg_func="sum" or "mean"
  4. Grouping by category with group_col parameter

Area Plot (gallery/plots/area.ipynb)

Demonstrate:

  1. Basic area plot (single value_col)
  2. Multiple areas using group_col parameter
  3. Multiple value columns (value_col as list) - note: cannot combine with group_col
  4. Stacked areas (via stacked=True kwarg)
  5. Using x_col vs index

Histogram Plot (gallery/plots/histogram.ipynb)

Demonstrate:

  1. Basic histogram (single value_col)
  2. Multiple histograms using group_col
  3. Multiple value columns (value_col as list) - note: cannot combine with group_col
  4. Range clipping: range_lower, range_upper, range_method="clip" or "fillna"
  5. Hatching patterns (use_hatch=True)
  6. Custom bins (via bins kwarg)

Cohort Plot (gallery/plots/cohort.ipynb)

Demonstrate:

  1. Basic cohort heatmap from DataFrame
  2. Percentage display (percentage=True - default)
  3. Raw value display (percentage=False)
  4. The distinctive horizontal line at row 3 (automatically added)

Period on Period Plot (gallery/plots/period_on_period.ipynb)

Demonstrate:

  1. Basic period-on-period comparison with list of (start_date, end_date) tuples in periods parameter
  2. Overlaying 2-3 different time periods on same chart
  3. Different line styles automatically applied to each period

Venn Diagram (gallery/plots/venn.ipynb)

Demonstrate:

  1. 2-set Venn diagram (requires DataFrame with 'groups' and 'percent' columns)
  2. 3-set Venn diagram
  3. Euler diagram mode (vary_size=True - sizes proportional to values)
  4. Custom subset label formatting with subset_label_formatter

Broken Timeline Plot (gallery/plots/broken_timeline.ipynb)

Demonstrate:

  1. Basic broken timeline showing data availability across categories over time
  2. Different aggregation periods: period="D" or "W"
  3. Threshold filtering: threshold_value to hide low-value periods
  4. Different aggregation functions: agg_func="sum" or other

Index Plot (gallery/plots/index_plot.ipynb)

Demonstrate:

  1. Basic index plot showing performance relative to baseline (100)
  2. Sorting options: sort_by="group" or "value"
  3. Group filtering: exclude_groups or include_only_groups
  4. Multiple series with series_col parameter
  5. Highlighting range with highlight_range parameter
  6. Value filtering: filter_above or filter_below

Price Plot (gallery/plots/price.ipynb)

Demonstrate:

  1. Basic bubble chart showing price distribution across categories
  2. Price binning with bins parameter (int for equal-width, list for custom boundaries)
  3. Grouping by categorical column (group_col)
  4. Bubble sizes represent percentage of products in each price band

Data Guidelines

All example data should:

  • Use realistic retail domain values (customer_id, store_id, product names, dates, dollar amounts)
  • Be small enough to be clearly readable (typically 5-15 rows)
  • Be self-contained within each notebook (no external data files)
  • Use descriptive variable names

Import Style (IMPORTANT)

All notebooks MUST use this import pattern:

from pyretailscience.plots import line
# Then call: line.plot(...)

DO NOT use:

import pyretailscience.plots.line as line_plot  # ❌ WRONG
# or
import pyretailscience.plots.line  # ❌ WRONG

This keeps imports consistent across all documentation and examples.
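
The convention could be checked automatically during review. This is a rough sketch, assuming notebook code has already been extracted as plain text; `find_disallowed_imports` is a hypothetical helper and the pattern covers only the two forms listed above.

```python
import re

# Matches "import pyretailscience.plots.<name>", with or without an alias.
DISALLOWED_IMPORT = re.compile(r"^\s*import\s+pyretailscience\.plots\.\w+")


def find_disallowed_imports(source):
    """Return (line_number, line) for each import that breaks the gallery convention."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if DISALLOWED_IMPORT.match(line)
    ]


sample = (
    "import pandas as pd\n"
    "from pyretailscience.plots import line\n"
    "import pyretailscience.plots.line as line_plot\n"
)
print(find_disallowed_imports(sample))
```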

Example data (illustrating the Data Guidelines above):

# Good - retail domain
df = pd.DataFrame({
    "product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones"],
    "sales": [125000, 15000, 22000, 85000, 18000],
    "category": ["Electronics", "Accessories", "Accessories", "Electronics", "Accessories"]
})

# Avoid - generic placeholders
df = pd.DataFrame({
    "x": ["A", "B", "C", "D", "E"],
    "y": [1, 2, 3, 4, 5],
    "group": ["test", "data", "test", "data", "test"]
})

Technical Implementation

Jupyter Notebooks

  • Create actual Jupyter notebook (.ipynb) files, not Python scripts
  • Use mkdocs-jupyter plugin (already configured in mkdocs.yml)
  • Each notebook should have markdown cells for section headers
  • Execute all cells before committing to ensure images are embedded
  • No need for %matplotlib inline or other magic commands - plots will render automatically in notebooks
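
Whether a notebook was executed before committing can be checked from the .ipynb file itself, because it is JSON. The sketch below assumes the nbformat 4 layout (a `cells` list, with `execution_count` and `outputs` on code cells); the function names are hypothetical.

```python
import json
from pathlib import Path


def load_notebook(path):
    """Read an .ipynb file into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def unexecuted_code_cells(notebook):
    """Return indexes of non-empty code cells that have never been run."""
    missing = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = "".join(cell.get("source", []))
        if source.strip() and cell.get("execution_count") is None:
            missing.append(index)
    return missing


def has_embedded_png(notebook):
    """Return True if any code cell output carries an embedded PNG image."""
    return any(
        "image/png" in output.get("data", {})
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
        for output in cell.get("outputs", [])
    )
```

For example, `unexecuted_code_cells(load_notebook("docs/gallery/plots/line.ipynb"))` should return an empty list for a notebook that is ready to commit.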

Image Assets

For the root gallery page thumbnails:

  • Store in docs/assets/gallery/
  • Generate programmatically or screenshot from notebook outputs
  • Optimize images for web (PNG format, reasonable file sizes)
  • Consistent thumbnail dimensions (e.g., 400x300px)
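
Thumbnail dimensions can be verified without an imaging library, since a PNG stores its width and height in the IHDR chunk at the start of the file. A minimal sketch follows; `png_size` is a hypothetical helper, and 400x300 is only the example size suggested above.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Read (width, height) from the IHDR chunk at the start of PNG bytes."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian 32-bit integers right after the chunk type.
    return struct.unpack(">II", data[16:24])


# Build just the first 24 bytes of a 400x300 PNG to exercise the parser.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 400, 300)
print(png_size(header))
```

On real files the bytes would come from something like `Path("docs/assets/gallery/line_thumbnail.png").read_bytes()`.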

Style Consistency

All plots should:

  • Use default PyRetailScience styling (don't override unless demonstrating that feature)
  • Include appropriate titles and axis labels
  • Be large enough to read clearly in documentation
  • Use consistent figure sizes across examples

Implementation Strategy

IMPORTANT: This work should be split into separate PRs: one PR for the root gallery page, then one PR per plot type. This approach:

  • Makes reviews manageable and focused
  • Allows incremental progress and merging
  • Reduces risk of conflicts
  • Enables parallel work if multiple contributors are involved

Suggested PR sequence:

  1. PR 1: Root gallery page structure (docs/gallery/index.md) with placeholder thumbnails and mkdocs.yml updates
  2. PR 2: Line plot gallery (gallery/plots/line.ipynb)
  3. PR 3: Bar plot gallery (gallery/plots/bar.ipynb)
  4. PR 4: Scatter plot gallery (gallery/plots/scatter.ipynb)
  5. PR 5: Heatmap plot gallery (gallery/plots/heatmap.ipynb)
  6. PR 6: Time plot gallery (gallery/plots/time.ipynb)
  7. PR 7: Area plot gallery (gallery/plots/area.ipynb)
  8. PR 8: Histogram plot gallery (gallery/plots/histogram.ipynb)
  9. PR 9: Waterfall plot gallery (gallery/plots/waterfall.ipynb)
  10. PR 10: Cohort plot gallery (gallery/plots/cohort.ipynb)
  11. PR 11: Venn diagram gallery (gallery/plots/venn.ipynb)
  12. PR 12: Period on Period plot gallery (gallery/plots/period_on_period.ipynb)
  13. PR 13: Broken Timeline plot gallery (gallery/plots/broken_timeline.ipynb)
  14. PR 14: Index plot gallery (gallery/plots/index_plot.ipynb)
  15. PR 15: Price plot gallery (gallery/plots/price.ipynb)

Each plot PR should:

  • Include the complete notebook with all examples
  • Update root gallery page with thumbnail and description for that plot
  • Execute all notebook cells to embed images
  • Ensure docs build successfully

Acceptance Criteria

  • Root gallery page (docs/gallery/index.md) created with overview and thumbnails
  • Individual plot notebooks created for all 14 user-facing plots:
    • area.ipynb
    • bar.ipynb
    • broken_timeline.ipynb
    • cohort.ipynb
    • heatmap.ipynb
    • histogram.ipynb
    • index_plot.ipynb
    • line.ipynb
    • period_on_period.ipynb
    • price.ipynb
    • scatter.ipynb
    • time.ipynb
    • venn.ipynb
    • waterfall.ipynb
  • Each notebook demonstrates major configuration options (not exhaustive, focus on built-in features)
  • All notebooks use realistic retail data
  • All notebooks execute without errors
  • mkdocs.yml updated with new "Plot Gallery" navigation section
  • Thumbnail images generated for root gallery page
  • Documentation builds successfully with mkdocs build
  • Gallery pages render correctly in local preview (mkdocs serve)
  • Tree diagram plot is NOT included (internal use only)
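
The file-related criteria above lend themselves to a quick scripted check. The sketch below is one possible approach, not an agreed tool: `list_files` and `missing_gallery_files` are hypothetical helpers, and the expected names are taken from the notebook list above.

```python
from pathlib import Path

EXPECTED_NOTEBOOKS = [
    "area", "bar", "broken_timeline", "cohort", "heatmap", "histogram",
    "index_plot", "line", "period_on_period", "price", "scatter", "time",
    "venn", "waterfall",
]


def list_files(docs_dir):
    """Return every file under docs_dir as a POSIX-style path relative to it."""
    root = Path(docs_dir)
    return {path.relative_to(root).as_posix() for path in root.rglob("*") if path.is_file()}


def missing_gallery_files(existing_paths):
    """Return the expected gallery files that are absent from existing_paths."""
    expected = ["gallery/index.md"]
    expected += [f"gallery/plots/{stem}.ipynb" for stem in EXPECTED_NOTEBOOKS]
    return [path for path in expected if path not in existing_paths]
```

Running `missing_gallery_files(list_files("docs"))` from the repository root should return an empty list once all plot PRs have merged.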

Out of Scope

  • Exhaustive coverage of every parameter and kwarg option
  • Demonstrations of general matplotlib customizations (tick sizes, font changes, etc.)
  • Interactive plots or widgets
  • Performance benchmarking
  • Comparison with other plotting libraries
  • Style customization guides (covered separately in api/plots/styles/)

Notes

  • The tree_diagram.py module should be excluded as it's used internally by the Revenue Tree analysis and not meant for direct user consumption
  • The gallery complements (not replaces) the existing API reference documentation
  • Examples should be copy-pasteable and runnable by users
  • Consider adding a note at the top of each plot page linking to the full API reference for that plot type

Example Notebook Structure

Below is an example showing the structure of a Jupyter notebook (.ipynb file). Create actual .ipynb files in Jupyter, not Python scripts.

First markdown cell:

# Line Plot Gallery

The line plot is used for visualizing sequential data like daily trends or event impact analysis.
It's ideal for time-based sequences or ordered data points.

First code cell:

import matplotlib.pyplot as plt
import pandas as pd
from pyretailscience.plots import line

Markdown cell:

## Basic Line Plot

Plot a single value column from a DataFrame.

Code cell:

df = pd.DataFrame({
    "day": range(1, 8),
    "revenue": [12000, 15000, 13000, 18000, 22000, 19000, 21000]
})

ax = line.plot(
    df,
    x_col="day",
    value_col="revenue",
    title="Daily Revenue",
    x_label="Day",
    y_label="Revenue ($)"
)
plt.show()

Markdown cell:

## Plotting a Series

You can also plot a pandas Series directly.

Code cell:

sales = pd.Series(
    [12000, 15000, 13000, 18000, 22000, 19000, 21000],
    index=range(1, 8),
    name="Sales"
)

ax = line.plot(
    sales,
    title="Daily Sales",
    x_label="Day",
    y_label="Sales ($)"
)
plt.show()

Markdown cell:

## Multiple Lines with group_col

Create separate lines for each category using the group_col parameter.

Code cell:

df_multi = pd.DataFrame({
    "day": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "category": ["Electronics", "Apparel"] * 5,
    "revenue": [8000, 4000, 10000, 5000, 8500, 4500, 12000, 6000, 15000, 7000]
})

ax = line.plot(
    df_multi,
    x_col="day",
    value_col="revenue",
    group_col="category",
    title="Revenue by Category",
    x_label="Day",
    y_label="Revenue ($)",
    legend_title="Category"
)
plt.show()

Continue with more examples...

Related Issues

  • Existing API reference documentation: docs/api/plots/
  • Examples section: docs/examples/ (focuses on analysis workflows, not individual plots)

Labels

  • type:docs
  • status:draft (until approved)

Priority

P1 - High priority documentation improvement that will significantly enhance user experience and plot discoverability.
