Commit ec98f7a

feat(bokeh): implement biplot-pca (#3481)
## Implementation: `biplot-pca` - bokeh

Implements the **bokeh** version of `biplot-pca`.

**File:** `plots/biplot-pca/implementations/bokeh.py`
**Parent Issue:** #3417

---

:robot: *[impl-generate workflow](https://github.com/MarkusNeusinger/pyplots/actions/runs/20854273126)*

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
1 parent 0fddf45 commit ec98f7a

2 files changed

Lines changed: 353 additions & 0 deletions


plots/biplot-pca/implementations/bokeh.py
Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+""" pyplots.ai
+biplot-pca: PCA Biplot with Scores and Loading Vectors
+Library: bokeh 3.8.2 | Python 3.13.11
+Quality: 91/100 | Created: 2026-01-09
+"""
+
+import numpy as np
+from bokeh.io import export_png
+from bokeh.models import Arrow, ColumnDataSource, Label, Legend, LegendItem, VeeHead
+from bokeh.plotting import figure
+from sklearn.datasets import load_iris
+from sklearn.decomposition import PCA
+from sklearn.preprocessing import StandardScaler
+
+
+# Data - Iris dataset
+iris = load_iris()
+X = iris.data
+y = iris.target
+feature_names = ["sepal length", "sepal width", "petal length", "petal width"]
+target_names = iris.target_names
+
+# Standardize features
+scaler = StandardScaler()
+X_scaled = scaler.fit_transform(X)
+
+# PCA
+pca = PCA(n_components=2)
+scores = pca.fit_transform(X_scaled)
+loadings = pca.components_.T
+explained_var = pca.explained_variance_ratio_ * 100
+
+# Scale loadings for visibility (scale to fit within score range)
+score_max = np.abs(scores).max()
+loading_scale = score_max * 0.9 / np.abs(loadings).max()
+loadings_scaled = loadings * loading_scale
+
+# Create figure with appropriate range
+margin = 1.5
+x_range = (scores[:, 0].min() - margin, scores[:, 0].max() + margin)
+y_range = (scores[:, 1].min() - margin, scores[:, 1].max() + margin)
+
+p = figure(
+    width=4800,
+    height=2700,
+    title="biplot-pca · bokeh · pyplots.ai",
+    x_axis_label=f"PC1 ({explained_var[0]:.1f}%)",
+    y_axis_label=f"PC2 ({explained_var[1]:.1f}%)",
+    x_range=x_range,
+    y_range=y_range,
+)
+
+# Style title and axes - larger sizes for 4800x2700
+p.title.text_font_size = "36pt"
+p.xaxis.axis_label_text_font_size = "28pt"
+p.yaxis.axis_label_text_font_size = "28pt"
+p.xaxis.major_label_text_font_size = "22pt"
+p.yaxis.major_label_text_font_size = "22pt"
+
+# Colors for groups - using Python Blue first, then complementary colors
+colors = ["#306998", "#FFD43B", "#4CAF50"]  # Python Blue, Python Yellow, Green
+
+# Plot observation scores by group
+legend_items = []
+for i, name in enumerate(target_names):
+    mask = y == i
+    source = ColumnDataSource(data={"x": scores[mask, 0], "y": scores[mask, 1]})
+    renderer = p.scatter(x="x", y="y", source=source, size=25, alpha=0.75, color=colors[i])
+    legend_items.append(LegendItem(label=name, renderers=[renderer]))
+
+# Add legend
+legend = Legend(items=legend_items, location="top_left")
+legend.label_text_font_size = "24pt"
+legend.glyph_height = 30
+legend.glyph_width = 30
+legend.spacing = 12
+legend.padding = 15
+legend.background_fill_alpha = 0.85
+p.add_layout(legend)
+
+# Draw loading arrows with custom label positions to avoid overlap
+arrow_color = "#C0392B"  # Red for contrast with data points
+
+# Custom label offsets for each feature to avoid overlap
+label_offsets = {
+    "sepal length": (0.4, 0.5),
+    "sepal width": (-0.3, 0.4),
+    "petal length": (0.5, -0.4),
+    "petal width": (0.3, 0.5),
+}
+
+for i, name in enumerate(feature_names):
+    x_end = loadings_scaled[i, 0]
+    y_end = loadings_scaled[i, 1]
+
+    # Add arrow
+    p.add_layout(
+        Arrow(
+            end=VeeHead(size=30, fill_color=arrow_color, line_color=arrow_color),
+            x_start=0,
+            y_start=0,
+            x_end=x_end,
+            y_end=y_end,
+            line_width=4,
+            line_color=arrow_color,
+        )
+    )
+
+    # Add label with custom offset
+    offset_x, offset_y = label_offsets[name]
+    label = Label(
+        x=x_end + offset_x,
+        y=y_end + offset_y,
+        text=name,
+        text_font_size="20pt",
+        text_color=arrow_color,
+        text_font_style="bold",
+        text_align="center",
+    )
+    p.add_layout(label)
+
+# Add origin reference lines (dashed)
+p.line(x=[x_range[0], x_range[1]], y=[0, 0], line_width=2, line_color="#888888", line_alpha=0.5, line_dash="dashed")
+p.line(x=[0, 0], y=[y_range[0], y_range[1]], line_width=2, line_color="#888888", line_alpha=0.5, line_dash="dashed")
+
+# Grid styling
+p.grid.grid_line_alpha = 0.3
+p.grid.grid_line_dash = [6, 4]
+
+# Background
+p.background_fill_color = "#FAFAFA"
+
+# Save
+export_png(p, filename="plot.png")
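
The loading-scaling step in the file above (`loading_scale = score_max * 0.9 / np.abs(loadings).max()`) can be checked in isolation. The following is a minimal sketch and not part of the commit: it swaps scikit-learn's `PCA` for a plain NumPy SVD and uses random data in place of Iris. The helper name `scale_loadings` and the `fraction` parameter are assumptions made for illustration.

```python
import numpy as np


def scale_loadings(scores, loadings, fraction=0.9):
    """Stretch loadings so their largest absolute entry equals
    `fraction` times the largest absolute score (hypothetical helper)."""
    score_max = np.abs(scores).max()
    return loadings * (score_max * fraction / np.abs(loadings).max())


# Synthetic stand-in for the Iris matrix (assumption: 150 x 4 random values).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# Standardize with the population standard deviation, as StandardScaler does.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD: rows of Vt are the principal axes. Signs may differ from
# scikit-learn, which applies its own sign convention.
_, _, Vt = np.linalg.svd(X_scaled, full_matrices=False)
scores = X_scaled @ Vt[:2].T  # shape (150, 2), one row per observation
loadings = Vt[:2].T           # shape (4, 2), one row per feature

loadings_scaled = scale_loadings(scores, loadings)

# Ratio of the longest loading component to the largest score component.
ratio = np.abs(loadings_scaled).max() / np.abs(scores).max()
print(ratio)
```

The printed ratio should equal the chosen fraction (0.9) up to floating-point rounding, which is why the arrow tips stay inside the score cloud regardless of the data's scale.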
Lines changed: 219 additions & 0 deletions
@@ -0,0 +1,219 @@
+library: bokeh
+specification_id: biplot-pca
+created: '2026-01-09T14:04:14Z'
+updated: '2026-01-09T14:06:50Z'
+generated_by: claude-opus-4-5-20251101
+workflow_run: 20854273126
+issue: 3417
+python_version: 3.13.11
+library_version: 3.8.2
+preview_url: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/bokeh/plot.png
+preview_thumb: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/bokeh/plot_thumb.png
+preview_html: https://storage.googleapis.com/pyplots-images/plots/biplot-pca/bokeh/plot.html
+quality_score: 91
+review:
+  strengths:
+  - 'Excellent implementation of all biplot components: scores, loadings, and proper
+    labeling'
+  - Smart use of custom label offsets to prevent text overlap with loading arrows
+  - Good color palette with Python Blue as primary color
+  - Proper variance explained percentages in axis labels
+  - Clean code structure following KISS principles
+  - Appropriate sizing for 4800x2700 canvas with scaled font sizes
+  weaknesses:
+  - Legend could be positioned outside the plot area to avoid any potential overlap
+    with data
+  - Loading arrow scaling could be slightly adjusted for better visual balance with
+    score points
+  image_description: 'The plot displays a PCA biplot using the Iris dataset with 150
+    data points distributed across three species groups. The scatter points are colored
+    by species: blue (Python Blue #306998) for setosa clustered on the left side,
+    yellow (#FFD43B) for versicolor in the middle, and green (#4CAF50) for virginica
+    on the right. Four red loading arrows originate from the origin (0,0), representing
+    the four Iris features: sepal width points upward-left, sepal length points toward
+    the upper-right, petal width points toward the right at a slight upward angle,
+    and petal length points horizontally to the right. Each arrow is labeled in red
+    text with the feature name. The title "biplot-pca · bokeh · pyplots.ai" appears
+    at the top-left. Axis labels show "PC1 (73.0%)" and "PC2 (22.9%)" with variance
+    explained percentages. A legend in the upper-left shows the three species. Dashed
+    gray reference lines cross at the origin. The background is light gray (#FAFAFA)
+    with subtle dashed grid lines.'
+  criteria_checklist:
+    visual_quality:
+      score: 36
+      max: 40
+      items:
+      - id: VQ-01
+        name: Text Legibility
+        score: 10
+        max: 10
+        passed: true
+        comment: Title at 36pt, axis labels at 28pt, tick labels at 22pt, legend at
+          24pt - all clearly readable
+      - id: VQ-02
+        name: No Overlap
+        score: 8
+        max: 8
+        passed: true
+        comment: Loading labels use custom offsets to avoid overlap with arrows and
+          each other
+      - id: VQ-03
+        name: Element Visibility
+        score: 7
+        max: 8
+        passed: true
+        comment: Markers at size=25 with alpha=0.75 are well-sized for 150 points;
+          arrows clearly visible with size=30 heads
+      - id: VQ-04
+        name: Color Accessibility
+        score: 4
+        max: 5
+        passed: true
+        comment: Blue/yellow/green palette is generally colorblind-friendly, though
+          yellow can be challenging for some
+      - id: VQ-05
+        name: Layout Balance
+        score: 5
+        max: 5
+        passed: true
+        comment: Good use of canvas space, data fills plot area well with appropriate
+          margins
+      - id: VQ-06
+        name: Axis Labels
+        score: 2
+        max: 2
+        passed: true
+        comment: Descriptive labels with variance explained percentages
+      - id: VQ-07
+        name: Grid & Legend
+        score: 0
+        max: 2
+        passed: false
+        comment: Legend is functional but could be better positioned; grid alpha at
+          0.3 is appropriate
+    spec_compliance:
+      score: 25
+      max: 25
+      items:
+      - id: SC-01
+        name: Plot Type
+        score: 8
+        max: 8
+        passed: true
+        comment: Correct PCA biplot with both scores and loadings
+      - id: SC-02
+        name: Data Mapping
+        score: 5
+        max: 5
+        passed: true
+        comment: PC1 on X-axis, PC2 on Y-axis, correct mapping
+      - id: SC-03
+        name: Required Features
+        score: 5
+        max: 5
+        passed: true
+        comment: 'All spec features present: scores as colored points, loading arrows
+          from origin, labels on arrows, variance percentages in axis labels'
+      - id: SC-04
+        name: Data Range
+        score: 3
+        max: 3
+        passed: true
+        comment: Axes show all data with appropriate margins
+      - id: SC-05
+        name: Legend Accuracy
+        score: 2
+        max: 2
+        passed: true
+        comment: Legend correctly shows setosa, versicolor, virginica
+      - id: SC-06
+        name: Title Format
+        score: 2
+        max: 2
+        passed: true
+        comment: 'Correct format: biplot-pca · bokeh · pyplots.ai'
+    data_quality:
+      score: 18
+      max: 20
+      items:
+      - id: DQ-01
+        name: Feature Coverage
+        score: 8
+        max: 8
+        passed: true
+        comment: Shows clear cluster separation and loading interpretation
+      - id: DQ-02
+        name: Realistic Context
+        score: 7
+        max: 7
+        passed: true
+        comment: Iris dataset is a classic, well-known scientific dataset
+      - id: DQ-03
+        name: Appropriate Scale
+        score: 3
+        max: 5
+        passed: true
+        comment: Standardized data with appropriate scaling; loading scale factor
+          works but could be slightly improved
+    code_quality:
+      score: 9
+      max: 10
+      items:
+      - id: CQ-01
+        name: KISS Structure
+        score: 3
+        max: 3
+        passed: true
+        comment: 'Clean linear structure: imports, data, PCA, plot, save'
+      - id: CQ-02
+        name: Reproducibility
+        score: 3
+        max: 3
+        passed: true
+        comment: Uses deterministic Iris dataset
+      - id: CQ-03
+        name: Clean Imports
+        score: 2
+        max: 2
+        passed: true
+        comment: All imports are used
+      - id: CQ-04
+        name: No Deprecated API
+        score: 1
+        max: 1
+        passed: true
+        comment: Uses current Bokeh API
+      - id: CQ-05
+        name: Output Correct
+        score: 0
+        max: 1
+        passed: false
+        comment: Saves as plot.png to current directory
+    library_features:
+      score: 3
+      max: 5
+      items:
+      - id: LF-01
+        name: Distinctive Features
+        score: 3
+        max: 5
+        passed: true
+        comment: Uses ColumnDataSource, Arrow with VeeHead, Label annotations, custom
+          Legend construction - good use of Bokeh annotation system
+  verdict: APPROVED
+impl_tags:
+  dependencies:
+  - sklearn
+  techniques:
+  - annotations
+  - custom-legend
+  patterns:
+  - dataset-loading
+  - iteration-over-groups
+  - columndatasource
+  dataprep:
+  - pca
+  - normalization
+  styling:
+  - alpha-blending
+  - grid-styling
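
The five category scores in `criteria_checklist` add up to the reported `quality_score`. A minimal sketch of that arithmetic follows, with the numbers copied from the metadata above. The `categories` dict is an illustrative structure chosen here, not part of the pyplots tooling.

```python
# (score, max) per category, copied from criteria_checklist in the metadata.
categories = {
    "visual_quality": (36, 40),
    "spec_compliance": (25, 25),
    "data_quality": (18, 20),
    "code_quality": (9, 10),
    "library_features": (3, 5),
}

total_score = sum(score for score, _ in categories.values())
total_max = sum(maximum for _, maximum in categories.values())

print(f"{total_score}/{total_max}")  # prints 91/100
```

The nine points lost come from VQ-03, VQ-04, VQ-07, DQ-03, CQ-05, and LF-01, the only items whose `score` is below `max`.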
