
Commit f078f76

feat: project topics — GitHub-style tag chips on projects (#80)
Adds per-project topic chips visible on:
- Home-page project cards (4 chips + overflow "+N more")
- Project detail page hero strip (up to 12 chips + description paragraph + clickable homepage link if present)

New module llmwiki/project_topics.py (stdlib-only):
- load_project_profile(projects_dir, slug): reads wiki/projects/<slug>.md frontmatter, normalizes topics (lowercase + dedup, preserve order), returns a ProjectTopicsProfile TypedDict with optional description and homepage fields
- extract_session_topics(metas, min_count=2, max_topics=8): session-tag aggregation fallback, filters universal noise tags (claude-code, session-transcript, demo, codex-cli, cursor) and requires a tag to appear in ≥ min_count sessions before promoting it
- get_project_topics(dir, slug, metas): applies the precedence rules: explicit profile wins, falls back to session-tag aggregation on empty/missing
- render_topic_chips / render_topic_chips_linked: HTML render with overflow collapse, HTML escape, URL encoding for linked variant

Wired into build.py:
- PROJECTS_META_DIR = REPO_ROOT/wiki/projects constant
- render_index: home cards get a `.card-topics` chip row below the meta line, sourced from get_project_topics
- render_project_page: new hero strip between the hero and the heatmap with description + topics chips + homepage
- Full CSS block with hover states, chip pills, mobile-responsive layout

Seeded profiles committed for the 4 reference projects:
- wiki/projects/demo-blog-engine.md: Rust SSG (rust, blog, ssg, pulldown-cmark, syntect, markdown)
- wiki/projects/demo-ml-pipeline.md: DistilBERT fine-tune (python, machine-learning, distilbert, transformers, huggingface, wandb, training, fine-tuning)
- wiki/projects/demo-todo-api.md: FastAPI CRUD (python, fastapi, rest, oauth2, sqlite, api, crud, web)
- wiki/projects/llm-wiki.md: the meta project (python, wiki, karpathy, claude-code, static-site, markdown, open-source)

gitignore: added `!wiki/projects/` exception so user profiles are committed by default. These files are small, user-curated, and stable; they belong in version control alongside the code, unlike the rest of the LLM-generated wiki tree.

Tests in tests/test_project_topics.py (24 new):
- load_project_profile: missing file, topics list, lowercase dedup normalization, optional homepage, empty topics list, no frontmatter
- extract_session_topics: noise filter, min_count threshold, frequency sort, max_topics cap, string-value edge case, empty input, noise constant sanity check
- get_project_topics: explicit profile precedence, session fallback, empty-profile-triggers-fallback edge case
- render_topic_chips: empty, all visible, overflow collapse, HTML escape, custom classname
- render_topic_chips_linked: anchor output, URL encoding (special chars like `c++`), empty

398 tests passing (was 374).

Verified on the real wiki: /projects/llm-wiki.html renders 7 chips (python, wiki, karpathy, claude-code, static-site, markdown, open-source), the description paragraph, and a clickable homepage link. Home page's `llm-wiki` card shows the first 4 chips plus "+3 more".
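
The normalization and precedence rules described in the message can be illustrated with a minimal sketch. This is not the committed module: `normalize_topics` and `pick_topics` are hypothetical names chosen for illustration, and the sample input is made up.

```python
from __future__ import annotations

from typing import Iterable


def normalize_topics(raw: Iterable[object]) -> list[str]:
    """Strip, lowercase, and dedup topics, keeping first-seen order."""
    seen: set[str] = set()
    out: list[str] = []
    for item in raw:
        topic = str(item).strip().lower()
        if topic and topic not in seen:
            seen.add(topic)
            out.append(topic)
    return out


def pick_topics(explicit: list[str], fallback: list[str]) -> list[str]:
    """Explicit profile topics win; an empty list falls through to the fallback."""
    return explicit if explicit else fallback


print(normalize_topics(["Rust", " blog ", "rust", "SSG", ""]))
# ['rust', 'blog', 'ssg']
print(pick_topics([], ["python", "fastapi"]))
# ['python', 'fastapi']
```

Keeping first-seen order (rather than sorting) means the order a user writes in the profile is the order the chips appear in.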
1 parent c54ebaf commit f078f76

9 files changed

Lines changed: 629 additions & 2 deletions


.gitignore

Lines changed: 6 additions & 0 deletions
@@ -33,6 +33,12 @@ wiki/lint-report.md
 # so the structured schema + changelog timeline are visible on the
 # live demo build.
 !wiki/entities/ClaudeSonnet4.md
+# Project topics (v0.9): `wiki/projects/<slug>.md` is hand-curated
+# per-project metadata (topics, description, homepage). Small, stable,
+# user-curated — committed so the demo build ships with topic chips
+# on every project card. Users can add their own project profiles
+# and they'll be picked up automatically.
+!wiki/projects/
 
 # Generated static HTML site (Karpathy layer 3).
 site/

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Versions below 1.0 are pre-production — API and file formats may change.
 
 ### Added
 
+- **Project topics — GitHub-style tag chips on project cards + pages** — new `llmwiki/project_topics.py` module and `wiki/projects/<slug>.md` per-project profile convention. Each profile file carries frontmatter with a `topics:` list (`[rust, blog, ssg]`), optional `description`, and optional `homepage` URL. Build-time surfaces topics as pill-shaped chips on the home-page project cards (4 chips + overflow collapse into `+N more`) AND as a hero strip on each project detail page (up to 12 chips, with the description paragraph and a clickable homepage link if present). When no profile exists, falls back to aggregating session `tags:` frontmatter with universal noise tags (`claude-code`, `session-transcript`, `demo`) filtered out and a `min_count ≥ 2` threshold so one-off stragglers don't crowd the strip. Full dark-mode styling via theme vars; chips hover with the accent color. Seeded four profiles to ship with the repo (`demo-blog-engine`, `demo-ml-pipeline`, `demo-todo-api`, `llm-wiki`) and added a gitignore exception for `wiki/projects/` so user profiles are committed by default. 24 new tests cover profile loading, topics normalization (lowercase + dedup), session-tag aggregation with noise filtering, precedence rules, chip rendering, overflow collapse, HTML escaping, URL encoding for linked chips.
- **Append-only changelog field + timeline + pricing sparkline** (#56) — new `llmwiki/changelog_timeline.py` module consumes an optional `changelog:` list in model entity frontmatter and renders three surfaces: (1) a vertical **timeline widget** on each model detail page (newest-first, with from→to deltas colored by direction — price cuts green, price hikes red, benchmark lifts green, numeric context expansions shown with an up arrow), (2) an inline **pricing sparkline** (stdlib SVG) that appears when the changelog has ≥2 dated `input_per_1m` changes so readers can see the trend at a glance, and (3) a **"Recently updated · last 30 days"** card on the home page listing any model entity that changed recently. Append-only by design — if an entry is wrong, add a correcting entry rather than rewriting history. The frontmatter parser's naive comma-split on bracketed JSON arrays is papered over by a stitch-and-reparse fallback in `parse_changelog()`, with a regression test locking it in place. Numeric deltas get K/M suffix formatting, string deltas (e.g. license changes) render as strike-through → bold. All HTML-escaped. 27 new tests. Seeded `wiki/entities/ClaudeSonnet4.md` with a real 4-entry changelog (launch → price cut → context expansion → SWE-bench update) so the feature is visible immediately on the live build.
- **Structured model-profile schema + `/models/` section** (#55) — new `llmwiki/schema.py` (stdlib-only TypedDict validator) and `llmwiki/models_page.py` (renderer) add an opt-in schema for entity pages with `entity_kind: ai-model`. Pages can declare `provider`, inline-JSON `model` / `pricing` / `benchmarks` blocks, and a `modalities` list. The build pipeline discovers every valid model page under `wiki/entities/`, validates it (bad data → warnings, not build crashes), renders a structured info-card at the top of a per-model detail page (`/models/<slug>.html`), and emits a sortable `/models/index.html` table with every benchmark key used anywhere as a column. 13 well-known benchmark keys get pretty labels (`gpqa_diamond` → "GPQA Diamond", `swe_bench` → "SWE-bench", `mmlu` → "MMLU", etc.); unknown keys pass through for forward compatibility. Price formatting supports USD/EUR/GBP and falls back to currency-prefixed for anything else. New nav-bar link "Models" active on the detail and index pages. Full docs in `docs/reference/entity-schema.md`, seeded example in `wiki/entities/ClaudeSonnet4.md`. 36 new tests across `test_schema.py` (21) and `test_models_page.py` (15): happy path, minimum-viable page, validation warnings, non-numeric benchmarks, out-of-range scores rejected, malformed JSON treated as empty + warning, unknown benchmark keys allowed, HTML escaping, benchmark sort order, table column union.
- **Folder-level `_context.md` files** (#60) — new `llmwiki/context_md.py` module + convention, borrowed from [tobi/qmd](https://github.com/tobi/qmd)'s context pattern. An optional `_context.md` file can sit alongside pages in any wiki folder (e.g. `wiki/entities/_context.md`) to describe what the folder is for and which queries should traverse it. When a Claude Code `/wiki-query` session walks the tree, it reads the folder's context file first and uses the summary to decide whether to descend — saving context tokens on every deep query instead of sampling random pages to infer a folder's purpose. `build.py`'s `discover_sources()` now skips `_context.md` so these files never pollute the session index, search index, or AI-consumable exports. `CLAUDE.md` Query + Lint workflows document the convention, and `find_uncontexted_folders()` powers a new `/wiki-lint` warning for folders with >10 pages but no context stub. Ships with three seeded stubs (`wiki/entities/_context.md`, `wiki/concepts/_context.md`, `wiki/sources/_context.md`) to show the pattern. 19 new tests.
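
The session-tag fallback described in the project topics entry above can be sketched roughly as follows. `aggregate_tags` is a hypothetical helper, not the module's API, and the sample tags are invented. The sketch also assumes each tag should count at most once per session, which is one reading of "appears in at least `min_count` sessions".

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Noise tags as listed in the commit message.
NOISE = frozenset({"claude-code", "session-transcript", "demo", "codex-cli", "cursor"})


def aggregate_tags(
    session_tags: Iterable[Iterable[str]],
    min_count: int = 2,
    max_topics: int = 8,
) -> list[str]:
    """Return the most common non-noise tags, most frequent first."""
    counts: Counter[str] = Counter()
    for tags in session_tags:
        # Assumption: dedup within a session so a repeated tag counts once.
        unique = {t.strip().lower() for t in tags}
        for tag in unique:
            if tag and tag not in NOISE:
                counts[tag] += 1
    kept = [(tag, n) for tag, n in counts.items() if n >= min_count]
    # Sort by count descending, then alphabetically for a stable order.
    kept.sort(key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in kept[:max_topics]]


sessions = [
    ["claude-code", "fastapi", "python"],
    ["claude-code", "python", "sqlite"],
    ["python", "fastapi", "oauth2"],
]
print(aggregate_tags(sessions))
# ['python', 'fastapi']
```

Here `sqlite` and `oauth2` each appear in only one session, so the `min_count` threshold drops them, and `claude-code` is removed as noise despite appearing twice.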

llmwiki/build.py

Lines changed: 87 additions & 2 deletions
@@ -57,6 +57,11 @@
     render_model_info_card,
     render_models_index,
 )
+from llmwiki.project_topics import (
+    get_project_topics,
+    load_project_profile,
+    render_topic_chips,
+)
 from llmwiki.viz_heatmap import collect_session_counts, render_heatmap
 from llmwiki.viz_tokens import (
     render_project_token_card,
@@ -73,6 +78,9 @@
 RAW_DIR = REPO_ROOT / "raw"
 RAW_SESSIONS = RAW_DIR / "sessions"
 DEFAULT_OUT_DIR = REPO_ROOT / "site"
+# v0.7+: optional per-project metadata (topics, description, homepage).
+# Users drop a `wiki/projects/<slug>.md` file with frontmatter.
+PROJECTS_META_DIR = REPO_ROOT / "wiki" / "projects"
 
 
 # ─── frontmatter ───────────────────────────────────────────────────────────
@@ -776,7 +784,42 @@ def card(p: Path, meta: dict[str, Any]) -> str:
 </div>
 </section>"""
 
-    body = f"""{heatmap_block}
+    # Project topics strip — renders below the hero, above the heatmap.
+    # Explicit profile via wiki/projects/<slug>.md wins over the
+    # session-tag fallback. Projects with no topics render an empty
+    # strip (no chip row at all).
+    proj_profile = load_project_profile(PROJECTS_META_DIR, project_slug)
+    proj_topics = get_project_topics(PROJECTS_META_DIR, project_slug, proj_entries)
+    topics_html = render_topic_chips(
+        proj_topics, max_visible=12, classname="project-topics project-hero-topics"
+    )
+    description_html = ""
+    if proj_profile and proj_profile.get("description"):
+        description_html = (
+            f'<p class="project-description muted">'
+            f'{html.escape(proj_profile["description"])}</p>'
+        )
+    homepage_html = ""
+    if proj_profile and proj_profile.get("homepage"):
+        hp = proj_profile["homepage"]
+        homepage_html = (
+            f'<a class="project-homepage" href="{html.escape(hp)}" '
+            f'rel="noopener">{html.escape(hp)} ↗</a>'
+        )
+    topics_strip = ""
+    if topics_html or description_html or homepage_html:
+        topics_strip = (
+            '<section class="section project-topics-section">\n'
+            ' <div class="container">\n'
+            f' {description_html}\n'
+            f' {topics_html}\n'
+            f' {homepage_html}\n'
+            ' </div>\n'
+            '</section>\n'
+        )
+
+    body = f"""{topics_strip}
+{heatmap_block}
 {tool_chart_block}
 {token_timeline_block}
 <section class="section">
@@ -1033,10 +1076,18 @@ def render_index(
     cards = []
     for project, sessions in sorted(groups.items(), key=lambda x: -len(x[1])):
         main_count = sum(1 for p, _, _ in sessions if "subagent" not in p.name)
+        # Project topics — explicit profile in wiki/projects/<slug>.md
+        # takes precedence, falls back to aggregated session tags with
+        # noise filtered out. Rendered as chips below the card meta.
+        proj_metas = [m for _, m, _ in sessions]
+        topics = get_project_topics(PROJECTS_META_DIR, project, proj_metas)
+        topics_html = render_topic_chips(topics, max_visible=4,
+                                         classname="project-topics card-topics")
         cards.append(
-            f""" <a class="card" href="projects/{html.escape(project)}.html">
+            f""" <a class="card card-project" href="projects/{html.escape(project)}.html">
 <div class="card-title">{html.escape(project)}</div>
 <div class="card-meta">{main_count} main · {len(sessions) - main_count} sub-agent</div>
+{topics_html}
 </a>"""
         )
 
@@ -1774,6 +1825,40 @@ def build_search_index(
 .recently-updated-item a:hover { text-decoration: underline; }
 .recently-updated-date { font-family: 'JetBrains Mono', monospace; font-size: 0.78rem; }
 
+/* Project topics — GitHub-style tag chips on project cards, project
+   detail pages, and the home-page grid. Rendered by
+   llmwiki/project_topics.py. Tag colors are theme-neutral so the
+   same style reads on both project cards (light background) and
+   the project hero strip. */
+.project-topics { display: flex; flex-wrap: wrap; gap: 6px; margin-top: 10px; }
+.topic-chip {
+  display: inline-block;
+  padding: 3px 10px;
+  background: var(--bg-alt);
+  color: var(--text-secondary);
+  border: 1px solid var(--border);
+  border-radius: 999px;
+  font-size: 0.72rem;
+  font-weight: 500;
+  line-height: 1.4;
+  text-decoration: none;
+  transition: all 0.1s;
+}
+a.topic-chip:hover {
+  color: var(--accent);
+  border-color: var(--accent);
+  background: var(--bg-card);
+}
+.topic-chip-more { opacity: 0.7; }
+.card-topics { margin-top: 8px; }
+.card-topics .topic-chip { font-size: 0.68rem; padding: 2px 8px; }
+.project-topics-section { padding-top: 0; padding-bottom: 0; }
+.project-topics-section .container { padding-top: 16px; padding-bottom: 4px; }
+.project-description { margin: 0 0 10px; font-size: 0.92rem; line-height: 1.5; max-width: 680px; }
+.project-hero-topics { margin-bottom: 6px; }
+.project-homepage { display: inline-block; margin-top: 6px; font-size: 0.82rem; color: var(--accent); text-decoration: none; }
+.project-homepage:hover { text-decoration: underline; }
+
 /* v0.4: Deep-link icon next to headings */
 .content h2 .deep-link, .content h3 .deep-link, .content h4 .deep-link { margin-left: 8px; font-size: 0.8em; opacity: 0; text-decoration: none; transition: opacity 0.15s; }
 .content h2:hover .deep-link, .content h3:hover .deep-link, .content h4:hover .deep-link { opacity: 0.7; }
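
As a worked example of the two `max_visible` settings used in build.py (4 on home-page cards, 12 on the hero strip), the sketch below re-implements only the overflow arithmetic in isolation. `chip_spans` is a hypothetical stand-in for `render_topic_chips` that returns a list of spans instead of a wrapped `<div>`; the topic list is the one the commit message reports for `llm-wiki`.

```python
from __future__ import annotations

import html


def chip_spans(topics: list[str], max_visible: int) -> list[str]:
    """Return one <span> per visible topic, plus a '+N more' span on overflow."""
    visible = topics[:max_visible]
    hidden = len(topics) - len(visible)
    spans = [f'<span class="topic-chip">{html.escape(t)}</span>' for t in visible]
    if hidden > 0:
        spans.append(f'<span class="topic-chip topic-chip-more">+{hidden} more</span>')
    return spans


topics = ["python", "wiki", "karpathy", "claude-code",
          "static-site", "markdown", "open-source"]

card = chip_spans(topics, max_visible=4)   # home-page card setting
hero = chip_spans(topics, max_visible=12)  # project hero strip setting

print(len(card), card[-1])
# 5 <span class="topic-chip topic-chip-more">+3 more</span>
print(len(hero))
# 7
```

With seven topics, the card shows four chips and collapses the remaining three, while the hero strip has room for all seven and emits no overflow chip.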

llmwiki/project_topics.py

Lines changed: 215 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,215 @@
1+
"""Project topics — GitHub-style tag chips on project cards + pages.
2+
3+
Surfaces user-declared topic tags on the home page project cards, the
4+
per-project detail page, the projects index, and (as filterable chips)
5+
the sessions index. The same "topics" concept GitHub shows at the top
6+
of a repo page, so visitors can instantly see what a project is about.
7+
8+
Data sources, in order of precedence:
9+
10+
1. **Explicit**: `wiki/projects/<slug>.md` frontmatter. Users drop a
11+
small file per project with `topics: [rust, blog, ssg]` and an
12+
optional `description` + `homepage` URL. This is the primary
13+
source — it's user-curated and stable.
14+
15+
2. **Fallback**: session `tags:` frontmatter, aggregated across every
16+
session in the project, with the universal noise tags
17+
(`claude-code`, `session-transcript`, `demo`) filtered out. Sessions
18+
rarely carry distinctive tags today, but this makes the feature
19+
zero-config for projects where the user has added project-specific
20+
tags to their sessions.
21+
22+
Missing data gracefully returns an empty list — callers decide whether
23+
to render the empty state.
24+
25+
Stdlib-only.
26+
"""
27+
28+
from __future__ import annotations
29+
30+
import re
31+
from collections import Counter
32+
from pathlib import Path
33+
from typing import Any, Iterable, Mapping, Optional, TypedDict
34+
35+
_FRONTMATTER_RE = re.compile(r"^---\n(.*?)\n---\n(.*)$", re.DOTALL)
36+
37+
# Tags that appear on nearly every session and carry no
38+
# project-specific signal. Filtered out of the session-tag fallback.
39+
_NOISE_TAGS: frozenset[str] = frozenset(
40+
{"claude-code", "session-transcript", "demo", "codex-cli", "cursor"}
41+
)
42+
43+
44+
class ProjectTopicsProfile(TypedDict, total=False):
45+
"""Explicit metadata for a project, loaded from
46+
`wiki/projects/<slug>.md` frontmatter."""
47+
topics: list[str]
48+
description: str
49+
homepage: str
50+
51+
52+
def _parse_topics_frontmatter(text: str) -> dict[str, Any]:
53+
"""Tiny frontmatter parser — mirrors build.py's but self-contained
54+
so this module can be tested in isolation. Supports plain key/value
55+
and bracketed-list values."""
56+
m = _FRONTMATTER_RE.match(text)
57+
if not m:
58+
return {}
59+
raw = m.group(1)
60+
meta: dict[str, Any] = {}
61+
for line in raw.splitlines():
62+
if ":" not in line:
63+
continue
64+
key, _, value = line.partition(":")
65+
value = value.strip()
66+
if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
67+
value = value[1:-1]
68+
if value.startswith("[") and value.endswith("]"):
69+
inner = value[1:-1].strip()
70+
meta[key.strip()] = (
71+
[x.strip() for x in inner.split(",") if x.strip()]
72+
if inner else []
73+
)
74+
else:
75+
meta[key.strip()] = value
76+
return meta
77+
78+
79+
def load_project_profile(
80+
projects_dir: Path,
81+
project_slug: str,
82+
) -> Optional[ProjectTopicsProfile]:
83+
"""Load `<projects_dir>/<slug>.md` and extract the topics profile.
84+
85+
Returns `None` if the file doesn't exist. Missing fields are
86+
omitted from the result dict.
87+
"""
88+
path = projects_dir / f"{project_slug}.md"
89+
if not path.is_file():
90+
return None
91+
try:
92+
text = path.read_text(encoding="utf-8")
93+
except OSError:
94+
return None
95+
meta = _parse_topics_frontmatter(text)
96+
profile: ProjectTopicsProfile = {}
97+
topics = meta.get("topics")
98+
if isinstance(topics, list):
99+
# Normalize: strip, lowercase, dedup, keep order
100+
seen: set[str] = set()
101+
normalized: list[str] = []
102+
for t in topics:
103+
t_clean = str(t).strip().lower()
104+
if t_clean and t_clean not in seen:
105+
seen.add(t_clean)
106+
normalized.append(t_clean)
107+
profile["topics"] = normalized
108+
elif isinstance(topics, str) and topics:
109+
profile["topics"] = [t.strip().lower() for t in topics.strip("[]").split(",") if t.strip()]
110+
description = meta.get("description")
111+
if description:
112+
profile["description"] = str(description)
113+
homepage = meta.get("homepage")
114+
if homepage:
115+
profile["homepage"] = str(homepage)
116+
return profile
117+
118+
119+
def extract_session_topics(
120+
session_metas: Iterable[Mapping[str, Any]],
121+
max_topics: int = 8,
122+
min_count: int = 2,
123+
) -> list[str]:
124+
"""Aggregate tags across a project's sessions and return the most
125+
common non-noise tags. Used as a fallback when there's no explicit
126+
`wiki/projects/<slug>.md` profile.
127+
128+
A tag must appear in at least `min_count` sessions to be included —
129+
filters out one-off stragglers. Returns at most `max_topics` tags,
130+
ordered by frequency descending.
131+
"""
132+
counts: Counter[str] = Counter()
133+
for meta in session_metas:
134+
raw = meta.get("tags")
135+
if isinstance(raw, list):
136+
for t in raw:
137+
tag = str(t).strip().lower()
138+
if tag and tag not in _NOISE_TAGS:
139+
counts[tag] += 1
140+
elif isinstance(raw, str) and raw:
141+
for t in raw.strip("[]").split(","):
142+
tag = t.strip().lower()
143+
if tag and tag not in _NOISE_TAGS:
144+
counts[tag] += 1
145+
filtered = [(tag, c) for tag, c in counts.items() if c >= min_count]
146+
filtered.sort(key=lambda kv: (-kv[1], kv[0]))
147+
return [tag for tag, _ in filtered[:max_topics]]
148+
149+
150+
def get_project_topics(
151+
projects_dir: Path,
152+
project_slug: str,
153+
session_metas: Iterable[Mapping[str, Any]],
154+
) -> list[str]:
155+
"""Return the topic list for a project using the precedence rules
156+
above: explicit profile first, session-tag fallback second."""
157+
profile = load_project_profile(projects_dir, project_slug)
158+
if profile and profile.get("topics"):
159+
return profile["topics"]
160+
return extract_session_topics(session_metas)
161+
162+
163+
# ─── render ──────────────────────────────────────────────────────────────
164+
165+
166+
import html # noqa: E402 — deliberately after the typed definitions
167+
168+
169+
def render_topic_chips(
170+
topics: list[str],
171+
max_visible: int = 6,
172+
classname: str = "project-topics",
173+
) -> str:
174+
"""Render a list of topics as a row of chip elements. Empty list
175+
returns an empty string. Overflow is collapsed into a `+N more`
176+
chip so the row stays one line on narrow cards."""
177+
if not topics:
178+
return ""
179+
visible = topics[:max_visible]
180+
hidden = len(topics) - len(visible)
181+
chip_html = "".join(
182+
f'<span class="topic-chip">{html.escape(t)}</span>'
183+
for t in visible
184+
)
185+
overflow = (
186+
f'<span class="topic-chip topic-chip-more">+{hidden} more</span>'
187+
if hidden > 0 else ""
188+
)
189+
return f'<div class="{html.escape(classname)}">{chip_html}{overflow}</div>'
190+
191+
192+
def render_topic_chips_linked(
193+
topics: list[str],
194+
href_template: str = "../projects/index.html?topic={topic}",
195+
max_visible: int = 6,
196+
classname: str = "project-topics",
197+
) -> str:
198+
"""Same as `render_topic_chips` but wraps each chip in an `<a>` so
199+
clicking a topic can navigate to a filter view. The href is
200+
rendered via `href_template.format(topic=...)` with URL escaping."""
201+
if not topics:
202+
return ""
203+
import urllib.parse
204+
visible = topics[:max_visible]
205+
hidden = len(topics) - len(visible)
206+
chip_html = "".join(
207+
f'<a class="topic-chip" href="{html.escape(href_template.format(topic=urllib.parse.quote(t)))}">'
208+
f'{html.escape(t)}</a>'
209+
for t in visible
210+
)
211+
overflow = (
212+
f'<span class="topic-chip topic-chip-more">+{hidden} more</span>'
213+
if hidden > 0 else ""
214+
)
215+
return f'<div class="{html.escape(classname)}">{chip_html}{overflow}</div>'
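
The linked variant applies two layers of escaping to each href: percent-encoding for the topic value, then HTML escaping for the attribute. The sketch below isolates that step; `chip_href` is a hypothetical helper written for illustration, using the same default template as `render_topic_chips_linked`.

```python
import html
import urllib.parse


def chip_href(topic: str,
              href_template: str = "../projects/index.html?topic={topic}") -> str:
    """Percent-encode the topic, fill the template, then HTML-escape the result."""
    quoted = urllib.parse.quote(topic)
    return html.escape(href_template.format(topic=quoted))


print(chip_href("c++"))
# ../projects/index.html?topic=c%2B%2B
print(chip_href("machine learning"))
# ../projects/index.html?topic=machine%20learning
```

Encoding `+` as `%2B` matters in a query string, where a literal `+` is commonly decoded as a space. Note that `urllib.parse.quote` leaves `/` unencoded by default; passing `safe=""` would encode it too, if topic names could ever contain slashes.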
