millionco · aidenybai · Jun 19, 2026 · Jun 19, 2026 · Jun 19, 2026 · Jun 19, 2026
diff --git a/.agents/skills/react-doctor/SKILL.md b/.agents/skills/react-doctor/SKILL.md
@@ -1,40 +1,95 @@
 ---
 name: react-doctor
-description: Use when finishing a feature, fixing a bug, before committing React code, or when the user types `/doctor`, asks to scan, triage, or clean up React diagnostics. Covers lint, accessibility, bundle size, architecture. Includes a regression check and a full local-triage workflow that fetches the canonical playbook.
-version: "1.2.0"
+description: Use when writing, finishing, or committing React or React Native code, when the user types `/react-doctor`, or when they ask to scan, triage, lint, profile performance, debug a UI in the browser, or review design and accessibility. Covers lint, accessibility, performance, bundle size, and architecture.
+version: "1.6.0"
 ---
 
 # React Doctor
 
-Scans React codebases for security, performance, correctness, and architecture issues. Outputs a 0–100 health score.
+One skill that makes your agent good at React. It writes better React by default, checks your changes in the background, and opens a real browser to profile performance, reproduce bugs, and review design.
 
-## After making React code changes:
+## Baseline rules (always on)
 
-Run `npx react-doctor@latest --verbose --scope changed` and check the score did not regress.
+Apply these on every React edit, before any tool runs. They shape how you write code, not only what you flag:
 
-If the score dropped, fix the regressions before committing.
+1. Derive state during render, don't duplicate it in another `useState`.
+2. Skip effects for values you can compute while rendering and for logic that belongs in an event handler.
+3. Compose components instead of piling on boolean props.
+4. Lift state only as far as it needs to go, no higher.
+5. Keep one source of truth for each piece of state.
+6. Render without side effects; keep the render pass pure.
+7. Use stable keys in lists, never the array index.
+8. Fetch independent data in parallel, not in a waterfall.
+9. Skip manual `useMemo`, `useCallback`, and `memo`; let the React Compiler handle it.
+10. Handle the loading, error, and empty states, not only the happy path.
 
-## For general cleanup or code improvement:
+## Routing
 
-Run `npx react-doctor@latest --verbose` (the default `--scope full`) to scan the full codebase. Fix issues by severity — errors first, then warnings.
+`/react-doctor` picks the job from what you're doing. Name a job (`/react-doctor perf`) to force it. When the request is genuinely unclear, ask which one rather than guessing.
 
-## /doctor — full local triage workflow
+| Signal                                                  | Job        | What it does                    |
+| ------------------------------------------------------- | ---------- | ------------------------------- |
+| "review", "before commit", "clean up", or changed files | **doctor** | static scan plus 0 to 100 score |
+| "slow", "laggy", "janky", "re-rendering"                | **perf**   | React render + CPU profilers    |
+| "broken", "crashes", "doesn't work" in the UI           | **debug**  | reproduce in a real browser     |
+| "looks off", "polish", a screenshot or pasted element   | **design** | measured UI review              |
 
-When the user types `/doctor`, says "run react doctor", or asks for a full triage / cleanup pass (not just a regression check), fetch the canonical local-triage playbook and follow every step in it:
+doctor runs from code alone, so it is the one that fires in the background. The browser jobs (perf, debug, design) need a live page and are slower, so they run only when asked.
+
+## Which browser to drive
+
+debug, design, and perf need a real Chrome. Two ways to get one:
+
+1. **A browser MCP already in your tools.** Prefer [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) (`chrome-devtools`) or similar for console, network, and snapshots. It adds full performance traces and Lighthouse on top.
+2. **The bundled `react-doctor browser` command.** Attaches to the Chrome you already have open over the Chrome DevTools Protocol, and launches a dedicated persistent one only as a fallback. It covers `open`, `eval`, `snapshot`, `screenshot`, `console`, `network`, an axe-core `audit`, `perf` (long animation frames with per-script attribution), `profile` (one recording with both lenses — a React render profile of slowest commits, hottest components, and unnecessary re-renders, plus a V8/DevTools CPU profile over CDP with the hottest JS functions ranked by self time), and `report` (every signal in one load).
+
+It is the same Chrome either way, so the playbooks apply to both: `browser open`, `eval`, `snapshot`, and `screenshot` map onto the MCP's `navigate_page`, `evaluate_script`, `take_snapshot`, and `take_screenshot`.
+
+## Run as an MCP server
+
+React Doctor ships its own Model Context Protocol server over stdio so any MCP-capable agent can call the jobs directly:
+
+```bash
+npx react-doctor@latest mcp
+```
+
+It exposes `doctor_scan` (the static scan), the `browser_*` tools (`browser_open`, `browser_eval`, `browser_snapshot`, `browser_screenshot`, `browser_audit`, `browser_console`, `browser_network`, `browser_perf`, `browser_profile`, `browser_report`), and the `debug_*` log server (`debug_serve`, `debug_read_logs`, `debug_clear_logs`). `browser_profile` records both a React render profile and a literal Chrome DevTools CPU profile in one pass.
+
+## doctor: scan and triage
+
+After making React changes, run a regression check and confirm the score did not drop:
+
+```bash
+npx react-doctor@latest --verbose --scope changed
+```
+
+If the score dropped, fix the regressions before committing. For a cleanup of the whole codebase, drop `--scope changed` (the default is `--scope full`) and fix by severity: errors first, then warnings.
+
+When the user types `/react-doctor`, `/doctor`, says "run react doctor", or asks for a full triage or cleanup pass (not a regression check), fetch the canonical local-triage playbook and follow every step in it:
 
 ```bash
 curl --fail --silent --show-error \
   --header 'Cache-Control: no-cache' \
   https://www.react.doctor/prompts/react-doctor-agent.md
 ```
 
-The playbook is the single source of truth — a scan → filter → triage → fix → validate loop that edits the working tree directly (never commits, never opens PRs). Updating the prompt at its source updates every agent on its next fetch — no skill reinstall needed.
+The playbook is the single source of truth: a scan, filter, triage, fix, validate loop that edits the working tree directly and never commits or opens PRs. Updating the prompt at its source updates every agent on its next fetch, no reinstall needed. Pair it with the per-rule prompts at `https://www.react.doctor/prompts/rules/<plugin>/<rule>.md` (fetched on demand inside the playbook) so each fix uses the reviewer-tested recipe.
+
+## perf: profile performance
+
+When the user reports jank, slow interactions, dropped frames, excessive re-renders, or asks to profile or optimize render performance, read [references/performance.md](references/performance.md) and follow it. It runs an evidence-driven profile, analyze, fix, re-profile loop against the real React DevTools profiler export, never guessing from code alone.
+
+## debug: reproduce in a real browser
+
+When the user says something is broken, crashes, throws, or behaves wrong in the running app, read [references/debug.md](references/debug.md) and follow it. It runs the [debug-agent](https://github.com/millionco/debug-agent) loop: generate hypotheses, instrument the code with runtime NDJSON logs, reproduce the bug in the live browser, and fix only once the logs prove the cause.
+
+## design: review and improve UI
 
-Pair it with the matching per-rule prompts at `https://www.react.doctor/prompts/rules/<plugin>/<rule>.md` (fetched on demand inside the playbook) so each fix uses the canonical, reviewer-tested recipe.
+When the user wants to build, polish, or review an interface ("looks off", "make this nicer", a pasted screenshot or element), read [references/design.md](references/design.md) and follow it. It opens the page, takes a screenshot, and reports what it can measure (contrast, line length, spacing, tap-target size), not only taste.
 
 ## Configuring or explaining rules
 
-When the user wants to understand a rule, disagrees with one, or wants to disable / tune which rules run (not fix code), read [references/explain.md](references/explain.md) and follow it. Start with `npx react-doctor@latest rules explain <rule>`, then apply the narrowest control via `npx react-doctor@latest rules disable|set|category|ignore-tag …`, which edits your `doctor.config.*` (or `package.json#reactDoctor`).
+When the user wants to understand a rule, disagrees with one, or wants to disable or tune which rules run (not fix code), read [references/explain.md](references/explain.md) and follow it. Start with `npx react-doctor@latest rules explain <rule>`, then apply the narrowest control via `npx react-doctor@latest rules disable|set|category|ignore-tag …`.
 
 ## Command
 

diff --git a/.agents/skills/react-doctor/references/debug.md b/.agents/skills/react-doctor/references/debug.md
@@ -0,0 +1,87 @@
+# Debugging with runtime evidence
+
+Reproduce and fix UI bugs with runtime evidence, never by guessing from code alone. Use this when the user says something is broken, crashes, throws, hangs, or behaves wrong in the running app.
+
+This is the [debug-agent](https://github.com/millionco/debug-agent) loop, built into React Doctor: hypothesize, instrument with logs, reproduce, analyze the logs, fix only once the logs prove the cause, verify, clean up.
+
+## 0. Start the logging server (before any instrumentation)
+
+The server is long-running. Start it once and keep it up for the whole session. `--daemon` prints the server info and returns, leaving the server running in the background:
+
+```bash
+npx react-doctor debug serve --daemon
+```
+
+It prints one JSON line. Capture and remember:
+
+- `endpoint`: POST your logs here from JS or TS at runtime
+- `logPath`: the NDJSON log file you read after each run
+- `sessionId`: include it in every log payload
+
+The server is idempotent: a second start returns the running server's info. If it fails to start, stop and tell the user. Do not instrument without it.
+
+## 1. Generate hypotheses
+
+Write 3 to 5 precise hypotheses about why the bug happens: a thrown error in a specific component, a failed or duplicated request, a null or undefined access, a state update after unmount, a missing loading or error branch. Aim for more, not fewer. Each hypothesis gets an id (A, B, C, …).
+
+## 2. Instrument the code
+
+Add 2 to 6 logs (never more than 10) at the points that confirm or reject each hypothesis: function entry and exit, values before and after a critical operation, which branch ran. In JS or TS, POST to the server `endpoint`:
+
+```js
+// #region debug log
+fetch("ENDPOINT", {
+  method: "POST",
+  headers: { "Content-Type": "application/json" },
+  body: JSON.stringify({
+    sessionId: "SESSION_ID",
+    hypothesisId: "A",
+    location: "cart.tsx:42",
+    message: "cart total before render",
+    data: { total },
+    timestamp: Date.now(),
+  }),
+}).catch(() => {});
+// #endregion
+```
+
+Wrap every debug log in `// #region debug log` and `// #endregion` so cleanup later is deterministic. Each log maps to at least one `hypothesisId`. Never log secrets or PII.
+
+## 3. Reproduce
+
+Clear the log file (`DELETE` the file at `logPath`) before each run, then trigger the exact behavior the user described:
+
+- **Browser bugs:** drive the repro with whatever controls a live Chrome. The bundled browser core attaches to the Chrome you already have open over the Chrome DevTools Protocol, so the real session, logins, and cookies come along. If nothing debuggable is running, it launches a dedicated persistent Chrome (its own profile) that later commands reattach to, so the flow below works either way. To drive your real logged-in session, open Chrome with `--remote-debugging-port=9222` first and it attaches to that instead. `browser console` and `browser network` hand you the runtime console (with uncaught errors) and the request waterfall with failures flagged, often the evidence you need before instrumenting at all. To get the whole picture in one pass, `browser report` captures console, network, performance, and accessibility in a single page load instead of reloading once per command; prefer it over running the four separately. If [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) (`chrome-devtools`) is in your tools, it also covers this and adds performance traces and Lighthouse.
+
+```bash
+npx react-doctor browser open http://localhost:3000           # attach + open the page
+npx react-doctor browser report http://localhost:3000         # console + network + perf + a11y in one load
+npx react-doctor browser console http://localhost:3000        # console output + uncaught errors
+npx react-doctor browser network http://localhost:3000        # request waterfall, failures flagged
+npx react-doctor browser snapshot                             # what rendered, by role + name
+npx react-doctor browser eval 'page.getByRole("button", { name: "Checkout" }).click()'
+npx react-doctor browser eval 'page.evaluate(() => document.title)'   # raw DOM when you need it
+```
+
+`snapshot` and `eval` are a pair. `snapshot` lists the rendered elements by role and accessible name. `eval` runs an expression with the Playwright `page` in scope, so you act on what you saw using Playwright's own selectors: `page.locator("text=Login").click()`, `page.getByRole(...)`, `page.fill(...)`, `page.waitForSelector(...)`. For raw DOM, reach through `page.evaluate(() => …)`. No separate ref scheme to track.
+
+- **Backend or CLI bugs:** write and run a small repro script (Node, shell) yourself.
+- Otherwise ask the user for numbered steps, and remind them to restart any app or service whose instrumented files are bundled or cached.
+
+Reuse the same repro pathway for every iteration.
+
+## 4. Analyze the logs
+
+Read the NDJSON at `logPath`. Mark each hypothesis CONFIRMED, REJECTED, or INCONCLUSIVE, citing the specific log lines. If the file is empty, the repro likely did not run the instrumented path, so try again. If every hypothesis is rejected, revert the rejected code changes, generate new hypotheses from a different subsystem, and add more instrumentation.
+
+## 5. Fix, only with proof
+
+Apply the smallest change that addresses the proven cause. Cross-check it against the baseline rules in `SKILL.md` (derive don't duplicate, effects, single source of truth). Do not remove the instrumentation yet. Never use `setTimeout` or `sleep` as a fix.
+
+## 6. Verify
+
+Clear the log file, re-run the same reproduction (tag the logs `runId:"post-fix"` if helpful), and compare before and after with cited lines. Re-run a couple of times to rule out races. No fix is confirmed without log proof.
+
+## 7. Clean up
+
+Once verified, search every file for `#region debug log`, delete each block through its `#endregion`, grep again to confirm none remain, and `git diff` to confirm only the intentional fix is left.
diff --git a/.agents/skills/react-doctor/references/design.md b/.agents/skills/react-doctor/references/design.md
@@ -0,0 +1,52 @@
+# Reviewing and improving UI
+
+Improve interfaces with measured evidence from the rendered page, not taste alone. Use this when the user wants to build, polish, or review a UI: "looks off", "make this nicer", or a pasted screenshot.
+
+The value here is what a screenshot and the live DOM let you measure that reading code cannot: contrast ratios, line length, the spacing scale, and tap-target size. Lead with those, then apply craft.
+
+## Review against the live page
+
+```bash
+npx react-doctor browser open http://localhost:3000
+npx react-doctor browser screenshot --out review.png   # what the user actually sees
+npx react-doctor browser audit                          # axe-core: contrast, names, landmarks
+```
+
+Review responsive breakpoints with `--viewport WIDTHxHEIGHT` (for example `--viewport 390x844` for a phone) on `screenshot`, `snapshot`, `audit`, or `perf`. It emulates the size for that one command via a CDP override, so it never resizes your real browser window:
+
+```bash
+npx react-doctor browser screenshot --viewport 390x844 --out mobile.png
+```
+
+Look at the screenshot, then measure specifics with `eval` (computed styles, bounding boxes, color values) to get objective numbers rather than opinions:
+
+```bash
+npx react-doctor browser eval 'page.evaluate(() => getComputedStyle(document.querySelector("button")).fontSize)'
+```
+
+`browser audit` runs axe-core against the live page and reports accessibility violations (color contrast, missing button or SVG names, heading order, landmarks) with the failing selectors. If [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) (`chrome-devtools`) is in your tools, its `lighthouse_audit` adds performance and best-practice findings on top. Lead with the measured issues; a smarter model cannot dismiss them as opinion.
+
+## What to check
+
+Measured, in priority order:
+
+1. **Contrast**: body text at least 4.5:1, large text at least 3:1. Report the actual ratio.
+2. **Tap targets**: interactive elements at least 24 × 24 px (ideally 44 × 44 on touch).
+3. **Line length**: body copy roughly 45 to 75 characters per line.
+4. **Spacing**: spacing values come from one consistent scale, not ad-hoc px.
+
+Then craft, drawing on the bundled design rules:
+
+5. **Type**: one clear hierarchy; avoid default system-only stacks for brand surfaces; consistent line-height.
+6. **Color**: a committed palette, not arbitrary hexes; check both light and dark.
+7. **Layout**: alignment, rhythm, and a deliberate focal point.
+8. **State**: hover, focus-visible, disabled, loading, and empty states exist.
+
+## The loop
+
+Build or fix, screenshot, re-audit, compare. Confirm the measured issue you targeted actually moved (the ratio crossed the threshold, the target grew) and that the screenshot looks right before and after.
+
+## Working rules
+
+- Always look at the screenshot; do not review UI from JSX alone.
+- Report measured findings with their numbers; keep taste suggestions short and clearly separate from the measured ones.