feat(react): /react-doctor umbrella skill + in-house browser core + debug-agent debug job #853
Open: aidenybai wants to merge 10 commits into `main` from `feat/react-skill`.
Commits:

- a7d5a69 feat(react): /react-doctor umbrella skill, in-house browser core, and… (aidenybai)
- 5b1dc39 feat(react): add `react-doctor mcp` Model Context Protocol server (aidenybai)
- a3484a9 refactor(mcp): collapse the read-only browser tools into a registrati… (aidenybai)
- 29178ba fix(mcp): harden debug fetch + clear viewport override (thermos review) (aidenybai)
- bbf744e refactor(cli): reuse the shared Viewport type for --viewport (aidenybai)
- 068f3b2 fix(mcp): allowlist debug endpoints + guard non-loopback bind (thermo… (aidenybai)
- 7665dad refactor(browser): encapsulate playwright/axe laziness in the package (aidenybai)
- c51eed3 feat(browser): add combined React + CPU profiler (aidenybai)
- a7c4d4a fix(cli): allowlist browser profile's --interaction flag (bugbot) (aidenybai)
- c4e5658 refactor: drop comments that restate names or duplicate doc (aidenybai)

# Debugging with runtime evidence

Reproduce and fix UI bugs with runtime evidence, never by guessing from code alone. Use this when the user says something is broken, crashes, throws, hangs, or behaves wrong in the running app.

This is the [debug-agent](https://github.com/millionco/debug-agent) loop, built into React Doctor: hypothesize, instrument with logs, reproduce, analyze the logs, fix only once the logs prove the cause, verify, clean up.

## 0. Start the logging server (before any instrumentation)

The server is long-running. Start it once and keep it up for the whole session. `--daemon` prints the server info and returns, leaving the server running in the background:

```bash
npx react-doctor debug serve --daemon
```

It prints one JSON line. Capture and remember:

- `endpoint`: POST your logs here from JS or TS at runtime
- `logPath`: the NDJSON log file you read after each run
- `sessionId`: include it in every log payload

The server is idempotent: a second start returns the running server's info. If it fails to start, stop and tell the user. Do not instrument without it.

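If you script the session rather than reading the output by eye, a small parser keeps the three values together and fails loudly when one is missing, matching the rule above. This is a sketch, not part of React Doctor: `parseDebugServerInfo` is a hypothetical helper, and it assumes the printed line is a JSON object with string `endpoint`, `logPath`, and `sessionId` fields (anything else is ignored).

```javascript
// Parse the one JSON line printed by `react-doctor debug serve --daemon`.
// Assumption: an object with string `endpoint`, `logPath`, and `sessionId` fields.
function parseDebugServerInfo(line) {
  let info;
  try {
    info = JSON.parse(line);
  } catch (error) {
    throw new Error(`debug serve did not print valid JSON: ${error.message}`);
  }
  if (info === null || typeof info !== "object") {
    throw new Error("debug serve output is not a JSON object");
  }
  for (const key of ["endpoint", "logPath", "sessionId"]) {
    if (typeof info[key] !== "string" || info[key] === "") {
      // Missing server info: stop and tell the user instead of instrumenting blind.
      throw new Error(`debug serve output is missing "${key}"`);
    }
  }
  return { endpoint: info.endpoint, logPath: info.logPath, sessionId: info.sessionId };
}
```

Feed it the captured stdout line and keep the returned object for the rest of the session.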
## 1. Generate hypotheses

Write 3 to 5 precise hypotheses about why the bug happens: a thrown error in a specific component, a failed or duplicated request, a null or undefined access, a state update after unmount, a missing loading or error branch. Within that range, aim for more rather than fewer. Each hypothesis gets an id (A, B, C, …).

## 2. Instrument the code

Add 2 to 6 logs (never more than 10) at the points that confirm or reject each hypothesis: function entry and exit, values before and after a critical operation, which branch ran. In JS or TS, POST to the server `endpoint`:

```js
// #region debug log
// Replace ENDPOINT and SESSION_ID with the values printed by `debug serve`.
fetch("ENDPOINT", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    sessionId: "SESSION_ID",
    hypothesisId: "A", // the hypothesis this log confirms or rejects
    location: "cart.tsx:42", // file:line of this log
    message: "cart total before render",
    data: { total }, // the values under test
    timestamp: Date.now(),
  }),
}).catch(() => {}); // a failed log must never break the app
// #endregion
```

Wrap every debug log in `// #region debug log` and `// #endregion` so cleanup later is deterministic. Each log maps to at least one `hypothesisId`. Never log secrets or PII.

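When one hypothesis needs several logs, a tiny wrapper avoids repeating the payload. The sketch below is hypothetical (`createDebugLogger` is not a React Doctor API): it sends the same payload shape as the hand-written `fetch` above, swallows failures the same way, and takes an optional `fetchImpl` so it can be exercised without a server. Keep the helper and every call site inside `#region debug log` blocks so cleanup still removes them.

```javascript
// #region debug log
// Hypothetical helper: posts payloads in the same shape as the hand-written fetch above.
function createDebugLogger({ endpoint, sessionId, fetchImpl = globalThis.fetch }) {
  return function debugLog(hypothesisId, location, message, data = {}) {
    const payload = { sessionId, hypothesisId, location, message, data, timestamp: Date.now() };
    try {
      // Start the request synchronously, then swallow async failures.
      const pending = fetchImpl(endpoint, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      });
      return Promise.resolve(pending).catch(() => {});
    } catch {
      // A synchronous throw (for example, no fetch available) must not break the app either.
      return Promise.resolve();
    }
  };
}
// #endregion

// #region debug log
const debugLog = createDebugLogger({ endpoint: "ENDPOINT", sessionId: "SESSION_ID" });
// #endregion
```

A call site then reads `debugLog("A", "cart.tsx:42", "cart total before render", { total });`, wrapped in its own region like any other debug log.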
## 3. Reproduce

Clear the log file (`DELETE` the file at `logPath`) before each run, then trigger the exact behavior the user described:

- **Browser bugs:** drive the repro with any tool that controls a live Chrome. The bundled browser core attaches over the Chrome DevTools Protocol to a debuggable Chrome you already have open, so your real session, logins, and cookies come along. If nothing debuggable is running, it launches a dedicated persistent Chrome (with its own profile) that later commands reattach to, so the flow below works either way. To drive your real logged-in session, start Chrome with `--remote-debugging-port=9222` first; the core then attaches to that instance instead. `browser console` and `browser network` give you the runtime console (including uncaught errors) and the request waterfall with failures flagged, which is often the evidence you need before instrumenting at all. To get the whole picture in one pass, `browser report` captures console, network, performance, and accessibility in a single page load instead of reloading once per command; prefer it over running those four separately. If [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) (`chrome-devtools`) is in your tools, it covers the same ground and adds performance traces and Lighthouse.

```bash
npx react-doctor browser open http://localhost:3000     # attach + open the page
npx react-doctor browser report http://localhost:3000   # console + network + perf + a11y in one load
npx react-doctor browser console http://localhost:3000  # console output + uncaught errors
npx react-doctor browser network http://localhost:3000  # request waterfall, failures flagged
npx react-doctor browser snapshot                       # what rendered, by role + name
npx react-doctor browser eval 'page.getByRole("button", { name: "Checkout" }).click()'
npx react-doctor browser eval 'page.evaluate(() => document.title)'  # raw DOM when you need it
```

`snapshot` and `eval` are a pair. `snapshot` lists the rendered elements by role and accessible name. `eval` runs an expression with the Playwright `page` in scope, so you act on what you saw using Playwright's own selectors: `page.locator("text=Login").click()`, `page.getByRole(...)`, `page.fill(...)`, `page.waitForSelector(...)`. For raw DOM, reach through `page.evaluate(() => …)`. There is no separate ref scheme to track.

- **Backend or CLI bugs:** write and run a small repro script (Node, shell) yourself.
- Otherwise, ask the user for numbered steps, and remind them to restart any app or service whose instrumented files are bundled or cached.

Reuse the same repro pathway for every iteration.

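Clearing the log before each run is the step that is easiest to forget. A minimal sketch, assuming you drive the loop from a Node script and that, as this guide implies, deleting the file is enough for the server to start a fresh one on its next write; `clearDebugLog` is a hypothetical name.

```javascript
const fs = require("node:fs");

// Delete the NDJSON log so the next run starts from an empty file.
// Returns true if there was a log to delete.
function clearDebugLog(logPath) {
  const existed = fs.existsSync(logPath);
  fs.rmSync(logPath, { force: true }); // force: no error if the file is already gone
  if (fs.existsSync(logPath)) {
    throw new Error(`could not clear ${logPath}`);
  }
  return existed;
}
```

Call it with the `logPath` from step 0 immediately before each reproduction, including the post-fix runs in step 6.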
## 4. Analyze the logs

Read the NDJSON at `logPath`. Mark each hypothesis CONFIRMED, REJECTED, or INCONCLUSIVE, citing the specific log lines. If the file is empty, the repro likely did not run the instrumented path, so try again. If every hypothesis is rejected, revert any code changes made for the rejected hypotheses, generate new hypotheses from a different subsystem, and add more instrumentation.

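To cite specific lines, it helps to parse the log and group entries by hypothesis before judging each one. This sketch assumes each NDJSON line is one payload in the shape step 2 posts (`hypothesisId`, `location`, `message`, `data`); the function names are illustrative. An empty result corresponds to the empty-file case above.

```javascript
const fs = require("node:fs");

// Group NDJSON log entries by hypothesisId, keeping 1-based line numbers for citations.
function groupLogByHypothesis(ndjson) {
  const groups = new Map();
  ndjson.split("\n").forEach((raw, index) => {
    const text = raw.trim();
    if (text === "") return; // blank lines, including the trailing newline
    let entry;
    try {
      entry = JSON.parse(text);
    } catch {
      return; // not valid JSON: skip it rather than abort the whole analysis
    }
    if (entry === null || typeof entry !== "object") return;
    const id = entry.hypothesisId ?? "(untagged)";
    if (!groups.has(id)) groups.set(id, []);
    groups.get(id).push({ line: index + 1, location: entry.location, message: entry.message, data: entry.data });
  });
  return groups;
}

// A missing or empty file yields an empty Map: the instrumented path probably did not run.
function readLogByHypothesis(logPath) {
  if (!fs.existsSync(logPath)) return new Map();
  return groupLogByHypothesis(fs.readFileSync(logPath, "utf8"));
}
```

Print each group with its line numbers, then mark the hypothesis using those lines as the citation.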
## 5. Fix, only with proof

Apply the smallest change that addresses the proven cause. Cross-check it against the baseline rules in `SKILL.md` (derive, don't duplicate; effects; single source of truth). Do not remove the instrumentation yet. Never use `setTimeout` or `sleep` as a fix.

## 6. Verify

Clear the log file, re-run the same reproduction (tag the logs `runId: "post-fix"` if helpful), and compare before and after with cited lines. Re-run a couple of times to rule out races. No fix is confirmed without log proof.

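Clearing the log discards the baseline, so keep a copy of the entries you cited in step 4 before you clear it. With those saved and the new run tagged `runId: "post-fix"`, a sketch like the one below lines up before and after values per `location`. It treats any entry without that `runId` as baseline, which is an assumption of this sketch rather than something the server enforces; `compareRunsByLocation` is a hypothetical name.

```javascript
// Line up baseline and post-fix entries per `location` for a before/after comparison.
// Entries whose runId differs from `afterRunId` (or is missing) count as baseline.
function compareRunsByLocation(entries, afterRunId = "post-fix") {
  const byLocation = new Map();
  for (const entry of entries) {
    const side = entry.runId === afterRunId ? "after" : "before";
    if (!byLocation.has(entry.location)) {
      byLocation.set(entry.location, { before: [], after: [] });
    }
    byLocation.get(entry.location)[side].push(entry.data);
  }
  return byLocation;
}
```

A location whose `after` values still show the bad state, on any of the repeated runs, means the fix is not proven.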
## 7. Clean up

Once verified, search every file for `#region debug log`, delete each block through its `#endregion`, grep again to confirm none remain, and `git diff` to confirm only the intentional fix is left.

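The marker convention from step 2 makes cleanup mechanical. A sketch of the removal pass, assuming regions are never nested and each one is written exactly as `// #region debug log` … `// #endregion` on its own line (JSX `{/* */}` comments are not handled); `stripDebugRegions` is a hypothetical helper you would run over each file your search turns up.

```javascript
// Remove every `// #region debug log` ... `// #endregion` block from a source string.
// Reports how many blocks were removed, whether a region was left unclosed,
// and whether any differently written debug marker survived.
function stripDebugRegions(source) {
  const kept = [];
  let inRegion = false;
  let removed = 0;
  for (const line of source.split("\n")) {
    const trimmed = line.trim();
    if (!inRegion && trimmed === "// #region debug log") {
      inRegion = true;
      removed += 1;
      continue;
    }
    if (inRegion) {
      if (trimmed === "// #endregion") inRegion = false; // first #endregion closes the block
      continue;
    }
    kept.push(line);
  }
  const output = kept.join("\n");
  return { output, removed, unclosed: inRegion, leftover: output.includes("#region debug log") };
}
```

Treat `unclosed` or `leftover` as a reason to fix the file by hand, then grep and `git diff` as described above.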
# Reviewing and improving UI

Improve interfaces with measured evidence from the rendered page, not taste alone. Use this when the user wants to build, polish, or review a UI: "looks off", "make this nicer", or a pasted screenshot.

The value here is what a screenshot and the live DOM let you measure that reading code cannot: contrast ratios, line length, the spacing scale, and tap-target size. Lead with those, then apply craft.

## Review against the live page

```bash
npx react-doctor browser open http://localhost:3000
npx react-doctor browser screenshot --out review.png  # what the user actually sees
npx react-doctor browser audit                        # axe-core: contrast, names, landmarks
```

Review responsive breakpoints with `--viewport WIDTHxHEIGHT` (for example `--viewport 390x844` for a phone) on `screenshot`, `snapshot`, `audit`, or `perf`. It emulates the size for that one command via a CDP override, so it never resizes your real browser window:

```bash
npx react-doctor browser screenshot --viewport 390x844 --out mobile.png
```

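To sweep several breakpoints in one go, you can generate one `screenshot` command per viewport using only the documented `--viewport` and `--out` flags. The tablet and desktop sizes below are example values, not recommendations from this guide, and `parseViewport` is a hypothetical check that mirrors the `WIDTHxHEIGHT` format.

```javascript
// Example breakpoints (assumed values; pick the ones your design targets).
const BREAKPOINTS = [
  { name: "phone", viewport: "390x844" },
  { name: "tablet", viewport: "768x1024" },
  { name: "desktop", viewport: "1440x900" },
];

// Validate a WIDTHxHEIGHT string such as "390x844".
function parseViewport(value) {
  const match = /^(\d+)x(\d+)$/.exec(value.trim());
  if (!match) throw new Error(`expected WIDTHxHEIGHT, got "${value}"`);
  const width = Number(match[1]);
  const height = Number(match[2]);
  if (width === 0 || height === 0) throw new Error(`viewport must be non-zero, got "${value}"`);
  return { width, height };
}

// One screenshot command per breakpoint, each writing <name>.png.
function screenshotCommands(breakpoints) {
  return breakpoints.map(({ name, viewport }) => {
    parseViewport(viewport); // fail early on a malformed size
    return `npx react-doctor browser screenshot --viewport ${viewport} --out ${name}.png`;
  });
}

console.log(screenshotCommands(BREAKPOINTS).join("\n"));
```

Because each command emulates its own size, the screenshots can be taken back to back without touching the real window.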
Look at the screenshot, then measure specifics with `eval` (computed styles, bounding boxes, color values) to get objective numbers rather than opinions:

```bash
npx react-doctor browser eval 'page.evaluate(() => getComputedStyle(document.querySelector("button")).fontSize)'
```

`browser audit` runs axe-core against the live page and reports accessibility violations (color contrast, missing button or SVG names, heading order, landmarks) with the failing selectors. If [Chrome DevTools MCP](https://github.com/ChromeDevTools/chrome-devtools-mcp) (`chrome-devtools`) is in your tools, its `lighthouse_audit` adds performance and best-practice findings on top. Lead with the measured issues: unlike taste, they cannot be dismissed as opinion, even by a stronger model reviewing your work.

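If you have a raw axe-core results object (for example from running axe yourself; the output format of `browser audit` is not shown in this guide), a short summary sorted by impact keeps the measured findings at the top of the review. The sketch assumes axe-core's standard result shape, `violations[]` with `id`, `impact`, `help`, and `nodes[].target`; `summarizeAxeViolations` is a hypothetical name.

```javascript
// Severity order used by axe-core's `impact` field, most severe first.
const IMPACT_ORDER = ["critical", "serious", "moderate", "minor"];

function impactRank(impact) {
  const index = IMPACT_ORDER.indexOf(impact);
  return index === -1 ? IMPACT_ORDER.length : index; // unknown or null impact sorts last
}

// One line per violated rule, most severe first, with the failing selectors.
function summarizeAxeViolations(results) {
  return [...results.violations]
    .sort((a, b) => impactRank(a.impact) - impactRank(b.impact))
    .map((violation) => {
      // Each node's `target` is a selector path; join its parts with spaces.
      const selectors = violation.nodes.map((node) => node.target.join(" "));
      return `${violation.id} (${violation.impact ?? "unknown"}): ${violation.help} ` +
        `[${selectors.length} node(s): ${selectors.join(", ")}]`;
    });
}
```

Each line already names the rule and the selectors, so it can be quoted directly as a measured finding.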
## What to check

Measured, in priority order:

1. **Contrast**: body text at least 4.5:1, large text at least 3:1. Report the actual ratio.
2. **Tap targets**: interactive elements at least 24 × 24 px (ideally 44 × 44 on touch).
3. **Line length**: body copy roughly 45 to 75 characters per line.
4. **Spacing**: spacing values come from one consistent scale, not ad-hoc px.

Then craft, drawing on the bundled design rules:

5. **Type**: one clear hierarchy; avoid default system-only stacks for brand surfaces; consistent line-height.
6. **Color**: a committed palette, not arbitrary hexes; check both light and dark.
7. **Layout**: alignment, rhythm, and a deliberate focal point.
8. **State**: hover, focus-visible, disabled, loading, and empty states exist.

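The first two measured checks reduce to arithmetic once `eval` has given you computed colors and bounding boxes. This sketch uses the WCAG 2.x relative-luminance and contrast-ratio formulas with the thresholds listed above (4.5:1 body text, 3:1 large text, 24 px minimum target); the helper names are illustrative, and it rejects translucent colors because those must be composited against their background first.

```javascript
// Parse getComputedStyle-style colors such as "rgb(118, 118, 118)" into [r, g, b].
function parseCssRgb(value) {
  const match = /^rgba?\(\s*([\d.]+)\s*,\s*([\d.]+)\s*,\s*([\d.]+)\s*(?:,\s*([\d.]+)\s*)?\)$/.exec(value.trim());
  if (!match) throw new Error(`unsupported color: ${value}`);
  const alpha = match[4] === undefined ? 1 : Number(match[4]);
  if (alpha < 1) throw new Error(`translucent color needs compositing first: ${value}`);
  return match.slice(1, 4).map(Number);
}

// WCAG 2.x relative luminance of an sRGB color with 0-255 channels.
function relativeLuminance([r, g, b]) {
  const linear = (channel) => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  };
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21.
function contrastRatio(foreground, background) {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

function meetsContrast(ratio, { largeText = false } = {}) {
  return ratio >= (largeText ? 3 : 4.5);
}

// `rect` is anything with width and height, such as a getBoundingClientRect() result.
function meetsTapTarget({ width, height }, minimum = 24) {
  return width >= minimum && height >= minimum;
}
```

Collect the inputs with `page.evaluate` (the element's computed `color` and `background-color`, and its `getBoundingClientRect()`), then report the computed ratio itself, not just pass or fail.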
## The loop

Build or fix, screenshot, re-audit, compare. Confirm the measured issue you targeted actually moved (the ratio crossed the threshold, the target grew) and that the screenshot looks right before and after.

## Working rules

- Always look at the screenshot; do not review UI from JSX alone.
- Report measured findings with their numbers; keep taste suggestions short and clearly separate from the measured ones.