feat(cli): implement visual validation framework and TTY smoke tests by mattKorwel · Pull Request #22461 · google-gemini/gemini-cli

mattKorwel · 2026-03-14T19:12:12Z

Summary

This PR implements a comprehensive visual validation and TTY testing framework for the Gemini CLI. It bridges the gap between logic-based integration tests and virtual behavioral evaluations by introducing high-fidelity terminal testing and integrated loop snapshots.

Details

1. New Testing Tools

TTY Bootstrap Smoke Tests: Uses node-pty (via runInteractive) to spawn the actual built binary in a real pseudo-terminal. This validates startup health, binary dependencies, and Ink rendering fidelity.
Integrated Visual Regression: Combines AppRig with toMatchSvgSnapshot to capture full model-user interaction loops. This ensures that UI states are correctly rendered and visible to the user.
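
To illustrate the kind of check a TTY smoke test performs, here is a minimal sketch of matching visible text in raw pseudo-terminal output. It does not use node-pty or the repository's runInteractive helper. `TerminalCapture` and its methods are hypothetical names, and the regex is an assumption that covers only common CSI escape sequences (colors, cursor moves).

```typescript
// Hypothetical helper, not part of the gemini-cli test utilities.
// Matches CSI escape sequences such as colors ("\x1b[32m") and line erases ("\x1b[2K").
const CSI_SEQUENCE = /\x1b\[[0-9;?]*[ -\/]*[@-~]/g;

class TerminalCapture {
  private raw = "";

  // Append a chunk of raw output, as a PTY data callback would deliver it.
  write(chunk: string): void {
    this.raw += chunk;
  }

  // Return the accumulated output with CSI sequences removed.
  plainText(): string {
    return this.raw.replace(CSI_SEQUENCE, "");
  }

  // True if the visible text contains the expected substring.
  contains(expected: string): boolean {
    return this.plainText().includes(expected);
  }
}

const capture = new TerminalCapture();
capture.write("\x1b[1m\x1b[36mWelcome to ");
capture.write("Gemini CLI!\x1b[0m\r\n");
console.log(capture.contains("Welcome to Gemini CLI!")); // true
```

Stripping happens on the accumulated buffer rather than per chunk, so an escape sequence split across two chunks is still removed.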

2. TDD Example: Adding a Welcome Message

This PR includes a concrete example of using the TTY rig for feature development:

Failing Test: Updated bootstrap.test.ts to expect "Welcome to Gemini CLI!".
Implementation: Added the message to AppHeader.tsx.
Verification: Confirmed the passing test with the built binary.

3. Documentation

Added docs/cli/visual-validation.md with a comparison matrix for maintainers and technical rationale.
Updated docs/integration-tests.md and docs/sidebar.json for discoverability.

Related Issues

Related to UI/UX automation. Note that core fixes for the policy engine were moved to a separate PR (#22462).

How to Validate

Run TTY Smoke Test:
```
npm test -w @google/gemini-cli -- integration-tests/bootstrap.test.ts
```
Verify it passes and confirms the "Welcome to Gemini CLI!" message.

Pre-Merge Checklist

Updated relevant documentation and README (if needed)
Added/updated tests (if needed)
Noted breaking changes (if any)
Validated on required platforms/methods:
- macOS
  - npm run

gemini-code-assist

Code Review

This pull request introduces a valuable visual validation and TTY testing framework, along with several key bug fixes that improve UI visibility and scheduler stability. The implementation of smoke tests and visual regression testing is a significant step forward for ensuring CLI robustness. However, the PR also includes unrelated changes to package.json and the Vitest configuration, which should be moved to a separate pull request to adhere to the project's contribution guidelines of keeping PRs small and focused.

gemini-code-assist · 2026-03-14T19:14:04Z

package.json

+    "offload": "tsx .gemini/skills/offload/scripts/orchestrator.ts",
+    "offload:setup": "tsx .gemini/skills/offload/scripts/setup.ts",
+    "offload:check": "tsx .gemini/skills/offload/scripts/check.ts",
+    "offload:clean": "tsx .gemini/skills/offload/scripts/clean.ts",
+    "offload:fleet": "tsx .gemini/skills/offload/scripts/fleet.ts",


This pull request introduces several offload:* scripts that seem unrelated to the primary goal of implementing a visual validation framework. Including unrelated changes makes the PR harder to review and can introduce unintended side effects.

References

Pull Requests: Keep PRs small, focused, and linked to an existing issue.

When a pull request is intended as a 'pure refactor', defer unrelated changes like performance optimizations. Create a separate issue to address them later to maintain the PR's focus.

gemini-code-assist · 2026-03-14T19:14:04Z

scripts/tests/vitest.config.ts

+    include: [
+      'scripts/tests/**/*.test.{js,ts}',
+      '.gemini/skills/**/tests/*.test.ts',
+    ],


This change to include .gemini/skills/**/tests/*.test.ts in the test configuration appears to be out of scope for this pull request, which focuses on visual and TTY testing. To maintain focus and make reviews more efficient, please move this change to a separate PR that deals with skill-related testing infrastructure.

References

Pull Requests: Keep PRs small, focused, and linked to an existing issue.

When a pull request is intended as a 'pure refactor', defer unrelated changes like performance optimizations. Create a separate issue to address them later to maintain the PR's focus.

github-actions · 2026-03-14T19:19:44Z

Size Change: +254 B (0%)

Total Size: 26.3 MB

Filename	Size	Change
`./bundle/chunk-23DI7L5V.js`	0 B	-3.4 kB (removed)	🏆
`./bundle/chunk-RGAW74XF.js`	0 B	-14.6 MB (removed)	🏆
`./bundle/chunk-XN6LIP7Z.js`	0 B	-3.64 MB (removed)	🏆
`./bundle/core-V2XIMZSL.js`	0 B	-43.4 kB (removed)	🏆
`./bundle/devtoolsService-UUX2LQ3U.js`	0 B	-27.7 kB (removed)	🏆
`./bundle/gemini-O3YTTRVA.js`	0 B	-521 kB (removed)	🏆
`./bundle/interactiveCli-R244XGXP.js`	0 B	-1.62 MB (removed)	🏆
`./bundle/oauth2-provider-PTOK7KZE.js`	0 B	-9.16 kB (removed)	🏆
`./bundle/chunk-7CIZ35PT.js`	14.6 MB	+14.6 MB (new file)	🆕
`./bundle/chunk-N7D7VZ4C.js`	3.64 MB	+3.64 MB (new file)	🆕
`./bundle/chunk-TQCNSXQ4.js`	3.4 kB	+3.4 kB (new file)	🆕
`./bundle/core-ZZSTUMBI.js`	43.4 kB	+43.4 kB (new file)	🆕
`./bundle/devtoolsService-UNTPMI42.js`	27.7 kB	+27.7 kB (new file)	🆕
`./bundle/gemini-H4M7AYNX.js`	521 kB	+521 kB (new file)	🆕
`./bundle/interactiveCli-FQG7N2ZB.js`	1.62 MB	+1.62 MB (new file)	🆕
`./bundle/oauth2-provider-CUX42D2I.js`	9.16 kB	+9.16 kB (new file)	🆕

ℹ️ View Unchanged

Filename	Size	Change
`./bundle/chunk-34MYV7JD.js`	2.45 kB	0 B
`./bundle/chunk-5AUYMPVF.js`	858 B	0 B
`./bundle/chunk-664ZODQF.js`	124 kB	0 B
`./bundle/chunk-DAHVX5MI.js`	206 kB	0 B
`./bundle/chunk-IUUIT4SU.js`	56.5 kB	0 B
`./bundle/chunk-IV2KUFMZ.js`	1.96 MB	0 B
`./bundle/chunk-RJTRUG2J.js`	39.8 kB	0 B
`./bundle/cleanup-R4BVQ3OU.js`	0 B	-856 B (removed)	🏆
`./bundle/devtools-36NN55EP.js`	696 kB	0 B
`./bundle/dist-T73EYRDX.js`	356 B	0 B
`./bundle/gemini.js`	2.06 kB	0 B
`./bundle/getMachineId-bsd-TXG52NKR.js`	1.55 kB	0 B
`./bundle/getMachineId-darwin-7OE4DDZ6.js`	1.55 kB	0 B
`./bundle/getMachineId-linux-SHIFKOOX.js`	1.34 kB	0 B
`./bundle/getMachineId-unsupported-5U5DOEYY.js`	1.06 kB	0 B
`./bundle/getMachineId-win-6KLLGOI4.js`	1.72 kB	0 B
`./bundle/memoryDiscovery-CXSTQXLK.js`	922 B	0 B
`./bundle/multipart-parser-KPBZEGQU.js`	11.7 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js`	221 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js`	227 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js`	11.5 kB	0 B
`./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js`	132 B	0 B
`./bundle/sandbox-macos-permissive-open.sb`	890 B	0 B
`./bundle/sandbox-macos-permissive-proxied.sb`	1.31 kB	0 B
`./bundle/sandbox-macos-restrictive-open.sb`	3.36 kB	0 B
`./bundle/sandbox-macos-restrictive-proxied.sb`	3.56 kB	0 B
`./bundle/sandbox-macos-strict-open.sb`	4.82 kB	0 B
`./bundle/sandbox-macos-strict-proxied.sb`	5.02 kB	0 B
`./bundle/src-QVCVGIUX.js`	47 kB	0 B
`./bundle/tree-sitter-7U6MW5PS.js`	274 kB	0 B
`./bundle/tree-sitter-bash-34ZGLXVX.js`	1.84 MB	0 B
`./bundle/cleanup-STLZ23AS.js`	856 B	+856 B (new file)	🆕

compressed-size-action

This change introduces a multi-layered validation strategy for the Gemini CLI UI, including:
- TTY Bootstrap Smoke Tests using node-pty to validate real terminal startup.
- Visual Regression Testing using SVG snapshots and AppRig.
- Core fixes for a scheduler hang and suppressed policy violations.
- Comprehensive documentation for maintainers.

Verified the TDD workflow by:
1. Adding an expectation for a 'Welcome to Gemini CLI!' message to the TTY bootstrap test.
2. Observing the test failure.
3. Implementing the message in AppHeader.tsx.
4. Verifying the test pass with the built binary.

The core fixes (scheduler hang, policy visibility) have been moved to a standalone PR to ensure atomicity. This PR now focuses exclusively on the testing infrastructure and documentation.

mattKorwel · 2026-03-14T19:35:02Z

I have rebased the branch to remove the unrelated offload changes and moved the core fixes (scheduler hang, policy visibility) to a standalone PR (#22462). This PR now focuses exclusively on the visual validation and TTY testing infrastructure.

mattKorwel · 2026-03-14T19:36:01Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a valuable visual validation and TTY testing framework, which is a great step towards improving UI testing fidelity. The addition of TTY smoke tests and the clear documentation are excellent. My review includes one suggestion in the new bootstrap test to improve its maintainability by removing a redundant check and aligning timeout handling with established patterns and utilities, as per repository rules.

gemini-code-assist · 2026-03-14T19:37:19Z

packages/cli/integration-tests/bootstrap.test.ts

+    await run.expectText(welcomeMessage, 30000);
+    await run.expectText(readyMarker, 30000);


There are a couple of improvements that can be made here:

Redundant Check: The rig.runInteractive() helper already waits for the ready marker (' Type your message or @path/to/file') before its promise resolves. Therefore, the call to run.expectText(readyMarker, ...) on line 34 is redundant and can be removed. The test only needs to verify the new welcome message.

Hardcoded Timeout: The hardcoded timeout of 30000 should be avoided. The expectText function is designed to use getDefaultTimeout() from the test utilities when the timeout argument is omitted. This aligns with the principle of using consistent, managed mechanisms for time-sensitive operations, similar to how AbortSignal is preferred for cancellation over separate timeouts. This allows timeouts to adjust automatically for different environments (e.g., 15s locally, 60s in CI), improving test robustness and maintainability.

By removing the redundant check and the hardcoded timeout, the test becomes cleaner and more aligned with the existing test utilities.

Suggested change

-    await run.expectText(welcomeMessage, 30000);
-    await run.expectText(readyMarker, 30000);
+    await run.expectText(welcomeMessage);

References

Asynchronous operations waiting for user input via the MessageBus should rely on the provided AbortSignal for cancellation, rather than implementing a separate timeout, to maintain consistency with existing patterns. This principle extends to using standardized timeout mechanisms like getDefaultTimeout() in tests instead of hardcoded values.
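
The environment-dependent timeout described in this review can be sketched as follows. This is an illustration, not the repository's actual getDefaultTimeout(): detecting CI through a `CI` environment variable and the `resolveTimeout` helper are assumptions. Only the 15s local and 60s CI figures come from the comment above.

```typescript
// Sketch only: the real getDefaultTimeout() in the test utilities may differ.
type Env = Record<string, string | undefined>;

// Assumption: CI is detected through a non-empty `CI` environment variable.
function getDefaultTimeout(env: Env): number {
  return env["CI"] ? 60_000 : 15_000;
}

// Hypothetical helper: prefer an explicit timeout, otherwise fall back to the default.
function resolveTimeout(explicitMs: number | undefined, env: Env): number {
  return explicitMs ?? getDefaultTimeout(env);
}

console.log(resolveTimeout(undefined, {}));             // 15000
console.log(resolveTimeout(undefined, { CI: "true" })); // 60000
console.log(resolveTimeout(30000, { CI: "true" }));     // 30000
```

The last call shows the cost of a hardcoded value: an explicit 30000 overrides the default, so the test gets less time in CI than the default would allow.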

mattKorwel

🤖 Automated Review Assessment (via Gemini CLI)

The changes in this PR have been functionally verified and analyzed. The implementation is high-quality and follows project patterns.

✅ Functional Verification

TTY Bootstrap Test: The new bootstrap.test.ts passes successfully, confirming the CLI boots correctly in a TTY environment.
UI Welcome Message: Verified that the "Welcome to Gemini CLI!" message renders correctly in the AppHeader and respects user settings (hideBanner).
Documentation: The new guide on visual validation (visual-validation.md) provides excellent technical context.

💡 Suggested Improvements

Redundancy in bootstrap.test.ts: The call to await run.expectText(readyMarker, 30000) at the end of the test is redundant because rig.runInteractive() already waits for this exact marker before returning. Removing it would slightly clean up the test.
Timeout Alignment: The test hardcodes 30000ms. It is recommended to use the default timeout (omit the argument) to allow the test to adapt between local and CI environments (via getDefaultTimeout()).
String Consistency: The readyMarker in the test is defined without the two leading spaces that the TestRig uses internally. The match still succeeds because the comparison uses .includes(), but aligning the string with the framework's internal marker is preferred.
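
A small sketch of why the mismatched marker still passes. The strings below are illustrative: the internal marker is assumed to be the test's marker prefixed with two spaces, and `screen` is a made-up stand-in for captured terminal output.

```typescript
// Hypothetical values: the internal marker is assumed to carry two leading spaces.
const internalMarker: string = "  Type your message or @path/to/file";
const testMarker: string = "Type your message or @path/to/file";

// Made-up stand-in for the captured terminal output.
const screen = "> \n" + internalMarker + "\n";

// Substring matching succeeds with or without the leading spaces...
console.log(screen.includes(testMarker));     // true
console.log(screen.includes(internalMarker)); // true

// ...but the two strings are not interchangeable under strict equality.
console.log(testMarker === internalMarker);   // false
```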

Recommendation: Approve after considering the minor test cleanup.

mattKorwel requested review from a team as code owners March 14, 2026 19:12

gemini-cli bot added the status/need-issue Pull requests that need to have an associated issue. label Mar 14, 2026

gemini-code-assist bot reviewed Mar 14, 2026

mattKorwel added 5 commits March 14, 2026 12:30

docs(cli): add TDD welcome message example to visual validation guide

55a138a

chore: remove core fixes from validation PR

6880bc2

The core fixes (scheduler hang, policy visibility) have been moved to a standalone PR to ensure atomicity. This PR now focuses exclusively on the testing infrastructure and documentation.

chore: move policy visual tests to core fixes PR

1a4fbbf

mattKorwel force-pushed the mk-ux-validation branch from 0651ab2 to 1a4fbbf Compare March 14, 2026 19:31

docs(cli): focus visual validation guide on exposure of issues

f4f4c5f

gemini-code-assist bot reviewed Mar 14, 2026

mattKorwel mentioned this pull request Mar 15, 2026

fix(core): resolve scheduler hang and improve policy violation visibility #22462

Open

mattKorwel self-assigned this Mar 16, 2026

github-actions bot mentioned this pull request Mar 16, 2026

📊 AI CLI Tools Daily Digest 2026-03-16 compasify/agents-radar#50

Open

mattKorwel commented Mar 16, 2026

Merge branch 'main' into mk-ux-validation

a85a2ea

mattKorwel wants to merge 7 commits into main from mk-ux-validation
