feat(cli): implement visual validation framework and TTY smoke tests#22461

Open
mattKorwel wants to merge 7 commits into main from mk-ux-validation

Conversation

@mattKorwel
Collaborator

@mattKorwel mattKorwel commented Mar 14, 2026

Summary

This PR implements a comprehensive visual validation and TTY testing framework for the Gemini CLI. It bridges the gap between logic-based integration tests and virtual behavioral evaluations by introducing high-fidelity terminal testing and integrated loop snapshots.

Details

1. New Testing Tools

  • TTY Bootstrap Smoke Tests: Uses node-pty (via runInteractive) to spawn the actual built binary in a real pseudo-terminal. This validates startup health, binary dependencies, and Ink rendering fidelity.
  • Integrated Visual Regression: Combines AppRig with toMatchSvgSnapshot to capture full model-user interaction loops. This ensures that UI states are correctly rendered and visible to the user.
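
A minimal sketch of the mechanism behind a TTY smoke test, assuming the pseudo-terminal delivers raw output chunks that may contain ANSI escape sequences. `TerminalOutput`, `stripAnsi`, and the 20 ms polling interval are hypothetical names and values chosen for illustration; they are not the TestRig, runInteractive, or node-pty APIs.

```typescript
// Hypothetical helper, NOT the repository's TestRig or runInteractive.
// It only illustrates how a TTY test can wait for rendered text.

/** Removes ANSI CSI escape sequences (colors, cursor movement). */
function stripAnsi(raw: string): string {
  return raw.replace(/\u001b\[[0-9;?]*[ -\/]*[@-~]/g, '');
}

class TerminalOutput {
  private buffer = '';

  /** Call this for every chunk the pseudo-terminal emits. */
  append(chunk: string): void {
    this.buffer += chunk;
  }

  /** Plain text seen so far, with escape sequences removed. */
  text(): string {
    return stripAnsi(this.buffer);
  }

  /** Resolves once `expected` appears; rejects after `timeoutMs`. */
  async expectText(expected: string, timeoutMs: number): Promise<void> {
    const deadline = Date.now() + timeoutMs;
    while (!this.text().includes(expected)) {
      if (Date.now() > deadline) {
        throw new Error(`Timed out waiting for "${expected}"`);
      }
      // Poll again shortly; output arrives asynchronously.
      await new Promise((resolve) => setTimeout(resolve, 20));
    }
  }
}
```

In a real test, `append` would presumably be wired to the pty's data event, and the assertion would wait for the expected text on the accumulated, escape-free output.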

2. TDD Example: Adding a Welcome Message

This PR includes a concrete example of using the TTY rig for feature development:

  1. Failing Test: Updated bootstrap.test.ts to expect "Welcome to Gemini CLI!".
  2. Implementation: Added the message to AppHeader.tsx.
  3. Verification: Confirmed the passing test with the built binary.

3. Documentation

  • Added docs/cli/visual-validation.md with a comparison matrix for maintainers and technical rationale.
  • Updated docs/integration-tests.md and docs/sidebar.json for discoverability.

Related Issues

Related to UI/UX automation. Note that core fixes for the policy engine were moved to a separate PR (#22462).

How to Validate

  1. Run TTY Smoke Test:
    npm test -w @google/gemini-cli -- integration-tests/bootstrap.test.ts
    Verify it passes and confirms the "Welcome to Gemini CLI!" message.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run

@mattKorwel mattKorwel requested review from a team as code owners March 14, 2026 19:12
@gemini-code-assist

This comment was marked as outdated.

@gemini-cli gemini-cli bot added the status/need-issue Pull requests that need to have an associated issue. label Mar 14, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable visual validation and TTY testing framework, along with several key bug fixes that improve UI visibility and scheduler stability. The implementation of smoke tests and visual regression testing is a significant step forward for ensuring CLI robustness. However, the PR also includes unrelated changes to package.json and the Vitest configuration, which should be moved to a separate pull request to adhere to the project's contribution guidelines of keeping PRs small and focused.

package.json Outdated
Comment on lines +66 to +70
"offload": "tsx .gemini/skills/offload/scripts/orchestrator.ts",
"offload:setup": "tsx .gemini/skills/offload/scripts/setup.ts",
"offload:check": "tsx .gemini/skills/offload/scripts/check.ts",
"offload:clean": "tsx .gemini/skills/offload/scripts/clean.ts",
"offload:fleet": "tsx .gemini/skills/offload/scripts/fleet.ts",
Contributor

high

This pull request introduces several offload:* scripts that seem unrelated to the primary goal of implementing a visual validation framework. Including unrelated changes makes the PR harder to review and can introduce unintended side effects.

References
  1. Pull Requests: Keep PRs small, focused, and linked to an existing issue.
  2. When a pull request is intended as a 'pure refactor', defer unrelated changes like performance optimizations. Create a separate issue to address them later to maintain the PR's focus.

Comment on lines +13 to +16
include: [
  'scripts/tests/**/*.test.{js,ts}',
  '.gemini/skills/**/tests/*.test.ts',
],
Contributor

high

This change to include .gemini/skills/**/tests/*.test.ts in the test configuration appears to be out of scope for this pull request, which focuses on visual and TTY testing. To maintain focus and make reviews more efficient, please move this change to a separate PR that deals with skill-related testing infrastructure.

References
  1. Pull Requests: Keep PRs small, focused, and linked to an existing issue.
  2. When a pull request is intended as a 'pure refactor', defer unrelated changes like performance optimizations. Create a separate issue to address them later to maintain the PR's focus.

@github-actions

github-actions bot commented Mar 14, 2026

Size Change: +254 B (0%)

Total Size: 26.3 MB

| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/chunk-23DI7L5V.js | 0 B | -3.4 kB (removed) 🏆 |
| ./bundle/chunk-RGAW74XF.js | 0 B | -14.6 MB (removed) 🏆 |
| ./bundle/chunk-XN6LIP7Z.js | 0 B | -3.64 MB (removed) 🏆 |
| ./bundle/core-V2XIMZSL.js | 0 B | -43.4 kB (removed) 🏆 |
| ./bundle/devtoolsService-UUX2LQ3U.js | 0 B | -27.7 kB (removed) 🏆 |
| ./bundle/gemini-O3YTTRVA.js | 0 B | -521 kB (removed) 🏆 |
| ./bundle/interactiveCli-R244XGXP.js | 0 B | -1.62 MB (removed) 🏆 |
| ./bundle/oauth2-provider-PTOK7KZE.js | 0 B | -9.16 kB (removed) 🏆 |
| ./bundle/chunk-7CIZ35PT.js | 14.6 MB | +14.6 MB (new file) 🆕 |
| ./bundle/chunk-N7D7VZ4C.js | 3.64 MB | +3.64 MB (new file) 🆕 |
| ./bundle/chunk-TQCNSXQ4.js | 3.4 kB | +3.4 kB (new file) 🆕 |
| ./bundle/core-ZZSTUMBI.js | 43.4 kB | +43.4 kB (new file) 🆕 |
| ./bundle/devtoolsService-UNTPMI42.js | 27.7 kB | +27.7 kB (new file) 🆕 |
| ./bundle/gemini-H4M7AYNX.js | 521 kB | +521 kB (new file) 🆕 |
| ./bundle/interactiveCli-FQG7N2ZB.js | 1.62 MB | +1.62 MB (new file) 🆕 |
| ./bundle/oauth2-provider-CUX42D2I.js | 9.16 kB | +9.16 kB (new file) 🆕 |
ℹ️ View Unchanged

| Filename | Size | Change |
| --- | --- | --- |
| ./bundle/chunk-34MYV7JD.js | 2.45 kB | 0 B |
| ./bundle/chunk-5AUYMPVF.js | 858 B | 0 B |
| ./bundle/chunk-664ZODQF.js | 124 kB | 0 B |
| ./bundle/chunk-DAHVX5MI.js | 206 kB | 0 B |
| ./bundle/chunk-IUUIT4SU.js | 56.5 kB | 0 B |
| ./bundle/chunk-IV2KUFMZ.js | 1.96 MB | 0 B |
| ./bundle/chunk-RJTRUG2J.js | 39.8 kB | 0 B |
| ./bundle/cleanup-R4BVQ3OU.js | 0 B | -856 B (removed) 🏆 |
| ./bundle/devtools-36NN55EP.js | 696 kB | 0 B |
| ./bundle/dist-T73EYRDX.js | 356 B | 0 B |
| ./bundle/gemini.js | 2.06 kB | 0 B |
| ./bundle/getMachineId-bsd-TXG52NKR.js | 1.55 kB | 0 B |
| ./bundle/getMachineId-darwin-7OE4DDZ6.js | 1.55 kB | 0 B |
| ./bundle/getMachineId-linux-SHIFKOOX.js | 1.34 kB | 0 B |
| ./bundle/getMachineId-unsupported-5U5DOEYY.js | 1.06 kB | 0 B |
| ./bundle/getMachineId-win-6KLLGOI4.js | 1.72 kB | 0 B |
| ./bundle/memoryDiscovery-CXSTQXLK.js | 922 B | 0 B |
| ./bundle/multipart-parser-KPBZEGQU.js | 11.7 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/client/main.js | 221 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/_client-assets.js | 227 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/index.js | 11.5 kB | 0 B |
| ./bundle/node_modules/@google/gemini-cli-devtools/dist/src/types.js | 132 B | 0 B |
| ./bundle/sandbox-macos-permissive-open.sb | 890 B | 0 B |
| ./bundle/sandbox-macos-permissive-proxied.sb | 1.31 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-open.sb | 3.36 kB | 0 B |
| ./bundle/sandbox-macos-restrictive-proxied.sb | 3.56 kB | 0 B |
| ./bundle/sandbox-macos-strict-open.sb | 4.82 kB | 0 B |
| ./bundle/sandbox-macos-strict-proxied.sb | 5.02 kB | 0 B |
| ./bundle/src-QVCVGIUX.js | 47 kB | 0 B |
| ./bundle/tree-sitter-7U6MW5PS.js | 274 kB | 0 B |
| ./bundle/tree-sitter-bash-34ZGLXVX.js | 1.84 MB | 0 B |
| ./bundle/cleanup-STLZ23AS.js | 856 B | +856 B (new file) 🆕 |

compressed-size-action

This change introduces a multi-layered validation strategy for the Gemini CLI UI, including:
- TTY Bootstrap Smoke Tests using node-pty to validate real terminal startup.
- Visual Regression Testing using SVG snapshots and AppRig.
- Core fixes for a scheduler hang and suppressed policy violations.
- Comprehensive documentation for maintainers.
Verified the TDD workflow by:
1. Adding an expectation for a 'Welcome to Gemini CLI!' message to the TTY bootstrap test.
2. Observing the test failure.
3. Implementing the message in AppHeader.tsx.
4. Verifying that the test passes with the built binary.
The core fixes (scheduler hang, policy visibility) have been moved to a standalone PR to ensure atomicity. This PR now focuses exclusively on the testing infrastructure and documentation.
@mattKorwel
Collaborator Author

I have rebased the branch to remove the unrelated offload changes and moved the core fixes (scheduler hang, policy visibility) to a standalone PR (#22462). This PR now focuses exclusively on the visual validation and TTY testing infrastructure.

@mattKorwel
Collaborator Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable visual validation and TTY testing framework, which is a great step towards improving UI testing fidelity. The addition of TTY smoke tests and the clear documentation are excellent. My review includes one suggestion in the new bootstrap test to improve its maintainability by removing a redundant check and aligning timeout handling with established patterns and utilities, as per repository rules.

Comment on lines +33 to +34
await run.expectText(welcomeMessage, 30000);
await run.expectText(readyMarker, 30000);
Contributor

high

There are a couple of improvements that can be made here:

  1. Redundant Check: The rig.runInteractive() helper already waits for the ready marker (' Type your message or @path/to/file') before its promise resolves. Therefore, the call to run.expectText(readyMarker, ...) on line 34 is redundant and can be removed. The test only needs to verify the new welcome message.

  2. Hardcoded Timeout: The hardcoded timeout of 30000 should be avoided. The expectText function is designed to use getDefaultTimeout() from the test utilities when the timeout argument is omitted. This aligns with the principle of using consistent, managed mechanisms for time-sensitive operations, similar to how AbortSignal is preferred for cancellation over separate timeouts. This allows timeouts to adjust automatically for different environments (e.g., 15s locally, 60s in CI), improving test robustness and maintainability.

By removing the redundant check and the hardcoded timeout, the test becomes cleaner and more aligned with the existing test utilities.
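
As an illustration of the timeout point, here is a minimal sketch of an environment-aware default. It assumes that CI is detected through a `CI` environment variable and reuses the 15s and 60s figures quoted above; `defaultTimeoutMs` and `resolveTimeout` are hypothetical stand-ins, and the real getDefaultTimeout() in the test utilities may be implemented differently.

```typescript
// Hypothetical stand-ins, not the repository's getDefaultTimeout().
type Env = Record<string, string | undefined>;

/** Longer timeout when a non-empty CI variable is set (assumed convention). */
function defaultTimeoutMs(env: Env): number {
  return env['CI'] ? 60_000 : 15_000;
}

/** Uses the caller's timeout when given, otherwise the environment default. */
function resolveTimeout(env: Env, explicitMs?: number): number {
  return explicitMs ?? defaultTimeoutMs(env);
}
```

With a helper shaped like this, a hardcoded 30000 overrides the default in every environment, while omitting the argument lets the same test wait longer on CI.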

Suggested change
- await run.expectText(welcomeMessage, 30000);
- await run.expectText(readyMarker, 30000);
+ await run.expectText(welcomeMessage);
References
  1. Asynchronous operations waiting for user input via the MessageBus should rely on the provided AbortSignal for cancellation, rather than implementing a separate timeout, to maintain consistency with existing patterns. This principle extends to using standardized timeout mechanisms like getDefaultTimeout() in tests instead of hardcoded values.

Collaborator Author

@mattKorwel mattKorwel left a comment

🤖 Automated Review Assessment (via Gemini CLI)

The changes in this PR have been functionally verified and analyzed. The implementation is high-quality and follows project patterns.

✅ Functional Verification

  • TTY Bootstrap Test: The new bootstrap.test.ts passes successfully, confirming the CLI boots correctly in a TTY environment.
  • UI Welcome Message: Verified that the "Welcome to Gemini CLI!" message renders correctly in the AppHeader and respects user settings (hideBanner).
  • Documentation: The new guide on visual validation (visual-validation.md) provides excellent technical context.
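
The hideBanner behavior noted above can be pictured with a small model. This is a sketch under the assumption that hideBanner suppresses the welcome message entirely; `HeaderSettings` and `headerLines` are hypothetical, and the real AppHeader.tsx is an Ink component whose logic may differ.

```typescript
// Hypothetical model of the header, not the code in AppHeader.tsx.
interface HeaderSettings {
  hideBanner?: boolean;
}

const WELCOME_MESSAGE = 'Welcome to Gemini CLI!';

/** Lines the header would contribute to the screen for given settings. */
function headerLines(settings: HeaderSettings): string[] {
  // Assumption: hiding the banner also hides the welcome message.
  return settings.hideBanner ? [] : [WELCOME_MESSAGE];
}
```

Under that assumption, a bootstrap test that expects the welcome text only holds when the test environment leaves hideBanner unset or false.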

💡 Suggested Improvements

  • Redundancy in bootstrap.test.ts: The call to await run.expectText(readyMarker, 30000) at the end of the test is redundant because rig.runInteractive() already waits for this exact marker before returning. Removing it would slightly clean up the test.
  • Timeout Alignment: The test hardcodes 30000ms. It is recommended to use the default timeout (omit the argument) to allow the test to adapt between local and CI environments (via getDefaultTimeout()).
  • String Consistency: The readyMarker in the test is defined without the two leading spaces that the TestRig uses internally. The check still passes because matching uses .includes(), but aligning with the framework's internal marker is preferred.
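
The string-consistency point can be shown with plain substring matching. The marker text below is taken from the review comments, and the two leading spaces are assumed from the description above; `screenShows` is a hypothetical helper, not part of the TestRig.

```typescript
// Assumed marker strings; the rig's exact internal value may differ.
const INTERNAL_MARKER = '  Type your message or @path/to/file';
const TEST_MARKER = 'Type your message or @path/to/file';

/** Substring check, mirroring the .includes() matching mentioned above. */
function screenShows(screenText: string, marker: string): boolean {
  return screenText.includes(marker);
}

// A screen as the rig would see it, with the two-space indent.
const indentedScreen = '\n  Type your message or @path/to/file\n';
// The same prompt if the indent ever shrank to one space.
const reflowedScreen = '\n Type your message or @path/to/file\n';
```

The unpadded marker matches both screens, while the padded one matches only the first, so the two definitions can silently disagree if the rendered indentation changes.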

Recommendation: Approve after considering the minor test cleanup.
