feat(cli): implement visual validation framework and TTY smoke tests#22461
feat(cli): implement visual validation framework and TTY smoke tests#22461mattKorwel wants to merge 7 commits intomainfrom
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable visual validation and TTY testing framework, along with several key bug fixes that improve UI visibility and scheduler stability. The implementation of smoke tests and visual regression testing is a significant step forward for ensuring CLI robustness. However, the PR also includes unrelated changes to package.json and the Vitest configuration, which should be moved to a separate pull request to adhere to the project's contribution guidelines of keeping PRs small and focused.
package.json
Outdated
| "offload": "tsx .gemini/skills/offload/scripts/orchestrator.ts", | ||
| "offload:setup": "tsx .gemini/skills/offload/scripts/setup.ts", | ||
| "offload:check": "tsx .gemini/skills/offload/scripts/check.ts", | ||
| "offload:clean": "tsx .gemini/skills/offload/scripts/clean.ts", | ||
| "offload:fleet": "tsx .gemini/skills/offload/scripts/fleet.ts", |
There was a problem hiding this comment.
This pull request introduces several offload:* scripts that seem unrelated to the primary goal of implementing a visual validation framework. Including unrelated changes makes the PR harder to review and can introduce unintended side effects.
References
- Pull Requests: Keep PRs small, focused, and linked to an existing issue. (link)
- When a pull request is intended as a 'pure refactor', defer unrelated changes like performance optimizations. Create a separate issue to address them later to maintain the PR's focus.
scripts/tests/vitest.config.ts
Outdated
| include: [ | ||
| 'scripts/tests/**/*.test.{js,ts}', | ||
| '.gemini/skills/**/tests/*.test.ts', | ||
| ], |
There was a problem hiding this comment.
This change to include .gemini/skills/**/tests/*.test.ts in the test configuration appears to be out of scope for this pull request, which focuses on visual and TTY testing. To maintain focus and make reviews more efficient, please move this change to a separate PR that deals with skill-related testing infrastructure.
References
- Pull Requests: Keep PRs small, focused, and linked to an existing issue. (link)
- When a pull request is intended as a 'pure refactor', defer unrelated changes like performance optimizations. Create a separate issue to address them later to maintain the PR's focus.
|
Size Change: +254 B (0%) Total Size: 26.3 MB
ℹ️ View Unchanged
|
This change introduces a multi-layered validation strategy for the Gemini CLI UI, including: - TTY Bootstrap Smoke Tests using node-pty to validate real terminal startup. - Visual Regression Testing using SVG snapshots and AppRig. - Core fixes for a scheduler hang and suppressed policy violations. - Comprehensive documentation for maintainers.
Verified the TDD workflow by: 1. Adding an expectation for a 'Welcome to Gemini CLI!' message to the TTY bootstrap test. 2. Observing the test failure. 3. Implementing the message in AppHeader.tsx. 4. Verifying the test pass with the built binary.
The core fixes (scheduler hang, policy visibility) have been moved to a standalone PR to ensure atomicity. This PR now focuses exclusively on the testing infrastructure and documentation.
0651ab2 to
1a4fbbf
Compare
|
I have rebased the branch to remove the unrelated offload changes and moved the core fixes (scheduler hang, policy visibility) to a standalone PR (#22462). This PR now focuses exclusively on the visual validation and TTY testing infrastructure. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable visual validation and TTY testing framework, which is a great step towards improving UI testing fidelity. The addition of TTY smoke tests and the clear documentation are excellent. My review includes one suggestion in the new bootstrap test to improve its maintainability by removing a redundant check and aligning timeout handling with established patterns and utilities, as per repository rules.
| await run.expectText(welcomeMessage, 30000); | ||
| await run.expectText(readyMarker, 30000); |
There was a problem hiding this comment.
There are a couple of improvements that can be made here:
-
Redundant Check: The
rig.runInteractive()helper already waits for the ready marker (' Type your message or @path/to/file') before its promise resolves. Therefore, the call torun.expectText(readyMarker, ...)on line 34 is redundant and can be removed. The test only needs to verify the new welcome message. -
Hardcoded Timeout: The hardcoded timeout of
30000should be avoided. TheexpectTextfunction is designed to usegetDefaultTimeout()from the test utilities when the timeout argument is omitted. This aligns with the principle of using consistent, managed mechanisms for time-sensitive operations, similar to howAbortSignalis preferred for cancellation over separate timeouts. This allows timeouts to adjust automatically for different environments (e.g., 15s locally, 60s in CI), improving test robustness and maintainability.
By removing the redundant check and the hardcoded timeout, the test becomes cleaner and more aligned with the existing test utilities.
| await run.expectText(welcomeMessage, 30000); | |
| await run.expectText(readyMarker, 30000); | |
| await run.expectText(welcomeMessage); |
References
- Asynchronous operations waiting for user input via the MessageBus should rely on the provided AbortSignal for cancellation, rather than implementing a separate timeout, to maintain consistency with existing patterns. This principle extends to using standardized timeout mechanisms like
getDefaultTimeout()in tests instead of hardcoded values.
mattKorwel
left a comment
There was a problem hiding this comment.
🤖 Automated Review Assessment (via Gemini CLI)
The changes in this PR have been functionally verified and analyzed. The implementation is high-quality and follows project patterns.
✅ Functional Verification
- TTY Bootstrap Test: The new
bootstrap.test.tspasses successfully, confirming the CLI boots correctly in a TTY environment. - UI Welcome Message: Verified that the "Welcome to Gemini CLI!" message renders correctly in the
AppHeaderand respects user settings (hideBanner). - Documentation: The new guide on visual validation (
visual-validation.md) provides excellent technical context.
💡 Suggested Improvements
- Redundancy in
bootstrap.test.ts: The call toawait run.expectText(readyMarker, 30000)at the end of the test is redundant becauserig.runInteractive()already waits for this exact marker before returning. Removing it would slightly clean up the test. - Timeout Alignment: The test hardcodes
30000ms. It is recommended to use the default timeout (omit the argument) to allow the test to adapt between local and CI environments (viagetDefaultTimeout()). - String Consistency: The
readyMarkerin the test is defined without the two leading spaces () used internally by theTestRig. While it works (.includes()), alignment with the framework's internal markers is preferred.
Recommendation: Approve after considering the minor test cleanup.
Summary
This PR implements a comprehensive visual validation and TTY testing framework for the Gemini CLI. It bridges the gap between logic-based integration tests and virtual behavioral evaluations by introducing high-fidelity terminal testing and integrated loop snapshots.
Details
1. New Testing Tools
node-pty(viarunInteractive) to spawn the actual built binary in a real pseudo-terminal. This validates startup health, binary dependencies, and Ink rendering fidelity.AppRigwithtoMatchSvgSnapshotto capture full model-user interaction loops. This ensures that UI states are correctly rendered and visible to the user.2. TDD Example: Adding a Welcome Message
This PR includes a concrete example of using the TTY rig for feature development:
bootstrap.test.tsto expect "Welcome to Gemini CLI!".AppHeader.tsx.3. Documentation
docs/cli/visual-validation.mdwith a comparison matrix for maintainers and technical rationale.docs/integration-tests.mdanddocs/sidebar.jsonfor discoverability.Related Issues
Related to UI/UX automation. Note that core fixes for the policy engine were moved to a separate PR (#22462).
How to Validate
npm test -w @google/gemini-cli -- integration-tests/bootstrap.test.tsPre-Merge Checklist