Fix CI approval flows and stale fixtures#1478
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on enhancing the robustness of CI processes and improving the flexibility of approval workflows. It addresses issues with stale test configurations that could lead to cascading failures and introduces a more versatile mechanism for handling cross-channel approval submissions, particularly for routine creation. The changes aim to streamline development and testing by ensuring more reliable and adaptable system interactions. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
The pull request effectively addresses the stated goals of fixing CI approval flows and updating stale fixtures. The removal of the outdated triage plan document and the test fixture updates contribute to a cleaner and more relevant codebase. The core functional change in src/agent/agent_loop.rs to allow approval submissions to target specific threads by UUID across channels is a valuable enhancement for the web approval UI. The corresponding updates to the E2E tests in tests/e2e/scenarios/test_owner_scope.py correctly reflect and validate this new approval mechanism. Overall, the changes are well-implemented and improve the system's functionality and test coverage.
| } else { | ||
| drop(sess); | ||
| self.session_manager | ||
| .resolve_thread( | ||
| &message.user_id, | ||
| &message.channel, | ||
| message.conversation_scope(), | ||
| ) | ||
| .await | ||
| } |
There was a problem hiding this comment.
The self.session_manager.resolve_thread(...) call is duplicated here and in the final else block. While functionally correct, consider extracting this logic into a helper function or a dedicated method within SessionManager to reduce repetition and improve readability. For instance, SessionManager could expose a method like resolve_or_activate_thread_for_approval that encapsulates this specific logic.
| } else { | |
| drop(sess); | |
| self.session_manager | |
| .resolve_thread( | |
| &message.user_id, | |
| &message.channel, | |
| message.conversation_scope(), | |
| ) | |
| .await | |
| } | |
| } else { | |
| // If the target_thread_id is not in the current session, fall back to generic resolution. | |
| // This might create a new thread or resolve an existing one based on channel/scope. | |
| self.session_manager | |
| .resolve_thread( | |
| &message.user_id, | |
| &message.channel, | |
| message.conversation_scope(), | |
| ) | |
| .await | |
| } |
References
- When an issue is found in duplicated code, prefer refactoring into a shared function over applying localized fixes.
- Consolidate related sequences of operations, such as creating, persisting, and scheduling a job, into a single reusable method to improve code consistency and maintainability.
There was a problem hiding this comment.
Thanks. I left this as-is for now to keep the CI fix narrowly scoped, since the duplicated branch is just the fallback into the existing SessionManager path. If we touch this area again, I agree a small helper such as resolve_or_activate_thread_for_approval would make the control flow easier to read.
| - the owner-scoped HTTP webhook channel | ||
| - routine tools / routines tab | ||
| - job creation via routine execution / jobs tab | ||
| """ |
There was a problem hiding this comment.
Yep, that was intentional. The E2E harness runs with sandbox-backed full-job execution disabled, so I narrowed the docstring to match what this scenario actually verifies: owner-scoped routine creation plus web approval/visibility.
There was a problem hiding this comment.
Pull request overview
This PR updates CI/E2E fixtures and approval handling so owner-scoped workflows (especially approvals for work originating in non-gateway channels) behave consistently and stop failing due to stale URL fixtures.
Changes:
- Update E2E owner-scope scenario to approve pending requests via the web API for routines created from the HTTP owner-scoped webhook.
- Adjust agent-loop thread resolution to allow approval submissions to target an already-loaded owned thread UUID across channels.
- Refresh config URL test fixtures (LLM + embeddings) to avoid DNS-related flakiness/ENV_MUTEX poisoning, and remove a stale staging CI triage plan doc.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/e2e/scenarios/test_owner_scope.py | Switches the “full-job routine from HTTP” scenario to use /api/chat/history + /api/chat/approval for explicit approvals. |
| src/agent/agent_loop.rs | Adds an approval-specific thread-resolution fast path using UUID thread targeting. |
| src/config/llm.rs | Updates base URL test fixtures to localhost URLs. |
| src/config/embeddings.rs | Updates embedding base URL fixture to an IP literal to avoid DNS resolution. |
| docs/plans/2026-03-18-staging-ci-triage.md | Removes outdated triage plan document. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let (session, thread_id) = if let Some(target_thread_id) = approval_thread_uuid { | ||
| let session = self | ||
| .session_manager | ||
| .get_or_create_session(&message.user_id) | ||
| .await; | ||
| let mut sess = session.lock().await; | ||
| if sess.threads.contains_key(&target_thread_id) { | ||
| sess.active_thread = Some(target_thread_id); | ||
| sess.last_active_at = chrono::Utc::now(); | ||
| drop(sess); | ||
| (session, target_thread_id) | ||
| } else { | ||
| drop(sess); | ||
| self.session_manager | ||
| .resolve_thread( | ||
| &message.user_id, | ||
| &message.channel, | ||
| message.conversation_scope(), | ||
| ) | ||
| .await | ||
| } |
There was a problem hiding this comment.
In the approval-thread fast path you switch active_thread and return the UUID, but you don't register a thread_map entry for the current (user_id, channel, external_thread_id) key. That means subsequent non-approval gateway messages that include the same UUID can still be routed by resolve_thread() into a new thread because this UUID is already mapped under a different channel key. Consider registering the mapping (e.g., via SessionManager::register_thread) when the UUID is accepted so future requests on this channel+thread_id consistently resolve to the same in-memory thread.
There was a problem hiding this comment.
Fixed in aa426a00. When the approval fast path adopts an already-loaded UUID, it now also registers that thread for the current (user_id, channel, external_thread_id) key via SessionManager::register_thread, so later gateway messages on the same UUID resolve back to the same in-memory thread instead of creating a fresh channel-scoped thread. I also added a focused unit test for the second-channel registration case and reran tests/e2e/scenarios/test_owner_scope.py (3 passed).
|
Followed up on the approval-thread mapping review note in aa426a0. When the approval fast path adopts an already-loaded UUID, it now also registers that thread for the current channel key via |
* Fix CI approval flows and stale fixtures * Backfill approval thread mapping across channels
* Fix CI approval flows and stale fixtures * Backfill approval thread mapping across channels
* Fix CI approval flows and stale fixtures * Backfill approval thread mapping across channels
Summary
ENV_MUTEXpoisoningTesting
tests/e2e/.venv/bin/pytest tests/e2e/scenarios/test_owner_scope.py -q