feat(routines): enable tool access in lightweight routine execution (#257)#730
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the capabilities of lightweight routines by integrating tool access. This allows routines to perform more complex, data-driven actions autonomously, such as periodic memory searches or status checks, by leveraging a controlled agentic loop. The implementation prioritizes safety and stability, ensuring that tool usage is strictly governed by predefined constraints and does not introduce new risks or unexpected costs, while maintaining full compatibility with existing routine configurations. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-implemented feature: enabling tool access for lightweight routines. This is achieved by adding a mini-agentic loop that allows the LLM to call tools, process results, and repeat for a limited number of rounds. However, the security audit identified two medium-to-high severity vulnerabilities related to broken access control. Specifically, the use of a shared workspace instance in the routine engine allows for potential cross-user data leakage, violating the rule that tools interacting with user-owned resources must verify user IDs. The lack of validation for notification targets in routine creation also allows for unauthorized messaging. These issues should be addressed by ensuring proper user isolation in the workspace and validating notification permissions. Additionally, a minor suggestion to improve maintainability by replacing a magic number with a shared constant has been noted.
| rt_config.clone(), | ||
| Arc::clone(store), | ||
| self.llm().clone(), | ||
| Arc::clone(workspace), |
There was a problem hiding this comment.
The RoutineEngine is initialized with a single, shared Workspace instance. This workspace has a fixed user_id (likely the one who started the agent or a default one). When routines are executed for different users, they all access the same workspace instance. This allows a user to read or influence files belonging to the workspace's owner by creating routines with specific context_paths or using memory tools, bypassing user isolation. This is a significant Broken Access Control (IDOR) vulnerability in a multi-user environment.
References
- Tools that interact with user-owned resources, such as sandbox jobs, must verify that the authenticated user ID in the tool context matches the resource owner's ID (e.g., by using a
ContextManager) before performing any read or write operations to prevent unauthorized cross-user access.
| let max_tool_rounds = params | ||
| .get("max_tool_rounds") | ||
| .and_then(|v| v.as_u64()) | ||
| .map(|v| v.clamp(1, 20) as u32) |
There was a problem hiding this comment.
To avoid using a magic number and to ensure consistency with other parts of the codebase, it's better to use the MAX_TOOL_ROUNDS_LIMIT constant defined in src/agent/routine.rs. This will keep the upper limit for tool rounds synchronized.
| .map(|v| v.clamp(1, 20) as u32) | |
| .map(|v| v.clamp(1, crate::agent::routine::MAX_TOOL_ROUNDS_LIMIT as u64) as u32) |
zmanian
left a comment
There was a problem hiding this comment.
The design is sound -- optional tool access via a mini agentic loop with defense-in-depth (double-checking approval at execution time). Good test coverage (10 tests). Backwards compatible (use_tools: false default).
Issues to address:
1. Bug: fire_manual() and spawn_fire() don't propagate tools/safety (must fix). The PR adds tools and safety to EngineContext but only updates the cron/event trigger paths. Manual triggers via fire_manual() and spawn_fire() construct EngineContext without the new fields, so use_tools: true silently falls back to text-only for manual triggers.
2. Security: RestartTool passes the Never approval filter (must fix). The filter tool.requires_approval(&json!({})) == ApprovalRequirement::Never lets through RestartTool, which uses the default Never approval. A lightweight routine with use_tools: true could restart the agent autonomously. Also consider memory_write -- routines could write arbitrary content to workspace memory, poisoning future conversations.
Recommended: either add an explicit tool denylist for routine tool access, or change RestartTool to require approval (UnlessAutoApproved).
3. Magic number duplication (minor). The clamping in routine.rs tool creation hardcodes 20 instead of using the MAX_TOOL_ROUNDS_LIMIT constant.
zmanian
left a comment
There was a problem hiding this comment.
Good progress -- all three previous review items addressed (fire_manual/spawn_fire propagation, denylist, magic number). Defense-in-depth approach is solid (approval filtering + denylist at both definition and execution time). 10+ tests.
One remaining issue:
routine_create/routine_update/routine_delete not on denylist -- A lightweight routine with use_tools: true could self-replicate (create copies), modify its own prompt or triggers, or delete other routines. These are autonomy-escalating actions. Add routine_create, routine_update, routine_delete to ROUTINE_TOOL_DENYLIST.
Non-blocking:
- Duplicated sentinel logic in
execute_lightweightshould callinterpret_text_responseinstead _is_errorvariable computed but never used
e3b6e4a to
9aaeb67
Compare
zmanian
left a comment
There was a problem hiding this comment.
Re-review: Third pass
Good progress on most items. Review 1's fire_manual/spawn_fire propagation issue is fixed, the magic number is fixed, the non-blocking _is_error variable is cleaned up. The overall architecture is solid -- per-routine use_tools flag, clamped max_tool_rounds, double-gated with global lightweight_tools_enabled, tool output sanitization through SafetyLayer. The handle_text_response refactor is a nice improvement.
Blocking: routine self-management tools still pass the Never filter
The approval-based filter in execute_routine_tool (line ~1012) blocks UnlessAutoApproved and Always tools but allows all ApprovalRequirement::Never tools through. These Never-approval tools are accessible to lightweight routines and present autonomy-escalation risks:
routine_create-- a routine can self-replicate by creating copies of itselfroutine_update-- a routine can modify its own prompt, triggers, or those of other routinesroutine_delete-- a routine can delete other routinesrestart(RestartTool) -- a routine can restart the agent autonomously
This was the primary feedback from review 2 (March 9). The fix is straightforward: add a ROUTINE_TOOL_DENYLIST constant and check it in execute_routine_tool alongside the approval filter. Something like:
const ROUTINE_TOOL_DENYLIST: &[&str] = &[
"routine_create", "routine_update", "routine_delete",
"routine_fire", "restart",
];Then in execute_routine_tool, before the approval check:
if ROUTINE_TOOL_DENYLIST.contains(&tc.name.as_str()) {
return Err(format!("Tool '{}' is not available in lightweight routines", tc.name).into());
}Non-blocking
-
Sentinel logic still duplicated between
execute_lightweight_no_tools(line ~815) andhandle_text_response. The_no_toolsvariant could callhandle_text_responseinstead of reimplementing the same trim/empty/sentinel checks. Low priority but reduces maintenance surface. -
Tool definitions sent unfiltered to LLM --
ctx.tools.tool_definitions().await(line ~928) sends ALL registered tools to the LLM, including denylisted and approval-gated ones. The LLM will attempt to call them and get errors. Consider filtering the tool definitions to only include tools that would pass both the denylist and approval checks. This improves token efficiency and reduces wasted LLM turns.
- Add ROUTINE_TOOL_DENYLIST to block routine_create/update/delete/fire
and restart from being callable by lightweight routines
- Deduplicate sentinel logic by reusing handle_text_response() in the
no-tools path
- Filter tool definitions sent to LLM to only include callable tools,
avoiding wasted tokens on tools that would be rejected
9aaeb67 to
17830d0
Compare
zmanian
left a comment
There was a problem hiding this comment.
Good work, Reid. The design is solid -- defense-in-depth with both LLM-side filtering (tool_definitions_excluding) and runtime enforcement (execute_routine_tool denylist + approval check). A few observations:
Security -- looks good:
- ROUTINE_TOOL_DENYLIST correctly blocks self-management tools (routine_create/update/delete/fire, restart) preventing autonomy escalation
- ApprovalRequirement::Never filter in tool_definitions_excluding ensures shell and other gated tools never reach the LLM
- Runtime denylist check in execute_routine_tool provides proper defense-in-depth
- Safety layer validation and output sanitization are preserved
Backward compatibility -- clean:
- serde(default) on new fields, from_db parsing with defaults, existing routines unaffected
Minor note (not blocking):
- max_tool_rounds accepts up to 20 (MAX_TOOL_ROUNDS_LIMIT) but execute_lightweight_with_tools hard-caps iterations at .min(5) (pre-existing). This means values 6-20 are silently equivalent to 5. Consider either documenting this or raising the hard cap. Not introduced by this PR so not blocking.
Test coverage: 10 new tests covering clamping bounds, backward compat, denylist validation. Good.
…earai#257) (nearai#730) * Rebase onto staging * fix(routines): prevent autonomy-escalation in lightweight routines - Add ROUTINE_TOOL_DENYLIST to block routine_create/update/delete/fire and restart from being callable by lightweight routines - Deduplicate sentinel logic by reusing handle_text_response() in the no-tools path - Filter tool definitions sent to LLM to only include callable tools, avoiding wasted tokens on tools that would be rejected
…earai#257) (nearai#730) * Rebase onto staging * fix(routines): prevent autonomy-escalation in lightweight routines - Add ROUTINE_TOOL_DENYLIST to block routine_create/update/delete/fire and restart from being callable by lightweight routines - Deduplicate sentinel logic by reusing handle_text_response() in the no-tools path - Filter tool definitions sent to LLM to only include callable tools, avoiding wasted tokens on tools that would be rejected
Closes #257
Allow lightweight routines to optionally invoke tools via a mini agentic
loop (LLM → tool calls → feed results → repeat), unlocking use cases
like periodic memory searches, status checks with HTTP fetches, and
data-driven alerting — all without the overhead of a full scheduler job.
Safety constraints:
Backward compatible: new fields use #[serde(default)], existing routines
parse and behave identically (use_tools defaults to false).
Adds 10 regression tests covering approval filtering, timeout enforcement,
tool failure handling, and max_tool_rounds boundary clamping.