Skip to content

feat(routines): enable tool access in lightweight routine execution (#257)#730

Merged
zmanian merged 2 commits intonearai:stagingfrom
reidliu41:feat/257-routine-tool-access
Mar 12, 2026
Merged

feat(routines): enable tool access in lightweight routine execution (#257)#730
zmanian merged 2 commits intonearai:stagingfrom
reidliu41:feat/257-routine-tool-access

Conversation

@reidliu41
Copy link
Copy Markdown
Contributor

Closes #257
Allow lightweight routines to optionally invoke tools via a mini agentic
loop (LLM → tool calls → feed results → repeat), unlocking use cases
like periodic memory searches, status checks with HTTP fetches, and
data-driven alerting — all without the overhead of a full scheduler job.

Safety constraints:

  • Only tools with ApprovalRequirement::Never can execute (no human-in-the-loop tools)
  • Tool parameters validated through SafetyLayer before execution
  • Per-tool timeout enforced (mirrors dispatcher.rs guarded path)
  • Tool output sanitized before feeding back to LLM
  • max_tool_rounds clamped to [1, 20] to prevent runaway loops and cost explosion

Backward compatible: new fields use #[serde(default)], existing routines
parse and behave identically (use_tools defaults to false).

Adds 10 regression tests covering approval filtering, timeout enforcement,
tool failure handling, and max_tool_rounds boundary clamping.

@github-actions github-actions Bot added scope: agent Agent core (agent loop, router, scheduler) scope: tool/builtin Built-in tools size: XL 500+ changed lines risk: medium Business logic, config, or moderate-risk modules contributor: regular 2-5 merged PRs labels Mar 8, 2026
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the capabilities of lightweight routines by integrating tool access. This allows routines to perform more complex, data-driven actions autonomously, such as periodic memory searches or status checks, by leveraging a controlled agentic loop. The implementation prioritizes safety and stability, ensuring that tool usage is strictly governed by predefined constraints and does not introduce new risks or unexpected costs, while maintaining full compatibility with existing routine configurations.

Highlights

  • Tool Access for Lightweight Routines: Enabled lightweight routines to optionally invoke tools through a mini-agentic loop (LLM → tool calls → feed results → repeat), allowing for more dynamic and powerful automated tasks without the overhead of a full scheduler job.
  • Safety Constraints Implemented: Introduced several safety measures for tool execution within routines, including restricting tools to those with ApprovalRequirement::Never, validating tool parameters via a SafetyLayer, enforcing per-tool timeouts, sanitizing tool output, and clamping max_tool_rounds to prevent runaway loops and cost explosions.
  • Backward Compatibility Ensured: New fields (use_tools, max_tool_rounds) were added with #[serde(default)], ensuring existing routines parse and behave identically, with use_tools defaulting to false.
  • Comprehensive Regression Tests: Added 10 new regression tests covering critical aspects such as approval filtering, timeout enforcement, tool failure handling, and boundary clamping for max_tool_rounds.
Changelog
  • src/agent/agent_loop.rs
    • Updated RoutineEngine initialization to include tool registry and safety layer, enabling tool access for lightweight routines.
  • src/agent/routine.rs
    • Modified RoutineAction::Lightweight enum variant to include use_tools (boolean) and max_tool_rounds (u32) fields.
    • Added default_max_tool_rounds function, MAX_TOOL_ROUNDS_LIMIT constant, and clamp_max_tool_rounds function for tool round management.
    • Updated RoutineAction::from_db and to_config_json implementations to correctly handle the new use_tools and max_tool_rounds fields.
    • Imported MAX_TOOL_ROUNDS_LIMIT in the test module.
    • Added new test cases for Lightweight action roundtrip serialization/deserialization, backward compatibility, and max_tool_rounds clamping behavior.
  • src/agent/routine_engine.rs
    • Imported necessary types for tool and safety layer integration, including ToolCompletionRequest, ToolCompletionResponse, ToolDefinition, SafetyLayer, ToolRegistry, and ApprovalRequirement.
    • Extended RoutineEngine and EngineContext structs with optional tools (Arc) and safety (Arc) fields.
    • Added a with_tools builder method to RoutineEngine for setting the tool registry and safety layer.
    • Modified execute_routine and execute_lightweight functions to accept and utilize the new use_tools and max_tool_rounds parameters.
    • Introduced safe_tool_definitions to filter tools based on ApprovalRequirement::Never for autonomous execution.
    • Implemented execute_lightweight_with_tools to manage the mini-agentic loop (LLM -> tool calls -> feed results) for routines with tool access.
    • Added execute_routine_tool to handle the guarded execution of individual tool calls, including parameter validation, timeout enforcement, and output sanitization.
    • Added extensive test helpers for mock tools (SafeTool, ApprovalTool, SlowTool, FailingTool) and new tests to verify tool filtering, timeout, failure handling, and unknown tool scenarios.
  • src/testing.rs
    • Updated make_test_routine and make_test_routine_with_guardrails functions to include default use_tools and max_tool_rounds values in Lightweight routine actions for consistency in tests.
  • src/tools/builtin/routine.rs
    • Updated the parameters_schema for RoutineCreateTool to expose use_tools and max_tool_rounds as configurable options.
    • Modified the execute method of RoutineCreateTool to parse and apply use_tools and max_tool_rounds when creating a Lightweight routine, including clamping max_tool_rounds.
  • tests/e2e_routine_heartbeat.rs
    • Updated the make_test_routine function to include default use_tools and max_tool_rounds values in Lightweight routine actions for end-to-end tests.
Activity
  • No human activity (comments, reviews) was provided in the context for this pull request.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant and well-implemented feature: enabling tool access for lightweight routines. This is achieved by adding a mini-agentic loop that allows the LLM to call tools, process results, and repeat for a limited number of rounds. However, the security audit identified two medium-to-high severity vulnerabilities related to broken access control. Specifically, the use of a shared workspace instance in the routine engine allows for potential cross-user data leakage, violating the rule that tools interacting with user-owned resources must verify user IDs. The lack of validation for notification targets in routine creation also allows for unauthorized messaging. These issues should be addressed by ensuring proper user isolation in the workspace and validating notification permissions. Additionally, a minor suggestion to improve maintainability by replacing a magic number with a shared constant has been noted.

Comment thread src/agent/agent_loop.rs Outdated
rt_config.clone(),
Arc::clone(store),
self.llm().clone(),
Arc::clone(workspace),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The RoutineEngine is initialized with a single, shared Workspace instance. This workspace has a fixed user_id (likely the one who started the agent or a default one). When routines are executed for different users, they all access the same workspace instance. This allows a user to read or influence files belonging to the workspace's owner by creating routines with specific context_paths or using memory tools, bypassing user isolation. This is a significant Broken Access Control (IDOR) vulnerability in a multi-user environment.

References
  1. Tools that interact with user-owned resources, such as sandbox jobs, must verify that the authenticated user ID in the tool context matches the resource owner's ID (e.g., by using a ContextManager) before performing any read or write operations to prevent unauthorized cross-user access.

Comment thread src/tools/builtin/routine.rs Outdated
let max_tool_rounds = params
.get("max_tool_rounds")
.and_then(|v| v.as_u64())
.map(|v| v.clamp(1, 20) as u32)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To avoid using a magic number and to ensure consistency with other parts of the codebase, it's better to use the MAX_TOOL_ROUNDS_LIMIT constant defined in src/agent/routine.rs. This will keep the upper limit for tool rounds synchronized.

Suggested change
.map(|v| v.clamp(1, 20) as u32)
.map(|v| v.clamp(1, crate::agent::routine::MAX_TOOL_ROUNDS_LIMIT as u64) as u32)

Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design is sound -- optional tool access via a mini agentic loop with defense-in-depth (double-checking approval at execution time). Good test coverage (10 tests). Backwards compatible (use_tools: false default).

Issues to address:

1. Bug: fire_manual() and spawn_fire() don't propagate tools/safety (must fix). The PR adds tools and safety to EngineContext but only updates the cron/event trigger paths. Manual triggers via fire_manual() and spawn_fire() construct EngineContext without the new fields, so use_tools: true silently falls back to text-only for manual triggers.

2. Security: RestartTool passes the Never approval filter (must fix). The filter tool.requires_approval(&json!({})) == ApprovalRequirement::Never lets through RestartTool, which uses the default Never approval. A lightweight routine with use_tools: true could restart the agent autonomously. Also consider memory_write -- routines could write arbitrary content to workspace memory, poisoning future conversations.

Recommended: either add an explicit tool denylist for routine tool access, or change RestartTool to require approval (UnlessAutoApproved).

3. Magic number duplication (minor). The clamping in routine.rs tool creation hardcodes 20 instead of using the MAX_TOOL_ROUNDS_LIMIT constant.

Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress -- all three previous review items addressed (fire_manual/spawn_fire propagation, denylist, magic number). Defense-in-depth approach is solid (approval filtering + denylist at both definition and execution time). 10+ tests.

One remaining issue:

routine_create/routine_update/routine_delete not on denylist -- A lightweight routine with use_tools: true could self-replicate (create copies), modify its own prompt or triggers, or delete other routines. These are autonomy-escalating actions. Add routine_create, routine_update, routine_delete to ROUTINE_TOOL_DENYLIST.

Non-blocking:

  • Duplicated sentinel logic in execute_lightweight should call interpret_text_response instead
  • _is_error variable computed but never used

@henrypark133 henrypark133 changed the base branch from main to staging March 10, 2026 02:19
@reidliu41 reidliu41 force-pushed the feat/257-routine-tool-access branch from e3b6e4a to 9aaeb67 Compare March 11, 2026 03:00
@github-actions github-actions Bot added size: M 50-199 changed lines contributor: experienced 6-19 merged PRs and removed size: XL 500+ changed lines contributor: regular 2-5 merged PRs labels Mar 11, 2026
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: Third pass

Good progress on most items. Review 1's fire_manual/spawn_fire propagation issue is fixed, the magic number is fixed, the non-blocking _is_error variable is cleaned up. The overall architecture is solid -- per-routine use_tools flag, clamped max_tool_rounds, double-gated with global lightweight_tools_enabled, tool output sanitization through SafetyLayer. The handle_text_response refactor is a nice improvement.

Blocking: routine self-management tools still pass the Never filter

The approval-based filter in execute_routine_tool (line ~1012) blocks UnlessAutoApproved and Always tools but allows all ApprovalRequirement::Never tools through. These Never-approval tools are accessible to lightweight routines and present autonomy-escalation risks:

  • routine_create -- a routine can self-replicate by creating copies of itself
  • routine_update -- a routine can modify its own prompt, triggers, or those of other routines
  • routine_delete -- a routine can delete other routines
  • restart (RestartTool) -- a routine can restart the agent autonomously

This was the primary feedback from review 2 (March 9). The fix is straightforward: add a ROUTINE_TOOL_DENYLIST constant and check it in execute_routine_tool alongside the approval filter. Something like:

const ROUTINE_TOOL_DENYLIST: &[&str] = &[
    "routine_create", "routine_update", "routine_delete",
    "routine_fire", "restart",
];

Then in execute_routine_tool, before the approval check:

if ROUTINE_TOOL_DENYLIST.contains(&tc.name.as_str()) {
    return Err(format!("Tool '{}' is not available in lightweight routines", tc.name).into());
}

Non-blocking

  1. Sentinel logic still duplicated between execute_lightweight_no_tools (line ~815) and handle_text_response. The _no_tools variant could call handle_text_response instead of reimplementing the same trim/empty/sentinel checks. Low priority but reduces maintenance surface.

  2. Tool definitions sent unfiltered to LLM -- ctx.tools.tool_definitions().await (line ~928) sends ALL registered tools to the LLM, including denylisted and approval-gated ones. The LLM will attempt to call them and get errors. Consider filtering the tool definitions to only include tools that would pass both the denylist and approval checks. This improves token efficiency and reduces wasted LLM turns.

  - Add ROUTINE_TOOL_DENYLIST to block routine_create/update/delete/fire
    and restart from being callable by lightweight routines
  - Deduplicate sentinel logic by reusing handle_text_response() in the
    no-tools path
  - Filter tool definitions sent to LLM to only include callable tools,
    avoiding wasted tokens on tools that would be rejected
@reidliu41 reidliu41 force-pushed the feat/257-routine-tool-access branch from 9aaeb67 to 17830d0 Compare March 11, 2026 05:07
@github-actions github-actions Bot added size: M 50-199 changed lines scope: tool Tool infrastructure size: L 200-499 changed lines and removed size: M 50-199 changed lines labels Mar 11, 2026
@reidliu41 reidliu41 requested a review from zmanian March 11, 2026 23:46
Copy link
Copy Markdown
Collaborator

@zmanian zmanian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, Reid. The design is solid -- defense-in-depth with both LLM-side filtering (tool_definitions_excluding) and runtime enforcement (execute_routine_tool denylist + approval check). A few observations:

Security -- looks good:

  • ROUTINE_TOOL_DENYLIST correctly blocks self-management tools (routine_create/update/delete/fire, restart) preventing autonomy escalation
  • ApprovalRequirement::Never filter in tool_definitions_excluding ensures shell and other gated tools never reach the LLM
  • Runtime denylist check in execute_routine_tool provides proper defense-in-depth
  • Safety layer validation and output sanitization are preserved

Backward compatibility -- clean:

  • serde(default) on new fields, from_db parsing with defaults, existing routines unaffected

Minor note (not blocking):

  • max_tool_rounds accepts up to 20 (MAX_TOOL_ROUNDS_LIMIT) but execute_lightweight_with_tools hard-caps iterations at .min(5) (pre-existing). This means values 6-20 are silently equivalent to 5. Consider either documenting this or raising the hard cap. Not introduced by this PR so not blocking.

Test coverage: 10 new tests covering clamping bounds, backward compat, denylist validation. Good.

@ironclaw-ci ironclaw-ci Bot mentioned this pull request Mar 12, 2026
bkutasi pushed a commit to bkutasi/ironclaw that referenced this pull request Mar 28, 2026
…earai#257) (nearai#730)

* Rebase onto staging

* fix(routines): prevent autonomy-escalation in lightweight routines

  - Add ROUTINE_TOOL_DENYLIST to block routine_create/update/delete/fire
    and restart from being callable by lightweight routines
  - Deduplicate sentinel logic by reusing handle_text_response() in the
    no-tools path
  - Filter tool definitions sent to LLM to only include callable tools,
    avoiding wasted tokens on tools that would be rejected
drchirag1991 pushed a commit to drchirag1991/ironclaw that referenced this pull request Apr 8, 2026
…earai#257) (nearai#730)

* Rebase onto staging

* fix(routines): prevent autonomy-escalation in lightweight routines

  - Add ROUTINE_TOOL_DENYLIST to block routine_create/update/delete/fire
    and restart from being callable by lightweight routines
  - Deduplicate sentinel logic by reusing handle_text_response() in the
    no-tools path
  - Filter tool definitions sent to LLM to only include callable tools,
    avoiding wasted tokens on tools that would be rejected
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contributor: experienced 6-19 merged PRs risk: medium Business logic, config, or moderate-risk modules scope: agent Agent core (agent loop, router, scheduler) scope: tool/builtin Built-in tools scope: tool Tool infrastructure size: L 200-499 changed lines size: M 50-199 changed lines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: enable tool access in lightweight routine execution

2 participants