Native Tool Call #6744

NaccOll · 2025-08-06T09:21:04Z

Issue

What I Do

Without major refactoring of the existing workflow, this PR converts tool calls based on XML-formatted text prompts into JSON-based native tool calls.

Implementation Approach

Added a toggle to the Provider settings page that lets users enable or disable tool calls.
Designed a compatible tool call interface and migrated tools from text+XML prompts to tool call schema objects.
When tool calls are enabled, migrated tools drop their original descriptions and examples from the generated prompt.
Added unified post-call processing that transforms the returned standard JSON into XML, so it plugs into the existing tool handling unchanged.
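The post-call processing step described above can be illustrated with a minimal sketch. It is not the code from this PR: the `ParsedToolCall` type, the helper names, and the `path`/`content` parameters of `write_to_file` are assumptions made for illustration, and strings are emitted without escaping on the assumption that the existing XML parser expects raw text.

```typescript
// Hypothetical shape of a parsed native tool call; not the PR's actual types.
interface ParsedToolCall {
  name: string
  arguments: Record<string, unknown>
}

// Render one argument as XML. Arrays repeat the element, objects nest,
// and everything else is written verbatim (no escaping, by assumption).
function valueToXml(key: string, value: unknown): string {
  if (Array.isArray(value)) {
    return value.map((item) => valueToXml(key, item)).join("\n")
  }
  if (value !== null && typeof value === "object") {
    const inner = Object.entries(value as Record<string, unknown>)
      .map(([k, v]) => valueToXml(k, v))
      .join("\n")
    return `<${key}>\n${inner}\n</${key}>`
  }
  return `<${key}>${String(value)}</${key}>`
}

// Wrap all arguments in an element named after the tool.
function toolCallToXml(call: ParsedToolCall): string {
  const body = Object.entries(call.arguments)
    .map(([k, v]) => valueToXml(k, v))
    .join("\n")
  return `<${call.name}>\n${body}\n</${call.name}>`
}

// Example: parameter names here are assumed, not taken from the PR.
const xml = toolCallToXml({
  name: "write_to_file",
  arguments: { path: "src/app.ts", content: "export {}" },
})
console.log(xml)
```

Because the output has the same shape as the XML the model would otherwise have written itself, the downstream tool handling does not need to know which path produced it.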

Tools

Provider

Benefits

When tool information is returned via tool_call, format accuracy improves significantly—even 4B models can generate precisely formatted apply_diff outputs.
Prompt length is reduced.

Problem

Legacy Issues

Accuracy depends on the model actually emitting a tool call; models may fail to grasp the intent and return plain text instead.
~~Not all tools have been migrated yet, which may lead the model to mimic behavior. As XML content accumulates in the chat, incorrect XML outputs may occur.~~ (I abandoned the plan to refine parameters for tighter control, such as splitting the content of apply_diff into search_str and replace_str. Tool call parameters are now fully aligned with the original XML descriptions, which avoids parameter errors when the LLM imitates the tool parameters and the large amount of XML in user input while producing its output.) (This has been resolved through standard tool calls.)
Format accuracy does not guarantee content accuracy. The model can invoke a tool correctly and still get the content wrong: for example, apply_diff requires content-matching replacements, and the original content the model generates may still be inaccurate.

New Problem

Streaming processing is disrupted. Results are still returned via streaming, but the actual tool invocation requires converting JSON to XML first. For large file modifications, this causes delays during which users see no feedback. (Resolved: streaming conversion of JSON to XML is now supported.)
~~Streaming state changes after JSON completion. Due to the current duplicate call detection, the model's retry limit must be adjusted to two or more attempts; otherwise, errors will occur.~~ (See Errors and Repeated Use Limitations #6834)
Each tool now needs both an XML prompt and a tool call description object. I originally wanted to derive the XML description from the tool call object, but apply_diff and read_file are more complicated, and I still need to work out how to do that. (For now, the same function returns both the tool's schema and its XML prompt.)
When you pass the tools parameter, OpenRouter promises to route your request to providers that support tool calls. However, some providers' tool call implementations are flawed. For instance, DeepSeek-V3-0324 on Chutes handles tool calls normally, but Qwen3-Coder's tool calls may fail. If you select Alibaba as the provider for Qwen3-Coder, tool calls work fine, though at several times the cost. This complicates tool call support for models on OpenRouter and leaves users to figure out the quirks themselves.
I have no environment in which to test browser_action.
Context compression fails if the last assistant message is a tool use when compression runs.
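The idea of returning a tool's schema and its XML prompt from the same function could look roughly like the sketch below. All type and function names (`ToolDefinition`, `describeTool`, and so on) are hypothetical, the `list_files` parameters are assumed for illustration, and the schema shape follows the `tools` array format of OpenAI-compatible chat APIs rather than this PR's actual ToolRegistry.

```typescript
// Hypothetical single source of truth for one tool.
interface ToolParam {
  type: "string" | "number" | "boolean"
  description: string
  required: boolean
}

interface ToolDefinition {
  name: string
  description: string
  params: Record<string, ToolParam>
}

// Build a function-tool schema in the shape used by OpenAI-compatible APIs.
function toToolSchema(def: ToolDefinition) {
  const properties: Record<string, { type: string; description: string }> = {}
  const required: string[] = []
  for (const [name, p] of Object.entries(def.params)) {
    properties[name] = { type: p.type, description: p.description }
    if (p.required) required.push(name)
  }
  return {
    type: "function" as const,
    function: {
      name: def.name,
      description: def.description,
      parameters: { type: "object" as const, properties, required },
    },
  }
}

// Build the text+XML prompt section from the same definition.
function toXmlPrompt(def: ToolDefinition): string {
  const paramLines = Object.entries(def.params).map(
    ([name, p]) => `- ${name}: (${p.required ? "required" : "optional"}) ${p.description}`,
  )
  const usageLines = Object.keys(def.params).map((name) => `<${name}>...</${name}>`)
  return [
    `## ${def.name}`,
    `Description: ${def.description}`,
    "Parameters:",
    ...paramLines,
    "Usage:",
    `<${def.name}>`,
    ...usageLines,
    `</${def.name}>`,
  ].join("\n")
}

// One function returns both representations, so they cannot drift apart.
function describeTool(def: ToolDefinition) {
  return { schema: toToolSchema(def), xmlPrompt: toXmlPrompt(def) }
}

// Example definition; the parameter set is an assumption.
const listFiles: ToolDefinition = {
  name: "list_files",
  description: "List files in a directory.",
  params: {
    path: { type: "string", description: "Directory to list.", required: true },
    recursive: { type: "boolean", description: "Recurse into subdirectories.", required: false },
  },
}
const described = describeTool(listFiles)
```

This flat mapping only covers simple tools. Tools with nested or repeated arguments, such as apply_diff and read_file, would need a richer definition format.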

Screenshots / Videos

Important

Introduces JSON-based tool calls with a toggle in settings, updating API handlers, tool schemas, and UI components to support this new functionality.

Behavior:
- Added toolCallEnabled toggle in provider-settings.ts to enable/disable tool calls.
- Converted tool calls from XML to JSON format in base-provider.ts, lm-studio.ts, openai.ts, and openrouter.ts.
- Unified post-call processing to convert JSON back to XML.
Tools:
- Added JSON-based tool call support for tools like apply_diff, insert_content, list_code_definition_names, list_files, read_file, search_and_replace, search_files, write_to_file, and others.
- Implemented ToolRegistry in tool-registry.ts to manage tool schemas.
UI:
- Added ToolCallSettingsControl.tsx to manage tool call settings in the UI.
- Updated i18n files for new tool call settings descriptions.
Misc:
- Added StreamingToolCallProcessor in tool-call-helper.ts for streaming JSON to XML conversion.
- Updated Task.ts to handle tool call streaming and processing.
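As a rough illustration of what handling streamed tool calls involves, the sketch below buffers argument fragments per tool call index until the stream ends. It assumes the delta shape used by OpenAI-compatible streaming APIs (`index`, optional `id`, and partial `function.name`/`function.arguments` strings). The class name `ToolCallAccumulator` is hypothetical, and unlike the StreamingToolCallProcessor described above it does not convert to XML incrementally.

```typescript
// Subset of a streamed tool call delta as sent by OpenAI-compatible APIs.
interface ToolCallDelta {
  index: number
  id?: string
  function?: { name?: string; arguments?: string }
}

interface BufferedToolCall {
  id: string
  name: string
  argumentsJson: string
}

// Hypothetical helper: collects fragments keyed by the delta's index.
class ToolCallAccumulator {
  private calls = new Map<number, BufferedToolCall>()

  push(delta: ToolCallDelta): void {
    let call = this.calls.get(delta.index)
    if (!call) {
      call = { id: "", name: "", argumentsJson: "" }
      this.calls.set(delta.index, call)
    }
    if (delta.id) call.id = delta.id
    if (delta.function?.name) call.name += delta.function.name
    if (delta.function?.arguments) call.argumentsJson += delta.function.arguments
  }

  // Returns calls in index order; JSON.parse throws if a buffer is incomplete.
  finish(): { id: string; name: string; arguments: unknown }[] {
    return Array.from(this.calls.entries())
      .sort(([a], [b]) => a - b)
      .map(([, c]) => ({ id: c.id, name: c.name, arguments: JSON.parse(c.argumentsJson) }))
  }
}

// Two tool calls whose fragments arrive interleaved in one stream.
const acc = new ToolCallAccumulator()
acc.push({ index: 0, id: "call_a", function: { name: "read_file", arguments: '{"pa' } })
acc.push({ index: 1, id: "call_b", function: { name: "list_files", arguments: "" } })
acc.push({ index: 0, function: { arguments: 'th":"a.ts"}' } })
acc.push({ index: 1, function: { arguments: '{"path":"."}' } })
const calls = acc.finish()
```

Keying by `index` rather than by arrival order is what keeps several tool calls in one stream from being merged into a single argument string.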

This description was generated for b1984b8. You can customize this summary. It will automatically update as commits are pushed.

src/core/config/ProviderSettingsManager.ts

src/core/tools/schemas/read-file-schema.ts

webview-ui/src/i18n/locales/en/settings.json

roomote

Thank you for your contribution! I've reviewed the changes and found some issues that need attention. This PR implements a valuable feature for native tool calling, but there are some critical issues that should be addressed before merging.

src/core/tools/schemas/tool-call-response.ts

src/core/task/tool-call-helper.ts

src/core/prompts/tools/index.ts

src/core/tools/schemas/tool-call-response.ts

src/core/task/tool-call-helper.ts

webview-ui/src/components/settings/ToolCallSettingsControl.tsx

NaccOll · 2025-08-10T08:19:35Z

@roomote-agent
Review this PR again

roomote · 2025-08-10T08:21:15Z

I'm starting my review of this PR now. I'll examine the Native Tool Call implementation and provide feedback shortly.

roomote · 2025-08-10T08:24:41Z

Review Complete ✅

I've completed my review of the Native Tool Call implementation. Here's my assessment:

✅ Strengths

Well-structured implementation: The conversion from XML to JSON tool calls is cleanly implemented with a proper streaming processor
Good test coverage: The StreamingToolCallProcessor has comprehensive tests covering various scenarios including edge cases
Backward compatibility: The toggle feature allows users to enable/disable tool calls, maintaining compatibility with existing workflows
Provider support: Successfully integrated with OpenRouter, OpenAI Compatible, and DeepSeek providers

📋 Key Components Reviewed

✅ StreamingToolCallProcessor - Handles JSON to XML conversion with proper state management
✅ ToolRegistry - Manages tool schemas and provides conversion functions
✅ Tool schemas for supported tools (apply_diff, read_file, write_to_file, etc.)
✅ UI integration with ToolCallSettingsControl component
✅ Localization support across all languages

🔍 Observations

Incomplete tool support: ask_followup_question and browser_action are not yet registered in the ToolRegistry (lines 35, 37 commented out)
Special handling for apply_diff: The processor has custom logic for handling apply_diff content transformation (lines 409-430 in tool-call-helper.ts)
Migration strategy: Proper migration added to ensure existing installations get the new setting defaulted to false

✅ Testing Results

All tool-call-helper tests pass successfully (13/13)
The streaming processor correctly handles:
- Simple function calls
- Incremental streaming
- Nested objects and arrays
- Multiple tool calls
- Edge cases and error scenarios

💡 Recommendations

Consider completing support for ask_followup_question and browser_action tools
Document the provider-specific quirks mentioned in the PR description for users
Consider adding integration tests for the full tool call flow with actual API responses

✅ Verdict

The implementation is solid and ready for use. The streaming JSON to XML converter is well-designed, and the feature toggle provides a safe rollout path. The code quality is good with proper error handling and test coverage.

Great work on this feature! 🎉

hannesrudolph · 2025-08-21T19:06:22Z

Update on this (From @cte): there's a bug in the tool calling that causes a bunch of the evals to fail, so not quite ready to evaluate. Here's what the errors look like:

An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_F8hwFSLeOwtYsiX31wdZr9lP

NaccOll · 2025-08-22T09:57:17Z

> Update on this (From @cte): there's a bug in the tool calling that causes a bunch of the evals to fail, so not quite ready to evaluate. Here's what the errors look like:
>
> An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_F8hwFSLeOwtYsiX31wdZr9lP

This happens when the model returns data for multiple tools in a single stream, a case I had not handled. The issue has now been fixed.
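The constraint behind the error message can be checked before a request is sent. The sketch below is an illustration under assumptions, not the fix applied in this PR: `ChatMessage` is a simplified version of the OpenAI-style message shape, and `findUnansweredToolCalls` is a hypothetical helper name.

```typescript
// Simplified OpenAI-style chat messages; only the fields needed for the check.
type ChatMessage =
  | { role: "assistant"; content: string | null; tool_calls?: { id: string }[] }
  | { role: "tool"; tool_call_id: string; content: string }
  | { role: "user" | "system"; content: string }

// Returns the ids of tool calls that have no tool message directly following
// the assistant message that issued them.
function findUnansweredToolCalls(messages: ChatMessage[]): string[] {
  const missing: string[] = []
  for (let i = 0; i < messages.length; i++) {
    const msg = messages[i]
    if (msg.role !== "assistant" || !msg.tool_calls?.length) continue
    // Collect the run of tool messages that immediately follows.
    const answered = new Set<string>()
    for (let j = i + 1; j < messages.length; j++) {
      const next = messages[j]
      if (next.role !== "tool") break
      answered.add(next.tool_call_id)
    }
    for (const call of msg.tool_calls) {
      if (!answered.has(call.id)) missing.push(call.id)
    }
  }
  return missing
}

// An assistant turn with two tool calls where only the first was answered.
const history: ChatMessage[] = [
  { role: "user", content: "Read both files." },
  { role: "assistant", content: null, tool_calls: [{ id: "call_a" }, { id: "call_b" }] },
  { role: "tool", tool_call_id: "call_a", content: "contents of a.ts" },
]
const unanswered = findUnansweredToolCalls(history)
```

When several tool calls arrive in one assistant message, each id needs its own tool message, even if a tool was skipped or failed, in which case the message can carry an error description.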

Due to network issues, I have not yet run the evaluation locally.

I plan to run the evaluation locally, covering models such as GPT-5, GPT-4.1, Gemini-2.5-Pro, Gemini-2.5-Flash, and Deepseek-Chat. Once completed, I will refine the corresponding tests and request a review.

Regarding Claude, I have not yet found a solution to address the mixed text returned by OpenRouter, so this issue may be put on hold for now.

Mushoz · 2025-09-04T11:22:50Z

Does this PR also allow models such as gpt-oss to do tool calls within their reasoning? It's supported by the model, but the client needs to make sure the reasoning content is passed back to the backend when returning the result of the tool call. Right now the model really wastes a lot of reasoning because it basically does this:

Long reasoning of how it can implement a certain request, weighing pros and cons of each implementation. But to decide on the final implementation I will need to read file X.
Reads file X
Restarts the entire reasoning process with no knowledge of what it already reasoned about.

If the model could incorporate the tool call within its reasoning, it could quickly check something and then answer immediately, making the model much more efficient and potentially more intelligent as well (since it will be more coherent).

NaccOll · 2025-09-04T13:16:05Z

> Does this PR also allow models such as gpt-oss to do tool calls within their reasoning? It's supported by the model, but the client needs to make sure the reasoning content is passed back to the backend when returning the result of the tool call. Right now the model really wastes a lot of reasoning because it basically does this:
>
> Long reasoning of how it can implement a certain request, weighing pros and cons of each implementation. But to decide on the final implementation I will need to read file X.
>
> Reads file X
>
> Restarts the entire reasoning process with no knowledge of what it already reasoned about.
>
> If the model could incorporate the tool call within its reasoning, it could quickly check something and then answer immediately, making the model much more efficient and potentially more intelligent as well (since it will be more coherent).

Probably not. I haven't tested gpt-oss, but based on my testing of gemini-2.5-pro, gpt-5, and claude-4-sonnet, native tool calls only fix XML format errors when calling the existing XML tools; they do not solve any other problem.

The related issue initially proposed native tool calls for top-tier models, but I don't think that is very meaningful. A top-tier model can fully understand the XML usage examples in the system prompt and return the specified XML text to call the tool.

That is why I created this PR: it aims to improve tool call accuracy for models just below the top tier. It should also help gpt-oss with tool calls, but it does not address the problem of wasted, lengthy reasoning chains.

- Implemented applyDiffToolLegacy to handle tool calls with diff content.
- Enhanced attemptCompletionTool to push tool result messages when tool calls are enabled.
- Updated multiApplyDiffTool to process diffs with search and replace functionality.
- Introduced ToolCallSettingsControl component for managing tool call settings in the UI.
- Added localization for tool call settings in multiple languages.
- Updated settings view to conditionally render tool call options based on provider support.
- Refactored utility functions to determine tool call support based on API provider.
- Enhanced error handling and user feedback for tool call operations.

NaccOll · 2025-09-13T02:11:55Z

Due to network issues, I'm always having trouble getting stable results when running evals, especially for JavaScript and Java. Their dependency downloads experience significant network fluctuations, which frequently causes their tests to fail.

Until I can get an accurate evals report and further troubleshoot the issue through logs, I've decided to close this PR.

NaccOll requested review from mrubens, cte and jr as code owners August 6, 2025 09:21

github-project-automation bot added this to Roo Code Roadmap and Roo Code Roadmap Aug 6, 2025

github-project-automation bot moved this to Triage in Roo Code Roadmap Aug 6, 2025

github-project-automation bot moved this to New in Roo Code Roadmap Aug 6, 2025

NaccOll marked this pull request as draft August 6, 2025 09:21

dosubot bot added the size:XXL (This PR changes 1000+ lines, ignoring generated files.) and enhancement (New feature or request) labels Aug 6, 2025

ellipsis-dev bot reviewed Aug 6, 2025

roomote bot reviewed Aug 6, 2025

hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Aug 6, 2025

NaccOll force-pushed the feature-tool-use branch from be6c66f to 2c0c3b3 Compare August 6, 2025 15:20

daniel-lxs moved this from Triage to PR [Draft / In Progress] in Roo Code Roadmap Aug 7, 2025

hannesrudolph added PR - Draft / In Progress and removed Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) labels Aug 7, 2025

NaccOll force-pushed the feature-tool-use branch 4 times, most recently from e711649 to a5df80c Compare August 10, 2025 06:51

NaccOll force-pushed the feature-tool-use branch 6 times, most recently from 3dc988a to f7eab93 Compare August 15, 2025 03:13

hannesrudolph added PR - Needs Preliminary Review and removed PR - Draft / In Progress labels Aug 21, 2025

hannesrudolph moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Aug 21, 2025

daniel-lxs marked this pull request as draft August 21, 2025 19:05

hannesrudolph added PR - Draft / In Progress and removed PR - Needs Preliminary Review labels Aug 21, 2025

NaccOll force-pushed the feature-tool-use branch 2 times, most recently from 7263f50 to fcfed58 Compare August 28, 2025 07:05

daniel-lxs mentioned this pull request Sep 1, 2025

Feature: Add Structured Output and Code Execution Checkboxes for Gemini API Provider #7533

Closed

This was referenced Sep 2, 2025

RFC: Native Tool Use for Top-Tier AI Models #4047

Open

currently roo code does not have good tool-calling support of openai's new open mode gpt-oss #6814

Open

NaccOll force-pushed the feature-tool-use branch from 6724388 to 4009a4a Compare September 4, 2025 02:53

NaccOll force-pushed the feature-tool-use branch 3 times, most recently from f12f70b to 0cd2025 Compare September 9, 2025 10:01

NaccOll force-pushed the feature-tool-use branch from 30e0eab to aeac707 Compare September 11, 2025 06:38

NaccOll added 5 commits September 11, 2025 14:39

feat: Enhance update_todo_list schema description and improve tool name validation

aeac707

feat: Update directory path description and handle empty path in listFilesTool and searchFilesTool

83c0633

feat: remove args param to avoid claude tool call format error

4476607

feat: update XML parsing logic to handle multi-file format without toolUseId

bf255fb

feat: enhance documentation for ask_followup and update_todo_list schemas with usage examples and guidelines

62c66b1

NaccOll closed this Sep 13, 2025

github-project-automation bot moved this from New to Done in Roo Code Roadmap Sep 13, 2025

github-project-automation bot moved this from PR [Draft / In Progress] to Done in Roo Code Roadmap Sep 13, 2025
