devstral tool parser for tool calling #3851

dtrawins · 2025-12-09T23:53:44Z

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

Unit tests added.
The documentation updated.
Change follows security best practices.

dkalinowski · 2025-12-10T13:04:34Z

src/llm/io_processing/devstral/tool_parser.hpp

+    DevstralToolParser() = delete;
+    DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
+        BaseOutputParser(tokenizer),
+        argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),


How can we be sure that [ARGS] / [TOOL_CALLS] are single tokens, treated as special tokens and not as strings (for example, split into [AR and GS])?

Those are special tokens.

That doesn't answer my question.

The Devstral parser sets requiresStreamingWithSpecialTokens() to true.

dkalinowski · 2025-12-10T13:04:48Z

src/llm/io_processing/devstral/tool_parser.hpp

+    DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
+        BaseOutputParser(tokenizer),
+        argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),
+        botTokenId(tokenizer.encode("[TOOL_CALLS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),


Validate that the input_ids token count is exactly 1?

Agreed. We could also do that for argsTokenId and fail if encoding yields more than one token in either case.
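A minimal sketch of that validation, assuming the tokenizer call is wrapped in a hypothetical `EncodeFn` that returns the token ids as a flat vector (in the parser this would come from the `input_ids` of `tokenizer.encode(tag, {{"add_special_tokens", false}})`). Instead of silently taking element `[0]`, it throws when a tag does not encode to exactly one token. `getSingleTokenId` is an illustrative name, not an existing helper.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for tokenizer.encode(...).input_ids, flattened to a vector.
using EncodeFn = std::function<std::vector<int64_t>(const std::string&)>;

// Returns the id of `tag` if it encodes to exactly one token, otherwise throws.
// A multi-token result (e.g. "[AR" + "GS]") means the tag is not a single special
// token in this tokenizer, so the parser could not match it by token id.
int64_t getSingleTokenId(const EncodeFn& encode, const std::string& tag) {
    const std::vector<int64_t> ids = encode(tag);
    if (ids.size() != 1) {
        throw std::runtime_error("Tag '" + tag + "' encodes to " +
                                 std::to_string(ids.size()) + " tokens, expected exactly 1");
    }
    return ids[0];
}
```

In the constructor this would be used as something like `argsTokenId(getSingleTokenId(encode, "[ARGS]"))`, so a tokenizer without these special tokens fails at parser creation rather than producing wrong ids.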

dkalinowski · 2025-12-10T13:11:50Z

src/llm/io_processing/devstral/tool_parser.cpp

+    ToolCall toolCall;
+    std::string tool_name = tokenizer.decode(tool_name_tokens, ov::AnyMap{ov::genai::skip_special_tokens(true)});
+    if (this->toolSchemas.find(tool_name) == this->toolSchemas.end()) {
+        SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool name '{}' not valid.", tool_name);


This is behavior we haven't implemented in other parsers. Is it really worth returning early? If we return a function name that is not part of the toolSchemas spec, we might be able to debug it in BFCL.

This is not in line with current behavior in other parsers. I wouldn't do that check if it's only for this parser. Either drop it or create a task for alignment of other parsers.

dkalinowski · 2025-12-10T13:15:19Z

src/llm/io_processing/devstral/tool_parser.cpp

+            if (pos == 0) {
+                this->streamContent.clear();
+            } else {
+                this->streamContent = this->streamContent.substr(pos + 13);  // "[TOOLS_CALLS]" length is 13


We should avoid magic numbers: if we change this->streamingParsingToolCallsStartTag to another value, this part will become incorrect.

You can look at how Adrian handles that in the Qwen3 Coder parser:

this->lastProcessedPosition = pos + Qwen3CoderToolParser::PARAMETER_END_TAG.length();
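A minimal sketch of deriving the offset from the tag itself, in the spirit of the Qwen3 Coder line above. The tag names follow one of the naming options suggested later in this review and are assumptions; `consumeThroughTag` is a hypothetical helper. Note that "[TOOL_CALLS]" is 12 characters long, so the hard-coded 13 (commented as the length of "[TOOLS_CALLS]") would skip one character too many.

```cpp
#include <cassert>
#include <string>

// Each tag is spelled once; every offset is derived from it.
const std::string toolCallsStartTag = "[TOOL_CALLS]";  // 12 characters
const std::string argsStartTag = "[ARGS]";             // 6 characters

// Drops everything up to and including the first occurrence of `tag`.
// Returns false and leaves `content` unchanged if the tag is not present.
bool consumeThroughTag(std::string& content, const std::string& tag) {
    const size_t pos = content.find(tag);
    if (pos == std::string::npos) {
        return false;
    }
    content.erase(0, pos + tag.length());
    return true;
}
```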

dkalinowski · 2025-12-10T13:16:25Z

src/llm/io_processing/devstral/tool_parser.cpp

+            ToolCall toolCall;
+            toolCall.arguments = arguments;
+            toolCall.name = this->toolName;
+            return sendFullDelta(toolCall);


Shouldn't we stream partial function argument chunks? If I understand correctly, you send the full delta at the end of generation.

We already accepted this approach for Qwen3 Coder, so I suppose we can have it in other parsers as well, unless there are specific requirements for "real" streaming.
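A minimal sketch of the "full delta at the end" behavior being discussed, assuming a simplified `FinishReason` enum as a stand-in for `ov::genai::GenerationFinishReason`. `ArgumentsBuffer` is a hypothetical name: argument text is buffered across chunks and emitted once when generation finishes. "Real" streaming would instead return each chunk's argument fragment as it arrives.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for ov::genai::GenerationFinishReason.
enum class FinishReason { NONE, STOP, LENGTH };

// Accumulates argument text and returns it only once generation has finished.
class ArgumentsBuffer {
public:
    std::optional<std::string> push(const std::string& chunk, FinishReason reason) {
        buffer_ += chunk;
        if (reason == FinishReason::NONE) {
            return std::nullopt;  // still generating: nothing is sent to the client
        }
        std::string full = std::move(buffer_);
        buffer_.clear();  // moved-from string is valid but unspecified; reset it
        return full;
    }

private:
    std::string buffer_;
};
```

The trade-off is latency: the client sees no argument text until the end, but it never has to reassemble partial JSON.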

dkalinowski · 2025-12-10T13:18:10Z

src/llm/io_processing/output_parser.hpp

    bool requiresStreamingWithSpecialTokens() const {
-        return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
+        return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) ||
               (toolParser && toolParser->requiresStreamingWithSpecialTokens());


@mzegla When I implemented it, I remember your comment about why it should really be && instead of ||. Do you remember the reason?

It was implemented this way to make sure we don't allow two parsers with different special-token approaches: they receive the same model output, so they must either both require special tokens or both not.
For this case, where we don't have a reasoning parser but want to require special tokens for the tool parser, I guess we should modify this function like this:

if (!reasoningParser) {
    return toolParser && toolParser->requiresStreamingWithSpecialTokens();
} else if (!toolParser) {
    return reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens();
} else {
    return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
           (toolParser && toolParser->requiresStreamingWithSpecialTokens());
}
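A self-contained sketch of the proposed rule, using a hypothetical `ParserStub` in place of the real reasoning and tool parser classes and raw pointers in place of the output parser's members. The redundant null checks in the last two branches are dropped because those pointers are already known to be non-null there. The checks below show that a lone Devstral-like tool parser now gets special tokens, while two disagreeing parsers still do not, as with the original `&&`.

```cpp
#include <cassert>

// Minimal stand-in for a reasoning or tool parser; only the flag matters here.
struct ParserStub {
    bool needsSpecialTokens;
    bool requiresStreamingWithSpecialTokens() const { return needsSpecialTokens; }
};

// A lone parser decides by itself. When both parsers are present, special tokens
// are requested only if both want them (the existing && behavior).
bool requiresStreamingWithSpecialTokens(const ParserStub* reasoningParser,
                                        const ParserStub* toolParser) {
    if (!reasoningParser) {
        return toolParser && toolParser->requiresStreamingWithSpecialTokens();
    }
    if (!toolParser) {
        return reasoningParser->requiresStreamingWithSpecialTokens();
    }
    return reasoningParser->requiresStreamingWithSpecialTokens() &&
           toolParser->requiresStreamingWithSpecialTokens();
}
```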

dkalinowski · 2025-12-10T13:19:19Z

src/test/llm/output_parsers/devstral_output_parser_test.cpp

+}
+
+TEST_F(DevstralOutputParserTest, ParseToolCallOutputWithSingleToolCall_MissingEndTag) {
+    std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}";


Can you add a test for scenarios with whitespace between the tags? I've seen other models often put spaces or newlines before/after the function name.
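A sketch of the behavior such a test would pin down, assuming that whitespace around the tool name and before the arguments should be ignored. `splitToolCall` and `trim` are hypothetical helpers that work on the decoded string, not the parser's actual token-based unary path.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Trims spaces, tabs, and newlines from both ends.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const size_t begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const size_t end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Extracts (toolName, arguments) from "...[TOOL_CALLS]name[ARGS]args",
// tolerating whitespace around the name and around the arguments.
std::optional<std::pair<std::string, std::string>> splitToolCall(const std::string& output) {
    const std::string botTag = "[TOOL_CALLS]";
    const std::string argsTag = "[ARGS]";
    const size_t bot = output.find(botTag);
    if (bot == std::string::npos) return std::nullopt;
    const size_t nameStart = bot + botTag.length();
    const size_t args = output.find(argsTag, nameStart);
    if (args == std::string::npos) return std::nullopt;
    std::string name = trim(output.substr(nameStart, args - nameStart));
    std::string arguments = trim(output.substr(args + argsTag.length()));
    return std::make_pair(name, arguments);
}
```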

dkalinowski · 2025-12-10T13:23:01Z

src/test/llm/output_parsers/devstral_output_parser_test.cpp

+        {"{\"", ov::genai::GenerationFinishReason::NONE, std::nullopt},
+        {"city\":", ov::genai::GenerationFinishReason::NONE, std::nullopt},
+        {" \"Paris", ov::genai::GenerationFinishReason::NONE, std::nullopt},
+        // Last chunk is added in the for loop below


Missing test for the enclosing scenario.

mzegla · 2025-12-11T10:00:50Z

src/llm/io_processing/devstral/tool_parser.hpp

+    DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) :
+        BaseOutputParser(tokenizer),
+        argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),
+        botTokenId(tokenizer.encode("[TOOL_CALLS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]),


Agreed. We could also do that for argsTokenId and fail if encoding yields more than one token in either case.

mzegla · 2025-12-11T10:07:10Z

src/llm/io_processing/devstral/tool_parser.hpp

+    const int64_t argsTokenId;  // [ARGS]
+    const int64_t botTokenId;   // [TOOL_CALLS]
+
+    // in streaming mode we can rely on tags in string format as tokens are not available
+    const std::string streamingParsingArgsStartTag = "[ARGS]";
+    const std::string streamingParsingToolCallsStartTag = "[TOOL_CALLS]";


Those tags/tokens are not specific to streaming, so I think we can drop the streamingParsing prefix.
These variables refer to the same things, so please unify the naming:
either botToken or toolCallsStart:
toolCallsStartTokenId, toolCallsStartTag, or
botTokenId, botTag
and either args or argsStart:
argsStartTokenId, argsStartTag, or
argsTokenId, argsTag

mzegla · 2025-12-11T10:14:44Z

src/llm/io_processing/output_parser.hpp

    bool requiresStreamingWithSpecialTokens() const {
-        return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
+        return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) ||
               (toolParser && toolParser->requiresStreamingWithSpecialTokens());


It was implemented this way to make sure we don't allow two parsers with different special-token approaches: they receive the same model output, so they must either both require special tokens or both not.
For this case, where we don't have a reasoning parser but want to require special tokens for the tool parser, I guess we should modify this function like this:

if (!reasoningParser) {
    return toolParser && toolParser->requiresStreamingWithSpecialTokens();
} else if (!toolParser) {
    return reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens();
} else {
    return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
           (toolParser && toolParser->requiresStreamingWithSpecialTokens());
}

mzegla · 2025-12-11T10:21:59Z

src/test/llm/output_parsers/devstral_output_parser_test.cpp

+    EXPECT_EQ(parsedOutput.reasoning, "");
+    ASSERT_EQ(parsedOutput.toolCalls.size(), 0);
+}
+


Could you also add a test for coding-like output? See:
https://github.com/openvinotoolkit/model_server/blob/main/src/test/llm/output_parsers/qwen3_output_parser_test.cpp#L311

mzegla · 2025-12-11T10:22:50Z

src/llm/io_processing/devstral/tool_parser.cpp

+
+void DevstralToolParser::parse(ParsedOutput& parsedOutput, const std::vector<int64_t>& generatedTokens) {
+    std::vector<std::string> tools;
+    // Parser will consume entire model output only if the first generated token is the beginning of tools token.


It doesn't look like this comment is true for this parser.

mzegla · 2025-12-11T10:34:06Z

src/llm/io_processing/devstral/tool_parser.cpp

+        }
+    }
+    if (this->internalState == AWAITING_ARGS_TAG) {
+        // check if [ARGS] tag is present in the chunk and update state accordingly


Suggested change, replacing:

// check if [ARGS] tag is present in the chunk and update state accordingly

with:

// check if [ARGS] tag is present in the streamContent and update state accordingly

Technically we check streamContent, but [ARGS] can only be found there if it was added in this chunk; otherwise we would already be in a different state.
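A reduced sketch of that state transition, assuming only the two states visible in the diff (`AWAITING_ARGS_TAG`, `PROCESSING_ARGS`) and that chunks are appended to `streamContent` before the search. `StreamingToolCallState` is a hypothetical name. Because the search runs on the accumulated content, it would also cope with a tag split across chunks; with special tokens enabled, [ARGS] presumably arrives whole, so that case is only defensive.

```cpp
#include <cassert>
#include <string>

// Hypothetical reduced version of the parser's streaming states.
enum class State { AWAITING_ARGS_TAG, PROCESSING_ARGS };

// Receives the text that follows [TOOL_CALLS] and looks for [ARGS] in the
// accumulated streamContent rather than in the latest chunk alone.
struct StreamingToolCallState {
    const std::string argsTag = "[ARGS]";
    State state = State::AWAITING_ARGS_TAG;
    std::string streamContent;
    std::string toolName;

    void onChunk(const std::string& chunk) {
        streamContent += chunk;
        if (state == State::AWAITING_ARGS_TAG) {
            const size_t pos = streamContent.find(argsTag);
            if (pos != std::string::npos) {
                state = State::PROCESSING_ARGS;
                toolName = streamContent.substr(0, pos);
                streamContent.erase(0, pos + argsTag.length());  // keep only argument text
            }
        }
    }
};
```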

mzegla · 2025-12-11T10:36:04Z

src/llm/io_processing/devstral/tool_parser.cpp

+        if (pos != std::string::npos) {
+            this->internalState = PROCESSING_ARGS;
+            this->toolName = this->streamContent.substr(0, pos);
+            if (this->toolSchemas.find(this->toolName) == this->toolSchemas.end()) {


As with the unary part, this check is unique to this parser, and I don't think it's a good idea to have different behavior for different parsers. Either remove it or create a task for aligning the other parsers.

mzegla · 2025-12-11T10:36:39Z

src/llm/io_processing/devstral/tool_parser.cpp

+                SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool name '{}' not valid.", this->toolName);
+                return std::nullopt;
+            }
+            this->streamContent = this->streamContent.substr(pos + 6);  // "[ARGS]" length is 6


Magic number

mzegla · 2025-12-11T10:39:43Z

src/llm/io_processing/devstral/tool_parser.cpp

+        }
+    }
+    if (finishReason != ov::genai::GenerationFinishReason::NONE) {
+        size_t end_pos = this->streamContent.find("</s>");


What is this token? If it is significant for parsing, it should be a member of the parser class, like the args and tool calls tokens. Also, suggested change, replacing:

size_t end_pos = this->streamContent.find("</s>");

with:

size_t endPos = this->streamContent.find("</s>");
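A sketch of the member-based version, assuming "</s>" is the model's end-of-sequence text, which presumably shows up in the decoded output because this parser streams with special tokens. `EndOfSequence`, `eosTag`, and `stripEos` are hypothetical names chosen to match the suggested camelCase style.

```cpp
#include <cassert>
#include <string>

// The end-of-sequence marker kept as a named member, like the other tags.
struct EndOfSequence {
    const std::string eosTag = "</s>";

    // Returns `content` truncated at the first EOS marker (unchanged if absent).
    std::string stripEos(const std::string& content) const {
        const size_t endPos = content.find(eosTag);
        return endPos == std::string::npos ? content : content.substr(0, endPos);
    }
};
```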

mzegla · 2025-12-11T10:42:55Z

src/llm/io_processing/devstral/tool_parser.cpp

+            ToolCall toolCall;
+            toolCall.arguments = arguments;
+            toolCall.name = this->toolName;
+            return sendFullDelta(toolCall);


We already accepted this approach for Qwen3 Coder, so I suppose we can have it in other parsers as well, unless there are specific requirements for "real" streaming.

dtrawins added 2 commits December 10, 2025 00:53

devstral tool parser for tool calling

e3fb518

style

bf74839

dtrawins requested review from atobiszei, dkalinowski and mzegla December 10, 2025 10:05

dtrawins added 3 commits December 10, 2025 11:38

style

28cd83b

Merge remote-tracking branch 'origin/main' into devstral-parser

33a1062

get test tokenizer

104c980

dkalinowski reviewed Dec 10, 2025


mzegla reviewed Dec 11, 2025

