-
Notifications
You must be signed in to change notification settings - Fork 231
devstral tool parser for tool calling #3851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| DevstralToolParser() = delete; | ||
| DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) : | ||
| BaseOutputParser(tokenizer), | ||
| argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how are we ensured that [ARGS] / [TOOL_CALLS] are single tokens, treated as special, not as string, for example [AR and GS]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are a special tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that doesnt answer my question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
devstral parser is setting requiresStreamingWithSpecialTokens() as true
| DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) : | ||
| BaseOutputParser(tokenizer), | ||
| argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]), | ||
| botTokenId(tokenizer.encode("[TOOL_CALLS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
validate if input_ids token count is == 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, we could also do that for argsTokenId and fail if in any of those cases we have more than one token from encoding
| ToolCall toolCall; | ||
| std::string tool_name = tokenizer.decode(tool_name_tokens, ov::AnyMap{ov::genai::skip_special_tokens(true)}); | ||
| if (this->toolSchemas.find(tool_name) == this->toolSchemas.end()) { | ||
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool name '{}' not valid.", tool_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is behavior we havent implemented in other parsers, is it really worth return early? if we return function name that is not part of the toolschemas spec, we might be able to debug it in bfcl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not in line with current behavior in other parsers. I wouldn't do that check if it's only for this parser. Either drop it or create a task for alignment of other parsers.
| if (pos == 0) { | ||
| this->streamContent.clear(); | ||
| } else { | ||
| this->streamContent = this->streamContent.substr(pos + 13); // "[TOOLS_CALLS]" length is 13 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should avoid magic numbers, this way if we change this->streamingParsingToolCallsStartTag to another value, this part will be incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can look up how Adrian handles that in qwen coder
this->lastProcessedPosition = pos + Qwen3CoderToolParser::PARAMETER_END_TAG.length();
| ToolCall toolCall; | ||
| toolCall.arguments = arguments; | ||
| toolCall.name = this->toolName; | ||
| return sendFullDelta(toolCall); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldntg we stream partial function argument chunks? if i understand correctly you send full delta at the end of generation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already accepted such approach for qwen3 coder, so I suppose we can have it in other parsers as well unless there are specific requirements for "real" streaming.
| bool requiresStreamingWithSpecialTokens() const { | ||
| return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) && | ||
| return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) || | ||
| (toolParser && toolParser->requiresStreamingWithSpecialTokens()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mzegla when i implemented it i remember your comment why it should really be && instead of ||, do you remember what was the reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been implemented this way to make sure we don't allow two parsers with different special tokens approach as they will receive the same model output, so they must both either require it or not.
I guess for this case, where we don't have reasoning parser but want to require special tokens for tool parser we should modify this function like:
if (!reasoningParser) {
return toolParser && toolParser->requiresStreamingWithSpecialTokens();
} else if (!toolParser) {
return reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens();
} else {
return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
(toolParser && toolParser->requiresStreamingWithSpecialTokens());
}
| } | ||
|
|
||
| TEST_F(DevstralOutputParserTest, ParseToolCallOutputWithSingleToolCall_MissingEndTag) { | ||
| std::string testInput = "Reasoninig before tool call [TOOL_CALLS]example_tool[ARGS]{\"arg1\":\"value1\",\"arg2\":42}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add test for scenarios with whitespace between the tags? i saw other models often put spaces or new lines before/after the function name
| {"{\"", ov::genai::GenerationFinishReason::NONE, std::nullopt}, | ||
| {"city\":", ov::genai::GenerationFinishReason::NONE, std::nullopt}, | ||
| {" \"Paris", ov::genai::GenerationFinishReason::NONE, std::nullopt}, | ||
| // Last chunk is added in the for loop below |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing test for enclosing scenario
| DevstralToolParser(ov::genai::Tokenizer& tokenizer, const ToolsSchemas_t& toolSchemas) : | ||
| BaseOutputParser(tokenizer), | ||
| argsTokenId(tokenizer.encode("[ARGS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]), | ||
| botTokenId(tokenizer.encode("[TOOL_CALLS]", {{"add_special_tokens", false}}).input_ids.data<int64_t>()[0]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, we could also do that for argsTokenId and fail if in any of those cases we have more than one token from encoding
| const int64_t argsTokenId; // [ARGS] | ||
| const int64_t botTokenId; // [TOOL_CALLS] | ||
|
|
||
| // in streaming mode we can rely on tags in string format as tokens are not available | ||
| const std::string streamingParsingArgsStartTag = "[ARGS]"; | ||
| const std::string streamingParsingToolCallsStartTag = "[TOOL_CALLS]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those tags/tokens are not specific to streaming, so I think we can drop streamingParsing prefix
Those are variables are about the same thing - please unify naming:
either botToken or toolCallsStart:
toolCallsStartTokenId, toolCallsStartTag or
botTokenId, botTag
and either argsTokenId or ArgsStartTag:
argsStartTokenId, argsStartTag or
argsTokenId, argsTag
| bool requiresStreamingWithSpecialTokens() const { | ||
| return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) && | ||
| return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) || | ||
| (toolParser && toolParser->requiresStreamingWithSpecialTokens()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been implemented this way to make sure we don't allow two parsers with different special tokens approach as they will receive the same model output, so they must both either require it or not.
I guess for this case, where we don't have reasoning parser but want to require special tokens for tool parser we should modify this function like:
if (!reasoningParser) {
return toolParser && toolParser->requiresStreamingWithSpecialTokens();
} else if (!toolParser) {
return reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens();
} else {
return (reasoningParser && reasoningParser->requiresStreamingWithSpecialTokens()) &&
(toolParser && toolParser->requiresStreamingWithSpecialTokens());
}
| EXPECT_EQ(parsedOutput.reasoning, ""); | ||
| ASSERT_EQ(parsedOutput.toolCalls.size(), 0); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also add test for coding-like output? see:
https://github.com/openvinotoolkit/model_server/blob/main/src/test/llm/output_parsers/qwen3_output_parser_test.cpp#L311
|
|
||
| void DevstralToolParser::parse(ParsedOutput& parsedOutput, const std::vector<int64_t>& generatedTokens) { | ||
| std::vector<std::string> tools; | ||
| // Parser will consume entire model output only if the first generated token is the beginning of tools token. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not look like this comment is true for this parser
| } | ||
| } | ||
| if (this->internalState == AWAITING_ARGS_TAG) { | ||
| // check if [ARGS] tag is present in the chunk and update state accordingly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // check if [ARGS] tag is present in the chunk and update state accordingly | |
| // check if [ARGS] tag is present in the streamContent and update state accordingly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically we check streamContent but it will be the case only if [ARGS] is added in the chunk. Otherwise it would be different state
| if (pos != std::string::npos) { | ||
| this->internalState = PROCESSING_ARGS; | ||
| this->toolName = this->streamContent.substr(0, pos); | ||
| if (this->toolSchemas.find(this->toolName) == this->toolSchemas.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the unary part - this check is unique to this parser and I don't think it's a good idea to have different behavior for different parsers. Either remove or create a task for alignment of other parsers.
| SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Tool name '{}' not valid.", this->toolName); | ||
| return std::nullopt; | ||
| } | ||
| this->streamContent = this->streamContent.substr(pos + 6); // "[ARGS]" length is 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic number
| } | ||
| } | ||
| if (finishReason != ov::genai::GenerationFinishReason::NONE) { | ||
| size_t end_pos = this->streamContent.find("</s>"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this token? If it has some significant value for the parsing it should be a member of the parser class like args and tool calls token. Also:
| size_t end_pos = this->streamContent.find("</s>"); | |
| size_t endPos = this->streamContent.find("</s>"); |
| ToolCall toolCall; | ||
| toolCall.arguments = arguments; | ||
| toolCall.name = this->toolName; | ||
| return sendFullDelta(toolCall); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already accepted such approach for qwen3 coder, so I suppose we can have it in other parsers as well unless there are specific requirements for "real" streaming.
🛠 Summary
JIRA/Issue if applicable.
Describe the changes.
🧪 Checklist
``