Compare: ab0aa5b to 1c329a4
#pragma warning(push)
#pragma warning(disable : 6313)
#include <rapidjson/writer.h>
You can probably use the technique suggested by @atobiszei: hide these headers in the port directory and include those wrapper headers instead. Keep the pragmas inside the wrappers; this way you don't need them here.
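A minimal sketch of such a wrapper header (hypothetical file name and location; the `6313` code is the one suppressed in this PR), so the MSVC warning suppression lives in exactly one place:

```cpp
// rapidjson_writer.h -- hypothetical wrapper header kept in the port directory.
// Consumers include this instead of <rapidjson/writer.h> directly.
#pragma once

#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 6313)  // zero-valued flag test warning from rapidjson
#endif

#include <rapidjson/writer.h>

#if defined(_MSC_VER)
#pragma warning(pop)
#endif
```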
}
TEST_F(Gemma4OutputParserTest, ParseToolCallWithListOfStringsAsArgument) {
    std::string inputWithProperClosure = "<|tool_call>call:generate_DNA_sequence{length:100,preferences:[<|\"|>G<|\"|>,<|\"|>C<|\"|>]}<tool_call|>";
Is it possible that the model generates spaces between these keywords? Even if not by design, it can happen when the model has low accuracy or subtle accuracy issues; how will this parser react to that?
I am not sure; I didn't encounter any accuracy issues.
}
TEST_F(Gemma4OutputParserTest, ParseToolCallOutputWithThreeToolCalls) {
    std::string inputWithProperClosure = "<|tool_call>call:example_tool{arg1:<|\"|>value1<|\"|>,arg2:42}<tool_call|>"
Did you test whether the model is capable of generating multiple tool calls (so-called "parallel tool calling")? If so, is this behavior observed or taken from the docs?
I'm just worried whether it is always a single tool call between <tool_call> tags, or whether the model generates them all in one block (we have seen such models).
Also, do we support all whitespace between tool calls: space/newline/\r?
I observed that the model can generate multiple tool calls. It doesn't have special separators; the next function just starts with `call:`, e.g.:
call:solve{equation:<|"|>2(x + 5) = 13<|"|>}call:sort{array:[<|"|>hello, it's me<|"|>, <|"|>hi<|"|>]}
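The back-to-back format described above could be split with a sketch like this (not the PR's implementation; it assumes the input is the text between the tool-call tags and that brace characters do not occur inside quoted string arguments):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split back-to-back tool calls by scanning for "call:" at brace depth zero,
// since the model emits the next call immediately after the previous closing
// brace with no separator.
std::vector<std::string> splitToolCalls(const std::string& payload) {
    std::vector<std::string> calls;
    const std::string kw = "call:";
    size_t start = std::string::npos;
    int depth = 0;
    for (size_t i = 0; i < payload.size(); ++i) {
        if (depth == 0 && payload.compare(i, kw.size(), kw) == 0) {
            if (start != std::string::npos)
                calls.push_back(payload.substr(start, i - start));
            start = i;
            i += kw.size() - 1;  // skip past the keyword
        } else if (payload[i] == '{') {
            ++depth;
        } else if (payload[i] == '}') {
            --depth;
        }
    }
    if (start != std::string::npos)
        calls.push_back(payload.substr(start));
    return calls;
}
```

A unit test for the multi-call case (as suggested below in the thread) could feed the two-call example above into such a splitter and assert that exactly two calls come back.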
I think we are missing such a unit test.
Pull request overview
This PR adds Gemma4 tool-call parsing support to OVMS LLM I/O processing and adjusts the VLM legacy unary path to preserve/handle special tokens by reusing the streaming text collection logic (per CVS-184756). It also updates model-prep scripts and introduces a dedicated Gemma4 output parser test suite.
Changes:
- Added a new `Gemma4ToolParser` and wired it into `OutputParser` selection.
- Modified VLM legacy unary response preparation to reconstruct the full text from streamer callbacks (to retain special tokens) and updated OpenAI API handler interfaces accordingly.
- Added Gemma4 tokenizer/model preparation steps and extensive unit tests for Gemma4 tool-call parsing (unary + streaming scenarios).
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| windows_prepare_llm_models.bat | Adds Gemma4 tokenizer download step for Windows test model prep. |
| prepare_llm_models.sh | Adds Gemma4 tokenizer conversion/download for Linux test model prep. |
| src/test/llm/output_parsers/gemma4_output_parser_test.cpp | New unit tests covering Gemma4 tool-call parsing and streaming chunk behavior. |
| src/test/http_openai_handler_test.cpp | Updates tests to match the new VLM unary serialization API signature. |
| src/llm/visual_language_model/legacy/servable.cpp | Implements unary “streaming-style” text accumulation to preserve special tokens; passes full text into unary serialization. |
| src/llm/io_processing/utils.hpp | Adds new utility declarations used by Gemma4 parsing/serialization. |
| src/llm/io_processing/utils.cpp | Implements JSON argument writing and delimiter search respecting nested structures/quotes. |
| src/llm/io_processing/output_parser.cpp | Registers gemma4 tool parser type. |
| src/llm/io_processing/gemma4/tool_parser.hpp | New Gemma4 tool parser interface + streaming state machine declarations. |
| src/llm/io_processing/gemma4/tool_parser.cpp | New Gemma4 tool parser implementation (unary + streaming). |
| src/llm/BUILD | Adds Bazel targets/deps for Gemma4 parser and RapidJSON usage in utils. |
| src/llm/apis/openai_api_handler.hpp | Updates VLM unary serialization interface to accept explicit textResponse. |
| src/llm/apis/openai_completions.hpp | Updates VLM unary serialization signature. |
| src/llm/apis/openai_completions.cpp | Uses provided textResponse for VLM unary serialization. |
| src/llm/apis/openai_responses.hpp | Updates VLM unary serialization signature. |
| src/llm/apis/openai_responses.cpp | Uses provided textResponse for VLM unary serialization. |
| spelling-whitelist.txt | Adds the new Gemma4 test file to the whitelist list. |
Comments suppressed due to low confidence (2)
src/llm/apis/openai_completions.cpp:497
- `serializeUnaryResponse(VLMDecodedResults&, textResponse)` only emits a choice when `textResponse` is non-empty. If the model legitimately generates an empty string, this returns an empty `choices` array, which is not a valid OpenAI-style response (a single choice with empty content/tool_calls should still be returned). Consider always creating one choice and using `textResponse` as-is (even when empty).
if (!textResponse.empty()) {
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Generated text: {}", textResponse);
// Workaround to use OVMS unary parsers: get tokens from string
// This way we have detokenized text from GenAI and calculate tokens, to further convert back to text again, in parseOutputIfNeeded...
auto generatedTokens = encodeTextToTokens(textResponse);
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Generated tokens: {}", generatedTokens);
ParsedOutput parsedOutput = parseOutputIfNeeded(generatedTokens);
jsonResponse.StartObject();
// finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
jsonResponse.FinishReason(finishReason.value_or("unknown"));
// index: integer; Choice index, only n=1 supported anyway
jsonResponse.Index(index++);
// TODO: logprobs: object/null; Log probability information for the choice.
if (endpoint == Endpoint::CHAT_COMPLETIONS) {
jsonResponse.MessageObject(parsedOutput);
} else if (endpoint == Endpoint::COMPLETIONS) {
jsonResponse.Text(parsedOutput);
}
// finish message object
jsonResponse.EndObject();
}
// finish choices array
jsonResponse.EndArray();
src/llm/apis/openai_responses.cpp:676
- `serializeUnaryResponse(VLMDecodedResults&, textResponse)` skips adding any ParsedOutput when `textResponse` is empty, so `serializeUnaryResponseImpl()` is called with an empty vector. If the model output is legitimately empty, this likely produces a response without any output items/content, which is invalid for the Responses API. Consider emitting a single empty ParsedOutput (or otherwise ensuring one output item exists) even when `textResponse` is empty.
std::vector<ParsedOutput> parsedOutputs;
if (!textResponse.empty()) {
if (outputParser != nullptr) {
// Same workaround as in chat completions
auto generatedTokens = encodeTextToTokens(textResponse);
parsedOutputs.push_back(parseOutputIfNeeded(generatedTokens));
} else {
// Fast path: no output parser, use decoded text directly.
ParsedOutput output;
output.content = textResponse;
parsedOutputs.push_back(std::move(output));
}
}
return serializeUnaryResponseImpl(parsedOutputs);
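The behavior Copilot suggests could look roughly like this (a minimal sketch with a stand-in `ParsedOutput` struct; the real OVMS types and parsing path differ): always produce exactly one output item, using the text as-is even when it is empty.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the real OVMS ParsedOutput (hypothetical simplification).
struct ParsedOutput {
    std::string content;
};

// Always return exactly one output item, so an empty model response still
// serializes to a single choice with empty content rather than no choice.
std::vector<ParsedOutput> buildParsedOutputs(const std::string& textResponse) {
    std::vector<ParsedOutput> parsedOutputs;
    ParsedOutput output;
    output.content = textResponse;  // used as-is, even when empty
    parsedOutputs.push_back(std::move(output));
    return parsedOutputs;
}
```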
this->streamingPosition = toolCallEndTagPos + TOOL_CALL_END_TAG.length();
this->currentState = State::AfterToolCall;
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Detected end of tool call at position: {}, returning to content state", toolCallEndTagPos);
std::string Gemma4ToolParser::parseArrayParameter(std::string argumentStr) {
    size_t pos = 1;
    std::string parsedArguments = "[";

    while (pos != std::string::npos) {
        size_t stringStartPos = argumentStr.find(TOOL_ARGS_STRING_INDICATOR, pos);
        if (stringStartPos == std::string::npos) {
            break;
        }
        stringStartPos += TOOL_ARGS_STRING_INDICATOR.size();
        size_t stringEndPos = argumentStr.find(TOOL_ARGS_STRING_INDICATOR, stringStartPos);
        if (stringEndPos == std::string::npos) {
            break;
        }

        std::string originalStr = argumentStr.substr(stringStartPos, stringEndPos - stringStartPos);
        size_t quotePos = 0;
        while ((quotePos = originalStr.find('\"', quotePos)) != std::string::npos) {
            originalStr.insert(quotePos, "\\");
            quotePos += 2;
        }
        parsedArguments += "\"" + originalStr + "\",";

        pos = stringEndPos + TOOL_ARGS_STRING_INDICATOR.size() + 1;
    }

    parsedArguments.back() = ']';

    return parsedArguments;
}
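For comparison, a self-contained sketch of the same conversion (not the PR's code): it escapes both quotes and backslashes, and handles the empty-list edge case explicitly, which the snippet above would mishandle (`parsedArguments.back() = ']'` turns the initial `"["` into `"]"` when no strings are found).

```cpp
#include <cassert>
#include <string>

// Convert a <|"|>-delimited list such as [<|"|>G<|"|>,<|"|>C<|"|>] into a
// JSON array of strings, escaping quotes and backslashes for JSON.
std::string parseStringArray(const std::string& argumentStr) {
    static const std::string indicator = "<|\"|>";
    std::string out = "[";
    bool first = true;
    size_t pos = 0;
    while (true) {
        size_t start = argumentStr.find(indicator, pos);
        if (start == std::string::npos)
            break;
        start += indicator.size();
        size_t end = argumentStr.find(indicator, start);
        if (end == std::string::npos)
            break;
        if (!first)
            out += ",";
        first = false;
        out += "\"";
        for (size_t i = start; i < end; ++i) {
            char c = argumentStr[i];
            if (c == '"' || c == '\\')
                out += '\\';  // escape for JSON
            out += c;
        }
        out += "\"";
        pos = end + indicator.size();
    }
    out += "]";  // empty input yields "[]", not "]"
    return out;
}
```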
break;
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "No more tool strings found in the decoded string: {}", toolCallStr);
void writeArgumentOfAnyType(const rapidjson::Value& arg, rapidjson::Writer<rapidjson::StringBuffer>& writer) {
    if (arg.IsString()) {
        writer.String(arg.GetString());
    } else if (arg.IsInt64()) {
        writer.Int64(arg.GetInt64());
    } else if (arg.IsDouble()) {
        writer.Double(arg.GetDouble());
    } else if (arg.IsBool()) {
        writer.Bool(arg.GetBool());
    } else if (arg.IsArray()) {
        writer.StartArray();
        for (auto& elem : arg.GetArray()) {
            writeArgumentOfAnyType(elem, writer);
        }
        writer.EndArray();
    } else if (arg.IsObject()) {
        writer.StartObject();
        for (auto it = arg.MemberBegin(); it != arg.MemberEnd(); ++it) {
            writer.Key(it->name.GetString());
            writeArgumentOfAnyType(it->value, writer);
        }
        writer.EndObject();
    } else {
        writer.String("");
    }
std::optional<rapidjson::Document> Gemma4ToolParser::parseChunk(const std::string& chunk, ov::genai::GenerationFinishReason finishReason) {
    if (chunk.empty()) {
        return std::nullopt;
    }

    this->streamingContent += chunk;

    if (parseNewContent()) {
        if (this->currentState == State::ToolCallParameters) {
            return BaseOutputParser::wrapFirstDelta(this->toolCall.name, toolCallIndex);
        }
        if (this->currentState == State::ToolCallEnded) {
            return wrapDeltaArgs(this->toolCall.arguments, toolCallIndex);
        }
        if (this->currentState == State::Content) {
            size_t contentEnd = this->streamingContent.find(TOOL_CALL_START_TAG, this->streamingPosition);
            std::string content;
            if (contentEnd != std::string::npos) {
                content = this->streamingContent.substr(this->streamingPosition, contentEnd - this->streamingPosition);
            } else {
                content = this->streamingContent.substr(this->streamingPosition);
            }
            this->streamingPosition += content.size();
            if (!content.empty()) {
                return wrapDeltaContent(content);
            }
        }
        if (this->currentState == State::AfterToolCall) {
            this->currentState = State::Content;
        }
    }
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
ASSERT_EQ(parsedOutput.toolCalls.size(), 1);
EXPECT_EQ(parsedOutput.toolCalls[0].name, "no_args_tool");
Suggested change:
EXPECT_EQ(parsedOutput.toolCalls[0].name, "no_args_tool");
EXPECT_EQ(parsedOutput.toolCalls[0].arguments, "");
std::vector<std::tuple<std::string, ov::genai::GenerationFinishReason, std::optional<std::string>>> chunkToDeltaVec{
    {"SOME_CONTENT", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"content":"SOME_CONTENT"}})"},
    {"MORE_CONTENT<|tool_call>", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"content":"MORE_CONTENT"}})"},
    {"call:sort", ov::genai::GenerationFinishReason::NONE, std::nullopt},
Do we need tests for whitespace between keywords? Do we treat such requests as valid?
// Tests with special characters
TEST_F(Gemma4OutputParserTest, ParseToolCallWithStringArgumentsContainingComparison) {
    std::string input = R"x(<|tool_call>call:search{query:<|"|>price >= 100, (sale)<|"|>,limit:5}<tool_call|>)x";
Why isn't it supported? Isn't it normal for whitespace to be part of the string, and also >, <, and =?
}
TEST_F(Gemma4OutputParserTest, ParseToolCallWithStringArgumentsContainingSpecialCharacters) {
    std::string impl = "import package\nimport package2\n\ndef func(a, b):\n\td={\"python\": \"dict\"}\n\tl = [\"list \\\"with escaped text\\\"\", 123, []]\n\treturn f\"formatted {a} and {b}\"";
Did you test whether the model really escapes everything? I remember @atobiszei working on the qwen-coder parser, and additional work was needed to support nested \.
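One way to probe the cases mentioned above (a hedged sketch, not the PR's code): run the raw argument string through a full JSON string escaper and compare against what the parser emits; newlines, tabs, backslashes, and already-escaped quotes are the usual failure points.

```cpp
#include <cassert>
#include <string>

// Full JSON string escaping, including control characters and nested
// backslashes -- the cases that typically break naive parsers.
std::string jsonEscape(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\t': out += "\\t";  break;
        case '\r': out += "\\r";  break;
        default:   out += c;      break;
        }
    }
    return out;
}
```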
🛠 Summary
CVS-184756
Enables the Gemma4 model and changes the VLM pipeline to accept special tokens in streaming.
🧪 Checklist