
Gemma4 parser#4179

Open
przepeck wants to merge 17 commits into main from przepeck/gemma_parser

Conversation


@przepeck przepeck commented May 5, 2026

🛠 Summary

CVS-184756
Enables the Gemma4 model and changes the VLM pipeline to accept special tokens in streaming.

🧪 Checklist

  • Unit tests added.
  • Documentation updated.
  • Change follows security best practices.

@przepeck przepeck force-pushed the przepeck/gemma_parser branch from ab0aa5b to 1c329a4 on May 6, 2026 09:53
Comment thread src/llm/io_processing/gemma4/tool_parser.cpp Outdated

#pragma warning(push)
#pragma warning(disable : 6313)
#include <rapidjson/writer.h>
Collaborator


You can probably use the technique suggested by @atobiszei: hide these headers in the port directory and include them from there. Keep the pragmas inside those wrapper headers, so you don't need them here.
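As a sketch, the suggestion could look like a small wrapper header along these lines (the path and the `_MSC_VER` guards are hypothetical, not part of this PR):

```cpp
// Hypothetical wrapper header, e.g. src/port/rapidjson_writer.hpp.
// Includers pull in this header instead of <rapidjson/writer.h>, so the
// pragma suppression lives in exactly one place.
#pragma once

#if defined(_MSC_VER)
#pragma warning(push)
#pragma warning(disable : 6313)
#endif

#include <rapidjson/writer.h>

#if defined(_MSC_VER)
#pragma warning(pop)
#endif
```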

}

TEST_F(Gemma4OutputParserTest, ParseToolCallWithListOfStringsAsArgument) {
std::string inputWithProperClosure = "<|tool_call>call:generate_DNA_sequence{length:100,preferences:[<|\"|>G<|\"|>,<|\"|>C<|\"|>]}<tool_call|>";

@dkalinowski dkalinowski May 6, 2026


Is it possible that the model generates spaces between these keywords? Even if not by design, it can happen when the model has low accuracy or subtle accuracy issues. How will this parser react to that?

Collaborator Author


I'm not sure; I didn't encounter any accuracy issues.

Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp Outdated
@przepeck przepeck marked this pull request as ready for review May 6, 2026 12:07
Copilot AI review requested due to automatic review settings May 6, 2026 12:07
Comment thread src/test/llm/output_parsers/gemma4_output_parser_test.cpp Outdated
}

TEST_F(Gemma4OutputParserTest, ParseToolCallOutputWithThreeToolCalls) {
std::string inputWithProperClosure = "<|tool_call>call:example_tool{arg1:<|\"|>value1<|\"|>,arg2:42}<tool_call|>"
Collaborator


Did you test whether the model is capable of generating multiple tool calls (so-called "parallel tool calling")? If so, is this behavior observed or from the docs?
I'm just worried whether it is always a single tool call between <tool_call> tags, or whether it sometimes generates all of them in one (we have seen such models).
Also, do we support all whitespace between tools: space/newline/\r?

Collaborator Author


I observed that the model can generate multiple tool calls. It doesn't use special separators; the next function just starts with `call:`,
e.g. call:solve{equation:<|"|>2(x + 5) = 13<|"|>}call:sort{array:[<|"|>hello, it's me<|"|>, <|"|>hi<|"|>]}
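A minimal sketch of how such concatenated calls could be split (hypothetical helper, not the PR's implementation; it assumes `{`/`}` appear only as argument delimiters, since string values in the observed output are wrapped in `<|"|>` markers rather than raw braces):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split "call:a{...}call:b{...}" into individual "call:..." segments.
// A new segment starts at every "call:" seen at brace depth 0.
std::vector<std::string> splitConcatenatedCalls(const std::string& s) {
    std::vector<std::string> calls;
    size_t start = std::string::npos;
    int depth = 0;
    for (size_t i = 0; i < s.size(); ++i) {
        if (depth == 0 && s.compare(i, 5, "call:") == 0) {
            if (start != std::string::npos)
                calls.push_back(s.substr(start, i - start));
            start = i;
        }
        if (s[i] == '{')
            ++depth;
        else if (s[i] == '}')
            --depth;
    }
    if (start != std::string::npos)
        calls.push_back(s.substr(start));  // last (or only) call
    return calls;
}
```

A `call:` occurring inside an argument is at depth > 0 and therefore ignored; the sketch would break only if a string value contained an unbalanced raw brace.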


@dkalinowski dkalinowski May 7, 2026


I think we are missing such a unit test.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds Gemma4 tool-call parsing support to OVMS LLM I/O processing and adjusts the VLM legacy unary path to preserve/handle special tokens by reusing the streaming text collection logic (per CVS-184756). It also updates model-prep scripts and introduces a dedicated Gemma4 output parser test suite.

Changes:

  • Added a new Gemma4ToolParser and wired it into OutputParser selection.
  • Modified VLM legacy unary response preparation to reconstruct the full text from streamer callbacks (to retain special tokens) and updated OpenAI API handler interfaces accordingly.
  • Added Gemma4 tokenizer/model preparation steps and extensive unit tests for Gemma4 tool-call parsing (unary + streaming scenarios).

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

File Description
windows_prepare_llm_models.bat Adds Gemma4 tokenizer download step for Windows test model prep.
prepare_llm_models.sh Adds Gemma4 tokenizer conversion/download for Linux test model prep.
src/test/llm/output_parsers/gemma4_output_parser_test.cpp New unit tests covering Gemma4 tool-call parsing and streaming chunk behavior.
src/test/http_openai_handler_test.cpp Updates tests to match the new VLM unary serialization API signature.
src/llm/visual_language_model/legacy/servable.cpp Implements unary “streaming-style” text accumulation to preserve special tokens; passes full text into unary serialization.
src/llm/io_processing/utils.hpp Adds new utility declarations used by Gemma4 parsing/serialization.
src/llm/io_processing/utils.cpp Implements JSON argument writing and delimiter search respecting nested structures/quotes.
src/llm/io_processing/output_parser.cpp Registers gemma4 tool parser type.
src/llm/io_processing/gemma4/tool_parser.hpp New Gemma4 tool parser interface + streaming state machine declarations.
src/llm/io_processing/gemma4/tool_parser.cpp New Gemma4 tool parser implementation (unary + streaming).
src/llm/BUILD Adds Bazel targets/deps for Gemma4 parser and RapidJSON usage in utils.
src/llm/apis/openai_api_handler.hpp Updates VLM unary serialization interface to accept explicit textResponse.
src/llm/apis/openai_completions.hpp Updates VLM unary serialization signature.
src/llm/apis/openai_completions.cpp Uses provided textResponse for VLM unary serialization.
src/llm/apis/openai_responses.hpp Updates VLM unary serialization signature.
src/llm/apis/openai_responses.cpp Uses provided textResponse for VLM unary serialization.
spelling-whitelist.txt Adds the new Gemma4 test file to the whitelist.
Comments suppressed due to low confidence (2)

src/llm/apis/openai_completions.cpp:497

  • serializeUnaryResponse(VLMDecodedResults&, textResponse) only emits a choice when textResponse is non-empty. If the model legitimately generates an empty string, this returns an empty choices array, which is not a valid OpenAI-style response (a single choice with empty content/tool_calls should still be returned). Consider always creating one choice and using textResponse as-is (even when empty).
    if (!textResponse.empty()) {
        SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Generated text: {}", textResponse);

        // Workaround to use OVMS unary parsers: get tokens from string
        // This way we have detokenized text from GenAI and calculate tokens, to further convert back to text again, in parseOutputIfNeeded...
        auto generatedTokens = encodeTextToTokens(textResponse);

        SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Generated tokens: {}", generatedTokens);
        ParsedOutput parsedOutput = parseOutputIfNeeded(generatedTokens);
        jsonResponse.StartObject();
        // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
        auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
        jsonResponse.FinishReason(finishReason.value_or("unknown"));
        // index: integer; Choice index, only n=1 supported anyway
        jsonResponse.Index(index++);
        // TODO: logprobs: object/null; Log probability information for the choice.

        if (endpoint == Endpoint::CHAT_COMPLETIONS) {
            jsonResponse.MessageObject(parsedOutput);
        } else if (endpoint == Endpoint::COMPLETIONS) {
            jsonResponse.Text(parsedOutput);
        }

        // finish message object
        jsonResponse.EndObject();
    }
    // finish choices array
    jsonResponse.EndArray();

src/llm/apis/openai_responses.cpp:676

  • serializeUnaryResponse(VLMDecodedResults&, textResponse) skips adding any ParsedOutput when textResponse is empty, so serializeUnaryResponseImpl() is called with an empty vector. If the model output is legitimately empty, this likely produces a response without any output items/content, which is invalid for the Responses API. Consider emitting a single empty ParsedOutput (or otherwise ensuring one output item exists) even when textResponse is empty.
    std::vector<ParsedOutput> parsedOutputs;
    if (!textResponse.empty()) {
        if (outputParser != nullptr) {
            // Same workaround as in chat completions
            auto generatedTokens = encodeTextToTokens(textResponse);
            parsedOutputs.push_back(parseOutputIfNeeded(generatedTokens));
        } else {
            // Fast path: no output parser, use decoded text directly.
            ParsedOutput output;
            output.content = textResponse;
            parsedOutputs.push_back(std::move(output));
        }
    }
    return serializeUnaryResponseImpl(parsedOutputs);

Comment on lines +314 to +316
this->streamingPosition = toolCallEndTagPos + TOOL_CALL_END_TAG.length();
this->currentState = State::AfterToolCall;
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Detected end of tool call at position: {}, returning to content state", toolCallEndTagPos);
Comment on lines +39 to +67
std::string Gemma4ToolParser::parseArrayParameter(std::string argumentStr) {
    size_t pos = 1;
    std::string parsedArguments = "[";

    while (pos != std::string::npos) {
        size_t stringStartPos = argumentStr.find(TOOL_ARGS_STRING_INDICATOR, pos);
        if (stringStartPos == std::string::npos) {
            break;
        }
        stringStartPos += TOOL_ARGS_STRING_INDICATOR.size();
        size_t stringEndPos = argumentStr.find(TOOL_ARGS_STRING_INDICATOR, stringStartPos);
        if (stringEndPos == std::string::npos) {
            break;
        }

        std::string originalStr = argumentStr.substr(stringStartPos, stringEndPos - stringStartPos);
        size_t quotePos = 0;
        while ((quotePos = originalStr.find('\"', quotePos)) != std::string::npos) {
            originalStr.insert(quotePos, "\\");
            quotePos += 2;
        }
        parsedArguments += "\"" + originalStr + "\",";

        pos = stringEndPos + TOOL_ARGS_STRING_INDICATOR.size() + 1;
    }

    parsedArguments.back() = ']';

    return parsedArguments;
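For comparison, here is a self-contained sketch of the same transformation (hypothetical names, stdlib only) that also keeps an empty array valid; note that `parsedArguments.back() = ']'` in the excerpt above turns `"["` into `"]"` when no `<|"|>` strings are found:

```cpp
#include <string>

// Convert Gemma4's array syntax, e.g. [<|"|>G<|"|>,<|"|>C<|"|>],
// into a JSON array string such as ["G","C"].
std::string gemmaArrayToJson(const std::string& arg) {
    static const std::string kQuote = "<|\"|>";  // Gemma4 string indicator
    std::string out = "[";
    size_t pos = 0;
    while (true) {
        size_t begin = arg.find(kQuote, pos);
        if (begin == std::string::npos)
            break;
        begin += kQuote.size();
        size_t end = arg.find(kQuote, begin);
        if (end == std::string::npos)
            break;
        std::string item = arg.substr(begin, end - begin);
        // Escape embedded double quotes so the output stays valid JSON.
        for (size_t q = 0; (q = item.find('"', q)) != std::string::npos; q += 2)
            item.insert(q, "\\");
        out += "\"" + item + "\",";
        pos = end + kQuote.size();
    }
    if (out.back() == ',')
        out.back() = ']';  // replace trailing comma
    else
        out += ']';        // no elements: emit "[]", not "]"
    return out;
}
```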
Comment on lines +478 to +479
break;
SPDLOG_LOGGER_TRACE(llm_calculator_logger, "No more tool strings found in the decoded string: {}", toolCallStr);
Comment on lines +38 to +62
void writeArgumentOfAnyType(const rapidjson::Value& arg, rapidjson::Writer<rapidjson::StringBuffer>& writer) {
    if (arg.IsString()) {
        writer.String(arg.GetString());
    } else if (arg.IsInt64()) {
        writer.Int64(arg.GetInt64());
    } else if (arg.IsDouble()) {
        writer.Double(arg.GetDouble());
    } else if (arg.IsBool()) {
        writer.Bool(arg.GetBool());
    } else if (arg.IsArray()) {
        writer.StartArray();
        for (auto& elem : arg.GetArray()) {
            writeArgumentOfAnyType(elem, writer);
        }
        writer.EndArray();
    } else if (arg.IsObject()) {
        writer.StartObject();
        for (auto it = arg.MemberBegin(); it != arg.MemberEnd(); ++it) {
            writer.Key(it->name.GetString());
            writeArgumentOfAnyType(it->value, writer);
        }
        writer.EndObject();
    } else {
        writer.String("");
    }
Comment on lines +356 to +386
std::optional<rapidjson::Document> Gemma4ToolParser::parseChunk(const std::string& chunk, ov::genai::GenerationFinishReason finishReason) {
    if (chunk.empty()) {
        return std::nullopt;
    }

    this->streamingContent += chunk;

    if (parseNewContent()) {
        if (this->currentState == State::ToolCallParameters) {
            return BaseOutputParser::wrapFirstDelta(this->toolCall.name, toolCallIndex);
        }
        if (this->currentState == State::ToolCallEnded) {
            return wrapDeltaArgs(this->toolCall.arguments, toolCallIndex);
        }
        if (this->currentState == State::Content) {
            size_t contentEnd = this->streamingContent.find(TOOL_CALL_START_TAG, this->streamingPosition);
            std::string content;
            if (contentEnd != std::string::npos) {
                content = this->streamingContent.substr(this->streamingPosition, contentEnd - this->streamingPosition);
            } else {
                content = this->streamingContent.substr(this->streamingPosition);
            }
            this->streamingPosition += content.size();
            if (!content.empty()) {
                return wrapDeltaContent(content);
            }
        }
        if (this->currentState == State::AfterToolCall) {
            this->currentState = State::Content;
        }
    }
std::vector<int64_t> generatedTokens(generatedTensor.data<int64_t>(), generatedTensor.data<int64_t>() + generatedTensor.get_size());
ParsedOutput parsedOutput = outputParserWithRegularToolParsing->parse(generatedTokens, true);
ASSERT_EQ(parsedOutput.toolCalls.size(), 1);
EXPECT_EQ(parsedOutput.toolCalls[0].name, "no_args_tool");
Collaborator


Suggested change:
- EXPECT_EQ(parsedOutput.toolCalls[0].name, "no_args_tool");
+ EXPECT_EQ(parsedOutput.toolCalls[0].name, "no_args_tool");
+ EXPECT_EQ(parsedOutput.toolCalls[0].arguments, "");

std::vector<std::tuple<std::string, ov::genai::GenerationFinishReason, std::optional<std::string>>> chunkToDeltaVec{
    {"SOME_CONTENT", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"content":"SOME_CONTENT"}})"},
    {"MORE_CONTENT<|tool_call>", ov::genai::GenerationFinishReason::NONE, R"({"delta":{"content":"MORE_CONTENT"}})"},
    {"call:sort", ov::genai::GenerationFinishReason::NONE, std::nullopt},


Do we need tests for whitespace between keywords? Do we treat such requests as valid?


// Tests with special characters
TEST_F(Gemma4OutputParserTest, ParseToolCallWithStringArgumentsContainingComparison) {
std::string input = R"x(<|tool_call>call:search{query:<|"|>price >= 100, (sale)<|"|>,limit:5}<tool_call|>)x";


Why is it not supported? Isn't it normal for whitespace to be part of a string? And likewise `>`, `<`, `=`?

}

TEST_F(Gemma4OutputParserTest, ParseToolCallWithStringArgumentsContainingSpecialCharacters) {
std::string impl = "import package\nimport package2\n\ndef func(a, b):\n\td={\"python\": \"dict\"}\n\tl = [\"list \\\"with escaped text\\\"\", 123, []]\n\treturn f\"formatted {a} and {b}\"";


Did you test whether the model really escapes everything? I remember @atobiszei working on the qwen-coder parser; additional work was needed there to support nested `\`.

Comment thread prepare_llm_models.sh Outdated