
Mkulakow/legacy pipeline finish reason #4169

Open
michalkulakowski wants to merge 1 commit into main from mkulakow/legacy_pipeline_finish_reason

Conversation

@michalkulakowski
Collaborator

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

Copilot AI review requested due to automatic review settings April 29, 2026 13:03
Contributor

Copilot AI left a comment


Pull request overview

Updates OVMS legacy LLM pipeline response handling to propagate the actual GenAI `finish_reason` (instead of always reporting `stop`), and bumps the OpenVINO/GenAI dependency pins to a newer nightly.

Changes:

  • Propagate legacy pipeline finish_reason into the final streamed chunk and into OpenAI Chat/Completions unary responses.
  • Enable finish-reason checking for the legacy LLM flow test case.
  • Update OpenVINO / OpenVINO GenAI source pins and nightly package versions (including export-model requirements).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| versions.mk | Updates OV/GenAI commit pins and GenAI nightly package URLs to 2026.2.0.0.dev20260428. |
| src/test/llm/llmnode_test.cpp | Enables finish-reason validation for `lm_legacy_regular` in parameterized tests. |
| src/llm/language_model/legacy/servable.cpp | Uses `results.finish_reasons[0]` (when available) for the final streaming chunk finish reason. |
| src/llm/apis/openai_completions.cpp | Switches unary `finish_reason` serialization to use `results.finish_reasons[...]` for Encoded/VLM results. |
| demos/common/export_models/requirements.txt | Bumps openvino / openvino-tokenizers nightly versions to 2026.2.0.dev20260428. |

Comment thread: src/llm/apis/openai_completions.cpp (Outdated)
Comment on lines 409 to 418

```cpp
auto finishReason = mapFinishReason(results.finish_reasons[index], !parsedOutput.toolCalls.empty());
jsonResponse.FinishReason(finishReason.value_or("unknown"));
// index: integer; Choice index, only n=1 supported anyway
jsonResponse.Index(index++);
```
Comment thread: src/llm/apis/openai_completions.cpp (Outdated)

```diff
 jsonResponse.StartObject();
 // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
-auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
+auto finishReason = mapFinishReason(results.finish_reasons[index], !parsedOutput.toolCalls.empty());
```
@michalkulakowski force-pushed the mkulakow/legacy_pipeline_finish_reason branch from 4a740a6 to 31f3036 on May 7, 2026 09:25
@dtrawins added this to the 2026.2_rc milestone on May 8, 2026
@michalkulakowski force-pushed the mkulakow/legacy_pipeline_finish_reason branch from 2d62b16 to fbd018c on May 8, 2026 12:53
```diff
 jsonResponse.StartObject();
 // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
-auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
+const ov::genai::GenerationFinishReason finishReasonRaw =
```
Collaborator


Is it possible for results.finish_reasons to be empty? In which situations?

```diff
 // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
-auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
+const ov::genai::GenerationFinishReason finishReasonRaw =
+    (!results.finish_reasons.empty()) ? results.finish_reasons[0] : ov::genai::GenerationFinishReason::STOP;
```
Collaborator

@dkalinowski May 8, 2026


Please add a comment that we access [0] because we always call generate with batch=1.

```cpp
parsedOutputs.push_back(parseOutputIfNeeded(tokens));
}
return serializeUnaryResponseImpl(parsedOutputs);
for (const auto& finishReason : results.finish_reasons) {
```
Collaborator

@dkalinowski May 8, 2026


Why do we have a different implementation than chat/completions? Can't we just take [0]? I think we also always have batch size=1 here.

```diff
-std::string serializedChunk = executionContext->apiHandler->serializeStreamingChunk(lastTextChunk, ov::genai::GenerationFinishReason::STOP);
+ov::genai::GenerationFinishReason finishReason = ov::genai::GenerationFinishReason::STOP;
+if (!legacyExecutionContext->results.finish_reasons.empty()) {
+    finishReason = legacyExecutionContext->results.finish_reasons[0];
```
Collaborator


Isn't having 0 finish reasons an internal error here? Should we throw if it happens?

