Mkulakow/legacy pipeline finish reason #4169
Conversation
Pull request overview
Updates OVMS legacy LLM pipeline response handling to propagate the actual GenAI finish_reason (instead of always reporting stop), and bumps OpenVINO/GenAI dependency pins to a newer nightly.
Changes:
- Propagate the legacy pipeline `finish_reason` into the final streamed chunk and into OpenAI Chat/Completions unary responses.
- Enable finish-reason checking for the legacy LLM flow test case.
- Update OpenVINO / OpenVINO GenAI source pins and nightly package versions (including export-model requirements).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `versions.mk` | Updates OV/GenAI commit pins and GenAI nightly package URLs to 2026.2.0.0.dev20260428. |
| `src/test/llm/llmnode_test.cpp` | Enables finish-reason validation for `lm_legacy_regular` in parameterized tests. |
| `src/llm/language_model/legacy/servable.cpp` | Uses `results.finish_reasons[0]` (when available) for the final streaming chunk finish reason. |
| `src/llm/apis/openai_completions.cpp` | Switches unary `finish_reason` serialization to use `results.finish_reasons[...]` for Encoded/VLM results. |
| `demos/common/export_models/requirements.txt` | Bumps openvino / openvino-tokenizers nightly versions to 2026.2.0.dev20260428. |
```cpp
auto finishReason = mapFinishReason(results.finish_reasons[index], !parsedOutput.toolCalls.empty());
jsonResponse.FinishReason(finishReason.value_or("unknown"));
// index: integer; Choice index, only n=1 supported anyway
jsonResponse.Index(index++);
```
```diff
  jsonResponse.StartObject();
  // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
- auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
+ auto finishReason = mapFinishReason(results.finish_reasons[index], !parsedOutput.toolCalls.empty());
```
```diff
  jsonResponse.StartObject();
  // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
- auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
+ const ov::genai::GenerationFinishReason finishReasonRaw =
```
Is it possible to have `results.finish_reasons.empty()`? In which situations?
```diff
  // finish_reason: "stop" in regular scenario, "tool_calls" if output contains tool calls
- auto finishReason = mapFinishReason(ov::genai::GenerationFinishReason::STOP, !parsedOutput.toolCalls.empty());
+ const ov::genai::GenerationFinishReason finishReasonRaw =
+     (!results.finish_reasons.empty()) ? results.finish_reasons[0] : ov::genai::GenerationFinishReason::STOP;
```
Please add a comment noting that we access `[0]` because we always call generate with batch=1.
```diff
      parsedOutputs.push_back(parseOutputIfNeeded(tokens));
  }
  return serializeUnaryResponseImpl(parsedOutputs);
+ for (const auto& finishReason : results.finish_reasons) {
```
Why does this have a different implementation than chat/completions? Can't we just take `[0]`? I think we also always have batch size = 1 here.
```diff
- std::string serializedChunk = executionContext->apiHandler->serializeStreamingChunk(lastTextChunk, ov::genai::GenerationFinishReason::STOP);
+ ov::genai::GenerationFinishReason finishReason = ov::genai::GenerationFinishReason::STOP;
+ if (!legacyExecutionContext->results.finish_reasons.empty()) {
+     finishReason = legacyExecutionContext->results.finish_reasons[0];
```
Isn't having zero finish reasons an internal error here? Should we throw if it happens?