[llm][1/4] Add Jinja2Cpp-based chat template formatter library #19533
seyeong-han wants to merge 1 commit into
Conversation
Foundation PR for the chat-template support stack. Adds the Jinja2Cpp-based `JinjaChatFormatter`, supporting chat types, embedded Llama3/Llama3.2/Gemma3 templates, build glue (CMake/Buck), and a focused C++ unit-test suite. This PR is reviewable in isolation — it has no behavior change for any existing runner; downstream PRs (B/C/D) plug it in.

This is part 1 of a 4-PR stack split out of pytorch#16987 per reviewer request from @kirklandsign:

* 1/4 (this PR) Library + tests
* 2/4 TextLLMRunner echo-gated special-token filter + EOS merge
* 3/4 Python bindings + Python LlamaRunner integration
* 4/4 llama_main CLI flags + chat_formatter wrapper + docs

What this PR adds
-----------------

* `extension/llm/chat_template/{chat_templates.h, BUCK, CMakeLists.txt, targets.bzl}` — embedded Llama3/Llama3.2/Gemma3 templates and the `ChatTemplateType` enum + `ModelTokens`. The CMake file pulls in Jinja2Cpp 1.3.2 via FetchContent, with `SUPPORT_REGEX_LOOKAHEAD` set BEFORE `FetchContent_MakeAvailable` so it propagates correctly, plus header staging for `nonstd` headers that some Jinja2Cpp installations omit. Installs `chat_templates.h` so SDK consumers can include it.
* `extension/llm/runner/{chat_types.h, jinja_chat_formatter.{h,cpp}}` — the universal Jinja chat formatter that supports any HuggingFace / vLLM chat template, not just the embedded ones. Loadable via `fromTemplate` (built-in), `fromString` (any string), or `fromFile` (any `.jinja` file). `formatConversation` injects vLLM/HuggingFace-standard params (`tools=[]`, `tool_choice=None`, `date_string`, `chat_template_kwargs`) so any template that references those variables renders correctly.
* `normalizeTemplate` handles vLLM/HF template quirks for Jinja2Cpp: notably, `not tools is none` maps to `tools` (truthy check), preserving the intent of `tools is not none` for empty-list defaults.
* `extension/llm/runner/{CMakeLists.txt, targets.bzl}` — link `extension_llm_runner` against `jinja2cpp` (PRIVATE) and define `EXECUTORCH_USE_JINJA2CPP`.
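The FetchContent ordering called out above can be sketched roughly as follows. Only the option name and the 1.3.2 version come from the PR text; the repository URL and the rest of the `FetchContent_Declare` arguments are illustrative assumptions, not the actual `CMakeLists.txt`:

```cmake
# Sketch only: the cache variable must be set BEFORE FetchContent_MakeAvailable,
# because MakeAvailable runs Jinja2Cpp's own CMake and reads the option then.
include(FetchContent)

set(SUPPORT_REGEX_LOOKAHEAD ON CACHE BOOL "" FORCE)  # must precede MakeAvailable

FetchContent_Declare(
  jinja2cpp
  GIT_REPOSITORY https://github.com/jinja2cpp/Jinja2Cpp.git  # assumed URL
  GIT_TAG 1.3.2
)
FetchContent_MakeAvailable(jinja2cpp)
```

Setting the option after `FetchContent_MakeAvailable` would leave Jinja2Cpp already configured with the default, which is the propagation bug the PR guards against.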
* `extension/llm/runner/test/{test_jinja_chat_formatter.cpp, CMakeLists.txt, targets.bzl, BUCK}` — unit tests covering Llama3 / Llama3.2 / Gemma3 embedded templates, `parseChatTemplateType` (case-insensitive), and three universal-Jinja regression tests:
  - a generic HuggingFace-style template (proves it's not Llama-specific)
  - a tools-aware template (validates the `tools=[]` default)
  - a `not tools is none` normalization regression test
* `CMakeLists.txt` — adds `add_subdirectory(extension/llm/chat_template)` guarded by `EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER`.
* `shim_et/xplat/executorch/build/build_variables.bzl` — adds `jinja_chat_formatter.cpp` to the runner sources.

Notes
-----

* No behavior change for existing `TextLLMRunner` / `MultimodalRunner` users: the formatter is opt-in, only invoked when downstream code calls it.
* Sample vLLM templates are NOT checked in (per reviewer feedback); documentation in the follow-up CLI PR points users to vLLM's examples directory and HuggingFace `tokenizer_config.json` files.

Original PR (full stack): pytorch#16987
Universal Jinja support
Any HuggingFace / vLLM-style Jinja template works:
Test Plan
* `cmake --workflow llm-release`
* `make llama-cpu`
* `extension/llm/runner/test/test_jinja_chat_formatter`

Original PR
Splitting #16987 into 4 reviewable PRs.
cc @kirklandsign @larryliu0820 @metascroy @lucylq @mergennachin