[llm][2/4] Echo-gated special-token filtering and EOS metadata merge #19534
seyeong-han wants to merge 2 commits into
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19534
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV; if your PR is affected, please view it below.
❌ 103 New Failures, 1 Cancelled Job, 1 Unrelated Failure, 6 Unclassified Failures as of commit 0b6a51d with merge base 2ea50ac:
* NEW FAILURES: the following jobs have failed.
* UNCLASSIFIED FAILURES: DrCI could not classify the following jobs because the workflow did not run on the merge base. The failures may be pre-existing on trunk or introduced by this PR.
* CANCELLED JOB: the following job was cancelled. Please retry.
* FLAKY: the following job failed but was likely due to flakiness present on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
Foundation PR for the chat-template support stack. Adds the Jinja2Cpp-based JinjaChatFormatter, supporting chat-types, embedded Llama3/Llama3.2/Gemma3 templates, build glue (CMake/Buck), and a focused C++ unit-test suite. This PR is reviewable in isolation — it has no behavior change for any existing runner; downstream PRs (B/C/D) plug it in.

This is part 1 of a 4-PR stack split out of pytorch#16987 per reviewer request:

1/4 (this PR) Library + tests
2/4 TextLLMRunner echo-gated special-token filter + EOS merge
3/4 Python bindings + Python LlamaRunner integration
4/4 llama_main CLI flags + chat_formatter wrapper + docs

What this PR adds
-----------------
* extension/llm/chat_template/{chat_templates.h, BUCK, CMakeLists.txt, targets.bzl} — embedded Llama3/Llama3.2/Gemma3 templates and the ChatTemplateType enum + ModelTokens. The CMake file FetchContent's Jinja2Cpp 1.3.2, with SUPPORT_REGEX_LOOKAHEAD set BEFORE FetchContent_MakeAvailable so it propagates correctly, plus header staging for nonstd headers that some Jinja2Cpp installations omit. Installs chat_templates.h so SDK consumers can include it.
* extension/llm/runner/{chat_types.h, jinja_chat_formatter.{h,cpp}} — the Universal Jinja chat formatter that supports any HuggingFace / vLLM chat template, not just the embedded ones. Loadable via fromTemplate (built-in), fromString (any string), or fromFile (any .jinja file). formatConversation injects vLLM/HuggingFace-standard params (tools=[], tool_choice=None, date_string, chat_template_kwargs) so any template that references those variables renders correctly.
* normalizeTemplate handles vLLM/HF template quirks for Jinja2Cpp: notably, 'not tools is none' maps to 'tools' (truthy check), preserving the intent of 'tools is not none' for empty-list defaults (a rough sketch of this rewrite follows this description).
* extension/llm/runner/{CMakeLists.txt, targets.bzl} — link extension_llm_runner against jinja2cpp (PRIVATE) and define EXECUTORCH_USE_JINJA2CPP.
* extension/llm/runner/test/{test_jinja_chat_formatter.cpp, CMakeLists.txt, targets.bzl, BUCK} — unit tests covering Llama3 / Llama3.2 / Gemma3 embedded templates, parseChatTemplateType (case-insensitive), and three universal-Jinja regression tests:
  - generic HuggingFace-style template (proves it's not Llama-specific)
  - tools-aware template (validates the tools=[] default)
  - 'not tools is none' normalization regression test
* CMakeLists.txt — adds add_subdirectory(extension/llm/chat_template) guarded by EXECUTORCH_BUILD_EXTENSION_LLM_RUNNER.
* shim_et/xplat/executorch/build/build_variables.bzl — adds jinja_chat_formatter.cpp to the runner sources.

Notes
-----
* No behavior change for existing TextLLMRunner / MultimodalRunner users: the formatter is opt-in, only invoked when downstream code calls it.
* Sample vLLM templates are NOT checked in (per reviewer feedback); documentation in the follow-up CLI PR points users to vLLM's examples directory and HuggingFace tokenizer_config.json files.

Original PR (full stack): pytorch#16987
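As a rough illustration of the 'not tools is none' normalization described above, a plain substring rewrite would behave like the sketch below; the function name and matching strategy are illustrative assumptions, not the PR's actual normalizeTemplate implementation.

```cpp
#include <cstddef>
#include <string>

// Illustrative sketch only: rewrite the vLLM/HF idiom "not tools is none"
// into a plain truthy check ("tools"), which Jinja2Cpp evaluates the same
// way for the empty-list default case. The real normalizeTemplate may match
// and rewrite differently.
std::string normalize_tools_check(std::string tmpl) {
  const std::string from = "not tools is none";
  const std::string to = "tools";
  for (std::size_t pos = tmpl.find(from); pos != std::string::npos;
       pos = tmpl.find(from, pos + to.size())) {
    tmpl.replace(pos, from.size(), to);
  }
  return tmpl;
}
```

Usage would simply be rewriting the template string once, before handing it to the Jinja2Cpp engine.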
Part 2 of the chat-template support stack split out of pytorch#16987.

What this PR adds
-----------------
* extension/llm/runner/text_llm_runner.cpp: Add 'is_special_token()' with a small kKnownSpecialTokens set covering Llama 3.x, Gemma, and generic <s>/</s>/<pad>/<unk> tokens, plus a regex-style match for Llama-format <|...|> tokens. wrapped_callback now suppresses these from the printed stream when GenerationConfig.echo == false. When echo == true, raw model output (including chat-template tokens) is emitted unchanged — this preserves backward compatibility for users who explicitly want to see raw tokens.
* extension/llm/runner/llm_runner_helper.cpp: get_eos_ids() now MERGES the tokenizer's primary eos_tok() with any additional EOS IDs the model metadata exports under kEosIds, instead of clearing the set when metadata is present. This is correct for HF-tokenizer models (e.g. Llama 3.x) where eos_tok() = <|end_of_text|> but the model also wants <|eot_id|> as a stop token. Also logs the primary tok and only logs metadata IDs that are newly inserted.

Why this is split out
---------------------
These are runner-behavior changes that affect ALL TextLLMRunner users, not just the new chat-template path. They deserve focused review for backward-compat impact (echo gating) and EOS-set semantics (merge vs clear).

Depends on: PR-A (extension/llm/chat_template/* + JinjaChatFormatter library) — only for stack ordering; this PR has no include or symbol dependency on that library.

Original PR (full stack): pytorch#16987
Summary
Part 2 of the chat-template support stack split out of #16987 per @kirklandsign's request.
This PR adds two runner-behavior changes to `TextLLMRunner` that affect all users.

Stack overview

* 1/4: JinjaChatFormatter library + tests
* 2/4 (this PR): `TextLLMRunner` echo-gated special-token filter + EOS merge
* 3/4: Python bindings + Python LlamaRunner integration
* 4/4: llama_main CLI flags + chat_formatter wrapper + docs
What this PR adds
Echo-gated special-token filtering (`text_llm_runner.cpp`)

Adds `is_special_token()` with a small `kKnownSpecialTokens` set covering Llama 3.x, Gemma, and generic `<s>`/`</s>`/`<pad>`/`<unk>` tokens, plus a regex-style match for Llama-format `<|...|>` tokens. `wrapped_callback` now suppresses these from the printed stream only when `GenerationConfig.echo == false`. When `echo == true`, raw model output (including chat-template tokens) is emitted unchanged — this preserves backward compatibility for users who explicitly want to see raw tokens.
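A minimal sketch of the filtering idea, assuming a plain string set and a shape check for `<|...|>` tokens; the helper name and the exact token list below are taken from the description and may differ from the actual code in `text_llm_runner.cpp`:

```cpp
#include <string>
#include <unordered_set>

// Sketch only: the token set is illustrative (generic, Gemma, Llama 3.x);
// the PR's actual kKnownSpecialTokens may differ.
bool is_special_token_sketch(const std::string& piece) {
  static const std::unordered_set<std::string> kKnownSpecialTokens = {
      "<s>", "</s>", "<pad>", "<unk>",                // generic
      "<start_of_turn>", "<end_of_turn>",             // Gemma (assumed)
      "<|begin_of_text|>", "<|end_of_text|>",         // Llama 3.x
      "<|start_header_id|>", "<|end_header_id|>", "<|eot_id|>"};
  if (kKnownSpecialTokens.count(piece) > 0) {
    return true;
  }
  // Anything shaped like <|...|> is treated as a Llama-format special token.
  return piece.size() >= 4 && piece.compare(0, 2, "<|") == 0 &&
         piece.compare(piece.size() - 2, 2, "|>") == 0;
}
```

In the runner, the wrapped callback would consult a check like this only when `GenerationConfig.echo == false`; with `echo == true` every piece is forwarded unchanged.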
EOS metadata merge (`llm_runner_helper.cpp`)

`get_eos_ids()` now merges the tokenizer's primary `eos_tok()` with any additional EOS IDs the model metadata exports under `kEosIds`, instead of `clear()`-ing the set when metadata is present. This is the correct behavior for HF-tokenizer models (e.g. Llama 3.x), where `eos_tok()` is `<|end_of_text|>` but the model also wants `<|eot_id|>` as a stop token. It also logs the primary EOS token and only logs metadata IDs that are newly inserted.
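A minimal sketch of the merge-instead-of-clear semantics, assuming a plain integer-id set; the real helper's types, metadata access, and logging live in `llm_runner_helper.cpp` and may differ:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Sketch: start from the tokenizer's primary EOS id, then add any extra EOS
// ids exported by model metadata (e.g. <|eot_id|> for Llama 3.x). The prior
// behavior cleared the set when metadata was present; the change merges.
std::unordered_set<uint64_t> merged_eos_ids_sketch(
    uint64_t primary_eos,                        // tokenizer->eos_tok()
    const std::vector<uint64_t>& metadata_eos) { // ids read under kEosIds
  std::unordered_set<uint64_t> eos_ids = {primary_eos};
  for (uint64_t id : metadata_eos) {
    if (eos_ids.insert(id).second) {
      // Only newly inserted ids would be logged, per the description above.
    }
  }
  return eos_ids;
}
```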
Why this is split out

These are runner-behavior changes that affect ALL `TextLLMRunner` users, not just the new chat-template path. They deserve focused review for:

* backward-compat impact (echo gating)
* EOS-set semantics (merge vs. clear)
Test Plan

* `TextLLMRunner` tests still pass
* `--echo=false` (clean output)
* `--echo=true` (raw output)
Depends on

* PR-A (the 1/4 chat-template library), for stack ordering only (no `#include` or symbol dependency on the JinjaChatFormatter library)

Original PR
Splitting #16987 into 4 reviewable PRs.
cc @kirklandsign @larryliu0820 @metascroy