Add ExportRecipe support for Arm targets #19527
Summary

Introduces `ArmRecipeProvider` and `ArmRecipeType` so callers can use the existing `ExportRecipe` abstraction to target Ethos-U, TOSA, and VGF instead of going through `aot_arm_compiler.py`. The shape mirrors the sibling XNNPACK / QNN providers, and the provider auto-registers when `backends/arm/recipes/` is imported.

Eight recipes ship:
- Ethos-U55 / U65 / U85 INT8, with `macs`, `system_config`, `memory_mode`, `extra_flags`, and `config_ini` kwargs
- TOSA FP / INT8 / A16W8
- VGF FP / INT8

Cortex-M is not yet supported via recipes: its no-partitioner flow needs a different pipeline shape and is left for a follow-up.

Faithfulness to the CLI: the INT8 and A16W8 paths wire `ReplaceQuantNodesPass` through `LoweringRecipe.edge_manager_transform_passes` and override `pipeline_stages` to insert `EDGE_PROGRAM_MANAGER_TRANSFORM` after `TO_EDGE_TRANSFORM_AND_LOWER`, matching `aot_arm_compiler.py:200-201`. The pass is skipped for VGF and for FP, which also matches the CLI gate. Ethos-U `extra_flags` are prepended with `--verbose-operators --verbose-cycle-estimate` to mirror `aot_arm_compiler.py:479-484`. Unknown kwargs raise `ValueError` (XNNPACK and QNN only warn). This is intentional for a new provider: a typo such as `mac=128` fails fast instead of silently producing a binary for the wrong target.

Enabling the post-partition hook required uncommenting the existing TODO in `EdgeProgramManagerTransformStage.valid_predecessor_stages` so that it also accepts `TO_EDGE_TRANSFORM_AND_LOWER`. The stage's `run()` method already handles a partitioned `EdgeProgramManager` correctly.

A pre-existing circular import between `tosa.backend` and `ethosu.backend` surfaces when `executorch.backends.arm.vgf` is loaded without `ethosu` already in `sys.modules`. The provider primes `ethosu` by importing it before `vgf`, the same workaround `aot_arm_compiler.py` relies on implicitly through its module-level import order.

Test plan

Tests live in `backends/arm/test/recipes/test_arm_recipes.py`:
- The registration suite runs anywhere (no Arm SDK dependencies).
- The TOSA / VGF / Ethos-U construction suites skip cleanly if the corresponding SDK piece isn't installed.
- The AOT round-trip suite exports `_AddModule` (TOSA FP) and `_ConvReluModule` (TOSA INT8) and asserts the expected delegation shape: full delegation for FP; for INT8, ≥1 `DelegateCall` plus `cortex_m::quantize_per_tensor` / `cortex_m::dequantize_per_tensor` boundary kernels, which verifies that `ReplaceQuantNodesPass` actually ran.

CI hookup adds a `test_pytest_recipes` matrix entry to `unittest-arm-backend-with-no-deps` in pull.yml (Ethos-U tests skip via the Vela guard) and to `test-arm-backend-ethos-u` in trunk.yml (full SDK available; all tests run).

Authored with Claude Code.
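The Ethos-U kwarg handling described in the summary (strict validation, plus default Vela flags placed ahead of any caller flags) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the provider's code: `normalize_ethosu_kwargs`, `ETHOSU_KWARGS`, and `DEFAULT_VELA_FLAGS` are invented names, `--example-flag` is a placeholder rather than a real Vela option, and the sketch assumes `extra_flags` is a list of strings.

```python
# Hypothetical sketch, not the actual ArmRecipeProvider implementation.
ETHOSU_KWARGS = frozenset(
    {"macs", "system_config", "memory_mode", "extra_flags", "config_ini"}
)
# Flags the PR says are prepended to mirror aot_arm_compiler.py:479-484.
DEFAULT_VELA_FLAGS = ["--verbose-operators", "--verbose-cycle-estimate"]


def normalize_ethosu_kwargs(**kwargs):
    """Reject unknown kwargs and prepend the default flags to extra_flags."""
    unknown = sorted(set(kwargs) - ETHOSU_KWARGS)
    if unknown:
        # Fail fast: a typo like mac=128 should not silently fall back to defaults.
        raise ValueError(
            f"Unknown Ethos-U recipe kwargs {unknown}; "
            f"expected a subset of {sorted(ETHOSU_KWARGS)}"
        )
    normalized = dict(kwargs)
    # Assumption: extra_flags is a list of strings (or absent/None).
    user_flags = list(kwargs.get("extra_flags") or [])
    normalized["extra_flags"] = DEFAULT_VELA_FLAGS + user_flags
    return normalized


if __name__ == "__main__":
    spec = normalize_ethosu_kwargs(macs=128, extra_flags=["--example-flag"])
    print(spec["extra_flags"])
    try:
        normalize_ethosu_kwargs(mac=128)  # typo for macs
    except ValueError as err:
        print(f"rejected: {err}")
```

Run as a script, the first call prints `['--verbose-operators', '--verbose-cycle-estimate', '--example-flag']`, and the misspelled `mac=128` is rejected instead of being ignored.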
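The INT8 / A16W8 gate and the extra post-lowering stage can be modeled with a toy stage list. Everything here is a hypothetical stand-in: the `Stage` enum, `BASE_PIPELINE`, `VALID_PREDECESSORS`, and the `(target, precision)` labels are invented for illustration, and only the names `TO_EDGE_TRANSFORM_AND_LOWER` and `EDGE_PROGRAM_MANAGER_TRANSFORM` come from the PR. The sketch inserts the transform stage right after lowering for quantized non-VGF recipes, and its predecessor check passes only because `EDGE_PROGRAM_MANAGER_TRANSFORM` is allowed to follow `TO_EDGE_TRANSFORM_AND_LOWER`.

```python
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical stand-ins; only the two middle names come from the PR.
    TORCH_EXPORT = auto()
    TO_EDGE_TRANSFORM_AND_LOWER = auto()
    EDGE_PROGRAM_MANAGER_TRANSFORM = auto()
    TO_EXECUTORCH = auto()


BASE_PIPELINE = [Stage.TORCH_EXPORT, Stage.TO_EDGE_TRANSFORM_AND_LOWER, Stage.TO_EXECUTORCH]

# Allowed immediate predecessors per stage (toy version). The entry for
# EDGE_PROGRAM_MANAGER_TRANSFORM plays the role of the uncommented TODO.
VALID_PREDECESSORS = {
    Stage.TO_EDGE_TRANSFORM_AND_LOWER: {Stage.TORCH_EXPORT},
    Stage.EDGE_PROGRAM_MANAGER_TRANSFORM: {Stage.TO_EDGE_TRANSFORM_AND_LOWER},
    Stage.TO_EXECUTORCH: {
        Stage.TO_EDGE_TRANSFORM_AND_LOWER,
        Stage.EDGE_PROGRAM_MANAGER_TRANSFORM,
    },
}

# The eight shipped recipes as hypothetical (target, precision) labels.
RECIPES = [
    ("ETHOS_U55", "INT8"), ("ETHOS_U65", "INT8"), ("ETHOS_U85", "INT8"),
    ("TOSA", "FP"), ("TOSA", "INT8"), ("TOSA", "A16W8"),
    ("VGF", "FP"), ("VGF", "INT8"),
]


def runs_replace_quant_nodes(target, precision):
    # Gate described in the PR: INT8/A16W8 run the pass; FP and VGF skip it.
    return precision in {"INT8", "A16W8"} and target != "VGF"


def pipeline_for(target, precision):
    stages = list(BASE_PIPELINE)
    if runs_replace_quant_nodes(target, precision):
        lower = stages.index(Stage.TO_EDGE_TRANSFORM_AND_LOWER)
        stages.insert(lower + 1, Stage.EDGE_PROGRAM_MANAGER_TRANSFORM)
    return stages


def validate(stages):
    # Reject any stage whose immediate predecessor is not in its allowed set.
    for prev, cur in zip(stages, stages[1:]):
        if prev not in VALID_PREDECESSORS.get(cur, set()):
            raise ValueError(f"{cur.name} cannot follow {prev.name}")


if __name__ == "__main__":
    for target, precision in RECIPES:
        stages = pipeline_for(target, precision)
        validate(stages)
        print(target, precision, [s.name for s in stages])
```

Removing `Stage.EDGE_PROGRAM_MANAGER_TRANSFORM`'s entry from `VALID_PREDECESSORS` makes `validate` reject every quantized non-VGF pipeline, which is the toy analogue of the TODO that had to be uncommented.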
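The round-trip expectations in the test plan reduce to two predicates over the operators left in the exported program. A minimal sketch, assuming the test has already flattened the program into a list of op-name strings (how it extracts them is not shown here), with `fully_delegated` and `int8_shape_ok` as hypothetical helpers rather than functions from `test_arm_recipes.py`:

```python
from collections import Counter

# Boundary kernels the PR expects once ReplaceQuantNodesPass has run.
CORTEX_M_BOUNDARY = (
    "cortex_m::quantize_per_tensor",
    "cortex_m::dequantize_per_tensor",
)


def fully_delegated(op_names):
    """FP expectation: every remaining op is a delegate call."""
    return len(op_names) > 0 and all(op == "DelegateCall" for op in op_names)


def int8_shape_ok(op_names):
    """INT8 expectation: at least one delegate call plus both boundary kernels."""
    counts = Counter(op_names)
    return counts["DelegateCall"] >= 1 and all(counts[k] >= 1 for k in CORTEX_M_BOUNDARY)


if __name__ == "__main__":
    int8_ops = [
        "cortex_m::quantize_per_tensor",
        "DelegateCall",
        "cortex_m::dequantize_per_tensor",
    ]
    print(fully_delegated(["DelegateCall"]), int8_shape_ok(int8_ops))
```

The script prints `True True`. Per the PR, the `cortex_m::` boundary kernels are what make the INT8 check evidence that `ReplaceQuantNodesPass` ran; a program with only a `DelegateCall` fails `int8_shape_ok`.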
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell