Cortex-M: Thread target CPU/ISA through the AOT pass manager#19470
Conversation
Introduce a CortexMCompileConfig dataclass (cpu + isa) that carries Cortex-M target information from the --target=cortex-m<variant>+int8 CLI string into CortexMPassManager. The full standard Cortex-M lineup is registered (M0, M0+, M3, M4, M7, M23, M33, M35P, M52, M55, M85), each with a sensible default ISA; the optional-DSP M33/M35P and optional-MVE M52/M55/M85 cases can be expressed via the isa= kwarg. No pass reads the config yet, so this change is purely plumbing — but it positions both the upcoming AOT scratch-buffer sizing work (pytorch#16580) and the M0+ (pytorch#17646) / M33 (pytorch#17644) backend support to plug in without re-plumbing the call site. Actually building for the new variants still requires Phase 2's MPS2 platform glue. CortexMTester gains an optional config kwarg, and the Pico 2 MLP example now constructs CortexMPassManager with cpu='cortex-m33' to match the RP2350 hardware it targets. Authored with Claude.
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19470
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 1 New Failure, 3 Pending, 1 Unrelated FailureAs of commit 2ec0840 with merge base 02f9866 ( NEW FAILURE - The following job has failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
| "TOSA-1.0+INT", | ||
| "TOSA-1.0+FP", | ||
| "TOSA-1.0+INT+int16", | ||
| "cortex-m0+int8", |
|
Really nice! |
| # Default ISA per CPU follows the most common configuration each core is | ||
| # shipped with. M33/M35P optionally lack DSP, and M52/M55/M85 optionally | ||
| # lack MVE; callers can pass `isa=` explicitly to override. | ||
| _CPU_DEFAULT_ISA: dict[str, str] = { |
There was a problem hiding this comment.
Use the Cpu, Isa literals you defined for typing?
| """ | ||
|
|
||
| cpu: Cpu = "cortex-m55" | ||
| isa: Isa | None = None |
There was a problem hiding this comment.
Use dataclass default/ default_factory instead of | None and post_init. Avoids annoying typing issues
| "TOSA-1.0+INT+int16", | ||
| "cortex-m0+int8", | ||
| "cortex-m0plus+int8", | ||
| "cortex-m3+int8", |
There was a problem hiding this comment.
I don't think +int8 is a good model for a particular target. int8 is a result of quantization and cmsis_nn support, not target. If we suddenly start supporting for example int16 for some cortex-m, all such cortex-m targets will then support int16. I know that we do something like this for TOSA with +int16, but I at least don't think int8 should make it into the CortexMCompileConfig. It can be used in aot_arm_compiler for deciding quantization.
There was a problem hiding this comment.
Agreed. I've dropped the +int8 on all cortex-m. It'll really just explode this list.
| """Cortex-M/CMSIS-NN compilation path with no delegation.""" | ||
| logging.info("Using Cortex-M/CMSIS-NN compilation path (no delegation)") | ||
| logging.info( | ||
| "Using Cortex-M/CMSIS-NN compilation path for cpu=%s isa=%s", |
There was a problem hiding this comment.
Nit: why not use f-strings
| self.passes: list[PassClass] = ( # type: ignore[assignment] | ||
| passes if passes is not None else self.pass_list # type: ignore[assignment] | ||
| ) | ||
| self.config: CortexMCompileConfig = config or CortexMCompileConfig() |
There was a problem hiding this comment.
Add docstr to init about the default value of config
|
|
||
| from executorch.backends.cortex_m.compile_config import CortexMCompileConfig | ||
|
|
||
| _HAS_CMSIS_NN = find_spec("cmsis_nn") is not None |
There was a problem hiding this comment.
pytest has a function called import_or_skip or similar
There was a problem hiding this comment.
Also, I think there is a danger in skipping tests when a dependency is missing. It could easily lead to something not being tested when we think it is.
| from dataclasses import dataclass | ||
| from typing import Literal | ||
|
|
||
| Cpu = Literal[ |
There was a problem hiding this comment.
Literals are fine I guess, but why not use Enums for minimal risk of typos? It is easy to go from string to enum in cases where you might need it (aot_arm_compiler)
| # Default ISA per CPU follows the most common configuration each core is | ||
| # shipped with. M33/M35P optionally lack DSP, and M52/M55/M85 optionally | ||
| # lack MVE; callers can pass `isa=` explicitly to override. | ||
| _CPU_DEFAULT_ISA: dict[str, str] = { |
There was a problem hiding this comment.
cmsis_nn has a function 'resolve_backend' https://github.com/ARM-software/CMSIS-NN/blob/main/Source/Bindings/arm_py_backend.cpp that should be used for default Isa.
Erik-Lundell
left a comment
There was a problem hiding this comment.
Hi! I was actually just working on something like this, there are patches on the way for aot memory planning: #19505
The main differences that I would like to see are:
- Enums instead of literals
- Involve cmsis_nn, for example have a .backend function that can be passed to cmsis_nn's buffer size calculationen functions. See my PR.
- Possibly: An "any" option that could be overly pessimistic but always works.
Aligns CortexMTargetConfig with the design Erik proposes in pytorch#19505 while keeping the wider plumbing in place. The earlier CortexMCompileConfig is renamed to CortexMTargetConfig (and its file moved to target_config.py) to disambiguate from EdgeCompileConfig — this dataclass models a compilation *target*, not a step in the compile pipeline. Adopted from Erik's feedback: * CortexM enum replaces the Cpu/Isa Literals — typo-safe and IDE-friendly. * `.backend` property returns `cmsis_nn.Backend` directly, resolved via `cmsis_nn.resolve_backend(cmsis_nn.CortexM.<X>)`. The hand-rolled `_CPU_DEFAULT_ISA` dict is gone — cmsis_nn is the single source of truth for the CPU → backend mapping. * CortexMPass abstract base class added; CortexMPassManager.transform() uses signature inspection to inject both `exported_program` and `target_config` into passes that declare them (mirroring Erik's proposal). The pass manager also gains stricter validation — the exported_program must be a real ExportedProgram and the pass list must contain classes, not instances — failing fast instead of producing opaque errors deep in _transform. * cmsis_nn is now a hard dependency for the cortex_m tests: the top-level `import cmsis_nn` in test_target_config.py replaces the previous skipif-on-find_spec dance, addressing Erik's concern that skipping tests on missing deps can mask regressions. * `+int8` dropped from cortex-m target strings — quantization is a result of the export flow, not a CPU attribute. TARGETS, help text, from_target_string, CI script and README aligned. * Logging in `_to_edge_cortex_m` and the --delegate-ignored warning switched to f-strings. * `__init__` docstring on CortexMPassManager documents the exported_program / passes / target_config defaults (including the M55+MVE fallback that matches pre-config behaviour). * `import-not-found` removed from the cmsis_nn type-ignore — only `import-untyped` actually fires, and if cmsis_nn ever ships stubs the unused ignore will become a tripwire. Kept the optional `isa` override field for the optional-extension cases (M55 without MVE, M33 without DSP, etc.) — different from Erik's enum-only design, but the override remains useful for cores where ISA extensions are optional. A `_SUPPORTED_BACKENDS` table encodes the per-CPU architectural capability set so overrides validate at construction; forcing MVE on an M0 raises ValueError with the actual supported list. The SCALAR ⊂ DSP ⊂ MVE supersession reflects that an MVE-capable core also runs DSP and scalar code. Defers Erik's `ANY` proposal. In pytorch#19505 ANY falls back to MVE, but an honest "any cortex-m" choice would have to do worst-case scratch buffer planning across the ISA classes (which may not be MVE). Deferring until the scratch-buffer side lands and we can implement the worst-case analysis properly. Authored with Claude.
Thanks for the feedback and so sorry we wound up duplicating effort. I've incorporated most of the feedback, but one of the things I punted on was an "Any" option. I think that's a valid use case where I want to generate a PTE that is compatible with any cortex-m target, but I want to think through how we'll do the scratch buffer allocation for it. One approach would be to continue to use temp allocator those. Another would be to memory plan for the worst case scratch buffer allocation. We'd probably need to query all of the ISA backends and pick the worst one. |
CI's mypy env doesn't pip-install cmsis_nn (it's a native pybind11 module — type-checking-only envs typically skip such deps), so `import-not-found` fires there even though `import-untyped` is the one that fires locally. The combo `[import-not-found, import-untyped]` is honest about both states; dropping either half breaks one or the other environment. Authored with Claude.
Thanks! Looks good from my side now. I agree on the "any" option, we can save that for another time. |
The Zephyr `hello-executorch` README has `<!-- RUN -->` directives that the test-arm-backend-zephyr CI extracts and executes verbatim. The prior wording used `--target=cortex-m55+int8`, which the AOT compiler no longer accepts after the +int8 drop. Update both code blocks (and the surrounding prose) to use the new bare `--target=cortex-m55` spelling. Authored with Claude.
Summary
Introduce a CortexMCompileConfig dataclass (cpu + isa) that carries Cortex-M target information from the --target=cortex-m+int8 CLI string into CortexMPassManager. The full standard Cortex-M lineup is registered (M0, M0+, M3, M4, M7, M23, M33, M35P, M52, M55, M85), each with a sensible default ISA; the optional-DSP M33/M35P and optional-MVE M52/M55/M85 cases can be expressed via the isa= kwarg.
No pass reads the config yet, so this change is purely plumbing, but it positions both the upcoming AOT scratch-buffer sizing work (#16580) and the CMSIS-NN scalar (#17646) / DSP (#17644) backend support to plug in without re-plumbing the call site. Actually building for the new variants still requires
CortexMTester gains an optional config kwarg, and the Pico 2 MLP example now constructs CortexMPassManager with cpu='cortex-m33' to match the RP2350 hardware it targets.
Authored with Claude.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell