
Cortex-M: build for any Cortex-M variant against Corstone-300 #19520

Draft
rascani wants to merge 1 commit into pytorch:main from rascani:cortex-m-non-mve-corstone


rascani commented May 12, 2026

Summary

Extend the Cortex-M test pipeline so the cortex-m<variant>+int8 target strings registered in the AOT compile-config plumbing actually produce runnable, ISA-faithful binaries. The binary is built end-to-end with -mcpu=cortex-m<variant> — runner and core libraries alike — so CMSIS-NN's compile-time __ARM_FEATURE_DSP / __ARM_FEATURE_MVE selector exercises the matching kernel implementation. The Corstone-300 M55 simulator is an ISA superset of every earlier Cortex-M, so it executes binaries compiled for older cores without modification — the CI gate becomes "did the right CMSIS-NN code path execute correctly" rather than "did per-CPU silicon behave as expected".
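
One rough way to see which kernel class a given `-mcpu` value would select is to inspect the compiler's predefined macros. The sketch below is not part of this PR: `kernel_path_from_macros` is a hypothetical helper that classifies a macro dump read from stdin, assuming the `#define NAME VALUE` format that GCC emits with `-dM -E`.

```bash
#!/bin/sh
# Hypothetical helper (not in the PR): classify a predefined-macro dump
# into the kernel class a feature-macro selector would likely pick.
kernel_path_from_macros() {
    local dump
    dump="$(cat)"
    if printf '%s\n' "$dump" | grep -Eq '^#define __ARM_FEATURE_MVE [1-9]'; then
        echo "mve"      # MVE available
    elif printf '%s\n' "$dump" | grep -Eq '^#define __ARM_FEATURE_DSP 1'; then
        echo "dsp"      # DSP extension only
    else
        echo "scalar"   # neither feature macro is set
    fi
}
```

Assuming an `arm-none-eabi-gcc` toolchain is on `PATH`, it could be fed with `arm-none-eabi-gcc -mcpu=cortex-m4 -dM -E - </dev/null | kernel_path_from_macros`.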

The build pipeline learns the target CPU end-to-end:

  • build_executorch.sh accepts --target_cpu, passes -DTARGET_CPU to the toolchain CMake, and stages per-CPU artifacts in cmake-out-<cpu> so they don't clobber each other.
  • build_test_runner.sh derives target_cpu from --target (using the same cortex-m<X>+int8 regex as build_executor_runner.sh) and forwards it.
  • build_executor_runner.sh derives the matching target_cpu, points ET_BUILD_DIR_PATH at cmake-out-<cpu>, passes -Dexecutorch_DIR explicitly so find_package doesn't silently fall back to a stale cmake-out if it exists, and supplies a dummy ETHOSU_TARGET_NPU_CONFIG=ethos-u55-128 so core_platform's ethosu_get_architecture() parser stays happy.
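
The derivation described in the list above can be sketched roughly as follows. The function names and the exact regex are assumptions made for illustration; the scripts in this PR may spell them differently.

```bash
#!/bin/sh
# Hypothetical helpers; names and regex are illustrative, not taken from the PR.

# Extract the CPU part of a "cortex-m<X>+int8" target string.
# Prints the CPU name, or returns non-zero if the string does not match.
derive_target_cpu() {
    local cpu
    cpu="$(printf '%s\n' "$1" | sed -nE 's/^(cortex-m[0-9]+(plus)?)\+int8$/\1/p')"
    [ -n "$cpu" ] || return 1
    echo "$cpu"
}

# Per-CPU staging directory, so builds for different CPUs do not clobber
# each other.
build_dir_for_cpu() {
    echo "cmake-out-$1"
}
```

For example, `build_dir_for_cpu "$(derive_target_cpu 'cortex-m4+int8')"` would name the directory that both the core-library build and the runner build are expected to agree on.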

Without these changes, build_executorch.sh defaulted to -mcpu=cortex-m55, so the core libraries (libexecutorch.a, libcortex_m_kernels.a, the bundled CMSIS-NN) baked in M55+MVE code paths. A runner built with -mcpu=cortex-m4 would link those libraries and execute MVE instructions on Corstone-300's M55 — passing bundled-IO checks while testing the wrong code path. The explicit -Dexecutorch_DIR is needed because CMake's find_package(HINTS ...) is not authoritative — a leftover cmake-out/lib/cmake/ExecuTorch/ from an earlier build was being preferred over the per-CPU dir we actually asked for.
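
A minimal sketch of the explicit-directory idea, assuming the package config lives under `lib/cmake/ExecuTorch` inside the per-CPU build directory (mirroring the stale path quoted above). `executorch_dir_flag` is a hypothetical name, not a function from the PR.

```bash
#!/bin/sh
# Hypothetical helper: emit an explicit -Dexecutorch_DIR flag for one CPU,
# and refuse to continue if the per-CPU package config directory is missing
# rather than letting find_package pick up some other build.
executorch_dir_flag() {
    local et_root="$1" target_cpu="$2"
    local cfg_dir="${et_root}/cmake-out-${target_cpu}/lib/cmake/ExecuTorch"
    if [ ! -d "$cfg_dir" ]; then
        echo "error: no ExecuTorch package config in ${cfg_dir}" >&2
        return 1
    fi
    echo "-Dexecutorch_DIR=${cfg_dir}"
}
```

Failing early when the per-CPU directory is absent means a leftover plain `cmake-out` cannot quietly satisfy the lookup.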

One transient patch is layered into the externally-fetched ethos-u/core_platform repo via the existing patch_repo mechanism: an #if defined(__ARM_ARCH_8M_MAIN__) || defined(__ARM_ARCH_8_1M_MAIN__) guard around the MPU init block in corstone-300/target.cpp. Without it, the Armv8-M-only ARM_MPU_RBAR / ARM_MPU_RLAR API breaks the build for older cores. The FVP doesn't enforce protection regions without an explicit setup, so simulation correctness is unaffected. The patch is a bridge — see TODO at corstone_utils.cmake:52 — pending upstream merge of the equivalent guard.

Inside our own runner, the optional Armv8.1-M PMU intrinsics (ARM_PMU_*) in arm_executor_runner.cpp and arm_perf_monitor.cpp are guarded on __ARM_ARCH_8_1M_MAIN__. Earlier cores get a zero cycle count rather than a compile error; functional correctness is unaffected. run_fvp.sh routes all cortex-m* targets except cortex-m85* to the Corstone-300 FVP.
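
The routing rule in the last sentence could look roughly like this. It is a sketch only: `fvp_for_target` is a hypothetical name, and how `cortex-m85*` targets are handled is not covered by this description, so the sketch simply declines them.

```bash
#!/bin/sh
# Hypothetical sketch of the routing rule; not the actual run_fvp.sh code.
fvp_for_target() {
    case "$1" in
        cortex-m85*) return 1 ;;            # excluded; handling not shown here
        cortex-m*)   echo "corstone-300" ;; # every other Cortex-M target
        *)           return 1 ;;            # non-Cortex-M targets are out of scope
    esac
}
```

The `cortex-m85*` arm has to come first, because `case` takes the first matching pattern and `cortex-m*` would otherwise match it too.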

Test Plan

Locally validated end-to-end on Corstone-300 with the qadd model:

  • cortex-m55+int8 — baseline, PASS; op_quantize_per_tensor.cpp.obj in cmake-out-cortex-m55 contains MVE intrinsics (vdup.16, vmax.s16).
  • cortex-m4+int8 — PASS; same object in cmake-out-cortex-m4 has no MVE — only single-precision FP (vmul.f32, vcvt.s32.f32). CMSIS-NN selects the DSP path (1275 DSP opcodes in libcmsis-nn.a).
  • cortex-m7+int8 — PASS; same shape as M4.
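
The object-file checks above can be approximated with a small filter over disassembly text. This is a sketch under assumptions: `count_mnemonics` is a hypothetical helper, and the two mnemonics used in the usage note are just the ones cited in this test plan, not a complete MVE list.

```bash
#!/bin/sh
# Hypothetical helper: count disassembly lines (read from stdin) that match
# an extended-regex alternation of mnemonics.
count_mnemonics() {
    # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
    # the printed count while avoiding a failing exit status.
    grep -Ec -- "$1" || true
}
```

Assuming `arm-none-eabi-objdump` is available, a check might be `arm-none-eabi-objdump -d op_quantize_per_tensor.cpp.obj | count_mnemonics 'vdup\.16|vmax\.s16'`, with a non-zero count expected only for the M55 build.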

Scalar-class variants (cortex-m{0,0plus,3,23}+int8) still need a follow-up: an Armv6-M HardFault_Handler guard in target.cpp and a core_software/cmsis.cmake ARMCM0plus directory-case fix. The target_cpu plumbing here already accommodates soft-float ABI builds, so the follow-up only needs those two changes.

Authored with Claude.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell


pytorch-bot Bot commented May 12, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19520

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures

As of commit 22f6080 with merge base 23a91d5 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla Bot added the CLA Signed label May 12, 2026
github-actions Bot added the ciflow/trunk and module: arm labels May 12, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.
