Cortex-M: build for any Cortex-M variant against Corstone-300 #19520
rascani wants to merge 1 commit
Extend the Cortex-M test pipeline so the `cortex-m<variant>+int8` target
strings registered in the AOT compile-config plumbing actually produce
runnable, ISA-faithful binaries. The binary is built end-to-end with
`-mcpu=cortex-m<variant>` — runner and core libraries alike — so
CMSIS-NN's compile-time `__ARM_FEATURE_DSP` / `__ARM_FEATURE_MVE`
selector exercises the matching kernel implementation. The Corstone-300
M55 simulator is an ISA superset of every earlier Cortex-M, so it
executes binaries compiled for older cores without modification — the
CI gate becomes "did the right CMSIS-NN code path execute correctly"
rather than "did per-CPU silicon behave as expected".
The build pipeline learns the target CPU end-to-end:
* `build_executorch.sh` accepts `--target_cpu`, passes `-DTARGET_CPU`
to the toolchain CMake, and stages per-CPU artifacts in
`cmake-out-<cpu>` so they don't clobber each other.
* `build_test_runner.sh` derives `target_cpu` from `--target` (using
the same `cortex-m<X>+int8` regex as `build_executor_runner.sh`) and
forwards it.
* `build_executor_runner.sh` derives the matching `target_cpu`, points
`ET_BUILD_DIR_PATH` at `cmake-out-<cpu>`, passes `-Dexecutorch_DIR`
explicitly so `find_package` doesn't silently fall back to a stale
`cmake-out` if it exists, and supplies a dummy
`ETHOSU_TARGET_NPU_CONFIG=ethos-u55-128` so core_platform's
`ethosu_get_architecture()` parser stays happy.
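The target-to-CPU derivation in the list above can be sketched in POSIX shell. This is an illustration only, not the code in this PR: the function names `derive_target_cpu` and `per_cpu_build_dir` are hypothetical, and a `case` pattern stands in for the regex the scripts actually use.

```sh
# Hypothetical helper: map a target string such as "cortex-m4+int8"
# to the CPU name that would be passed as --target_cpu.
derive_target_cpu() {
    case "$1" in
        cortex-m+int8) return 1 ;;                        # empty variant: reject
        cortex-m*+int8) printf '%s\n' "${1%+int8}" ;;     # strip the "+int8" suffix
        *) return 1 ;;                                    # not a Cortex-M int8 target
    esac
}

# Hypothetical helper: name of the per-CPU staging directory, so builds
# for different CPUs do not overwrite each other.
per_cpu_build_dir() {
    printf 'cmake-out-%s\n' "$1"
}
```

For example, `per_cpu_build_dir "$(derive_target_cpu cortex-m4+int8)"` prints `cmake-out-cortex-m4`, while a non-Cortex-M target makes `derive_target_cpu` return a non-zero status.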
Without these changes, build_executorch.sh defaulted to
`-mcpu=cortex-m55`, so the core libraries (libexecutorch.a,
libcortex_m_kernels.a, the bundled CMSIS-NN) baked in M55+MVE code
paths. A runner built with `-mcpu=cortex-m4` would link those libraries
and execute MVE instructions on Corstone-300's M55 — passing
bundled-IO checks while testing the wrong code path. The explicit
`-Dexecutorch_DIR` is needed because CMake's `find_package(HINTS ...)`
is not authoritative — a leftover `cmake-out/lib/cmake/ExecuTorch/`
from an earlier build was being preferred over the per-CPU dir we
actually asked for.
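A minimal sketch of pinning the package directory explicitly, in POSIX shell. It assumes the per-CPU package config lives at `cmake-out-<cpu>/lib/cmake/ExecuTorch`, mirroring the stale `cmake-out/lib/cmake/ExecuTorch/` path mentioned above; that layout and the function name `executorch_dir_flag` are assumptions, not taken from the scripts.

```sh
# Hypothetical helper: build the -Dexecutorch_DIR argument for a given
# ExecuTorch root ($1) and target CPU ($2).
# Assumption: the per-CPU config directory mirrors the layout of the
# default cmake-out tree.
executorch_dir_flag() {
    printf '%s\n' "-Dexecutorch_DIR=$1/cmake-out-$2/lib/cmake/ExecuTorch"
}
```

Passing the result on the CMake command line, for example `cmake "$(executorch_dir_flag "$et_root" cortex-m4)" ...`, tells `find_package` exactly which config directory to use, so a leftover `cmake-out` tree is not picked up.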
One transient patch is layered into the externally-fetched
`ethos-u/core_platform` repo via the existing `patch_repo` mechanism:
an `#if defined(__ARM_ARCH_8M_MAIN__) || defined(__ARM_ARCH_8_1M_MAIN__)`
guard around the MPU init block in `corstone-300/target.cpp`. Without
it, the Armv8-M-only `ARM_MPU_RBAR` / `ARM_MPU_RLAR` API breaks the
build for older cores. The FVP doesn't enforce protection regions
without an explicit setup, so simulation correctness is unaffected.
The patch is a bridge — see TODO at `corstone_utils.cmake:52` —
pending upstream merge of the equivalent guard.
Inside our own runner, the optional Armv8.1-M PMU intrinsics
(`ARM_PMU_*`) in `arm_executor_runner.cpp` and `arm_perf_monitor.cpp`
are guarded on `__ARM_ARCH_8_1M_MAIN__`. Earlier cores get a zero
cycle count rather than a compile error; functional correctness is
unaffected. `run_fvp.sh` routes all `cortex-m*` targets except
`cortex-m85*` to the Corstone-300 FVP.
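The routing rule in the last sentence can be sketched as a `case` statement. The function name `fvp_for_target` and the placeholder value `other` are hypothetical; the description does not say which FVP handles `cortex-m85*`, so the sketch leaves that unspecified.

```sh
# Hypothetical helper: choose a simulator for a target string.
# The cortex-m85* pattern must come first, because cortex-m* also matches it.
fvp_for_target() {
    case "$1" in
        cortex-m85*) printf 'other\n' ;;          # handled elsewhere; not specified here
        cortex-m*)   printf 'corstone-300\n' ;;   # every other Cortex-M variant
        *)           return 1 ;;                  # not a Cortex-M target
    esac
}
```

With this ordering, `fvp_for_target cortex-m4+int8` prints `corstone-300` and `fvp_for_target cortex-m85+int8` prints `other`.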
Locally validated end-to-end on Corstone-300 with the `qadd` model:
* `cortex-m55+int8`: baseline, PASS; `op_quantize_per_tensor.cpp.obj`
in `cmake-out-cortex-m55` contains MVE instructions (`vdup.16`,
`vmax.s16`).
* `cortex-m4+int8`: PASS; the same object in `cmake-out-cortex-m4` has
no MVE instructions, only single-precision FP (`vmul.f32`,
`vcvt.s32.f32`). CMSIS-NN selects the DSP path (1275 DSP opcodes in
`libcmsis-nn.a`).
* `cortex-m7+int8`: PASS; same pattern as M4.
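One way to repeat the disassembly check above is to count instructions that touch MVE vector registers. The sketch below is a heuristic rather than the method used for this PR: it assumes that MVE instructions show up in the disassembly with `q0` to `q7` operands while scalar FP uses `s` and `d` registers, and the function name `count_mve_lines` is hypothetical.

```sh
# Hypothetical helper: read disassembly text on stdin and print how many
# lines mention a q0-q7 register as a standalone token.
# Heuristic only: a symbol name that happens to contain such a token
# would be counted as well.
count_mve_lines() {
    # grep -c exits non-zero when the count is 0; keep the status clean.
    grep -cE '(^|[^[:alnum:]_])q[0-7]([^[:alnum:]_]|$)' || true
}
```

A typical use would be `arm-none-eabi-objdump -d path/to/op_quantize_per_tensor.cpp.obj | count_mve_lines`, where the object path is a placeholder. A non-zero count for the `cortex-m55` build and zero for the `cortex-m4` build would match the results listed above.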
Scalar-class variants (`cortex-m{0,0plus,3,23}+int8`) still need a
follow-up: an Armv6-M `HardFault_Handler` guard in `target.cpp` and a
`core_software/cmsis.cmake` `ARMCM0plus` directory-case fix. The
`target_cpu` plumbing here already accommodates soft-float ABI builds,
so the follow-up only needs those two fixes.
Authored with Claude.
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell