Add baremetal RISC-V smoke tests (rv32, rv64) by luhenry · Pull Request #4 · riseproject-dev/executorch

luhenry · 2026-05-23T16:40:00Z

Summary

Add baremetal RISC-V testing on CI for rv32 and rv64.

Test plan

This change only adds CI coverage and no new product code, so the CI run itself is the test.

This will be submitted to https://github.com/pytorch/executorch once pytorch#19741 is merged.

Differential Revision: D105973185 Pull Request resolved: pytorch#19736

@digantdesai

Add model tests for currently unsupported models:
- yolo11
- wav2letter
- silero_vad

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani

Signed-off-by: Adrian Lundell <adrian.lundell@arm.com>

Differential Revision: D102880053 Pull Request resolved: pytorch#19211

Differential Revision: D106123930 Pull Request resolved: pytorch#19742

pytorch#19746) pytorch#18476 clone version due to bot crash

…ackend (pytorch#19747) clone pytorch#18477 due to bot crash

clone pytorch#18728 due to bot crash

Differential Revision: D106162684 Pull Request resolved: pytorch#19749

@robert-kalmar

### Summary
Add tests verifying correct support for add.tensor by the Neutron backend using the new Neutron MLIR flow.

### Test plan
Unit tests provided.

cc @robert-kalmar

…#19752) Differential Revision: D106254596 Pull Request resolved: pytorch#19752

Treat BUCK and TARGETS files as build metadata in the Arm pre-push license check so they do not need copyright headers. Signed-off-by: Per Held <per.held@arm.com> Change-Id: I4b3bbd1e03ba4b9c38fd06225156344985f0cc70

@robert-kalmar

### Summary
Add tests verifying correct support for sub.tensor by the Neutron backend using the new Neutron MLIR flow.

### Test plan
Unit tests provided.

cc @robert-kalmar @JakeStevens @digantdesai @rascani

…opy (pytorch#19751) Follow-up to pytorch#17097, which added BF16 support to the TOSA GATHER op. `aten.index_select` and `aten.unfold_copy` both lower via TOSA GATHER, but their support checks were not updated at the time.

In both decompositions (`DecomposeIndexSelectToGatherPass()` and `DecomposeUnfoldToGatherPass()`), the bf16 values tensor flows through dtype-agnostic reshape ops and `tosa.GATHER`, which accepts `BF16`. The support check was the only blocker.

| Op                  | bf16 before | bf16 after |
|---------------------|:-----------:|:----------:|
| `aten.gather`       | ✅          | ✅         |
| `aten.index.Tensor` | ✅          | ✅         |
| `aten.slice_copy`   | ✅          | ✅         |
| `aten.index_select` | ❌          | ✅         |
| `aten.unfold_copy`  | ❌          | ✅         |

Changes:
- `index_select_support.py`, `unfold_copy_support.py`: extend the float branch to include `bfloat16`; add a bf16 extension guard; update the rejection message.
- `test_index_select.py`, `test_unfold_copy.py`: add isolated `_tosa_FP_bf16` test functions using `TosaPipelineFP(..., tosa_extensions=["bf16"])`.

### Test plan
`test_index_select_tosa_FP_bf16` and `test_unfold_copy_tosa_FP_bf16` exercise the bf16 path end-to-end through `TosaPipelineFP` with the bf16 extension enabled, following the same pattern as the existing `test_slice_tensor_tosa_FP_bf16` from pytorch#17492.
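
The support-check change can be sketched as follows. This is a minimal illustration only: `TosaSpec`, `FLOAT_DTYPES`, and `is_gather_dtype_supported` are hypothetical names, dtypes are plain strings instead of `torch.dtype`, and only the float branch of the real checks is modeled.

```python
from dataclasses import dataclass, field

# Float dtypes accepted without any TOSA extension (illustrative set).
FLOAT_DTYPES = {"float16", "float32"}


@dataclass
class TosaSpec:
    """Hypothetical stand-in for a TOSA spec and its enabled extensions."""
    extensions: set = field(default_factory=set)


def is_gather_dtype_supported(dtype: str, spec: TosaSpec) -> tuple:
    """Return (supported, reason) for the values tensor of a GATHER-lowered op."""
    if dtype in FLOAT_DTYPES:
        return True, ""
    if dtype == "bfloat16":
        # bf16 is only legal when the bf16 extension is enabled.
        if "bf16" in spec.extensions:
            return True, ""
        return False, "bfloat16 requires the TOSA bf16 extension"
    return False, f"unsupported dtype {dtype}; expected float16, float32 or bfloat16"


print(is_gather_dtype_supported("bfloat16", TosaSpec({"bf16"})))  # (True, '')
print(is_gather_dtype_supported("bfloat16", TosaSpec()))  # (False, 'bfloat16 requires the TOSA bf16 extension')
```

Keeping the extension guard inside the float branch means a bf16 graph is rejected with a specific reason instead of a generic unsupported-dtype message.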

@psiddh

This is done for conv, depthwise conv, transpose conv, and bmm.

Add scratch tensors to the operator signatures, which are then assigned `exir.memory.alloc`. These allocs are automatically memory planned by ExecuTorch.

Introduce `required_cmsis_buffer_size`, which computes the buffer size from node properties and the Cortex-M configuration. It relies on per-target functions registered in backends/cortex_m/passes/scratch_buffer_sizes.py. This is used to set the size of the allocs in ConvertToCortexMPass.

Finally, modify the kernels to use the new scratch tensor instead of allocating temporary memory.

Add a new macro, CORTEX_M_ENABLE_RUNTIME_CHECKS, which adds a safety check that the AOT-computed buffer size equals the buffer size computed at runtime. Use this when testing.

cc @psiddh @AdrianLundell @digantdesai @rascani @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

---------

Signed-off-by: Erik Lundell <erik.lundell@arm.com>
Co-authored-by: Måns Nilsson <mans.nilsson@arm.com>

@cccclai

…es (pytorch#19146)

### Summary
To enable GPU backend support in the Llama runner, refactoring is required because the dtypes of kv_cache, attention_mask, and logits are currently hardcoded, which prevents floating-point models from running. This PR removes the hardcoded dtypes.

#### Key changes
- Remove the template parameter `<typename T>` from KVManager, LhdTokenGenerator, MultimodalPromptProcessor, and related runner classes
- Detect kv_cache and attention_mask dtypes dynamically from MethodMeta at construction time instead of using compile-time bitwidth detection
- Switch to `std::byte*` pointer arithmetic with `getDtypeSize()` for all buffer offsets; add a `fill_mask()` helper for multi-dtype attention mask filling
- Update the spec_prop pass for the custom llama op for sharding values greater than 1

### Test plan
```
python backends/qualcomm/tests/test_qnn_delegate.py -k TestExampleLLMScript.test_llama_stories_110m --model SM8650 --build_folder /local/mnt/workspace/chenweng/executorch/executorch/build-android --device acfa9311 --executorch_root . --artifact_dir ./stories_110m_pte_size --llama_artifacts . --use_fp16
```
<img width="1977" height="468" alt="image" src="https://github.com/user-attachments/assets/8bf3bffa-9b9f-4655-9cbc-b20127c2468a" />

cc @cccclai @cbilgin @abhinaykukkadapu

Summary: Pull Request resolved: pytorch#19764 Reviewed By: kirklandsign Differential Revision: D106332819

@digantdesai

As documented at https://vkdoc.net/man/VkDataGraphPipelineSessionBindPointRequirementARM, the `sType` field of VkDataGraphPipelineSessionBindPointRequirementARM should always be set to VK_STRUCTURE_TYPE_DATA_GRAPH_PIPELINE_SESSION_BIND_POINT_REQUIREMENT_ARM.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani

Signed-off-by: Erik Lundell <erik.lundell@arm.com>

Enable CPPCHECK for Cortex-M sources and headers. The Cortex-M kernels are registered through generated wrappers, so cppcheck cannot see direct call sites for the exported *_out entry points and reports them as unused. Keep narrow unusedFunction suppressions for those registration-visible functions. The scratch buffer context header is linted as a standalone header but currently exposes helper API without in-tree call sites, so suppress unusedFunction at file scope there instead of dropping Cortex-M header coverage. Keep the quantize and dequantize context parameters non-const to match the generated kernel ABI; changing them to const changes the mangled symbols used by registration. Signed-off-by: Per Held <per.held@arm.com> Change-Id: I3bcb6e5d3f125ae400005d1b033b24a07eb7924f

### Summary
This relates to pytorch#18833. It doesn't add YOLO on baremetal, but it at least makes sure that it works with the Portable Kernels and XNNPACK backends.

### Test plan
It only adds a model to CI, so CI is the test plan.

Convert BenchmarkActivity, BenchmarkMetric, LlmBenchmark, LlmModelRunner, and ModelRunner from Java to Kotlin. Differential Revision: D106195816

@digantdesai

…rch#19731)

### Summary
Extend the Cortex-M cross-CPU build pipeline to Armv6-M by patching two upstream issues that block the Corstone-300 target source and the CMSIS Cortex DFP from building for `cortex-m0plus`:

* `core_platform/0003-*.patch` guards the `HardFault_Handler` in `targets/corstone-300/target.cpp`. The handler uses an `ite eq` IT-block in inline asm and dereferences the SCB CFSR/BFAR/MMFAR fault-status registers; both are Armv7-M / Armv8-M Mainline only. The patch wraps the rich handler in `__ARM_ARCH_7M__ / 7EM / 8M_MAIN / 8_1M_MAIN` and falls back to a minimal stub on Armv6-M / Armv8-M Baseline (M0/M0+/M23).
* `core_software/0002-*.patch` fixes `cmsis.cmake`'s handling of the M0+ device. The Cortex DFP names the device directory and headers `ARMCM0plus` (lowercase suffix), while the device sources (`startup_ARMCM0plus.c`, `system_ARMCM0plus.c`) gate their implementations on the `ARMCM0P` preprocessor macro: three different spellings. The previous `string(TOUPPER ...)` produced `ARMCM0PLUS`, so the include path lookup failed and the source files hit their `#error device not specified!` guard. The patch overrides `ARM_CPU` to `ARMCM0plus` for the directory and filename, and introduces a separate `CMSIS_DEVICE_CPU_DEFINE` set to `ARMCM0P` for the cmsis_startup and cmsis_system compile definitions; all other cores still drive both paths from the uppercased default.

Both patches are layered via the existing `patch_repo` mechanism; the `corstone_utils.cmake` TODO is updated so the deletion plan for 0002 and 0003 is documented together.

### Test Plan
Locally validated end-to-end on the Corstone-300 FVP with the `qadd` model: the `cortex-m0plus` build links a runner that includes `startup_ARMCM0plus.c` / `system_ARMCM0plus.c` and the patched `target.cpp`, and the FVP run prints `TEST: BundleIO index[0] Test_result: PASS` with all error stats zero.
The bundled `libcmsis-nn.a` reports `Tag_CPU_arch: v6S-M` and `Tag_THUMB_ISA_use: Thumb-1` with zero DSP / MVE / saturating instructions, confirming the scalar code path was exercised.

Authored with Claude.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell

Differential Revision: D106026285 Pull Request resolved: pytorch#19734

Differential Revision: D106394605 Pull Request resolved: pytorch#19775

@robert-kalmar

pytorch#19772) … Registration

### Summary
Docs improvement.

### Test plan
Docs only.

cc @robert-kalmar @JakeStevens @digantdesai @rascani

@digantdesai

Re-upload with BUCK changes. Share TOSA RESIZE parameter validation between upsample support checks and fake RESIZE lowering so invalid nearest and bilinear resize parameters are rejected before delegation. Change-Id: I57c267aca96d733879ae90329267e44adce399c6 cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani Signed-off-by: Per Held <per.held@arm.com>

Differential Revision: D106408368 Pull Request resolved: pytorch#19783

### Summary
In pytorch#19651, I added a global seed for pytest runs. This was intended to reduce random tolerance flakes, but it didn't do so in practice: the parallel test runners don't guarantee any ordering, so the random state is unstable between runs. I've updated it to set the seed per test. This should make the random state invariant to test execution order.
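
One way to make the seed independent of execution order is to derive it from each test's own identity. The sketch below assumes a hypothetical helper `seed_for_test` keyed on the pytest node id; it uses `hashlib` rather than the built-in `hash()`, because string hashing is randomized per process unless `PYTHONHASHSEED` is fixed.

```python
import hashlib
import random


def seed_for_test(nodeid: str) -> int:
    """Derive a stable 32-bit seed from a pytest node id (hypothetical helper)."""
    digest = hashlib.sha256(nodeid.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def seeded_rng(nodeid: str) -> random.Random:
    """Return a private RNG whose state depends only on the test id."""
    return random.Random(seed_for_test(nodeid))


# Same test id gives the same random stream, regardless of what ran before it.
a = seeded_rng("tests/test_ops.py::test_add").random()
random.random()  # unrelated global RNG activity does not affect the result
b = seeded_rng("tests/test_ops.py::test_add").random()
assert a == b
```

In a `conftest.py`, an autouse fixture could call `random.seed(seed_for_test(request.node.nodeid))` (and the equivalent for `torch` or `numpy`) before each test; whether the PR does exactly this is not stated.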

@digantdesai

…h#19839) Add ArmPass.should_run_pass() as a reusable early-exit hook before call() starts the normal ExportPass retracing path. The default hook returns true, preserving existing behavior for ArmPass subclasses.

Introduce ArmOpTargetedPass for passes that only transform a known set of operator targets. It implements should_run_pass() by scanning the current graph and nested GraphModules for matching target operators. If no matching target operator is found, the pass returns an unmodified PassResult.

For passes that already gate transformations with allowed_to_transform(), allow the target pre-scan to apply the same check before deciding whether the pass needs to run. This avoids running TFA passes when all matching target nodes are marked as disallowed.

The should_run_pass() hook and ArmOpTargetedPass pre-scan avoid rebuilding graphs for decomposition and rewrite passes that cannot affect the current graph. The speedup is most visible on large models.

Single-run paired benchmarks on Arm backend model tests across FP32, INT, VGF no-quant, and VGF quant variants:

| Model       | E2E avg | Pass-manager avg |
|-------------|--------:|-----------------:|
| T5-small    |  +30.5% |           +47.5% |
| DeepLabV3   |  +12.9% |           +49.8% |
| Wav2Letter  |  +16.9% |           +51.2% |
| InceptionV3 |  +22.2% |           +46.5% |
| MobileNetV2 |  +22.2% |           +52.5% |
| MobileNetV3 |  +29.9% |           +54.6% |

Model rows are unweighted averages over successful variants.

Unweighted average across 23 successful model/target variants:
- E2E speedup: +22.4%
- Pass-manager speedup: +50.5%

Change-Id: Iaa09638473a1d6d1e2ce98f5a0e3fc3a14378143

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218 @mansnils @Sebastian-Larsson @robell @rascani

Signed-off-by: Yufeng Shi <yufeng.shi@arm.com>
Co-authored-by: Erik Lundell <erik.lundell@arm.com>
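
The early-exit idea can be sketched without torch.fx. Everything here is a simplified, hypothetical stand-in: `Node`, `Graph`, and `OpTargetedPass` are not the Arm backend classes, nested GraphModules are modeled as a `submodule` field, and the `(graph, ran)` tuple stands in for `PassResult`.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    op: str                              # e.g. "call_function", "get_attr"
    target: str                          # operator identity, simplified to a string
    submodule: Optional["Graph"] = None  # nested graph, e.g. a cond branch


@dataclass
class Graph:
    nodes: list = field(default_factory=list)


def iter_nodes(graph: Graph) -> Iterator[Node]:
    """Yield every node in graph and, recursively, in nested graphs."""
    for node in graph.nodes:
        yield node
        if node.submodule is not None:
            yield from iter_nodes(node.submodule)


class OpTargetedPass:
    """Hypothetical base class: skip call() when no target op is present."""

    targeted_ops: frozenset = frozenset()

    def allowed_to_transform(self, node: Node) -> bool:
        return True  # subclasses may veto individual nodes

    def should_run_pass(self, graph: Graph) -> bool:
        return any(
            node.op == "call_function"
            and node.target in self.targeted_ops
            and self.allowed_to_transform(node)
            for node in iter_nodes(graph)
        )

    def __call__(self, graph: Graph):
        if not self.should_run_pass(graph):
            return graph, False  # early exit: graph untouched, no retracing
        return self.call(graph), True

    def call(self, graph: Graph) -> Graph:
        raise NotImplementedError


class RecordingPass(OpTargetedPass):
    """Toy pass that only counts how often its expensive call() runs."""

    targeted_ops = frozenset({"aten.foo.default"})

    def __init__(self):
        self.calls = 0

    def call(self, graph: Graph) -> Graph:
        self.calls += 1
        return graph
```

The pre-scan is a single linear walk, so it is cheap compared with a retrace; that asymmetry is what makes skipping irrelevant passes pay off on large graphs.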

- Export and lower the smollm2 model via extensions/llm/export_llm
- Build the arm_executor_runner application
- Fix the propagation of select_ops_list in CMakeLists.txt
- Test that the application runs on FVP in fast mode

Signed-off-by: George Gekov <george.gekov@arm.com>
Change-Id: I8acd87c2f5c3e6b5b189bb987ceccfe4877e2254

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Summary: Currently, __builtin_FUNCTION is used opportunistically if it exists. However, for heavily templated code, this results in extremely long strings that add to .rodata, which can be wasteful on embedded targets. This commit adds an override that uses the shorter __FUNCTION__ even if __builtin_FUNCTION exists, and exposes it as a BUCK constraint. Integration into CMake is intentionally left out for now. Differential Revision: D106668077

…ytorch#19834) Summary: The current approach uses __FILE__ and opportunistically trims it if the utility is available. However, the long name is still stored in .rodata, which can cost memory on embedded platforms. Instead, first try __FILE_NAME__. Differential Revision: D106587633

Summary: ghstack 0.15.0 changed the header URL in PR bodies from `Stack from [ghstack](https://github.com/ezyang/ghstack)` to `Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0)`. The exact string match in `propose_ghstack_orig_pr.py` no longer matched, causing every ghstack_land workflow run to fail since May 14. Use `startswith("Stack from [ghstack]")` instead to be resilient to URL changes. Test Plan: Verified the new pattern matches both the old format (`https://github.com/ezyang/ghstack`) and the new format (`https://github.com/ezyang/ghstack/tree/0.15.0`). This PR was authored with the help of Claude. Reviewers:
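
A minimal sketch of the prefix match, assuming a hypothetical helper `has_ghstack_header` that scans the PR body line by line (the real script may inspect the body differently):

```python
GHSTACK_HEADER_PREFIX = "Stack from [ghstack]"


def has_ghstack_header(pr_body: str) -> bool:
    """Return True if any line of the PR body starts with the ghstack header.

    Matching only the prefix tolerates changes to the link target that
    follows it, such as the /tree/0.15.0 suffix added by ghstack 0.15.0.
    """
    return any(line.startswith(GHSTACK_HEADER_PREFIX) for line in pr_body.splitlines())


old = "Stack from [ghstack](https://github.com/ezyang/ghstack)"
new = "Stack from [ghstack](https://github.com/ezyang/ghstack/tree/0.15.0)"
assert has_ghstack_header(old) and has_ghstack_header(new)
```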

Pull Request resolved: pytorch#19867 Some environments preserve stale failure state when tests are reported through unittest skip results. This switches currently disabled Vulkan delegate coverage to a local decorator so those tests stay discoverable, log their disabled reason, and produce an executed result. ghstack-source-id: 387629544 @exported-using-ghexport Differential Revision: [D106732141](https://our.internmc.facebook.com/intern/diff/D106732141/)

Applies the same disabled-test treatment as the prior diffs in this stack to the devtools inspector tests. Some test runners preserve stale failure state when tests report through unittest skip results, so this replaces the conditionally disabled coverage with a local decorator that keeps the tests discoverable, logs their disabled reason, and produces an executed result. Adds a disable_if decorator that mirrors unittest.skipIf (evaluating the condition at decoration time) and converts the three Windows-gated test cases to use it. Differential Revision: [D106736354](https://our.internmc.facebook.com/intern/diff/D106736354/) ghstack-source-id: 387629542 Pull-Request: pytorch#19874
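
A decorator with the described behavior might look like the sketch below. It is a hypothetical reconstruction, not the code from the diff: the condition is evaluated at decoration time like `unittest.skipIf`, but a disabled test logs its reason and returns normally, so the runner records it as executed and passed rather than skipped. `InspectorTest` and its reason string are illustrative.

```python
import functools
import logging
import sys
import unittest

logger = logging.getLogger(__name__)


def disable_if(condition: bool, reason: str):
    """Like unittest.skipIf, but a disabled test runs (as a no-op) and passes.

    The condition is evaluated once, when the decorator is applied.
    """

    def decorator(test_fn):
        if not condition:
            return test_fn  # enabled: leave the test untouched

        @functools.wraps(test_fn)
        def disabled(*args, **kwargs):
            logger.warning("Test %s is disabled: %s", test_fn.__qualname__, reason)
            return None  # normal return, so it is reported as executed

        return disabled

    return decorator


class InspectorTest(unittest.TestCase):
    @disable_if(sys.platform == "win32", "hypothetical reason: not supported on Windows")
    def test_windows_gated(self):
        self.assertTrue(True)
```

`functools.wraps` keeps the original test name, so the test stays discoverable under the same id it had before.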

Summary: AOTI tests (llama3_2_vision and select extension/llm tests) hang indefinitely on macOS CI runners after the PyTorch 2.12 pin update. The hang is in native C/C++ code (inductor compilation / dlopen), which prevents faulthandler from producing a traceback. Diagnosis is ongoing in pytorch#19886. Skip the affected tests and bump the macOS job timeout from the default 90 to 120 minutes to add margin (observed completion at ~79 min with skips applied). Co-Authored-By: Claude <noreply@anthropic.com>

Differential Revision: D106710218 Pull Request resolved: pytorch#19860

Differential Revision: D105728156 Pull Request resolved: pytorch#19726

Add TurboQuant TQ4 KV cache to the MLX backend, exposed on gemma4_31b via --turboquant. It compresses the full-attention KV cache from bf16 to a 4-bit codebook plus per-vector norms, letting Gemma 4 31B-IT scale to very long contexts. Sliding-window layers are unchanged.

What's in the PR

New cache subclass:
- backends/mlx/llm/turboquant_cache.py: MLXTurboQuantKVCache, a drop-in subclass of TurboQuantKVCache.

Three custom ops + Metal kernels:
- mlx::tq4_compress (model_ops/tq4_compress.py): bucketize + cast(uint8) + nibble-pack in one kernel.
- mlx::tq_norm (model_ops/tq_norm.py): L2 norm with simd_sum cross-lane reduction in fp32 registers; bf16 in / bf16 out.
- mlx::tq_dequant (model_ops/tq_dequant.py): unpack + centroid gather + multiply-by-norm in one kernel.

Per-op tests:
- test_tq4_compress.py, test_tq_norm.py, test_tq_dequant.py

Wiring:
- examples/models/gemma4_31b/mlx_source_transformations.py:
- examples/models/gemma4_31b/export.py: --turboquant CLI flag
- examples/models/gemma4_31b/README.md: TurboQuant subsection.

Perf on M4 Max, 64 GB RAM:
```
2K prompt:
  bf16 cache:       prefill 189.7 tok/s, decode 17.4 tok/s
  TurboQuant cache: prefill 187.7 tok/s, decode 16.9 tok/s
8K prompt:
  bf16 cache:       prefill 170.0 tok/s, decode 17.1 tok/s
  TurboQuant cache: prefill 166.0 tok/s, decode 11.9 tok/s
```

For TQ, max context length is set to 64K. With the bf16 cache, max context length is 10K.

TODO: why does decode slow down more for TQ than for bf16?
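
The data flow of the three ops (norm, then bucketize and nibble-pack, then unpack, centroid gather, and rescale) can be illustrated in plain Python. This is a sketch under loud assumptions: the evenly spaced 16-entry codebook is made up for illustration, and any rotation or codebook design that TurboQuant itself uses is omitted, so the reconstruction quality here says nothing about the real Metal kernels.

```python
import bisect
import math

# Illustrative 16-entry codebook for coordinates of a unit vector:
# evenly spaced centroids in [-1, 1]. Not TurboQuant's actual codebook.
CENTROIDS = [-1.0 + (2 * i + 1) / 16 for i in range(16)]
# 15 decision boundaries, halfway between neighbouring centroids.
BOUNDARIES = [(CENTROIDS[i] + CENTROIDS[i + 1]) / 2 for i in range(15)]


def tq_norm(x):
    """L2 norm of one KV vector (stored alongside the 4-bit codes)."""
    return math.sqrt(sum(v * v for v in x))


def tq4_compress(x):
    """Bucketize the normalized vector to 4-bit codes and pack two per byte."""
    if len(x) % 2:
        raise ValueError("vector length must be even for nibble packing")
    norm = tq_norm(x)
    inv = 1.0 / norm if norm > 0 else 0.0
    codes = [bisect.bisect_right(BOUNDARIES, v * inv) for v in x]  # each in 0..15
    # Low nibble holds the even-indexed element, high nibble the odd one.
    packed = bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))
    return packed, norm


def tq_dequant(packed, norm):
    """Unpack nibbles, gather centroids, and rescale by the stored norm."""
    out = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            out.append(CENTROIDS[code] * norm)
    return out


packed, norm = tq4_compress([3.0, 4.0])
print(packed.hex(), norm, tq_dequant(packed, norm))  # ec 5.0 [2.8125, 4.0625]
```

Storing one norm per vector plus 4 bits per coordinate is what yields the roughly 4x reduction over bf16 described above; the per-coordinate error is bounded by half a bucket width times the norm.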

Summary: Add `fuse()` implementations to the remaining Cadence `QuantizationPattern` subclasses:
- `MaxPool2dPattern`, `MaxPool2dWithoutIndicesPattern`: order-preserving pool on quantized values
- `ReluBasePattern` (inherited by `ReluPattern0`/`1`): relu with requantization
- `ConvReluBasePattern` (inherited by `Conv1d`/`2dReluPattern0`/`1`): conv+relu fusion with an `anchor_ops()` override to match only the conv op
- `SoftmaxPattern`: softmax with dummy mask/pos tensors and fake_mode metadata
- `MixedW8A32LinearPattern`: weight-only quantized linear (no input/output quant)
- `MixedW8A32ConvPattern`: weight-only quantized conv1d with NCL→NLC permutation
- `MixedW8A32GruPattern`: weight-only quantized GRU with 4 dequantized params

Reviewed By: DrJessop
Differential Revision: D105728177

…19728) Summary: Both and Cadence now use the shared `QuantFusionPass` from `compiler_funcs.py`.
- `QuantFusionPass` in `compiler_funcs.py` iterates patterns, matches `anchor_ops()`, and calls `fuse()` on each match, with debug logging and dead code elimination
- Cadence: `compiler.py` now uses `QuantFusionPass` instead of the old `QuantFusion` isinstance switch
- Removed the Cadence `compiler` target's dep on `:fusion_pass` (no longer imported)

Reviewed By: DrJessop
Differential Revision: D105728219
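
The control flow described for `QuantFusionPass` (iterate patterns, match `anchor_ops()`, call `fuse()`, then eliminate dead code) can be sketched on a toy graph. The node model, `ReluPattern`, the `hypothetical.quantized_relu` target, and `quant_fusion_pass` below are hypothetical simplifications; the real patterns match subgraphs in a torch.fx graph.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(eq=False)  # identity equality, like graph nodes
class Node:
    target: str
    users: int = 1  # number of consumers; 0 means the node is dead


class QuantizationPattern:
    """Hypothetical base: real patterns match subgraphs, not single nodes."""

    def anchor_ops(self) -> tuple:
        raise NotImplementedError

    def fuse(self, graph: list, anchor: Node) -> None:
        raise NotImplementedError


class ReluPattern(QuantizationPattern):
    def anchor_ops(self) -> tuple:
        return ("aten.relu.default",)

    def fuse(self, graph: list, anchor: Node) -> None:
        # Insert the fused quantized op and orphan the original anchor.
        idx = next(i for i, n in enumerate(graph) if n is anchor)
        graph.insert(idx, Node("hypothetical.quantized_relu"))
        anchor.users = 0


def quant_fusion_pass(graph: list, patterns: list) -> list:
    for pattern in patterns:
        anchors = set(pattern.anchor_ops())
        # Snapshot matches first so fuse() can mutate the graph safely.
        for node in [n for n in graph if n.target in anchors]:
            logger.debug("Fusing %s via %s", node.target, type(pattern).__name__)
            pattern.fuse(graph, node)
    # Dead code elimination: drop nodes that nothing consumes any more.
    graph[:] = [n for n in graph if n.users > 0]
    return graph
```

Keeping the driver generic and pushing all op-specific logic into each pattern's `fuse()` is what removes the need for an isinstance switch.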

Differential Revision: D106957459 Pull Request resolved: pytorch#19903

Add the possibility to convert torch.nn.Linear modules to MXFP format. The feature works by replacing all torch.nn.Linear submodules inside a graph with a custom MXFP counterpart: `MXFPLinearOp`. A new user API called `to_mxfp` has been added to enable this feature (located in backends/arm/ao_ext/mxfp.py). The API is tagged as experimental for now. An eager CPU and fake implementation is added for the new custom op, but lowering it to TOSA is handled in a later patch.

To summarize, this patch enables the following flow:
```python
m = MyModule()
to_mxfp(m, MXFPOpConfig())  # replace nn.Linear submodules with MXFPLinearOp
m.forward(x)
```

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com>
Co-authored-by: Sebastian Larsson <sebastian.larsson@arm.com>
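
For readers unfamiliar with the MX formats: in the OCP Microscaling (MX) specification, a block of 32 elements shares one power-of-two scale, and each element is stored in a narrow format. The sketch below assumes MXFP4 (E2M1 elements); the patch does not say which element format `MXFPOpConfig` selects, and the function names are hypothetical. Ties round toward the smaller magnitude for simplicity, whereas the spec calls for round-to-nearest-even, and the E8M0 exponent range is not clamped.

```python
import math

# Non-negative values representable by an FP4 E2M1 element.
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_EMAX = 2      # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32   # elements that share one scale in the MX formats


def quantize_mxfp4_block(block):
    """Return (shared_exponent, element_values) for one block of floats."""
    if not 0 < len(block) <= BLOCK_SIZE:
        raise ValueError("block must hold 1..32 elements")
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Shared power-of-two scale chosen so the largest value lands near FP4's max.
    shared_exp = math.floor(math.log2(amax)) - FP4_EMAX
    scale = 2.0 ** shared_exp
    elements = []
    for v in block:
        mag = min(abs(v) / scale, FP4_E2M1_VALUES[-1])  # saturate at 6.0
        nearest = min(FP4_E2M1_VALUES, key=lambda c: abs(c - mag))
        elements.append(math.copysign(nearest, v))
    return shared_exp, elements


def dequantize_mxfp4_block(shared_exp, elements):
    return [e * 2.0 ** shared_exp for e in elements]


exp, q = quantize_mxfp4_block([1.0, -12.0, 5.6])
print(exp, q, dequantize_mxfp4_block(exp, q))  # 1 [0.5, -6.0, 3.0] [1.0, -12.0, 6.0]
```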

@robert-kalmar

### Summary
Enables testing the Neutron delegate with int data, created by quantizing generated float data, with the input and output quantization nodes removed. This turns the model into its int variant.

### Test plan
Tests provided.

cc @robert-kalmar

@robert-kalmar

…h#19803)

### Summary
Added support for `aten.slice` using the new Neutron flow.

### Test plan
Tests can be run manually using `pytest -c /dev/null backends/nxp/tests/`.

cc @robert-kalmar @JakeStevens @digantdesai @rascani @MartinPavella @roman-janik-nxp @jirioc @irtrukhina @StrycekSimon

…19890)

### Summary
cppcheck's unusedFunction is a whole-program check, but lintrunner analyzes files individually. Functions defined in headers are used by the .cpp files that include them, but cppcheck only sees the header in isolation and falsely reports them as never used. Suppress the check for .h/.hpp files while keeping it active for .cpp files.

Authored with assistance from Claude.
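
The per-file decision can be sketched as a small argument builder. `cppcheck_args` is a hypothetical helper, not the lintrunner adapter itself; `--enable=unusedFunction`, `--suppress=unusedFunction`, and `--quiet` are standard cppcheck options, but the exact flag set the repository passes is an assumption.

```python
from pathlib import Path

HEADER_SUFFIXES = {".h", ".hpp"}


def cppcheck_args(path: str) -> list:
    """Build cppcheck arguments for linting a single file in isolation."""
    args = ["cppcheck", "--quiet", "--enable=unusedFunction"]
    if Path(path).suffix in HEADER_SUFFIXES:
        # A header linted on its own shows no call sites for its functions,
        # so unusedFunction would only produce false positives here.
        args.append("--suppress=unusedFunction")
    args.append(path)
    return args
```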

### Summary
Add a Docker build image based on Ubuntu 26.04 with gcc 15. It's necessary for the baremetal RISC-V use case, since `libstdc++-riscv64-unknown-elf-picolibc` is only available starting with Ubuntu 26.04. It also ensures that `gcc-riscv64-unknown-elf` is gcc 14 or newer, which supports RVV.

### Test plan
It will be used by the baremetal testing on RISC-V.

Relates to pytorch#18991 pytorch#19666

Cross-compiles with riscv64-unknown-elf + picolibc, embeds the .bpte into the ELF, and runs under qemu-system-riscv{32,64} -machine virt with semihosting carrying stdout and exit status. Same bundled-IO PASS criterion as the existing linux runs.
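
A plausible invocation can be sketched as a command builder. The flag set below (`-machine virt`, `-nographic`, `-bios none`, `-semihosting-config enable=on,target=native`, `-kernel`) is an assumption about how such a run is typically configured, not a copy of the PR's CI script, and `qemu_semihosting_cmd` is a hypothetical helper.

```python
def qemu_semihosting_cmd(elf_path: str, xlen: int) -> list:
    """Build a qemu-system-riscv{32,64} command for a bare-metal semihosted ELF."""
    if xlen not in (32, 64):
        raise ValueError(f"unsupported XLEN {xlen}; expected 32 or 64")
    return [
        f"qemu-system-riscv{xlen}",
        "-machine", "virt",
        "-nographic",   # no display window; console on the host terminal
        "-bios", "none",  # no firmware; run the ELF directly
        "-semihosting-config", "enable=on,target=native",  # guest stdout and exit via semihosting
        "-kernel", elf_path,
    ]
```

Running the result with `subprocess.run(cmd, timeout=...)` and checking `returncode` would surface the guest's exit status, assuming the firmware exits through the semihosting exit call as the description implies.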

sentencepiece fails to compile on GCC 15 due to a missing `#include <cstdint>`.

metascroy and others added 7 commits May 22, 2026 19:20

Fix 2 broken tests caused by D105910457

a83e7c4

Convert Android LLM extension from Java to Kotlin (pytorch#19211)

158c5d8

Globally serialize XNNPACK execution, add logging (pytorch#19742)

6bda6c4

[ET Device Support] Module: allocate device memory for planned buffers (

12f62f2

[ET Device Support] CudaAllocator: device memory allocator for CUDA b…

c27cc5d

[ET Device Support] Define AOT device copy ops registry (pytorch#19748)

7d8063f

This was referenced May 23, 2026

[discussion] Upstreaming an HPMicro bare-metal RISC-V MCU backend pytorch/executorch#19666 (Open)

Export YOLO to executorch for RISC-V Baremetal environment pytorch/executorch#18833 (Open)

kirklandsign and others added 11 commits May 23, 2026 18:50

Add extension_llm_runner to CMake deps (pytorch#19749)

d757776

NXP backend: Enable Add Tensor with new Neutron flow (pytorch#19550)

b69cbcd

Back out "Globally serialize XNNPACK execution, add logging" (pytorch…

ba6074c

Arm backend: Exclude build metadata from license checks

ee4c90a

NXP backend: Enable Sub Tensor with new Neutron flow (pytorch#19588)

b73df0b

add cuda allocator to cmake target (pytorch#19764) (pytorch#19764)

75fb249

luhenry mentioned this pull request May 26, 2026

Add Yolo26 to matrix of tested models on RISC-V pytorch/executorch#19741 (Merged)

luhenry and others added 9 commits May 26, 2026 09:14

Convert minibench Java files to Kotlin (pytorch#19760)

6128a45

Harden against concurrency violations (pytorch#19734) (pytorch#19734)

fb3f6eb

Convert Experimental, DType, MethodMetadata from Java to Kotlin

50ee05e

NXP backend: Improve docs for NXP eIQ Neutron Kernel Selective Kernel… (

5d36c7c

Fix cortex_m test failures from D106339880

29c3a23

YufengShi-dudu and others added 20 commits May 29, 2026 10:05

Change python to python3 in shell script

b0441b5

Use uint64_t for FlatTensor segment end

29c18de

Add fuse() to QuantizationPatterns (pytorch#19726)

0e6b67e

Remove over-strict softmax mask divisibility assert

2af5a13

luhenry force-pushed the riscv-testing-baremetal branch from 6661a84 to 7cc42fe June 1, 2026 16:42

luhenry force-pushed the riscv-testing-baremetal branch from 7cc42fe to 00d0173 June 1, 2026 16:59

luhenry added 7 commits June 1, 2026 21:39

Fix based on Claude's review

0df077d

Fix qemu-riscv64-static live check

cfd9b52

Use GCC 14 for host compiler as well

66edf4e

Fix unnecessary change

ba2281e

Add testing on RVV on Portable Backend

89fdf66

Add rvv128, rvv256, and rvv512 testing in test-matrix.sh

7dc53a1

Run all models with quantization (except excluded)

4b616c0

luhenry wants to merge 103 commits into riscv-testing-models from riscv-testing-baremetal

20 participants
