Add alpha support for 9.7 data processing intrinsics by rockdreamer · Pull Request #428 · ARM-software/acle

rockdreamer · 2026-01-30T13:24:06Z

name: Add alpha support for 9.7 data processing intrinsics
about: feature proposal.

This change adds intrinsics for the following architectural features:

FEAT_F16F32DOT
FEAT_F16F32MM
FEAT_F16MM
FEAT_SVE_B16MM
FEAT_SVE2p3
FEAT_SME2p3

Checklist: (mark with X those which apply)

If an issue reporting the bug exists, I have mentioned it in the
PR (do not bother creating the issue if all you want to do is
fix the bug yourself).
I have added/updated the SPDX-FileCopyrightText lines on top
of any file I have edited. Format is SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact information}>
(Please update existing copyright lines if applicable. You can
specify year ranges with a hyphen, as in 2017-2019, and use
commas to separate gaps, as in 2018-2020, 2022).
I have updated the Copyright section of the sources of the
specification I have edited (this will show up in the text
rendered in the PDF and other output format supported). The
format is the same described in the previous item.
I have run the CI scripts (if applicable, as they might be
tricky to set up on non-*nix machines). The sequence can be
found in the contribution
guidelines. Don't
worry if you cannot run these scripts on your machine; your
patch will be automatically checked in the Actions of the pull
request.
I have added an item that describes the changes I have
introduced in this PR in the section Changes for next
release of the section Change Control/Document history
of the document. Create Changes for next release if it does
not exist. Notice that changes that are not modifying the
content and rendering of the specifications (both HTML and PDF)
do not need to be listed.
When modifying content and/or its rendering, I have checked the
correctness of the result in the PDF output (please refer to the
instructions on how to build the PDFs
locally).
The variable draftversion is set to true in the YAML header
of the sources of the specifications I have modified.
Please DO NOT add my GitHub profile to the list of contributors
in the README page of the project.

Add Clang/LLVM intrinsics for svcvt, scvtflt, ucvtf, ucvtflt and fcvtzsn, fcvtzun. The Clang intrinsics are guarded by the sve2.3 and sme2.3 feature flags. ACLE Patch: ARM-software/acle#428

This change adds:

- AdvSIMD dot intrinsics available under FEAT_F16F32DOT
- SVE dot intrinsics available under either FEAT_SVE2p3 or FEAT_SME2p3
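To make the data flow of an FP16-to-FP32 dot product concrete, here is a minimal scalar reference model in plain C. It is not the intrinsic itself and does not use `arm_neon.h`. It assumes, following the bf16 counterpart `vbfdot_f32`, that each FP32 lane `i` accumulates the product of the FP16 pair `a[2i], a[2i+1]` with `b[2i], b[2i+1]`. The function name `fdot_f32_f16_model` is hypothetical, FP16 inputs are passed as `float` (every FP16 value is exactly representable), and the hardware's intermediate rounding and special-value handling are not modelled.

```c
#include <stddef.h>

/* Hypothetical scalar model (not an ACLE intrinsic) of a widening
 * FP16 -> FP32 dot product, ASSUMING vbfdot_f32-like semantics:
 *   r[i] += a[2i] * b[2i] + a[2i+1] * b[2i+1]
 * lanes: 2 for a 64-bit result vector, 4 for a 128-bit (q) one.
 * FP16 inputs are passed as float. Rounding is simplified: the sum is
 * computed in double and rounded to float once. */
void fdot_f32_f16_model(float *r, const float *a, const float *b, size_t lanes)
{
    for (size_t i = 0; i < lanes; ++i) {
        double sum = (double)r[i]
                   + (double)a[2 * i] * b[2 * i]
                   + (double)a[2 * i + 1] * b[2 * i + 1];
        r[i] = (float)sum;
    }
}
```

For example, with `r = {1, 10}`, `a = {1, 2, 3, 4}` and `b = {0.5, 0.25, 2, -1}`, the two lanes become `1 + 0.5 + 0.5 = 2` and `10 + 6 - 4 = 12`.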

This change adds:

- AdvSIMD intrinsics based on FMMLA available under FEAT_F16F32MM and FEAT_F16MM
- SVE intrinsics available under FEAT_F16MM and FEAT_SVE_B16MM
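The matrix-multiply-accumulate shape can be sketched with a small scalar model. This covers only a widening FP16-to-FP32 form and assumes it uses the same layout as the existing bf16 intrinsic `vbfmmlaq_f32`: a 2x2 FP32 accumulator plus a 2x4 matrix times a 4x2 matrix. That layout is an assumption, not something stated in this PR; the FP16-accumulating FEAT_F16MM forms and the SVE forms are not modelled. The name `fmmla_f32_f16_model` is hypothetical.

```c
/* Hypothetical scalar model of a widening FP16 -> FP32 FMMLA, ASSUMING the
 * BFMMLA / vbfmmlaq_f32 layout:
 *   acc: 2x2 FP32 matrix, row-major        (acc[i*2 + j])
 *   a:   2x4 matrix, row-major             (a[i*4 + k])
 *   b:   4x2 matrix stored column by column (b[j*4 + k] is element (k, j))
 * FP16 inputs are passed as float. Rounding is simplified: each output is
 * computed in double and rounded to float once. */
void fmmla_f32_f16_model(float acc[4], const float a[8], const float b[8])
{
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            double sum = acc[i * 2 + j];
            for (int k = 0; k < 4; ++k)
                sum += (double)a[i * 4 + k] * b[j * 4 + k];
            acc[i * 2 + j] = (float)sum;
        }
    }
}
```

If the columns of `b` are the unit vectors e0 and e1, the model simply adds the first two columns of `a` to the accumulator, which is a quick way to check the assumed layout against real hardware or a simulator.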

This change adds support for the svluti6 operations under FEAT_SVE2p3 and FEAT_SME2p3.

rockdreamer · 2026-03-18T14:56:05Z

In the latest force push:

Neon case is now consistent
Typo fixes
svcvt_* is now svcvtzn_*
svaba_s*[_] is now just svabal[_s] and has gained a _n variant
Updated vfdot_lane_f32_f16, vfdotq_laneq_f32_f16, vfdot_laneq_f32_f16 and vfdotq_lane_f32_f16 to match their bf16 counterparts.
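Since the lane forms now mirror their bf16 counterparts, a scalar model of `vbfdot_lane_f32`-style indexing may help reviewers: the FP16 pair selected by `lane` is reused for every FP32 output lane. This is a sketch under that assumption only; the function name `fdot_lane_f32_f16_model` is hypothetical, FP16 values are passed as `float`, and rounding is simplified.

```c
#include <stddef.h>

/* Hypothetical scalar model of vfdot_lane_f32_f16 and related forms,
 * ASSUMING bf16-counterpart semantics (vbfdot_lane_f32): the FP16 pair
 * b[2*lane], b[2*lane + 1] is used for every FP32 output lane.
 *   lanes: 2 for a 64-bit result, 4 for the q forms
 *   lane:  0..1 when b is a 64-bit vector, 0..3 when b is 128-bit (_laneq)
 * Rounding is simplified to one rounding of a double-precision sum. */
void fdot_lane_f32_f16_model(float *r, const float *a, const float *b,
                             size_t lanes, size_t lane)
{
    const double b0 = b[2 * lane];
    const double b1 = b[2 * lane + 1];
    for (size_t i = 0; i < lanes; ++i)
        r[i] = (float)(r[i] + a[2 * i] * b0 + a[2 * i + 1] * b1);
}
```

With `a = {1, 2, 3, 4}`, `b = {10, 20, 30, 40}` and `lane = 1`, both outputs use the pair `(30, 40)`, giving `110` and `250` from a zero accumulator.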

This change adds support for the svaddqp, svaddsubp and svsubp operations under FEAT_SVE2p3 and FEAT_SME2p3.

This change adds support for the svcvt* operations under FEAT_SVE2p3 and FEAT_SME2p3.

This change adds support for the svaba operations under FEAT_SVE2p3 and FEAT_SME2p3.

This change adds support for new svqshrn, svqshrun operations under FEAT_SVE2p3 and FEAT_SME2p3.

…add/add-and-subtract/subtract pairwise operations (#187527)

Add the following new clang intrinsics based on the ACLE specification ARM-software/acle#428 (Add alpha support for 9.7 data processing intrinsics):

- ADDQP (Add pairwise within quadword vector segments)
  - svint8_t svaddqp_s8(svint8_t, svint8_t) / svint8_t svaddqp(svint8_t, svint8_t)
  - svuint8_t svaddqp_u8(svuint8_t, svuint8_t) / svuint8_t svaddqp(svuint8_t, svuint8_t)
  - svint16_t svaddqp_s16(svint16_t, svint16_t) / svint16_t svaddqp(svint16_t, svint16_t)
  - svuint16_t svaddqp_u16(svuint16_t, svuint16_t) / svuint16_t svaddqp(svuint16_t, svuint16_t)
  - svint32_t svaddqp_s32(svint32_t, svint32_t) / svint32_t svaddqp(svint32_t, svint32_t)
  - svuint32_t svaddqp_u32(svuint32_t, svuint32_t) / svuint32_t svaddqp(svuint32_t, svuint32_t)
  - svint64_t svaddqp_s64(svint64_t, svint64_t) / svint64_t svaddqp(svint64_t, svint64_t)
  - svuint64_t svaddqp_u64(svuint64_t, svuint64_t) / svuint64_t svaddqp(svuint64_t, svuint64_t)
- ADDSUBP (Add and subtract pairwise)
  - svint8_t svaddsubp_s8(svint8_t, svint8_t) / svint8_t svaddsubp(svint8_t, svint8_t)
  - svuint8_t svaddsubp_u8(svuint8_t, svuint8_t) / svuint8_t svaddsubp(svuint8_t, svuint8_t)
  - svint16_t svaddsubp_s16(svint16_t, svint16_t) / svint16_t svaddsubp(svint16_t, svint16_t)
  - svuint16_t svaddsubp_u16(svuint16_t, svuint16_t) / svuint16_t svaddsubp(svuint16_t, svuint16_t)
  - svint32_t svaddsubp_s32(svint32_t, svint32_t) / svint32_t svaddsubp(svint32_t, svint32_t)
  - svuint32_t svaddsubp_u32(svuint32_t, svuint32_t) / svuint32_t svaddsubp(svuint32_t, svuint32_t)
  - svint64_t svaddsubp_s64(svint64_t, svint64_t) / svint64_t svaddsubp(svint64_t, svint64_t)
  - svuint64_t svaddsubp_u64(svuint64_t, svuint64_t) / svuint64_t svaddsubp(svuint64_t, svuint64_t)
- SUBP (Subtract pairwise)
  - svint8_t svsubp_s8(svbool_t, svint8_t, svint8_t) / svint8_t svsubp(svbool_t, svint8_t, svint8_t)
  - svuint8_t svsubp_u8(svbool_t, svuint8_t, svuint8_t) / svuint8_t svsubp(svbool_t, svuint8_t, svuint8_t)
  - svint16_t svsubp_s16(svbool_t, svint16_t, svint16_t) / svint16_t svsubp(svbool_t, svint16_t, svint16_t)
  - svuint16_t svsubp_u16(svbool_t, svuint16_t, svuint16_t) / svuint16_t svsubp(svbool_t, svuint16_t, svuint16_t)
  - svint32_t svsubp_s32(svbool_t, svint32_t, svint32_t) / svint32_t svsubp(svbool_t, svint32_t, svint32_t)
  - svuint32_t svsubp_u32(svbool_t, svuint32_t, svuint32_t) / svuint32_t svsubp(svbool_t, svuint32_t, svuint32_t)
  - svint64_t svsubp_s64(svbool_t, svint64_t, svint64_t) / svint64_t svsubp(svbool_t, svint64_t, svint64_t)
  - svuint64_t svsubp_u64(svbool_t, svuint64_t, svuint64_t) / svuint64_t svsubp(svbool_t, svuint64_t, svuint64_t)

…Two-way signed/unsigned absolute difference sum and accumulate long ops (#188972)

Add the following new clang intrinsics based on the ACLE specification ARM-software/acle#428 (Add alpha support for 9.7 data processing intrinsics):

- SABAL (Two-way signed absolute difference sum and accumulate long)
  - svint16_t svabal[_s16](svint16_t, svint8_t, svint8_t) / svint16_t svabal[_n_s16](svint16_t, svint8_t, int8_t)
  - svint32_t svabal[_s32](svint32_t, svint16_t, svint16_t) / svint32_t svabal[_n_s32](svint32_t, svint16_t, int16_t)
  - svint64_t svabal[_s64](svint64_t, svint32_t, svint32_t) / svint64_t svabal[_n_s64](svint64_t, svint32_t, int32_t)
- UABAL (Two-way unsigned absolute difference sum and accumulate long)
  - svuint16_t svabal[_u16](svuint16_t, svuint8_t, svuint8_t) / svuint16_t svabal[_n_u16](svuint16_t, svuint8_t, uint8_t)
  - svuint32_t svabal[_u32](svuint32_t, svuint16_t, svuint16_t) / svuint32_t svabal[_n_u32](svuint32_t, svuint16_t, uint16_t)
  - svuint64_t svabal[_u64](svuint64_t, svuint32_t, svuint32_t) / svuint64_t svabal[_n_u64](svuint64_t, svuint32_t, uint32_t)
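A scalar sketch can make the "two-way ... sum and accumulate long" wording concrete for the `svabal[_s16]` and `svabal[_n_s16]` signatures above. The semantics here are inferred from the instruction description, not from the Arm ARM: each 16-bit accumulator lane `i` is assumed to add `|zn[2i] - zm[2i]| + |zn[2i+1] - zm[2i+1]|`, wrapping modulo 2^16 like other SVE integer accumulates, and the `_n` form is assumed to broadcast the scalar to every `zm` element. Predication is omitted and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce a 32-bit intermediate modulo 2^16 into int16_t without relying on
 * implementation-defined narrowing conversions. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v; /* conversion to unsigned is modulo 2^16 */
    return u < 0x8000u ? (int16_t)u : (int16_t)((int32_t)u - 0x10000);
}

static int32_t absdiff(int32_t a, int32_t b)
{
    return a > b ? a - b : b - a;
}

/* Hypothetical model of svabal[_s16]: zda has n lanes, zn and zm have 2*n.
 * ASSUMED: each wide lane accumulates the absolute differences of one
 * even/odd pair of narrow lanes, wrapping modulo 2^16. */
void abal_s16_model(int16_t *zda, const int8_t *zn, const int8_t *zm, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        zda[i] = wrap16(zda[i] + absdiff(zn[2 * i], zm[2 * i])
                               + absdiff(zn[2 * i + 1], zm[2 * i + 1]));
}

/* Hypothetical model of svabal[_n_s16]: the scalar zm is used for every
 * narrow lane. */
void abal_n_s16_model(int16_t *zda, const int8_t *zn, int8_t zm, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        zda[i] = wrap16(zda[i] + absdiff(zn[2 * i], zm)
                               + absdiff(zn[2 * i + 1], zm));
}
```

The absolute difference is taken at full precision before widening, so the extreme pair `(-128, 127)` contributes `255`, which does not fit in `int8_t` but fits in the 16-bit accumulator.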

…shift operations (#186087)

Add the following new clang intrinsics based on the ACLE specification ARM-software/acle#428 (Add alpha support for 9.7 data processing intrinsics):

Multi-vector saturating rounding shift right narrow and interleave instructions:

- SQRSHRN
  - svint8_t svqrshrn_s8(svint16x2_t, uint64_t) / svint8_t svqrshrn_n_s8_s16_x2(svint16x2_t, uint64_t)
- UQRSHRN
  - svuint8_t svqrshrn_u8(svuint16x2_t, uint64_t) / svuint8_t svqrshrn_n_u8_u16_x2(svuint16x2_t, uint64_t)
- SQRSHRUN
  - svuint8_t svqrshrun_u8(svint16x2_t, uint64_t) / svuint8_t svqrshrun_n_u8_s16_x2(svint16x2_t, uint64_t)

Multi-vector saturating shift right narrow and interleave:

- SQSHRN
  - svint8_t svqshrn_s8(svint16x2_t, uint64_t) / svint8_t svqshrn_n_s8_s16_x2(svint16x2_t, uint64_t)
  - svint16_t svqshrn_s16(svint32x2_t, uint64_t) / svint16_t svqshrn_n_s16_s32_x2(svint32x2_t, uint64_t)
- UQSHRN
  - svuint8_t svqshrn_u8(svuint16x2_t, uint64_t) / svuint8_t svqshrn_n_u8_u16_x2(svuint16x2_t, uint64_t)
  - svuint16_t svqshrn_u16(svuint32x2_t, uint64_t) / svuint16_t svqshrn_n_u16_u32_x2(svuint32x2_t, uint64_t)
- SQSHRUN
  - svuint8_t svqshrun_u8(svint16x2_t, uint64_t) / svuint8_t svqshrun_n_u8_s16_x2(svint16x2_t, uint64_t)
  - svuint16_t svqshrun_u16(svint32x2_t, uint64_t) / svuint16_t svqshrun_n_u16_s32_x2(svint32x2_t, uint64_t)
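The "shift right narrow and interleave" wording can be illustrated for the signed `svint16x2_t` to `svint8_t` case with a scalar model. Two points are assumptions rather than facts taken from this PR: the immediate is taken to be in `1..8` for a 16-to-8-bit narrowing, and the narrowed results are assumed to be interleaved with even output elements taken from the first vector of the pair and odd elements from the second. Rounding adds `1 << (imm - 1)` before the arithmetic shift. The name `qshrn_s8_s16_x2_model` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Arithmetic shift right (floor division by 2^s), written so that it does
 * not depend on the implementation-defined behaviour of >> on negatives. */
static int32_t asr32(int32_t x, unsigned s)
{
    return x >= 0 ? (x >> s) : ~(~x >> s);
}

static int8_t sat_s8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Hypothetical model of svqshrn_n_s8_s16_x2 (rounding = 0) and
 * svqrshrn_n_s8_s16_x2 (rounding = 1). ASSUMED: imm in 1..8; results are
 * interleaved with even elements from zn0 and odd elements from zn1.
 * out must hold 2 * n elements. */
void qshrn_s8_s16_x2_model(int8_t *out, const int16_t *zn0, const int16_t *zn1,
                           size_t n, unsigned imm, int rounding)
{
    const int32_t bias = rounding ? (int32_t)1 << (imm - 1) : 0;
    for (size_t i = 0; i < n; ++i) {
        out[2 * i]     = sat_s8(asr32((int32_t)zn0[i] + bias, imm));
        out[2 * i + 1] = sat_s8(asr32((int32_t)zn1[i] + bias, imm));
    }
}
```

Computing in 32 bits before saturating keeps the rounding bias from overflowing even for `INT16_MAX` inputs.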

…(#189424)

The intrinsics are based on the following [proposal](ARM-software/acle#428). These are:

```
svint16_t svdot_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm);
svint16_t svdot_n_s16_s8(svint16_t zda, svint8_t zn, int8_t zm);
svint16_t svdot_lane_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm, uint64_t imm_idx);
svuint16_t svdot_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm);
svuint16_t svdot_n_u16_u8(svuint16_t zda, svuint8_t zn, uint8_t zm);
svuint16_t svdot_lane_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm, uint64_t imm_idx);
```
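For readers checking the signed prototypes above, a scalar model of a 2-way 8-to-16-bit dot product may be useful. It assumes that each 16-bit lane `i` of `zda` accumulates `zn[2i]*zm[2i] + zn[2i+1]*zm[2i+1]` with modulo-2^16 wrap-around, that the `_n` form broadcasts the scalar, and that `imm_idx` in the `_lane` form is segment-relative as in other SVE indexed dot products (each 128-bit segment holds 8 pairs, so `imm_idx` is `0..7`). These semantics are assumptions to be checked against the Arm ARM, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce a 32-bit intermediate modulo 2^16 into int16_t portably. */
static int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return u < 0x8000u ? (int16_t)u : (int16_t)((int32_t)u - 0x10000);
}

/* Hypothetical model of svdot_s16_s8: zda has n lanes, zn and zm 2*n. */
void dot_s16_s8_model(int16_t *zda, const int8_t *zn, const int8_t *zm, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        zda[i] = wrap16((int32_t)zda[i] + zn[2 * i] * zm[2 * i]
                                        + zn[2 * i + 1] * zm[2 * i + 1]);
}

/* Hypothetical model of svdot_n_s16_s8: the scalar zm is broadcast. */
void dot_n_s16_s8_model(int16_t *zda, const int8_t *zn, int8_t zm, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        zda[i] = wrap16((int32_t)zda[i] + zn[2 * i] * zm + zn[2 * i + 1] * zm);
}

/* Hypothetical model of svdot_lane_s16_s8, ASSUMING segment-relative
 * indexing: every lane in a 128-bit segment (8 int16 lanes) uses the zm pair
 * at position imm_idx of that same segment. n must be a multiple of 8. */
void dot_lane_s16_s8_model(int16_t *zda, const int8_t *zn, const int8_t *zm,
                           size_t n, unsigned imm_idx)
{
    for (size_t i = 0; i < n; ++i) {
        size_t p = (i / 8) * 8 + imm_idx; /* selected pair index in zm */
        zda[i] = wrap16((int32_t)zda[i] + zn[2 * i] * zm[2 * p]
                                        + zn[2 * i + 1] * zm[2 * p + 1]);
    }
}
```

The unsigned `svdot_u16_u8` variants would follow the same pattern with `uint8_t` and `uint16_t` element types.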

Add Clang/LLVM intrinsics for svcvt, scvtflt, ucvtf, ucvtflt and fcvtzsn, fcvtzun. The Clang intrinsics are guarded by the sve2.3 and sme2.3 feature flags. ACLE Patch: ARM-software/acle#428

- Fix overload and address comments
- Fix intrinsic name and simplify CHECK lines
- Reintroduce overloaded short forms for intrinsics. Adapt the test cases accordingly.
- Rename ACLE clang intrinsic. A clang intrinsic was renamed in the ACLE patch; change the name accordingly.
- Use existing pattern template
- Apply suggestions
- Apply suggestions

IRBuilder, IRFolder, etc small modifications on instructions.cpp and instructions.h Constants + Verifier (DOES NOT COMPILE - JUST TO SAVE WORK) Error fixing and selectionDAG and GlobalIsel structure Langref Update fix bitinsert and bitextract isValidOperands allow return value of bitextract to be of any first-class type allow inserted value of bitinsert to be of any first-class type fix ConstantsContext, Verifier and echo add onlyIfReduced logic to const expr add parsing for bitinsert and bitextract add bitcode and tests, fix syntax add bitcode reader and writer logic add printing of bitextract to AsmWriter printInstruction fix syntax in Core, Instructions, ConstantsContext and echo add some tests simplify Parser make BitcodeWriter consistent remove constexpr, start machine IR (not compiling; save work) Co-authored-by: Pedro Vicente <pedro.costa.vicente@tecnico.ulisboa.pt> SDag and global isel better formating and fixing assert [mlir][Vector] Add load, store, etc. to dropleadunitdim (#195686) Discussions on improvements to fold-memref-alias-ops changes revealed that the patterns meant to drop leading unit dimensions from vector operations weren't handling load, store, and other "terminal" vector dialect operations. This PR adds the patterns to fix that. Assisted-by: Claude 4.7 [IPO] Fix infinite recursive inlining in ModuleInliner (#195471) The ModuleInliner currently lacks inline history tracking. Without it, the inliner can get stuck in an infinite loop when mutually recursive functions are involved. This patch enables inline history tracking in the ModuleInliner to address this issue. The minsize attribute in the test case lowers the threshold for the mutually recursive functions, ensuring the bug reproduces in pass isolation. [InlineOrder] Fix assertion failure in CostBenefitPriority (#195564) InlineCost::getStaticBonusApplied() triggers an assertion failure if the CostBenefitPriority constructor calls it when IC.isVariable() is false. 
This is because getStaticBonusApplied() expects isVariable() to be true. Unconditionally populating CostBenefit also incorrectly prioritizes a NeverInline candidate with a cost-benefit pair over other valid variable-cost sites. This patch fixes the crash and the sorting issue by calling getStaticBonusApplied() and populating CostBenefit only when IC.isVariable() is true. For AlwaysInline and NeverInline costs, CostBenefit is explicitly set to std::nullopt. [ModuleInliner] Skip function declarations during candidate scan (#195567) This patch skips function declarations during the candidate scan in ModuleInlinerPass::run as declarations do not have bodies. [asan] Add tests for __sanitizer_annotate_double_ended_contiguous_container (#195673) [VPlan] Get GEP wrap flags from VPInstructions (NFCI). (#195730) Add helper to retrieve GEP no-wrap flags from VPInstructions, looking through zero-index GEPs and pointer casts, like Value::stripPointerCasts. Removes an access to underlying IR. Add named posix semaphore lifetime operations on linux (#192278) This implements the second part of #190847 Specifically, this pr adds `sem_open`, `sem_close`, and `sem_unlink` for posix semaphore on linux. https://pubs.opengroup.org/onlinepubs/9799919799/functions/sem_open.html https://pubs.opengroup.org/onlinepubs/9799919799/functions/sem_close.html https://pubs.opengroup.org/onlinepubs/9799919799/functions/sem_unlink.html Since it targets on linux implementation, two extra things are added: 1. add system call wrappers for `mmap`, `munmap`, `link`, `unlink`, and `ftruncate`. Those are necessary for the implementation of semaphore on linux. Wrappers is added based on the refactor proposal: https://libc.llvm.org/dev/syscall_wrapper_refactor.html. 2. refactor the previous semaphore implementation, put it under `linux/` since its based on linux. [sanitizer] Capitalize HINT in reports (#195734) Consistency with other sanitizer reports which use uppercase "HINT:". 
[DWARFLinker] Emit DW_IDX_parent in the accelerator table (#195403) .debug_names entries produced by the parallel linker were always emitted with std::nullopt for ParentDIEOffset, resulting in a missing DW_IDX_parent. The classic linker emits it via DWARF5AccelTableData::getDefiningParentDieOffset on the output DIE tree. The parallel linker can't use the same approach because the records are saved during cloneDIE, before the output DIE has been linked into its parent, so DIE::getParent() is nullptr at that time time. Fix that by computing the parent offset from the input-side DIE tree instead. We look up InputDieEntry's parent via getParentIdx, skip parents marked DW_AT_declaration, and translate them to the output offset through CompileUnit::getDieOutOffset. Since no real DIE can live at offset 0, we can use that to unambiguously mark input DIEs that were not cloned into this CU's plain DWARF (e.g. routed only into the artificial type unit) and is treated as "no parent". Only compile-unit accelerator entries are covered. Type-unit entries (artificial type unit) still emit no DW_IDX_parent, tracked by a TODO. [asan] Record container poisoning in poison history (#195674) [flang][OpenMP] Detect DSA conflicts in nested loop constructs (#195323) Follow-up to https://github.com/llvm/llvm-project/pull/194961 The fix from PR194961 did not detect explicit/predefined DSA conflicts on an iteration variable in a nested loop construct. For example, in a testcase inspired by Fujitsu 0165_0035.f90: ``` !$omp parallel do private(i) shared(j) do i=1,1 do j=1,1 !$omp parallel do default(none) shared(k) do k=1,1 end do !$omp end parallel do end do end do ``` the "shared(k)" was not flagged as incorrect. Adjust the fix to traverse the host-association chain until the symbol from the DSA clause is found. 
[cmake][compiler-rt][darwin] builtin libraries don't build for armv6m in Darwin (#195372) darwin_add_builtin_libraries tests for _Float16 and __bf16 for the host architecture rather than the one being built, add -arch to fix that so that armv6m correctly reports that it does not support __bf16. cfcmp/cdcmp get "error: unsupported relocation type" on their "Branch to target address" to c{f,d}cmple. Switch those to "Call a subroutine" instructions on Thumb-1 (e.g. armv6m). Assisted-by: Claude Code rdar://167828904 [SandboxIR][Tracker] Support nested checkpoints (#191097) This patch implements nested checkpointing, i.e., you can now save the IR state more than once and revert more than once. For example, after two saves: save(1) and save(2), a revert() will bring you back to the IR state of save(2), one more revert will bring you back to the IR state of save(1). APFloat: fix wrong result status for large floats (#189925) For large float literals such as `10384593717069655257060992658440193.0`, [`FloatingLiteral::isExact`](https://github.com/llvm/llvm-project/blob/6b2b0da40de1495ace2b100799a35711f7ad7b21/clang/include/clang/AST/Expr.h#L1702) was incorrectly returning `true`. The issue has been tracked down to `IEEEFloat::roundSignificandWithExponent` incorrectly reporting `opOK`. [compiler-rt][test] Disable create_thread_loop2 for lsan on Darwin (#195753) create_thread_loop2 occasionally hangs on macOS till hitting timeout. Disable the tests for LSAN on macOS. [AMDGPU][GlobalISel] RegBankLegalize rules for permlane_(bcast|up|down|xor|idx_gen) intrinsics (#195583) [clang][Analysis] Handle const-qualified pointer refs in `ExprMutationAnalyzer` (#190421) Teach `ExprMutationAnalyzer` to recognize references to const-qualified pointer objects, such as `T *const &`, as non-const pointee sinks when the pointee type itself is non-const. 
Fixes #190218
Fixes #157730

[libc] Add vhscampos as the maintainer for ARM and AArch64 (#195051)

[asan] Remove XFAIL from tests that are passing with debug checks

These tests have been passing with COMPILER_RT_DEBUG=ON since 2022 (at least for x86).

Fixes https://github.com/llvm/llvm-project/issues/46206

Pull Request: https://github.com/llvm/llvm-project/pull/84614

[libc][semaphore] Zero-initialize rand_bytes to fix GCC warning (#195757)

GCC 15 warns about `rand_bytes` being maybe uninitialized when passed to `getrandom`. Since `getrandom` writes to it, it doesn't strictly need initialization, but zero-initializing it satisfies the compiler and avoids the `-Werror=maybe-uninitialized` error.

Fix for https://github.com/llvm/llvm-project/pull/192278

Assisted by Gemini

[NFC][asan] Re-use ErrorGeneric::shadow_val (#195684)

This is done to reduce code size, as the additional load on reporting is negligible.

This requires a change in the ErrorGeneric constructor: now we skip a partial shadow only if the next one has a better value. Before, shadow_val was either `0x[a-f].` or 0; now it may be a partial granule as well. However, it's NFC, as `0 < shadow_val < ASAN_SHADOW_GRANULARITY` does not affect reporting outside of `CheckPoisonRecords`.

[asan] Only suggest increasing poison_history_size if the buffer is full (#195732)

It is unlikely but possible to set up shadow state this way, e.g. by unpoisoning a heap red zone just after a partial granule. If the buffer is not full, increasing it will not help.

[asan] Add HINT prefix to poison_history_size suggestions (#195733)

Consistency update to prefix poison-tracking suggestions with "HINT:" to match other ASan diagnostic hints.

[lldb] Add missing closing brace in IRForTarget (#195761)

Follow-up to 51d52c2a6cdc.

[RFC][IR] Support vector splats in `ConstantPointerNull` (#195486)

This PR allows `ConstantPointerNull` to represent both scalar pointer nulls and fixed or scalable vector splats of pointer nulls.
This change, first, aligns with the native splat behavior of `ConstantInt` and `ConstantFP` and, second, makes it easier to eventually change the semantics of `ConstantPointerNull` to represent a semantic null pointer instead of a zero value, which is what it represents today.

Reapply "[SandboxIR][SandboxVec] Remove score tracking from Region, add RegionWithScore" (#195708) (#195752)

This reverts commit 75f884b96dfa082227fb730435c68353cd3376c5.

[clang][ExprConst] Add another missing `NoteLValueLocation()` call (#195098)

This is a little more work since it requires a new parameter for all the `found()` handlers.

[NVPTX] Fix for LTO dwarf emission when multiple CUs with a mix of DebugDirectivesOnly and NoDebug (#190371)

Only emit the initial .loc dwarf directive if the CU requires debug information. When a module contains multiple CUs where one is DebugDirectivesOnly and the rest are NoDebug, we attempted to emit DWARF for the NoDebug compile units, leading to an assertion in emitDwarfFile0Directive when called from getOrCreateDwarfCompileUnit.

I added a lit test for this case, which used to assert.

[libcxx] Remove ASan container overflow checks for SSO strings (#194208)

Essentially a revert of #79536, although done manually given how much the code has changed.

Doing container overflow checks on SSO strings is fundamentally unsound due to how they interact with compiler optimizations within LLVM, which assume that the entire object is dereferenceable when it is live. LLVM can sometimes insert speculative loads that will create false positives when they try to load a partially filled SSO buffer. This was previously prevented using inline asm barriers within std::string itself. However, this can also manifest in other ways, with the motivating example for the revert being GlobalOpt inserting a speculative load of a std::variant containing an SSO std::string, causing an ASan failure.
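The nested checkpoints added to the SandboxIR Tracker (#191097) behave like a stack: each save() opens a checkpoint and each revert() undoes only the changes made since the most recent one. A minimal sketch of those semantics over an undo log, assuming a hypothetical `UndoTracker` class (not the SandboxIR Tracker API) that records changes only while a checkpoint is open:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of nested checkpoints over an undo log. This is not the
// SandboxIR Tracker API; it only illustrates the save/revert stack semantics.
class UndoTracker {
public:
  // Record a change together with the action that undoes it. Changes are
  // only recorded while at least one checkpoint is open (an assumption).
  void track(std::function<void()> undo) {
    if (!checkpoints_.empty())
      log_.push_back(std::move(undo));
  }

  // Open a new (possibly nested) checkpoint at the current end of the log.
  void save() { checkpoints_.push_back(log_.size()); }

  // Undo every change recorded since the most recent save(), newest first,
  // and close that checkpoint. Earlier checkpoints remain open.
  void revert() {
    assert(!checkpoints_.empty() && "revert() without a matching save()");
    std::size_t mark = checkpoints_.back();
    checkpoints_.pop_back();
    while (log_.size() > mark) {
      log_.back()();
      log_.pop_back();
    }
  }

private:
  std::vector<std::function<void()>> log_;
  std::vector<std::size_t> checkpoints_;
};
```

Storing checkpoints as indices into a single log keeps each save() O(1); a revert() costs time proportional to the number of changes it undoes.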
[libunwind] Add SME detection for Apple platforms (#193630)

Copy the SME detection sysctl code from the ZA test. Also fix a logic issue in the original code that was spotted during review.

[InlineOrder] Remove unused InlineHistoryMap (#195769)

This patch removes InlineHistoryMap, an unused variable. The last use was removed by:

  commit 3af427539bc6b85a9fe3334b9a0b43d347ea29ac
  Author: Arthur Eubanks <aeubanks@google.com>
  Date:   Wed Apr 8 09:34:52 2026 -0700

[compiler-rt] Add MSVC CRT flags to ASan coverage test (#195719)

The ASan coverage test also links clang_rt.profile because it uses -coverage. On Windows MSVC, this can fail if the test is linked with the static CRT but clang_rt.profile was built with the dynamic CRT. For example, a profile runtime that uses /MD may reference DLL CRT symbols such as __imp_* symbols.

Add the same dynamic CRT flags directly to this one test on Windows MSVC so it keeps linking when clang_rt.profile changes.

Reapply "[JTS][Passes] Enable JTS By Default" (#193409)" (#195776)

This reverts commit c74951c6c3074be23fd7b12843e1187ca98e98af.

The ASan issues on the buildbot should be fixed by b5f4f16fd98cb57a13a96b7d7faa89326636c5bc.

[IR] Add elementwise modifier to atomicrmw (#189517)

This PR implements the IR side modifications of [[RFC] Add elementwise modifier to atomicrmw](https://discourse.llvm.org/t/rfc-add-elementwise-modifier-to-atomicrmw/90134).

Design Decisions:
- In the IR, the current atomicrmw record layout looks like: [ptrty, ptr, valty, val, operation, vol, ordering, syncscope, align]. To encode elementwise, I decided to pack it into the operation field, which also contains the math op (e.g. fadd, fmin, add). I could have changed the record structure, but that would be slightly more complicated.
- Elementwise vector atomics can be vectors of integers because we can always scalarize legally.
- Elementwise vector atomics need to have a power-of-2 size. We can potentially remove this restriction later.

Assisted by AI.
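The atomicrmw change above packs the elementwise modifier into the existing operation field instead of adding a record field. A minimal sketch of that kind of bit-packing, assuming a hypothetical flag at bit 31; the real bitcode encoding chosen in the PR may use a different bit or scheme:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding, for illustration only: the low bits carry the
// atomicrmw math operation and one otherwise-unused high bit (assumed here to
// be bit 31) carries the elementwise modifier.
constexpr std::uint64_t kElementwiseBit = std::uint64_t{1} << 31;

std::uint64_t encodeOperation(std::uint64_t op, bool elementwise) {
  assert((op & kElementwiseBit) == 0 && "operation code collides with the flag bit");
  return elementwise ? (op | kElementwiseBit) : op;
}

struct DecodedOperation {
  std::uint64_t op;
  bool elementwise;
};

// Split a packed operation field back into the math op and the modifier.
DecodedOperation decodeOperation(std::uint64_t field) {
  return {field & ~kElementwiseBit, (field & kElementwiseBit) != 0};
}
```

Under this assumption, records written without the modifier keep exactly their previous encoding, which is one plausible reason packing is simpler than changing the record layout.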
[gn build] Port be8e0842a103

Reviewers:

Pull Request: https://github.com/llvm/llvm-project/pull/195781

[gn build] Port f7329e61ad4b

Reviewers:

Pull Request: https://github.com/llvm/llvm-project/pull/195782

[MLIR][Python] Add `ConditionallySpeculatable` interface and `Pure` specifier (#195505)

This PR brings two features: the `ConditionallySpeculatable` op interface and the `Pure` specifier for Python-defined ops. The result is that you can mark an op as pure like:

```python
class PureOp(
    TestPure.Operation, name="pure", traits=[Pure]  # just like in the ODS!
):
    a: Operand[IntegerType[32]]
    b: Operand[IntegerType[32]]
    res: Result[IntegerType[32]] = infer_result()
```

Then this op is both `NoMemoryEffect` and `AlwaysSpeculatable`.

Assisted-by: Copilot/GPT5.4

[X86][APX] Add VirtRegMap to non stack foldMemoryOperand too (#193423)

We need to query the mapped physical register through VirtRegMap.

Fixes: https://godbolt.org/z/1KGj3aYeP

[gn build] Port cb2a64e7e5af

Reviewers:

Pull Request: https://github.com/llvm/llvm-project/pull/195783

[orc-rt] Change SPS controller-interface naming conventions. (#195614)

This commit makes two changes to the naming conventions for SPS CI symbols:

1. The orc_rt_sps_ci_ prefix is replaced with orc_rt_ci_sps_ (for SPS wrapper functions) and orc_rt_ci_ (without the "sps_" suffix) for data symbols.
2. The _sps_wrapper suffix is dropped from wrapper functions, since the prefix now distinguishes between SPS-wrappers and data symbols.

fix libMLIRCAPIInterfaces after #195505 (#195789)

https://github.com/llvm/llvm-project/pull/195505 missed `MLIRSideEffectInterfaces` in the CMakeLists.txt.

[MLIR] Add HasAncestor op trait (#195447)

Add HasAncestor/AncestorOneOf traits that verify an operation has a specific ancestor anywhere in the parent chain, unlike HasParent, which only checks the immediate parent.

[CIR][NFC] Rename SignBitOp to CIR_SignBitOp (#195477)

Align with the CIR_ prefix naming convention used by other op definitions in CIROps.td.
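The difference between the HasParent and the new HasAncestor/AncestorOneOf traits described above is how far up the parent chain the check looks. A minimal sketch, assuming a hypothetical `Op` struct with a name and a parent link (this is not MLIR's `Operation` API or the traits' actual verifier code):

```cpp
#include <cassert>
#include <initializer_list>
#include <string>

// Hypothetical stand-in for an MLIR operation: just a name and a parent link.
struct Op {
  std::string name;
  const Op *parent = nullptr;
};

// HasParent-style check: only the immediate parent is inspected.
bool hasParentNamed(const Op &op, const std::string &name) {
  return op.parent != nullptr && op.parent->name == name;
}

// HasAncestor-style check: walk the whole parent chain.
bool hasAncestorNamed(const Op &op, const std::string &name) {
  for (const Op *p = op.parent; p != nullptr; p = p->parent)
    if (p->name == name)
      return true;
  return false;
}

// AncestorOneOf-style check: any ancestor whose name is in `names`.
bool hasAncestorOneOf(const Op &op, std::initializer_list<const char *> names) {
  for (const Op *p = op.parent; p != nullptr; p = p->parent)
    for (const char *n : names)
      if (p->name == n)
        return true;
  return false;
}
```

For an `arith.addi` nested as `builtin.module` > `func.func` > `scf.for` > `arith.addi`, a parent-only check for `func.func` fails, while the ancestor walk succeeds.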
[PGO][ICP] Prevent indirect call promotion to functions with incompatible target features (#192142)

Profile-driven indirect call promotion was promoting indirect calls to functions requiring advanced CPU features (e.g., AVX512) even when the caller function did not support those features. When these promoted calls were subsequently inlined, it could lead to invalid IR and backend crashes during instruction selection because the target CPU could not handle the advanced instructions.

This patch addresses the issue by adding a target feature compatibility check to `llvm::isLegalToPromote` in `CallPromotionUtils.cpp`. If the callee requires target features (prefixed with `+`) that are not present in the caller's target features, the promotion is skipped.

By centralizing this check in `isLegalToPromote`, we protect all passes relying on this utility (such as `SampleProfileLoader` and `IndirectCallPromotion`) from promoting to incompatible targets. This also prevents incorrect inlining of `always_inline` functions that would otherwise be promoted via indirect calls and then inlined.

Fixes #192126

Pull Request: https://github.com/llvm/llvm-project/pull/192142

[mlir][SPIR-V] Add Weak linkage type and SPV_AMD_weak_linkage extension (#195660)

- add 'Weak' linkage type (SPV_AMD_weak_linkage)
- deduce the Linkage capability and linkage-type extension from linkage_attributes in UpdateVCE pass

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>

[GlobalISel] Fast-path single vreg values in getOrCreateVRegs (#195244)

IRTranslator::translateLoad is hot in compile-time profiles of sqlite on aarch64-O0-g. Aggregates are flattened into multiple vregs with offsets. Avoiding this path for single vregs is a small -0.11% compile-time win.
https://llvm-compile-time-tracker.com/compare.php?from=a7a2dc59616a8cb1198d933bcdf55ebdbd78894c&to=96fcfe733947db64effd5e1d1413697981cbe357&stat=instructions%3Au

[GlobalISel] Fast-path single reg loads and stores in IRTranslator (#195245)

IRTranslator::translateLoad is hot in compile-time profiles of sqlite on aarch64-O0-g. Aggregates are flattened into multiple vregs with offsets. Most loads and stores lower to a single register, so this offset materialization can be avoided.

Small -0.07% geomean improvement on aarch64-O0-g, with -0.24% for sqlite.

https://llvm-compile-time-tracker.com/compare.php?from=a7a2dc59616a8cb1198d933bcdf55ebdbd78894c&to=e535e96fd0952cfa646d48ef84cf5948c0a8a2ab&stat=instructions%3Au

[mlir][NFC] Fix typo (flattend => flattened) (#195780)

[AArch64][GlobalISel] Add gisel test coverage for addp-shuffles.ll. NFC (#195803)

[CIR][CUDA] Fix typed StringAttr on globals and Add CIR-to-LLVM lowering checks for existing registration support (#195002)

[AArch64][GlobalISel] Match G_DUP with undef elements (#195237)

This helps us match more vector splats that contain undef elements, matching build vectors that contain undef so long as they contain at least 2 duplicate entries.

[LV] Introduce isLegalMaskedLoadOrStore (NFC) (#195242)

This simplifies legality checks, and will eventually become the single point querying TTI hooks for masked ld/st. Currently, legality checks for interleaved accesses still query TTI directly.

[clang][AST][NFC] const-correctness improvements for member functions returing `ArrayRef` (#195784)

- Add const qualifiers to member functions.
- Drop non-const-qualified member functions whose const-qualified versions return the same thing.

[libc++][mdspan] Fix `mdspan::operator[]` bounds checking and implement LWG4020 (#192269)

This example fails to compile with libc++ in hardened mode without this fix.
```c++
#include <mdspan>

struct RValueInt {
  constexpr operator int() && noexcept { return 0; }
};

int main() {
  int data[1] = {42};
  std::mdspan m(data, std::extents<int, 1>{1});
  m[RValueInt{}];
}
```

Fixes: #171311

[AssumeBuilder] Fix incorrect nonnull optimization (#195676)

We can only optimize nonnull assumes to refer to the base of a GEP if the GEP is inbounds.

From #195650.

[NFC][clang] InterpBuiltin.cpp - use consistent ia32 naming convention for all x86 builtin expansions (#195699)

Also, add an IsAdd argument to interp__builtin_ia32_addcarry_subborrow instead of repeated builtin ID comparisons.

[mlir][SPIR-V][NFC] Add pushCaps/pushExts helpers to type visitors (#195796)

Discussed in https://github.com/llvm/llvm-project/pull/195060#discussion_r3183984853

[MLIR] Fix offload map metadata order (#195346)

This fixes one part of https://github.com/llvm/llvm-project/issues/195333.

There are two different semicolon-field orders in offload/include/Shared/SourceInfo.h:

1. Per-map strings (`map_var_info_t`): the first field is the mapped name, the second is the file path:
```
/// Type alias for source location information for variable mappings with
/// data layout ";name;filename;row;col;;\0" from clang.
using map_var_info_t = void *;
```
2. `ident_t` region strings: the first field is the file path, the second is the function name:
```
/// The ident structure that describes a source location from kmp.h. with
/// source location string data as ";filename;function;line;column;;\0".
struct ident_t {
```

`SourceInfo` reflects that with two constructors: `SourceInfo(const ident_t *)` parses `getSubstring(0)` as the filename field and `getSubstring(1)` as the name (function); `SourceInfo(const map_var_info_t)` uses `getSubstring(0)` for the map name and `getSubstring(1)` for the filename path.

`OpenMPIRBuilder::getOrCreateSrcLocStr(FunctionName, FileName, line, col)` always builds the same wire shape: the second parameter is emitted first, then the first, i.e. ;FileName;FunctionName;line;col;.
Clang passes arguments accordingly in two places:

1. Per-map metadata (`emitMappingInformation`): calls `getOrCreateSrcLocStr(FileName, ExprName, ...)` so the buffer is ";expr;path;…", matching `map_var_info_t` and `getNameFromMapping`.
2. ident / update location (`emitUpdateLocation`): calls `getOrCreateSrcLocStr(FunctionName, FileName, ...)` so the buffer is ";file;function;…", matching ident_t.

MLIR previously called the same helper only in the ident-like way for both uses, which is why we see file paths in the map name slot.

I would have preferred to keep the same order in both cases, but I am not aware of the history of why it was done like this. Instead, I have fixed it by adding `ForOffloadMap` to `createSourceLocStrFromLocation` so that mapping uses the same argument order as Clang's `emitMappingInformation`, while ident-style callers keep the existing order. Adjusted a test accordingly.

[mlir][SPIR-V] Add AtomicLoad and AtomicStore ops (#195797)

[OpenMP] Add target to build OpenMP runtime (#189557)

This adds a top-level target to build the OpenMP runtime, similar to what was done in https://github.com/llvm/llvm-project/pull/186099 for the Offload runtime.

Having this top-level target enables us to execute the build in the pre-commit CI, as shown in https://github.com/llvm/llvm-project/pull/174955 (I actually just cherry-picked the commit from that branch).

[AArch64] Enable SVE MaxBW for Neoverse-N2, prefer fixed at same cost (#195108)

This should be using the same settings as other 128b SVE cores. Very slight performance improvement from MaxBW; regressions are alleviated by preferring fixed VFs over scalable ones when they have the same cost.

[MLIR][Shard] Fold all_gather/all_slice inverse pairs (#193906)

Add a simplify pattern that replaces all_gather(all_slice(x))/all_slice(all_gather(x)) with x when grid, grid axes, and gather/slice axis match.
Co-authored-by: Cursor <cursoragent@cursor.com>

[lldb-dap][vscode] add instructions for debugging the VSCode extension (#195280)

[LifetimeSafety] Update doc for `handleInvalidatingCall` (#195281)

[lldb] fix unconsumed llvm::Expected's errors (#193257)

[LangRef] Patchable function attributes (#195764)

LangRef should document the expected layout for patchable-function-entry / patchable-function-prefix attrs (prefix nops go before the function label, entry nops go after the label, both are before the prologue).

Fixes #195693.

[AArch64] Add C1-Premium scheduling model (#185398)

This patch adds the C1-Premium scheduling model. Values are derived from Arm's SWOG for this core, which can be found at https://developer.arm.com/documentation/111080/3-0

Assisted-by Codex

[offload] Update test after ed07c92c9629. (#195822)

We now get `a(:n) at dump_map_tables.f90:20:11`, so the test has been updated accordingly. This should fix the regression seen after https://github.com/llvm/llvm-project/pull/195346.

[mlir][SPIR-V] Refine OpTypeImage capability inference (#195060)

Capability requirements for OpTypeImage are determined by Dim, Sampled, MS, and Arrayed.

Related to LLVM SPIR-V backend PR https://github.com/llvm/llvm-project/pull/192626

[clang][bytecode] Don't evaluate bound member function expressions in new constant interpreter (#194851)

**Problem:**

A crash is triggered by clangd's hover feature when using C++23 and the new bytecode interpreter, which calls `Expr::EvaluateAsRValue()` to attempt constant folding on an expression under the cursor, even when it is not a valid constant expression.

Tested versions: 22.1.3, Trunk (x86_64-pc-linux-gnu)

**How to reproduce:**

```cpp
struct S { void f(); };
void g() {
  S s;
  s.f();
}
```

Running `clangd --check=repro.cpp` (with `compile_flags.txt` containing `-std=c++23 -fexperimental-new-constant-interpreter`) will crash.
`Assertion ItemTypes.back() == toPrimType<T>() failed.`

You can observe the same crash by hovering over STL iterators like `vec.begin()`.

**Relevant Stack Trace:**

*Basically: `textDocument/hover` → `getHover` → `EvaluateAsRValue` → new constant interpreter → `MemberPointer` type mismatch on stack pop.*

When `Compiler<Emitter>::VisitMemberExpr()` encounters a non-static `CXXMethodDecl` member (a bound member function expression such as `s.f` in `s.f()`), it falls through to `visitDeclRef()`. This pushes a `FnPtr` onto the interpreter stack. However, the caller expects a `MemberPointer`, causing an assertion failure in `InterpStack::pop()`.

**Fix:**

* In `VisitMemberExpr()`, bail out early (`return false`) when the member is a non-static `CXXMethodDecl`, before reaching `visitDeclRef()`. This causes `EvaluateAsRValue()` to report failure gracefully rather than crashing. Bound member function expressions (`s.f`) are not valid constant expressions, so returning `false` should be semantically correct.

**Testing:**

* Added an AST unit test (`EvaluateAsRValue.FailsGracefullyOnBoundMemberExpr`) that directly isolates a bound `MemberExpr` and passes it to `EvaluateAsRValue()`, asserting that it returns `false` without crashing.
* Added a clangd hover test (`Hover.NoCrashOnBoundMemberFunctionWithNewInterpreter`) that reproduces the original crash scenario.
* *Note:* I could not add a lit test because I believe this is unreachable via normal `clang` invocations: `Sema` strictly catches isolated bound member functions before constant evaluation, and `clangd` has a unique path to triggering this.

**Root cause:**

This is exposed by C++23 specifically due to (I think) P2280R4 / P2448R2:
- Relaxing the rules around "unknown" objects in constant evaluation, allowing `s` in `s.f()` to proceed past the base object check even though `s` is not constexpr, and deferring failures to bytecode execution rather than rejecting them structurally.
Assisted-by: gemini-cli

@tbaederr

---------

Co-authored-by: Timm Baeder <tbaeder@redhat.com>

[mlir][spirv] Support OpenCL.std clz in the SPIR-V dialect (#195317)

Add support for the OpenCL.std clz extended instruction in the MLIR SPIR-V dialect.

From the SPIR-V specification: Op Name: "clz", Op Code: "151"

[clang][CIR] Add lowering for fp16 intrinsics (#194865)

This PR adds lowering for the following intrinsic groups:

* https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#absolute-difference-1
* https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#reciprocal-estimate-1

It also moves the corresponding tests from:
* clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics.c

to:
* clang/test/CodeGen/AArch64/neon/fullfp16.c

The lowering follows the existing implementation in CodeGen/TargetBuiltins/ARM.cpp.

[analyzer][NFC] Rename class `StackFrameContext` to `StackFrame` (#195802)

This patch continues the refactoring roadmap described in issue #190973 by renaming the `StackFrameContext` class to just `StackFrame`.

Many variables of type `const StackFrame *` also had their names updated to reflect the new class name `StackFrame`.

[NFC][analyzer] Introduce specialized variants of makeNode (#194459)

This commit introduces new methods `makePostStmtNode` and `makeNodeWithBinding` of `CoreEngine`, which will be used instead of the 5-parameter overloads of `NodeBuilder::generateNode` and `NodeBuilder::generateSink` (which were originally methods of the class `StmtNodeBuilder` that was deleted in commit fb46677a858697afa116c4252e84050a07bc6a70).

This commit applies the newly introduced methods in a few places (as examples), but there are 80+ call sites that use the 5-parameter `NodeBuilder::generateNode` or `generateSink`, so this transition will be completed in multiple follow-up commits.
I decided to introduce these methods because after the transition there will be 20+ calls to `makePostStmtNode` and 30+ calls to `makeNodeWithBinding`, and it would be cumbersome to use plain `makeNode` instead of these specialized variants. (On the other hand, these new methods don't support specifying a `tag` because only a few call sites would use that feature of the 5-parameter `generateNode`.)

[ADT] Bitset: add shift operators, word accessors, and etc (#193400)

This PR is split out from #191757 per reviewer request. It has the following changes to `llvm::Bitset<N>`:

* Added `operator<<`/`<<=`/`>>`/`>>=`, `getNumWords()`, `getWord()`, and `findLastSet()`.
* Moved the `std::array<>` constructor from protected to **public** and **explicit**.

A follow-up PR will use these to re-implement `LaneBitmask` as a `llvm::Bitset` wrapper.

---

The unit test in the PR is largely generated by LLMs. I have reviewed it and manually applied changes to cover more edge cases.

[SLP][NFC]Add a test with broacast of loads estimation

Reviewers:

Pull Request: https://github.com/llvm/llvm-project/pull/195833

Revert "[clang-tidy] An option for conditional skipping overloaded functions …" (#195701)

Reverts llvm/llvm-project#194133

[DAG] Replace llvm::isNeutralConstant with SelectionDAG::isIdentityElement (#195827)

Initial step towards generalising this: move it to SelectionDAG like other value-tracking helpers, add DemandedElts/Depth controls, etc. We can add target node handling when it becomes necessary.

Renamed to "IdentityElement" to match LLVM naming conventions.

[mlir][spirv][nfc] Fix the summary of SPIRVLowerABIAttributesPass (#195043)

[mlir][emitc] Refactor the e2e test for TOSA -> EmitC lowering (#194686)

Update the TOSA -> EmitC test, mlir/test/Dialect/EmitC/tosa/ops.mlir, to also compile the generated source with a C++ compiler.
This adds an `mlir-translate` step, making the overall flow:

  mlir-opt tosa.mlir | mlir-translate | c++ -Wall -Werror -c

Although the test still does not execute the generated code, compiling it with a host compiler increases coverage and makes the test closer to an integration test. For this reason, move it to:

  mlir/test/Integration/Dialect/EmitC/

The source generated by the translator uses `size_t`, which requires the `stddef.h` header. This is currently provided explicitly in the compile step with `-include stddef.h`; a follow-up should consider emitting the required include from the translator instead.

[libcxx] Include __utility/exchange.h in thread.h (#195786)

[DAG] expandVecReduce - pull out repeated getFlags() calls. NFC. (#195838)

[lldb] Flush delayed breakpoints before eagerly changing one (#195815)

See the discussion in https://github.com/llvm/llvm-project/pull/192971

When LLDB makes the decision to eagerly send a breakpoint packet, it should first ensure the delayed breakpoints are flushed, as they may interfere with the eager breakpoint that is about to be changed.

Implementation note: we could have included the eager breakpoint in the batch that is about to be flushed. However, it's important to get information about the error status of this eager breakpoint, and the current APIs don't make it easy to distinguish which breakpoint caused an error.

[Bazel] Fixes 9db9f6d (#195843)

This fixes 9db9f6d9c985f38075f2e0a3f55646d13fe0953c.

Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>

[llubi] Initial support for floating-point numbers (#188453)

Implements basic floating-point instructions and fast-math flags handling in `phi` and `select`.

[clang-tidy] apply fix-it for modernize-return-braced-init-list with qualifiers (#195846)

Fixes #195844

With this PR I've added robust test cases to ensure things work for many different cases.
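The lldb change above orders packets so that delayed breakpoint changes are flushed as one batch before an eager change is sent on its own, which keeps the eager change's error status attributable. A minimal sketch of that ordering, assuming a hypothetical `BreakpointSender` class whose "packets" are just recorded ID lists (this is not LLDB's API):

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the ordering described above; the class and method
// names are illustrative and are not LLDB's API.
class BreakpointSender {
public:
  // Queue a change to be sent later as part of a batch.
  void setDelayed(int id) { pending_.push_back(id); }

  // Send a single breakpoint change immediately. Delayed changes are flushed
  // first so they cannot interfere with it, and the eager change is sent on
  // its own so its error status belongs to this breakpoint alone.
  bool setEager(int id) {
    flush();
    return send({id});
  }

  // Send all queued changes as one batch.
  void flush() {
    if (!pending_.empty()) {
      send(pending_);
      pending_.clear();
    }
  }

  std::vector<std::vector<int>> sentBatches;  // record of "sent" packets

private:
  bool send(const std::vector<int> &ids) {
    sentBatches.push_back(ids);
    return true;  // a real implementation would report the remote's reply
  }

  std::vector<int> pending_;
};
```

The alternative mentioned in the commit, appending the eager breakpoint to the flushed batch, would save one packet but would merge its result with the others.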
Revert "[ADT] Bitset: add shift operators, word accessors, and etc" (#195848)

Reverts llvm/llvm-project#193400

Breaks buildbots

clang: Report full triple arch string in --offload-arch diagnostic (#195849)

This was previously only printing the top-level arch string. Print the full arch + subarch from the triple.

[flang][OpenMP] Change NONTEMPORAL clause to contain OmpObjectList (#195331)

NONTEMPORAL in source code takes a variable list, which is represented in the AST as OmpObjectList.

[libc] Add socket test support library (#193207)

A couple of tests (and I plan to add a couple more) need to create/compare unix domain socket addresses. This isn't completely trivial due to the interaction between the `addrlen` argument and the terminating '\0', and the existing code doing that was subtly wrong in some cases. This patch centralizes the functions for these operations, reducing code duplication and ensuring consistent (and correct) behavior everywhere.

[OpenMP][OMPT] Update tools header for OpenMP v6.0 (#195829)

Add new enum values, structs and variables to the `omp-tools.h` header, according to the OpenMP v6.0 specification. Update the `openmp` and `offload` tools headers to recognize the new values to ease testing once those features are implemented.

Since no feature is implemented here, update the docs to mention `partial` support, pointing to this PR for the enums.

---------

Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>

[AARCH64] Add intrinsic support for new s/udot intrinsics (#189424)

The intrinsics are based on the following [proposal](https://github.com/ARM-software/acle/pull/428).
These are:

```
svint16_t svdot_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm);
svint16_t svdot_n_s16_s8(svint16_t zda, svint8_t zn, int8_t zm);
svint16_t svdot_lane_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm, uint64_t imm_idx);
svuint16_t svdot_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm);
svuint16_t svdot_n_u16_u8(svuint16_t zda, svuint8_t zn, uint8_t zm);
svuint16_t svdot_lane_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm, uint64_t imm_idx);
```

[clang][test] Add missing `FileCheck` in Import/destructor/test.cpp (#195778)

The test had `CHECK` directives that were never executed because the `RUN` line did not pipe output to `FileCheck`. This also replaces `CHECK-NEXT` with `CHECK-SAME` so the subsequent `CHECK-SAME` directives match the remaining destructor properties in order.

[MLIR][XeGPU] Support Layout propagation for interleave and deintereleave op (#194966)

Enable propagation of interleave and deinterleave with their own propagation rules.

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

[SystemZ] Replace PatLeaf with ImmLeaf (#195146)

The code snippets for the predicates are a bit shorter, because the APInt is directly available instead of an SDNode. Main advantage is that it enables the constraint to be ported to GlobalISel.

[Flang][Driver] Enable -print-file-name (#195790)

Enable -print-file-name for the Flang driver as already enabled by Clang.

Extracted out of #171515 as requested by https://github.com/llvm/llvm-project/pull/171515#pullrequestreview-4180690622

[SCEV] Clarify nuw/nsw semantics for multi-operand nodes (#195826)

nuw/nsw must hold for all subsets/orders of operands.

[Support] Add missing header guard (#195870)

[EarlyCSE] Check if getValueType() returns nullptr (#195868)

15383dcdb4012387d94fab67fed4519613bf9a19 recently added a call to getValueType() into EarlyCSE without checking whether or not it returned nullptr.
It sometimes might end up processing target-specific intrinsics, for which the underlying getAccessType will typically return nullptr. We observed this crash when building https://github.com/google/highway for AArch64.

A regression test was added to ensure this does not occur again in the future.

[mlir][vector] Account for subview offset in gather lowering. (#195359)

Strided vector.gather on a column subview was reading the wrong column because the rewrite to a collapsed gather dropped the subview's static offset.

---------

Signed-off-by: hanhanW <hanhan0912@gmail.com>

PeepholeOpt: Clear kill flags in foldImmediate (#195680)

When foldImmediate replaces a COPY destination with its source, this extends the live range of the source, but it does not update the kill flags. Clear kill flags on the source register after replacement.

This was found while working on REG_SEQUENCE optimizations motivated by AMDGPU demands. Both an AMDGPU and an X86 test case are added to show that the issue is not AMDGPU-specific.

[InstCombine][NFC] Use CreateAssumption instead of CreateCall (#195862)

[AMDGPU][NFC] Autogenerate check lines for llvm.amdgcn.rsq.clamp.ll and llvm.amdgcn.rsq.ll (#195867)

[lldb][windows] fix cross DLL file descriptor lookup crash (#195855)

On Windows, file descriptors are only valid in the same DLL: they are really just handles mapped to an index in a table in the CRT. Calling a liblldb method with a file descriptor from lldb-dap will cause the program to crash. See https://github.com/llvm/llvm-project/issues/193971.

This patch fixes the issue by refactoring the `NativeFile` constructors so that they no longer try to convert `FILE` types to handles through the CRT lookup table.

[InstCombine] Remove redundant assume fold (#195852)

The fold is fully redundant with the fold using `computeKnownBits`, so we can let that do the work instead.
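The `[mlir][vector]` gather fix above hinges on the subview's static offset: a column view of a row-major buffer starts `offset` elements in, and every gathered element is read at `offset + row * stride`. A minimal sketch, assuming a hypothetical `StridedView2D` description (this is not the MLIR lowering itself or any MLIR type):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: a strided 2-D view described by a static offset into
// the underlying buffer plus per-dimension strides (in elements).
struct StridedView2D {
  std::int64_t offset;
  std::int64_t strides[2];
};

// Gather along dim 0 of a column view: element `r` of the result is
// buffer[offset + r * strides[0]]. Dropping `offset` silently reads the
// column at the start of the buffer instead.
std::vector<int> gatherRows(const std::vector<int> &buffer,
                            const StridedView2D &view,
                            const std::vector<std::int64_t> &rows) {
  std::vector<int> out;
  out.reserve(rows.size());
  for (std::int64_t r : rows)
    out.push_back(
        buffer.at(static_cast<std::size_t>(view.offset + r * view.strides[0])));
  return out;
}
```

For a 3×4 row-major matrix holding 0..11, the column subview at column 2 has offset 2 and strides {4, 1}, so gathering rows {0, 1, 2} yields {2, 6, 10}; with the offset dropped, the same gather yields {0, 4, 8}, i.e. column 0.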
[NFCI] clarify that asan-*linux.cpp files affect *nix OS'es (#195565)

**Prior Work:**

Aims to supersede (#132263), which seems inactive, specifically by applying my own comment: https://github.com/llvm/llvm-project/pull/132263#issuecomment-3051238734

**Context:**

It aims to minimally document that the `asan_(malloc_)?linux.cpp` files may impact non-Linux OSes (despite the name) such as Solaris, BSD, and other *nix OSes. This is worth documenting, as otherwise we risk breakage due to confusion, as occurred [here](https://github.com/llvm/llvm-project/pull/131975#issuecomment-2741097471).

This is done simply by minimally augmenting the file header comment to say precisely this. Unlike the prior PR, this does not rename any files, which should reduce the 'git noise' impact of this change.

_Thanks!_

[AArch64][GlobalISel] Lower unmerge to extract_subvector (#195046)

This follows and reuses the existing lowering for unmerge -> extract vector element, extending it to also lower unmerge -> subvector extract for half-sized vector extracts. This allows certain tablegen patterns to match. An extra extract_subvector(dup) combine is needed to optimize away unnecessary instructions. The ext vs mov/dup brings us in line with SDAG, but we may change both to use mov/dup.

[Flang][OpenMP] Fix assert trigger in MapInfoFinalization pass for implicit record member maps (#193851)

The current iteration of the implicit record member mapping segment of the MapInfoFinalization pass assumes that child maps of parents are already bound to the target's block arguments, but that is not the case upon initial lowering from PFT to MLIR. This binding currently happens only at the end of the MapInfoFinalization pass, where we "canonicalize" so that all maps are inserted as block arguments to their respective targets.
This assumption unfortunately leads to a few cases where we trigger the assertion. To address this, we can impose this canonicalization of map <-> block arguments as soon as we enter the pass, and then once again at the end of the pass for any new members generated by the MapInfoFinalization pass. This allows the implicit record member mapping process to continue unhindered, whilst changing very little elsewhere other than the ordering of block arguments (hence some lit test tweaks). The main downside is the extra processing required for running the "canonicalization" twice.

I adopted some tests created by @chichunchen in his version of the fix to help test for regressions.

Co-authored-by: @chichunchen

[DWARFLinker] Add assembly-label range handling to parallel linker (#195366)

Assembly CUs typically have DW_TAG_label entries instead of subprograms, so the parallel linker's line-table filter saw no function ranges and dropped every row. Mirror the classic linker: for labels in Mips_Assembler or Assembly CUs, look up an assembly range via getAssemblyRangeForAddress and call addFunctionRange before falling back to addLabelLowPc.

[VPlan] Scalarize to first-lane-only directly on VPlan (#184267)

This is needed to enable subsequent https://github.com/llvm/llvm-project/pull/182595.

I don't think we can fully port all scalarization logic from the legacy path to VPlan-based right now, because by that point in the pipeline interleave groups aren't lowered into any VPlan-based representation, and as such this pass operates on incomplete information. Currently, the pass can make transformations if "all uses are scalar" (that won't change later) but not if "uses are a mix of vector and scalar uses" (that might change after lowering interleave groups).

As such, I decided to just implement something much simpler that would be enough for #182595.
However, we perform this transformation before delegating to the old CM-based decision, so it **is** effective immediately and taking precedence even for consecutive loads/stores right away. --------- Co-authored-by: Ramkumar Ramachandra <artagnon@tenstorrent.com> [HLSL] For builtins aliases, apply implicit conversions before running custom type checking (#195365) Fixes https://github.com/llvm/llvm-project/issues/195329 by making HLSL builtin aliases apply implicit conversions before running custom type checking. After this PR: - There are no more size 1 vectors being passed and returned to/from aliased Clang builtins because they get truncated to scalars due to the HLSL alias builtin not having explicit size 1 vector overloads. - HLSL alias builtins no longer accept matrices unless they have explicit matrix overloads. Matrices get implicitly truncated to scalars and resolve to the scalar Clang builtin being aliased. - Many calls with mismatched vector sizes no longer error with `arguments are of different types` and instead follow Clang's overload resolution rules with respect to HLSL's implicit conversion sequences. (e.g., `dot(float3, float2)` -> `dot(float2, float2)` with warning) - Calls with implicitly-convertible types no longer error. They are now implicitly converted, and with a warning in some cases. (e.g., `f16tof32(bool)` -> `f16tof32(uint)` without warning, but `f16tof32(short)` -> `f16tof32(uint)` with warning). Assisted-by: Claude Opus 4.6 Reland [Inliner] Use store-to-load forwarding to resolve call arguments (#195526) Adds store to load forwarding when inliner has successfully done some inlining. This allows simplification of further inlining attempts and can give them more precise cost analysis. It allows to optimize away empty `std::set` and `std::map` in both `libc++` and `libstdc++` and many other real world cases. Reland of #190607. 
It was reverted since it was causing crashes in address space pointers and should be fixed by #195256 [DWARFLinker] Patch DW_AT_LLVM_stmt_sequence in the parallel linker (#195388) Mirror dsymutil's stmt-sequence rewriting in the parallel linker so each attribute ends up pointing at the DW_LNE_set_address that opens its containing output sequence, with the correct offset in the combined .debug_line. At DIE cloning time we resolve each attribute's input offset to the address of its first row and record the pair (DIEValue, address) on the CompileUnit, alongside a DebugOffsetPatch on the .debug_info section so combination adds the CU's .debug_line start offset. The line-table emitter then fills a map from row address to the byte offset of the sequence-opening DW_LNE_set_address. After emission, each recorded attribute is rewritten by relocating its input address through the CU's function ranges and looking the result up in the map. When resolution fails the DWARF max-offset sentinel is written instead, and the patch applier preserves it unchanged. First-row lookups share a lazy per-CU cache to keep resolution O(1) per attribute. [clang][ssaf] Add accessor for `LUNamespace` (#195756) [mlir][SPIR-V] Add OpExpectKHR op (#195842) Complete SPV_KHR_expect_assume extension support [ELF] Support non-section Defined symbols at MergeInputSection end (#195801) Commit bb443359a8ad ("[ELF] Validate merge section offsets in getSymVA and match GNU ld") accepts offset == section_size for section-symbol references in getSymVA, and skips the out-of-bounds case in MarkLive. This patch extends the support for non-section Defined symbols, modifying the per-symbol pre-resolution added by commit 42cc45477727 ("[ELF] Optimize binary search in getSectionPiece"). 
Fix #118148 [libc][unistd] Fix generated at-function prototypes (#195341) Fixes generated <unistd.h> prototypes for readlinkat/symlinkat and adds compile-time public-header coverage [DWARFLinker] Support parseable Swift interfaces in the parallel linker (#195475) CompileUnit::Language was filtered through isODRLanguage at construction, so non-ODR languages landed as std::nullopt and analyzeImportedModule's Swift branch was never reached under --linker parallel. Store the raw DW_AT_language and apply isODRLanguage only where ODR is actually decided. Reaching that branch then exposes a shared-map write from parallelForEach workers. A mutex would make it safe but not deterministic: conflict warnings and last-writer-wins contents would depend on thread scheduling and diverge from the classic linker. Stage entries per-CU and merge them serially in input order after analysis, so results match classic regardless of schedule. Because the diagnostics are emitted after unloading the input, we do lose the originating DW_TAG_module. The warning still names the module and both conflicting paths, which carries the load-bearing information. [mlir][SYCL] Fail init errors cleanly instead of `abort`ing (#192979) Disclaimer: this is my first PR to LLVM. I'm trying to follow the contribution guide and the conventions i see in other PRs, but if i missed something -- please let me know. Fixes #182807. When the SYCL runtime wrapper is loaded on a host without a Level-Zero backend, `getDefaultDevice()` throws an `std::runtime_error`, `catchAll` catches it, and calls `abort()`, which results in a "PLEASE submit a bug report" stack dump, which is not correct for this kind of crash. With this change `catchAll` now writes to stderr and terminates via `std::exit(EXIT_FAILURE)`, yielding a clean exit code 1 with no crash dump. The "getDefaultDevice failed" message is also replaced with a (hopefully) better one. 
The runners in `mlir/lib/ExecutionEngine/` don't have much consistency with regards as to how to output diagnostics. 2 files use `llvm::outs/errs`, 9 files use `std::cout/cerr`, 8 files use `fprintf`. I left `fprintf` in place, since it was used in that module before and it's also used by CUDA and ROCm runners (the most dominant ones in the folder). I'm open to changing it to something else if needed. 1. Why this fix strategy doesn't match the CUDA/ROCm wrappers' print-warning-and-continue style: The `print-and-continue` approach was tried and it produces worse behavior with SYCL wrapper. The entry points would return `nullptr` on init failure, but the MLIR JIT runner does not null-check those returns, so the `nullptr` is dereferenced inside a later SYCL call and produces another SIGSEGV and the same crash dump this patch is trying to avoid. Making that style work would require null-checks across the downstream wrappers. We can go that way if desired, but i opted to start with a simpler fix. 2. Why `L0_SAFE_CALL` (`SyclRuntimeWrappers.cpp`, line 40) is left untouched: it's a deliberate decision, as `L0_SAFE_CALL` is supposed to abort on Level-Zero errors (kernel launch, OOM, etc). I can reconsider if there are some arguments for updating error handling strategy in `L0_SAFE_CALL` too, I just don't see any yet. 3. There's no unit test with this change, as I wasn't able to find a test that would test this logic (i guess this kind of logic isn't usually tested with unit tests anyway). I did test it manually, though, for both positive and negative cases. 
clang: Print full triple arch name in error (#195889) More instances of pattern from 58f94f0a4648a5d8d624c05a51dea8c37e5fe7bc [NFC][SPIR-V] Rename vID register class to viID (#195711) Rename the v2i64 register and its class from vID/vID0 to viID/viID0 so it follows the same `v<element-kind>ID` convention as vfID (v2f64) and vpID (v2p64) for better consistency [SandboxVec] SandboxVectorizerIR Boilerplate (#189515) This patch introduces a new specialization of SandboxIR named SandboxVectorizerIR that contains the new Pack instruction (though it is just a placeholder for now). It also implements the necessary boilerplate in SandboxIR to support adding such specializations. Each specialization requires its own *Values.def file defining its new instructions. The new .def file also needs to be added to ValuesDefFilesList.def which allows SandboxIR to include entries from all specializations and populate structures like the Opcode enum, ClassID and others. Given that we need to include both .def files in several places, I had to (i) change the .def file to undefine its locally defined macros so that they won't get redefined in the next #include .def, and (ii) add the DEF_DISABLE_AUTO_UNDEF option to disable undefining the macros at the end because otherwise the user-specified DEF_ macros would be undefined by the time we reach the second .def file. [Bazel] Fixes bb51440 (#195851) This fixes bb51440ffe693007fee40df9c9227432ebc5e0f0. Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com> [MLIR][Transform] Don't error when a structurally inlinable call exists (#195770) Fixes bug introduced in https://github.com/llvm/llvm-project/pull/192956 Specifically transform-interpreter would crash if any op in the region it is applied to is marked as no-inline via the inliner interface. This is because the check added does a post processing to verify that all operations can be inlined [and there isn't an issue due to symbol merging]. 
However, it fails to account for the case where an operation was already not inlinable (and not an error introduced by the transform symbol merging). [clang-sycl-linker] Add per-kernel device code splitting (#195362) Introduce a --sycl-module-split-mode option for clang-sycl-linker that selects how the fully linked device module is split into device images: - none (default): one device image containing all kernels - kernel: one device image per kernel function, built by taking the transitive closure of each kernel's dependencies The split is implemented by delegating to llvm::splitModuleTransitiveFromEntryPoints, providing a categorizer that assigns each kernel definition a unique integer id. Symbol collection now happens during splitting while the Module is still in memory, avoiding a round-trip through disk to re-read the bitcode for symbol extraction. As part of this change, the splitModuleTransitiveFromEntryPoints API is updated to accept an Error-returning callback and propagate errors from it, replacing the previous void callback signature. The existing caller in llvm-split is updated accordingly. Additionally, linkDeviceCode now returns the linked Module alongside the bitcode file path, so the splitter operates on the in-memory module directly rather than re-reading it from disk. Additional split modes (per-translation-unit, optional-feature grouping) and specialization-constant handling will be introduced in follow-up changes. Co-Authored-By: Claude [flang][cuda][openacc] Add UseDevice attribute to model host_data use_device symbols (#195182) Symbols appearing in `!$acc host_data use_device(...)` were previously marked with `CUDADataAttr::Device`, which caused generic resolution to fail. Introduce a new CUDADataAttr::UseDevice enumerator and use it in `CopySymbolWithDevice` / `CloneDerivedTypeForUseDeviceImpl` instead of Device. The new attribute is: 1. Compatible with any dummy in AreCompatibleCUDADataAttrs. 2. 
Ranked in the matching distance table (GetMatchingDistance): prefers Device dummies (0), accepts Managed/Unified (2), and allows host dummies (3). 3. Not user-spellable — only set internally during OpenACC name resolution, never parsed from source or written to module files. 4. Treated like Device for non-host-array checks, I/O restrictions, and MLIR lowering (mapped to `cuf::DataAttribute::Device`). [VPlan] Check for VInstruction in findCanonicalIVIncrement. (#195845) There are cases where findCanonicalIVIncrement may discover a non-VPInstruction add that matches the expected shape, but we expect a VPInstruction. Add explicit check to fix crash. [bazel] fix #193207 (#195898) fix #193207 [llvm] Fix the DWARFLinkerParallel unit tests (#195902) [UnitTests] Fix shared build. NFC (#195906) [clang] Consistently store format specifiers in execution encoding (#195876) `DecomposePrintfHandler::HandlePrintfSpecifier()` parses the format string and collects specifiers into `Specs`. In most cases the collected specifiers are in the execution encoding, but there were to places that used string literals in "host" encoding. Change them to use a part of the parsed `StringLiteral` instead so that `Specs` always contain text in the execution encoding. This is achieved by adding `getCharacters()` method to `OptionalAmount` class, following `ConversionSpecifier::getCharacters()`. This is to make #169803 smaller and is an NFC before that PR lands. 
[PowerPC] Add AES Builtins (#186895) Add the following AES builtins: __vector_pair __builtin_aes_encrypt_paired (__vector_pair, __vector_pair, uint2); __vector_pair __builtin_aes128_encrypt_paired (__vector_pair, __vector_pair); __vector_pair __builtin_aes192_encrypt_paired (__vector_pair, __vector_pair); __vector_pair __builtin_aes256_encrypt_paired (__vector_pair, __vector_pair); __vector_pair __builtin_aes_decrypt_paired (__vector_pair, __vector_pair, uint2); __vector_pair__builtin_aes128_decrypt_paired (__vector_pair, __vector_pair); __vector_pair __builtin_aes192_decrypt_paired (__vector_pair, __vector_pair); __vector_pair__builtin_aes256_decrypt_paired (__vector_pair, __vector_pair); __vector_pair __builtin_aes_genlastkey_paired (__vector_pair, uint2); __vector_pair __builtin_aes128_genlastkey_paired (__vector_pair); __vector_pair __builtin_aes192_genlastkey_paired (__vector_pair); __vector_pair __builtin_aes256_genlastkey_paired (__vector_pair); vec_t __builtin_galois_field_mult (vec_t, vec_t, uint1); vec_t __builtin_galois_field_mult_gcm (vec_t, vec_t); vec_t __builtin_galois_field_mult_xts (vec_t, vec_t); [lldb-dap] Fix crash in source request handler (#195847) Check optional argument source has a value before getting the source reference. [RISCV][GISel][NFC] Fix file header comment in RISCVInlineAsmLowering.cpp (#195805) [LoopInfo] Remove redundant LLVM_ABI from Loop member functions (#195863) Remove redundant `LLVM_ABI` from `addStringLoopAttribute` and `addIntLoopAttribute`. Addresses post-merge feedback on #194676. This patch was generated with the help of Claude and reviewed by a human. Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com> [LifetimeSafety] Add support for iterator arithmetic (#195442) This PR adds origin propagation through iterator arithmetic. It also adds a few tests to check the new behavior. 
Closes #190140 clang: Print effective triple in -ccc-print-bindings (#195914) Previously this would print the original triple, rather than the adjusted triple for the tool invocation which I found confusing. [SSAF][PointerFlow] Add PointerFlow summary and extractor (#188654) Implement a PointerFlow summary and extractor, which uses EntityPointerLevel. An assignment is extracted as a pair of EntityPointerLevels, representing a directed edge. Edges extracted from assignments form a directed graph encoding abstract pointer flow information. rdar://172429193 --------- Co-authored-by: Balázs Benics <benicsbalazs@gmail.com> Co-authored-by: Jan Korous <jkorous@apple.com> [LV] Extend test coverage with predicated IVs and FORs. (#195908) Add coverage to test where the induction phi itself is a live-out, as well as various tests where we could either generate a predicated IV or first-order recurrence. [flang] Implement SIMPLE procedure support (parsing, semantics, diagnostics) (#161285) Implement support for the Fortran 2023 `SIMPLE` procedure specifier. This change includes: - Parsing and attribute registration for `SIMPLE` procedures - Adding `SIMPLE` to procedure characteristics - Semantic support for `SIMPLE`, including treating `SIMPLE` procedures as `PURE` (`SIMPLE` implies `PURE`) - Rejecting `SIMPLE` + `IMPURE` combinations - Emitting and reading `SIMPLE` in mod files - Adding parser, semantic, and modfile tests to validate `SIMPLE` behavior `SIMPLE` procedures satisfy `PURE` requirements, while `PURE` procedures do not imply `SIMPLE`. [PowerPC] Further refactor atomic loads (#194445) Depending on the availability of the word-part feature, different code is generated for 1 and 2 byte atomic loads. This change moves the decision to use the word-part feature from C++ into TableGen patterns. 
This is done via: - move code from `EmitPartwordAtomicBinary()` into new function 'signExtendOperandIfUnknown()' - decouple functions `EmitPartwordAtomicBinary()` and `EmitAtomicBinary()` - remove the size from the name of the pseudo instructions; instead, introduce a pseudo instruction which is used in case the word-part feature is missing - update the handling of the pseudo instruction insertion accordingly A side effect of this change is the implementation requires 11 pseudo instructions less. [NFC][CIR] Pass StringRef to `getIntrinsicForClangBuiltin` directly (#195458) `prefix` is already a StringRef, so pass it to `getIntrinsicForClangBuiltin` and `getIntrinsicForMSBuiltin` directly. Additionally, drop some redundant `llvm::` and `Intrinsic::` namespace qualifiers. [bazel][SandboxVec] Port 0beb9f45f273bdc2d620f6dd07517761d294d259 (#195928) [IVDescriptors] Check for null SE before dereference in getMinMaxRecurrence. (#195798) Some callers don't pass SCEV. [SSAF] Fix -Wunused-variable in b73ce3e53fcb9f72c759139e93db7e16a813c3b2 (#195944) There was a variable only used in an assertion in a call to addSummary. Mark it [[maybe_unused]] given the call has side effects. [flang] Fix FIR AliasAnalysis for zero-offset view chains (#192710) When subroutine `f` below is inlined, the `ACCImplicitData` pass fails to recognize that `b` is already covered by the enclosing `!$acc data deviceptr`. The deviceptr clause operates on a box (`fir.embox` result) while the inner `acc.serial` uses the underlying ref. `fir::AliasAnalysis` traces both back through the full def-chain, where an upstream sliced `fir.embox` (from `a(:,5)`) sets `approximateSource=true`, causing `MayAlias` instead of `MustAlias`. The pass falls back to implicit `copyin`/`copyout`, causing a segfault. 
```fortran module test_mod real, allocatable :: a(:,:) contains subroutine f(b) real, dimension(*) :: b !$acc data deviceptr(b(1:100)) !$acc serial print *, b(10) !$acc end serial !$acc end data end subroutine f end module test_mod program test use test_mod call alloc_a !$acc data present(a) !$acc host_data use_device(a) call f( a(:,5) ) !$acc end host_data !$acc end data end program ``` Fix: add `getZeroOffsetViewRoot()` as a short-circuit in `alias(Source, Source, Value, Value)`. Before consulting `approximateSource`, it walks each value backward through `FortranObjectViewOpInterface` ops with zero offset (embox without slice, declare, box_addr) to find its root. If both values share the same root, `MustAlias` is returned immediately, avoiding the conservative `MayAlias` caused by distant `approximateSource` flags. [VPlan] Add isFreeScalarIntrinsic helper (NFC). (#195929) Split off from https://github.com/llvm/llvm-project/pull/195518. [VPlan] Add VPlan printing test for call widening decisions. (NFC) (#195930) Add VPlan-printing test for https://github.com/llvm/llvm-project/pull/195518. [flang][cuda] Fix unregistered allocator (#195924) which then miss to add the constructor to `llvm.mlir.global_ctors` which leads to runtime failure because the allocators for CUDA Fortran are not registered. [clang-tidy] Fix `implicit-bool-conversion` C ternary condition false positive (#195913) Fixes #195604 [SSAF][WPA] Add "no-op" PointerFlow and UnsafeBufferUsage analysis (#193089) Added 'no-op' PointerFlow and UnsafeBufferUsage analyses to convert summary data into AnalysisResult, which DerivedAnalysis can then consume. Also, refactored the PointerFlow and UnsafeBufferUsage serialization for code sharing. rdar://174874942 --------- Co-authored-by: Balázs Benics <benicsbalazs@gmail.com> Co-authored-by: Jan Korous <jkorous@apple.com> [gn] use action() instead of copy() for libcxx headers (#195948) copy() doesn't handle file deletions. 
Use an action() that syncs the output directory with the input list via a response file, removing files that are no longer in the list. This works because if files are added or removed, ninja's command line tracking re-runs the script, and if contents of existing files change, ninja's input mtime checking reruns it. This also makes the remove_float_h workaround unnecessary. Motivated by all the recent header removals in libc++. [mlir][spirv][nfc] Use function argument type in TypeCapabilityVisitor switch (#195918) Addresses post commit review from #195796. [LLDB] Fix UBSan issue with ValueType enums. (#195540) ValueTypeSyntheticMask, when bitwise OR'd with ValueType enums, produces a value that is outside the official enum range for ValueTypes. This causes UBSan errors, when UBSan is set to check enum values. E.g. If you build LLDB with the Cmake flags -DCMAKE_CXX_FLAGS="-fsanitize=enum -fsanitize-trap=enum" -DCMAKE_C_FLAGS="-fsanitize=enum -fsanitize-trap=enum" Then try to run the LLDB test TestScripedFrameProvider, it crashes with a SIGILL from UBSan. This change fixes that by pulling ValueTypeSyntheticMask into the ValueType enums, expanding the valid enum range and making the bitwise OR'd values valid. [mlir][acc] Improve implicit deviceptr detection for alias (#195934) The ACCImplicitData automatically is able to use deviceptr clause when variable is detected as being device data. However, it was missing check for own `acc declare deviceptr` attribute. clang: Avoid dummy LAST entry in OffloadArch (#195952) Use this as an alias of the final entry, rather than its own enum value. This will allow writing covered switches that don't need to handle this case. This matches how other places with an end enum entry handle this. 
[MLIR][XeGPU] Support pointer/dynamic-memref sources in array-length optimization (#195872) Extend `OptimizeCreateNdDescOp` to handle the two remaining `create_nd_tdesc` source forms — `i64` pointer and dynamic-shape memref — by forwarding the existing shape/strides operands through the general builder. The memory region is unchanged by the rewrite; only the `tensor_desc` view is narrowed along the FCD and tagged with `array_length`. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> [CIR] Add pass_object_size hidden parameter support (#191482) Emit the hidden `i64` parameter that `__attribute__((pass_object_size(N)))` requires. At call sites the size is constant-folded when possible (e.g. `&a` → 4) and falls back to `cir.objsize` / `@llvm.objectsize` otherwise (e.g. VLAs). On the callee side, `buildFunctionArgList` now creates an `ImplicitParamDecl` for each annotated parameter so that `emitBuiltinObjectSize` can load the passed size instead of re-computing it. This also fixes the `llvm_unreachable("NYI")` in `RequiredArgs::getFromProtoWithExtraSlots` and the `errorNYI` in `appendParameterTypes` / `arrangeFreeFunctionLikeCall` that fired whenever `hasExtParameterInfos()` was true. New test: `clang/test/CIR/CodeGen/pass-object-size.c` (CIR / LLVM / OGCG). Made with [Cursor](https://cursor.com) --------- Co-authored-by: Cursor <cursoragent@cursor.com> [CIR] Use SymbolUserMap in applyReplacements to fix quadratic behavior (#195883) applyReplacements() previously called replaceAllSymbolUses() for each replacement, which walks the entire module every time — O(R × M) for R replacements and M operations. For C++ programs with heavy template instantiation (e.g., Eigen), this quadratic behav…

The intrinsics are based on the following [proposal](ARM-software/acle#428). These are:

```
svint16_t svdot_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm);
svint16_t svdot_n_s16_s8(svint16_t zda, svint8_t zn, int8_t zm);
svint16_t svdot_lane_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm, uint64_t imm_idx);
svuint16_t svdot_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm);
svuint16_t svdot_n_u16_u8(svuint16_t zda, svuint8_t zn, uint8_t zm);
svuint16_t svdot_lane_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm, uint64_t imm_idx);
```
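The normative semantics are in the ACLE text, but a scalar reference model can help when writing tests for these intrinsics. The sketch below is a hedged illustration, not the ACLE definition. It assumes that each 16-bit lane `i` of `zda` accumulates the products of the 8-bit pair at positions `2i` and `2i+1` of `zn` and `zm`; that results wrap modulo 2^16, as other non-saturating SVE integer dot products do; and that the `_lane` forms select one 8-bit pair of `zm` per 128-bit segment, so `imm_idx` lies in 0..7. All `ref_*` functions and `wrap16` are hypothetical helpers, not ACLE functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduce an intermediate value modulo 2^16 into an int16_t without relying
   on implementation-defined narrowing conversions. */
static int16_t wrap16(int32_t v) {
    uint16_t u = (uint16_t)v; /* conversion to unsigned is modulo 2^16 */
    return u <= 0x7FFF ? (int16_t)u : (int16_t)((int32_t)u - 65536);
}

/* Hypothetical model of svdot_s16_s8: lane i accumulates the products of
   the int8 pair (2i, 2i+1) from zn and zm. n16 = number of 16-bit lanes. */
void ref_dot_s16_s8(int16_t *res, const int16_t *zda, const int8_t *zn,
                    const int8_t *zm, size_t n16) {
    for (size_t i = 0; i < n16; i++) {
        int32_t sum = zda[i];
        sum += (int32_t)zn[2 * i] * zm[2 * i];
        sum += (int32_t)zn[2 * i + 1] * zm[2 * i + 1];
        res[i] = wrap16(sum);
    }
}

/* Hypothetical model of svdot_n_s16_s8: the scalar zm is broadcast. */
void ref_dot_n_s16_s8(int16_t *res, const int16_t *zda, const int8_t *zn,
                      int8_t zm, size_t n16) {
    for (size_t i = 0; i < n16; i++) {
        int32_t sum = zda[i];
        sum += (int32_t)zn[2 * i] * zm;
        sum += (int32_t)zn[2 * i + 1] * zm;
        res[i] = wrap16(sum);
    }
}

/* Hypothetical model of svdot_lane_s16_s8: within each 128-bit segment
   (8 x int16 lanes, 16 x int8), every lane uses the int8 pair of zm
   selected by imm_idx. n16 is assumed to be a multiple of 8. */
void ref_dot_lane_s16_s8(int16_t *res, const int16_t *zda, const int8_t *zn,
                         const int8_t *zm, uint64_t imm_idx, size_t n16) {
    assert(imm_idx < 8 && n16 % 8 == 0);
    for (size_t i = 0; i < n16; i++) {
        const int8_t *pair = zm + (i / 8) * 16 + 2 * imm_idx;
        int32_t sum = zda[i];
        sum += (int32_t)zn[2 * i] * pair[0];
        sum += (int32_t)zn[2 * i + 1] * pair[1];
        res[i] = wrap16(sum);
    }
}

/* Hypothetical model of svdot_u16_u8: unsigned arithmetic wraps naturally. */
void ref_dot_u16_u8(uint16_t *res, const uint16_t *zda, const uint8_t *zn,
                    const uint8_t *zm, size_t n16) {
    for (size_t i = 0; i < n16; i++) {
        uint32_t sum = zda[i];
        sum += (uint32_t)zn[2 * i] * zm[2 * i];
        sum += (uint32_t)zn[2 * i + 1] * zm[2 * i + 1];
        res[i] = (uint16_t)sum; /* keeps the low 16 bits */
    }
}
```

On hardware with FEAT_SVE2p3 or FEAT_SME2p3, one option is to compare the real intrinsics lane by lane against a model like this. The model does not cover predication or streaming-mode behaviour.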

georges-arm reviewed Feb 2, 2026

Comment thread main/acle.md Outdated

Update copyright dates

2739bfa

rockdreamer force-pushed the dp97 branch from 5b93d9b to f0458c9 February 5, 2026 11:45

ricbal02 reviewed Feb 27, 2026

Comment thread neon_intrinsics/advsimd.md Outdated

georges-arm reviewed Mar 3, 2026

Comment thread main/acle.md Outdated

amilendra reviewed Mar 4, 2026

Comment thread main/acle.md Outdated

amilendra mentioned this pull request Mar 12, 2026

[clang][AArch64][SVE2p3][SME2p3] Add intrinsics for v9.7a shift operations llvm/llvm-project#186087

Merged

MartinWehking mentioned this pull request Mar 16, 2026

[AArch64] Add 9.7 CVT data processing intrinsics llvm/llvm-project#186807

Open

jthackray reviewed Mar 17, 2026

Comment thread main/acle.md Outdated

rockdreamer added 3 commits March 18, 2026 10:53

Add alpha support for 9.7 dot product intrinsics

f0cfdd2

This change adds:

- AdvSIMD dot intrinsics available under FEAT_F16F32DOT
- SVE dot intrinsics available under either FEAT_SVE2p3 or FEAT_SME2p3

Add alpha support for 9.7 mmla intrinsics

6509a41

This change adds:

- AdvSIMD intrinsics based on FMMLA, available under FEAT_F16F32MM and FEAT_F16MM
- SVE intrinsics available under FEAT_F16MM and FEAT_SVE_B16MM
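MMLA-style instructions such as FMMLA multiply small matrices held within each 128-bit segment. The sketch below is a minimal model of the per-segment operation. It assumes the usual MMLA convention that the second operand is stored row-major as the transpose, so the result is `C += A × Bᵀ`. The tile dimensions `m`, `n` and `k` are left as parameters because the shapes used by the FEAT_F16MM, FEAT_F16F32MM and FEAT_SVE_B16MM forms are not reproduced here. `float` stands in for the half-precision and BFloat16 inputs, so the architectural rounding behaviour is not modelled. `ref_mmla_tile` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one MMLA tile:
   c (m x n, row-major) += a (m x k, row-major) * transpose(b),
   where b is stored row-major as an n x k matrix, i.e. each row of b
   is one column of the right-hand operand. Rounds in float after every
   step, which is NOT the architectural rounding behaviour. */
void ref_mmla_tile(float *c, const float *a, const float *b,
                   size_t m, size_t n, size_t k) {
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float acc = c[i * n + j];
            for (size_t p = 0; p < k; p++)
                acc += a[i * k + p] * b[j * k + p]; /* row i of a . row j of b */
            c[i * n + j] = acc;
        }
    }
}
```

Under the same assumptions, applying the tile function once per 128-bit segment gives the whole-vector result.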

Add alpha support for 9.7 lookup table intrinsics

bd40d9b

This change adds support for the svluti6 operations under FEAT_SVE2p3 and FEAT_SME2p3.

rockdreamer force-pushed the dp97 branch from f0458c9 to 9afa130 March 18, 2026 14:52

amilendra mentioned this pull request Mar 19, 2026

[Clang][AArch64][SVE2p3][SME2p3] Add intrinsics for v9.7a add/add-and-subtract/subtract pairwise operations llvm/llvm-project#187527

Merged

amilendra reviewed Mar 23, 2026

Comment thread main/acle.md Outdated

rockdreamer force-pushed the dp97 branch from 9afa130 to def5604 March 24, 2026 11:02

CarolineConcatto reviewed Mar 25, 2026

Comment thread main/acle.md Outdated

amilendra reviewed Mar 25, 2026

Comment thread main/acle.md

rockdreamer added 5 commits March 26, 2026 09:31

Add alpha support for 9.7 pairwise operation intrinsics

ab1da62

This change adds support for the svaddqp, svaddsubp and svsubp operations under FEAT_SVE2p3 and FEAT_SME2p3.

Add alpha support for 9.7 conversion intrinsics

14486df

This change adds support for the svcvt* operations under FEAT_SVE2p3 and FEAT_SME2p3.

Add alpha support for 9.7 absolute difference accumulation intrinsics

788af6d

This change adds support for the svaba operations under FEAT_SVE2p3 and FEAT_SME2p3.
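The absolute-difference-and-accumulate operation behind `svaba` adds `|a - b|` to each accumulator element. The sketch below uses 8-bit signed sources and a 16-bit accumulator purely as an illustrative assumption; the element types of the new 9.7 forms are defined in the ACLE text, not here. The difference is formed in a wider type so it cannot overflow, and the accumulator wraps modulo 2^16. `ref_aba_s16_s8` and `wrap16` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduce modulo 2^16 into int16_t without implementation-defined
   narrowing conversions. */
static int16_t wrap16(int32_t v) {
    uint16_t u = (uint16_t)v; /* conversion to unsigned is modulo 2^16 */
    return u <= 0x7FFF ? (int16_t)u : (int16_t)((int32_t)u - 65536);
}

/* Hypothetical model: acc[i] += |a[i] - b[i]| for n elements.
   The difference is computed in 32 bits, so it cannot overflow. */
void ref_aba_s16_s8(int16_t *acc, const int8_t *a, const int8_t *b, size_t n) {
    assert(n == 0 || (acc != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)a[i] - b[i]; /* exact, in [-255, 255] */
        if (d < 0)
            d = -d;
        acc[i] = wrap16(acc[i] + d);
    }
}
```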

Add alpha support for 9.7 shift right narrow intrinsics

330b978

This change adds support for the new svqshrn and svqshrun operations under FEAT_SVE2p3 and FEAT_SME2p3.
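Saturating shift-right-narrow operations shift each wide element right and clamp the result to the narrower type. For `svqshrn` the clamp is signed to signed. For `svqshrun` it is signed to unsigned, so negative results clamp to zero. The per-element sketch below is a hedged illustration. It assumes 32-bit to 16-bit narrowing, a shift amount in 1..16, and truncating (non-rounding) shifts, as the absence of an `r` in the names suggests. How results are packed into destination registers is not modelled. The `ref_*` functions and `asr64` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic (flooring) right shift that avoids the implementation-defined
   behaviour of >> on negative signed values in C. */
static int64_t asr64(int64_t x, unsigned shift) {
    return x >= 0 ? x >> shift : -((-x - 1) >> shift) - 1;
}

/* Hypothetical model of a signed saturating shift right narrow
   (truncating): int32 -> int16. */
int16_t ref_qshrn_s16(int32_t x, unsigned shift) {
    assert(shift >= 1 && shift <= 16);
    int64_t v = asr64(x, shift);
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical model of a signed-to-unsigned saturating shift right
   narrow: int32 -> uint16, with negative results clamped to 0. */
uint16_t ref_qshrun_u16(int32_t x, unsigned shift) {
    assert(shift >= 1 && shift <= 16);
    int64_t v = asr64(x, shift);
    if (v > UINT16_MAX) return UINT16_MAX;
    if (v < 0) return 0;
    return (uint16_t)v;
}
```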

Use Neon name consistently.

4a060f9

Update svluti6_lane_s16_x4 to also accept index registers of size 3.

b92f4c6

rockdreamer added 2 commits May 12, 2026 20:36

Merge branch 'main' into dp97

f699021

Make changelog consistent between template and md file

5f5d80f

rockdreamer force-pushed the dp97 branch from 188fe1f to 5f5d80f May 12, 2026 20:51

Add alpha support for 9.7 data processing intrinsics #428

rockdreamer wants to merge 15 commits into ARM-software:main from rockdreamer:dp97

rockdreamer commented Jan 30, 2026

rockdreamer commented Mar 18, 2026

10 participants
