Add alpha support for 9.7 data processing intrinsics #428
Open
rockdreamer wants to merge 15 commits into
georges-arm reviewed on Feb 2, 2026
ricbal02 reviewed on Feb 27, 2026
georges-arm reviewed on Mar 3, 2026
amilendra reviewed on Mar 4, 2026
MartinWehking added a commit to MartinWehking/llvm-project that referenced this pull request on Mar 16, 2026:
Add Clang/LLVM intrinsics for svcvt, scvtflt, ucvtf, ucvtflt and fcvtzsn, fcvtzun. The Clang intrinsics are guarded by the sve2.3 and sme2.3 feature flags.
ACLE Patch: ARM-software/acle#428
jthackray reviewed on Mar 17, 2026
This change adds:
- AdvSIMD dot intrinsics available under FEAT_F16F32DOT
- SVE dot intrinsics available under either FEAT_SVE2p3 or FEAT_SME2p3
This change adds:
- AdvSIMD intrinsics based on FMMLA, available under FEAT_F16F32MM and FEAT_F16MM
- SVE intrinsics available under either FEAT_F16MM or FEAT_SVE_B16MM
This change adds support for the svluti6 operations under FEAT_SVE2p3 and FEAT_SME2p3.
Comment from the PR author (contributor):
In the latest force push:
amilendra reviewed on Mar 23, 2026
amilendra reviewed on Mar 25, 2026
This change adds support for the svaddqp, svaddsubp and svsubp operations under FEAT_SVE2p3 and FEAT_SME2p3.
This change adds support for the svcvt* operations under FEAT_SVE2p3 and FEAT_SME2p3.
This change adds support for the svaba operations under FEAT_SVE2p3 and FEAT_SME2p3.
This change adds support for new svqshrn, svqshrun operations under FEAT_SVE2p3 and FEAT_SME2p3.
CarolineConcatto pushed a commit to llvm/llvm-project that referenced this pull request on Apr 27, 2026:
…-subtract/subtract pairwise operations (#187527)

Add the following new clang intrinsics based on the ACLE specification ARM-software/acle#428 (Add alpha support for 9.7 data processing intrinsics):

- ADDQP (Add pairwise within quadword vector segments)
  - svint8_t svaddqp_s8(svint8_t, svint8_t) / svint8_t svaddqp(svint8_t, svint8_t)
  - svuint8_t svaddqp_u8(svuint8_t, svuint8_t) / svuint8_t svaddqp(svuint8_t, svuint8_t)
  - svint16_t svaddqp_s16(svint16_t, svint16_t) / svint16_t svaddqp(svint16_t, svint16_t)
  - svuint16_t svaddqp_u16(svuint16_t, svuint16_t) / svuint16_t svaddqp(svuint16_t, svuint16_t)
  - svint32_t svaddqp_s32(svint32_t, svint32_t) / svint32_t svaddqp(svint32_t, svint32_t)
  - svuint32_t svaddqp_u32(svuint32_t, svuint32_t) / svuint32_t svaddqp(svuint32_t, svuint32_t)
  - svint64_t svaddqp_s64(svint64_t, svint64_t) / svint64_t svaddqp(svint64_t, svint64_t)
  - svuint64_t svaddqp_u64(svuint64_t, svuint64_t) / svuint64_t svaddqp(svuint64_t, svuint64_t)
- ADDSUBP (Add and subtract pairwise)
  - svint8_t svaddsubp_s8(svint8_t, svint8_t) / svint8_t svaddsubp(svint8_t, svint8_t)
  - svuint8_t svaddsubp_u8(svuint8_t, svuint8_t) / svuint8_t svaddsubp(svuint8_t, svuint8_t)
  - svint16_t svaddsubp_s16(svint16_t, svint16_t) / svint16_t svaddsubp(svint16_t, svint16_t)
  - svuint16_t svaddsubp_u16(svuint16_t, svuint16_t) / svuint16_t svaddsubp(svuint16_t, svuint16_t)
  - svint32_t svaddsubp_s32(svint32_t, svint32_t) / svint32_t svaddsubp(svint32_t, svint32_t)
  - svuint32_t svaddsubp_u32(svuint32_t, svuint32_t) / svuint32_t svaddsubp(svuint32_t, svuint32_t)
  - svint64_t svaddsubp_s64(svint64_t, svint64_t) / svint64_t svaddsubp(svint64_t, svint64_t)
  - svuint64_t svaddsubp_u64(svuint64_t, svuint64_t) / svuint64_t svaddsubp(svuint64_t, svuint64_t)
- SUBP (Subtract pairwise)
  - svint8_t svsubp_s8(svbool_t, svint8_t, svint8_t) / svint8_t svsubp(svbool_t, svint8_t, svint8_t)
  - svuint8_t svsubp_u8(svbool_t, svuint8_t, svuint8_t) / svuint8_t svsubp(svbool_t, svuint8_t, svuint8_t)
  - svint16_t svsubp_s16(svbool_t, svint16_t, svint16_t) / svint16_t svsubp(svbool_t, svint16_t, svint16_t)
  - svuint16_t svsubp_u16(svbool_t, svuint16_t, svuint16_t) / svuint16_t svsubp(svbool_t, svuint16_t, svuint16_t)
  - svint32_t svsubp_s32(svbool_t, svint32_t, svint32_t) / svint32_t svsubp(svbool_t, svint32_t, svint32_t)
  - svuint32_t svsubp_u32(svbool_t, svuint32_t, svuint32_t) / svuint32_t svsubp(svbool_t, svuint32_t, svuint32_t)
  - svint64_t svsubp_s64(svbool_t, svint64_t, svint64_t) / svint64_t svsubp(svbool_t, svint64_t, svint64_t)
  - svuint64_t svsubp_u64(svbool_t, svuint64_t, svuint64_t) / svuint64_t svsubp(svbool_t, svuint64_t, svuint64_t)
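To make the ADDQP result layout concrete, here is a scalar C reference model of `svaddqp_s32` that can be compiled and tested on any host. It is a sketch under an assumption, not the ACLE definition: it assumes each 128-bit segment behaves like AdvSIMD `ADDP` applied to that segment, with the pairwise sums of the first operand filling the low half of the result segment and those of the second operand filling the high half, and that additions wrap modulo 2^32. The function name `ref_addqp_s32` and the `n` parameter (32-bit elements per vector, a multiple of 4) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 128-bit segment / 32-bit elements */
#define SEG_LANES_32 4

/* Hypothetical scalar model of svaddqp_s32 (ADDQP, 32-bit elements).
 * ASSUMPTION: AdvSIMD ADDP-style layout within each 128-bit segment:
 *   low half  = pairwise sums of zn's segment,
 *   high half = pairwise sums of zm's segment.
 * n is the number of 32-bit elements per vector. */
static void ref_addqp_s32(int32_t *zd, const int32_t *zn, const int32_t *zm,
                          size_t n)
{
    assert(n % SEG_LANES_32 == 0);
    const size_t half = SEG_LANES_32 / 2;
    for (size_t seg = 0; seg < n; seg += SEG_LANES_32) {
        int32_t tmp[SEG_LANES_32]; /* lets zd alias zn or zm safely */
        for (size_t i = 0; i < half; ++i) {
            /* Unsigned addition gives well-defined wrap-around modulo 2^32. */
            tmp[i] = (int32_t)((uint32_t)zn[seg + 2 * i] +
                               (uint32_t)zn[seg + 2 * i + 1]);
            tmp[half + i] = (int32_t)((uint32_t)zm[seg + 2 * i] +
                                      (uint32_t)zm[seg + 2 * i + 1]);
        }
        for (size_t i = 0; i < SEG_LANES_32; ++i)
            zd[seg + i] = tmp[i];
    }
}
```

With 8 elements (a 256-bit vector), `zn = {1, ..., 8}` and `zm = {10, 20, ..., 80}` give `{3, 7, 30, 70, 11, 15, 110, 150}` under this assumed layout. Comparing such a model against the real intrinsic on hardware or an emulator is a cheap way to confirm the segment layout.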
cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request on Apr 27, 2026 (mirror of llvm/llvm-project#187527; same commit message as above).
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request on Apr 27, 2026 (mirror of llvm/llvm-project#187527; same commit message as above).
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request on Apr 27, 2026 (mirror of llvm/llvm-project#187527; same commit message as above).
amilendra added a commit to amilendra/llvm-project that referenced this pull request on Apr 27, 2026:
…ned/unsigned absolute difference sum and accumulate long ops

Add the following new clang intrinsics based on the ACLE specification ARM-software/acle#428 (Add alpha support for 9.7 data processing intrinsics):

- SABAL (Two-way signed absolute difference sum and accumulate long)
  - svint16_t svabal[_s16](svint16_t, svint8_t, svint8_t) / svint16_t svabal[_n_s16](svint16_t, svint8_t, int8_t)
  - svint32_t svabal[_s32](svint32_t, svint16_t, svint16_t) / svint32_t svabal[_n_s32](svint32_t, svint16_t, int16_t)
  - svint64_t svabal[_s64](svint64_t, svint32_t, svint32_t) / svint64_t svabal[_n_s64](svint64_t, svint32_t, int32_t)
- UABAL (Two-way unsigned absolute difference sum and accumulate long)
  - svuint16_t svabal[_u16](svuint16_t, svuint8_t, svuint8_t) / svuint16_t svabal[_n_u16](svuint16_t, svuint8_t, uint8_t)
  - svuint32_t svabal[_u32](svuint32_t, svuint16_t, svuint16_t) / svuint32_t svabal[_n_u32](svuint32_t, svuint16_t, uint16_t)
  - svuint64_t svabal[_u64](svuint64_t, svuint32_t, svuint32_t) / svuint64_t svabal[_n_u64](svuint64_t, svuint32_t, uint32_t)
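The "two-way" naming and the types (a 16-bit accumulator fed by two 8-bit sources with twice as many elements) suggest that each accumulator lane absorbs two absolute differences. The C sketch below models `svabal[_s16]` and `svabal[_n_s16]` on that reading. It assumes each 16-bit lane `i` accumulates `|zn[2i] - zm[2i]| + |zn[2i+1] - zm[2i+1]|` and that the accumulator wraps modulo 2^16; both are assumptions to check against the architecture specification. The `ref_*` and `sabal_lane` names and the `n16` lane-count parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One accumulator lane: acc + |n0 - m0| + |n1 - m1|, wrapping modulo 2^16.
 * ASSUMPTION: this is the per-lane behaviour of two-way SABAL. */
static int16_t sabal_lane(int16_t acc, int8_t n0, int8_t m0,
                          int8_t n1, int8_t m1)
{
    int sum = abs((int)n0 - (int)m0) + abs((int)n1 - (int)m1); /* 0..510 */
    return (int16_t)(uint16_t)((uint16_t)acc + (unsigned)sum);
}

/* Hypothetical model of svabal[_s16]: n16 is the number of 16-bit lanes;
 * zn and zm each hold 2 * n16 signed 8-bit elements. */
static void ref_sabal_s16(int16_t *zda, const int8_t *zn, const int8_t *zm,
                          size_t n16)
{
    assert(zda != NULL && zn != NULL && zm != NULL);
    for (size_t i = 0; i < n16; ++i)
        zda[i] = sabal_lane(zda[i], zn[2 * i], zm[2 * i],
                            zn[2 * i + 1], zm[2 * i + 1]);
}

/* Hypothetical model of svabal[_n_s16]: the scalar zm is used for every element. */
static void ref_sabal_n_s16(int16_t *zda, const int8_t *zn, int8_t zm,
                            size_t n16)
{
    assert(zda != NULL && zn != NULL);
    for (size_t i = 0; i < n16; ++i)
        zda[i] = sabal_lane(zda[i], zn[2 * i], zm, zn[2 * i + 1], zm);
}
```

For example, accumulators `{100, -5}` with `zn = {10, -10, 127, -128}` and `zm = {3, 5, -128, 127}` become `{122, 505}`. The unsigned UABAL forms would follow the same pattern with unsigned element types.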
CarolineConcatto pushed a commit to llvm/llvm-project that referenced this pull request on Apr 27, 2026: the same change landed upstream as #188972 (same commit message as above).
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request on Apr 27, 2026 (mirror of llvm/llvm-project#188972; same commit message as above).
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request on Apr 27, 2026 (mirror of llvm/llvm-project#188972; same commit message as above).
cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request on Apr 27, 2026 (mirror of llvm/llvm-project#188972; same commit message as above).
amilendra added a commit to amilendra/llvm-project that referenced this pull request on Apr 28, 2026:
…tions

Add the following new clang intrinsics based on the ACLE specification ARM-software/acle#428 (Add alpha support for 9.7 data processing intrinsics):

Multi-vector saturating rounding shift right narrow and interleave instructions
- SQRSHRN
  - svint8_t svqrshrn_s8(svint16x2_t, uint64_t) / svint8_t svqrshrn_n_s8_s16_x2(svint16x2_t, uint64_t)
- UQRSHRN
  - svuint8_t svqrshrn_u8(svuint16x2_t, uint64_t) / svuint8_t svqrshrn_n_u8_u16_x2(svuint16x2_t, uint64_t)
- SQRSHRUN
  - svuint8_t svqrshrun_u8(svint16x2_t, uint64_t) / svuint8_t svqrshrun_n_u8_s16_x2(svint16x2_t, uint64_t)

Multi-vector saturating shift right narrow and interleave
- SQSHRN
  - svint8_t svqshrn_s8(svint16x2_t, uint64_t) / svint8_t svqshrn_n_s8_s16_x2(svint16x2_t, uint64_t)
  - svint16_t svqshrn_s16(svint32x2_t, uint64_t) / svint16_t svqshrn_n_s16_s32_x2(svint32x2_t, uint64_t)
- UQSHRN
  - svuint8_t svqshrn_u8(svuint16x2_t, uint64_t) / svuint8_t svqshrn_n_u8_u16_x2(svuint16x2_t, uint64_t)
  - svuint16_t svqshrn_u16(svuint32x2_t, uint64_t) / svuint16_t svqshrn_n_u16_u32_x2(svuint32x2_t, uint64_t)
- SQSHRUN
  - svuint8_t svqshrun_u8(svint16x2_t, uint64_t) / svuint8_t svqshrun_n_u8_s16_x2(svint16x2_t, uint64_t)
  - svuint16_t svqshrun_u16(svint32x2_t, uint64_t) / svuint16_t svqshrun_n_u16_s32_x2(svint32x2_t, uint64_t)
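As a rough model of the two-register narrowing shifts, the scalar C sketch below covers `svqshrn_n_s8_s16_x2` (SQSHRN) and, with `round` set, `svqrshrn_n_s8_s16_x2` (SQRSHRN). It assumes that the shift immediate is in the range 1 to 8, that rounding adds `1 << (imm - 1)` before an arithmetic right shift, and that "interleave" means element `i` of the first source lands in `zd[2*i]` and element `i` of the second source in `zd[2*i+1]`. These are assumptions about the instruction semantics, and the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable flooring (arithmetic) right shift for signed values. */
static int32_t asr32(int32_t x, unsigned s)
{
    return x >= 0 ? (x >> s) : ~((~x) >> s);
}

/* Saturate to the signed 8-bit range. */
static int8_t sat_s8(int32_t x)
{
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Hypothetical model of svqshrn_n_s8_s16_x2 (round == 0) and
 * svqrshrn_n_s8_s16_x2 (round != 0).
 * ASSUMPTIONS: imm in 1..8; z0[i] -> zd[2*i], z1[i] -> zd[2*i+1].
 * n16 is the number of 16-bit lanes in each source; zd has 2 * n16 bytes. */
static void ref_sqshrn_s8_s16_x2(int8_t *zd, const int16_t *z0,
                                 const int16_t *z1, size_t n16,
                                 unsigned imm, int round)
{
    assert(imm >= 1 && imm <= 8);
    const int32_t bias = round ? ((int32_t)1 << (imm - 1)) : 0;
    for (size_t i = 0; i < n16; ++i) {
        zd[2 * i]     = sat_s8(asr32((int32_t)z0[i] + bias, imm));
        zd[2 * i + 1] = sat_s8(asr32((int32_t)z1[i] + bias, imm));
    }
}
```

For `z0 = {1000, -1000}`, `z1 = {7, -7}` and `imm = 2`, the truncating form yields `{127, 1, -128, -2}` and the rounding form `{127, 2, -128, -2}`. An unsigned-result variant such as SQSHRUN would presumably clamp to 0..255 instead of -128..127.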
CarolineConcatto pushed a commit to llvm/llvm-project that referenced this pull request on Apr 28, 2026: the same change landed upstream as #186087 (same commit message as above).
llvm-sync Bot pushed a commit to arm/arm-toolchain that referenced this pull request on Apr 28, 2026 (mirror of llvm/llvm-project#186087; same commit message as above).
llvm-upstreamsync Bot pushed a commit to qualcomm/cpullvm-toolchain that referenced this pull request on Apr 28, 2026 (mirror of llvm/llvm-project#186087; same commit message as above).
cpullvm-upstream-sync Bot pushed a commit to navaneethshan/cpullvm-toolchain-1 that referenced this pull request on Apr 28, 2026 (mirror of llvm/llvm-project#186087; same commit message as above).
yingopq pushed a commit to yingopq/llvm-project that referenced this pull request on Apr 29, 2026 (mirror of llvm/llvm-project#187527; same commit message as above).
yingopq pushed a commit to yingopq/llvm-project that referenced this pull request on Apr 29, 2026 (mirror of llvm/llvm-project#188972; same commit message as above).
KHicketts pushed a commit to KHicketts/llvm-project that referenced this pull request on Apr 30, 2026 (mirror of llvm/llvm-project#187527; same commit message as above).
KHicketts
pushed a commit
to KHicketts/llvm-project
that referenced
this pull request
Apr 30, 2026
…ned/unsigned absolute difference sum and accumulate long ops (llvm#188972)

Add the following new clang intrinsics based on the ACLE specification ARM-software/acle#428 (Add alpha support for 9.7 data processing intrinsics):

- SABAL (Two-way signed absolute difference sum and accumulate long)
  - svint16_t svabal[_s16](svint16_t, svint8_t, svint8_t) / svint16_t svabal[_n_s16](svint16_t, svint8_t, int8_t)
  - svint32_t svabal[_s32](svint32_t, svint16_t, svint16_t) / svint32_t svabal[_n_s32](svint32_t, svint16_t, int16_t)
  - svint64_t svabal[_s64](svint64_t, svint32_t, svint32_t) / svint64_t svabal[_n_s64](svint64_t, svint32_t, int32_t)
- UABAL (Two-way unsigned absolute difference sum and accumulate long)
  - svuint16_t svabal[_u16](svuint16_t, svuint8_t, svuint8_t) / svuint16_t svabal[_n_u16](svuint16_t, svuint8_t, uint8_t)
  - svuint32_t svabal[_u32](svuint32_t, svuint16_t, svuint16_t) / svuint32_t svabal[_n_u32](svuint32_t, svuint16_t, uint16_t)
  - svuint64_t svabal[_u64](svuint64_t, svuint32_t, svuint32_t) / svuint64_t svabal[_n_u64](svuint64_t, svuint32_t, uint32_t)
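As a rough illustration only, here is a scalar C model of what `svabal[_s16]` and `svabal[_n_s16]` might compute. It assumes that "two-way" means each 16-bit accumulator lane `i` adds the absolute differences of the two 8-bit lanes `2i` and `2i+1`, that the `_n` form broadcasts its scalar operand, and that accumulation wraps modulo 2^16. None of this is quoted from the ACLE text, and the helper names `ref_svabal_s16` / `ref_svabal_n_s16` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for svabal[_s16] (ASSUMED semantics):
 *   acc[i] += |zn[2i] - zm[2i]| + |zn[2i+1] - zm[2i+1]|   (mod 2^16)
 * n_wide is the number of 16-bit lanes; zn and zm hold 2*n_wide bytes. */
static void ref_svabal_s16(int16_t *acc, const int8_t *zn, const int8_t *zm,
                           size_t n_wide) {
    for (size_t i = 0; i < n_wide; ++i) {
        int d0 = zn[2 * i] - zm[2 * i];          /* promoted to int: no overflow */
        int d1 = zn[2 * i + 1] - zm[2 * i + 1];
        unsigned sum = (unsigned)(d0 < 0 ? -d0 : d0) +
                       (unsigned)(d1 < 0 ? -d1 : d1);
        /* Accumulate in unsigned 16-bit arithmetic so wraparound is defined;
         * the final conversion to int16_t is modular on common compilers. */
        uint16_t wrapped = (uint16_t)((uint16_t)acc[i] + sum);
        acc[i] = (int16_t)wrapped;
    }
}

/* Hypothetical _n form (ASSUMED): the scalar zm is used for every narrow lane. */
static void ref_svabal_n_s16(int16_t *acc, const int8_t *zn, int8_t zm,
                             size_t n_wide) {
    for (size_t i = 0; i < n_wide; ++i) {
        const int8_t pair[2] = {zm, zm};
        ref_svabal_s16(&acc[i], &zn[2 * i], pair, 1);
    }
}
```

The unsigned intermediate keeps the model free of signed-overflow undefined behaviour, which matters because two 8-bit absolute differences can add up to 510.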
Lukacma
added a commit
to llvm/llvm-project
that referenced
this pull request
May 5, 2026
The intrinsics are based on the following [proposal](ARM-software/acle#428). These are:
```
svint16_t svdot_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm);
svint16_t svdot_n_s16_s8(svint16_t zda, svint8_t zn, int8_t zm);
svint16_t svdot_lane_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm, uint64_t imm_idx);
svuint16_t svdot_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm);
svuint16_t svdot_n_u16_u8(svuint16_t zda, svuint8_t zn, uint8_t zm);
svuint16_t svdot_lane_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm, uint64_t imm_idx);
```
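A scalar C model can make the expected 2-way dot-product behaviour concrete. This sketch assumes that each 16-bit lane `i` of `zda` accumulates `zn[2i]*zm[2i] + zn[2i+1]*zm[2i+1]` with modulo-2^16 wraparound, and that the `_n_` form broadcasts the scalar `zm`. It does not model `svdot_lane_s16_s8`. The function names are hypothetical and the semantics are an assumption, not text from the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of svdot_s16_s8 (ASSUMED 2-way semantics). */
static void ref_svdot_s16_s8(int16_t *zda, const int8_t *zn, const int8_t *zm,
                             size_t n_wide) {
    for (size_t i = 0; i < n_wide; ++i) {
        int32_t dot = (int32_t)zn[2 * i] * zm[2 * i] +
                      (int32_t)zn[2 * i + 1] * zm[2 * i + 1];
        /* Wrap the 16-bit accumulation through unsigned arithmetic. */
        zda[i] = (int16_t)(uint16_t)((uint32_t)(uint16_t)zda[i] + (uint32_t)dot);
    }
}

/* Hypothetical model of svdot_n_s16_s8: the scalar zm is broadcast. */
static void ref_svdot_n_s16_s8(int16_t *zda, const int8_t *zn, int8_t zm,
                               size_t n_wide) {
    for (size_t i = 0; i < n_wide; ++i) {
        int32_t dot = (int32_t)zn[2 * i] * zm + (int32_t)zn[2 * i + 1] * zm;
        zda[i] = (int16_t)(uint16_t)((uint32_t)(uint16_t)zda[i] + (uint32_t)dot);
    }
}
```

Note that the worst case, `(-128)*(-128)` twice, sums to 32768 and already wraps in a 16-bit accumulator under this model.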
llvm-sync Bot
pushed a commit
to arm/arm-toolchain
that referenced
this pull request
May 5, 2026
…(#189424) The intrinsics are based on the following [proposal](ARM-software/acle#428). These are:
```
svint16_t svdot_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm);
svint16_t svdot_n_s16_s8(svint16_t zda, svint8_t zn, int8_t zm);
svint16_t svdot_lane_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm, uint64_t imm_idx);
svuint16_t svdot_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm);
svuint16_t svdot_n_u16_u8(svuint16_t zda, svuint8_t zn, uint8_t zm);
svuint16_t svdot_lane_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm, uint64_t imm_idx);
```
llvm-upstreamsync Bot
pushed a commit
to qualcomm/cpullvm-toolchain
that referenced
this pull request
May 5, 2026
cpullvm-upstream-sync Bot
pushed a commit
to navaneethshan/cpullvm-toolchain-1
that referenced
this pull request
May 5, 2026
moar55
pushed a commit
to moar55/llvm-project
that referenced
this pull request
May 12, 2026
MartinWehking
added a commit
to MartinWehking/llvm-project
that referenced
this pull request
May 19, 2026
Add Clang/LLVM intrinsics for svcvt, scvtflt, ucvtf, ucvtflt and fcvtzsn, fcvtzun. The Clang intrinsics are guarded by the sve2.3 and sme2.3 feature flags. ACLE Patch: ARM-software/acle#428
Fix overload and address comments
Fix intrinsic name and simplify CHECK lines
Reintroduce overloaded short forms for intrinsics
Adapt the test cases accordingly.
Rename ACLE clang intrinsic
A clang intrinsic was renamed in the ACLE patch. Change the name accordingly.
Use existing pattern template
Apply suggestions
Apply suggestions
pedroMVicente
added a commit
to pedroMVicente/llvm-project
that referenced
this pull request
May 19, 2026
IRBuilder, IRFolder, etc
small modifications on instructions.cpp and instructions.h
Constants + Verifier (DOES NOT COMPILE - JUST TO SAVE WORK)
Error fixing and selectionDAG and GlobalIsel structure
Langref Update
fix bitinsert and bitextract isValidOperands
allow return value of bitextract to be of any first-class type
allow inserted value of bitinsert to be of any first-class type
fix ConstantsContext, Verifier and echo
add onlyIfReduced logic to const expr
add parsing for bitinsert and bitextract
add bitcode and tests, fix syntax
add bitcode reader and writer logic
add printing of bitextract to AsmWriter printInstruction
fix syntax in Core, Instructions, ConstantsContext and echo
add some tests
simplify Parser
make BitcodeWriter consistent
remove constexpr, start machine IR
(not compiling; save work)
Co-authored-by: Pedro Vicente <pedro.costa.vicente@tecnico.ulisboa.pt>
SDag and global isel
better formatting and fixing assert
[mlir][Vector] Add load, store, etc. to dropleadunitdim (#195686)
Discussions on improvements to fold-memref-alias-ops changes revealed
that the patterns meant to drop leading unit dimensions from vector
operations weren't handling load, store, and other "terminal" vector
dialect operations. This PR adds the patterns to fix that.
Assisted-by: Claude 4.7
[IPO] Fix infinite recursive inlining in ModuleInliner (#195471)
The ModuleInliner currently lacks inline history tracking. Without
it, the inliner can get stuck in an infinite loop when mutually
recursive functions are involved.
This patch enables inline history tracking in the ModuleInliner to
address this issue.
The minsize attribute in the test case lowers the threshold for the
mutually recursive functions, ensuring the bug reproduces when the pass
runs in isolation.
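The general idea of inline history tracking can be sketched in a few lines of C. Each inlined call site records the callee and the history entry it came from, forming a parent chain; before inlining, the inliner walks that chain and refuses to inline a callee that already appears in it. The struct and function names are hypothetical and only mirror the general technique (LLVM's CGSCC inliner uses a similar parent-indexed history), not the code in this ModuleInliner patch.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_HISTORY 64

/* One history entry: the callee that was inlined, and the entry of the
 * call site it was inlined at (-1 = an original call site, no parent). */
typedef struct {
    int callee;   /* hypothetical function ID */
    int parent;   /* index into entries[], or -1 */
} HistoryEntry;

typedef struct {
    HistoryEntry entries[MAX_HISTORY];
    int count;
} InlineHistory;

/* Walk the parent chain from history_id; true if `callee` appears in it. */
static bool history_includes(const InlineHistory *h, int callee, int history_id) {
    while (history_id != -1) {
        if (h->entries[history_id].callee == callee)
            return true;
        history_id = h->entries[history_id].parent;
    }
    return false;
}

/* Record that `callee` was inlined at a call site carrying `parent_id`.
 * Call sites exposed by that inlining should carry the returned ID. */
static int history_push(InlineHistory *h, int callee, int parent_id) {
    if (h->count >= MAX_HISTORY)
        abort();  /* the sketch uses a fixed-size table */
    h->entries[h->count].callee = callee;
    h->entries[h->count].parent = parent_id;
    return h->count++;
}
```

For mutually recursive `a` and `b`, inlining `b` then `a` produces a chain `a -> b`, so the next exposed call to `b` is rejected instead of looping forever.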
[InlineOrder] Fix assertion failure in CostBenefitPriority (#195564)
InlineCost::getStaticBonusApplied() triggers an assertion failure
if the CostBenefitPriority constructor calls it when
IC.isVariable() is false. This is because
getStaticBonusApplied() expects isVariable() to be true.
Unconditionally populating CostBenefit also incorrectly prioritizes
a NeverInline candidate with a cost-benefit pair over other
valid variable-cost sites.
This patch fixes the crash and the sorting issue by calling
getStaticBonusApplied() and populating CostBenefit only when
IC.isVariable() is true. For AlwaysInline and NeverInline costs,
CostBenefit is explicitly set to std::nullopt.
[ModuleInliner] Skip function declarations during candidate scan (#195567)
This patch skips function declarations during the candidate scan in
ModuleInlinerPass::run as declarations do not have bodies.
[asan] Add tests for __sanitizer_annotate_double_ended_contiguous_container (#195673)
[VPlan] Get GEP wrap flags from VPInstructions (NFCI). (#195730)
Add helper to retrieve GEP no-wrap flags from VPInstructions, looking
through zero-index GEPs and pointer casts, like
Value::stripPointerCasts. Removes an access to underlying IR.
Add named posix semaphore lifetime operations on linux (#192278)
This implements the second part of #190847
Specifically, this PR adds `sem_open`, `sem_close`, and `sem_unlink` for
POSIX semaphores on Linux.
https://pubs.opengroup.org/onlinepubs/9799919799/functions/sem_open.html
https://pubs.opengroup.org/onlinepubs/9799919799/functions/sem_close.html
https://pubs.opengroup.org/onlinepubs/9799919799/functions/sem_unlink.html
Since it targets the Linux implementation, two extra things are added:
1. Add system call wrappers for `mmap`, `munmap`, `link`, `unlink`, and
`ftruncate`. Those are necessary for the implementation of semaphores on
Linux. The wrappers are added based on the refactor proposal:
https://libc.llvm.org/dev/syscall_wrapper_refactor.html.
2. Refactor the previous semaphore implementation and put it under `linux/`,
since it is based on Linux.
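For readers unfamiliar with the named-semaphore lifecycle these entry points implement, here is a small C sketch that uses only the standard POSIX calls (`sem_open`, `sem_post`, `sem_wait`, `sem_close`, `sem_unlink`). The name `/llvm_libc_demo_sem` and the helper `named_semaphore_roundtrip` are arbitrary examples. It assumes a POSIX system where named semaphores are available (on Linux they are typically backed by `/dev/shm`); older glibc versions may need `-pthread` at link time.

```c
#include <fcntl.h>      /* O_CREAT, O_EXCL */
#include <semaphore.h>  /* sem_open, sem_post, sem_wait, sem_close, sem_unlink */

/* Create a named semaphore, use it once, then close and remove it.
 * Returns 0 on success and -1 on any failure. */
static int named_semaphore_roundtrip(const char *name) {
    /* O_EXCL makes sem_open fail if a semaphore with this name already exists. */
    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, 0600, 0);
    if (sem == SEM_FAILED)
        return -1;

    int rc = 0;
    /* Value goes 0 -> 1 -> 0, so sem_wait does not block. */
    if (sem_post(sem) != 0 || sem_wait(sem) != 0)
        rc = -1;

    /* sem_close drops this process's handle; sem_unlink removes the name,
     * and the semaphore is destroyed once no process has it open. */
    if (sem_close(sem) != 0)
        rc = -1;
    if (sem_unlink(name) != 0)
        rc = -1;
    return rc;
}
```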
[sanitizer] Capitalize HINT in reports (#195734)
Consistency with other sanitizer reports which use uppercase "HINT:".
[DWARFLinker] Emit DW_IDX_parent in the accelerator table (#195403)
.debug_names entries produced by the parallel linker were always emitted
with std::nullopt for ParentDIEOffset, resulting in a missing
DW_IDX_parent. The classic linker emits it via
DWARF5AccelTableData::getDefiningParentDieOffset on the output DIE tree.
The parallel linker can't use the same approach because the records are
saved during cloneDIE, before the output DIE has been linked into its
parent, so DIE::getParent() is nullptr at that time. Fix that by
computing the parent offset from the input-side DIE tree instead. We
look up InputDieEntry's parent via getParentIdx, skip parents marked
DW_AT_declaration, and translate them to the output offset through
CompileUnit::getDieOutOffset. Since no real DIE can live at offset 0, we
can use that offset to unambiguously mark input DIEs that were not cloned
into this CU's plain DWARF (e.g. routed only into the artificial type
unit); such DIEs are treated as "no parent".
Only compile-unit accelerator entries are covered. Type-unit entries
(artificial type unit) still emit no DW_IDX_parent, tracked by a TODO.
[asan] Record container poisoning in poison history (#195674)
[flang][OpenMP] Detect DSA conflicts in nested loop constructs (#195323)
Follow-up to https://github.com/llvm/llvm-project/pull/194961
The fix from PR194961 did not detect explicit/predefined DSA conflicts
on an iteration variable in a nested loop construct. For example, in a
testcase inspired by Fujitsu 0165_0035.f90:
```
!$omp parallel do private(i) shared(j)
do i=1,1
do j=1,1
!$omp parallel do default(none) shared(k)
do k=1,1
end do
!$omp end parallel do
end do
end do
```
the "shared(k)" was not flagged as incorrect.
Adjust the fix to traverse the host-association chain until the symbol
from the DSA clause is found.
[cmake][compiler-rt][darwin] builtin libraries don't build for armv6m in Darwin (#195372)
darwin_add_builtin_libraries tests for _Float16 and __bf16 for the host
architecture rather than the one being built. Add -arch to fix that, so
that armv6m correctly reports that it does not support __bf16.
cfcmp/cdcmp get "error: unsupported relocation type" on their "Branch to
target address" to c{f,d}cmple. Switch those to "Call a subroutine"
instructions on Thumb-1 (e.g. armv6m).
Assisted-by: Claude Code
rdar://167828904
[SandboxIR][Tracker] Support nested checkpoints (#191097)
This patch implements nested checkpointing, i.e., you can now save the
IR state more than once and revert more than once.
For example, after two saves: save(1) and save(2), a revert() will bring
you back to the IR state of save(2), one more revert will bring you back
to the IR state of save(1).
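Nested checkpointing can be modelled as a change log plus a stack of checkpoint marks: `save()` pushes the current log length, and `revert()` undoes logged changes back to the most recent mark and pops it. The C sketch below tracks a single integer as a stand-in for IR state; the `Tracker` struct and `tracker_*` functions are hypothetical and are not the SandboxIR `Tracker` API.

```c
#include <stdlib.h>

#define MAX_CHANGES 128
#define MAX_CHECKPOINTS 16

typedef struct {
    int value;                   /* stand-in for the IR state */
    int undo_log[MAX_CHANGES];   /* previous values, newest last */
    int log_len;
    int marks[MAX_CHECKPOINTS];  /* log length recorded at each save() */
    int num_marks;
} Tracker;

/* Push a checkpoint; nested saves simply stack up. */
static void tracker_save(Tracker *t) {
    if (t->num_marks == MAX_CHECKPOINTS) abort();
    t->marks[t->num_marks++] = t->log_len;
}

/* Record the old value before every change so it can be undone later. */
static void tracker_set(Tracker *t, int new_value) {
    if (t->log_len == MAX_CHANGES) abort();
    t->undo_log[t->log_len++] = t->value;
    t->value = new_value;
}

/* Undo every change made since the most recent save() and pop that mark. */
static void tracker_revert(Tracker *t) {
    if (t->num_marks == 0) abort();
    int mark = t->marks[--t->num_marks];
    while (t->log_len > mark)
        t->value = t->undo_log[--t->log_len];
}
```

With two saves, the first `tracker_revert` restores the state captured by the second save, and the next one restores the state of the first save, matching the behaviour described above.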
APFloat: fix wrong result status for large floats (#189925)
For large float literals such as
`10384593717069655257060992658440193.0`,
[`FloatingLiteral::isExact`](https://github.com/llvm/llvm-project/blob/6b2b0da40de1495ace2b100799a35711f7ad7b21/clang/include/clang/AST/Expr.h#L1702)
was incorrectly returning `true`.
The issue has been tracked down to
`IEEEFloat::roundSignificandWithExponent` incorrectly reporting `opOK`.
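The literal in question is 2^113 + 1 (2^113 = 10384593717069655257060992658440192). It needs 114 significant bits, so it cannot be represented exactly even in IEEE binary128 (113-bit significand), let alone in `double`. The C sketch below shows the expected inexactness for `double` by parsing with `strtod` and printing the result back. It exercises the C library, not `APFloat`, and assumes a libc whose `strtod` rounds correctly and whose `printf("%.1f")` prints the exact decimal value (as glibc does); the helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if the decimal literal `s` (written with a ".0" suffix) survives a
 * round trip through double unchanged, i.e. it is exactly representable. */
static bool literal_is_exact_as_double(const char *s) {
    double d = strtod(s, NULL);          /* round to nearest double */
    char buf[512];
    snprintf(buf, sizeof buf, "%.1f", d); /* exact decimal expansion */
    return strcmp(buf, s) == 0;
}
```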
[compiler-rt][test] Disable create_thread_loop2 for lsan on Darwin (#195753)
create_thread_loop2 occasionally hangs on macOS until it hits the timeout.
Disable the test for LSan on macOS.
[AMDGPU][GlobalISel] RegBankLegalize rules for permlane_(bcast|up|down|xor|idx_gen) intrinsics (#195583)
[clang][Analysis] Handle const-qualified pointer refs in `ExprMutationAnalyzer` (#190421)
Teach `ExprMutationAnalyzer` to recognize references to const-qualified
pointer objects, such as `T *const &`, as non-const pointee sinks when
the pointee type itself is non-const.
Fixes #190218
Fixes #157730
[libc] Add vhscampos as the maintainer for ARM and AArch64 (#195051)
[asan] Remove XFAIL from tests that are passing with debug checks
These tests have been passing with COMPILER_RT_DEBUG=ON since 2022 (at
least for x86).
Fixes https://github.com/llvm/llvm-project/issues/46206
Pull Request: https://github.com/llvm/llvm-project/pull/84614
[libc][semaphore] Zero-initialize rand_bytes to fix GCC warning (#195757)
GCC 15 warns about `rand_bytes` being maybe uninitialized when passed to
`getrandom`. Since `getrandom` writes to it, it doesn't strictly need
initialization, but zero-initializing it satisfies the compiler and
avoids the `-Werror=maybe-uninitialized` error.
Fix for https://github.com/llvm/llvm-project/pull/192278
Assisted by Gemini
[NFC][asan] Re-use ErrorGeneric::shadow_val (#195684)
This is done to reduce code size; the additional load during reporting is
negligible.
This requires a change in the ErrorGeneric constructor: we now skip a partial
shadow value only if the next one has a better value.
Previously, shadow_val was either `0x[a-f].` or 0.
Now it may also be a partial granule.
However, it's NFC, as `0 < shadow_val < ASAN_SHADOW_GRANULARITY` does not
affect reporting outside of `CheckPoisonRecords`.
[asan] Only suggest increasing poison_history_size if the buffer is full (#195732)
It is unlikely, but possible, to set up such a shadow state, e.g. by
unpoisoning a heap redzone just after a partial granule.
If the buffer is not full, increasing its size will not help.
[asan] Add HINT prefix to poison_history_size suggestions (#195733)
Consistency update to prefix poison tracking suggestions with "HINT:"
to match other ASan diagnostic hints.
[lldb] Add missing closing brace in IRForTarget (#195761)
Follow-up to 51d52c2a6cdc.
[RFC][IR] Support vector splats in `ConstantPointerNull` (#195486)
This PR allows `ConstantPointerNull` to represent both scalar pointer
nulls and fixed or scalable vector splats of pointer nulls. This change
first aligns with the native splat behavior of `ConstantInt` and
`ConstantFP`, and second, makes it easier to eventually change the
semantics of `ConstantPointerNull` to represent a semantic null pointer
instead of a zero value, which is what it represents today.
Reapply "[SandboxIR][SandboxVec] Remove score tracking from Region, add RegionWithScore" (#195708) (#195752)
This reverts commit 75f884b96dfa082227fb730435c68353cd3376c5.
[clang][ExprConst] Add another missing `NoteLValueLocation()` call (#195098)
This is a little more work since it requires a new parameter for all the
`found()` handlers.
[NVPTX] Fix for LTO dwarf emission when multiple CUs with a mix of DebugDirectivesOnly and NoDebug (#190371)
Only emit the initial .loc dwarf directive if the CU requires debug
information. When a module contains multiple CUs where one is
DebugDirectiveOnly and the rest are NoDebug, we attempt to emit dwarf
for the NoDebug compile units, which leads to an assertion when
emitDwarfFile0Directive is called from getOrCreateDwarfCompileUnit.
I added a lit test for this case which used to assert.
[libcxx] Remove ASan container overflow checks for SSO strings (#194208)
Essentially a revert of #79536, although manual given how much the code
has changed.
Doing container overflow checks on SSO strings is fundamentally unsound
due to how they interact with compiler optimizations within LLVM, which
assume that the entire object is dereferenceable when it is live. LLVM
can sometimes insert speculative loads that will create false positives
when they try to load a partially filled SSO buffer. This was
previously prevented using inline asm barriers within std::string
itself. However, this can also manifest in other ways, with the
motivating example for the revert being GlobalOpt inserting a
speculative load of a std::variant containing an SSO std::string, causing
an ASan failure.
[libunwind] Add SME detection for Apple platforms (#193630)
Copy SME detection sysctl code from ZA test. Also fix a
logic issue in the original code while being reviewed.
[InlineOrder] Remove unused InlineHistoryMap (#195769)
This patch removes InlineHistoryMap, an unused variable. The last
use was removed by:
commit 3af427539bc6b85a9fe3334b9a0b43d347ea29ac
Author: Arthur Eubanks <aeubanks@google.com>
Date: Wed Apr 8 09:34:52 2026 -0700
[compiler-rt] Add MSVC CRT flags to ASan coverage test (#195719)
The ASan coverage test also links clang_rt.profile because it uses
-coverage. On Windows MSVC, this can fail if the test is linked with the
static CRT but clang_rt.profile was built with the dynamic CRT.
For example, a profile runtime that uses /MD may reference DLL CRT
symbols such as __imp_* symbols. Add the same dynamic CRT flags directly
to this one test on Windows MSVC so it keeps linking when clang_rt.profile
changes.
Reapply "[JTS][Passes] Enable JTS By Default" (#193409)" (#195776)
This reverts commit c74951c6c3074be23fd7b12843e1187ca98e98af.
The ASan issues on the buildbot should be fixed by
b5f4f16fd98cb57a13a96b7d7faa89326636c5bc.
[IR] Add elementwise modifier to atomicrmw (#189517)
This PR implements the IR side modifications of [[RFC] Add elementwise
modifier to atomicrmw](https://discourse.llvm.org/t/rfc-add-elementwise-modifier-to-atomicrmw/90134).
Design Decisions:
- In the IR, the current atomicrmw record layout looks like: [ptrty,
ptr, valty, val, operation, vol, ordering, syncscope, align]. To encode
elementwise, I decided to pack it into the operation field, which also
contains the math op (e.g. fadd, fmin, add, etc.). I could have changed
the record structure, but that would be slightly more complicated.
- elementwise vector atomics can be vectors of integers because we can always scalarize legally
- elementwise vector atomics need to have a power-of-2 size. We can potentially remove this restriction later.
Assisted by AI.
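Packing the elementwise flag into the existing operation field can be sketched as a simple bit-flag encoding. The bit position (`ELEMENTWISE_FLAG`) and the operation codes below are hypothetical: the PR text does not say which bit is used or how the bitcode operation enum is numbered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation codes (NOT LLVM's actual bitcode enum values). */
enum RmwOp { RMW_ADD = 1, RMW_FADD = 11, RMW_FMIN = 14 };

/* Hypothetical high bit reserved for the "elementwise" modifier. */
#define ELEMENTWISE_FLAG (UINT32_C(1) << 31)

/* Combine the math op and the modifier into one record field. */
static uint32_t encode_rmw_op(enum RmwOp op, bool elementwise) {
    return (uint32_t)op | (elementwise ? ELEMENTWISE_FLAG : 0u);
}

/* Strip the modifier bit to recover the math op. */
static enum RmwOp decode_rmw_op(uint32_t field) {
    return (enum RmwOp)(field & ~ELEMENTWISE_FLAG);
}

static bool decode_rmw_elementwise(uint32_t field) {
    return (field & ELEMENTWISE_FLAG) != 0;
}
```

Under these assumptions, a record written without the modifier decodes to the same operation as before, which is one reason packing into an existing field can be simpler than changing the record structure.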
[gn build] Port be8e0842a103
Pull Request: https://github.com/llvm/llvm-project/pull/195781
[gn build] Port f7329e61ad4b
Pull Request: https://github.com/llvm/llvm-project/pull/195782
[MLIR][Python] Add `ConditionallySpeculatable` interface and `Pure` specifier (#195505)
This PR brings two features: the `ConditionallySpeculatable` op
interface and the `Pure` specifier for Python-defined ops.
The result is that you can mark an op as pure like:
```python
class PureOp(
    TestPure.Operation,
    name="pure",
    traits=[Pure]  # just like in the ODS!
):
    a: Operand[IntegerType[32]]
    b: Operand[IntegerType[32]]
    res: Result[IntegerType[32]] = infer_result()
```
Then this op is both `NoMemoryEffect` and `AlwaysSpeculatable`.
Assisted-by: Copilot/GPT5.4
[X86][APX] Add VirtRegMap to non stack foldMemoryOperand too (#193423)
We need to query mapped physical register through VirtRegMap.
Fixes: https://godbolt.org/z/1KGj3aYeP
[gn build] Port cb2a64e7e5af
Pull Request: https://github.com/llvm/llvm-project/pull/195783
[orc-rt] Change SPS controller-interface naming conventions. (#195614)
This commit makes two changes to the naming conventions for SPS CI
symbols:
1. The orc_rt_sps_ci_ prefix is replaced with orc_rt_ci_sps_ (for SPS
wrapper functions) and orc_rt_ci_ (without the "sps_" suffix) for data
symbols.
2. The _sps_wrapper suffix is dropped from wrapper functions, since the
prefix now distinguishes between SPS-wrappers and data symbols.
fix libMLIRCAPIInterfaces after #195505 (#195789)
https://github.com/llvm/llvm-project/pull/195505 missed
`MLIRSideEffectInterfaces` in the CMakeLists.txt.
[MLIR] Add HasAncestor op trait (#195447)
Add HasAncestor/AncestorOneOf traits that verify an operation has a
specific ancestor anywhere in the parent chain, unlike HasParent which
only checks the immediate parent.
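The difference between the traits comes down to whether the parent chain is walked. In this hypothetical C model (the `Op` struct and helper names are illustrative, not MLIR's C++ API), `has_parent` mirrors HasParent, while `has_ancestor` and `ancestor_one_of` mirror the new traits; `ancestor_one_of` assumes AncestorOneOf accepts a match on any of the listed kinds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal operation: a kind tag and a parent pointer. */
typedef struct Op {
    int kind;
    const struct Op *parent;  /* NULL at the top level */
} Op;

/* HasParent-style check: only the immediate parent is inspected. */
static bool has_parent(const Op *op, int kind) {
    return op->parent != NULL && op->parent->kind == kind;
}

/* HasAncestor-style check: walk the whole parent chain. */
static bool has_ancestor(const Op *op, int kind) {
    for (const Op *p = op->parent; p != NULL; p = p->parent)
        if (p->kind == kind)
            return true;
    return false;
}

/* AncestorOneOf-style check: some ancestor has one of the given kinds. */
static bool ancestor_one_of(const Op *op, const int *kinds, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (has_ancestor(op, kinds[i]))
            return true;
    return false;
}
```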
[CIR][NFC] Rename SignBitOp to CIR_SignBitOp (#195477)
Align with the CIR_ prefix naming convention used by other op
definitions in CIROps.td.
[PGO][ICP] Prevent indirect call promotion to functions with incompatible target features (#192142)
Profile-driven indirect call promotion was promoting indirect calls to
functions requiring advanced CPU features (e.g., AVX512) even when the
caller function did not support those features. When these promoted
calls were subsequently inlined, it could lead to invalid IR and
backend crashes during instruction selection because the target CPU
could not handle the advanced instructions.
This patch addresses the issue by adding a target feature
compatibility check to `llvm::isLegalToPromote` in
`CallPromotionUtils.cpp`. If the callee requires target features
(prefixed with `+`) that are not present in the caller's target
features, the promotion is skipped. By centralizing this check in
`isLegalToPromote`, we protect all passes relying on this utility
(such as `SampleProfileLoader` and `IndirectCallPromotion`) from
promoting to incompatible targets. This also prevents incorrect
inlining of `always_inline` functions that would otherwise be promoted
via indirect calls and then inlined.
Fixes #192126
Pull Request: https://github.com/llvm/llvm-project/pull/192142
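The compatibility rule described above can be sketched with comma-separated feature strings in the `"+feat,-feat"` style used by the `target-features` attribute. This C version is a simplification under stated assumptions: it only looks at `+` entries, treats the caller's list as a flat set, and ignores implied features. The function names are hypothetical and this is not the code in `CallPromotionUtils.cpp`.

```c
#include <stdbool.h>
#include <string.h>

/* True if the exact token feat[0..len) (e.g. "+avx512f") appears in the
 * comma-separated list `list`. */
static bool has_feature(const char *list, const char *feat, size_t len) {
    const char *p = list;
    while (*p) {
        const char *end = strchr(p, ',');
        size_t tok_len = end ? (size_t)(end - p) : strlen(p);
        if (tok_len == len && strncmp(p, feat, len) == 0)
            return true;
        if (!end)
            break;
        p = end + 1;
    }
    return false;
}

/* Promotion is allowed only if every "+feature" the callee requires is
 * also enabled ("+feature") in the caller. "-feature" entries are ignored. */
static bool features_compatible(const char *caller, const char *callee) {
    const char *p = callee;
    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len > 0 && p[0] == '+' && !has_feature(caller, p, len))
            return false;
        if (!end)
            break;
        p = end + 1;
    }
    return true;
}
```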
[mlir][SPIR-V] Add Weak linkage type and SPV_AMD_weak_linkage extension (#195660)
- add 'Weak' linkage type (SPV_AMD_weak_linkage)
- deduce the Linkage capability and linkage-type extension from
linkage_attributes in UpdateVCE pass
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
[GlobalISel] Fast-path single vreg values in getOrCreateVRegs (#195244)
IRTranslator::translateLoad is hot in compile-time profiles of sqlite on
aarch64-O0-g. Aggregates are flattened into multiple vregs with offsets.
Avoiding this path for single vregs is a small -0.11% compile-time win.
https://llvm-compile-time-tracker.com/compare.php?from=a7a2dc59616a8cb1198d933bcdf55ebdbd78894c&to=96fcfe733947db64effd5e1d1413697981cbe357&stat=instructions%3Au
[GlobalISel] Fast-path single reg loads and stores in IRTranslator (#195245)
IRTranslator::translateLoad is hot in compile-time profiles of sqlite on
aarch64-O0-g. Aggregates are flattened into multiple vregs with offsets.
Most loads and stores lower to a single register so this offset
materialization can be avoided.
Small -0.07% geomean improvement on aarch64-O0-g with -0.24% for sqlite.
https://llvm-compile-time-tracker.com/compare.php?from=a7a2dc59616a8cb1198d933bcdf55ebdbd78894c&to=e535e96fd0952cfa646d48ef84cf5948c0a8a2ab&stat=instructions%3Au
[mlir][NFC] Fix typo (flattend => flattened) (#195780)
[AArch64][GlobalISel] Add gisel test coverage for addp-shuffles.ll. NFC (#195803)
[CIR][CUDA] Fix typed StringAttr on globals and Add CIR-to-LLVM lowering checks for existing registration support (#195002)
[AArch64][GlobalISel] Match G_DUP with undef elements (#195237)
This helps us match more vector splats that contain undef elements,
matching build vectors that contain undef so long as they contain at
least 2 duplicate entries.
[LV] Introduce isLegalMaskedLoadOrStore (NFC) (#195242)
This simplifies legality checks, and eventually will become the single
point querying TTI hooks for masked ld/st. Currently, legality checks
for interleaved accesses still query TTI directly.
[clang][AST][NFC] const-correctness improvements for member functions returning `ArrayRef` (#195784)
- Add const qualifiers to member functions.
- Drop non-const-qualified member functions whose const-qualified
versions return the same thing.
[libc++][mdspan] Fix `mdspan::operator[]` bounds checking and implement LWG4020 (#192269)
This example fails to compile with libc++ in hardened mode without this
fix.
```c++
#include <mdspan>

struct RValueInt {
  constexpr operator int() && noexcept { return 0; }
};

int main() {
  int data[1] = {42};
  std::mdspan m(data, std::extents<int, 1>{1});
  m[RValueInt{}];
}
```
Fixes: #171311
[AssumeBuilder] Fix incorrect nonnull optimization (#195676)
We can only optimize nonnull assumes to refer to the base of a GEP if
the GEP is inbounds.
From #195650.
[NFC][clang] InterpBuiltin.cpp - use consistent ia32 naming convention for all x86 builtin expansions (#195699)
Also, add IsAdd argument to interp__builtin_ia32_addcarry_subborrow instead of repeated builtinid comparisons
[mlir][SPIR-V][NFC] Add pushCaps/pushExts helpers to type visitors (#195796)
Discussed in
https://github.com/llvm/llvm-project/pull/195060#discussion_r3183984853
[MLIR] Fix offload map metadata order (#195346)
This fixes one part of
https://github.com/llvm/llvm-project/issues/195333.
There are two different semicolon-field orders in
offload/include/Shared/SourceInfo.h:
1. Per-map strings (`map_var_info_t`): first field is the mapped name, second is the file path:
```
/// Type alias for source location information for variable mappings with
/// data layout ";name;filename;row;col;;\0" from clang.
using map_var_info_t = void *;
```
2. `ident_t` region strings: first field is the file path, second is the function name:
```
/// The ident structure that describes a source location from kmp.h. with
/// source location string data as ";filename;function;line;column;;\0".
struct ident_t {
```
`SourceInfo` reflects that with two constructors: `SourceInfo(const ident_t *)`
parses `getSubstring(0)` as the filename field and `getSubstring(1)` as the
name (function); `SourceInfo(const map_var_info_t)` uses `getSubstring(0)` for
the map name and `getSubstring(1)` for the filename path.
`OpenMPIRBuilder::getOrCreateSrcLocStr(FunctionName, FileName, line, col)`
always builds the same wire shape: the second parameter is emitted first,
then the first, i.e. ;FileName;FunctionName;line;col;.
Clang passes arguments accordingly in two places:
1. Per-map metadata (`emitMappingInformation`):
Calls `getOrCreateSrcLocStr(FileName, ExprName, ...)`,
so the buffer is ";expr;path;…", matching `map_var_info_t` and
`getNameFromMapping`.
2. Ident / update location (`emitUpdateLocation`):
Calls `getOrCreateSrcLocStr(FunctionName, FileName, ...)`, so the buffer
is ";file;function;…", matching ident_t.
MLIR previously called the same helper only in the ident-like way for
both uses, which is why we see file paths in the map name slot.
I would have preferred to keep the same order in both cases, but I am not
aware of the history of why it was done like this. I have instead fixed
it by adding `ForOffloadMap` to `createSourceLocStrFromLocation`, so that
mapping uses the same argument order as Clang's `emitMappingInformation`,
while ident-style callers keep the existing order.
Adjusted a test accordingly.
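To make the two layouts concrete, here is a small C sketch that splits a `;`-separated location string and reads fields 0 and 1, the way `SourceInfo` is described as doing. The helper `get_field` is hypothetical (it only imitates the `getSubstring(i)` idea), and the sample strings are made up for illustration.

```c
#include <string.h>

/* Copy field `idx` of a ";a;b;c;;" style string into `out`.
 * Fields are counted after the leading ';'. Returns 0 on success,
 * -1 if the field does not exist or does not fit. */
static int get_field(const char *s, int idx, char *out, size_t out_size) {
    if (*s == ';')
        ++s;                          /* skip the leading separator */
    for (int i = 0; i < idx; ++i) {
        s = strchr(s, ';');
        if (!s)
            return -1;
        ++s;                          /* step past the separator */
    }
    size_t len = strcspn(s, ";");     /* field ends at next ';' or NUL */
    if (len + 1 > out_size)
        return -1;
    memcpy(out, s, len);
    out[len] = '\0';
    return 0;
}
```

A map-style string such as `";a(:n);dump_map_tables.f90;20;11;;"` yields the variable name in field 0, whereas an ident-style string puts the file path there; reading one layout with the other's field order is exactly how a file path ends up in the map name slot.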
[mlir][SPIR-V] Add AtomicLoad and AtomicStore ops (#195797)
[OpenMP] Add target to build OpenMP runtime (#189557)
This adds a top-level target to build the OpenMP runtime, similar to
what was done in https://github.com/llvm/llvm-project/pull/186099 for
the Offload runtime.
Having this top-level target enables us to execute the build in the
pre-commit CI as shown in
https://github.com/llvm/llvm-project/pull/174955 (I actually just
cherry-picked the commit from that branch)
[AArch64] Enable SVE MaxBW for Neoverse-N2, prefer fixed at same cost (#195108)
This should be using the same settings as other 128b SVE cores. Very
slight performance improvement from MaxBW, regressions alleviated
by preferring fixed VFs over scalable when they're the same cost.
[MLIR][Shard] Fold all_gather/all_slice inverse pairs (#193906)
Add a simplify pattern that replaces all_gather(all_slice(x))/all_slice(all_gather(x))
with x when grid, grid axes, and gather/slice axis match.
Co-authored-by: Cursor <cursoragent@cursor.com>
[lldb-dap][vscode] add instructions for debugging the VSCode extension (#195280)
[LifetimeSafety] Update doc for `handleInvalidatingCall` (#195281)
[lldb] fix unconsumed llvm::Expected's errors (#193257)
[LangRef] Patchable function attributes (#195764)
Langref should document the expected layout for patchable-function-entry
/ patchable-function-prefix attrs (prefix nops go before the function
label, entry nops go after the label, both are before the prologue).
Fixes #195693.
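On the C side, GCC and Clang expose this layout through the `patchable_function_entry(N, M)` attribute: N NOPs in total, of which M are placed before the function label and the remaining N - M after it, ahead of the prologue. As far as we know, Clang lowers it to the `patchable-function-prefix` (M) and `patchable-function-entry` (N - M) IR attributes. The function below is an arbitrary example and requires a compiler and target that support the attribute.

```c
/* 5 NOPs in total: 2 before the function label ("prefix" NOPs) and
 * 3 after it ("entry" NOPs), all emitted before the prologue. */
__attribute__((patchable_function_entry(5, 2)))
int add_one(int x) {
    return x + 1;
}
```

The NOPs do not change behaviour; they only reserve space that a tracer or live-patching tool can later overwrite.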
[AArch64] Add C1-Premium scheduling model (#185398)
This patch adds the C1-Premium scheduling model. Values are derived from
Arm's SWOG for this core which can be found at
https://developer.arm.com/documentation/111080/3-0
Assisted-by Codex
[offload] Update test after ed07c92c9629. (#195822)
We now get `a(:n) at dump_map_tables.f90:20:11`, so the test has been updated
accordingly. This should fix the regression seen after
https://github.com/llvm/llvm-project/pull/195346.
[mlir][SPIR-V] Refine OpTypeImage capability inference (#195060)
Capability requirements for OpTypeImage are determined by Dim, Sampled,
MS, and Arrayed.
Related to LLVM SPIR-V backend PR
https://github.com/llvm/llvm-project/pull/192626
[clang][bytecode] Don't evaluate bound member function expressions in new constant interpreter (#194851)
**Problem:**
A crash is triggered by clangd's hover feature when using C++23 and the
new bytecode interpreter, which calls `Expr::EvaluateAsRValue()` to
attempt constant folding on an expression under the cursor, even when it
is not a valid constant expression.
Tested versions: 22.1.3, Trunk (x86_64-pc-linux-gnu)
**How to reproduce:**
```cpp
struct S { void f(); };
void g() { S s; s.f(); }
```
Running `clangd --check=repro.cpp`
(with `compile_flags.txt` containing `-std=c++23
-fexperimental-new-constant-interpreter`)
will crash.
`Assertion ItemTypes.back() == toPrimType<T>() failed.`
You can observe the same crash by hovering over STL iterators like
`vec.begin()`.
**Relevant Stack Trace:**
*Basically: `textDocument/hover` → `getHover` → `EvaluateAsRValue` → new
constant interpreter → `MemberPointer` type mismatch on stack pop.*
When `Compiler<Emitter>::VisitMemberExpr()` encounters a non-static
`CXXMethodDecl` member (a bound member function expression such as `s.f`
in `s.f()`), it falls through to `visitDeclRef()`. This pushes a `FnPtr`
onto the interpreter stack. However, the caller expects a
`MemberPointer`, causing an assertion failure in `InterpStack::pop()`.
**Fix:**
* In `VisitMemberExpr()`, bail out early (`return false`) when the
member is a non-static `CXXMethodDecl`, before reaching
`visitDeclRef()`. This causes `EvaluateAsRValue()` to report failure
gracefully rather than crashing. Bound member function expressions
(`s.f`) are not valid constant expressions, so returning `false` should
be semantically correct.
**Testing:**
* Added AST unit test
(`EvaluateAsRValue.FailsGracefullyOnBoundMemberExpr`) that directly
isolates a bound `MemberExpr` and passes it to `EvaluateAsRValue()`,
asserting it returns `false` without crashing.
* Added clangd hover test
(Hover.NoCrashOnBoundMemberFunctionWithNewInterpreter)
that reproduces the original crash scenario.
* *Note:* I could not add a Lit test because I believe this is
unreachable via normal `clang` invocations. `Sema` strictly catches
isolated bound member functions before constant evaluation. `clangd` has
a unique path to triggering this.
**Root cause:**
This is exposed by C++23 specifically due to (I think P2280R4 /
P2448R2):
- Relaxing the rules around "unknown" objects in constant evaluation,
allowing `s` in `s.f()` to proceed past the base object check even
though `s` is not constexpr and deferring failures to bytecode execution
rather than rejecting them structurally.
Assisted-by: gemini-cli
@tbaederr
---------
Co-authored-by: Timm Baeder <tbaeder@redhat.com>
[mlir][spirv] Support OpenCL.std clz in the SPIR-V dialect (#195317)
Add support for the OpenCL.std clz extended instruction in the MLIR
SPIR-V dialect. From the SPIR-V OpenCL.std specification: op name "clz", opcode 151.
[clang][CIR] Add lowering for fp16 intrinsics (#194865)
This PR adds lowering for the following intrinsic groups:
* https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#absolute-difference-1
* https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#reciprocal-estimate-1
* https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#reciprocal-estimate-1
It also moves the corresponding tests from:
* clang/test/CodeGen/AArch64/v8.2a-fp16-intrinsics.c
to:
* clang/test/CodeGen/AArch64/neon/fullfp16.c
The lowering follows the existing implementation in
CodeGen/TargetBuiltins/ARM.cpp.
[analyzer][NFC] Rename class `StackFrameContext` to `StackFrame` (#195802)
This patch continues the refactoring roadmap described in issue #190973
by renaming the `StackFrameContext` class to just `StackFrame`. Many
variables of type `const StackFrame *` also had their names updated to
reflect the new class name `StackFrame`.
[NFC][analyzer] Introduce specialized variants of makeNode (#194459)
This commit introduces new methods `makePostStmtNode` and
`makeNodeWithBinding` of `CoreEngine`, which will be used instead of the
5-parameter overloads of `NodeBuilder::generateNode` and
`NodeBuilder::generateSink` (which were originally methods of the class
`StmtNodeBuilder` that was deleted in commit
fb46677a858697afa116c4252e84050a07bc6a70).
This commit applies the newly introduced methods in a few places (as
examples), but there are 80+ call sites that use the 5-parameter
`NodeBuilder::generateNode` or `generateSink`, so this transition will
be completed in multiple follow-up commits.
I decided to introduce these methods because after the transition there
will be 20+ calls to `makePostStmtNode` and 30+ calls to
`makeNodeWithBinding` and it would be cumbersome to use plain `makeNode`
instead of these specialized variants.
(On the other hand, these new methods don't support specifying a `tag`
because only a few call sites would use that feature of the 5-parameter
`generateNode`.)
[ADT] Bitset: add shift operators, word accessors, and etc (#193400)
This PR is split out from #191757 per reviewer request. It has the
following changes to `llvm::Bitset<N>`:
* Added `operator<<`/`<<=`/`>>`/`>>=`, `getNumWords()`, `getWord()`, and
`findLastSet()`.
* Moved the `std::array<>` constructor from protected to **public** and
**explicit**.
A follow-up PR will use these to re-implement `LaneBitmask` as a
`llvm::Bitset` wrapper.
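To illustrate what a multi-word shift involves, here is a minimal sketch of `<<=`/`>>=` on an `N`-bit set stored in 64-bit words. `WordBitset` is a hypothetical stand-in, not `llvm::Bitset`'s actual layout or implementation; it assumes bit 0 lives in word 0 and that bits at positions `>= N` are kept clear.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical N-bit set stored in 64-bit words (bit 0 lives in Words[0]).
// Invariant: bits at positions >= N are always zero.
template <std::size_t N> class WordBitset {
  static constexpr std::size_t WordBits = 64;
  static constexpr std::size_t NumWords = (N + WordBits - 1) / WordBits;
  std::array<std::uint64_t, NumWords> Words{};

  // Re-establish the invariant after a left shift pushes bits past N.
  void clearUnusedBits() {
    if (N % WordBits != 0)
      Words[NumWords - 1] &= (std::uint64_t(1) << (N % WordBits)) - 1;
  }

public:
  void set(std::size_t I) { Words[I / WordBits] |= std::uint64_t(1) << (I % WordBits); }
  bool test(std::size_t I) const { return (Words[I / WordBits] >> (I % WordBits)) & 1; }

  WordBitset &operator<<=(std::size_t Amt) {
    if (Amt >= N) { Words.fill(0); return *this; }
    const std::size_t WordShift = Amt / WordBits, BitShift = Amt % WordBits;
    // Walk from the most significant word down so every source word is read
    // before it is overwritten.
    for (std::size_t I = NumWords; I-- > 0;) {
      std::uint64_t V = 0;
      if (I >= WordShift) {
        V = Words[I - WordShift] << BitShift;
        // Carry in the high bits of the next-lower source word. Skipped when
        // BitShift == 0, since shifting a uint64_t by 64 is undefined.
        if (BitShift != 0 && I > WordShift)
          V |= Words[I - WordShift - 1] >> (WordBits - BitShift);
      }
      Words[I] = V;
    }
    clearUnusedBits();
    return *this;
  }

  WordBitset &operator>>=(std::size_t Amt) {
    if (Amt >= N) { Words.fill(0); return *this; }
    const std::size_t WordShift = Amt / WordBits, BitShift = Amt % WordBits;
    // Walk from the least significant word up (mirror image of <<=).
    for (std::size_t I = 0; I < NumWords; ++I) {
      std::uint64_t V = 0;
      if (I + WordShift < NumWords) {
        V = Words[I + WordShift] >> BitShift;
        if (BitShift != 0 && I + WordShift + 1 < NumWords)
          V |= Words[I + WordShift + 1] << (WordBits - BitShift);
      }
      Words[I] = V;
    }
    return *this;
  }
};
```

Only the left shift needs `clearUnusedBits()`: a right shift cannot move a bit above position `N - 1` if none was there to begin with.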
---
The unit test in the PR is largely generated by LLMs. I have reviewed it
and manually applied changes to cover more edge cases.
[SLP][NFC]Add a test with broacast of loads estimation
Pull Request: https://github.com/llvm/llvm-project/pull/195833
Revert "[clang-tidy] An option for conditional skipping overloaded functions …" (#195701)
Reverts llvm/llvm-project#194133
[DAG] Replace llvm::isNeutralConstant with SelectionDAG::isIdentityElement (#195827)
Initial step towards generalising this - move to SelectionDAG like other
valuetracker helpers, add DemandedElts/Depth controls, etc.
We can add target node handling when it becomes necessary.
Renamed to "IdentityElement" to match llvm naming conventions.
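For readers unfamiliar with the term, an identity element is a constant that leaves the other operand unchanged (`x + 0`, `x * 1`, `x & ~0`). The scalar sketch below uses a hypothetical `BinOp` enum and `isIdentityElementSketch` helper, not the SelectionDAG API, and ignores vectors and `DemandedElts`; it shows why the operand position matters for non-commutative operations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode set, for illustration only.
enum class BinOp { Add, Sub, Mul, And, Or, Xor, Shl };

// Returns true if constant C, used as operand OperandNo (0 = LHS, 1 = RHS),
// leaves the other operand unchanged. 32-bit values are assumed.
bool isIdentityElementSketch(BinOp Op, std::uint32_t C, unsigned OperandNo) {
  switch (Op) {
  case BinOp::Add:
  case BinOp::Or:
  case BinOp::Xor:
    return C == 0;          // x + 0, x | 0, x ^ 0 (either side)
  case BinOp::Mul:
    return C == 1;          // x * 1 (either side)
  case BinOp::And:
    return C == UINT32_MAX; // x & ~0 (either side)
  case BinOp::Sub:
  case BinOp::Shl:
    // Only a right identity: x - 0 == x, but 0 - x != x in general.
    return OperandNo == 1 && C == 0;
  }
  return false;
}
```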
[mlir][spirv][nfc] Fix the summary of SPIRVLowerABIAttributesPass (#195043)
[mlir][emitc] Refactor the e2e test for TOSA -> EmitC lowering (#194686)
Update the TOSA -> EmitC test, mlir/test/Dialect/EmitC/tosa/ops.mlir, to also
compile the generated source with a C++ compiler. This adds an `mlir-translate`
step, making the overall flow:
`mlir-opt tosa.mlir | mlir-translate | c++ -Wall -Werror -c`
Although the test still does not execute the generated code, compiling it with
a host compiler increases coverage and makes the test closer to an integration
test. For this reason, move it to:
`mlir/test/Integration/Dialect/EmitC/`
The source generated by the translator uses `size_t`, which requires the
`stddef.h` header. This is currently provided explicitly in the compile step
with `-include stddef.h`; a follow-up should consider emitting the required
include from the translator instead.
[libcxx] Include __utility/exchange.h in thread.h (#195786)
[DAG] expandVecReduce - pull out repeated getFlags() calls. NFC. (#195838)
[lldb] Flush delayed breakpoints before eagerly changing one (#195815)
See the discussion in https://github.com/llvm/llvm-project/pull/192971
When LLDB makes the decision to eagerly send a breakpoint packet, it
should first ensure the delayed breakpoints are flushed, as they may
interfere with the eager breakpoint that is about to be changed.
Implementation note: we could have included the eager breakpoint in the
batch that is about to be flushed. However, it's important to get
information about the error status of this eager breakpoint, and the
current APIs don't make it easy to distinguish which breakpoint caused an
error.
[Bazel] Fixes 9db9f6d (#195843)
This fixes 9db9f6d9c985f38075f2e0a3f55646d13fe0953c.
Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
[llubi] Initial support for floating-point numbers (#188453)
Implements basic floating-point instructions and fast-math flags
handling in `phi` and `select`.
[clang-tidy] apply fix-it for modernize-return-braced-init-list with qualifiers (#195846)
Fixes #195844
With this PR I've added robust test cases to ensure things work for many
different cases.
Revert "[ADT] Bitset: add shift operators, word accessors, and etc" (#195848)
Reverts llvm/llvm-project#193400
Breaks buildbots
clang: Report full triple arch string in --offload-arch diagnostic (#195849)
This was previously only printing the top level arch string. Print
the full arch + subarch from the triple.
[flang][OpenMP] Change NONTEMPORAL clause to contain OmpObjectList (#195331)
NONTEMPORAL in source code takes a variable list, which is represented
in the AST as OmpObjectList.
[libc] Add socket test support library (#193207)
A couple of tests (and I plan to add a couple more) need to
create/compare unix domain socket addresses. This isn't completely
trivial due to the interaction between the `addrlen` argument and
the terminating '\0', and the existing code doing that was subtly wrong
in some cases.
This patch centralizes the functions for these operations, reducing code
duplication and ensuring consistent (and correct) behavior everywhere.
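A minimal sketch of the `addrlen`/terminating `'\0'` interaction, assuming pathname (not abstract or unnamed) `AF_UNIX` addresses. `make_unix_addr`, `unix_path` and `same_unix_addr` are hypothetical names, not the helpers this patch adds; the comparison tolerates an `addrlen` that either includes or omits the trailing NUL.

```cpp
#include <sys/socket.h>
#include <sys/un.h>

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: fill Addr with a pathname address and return an
// addrlen that counts the terminating '\0'; returns 0 if the path is too long.
socklen_t make_unix_addr(std::string_view Path, sockaddr_un &Addr) {
  std::memset(&Addr, 0, sizeof(Addr)); // also provides the trailing '\0'
  Addr.sun_family = AF_UNIX;
  if (Path.size() + 1 > sizeof(Addr.sun_path))
    return 0;
  std::memcpy(Addr.sun_path, Path.data(), Path.size());
  return static_cast<socklen_t>(offsetof(sockaddr_un, sun_path) + Path.size() + 1);
}

// Hypothetical helper: recover the path, accepting an addrlen with or without
// the trailing '\0' by stopping at whichever comes first.
std::string_view unix_path(const sockaddr_un &Addr, socklen_t AddrLen) {
  const std::size_t Header = offsetof(sockaddr_un, sun_path);
  if (AddrLen <= Header)
    return {}; // no path bytes at all
  const std::size_t Max =
      std::min<std::size_t>(AddrLen - Header, sizeof(Addr.sun_path));
  const void *Nul = std::memchr(Addr.sun_path, '\0', Max);
  const std::size_t Len =
      Nul ? static_cast<std::size_t>(static_cast<const char *>(Nul) - Addr.sun_path)
          : Max;
  return std::string_view(Addr.sun_path, Len);
}

bool same_unix_addr(const sockaddr_un &A, socklen_t ALen,
                    const sockaddr_un &B, socklen_t BLen) {
  return A.sun_family == B.sun_family && unix_path(A, ALen) == unix_path(B, BLen);
}
```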
[OpenMP][OMPT] Update tools header for OpenMP v6.0 (#195829)
Add new enum values, structs and variables to `omp-tools.h` header,
according to the OpenMP v6.0 specification.
Update `openmp` and `offload` tools headers to recognize the new values
to ease testing once those features are implemented.
Since no feature is implemented here, update the docs to mention
`partial` support, pointing to this PR for the enums.
---------
Signed-off-by: Jan André Reuter <j.reuter@fz-juelich.de>
[AARCH64] Add intrinsic support for new s/udot intrinsics (#189424)
The intrinsics are based on the following
[proposal](https://github.com/ARM-software/acle/pull/428). These are:
```
svint16_t svdot_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm);
svint16_t svdot_n_s16_s8(svint16_t zda, svint8_t zn, int8_t zm);
svint16_t svdot_lane_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm, uint64_t imm_idx);
svuint16_t svdot_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm);
svuint16_t svdot_n_u16_u8(svuint16_t zda, svuint8_t zn, uint8_t zm);
svuint16_t svdot_lane_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm, uint64_t imm_idx);
```
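A scalar reference model can make the non-lane forms concrete. This is a hedged sketch, not the ACLE definition: it assumes the 2-way grouping implied by the element sizes (each 16-bit lane `i` pairs with the two 8-bit elements `2i` and `2i+1` of `zn` and `zm`) and modular 16-bit accumulation. `dot_s16_s8_model` and `dot_n_s16_s8_model` are hypothetical names, and the `_lane` form is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical scalar model of svdot_s16_s8 (assumed 2-way, wrapping).
std::vector<std::int16_t> dot_s16_s8_model(std::vector<std::int16_t> zda,
                                           const std::vector<std::int8_t> &zn,
                                           const std::vector<std::int8_t> &zm) {
  assert(zn.size() == 2 * zda.size() && zm.size() == zn.size());
  for (std::size_t i = 0; i < zda.size(); ++i) {
    std::int32_t acc = zda[i];
    acc += std::int32_t(zn[2 * i]) * zm[2 * i] +
           std::int32_t(zn[2 * i + 1]) * zm[2 * i + 1];
    // Keep only the low 16 bits (modular accumulation).
    zda[i] = static_cast<std::int16_t>(static_cast<std::uint16_t>(acc));
  }
  return zda;
}

// Hypothetical model of svdot_n_s16_s8: the scalar zm is broadcast.
std::vector<std::int16_t> dot_n_s16_s8_model(std::vector<std::int16_t> zda,
                                             const std::vector<std::int8_t> &zn,
                                             std::int8_t zm) {
  return dot_s16_s8_model(std::move(zda), zn,
                          std::vector<std::int8_t>(zn.size(), zm));
}
```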
[clang][test] Add missing `FileCheck` in Import/destructor/test.cpp (#195778)
The test had `CHECK` directives that were never executed because the
`RUN` line did not pipe output to `FileCheck`. This also replaces
`CHECK-NEXT` with `CHECK-SAME` so the subsequent `CHECK-SAME` directives
match the remaining destructor properties in order.
[MLIR][XeGPU] Support Layout propagation for interleave and deintereleave op (#194966)
Enable propagation of interleave and deinterleave with their own
propagation rules.
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
[SystemZ] Replace PatLeaf with ImmLeaf (#195146)
The code snippets for the predicates are a bit shorter, because the
APInt is directly available instead of an SDNode. The main advantage is
that it enables the constraint to be ported to GlobalISel.
[Flang][Driver] Enable -print-file-name (#195790)
Enable -print-file-name for the Flang driver as already enabled by
Clang.
Extracted out of #171515 as requested by
https://github.com/llvm/llvm-project/pull/171515#pullrequestreview-4180690622
[SCEV] Clarify nuw/nsw semantics for multi-operand nodes (#195826)
nuw/nsw must hold for all subsets/orders of operands.
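A small illustration of why the subset/order requirement matters for `nsw` but follows automatically for `nuw` on an add. This is a hypothetical brute-force checker over `i8` operands, not SCEV code. Any partial sum produced in some evaluation order is the sum of a subset of the operands, so checking every subset covers every order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check: can an N-ary i8 add carry `nsw` if no subset of its
// operands, added in any order, may wrap?
bool nswForAllSubsets(const std::vector<std::int8_t> &Ops) {
  assert(Ops.size() < 20 && "exponential enumeration: small inputs only");
  const std::size_t N = Ops.size();
  for (std::uint32_t Mask = 1; Mask < (std::uint32_t(1) << N); ++Mask) {
    int Sum = 0; // wide enough: |Sum| <= 19 * 128
    for (std::size_t I = 0; I < N; ++I)
      if (Mask & (std::uint32_t(1) << I))
        Sum += Ops[I];
    if (Sum < INT8_MIN || Sum > INT8_MAX)
      return false;
  }
  return true;
}

// For `nuw` the rule is automatic: operands are non-negative, so every subset
// sum is at most the full sum.
bool nuwForAllSubsets(const std::vector<std::uint8_t> &Ops) {
  unsigned Total = 0;
  for (std::uint8_t V : Ops)
    Total += V;
  return Total <= UINT8_MAX;
}
```

For example, `100 + 100 + (-100)` fits in `i8` as a whole, but the partial sum `100 + 100` does not, so under this rule that add could not be marked `nsw`.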
[Support] Add missing header guard (#195870)
[EarlyCSE] Check if getValueType() returns nullptr (#195868)
15383dcdb4012387d94fab67fed4519613bf9a19 recently added a call to
getValueType() into EarlyCSE without checking whether or not it returned
nullptr. It sometimes might end up processing target specific intrinsics
for which the underlying getAccessType will typically return nullptr.
We observed this crash when building https://github.com/google/highway
for AArch64.
Regression test added to ensure this does not occur again in the future.
[mlir][vector] Account for subview offset in gather lowering. (#195359)
Strided vector.gather on a column subview was reading the wrong column
because the rewrite to a collapsed gather dropped the subview's static
offset.
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
PeepholeOpt: Clear kill flags in foldImmediate (#195680)
When foldImmediate replaces a COPY destination with its source,
this extends the live range of the source, but it does not update the
kill flags.
Clear kill flags on the source register after replacement.
This was found while working on REG_SEQUENCE optimizations motivated by
AMDGPU demands. Both an AMDGPU and an X86 test case are added to show that
the issue is not AMDGPU specific.
[InstCombine][NFC] Use CreateAssumption instead of CreateCall (#195862)
[AMDGPU][NFC] Autogenerate check lines for llvm.amdgcn.rsq.clamp.ll and llvm.amdgcn.rsq.ll (#195867)
[lldb][windows] fix cross DLL file descriptor lookup crash (#195855)
On Windows, file descriptors are only valid in the same DLL: they are
really just handles mapped to an index in a table in the CRT. Calling a
liblldb method with a file descriptor from lldb-dap will cause the
program to crash. See
https://github.com/llvm/llvm-project/issues/193971.
This patch fixes the issue by refactoring the `NativeFile` constructors
so that they no longer try to convert `FILE` types to handles through
the CRT lookup table.
[InstCombine] Remove redundant assume fold (#195852)
The fold is fully redundant with the fold using `computeKnownBits`, so
we can let that do the work instead.
[NFCI] clarify that asan-*linux.cpp files affect *nix OS'es (#195565)
**Prior Work:** Aims to supersede (#132263), which seems inactive,
specifically by applying my own comment:
https://github.com/llvm/llvm-project/pull/132263#issuecomment-3051238734
**Context:** It aims to minimally document that the
`asan_(malloc_)?linux.cpp` files may impact non-linux OS'es (despite the
name) such as Solaris, BSD, and other *nix OS'es. This is worth
documenting as otherwise we risk breakage due to confusion, as occurred
[here](https://github.com/llvm/llvm-project/pull/131975#issuecomment-2741097471).
This is done simply by minimally augmenting the file header comment
saying precisely this.
Unlike the prior PR, this does not rename any files, which should reduce
the 'git noise' impact of this change.
_Thanks!_
[AArch64][GlobalISel] Lower unmerge to extract_subvector (#195046)
This follows and reuses the existing lowering for unmerge -> extract
vector element, extending it to also lower unmerge -> subvector extract
for half-sized vector extracts. This allows certain tablegen patterns to
match.
An extra extract_subvector(dup) combine is needed to optimize away
unnecessary instructions. The ext vs mov/dup brings us in line with
SDAG, but we may change both to use mov/dup.
[Flang][OpenMP] Fix assert trigger in MapInfoFinalization pass for implicit record member maps (#193851)
The current iteration of the implicit record member mapping segment of
the MapInfoFinalization pass assumes that child maps of parents are
already bound to the target's block arguments, but that is not the case
upon initial lowering from PFT to MLIR. The binding currently happens at
the end of the MapInfoFinalization pass, where we "canonicalize" all
maps so that they are inserted as block arguments to their respective
targets.
This assumption unfortunately leads to a few cases where we trigger the
assertion. To address this, we impose this canonicalization of map
<-> block arguments as soon as we enter the pass, and then once again at
the end of the pass for any new members generated by the
MapInfoFinalization pass. This allows the implicit record member mapping
process to continue unhindered while changing very little elsewhere
other than the ordering of block arguments (hence some lit test
tweaks). The main downside is the extra processing required for running
the canonicalization twice.
I adopted some tests created by @chichunchen in his version of the fix
to help test for regressions.
Co-authored-by: @chichunchen
[DWARFLinker] Add assembly-label range handling to parallel linker (#195366)
Assembly CUs typically have DW_TAG_label entries instead of subprograms,
so the parallel linker's line-table filter saw no function ranges and
dropped every row. Mirror the classic linker: for labels in
Mips_Assembler or Assembly CUs, look up an assembly range via
getAssemblyRangeForAddress and call addFunctionRange before falling back
to addLabelLowPc.
[VPlan] Scalarize to first-lane-only directly on VPlan (#184267)
This is needed to enable subsequent
https://github.com/llvm/llvm-project/pull/182595.
I don't think we can fully port all scalarization logic from the legacy
path to VPlan-based right now because by that point in the pipeline
interleave groups aren't lowered into any VPlan-based representation and
as such this pass operates on incomplete information. Currently, the
pass can make transformations if "all uses are scalar" (that won't
change later) but not "are uses a mix of vector and scalar uses" (that
might change after lowering interleave groups).
As such, I decided just to implement something much simpler that would
be enough for #182595. However, we perform this transformation before
delegating to the old CM-based decision, so it **is** effective
immediately and takes precedence even for consecutive loads/stores.
---------
Co-authored-by: Ramkumar Ramachandra <artagnon@tenstorrent.com>
[HLSL] For builtins aliases, apply implicit conversions before running custom type checking (#195365)
Fixes https://github.com/llvm/llvm-project/issues/195329 by making HLSL
builtin aliases apply implicit conversions before running custom type
checking.
After this PR:
- There are no more size 1 vectors being passed and returned to/from
aliased Clang builtins because they get truncated to scalars due to the
HLSL alias builtin not having explicit size 1 vector overloads.
- HLSL alias builtins no longer accept matrices unless they have
explicit matrix overloads. Matrices get implicitly truncated to scalars
and resolve to the scalar Clang builtin being aliased.
- Many calls with mismatched vector sizes no longer error with
`arguments are of different types` and instead follow Clang's overload
resolution rules with respect to HLSL's implicit conversion sequences.
(e.g., `dot(float3, float2)` -> `dot(float2, float2)` with warning)
- Calls with implicitly-convertible types no longer error. They are now
implicitly converted, and with a warning in some cases. (e.g.,
`f16tof32(bool)` -> `f16tof32(uint)` without warning, but
`f16tof32(short)` -> `f16tof32(uint)` with warning).
Assisted-by: Claude Opus 4.6
Reland [Inliner] Use store-to-load forwarding to resolve call arguments (#195526)
Adds store-to-load forwarding once the inliner has successfully done some
inlining. This simplifies further inlining attempts and can give them
more precise cost analysis.
It allows empty `std::set` and `std::map` to be optimized away in both
`libc++` and `libstdc++`, as well as in many other real-world cases.
Reland of #190607. It was reverted because it was causing crashes with
address space pointers, which should be fixed by #195256.
[DWARFLinker] Patch DW_AT_LLVM_stmt_sequence in the parallel linker (#195388)
Mirror dsymutil's stmt-sequence rewriting in the parallel linker so each
attribute ends up pointing at the DW_LNE_set_address that opens its
containing output sequence, with the correct offset in the combined
.debug_line.
At DIE cloning time we resolve each attribute's input offset to the
address of its first row and record the pair (DIEValue, address) on the
CompileUnit, alongside a DebugOffsetPatch on the .debug_info section so
combination adds the CU's .debug_line start offset. The line-table
emitter then fills a map from row address to the byte offset of the
sequence-opening DW_LNE_set_address.
After emission, each recorded attribute is rewritten by relocating its
input address through the CU's function ranges and looking the result up
in the map. When resolution fails the DWARF max-offset sentinel is
written instead, and the patch applier preserves it unchanged.
First-row lookups share a lazy per-CU cache to keep resolution O(1) per
attribute.
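The per-attribute resolution step can be sketched as follows. `FunctionRange`, `resolveStmtSequence` and the use of `0xFFFFFFFF` as the sentinel (i.e. assuming DWARF32) are illustrative assumptions, not the parallel linker's actual types; the returned offset is relative to the CU's line table, with the CU's `.debug_line` start offset added later by the offset patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Assumption: DWARF32, so the "no valid offset" sentinel is 0xFFFFFFFF.
constexpr std::uint64_t kMaxOffsetSentinel = 0xFFFFFFFFu;

// Hypothetical kept input function range [InLow, InHigh), relocated by Delta.
struct FunctionRange {
  std::uint64_t InLow, InHigh;
  std::int64_t Delta;
};

// Relocate the attribute's first-row input address, then look up the offset
// of the DW_LNE_set_address that opens the output sequence at that address.
std::uint64_t resolveStmtSequence(
    std::uint64_t InputAddr, const std::vector<FunctionRange> &Ranges,
    const std::map<std::uint64_t, std::uint64_t> &SeqStartOffsetByAddr) {
  for (const FunctionRange &R : Ranges) {
    if (InputAddr < R.InLow || InputAddr >= R.InHigh)
      continue;
    // Unsigned wrap-around makes negative deltas work as expected.
    const std::uint64_t OutAddr = InputAddr + static_cast<std::uint64_t>(R.Delta);
    auto It = SeqStartOffsetByAddr.find(OutAddr);
    return It != SeqStartOffsetByAddr.end() ? It->second : kMaxOffsetSentinel;
  }
  return kMaxOffsetSentinel; // address not covered by any kept function
}
```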
[clang][ssaf] Add accessor for `LUNamespace` (#195756)
[mlir][SPIR-V] Add OpExpectKHR op (#195842)
Complete SPV_KHR_expect_assume extension support
[ELF] Support non-section Defined symbols at MergeInputSection end (#195801)
Commit bb443359a8ad ("[ELF] Validate merge section offsets in getSymVA and
match GNU ld") accepts offset == section_size for section-symbol references
in getSymVA, and skips the out-of-bounds case in MarkLive.
This patch extends that support to non-section Defined symbols,
modifying the per-symbol pre-resolution added by commit 42cc45477727
("[ELF] Optimize binary search in getSectionPiece").
Fix #118148
[libc][unistd] Fix generated at-function prototypes (#195341)
Fixes generated <unistd.h> prototypes for readlinkat/symlinkat and adds
compile-time public-header coverage
[DWARFLinker] Support parseable Swift interfaces in the parallel linker (#195475)
CompileUnit::Language was filtered through isODRLanguage at construction,
so non-ODR languages landed as std::nullopt and analyzeImportedModule's
Swift branch was never reached under --linker parallel. Store the raw
DW_AT_language and apply isODRLanguage only where ODR is actually
decided.
Reaching that branch then exposes a shared-map write from
parallelForEach workers. A mutex would make it safe but not
deterministic: conflict warnings and last-writer-wins contents would
depend on thread scheduling and diverge from the classic linker. Stage
entries per-CU and merge them serially in input order after analysis, so
results match classic regardless of schedule.
Because the diagnostics are emitted after unloading the input, we do
lose the originating DW_TAG_module. The warning still names the module
and both conflicting paths, which carries the load-bearing information.
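A minimal sketch of the stage-then-merge pattern: workers only write to their own per-CU slot, and a serial pass merges the slots in input order, so warnings and final contents do not depend on thread scheduling. All names are hypothetical; the "analysis" is a placeholder copy, one thread per CU is used only for brevity, and the last-writer-in-input-order-wins policy is an assumption modelled on the last-writer-wins behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical per-CU findings: (module name, interface path) pairs.
using Entries = std::vector<std::pair<std::string, std::string>>;

struct MergeResult {
  std::map<std::string, std::string> Paths;
  std::vector<std::string> Warnings;
};

MergeResult analyzeAndMerge(const std::vector<Entries> &PerCU) {
  // Phase 1 (parallel): each worker writes only to its own staging slot, so
  // no shared map is touched and no lock is needed.
  std::vector<Entries> Staged(PerCU.size());
  std::vector<std::thread> Workers;
  for (std::size_t CU = 0; CU < PerCU.size(); ++CU)
    Workers.emplace_back([&Staged, &PerCU, CU] {
      Staged[CU] = PerCU[CU]; // placeholder for the real per-CU analysis
    });
  for (std::thread &W : Workers)
    W.join();

  // Phase 2 (serial): merge in input order, so conflicts are reported and
  // resolved identically on every run.
  MergeResult R;
  for (const Entries &CUEntries : Staged)
    for (const auto &[Module, Path] : CUEntries) {
      auto [It, Inserted] = R.Paths.try_emplace(Module, Path);
      if (!Inserted && It->second != Path) {
        R.Warnings.push_back(Module + ": " + It->second + " vs " + Path);
        It->second = Path; // assumed policy: last writer in input order wins
      }
    }
  return R;
}
```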
[mlir][SYCL] Fail init errors cleanly instead of `abort`ing (#192979)
Disclaimer: this is my first PR to LLVM. I'm trying to follow the
contribution guide and the conventions I see in other PRs, but if I
missed something, please let me know.
Fixes #182807.
When the SYCL runtime wrapper is loaded on a host without a Level-Zero
backend, `getDefaultDevice()` throws a `std::runtime_error`; `catchAll`
catches it and calls `abort()`, which produces a "PLEASE submit a bug
report" stack dump that is not appropriate for this kind of failure.
With this change `catchAll` now writes to stderr and terminates via
`std::exit(EXIT_FAILURE)`, yielding a clean exit code 1 with no crash
dump. The "getDefaultDevice failed" message is also replaced with a
(hopefully) better one.
The runners in `mlir/lib/ExecutionEngine/` aren't very consistent in
how they output diagnostics: 2 files use `llvm::outs/errs`, 9 files use
`std::cout/cerr`, and 8 files use `fprintf`.
I left `fprintf` in place, since it was used in that module before and
it's also used by CUDA and ROCm runners (the most dominant ones in the
folder). I'm open to changing it to something else if needed.
1. Why this fix strategy doesn't match the CUDA/ROCm wrappers'
print-warning-and-continue style: The `print-and-continue` approach was
tried, and it produces worse behavior with the SYCL wrapper. The entry points
would return `nullptr` on init failure, but the MLIR JIT runner does not
null-check those returns, so the `nullptr` is dereferenced inside a
later SYCL call and produces another SIGSEGV and the same crash dump
this patch is trying to avoid. Making that style work would require
null-checks across the downstream wrappers. We can go that way if
desired, but I opted to start with a simpler fix.
2. Why `L0_SAFE_CALL` (`SyclRuntimeWrappers.cpp`, line 40) is left
untouched: it's a deliberate decision, as `L0_SAFE_CALL` is supposed to
abort on Level-Zero errors (kernel launch, OOM, etc). I can reconsider
if there are some arguments for updating error handling strategy in
`L0_SAFE_CALL` too, I just don't see any yet.
3. There's no unit test with this change, as I wasn't able to find a
test that would exercise this logic (I guess this kind of logic isn't
usually covered by unit tests anyway). I did test it manually, though,
for both positive and negative cases.
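The described `catchAll` behavior can be sketched roughly as follows. This is a hypothetical shape, not the code in `SyclRuntimeWrappers.cpp`; the message text and the template signature are assumptions. The key point is that `std::exit(EXIT_FAILURE)` goes through normal process termination instead of raising `SIGABRT` via `abort()`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <utility>

// Hypothetical catchAll-style wrapper: run F and return its result; on any
// exception, print a diagnostic to stderr and terminate with EXIT_FAILURE
// instead of calling abort().
template <typename Fn> auto catchAll(Fn &&F) -> decltype(F()) {
  try {
    return std::forward<Fn>(F)();
  } catch (const std::exception &E) {
    std::fprintf(stderr, "SYCL runtime wrapper error: %s\n", E.what());
  } catch (...) {
    std::fprintf(stderr, "SYCL runtime wrapper error: unknown exception\n");
  }
  // No signal is raised here, so signal-based crash handlers do not run.
  std::exit(EXIT_FAILURE);
}
```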
clang: Print full triple arch name in error (#195889)
More instances of the pattern from 58f94f0a4648a5d8d624c05a51dea8c37e5fe7bc
[NFC][SPIR-V] Rename vID register class to viID (#195711)
Rename the v2i64 register and its class from vID/vID0 to viID/viID0 so
it follows the same `v<element-kind>ID` convention as vfID (v2f64) and
vpID (v2p64) for better consistency
[SandboxVec] SandboxVectorizerIR Boilerplate (#189515)
This patch introduces a new specialization of SandboxIR named
SandboxVectorizerIR that contains the new Pack instruction (though it is
just a placeholder for now). It also implements the necessary
boilerplate in SandboxIR to support adding such specializations.
Each specialization requires its own *Values.def file defining its new
instructions. The new .def file also needs to be added to
ValuesDefFilesList.def which allows SandboxIR to include entries from
all specializations and populate structures like the Opcode enum,
ClassID and others.
Given that we need to include both .def files in several places, I had
to (i) change the .def file to undefine its locally defined macros so
that they won't get redefined in the next #include .def, and (ii) add
the DEF_DISABLE_AUTO_UNDEF option to disable undefining the macros at
the end because otherwise the user-specified DEF_ macros would be
undefined by the time we reach the second .def file.
[Bazel] Fixes bb51440 (#195851)
This fixes bb51440ffe693007fee40df9c9227432ebc5e0f0.
Co-authored-by: Google Bazel Bot <google-bazel-bot@google.com>
[MLIR][Transform] Don't error when a structurally inlinable call exists (#195770)
Fixes bug introduced in https://github.com/llvm/llvm-project/pull/192956
Specifically transform-interpreter would crash if any op in the region
it is applied to is marked as no-inline via the inliner interface. This
is because the added check does a post-processing step to verify that all
operations can be inlined [and that there is no issue due to symbol
merging]. However, it fails to account for the case where an operation
was already not inlinable (i.e., not an error introduced by the transform
symbol merging).
[clang-sycl-linker] Add per-kernel device code splitting (#195362)
Introduce a --sycl-module-split-mode option for clang-sycl-linker that
selects how the fully linked device module is split into device images:
- none (default): one device image containing all kernels
- kernel: one device image per kernel function, built by taking the
transitive closure of each kernel's dependencies
The split is implemented by delegating to
llvm::splitModuleTransitiveFromEntryPoints, providing a categorizer that
assigns each kernel definition a unique integer id.
Symbol collection now happens during splitting while the Module is still
in memory, avoiding a round-trip through disk to re-read the bitcode for
symbol extraction.
As part of this change, the splitModuleTransitiveFromEntryPoints API is
updated to accept an Error-returning callback and propagate errors from
it, replacing the previous void callback signature. The existing caller
in llvm-split is updated accordingly.
Additionally, linkDeviceCode now returns the linked Module alongside the
bitcode file path, so the splitter operates on the in-memory module
directly rather than re-reading it from disk.
Additional split modes (per-translation-unit, optional-feature grouping)
and specialization-constant handling will be introduced in
follow-up changes.
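The `kernel` mode can be pictured as a reachability problem: each device image holds one kernel plus everything transitively reachable from it. The sketch below is a hypothetical model over a name-based call graph, not `llvm::splitModuleTransitiveFromEntryPoints`; note that a helper shared by two kernels ends up duplicated into both images.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function -> functions it directly references.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Entry plus everything transitively reachable from it (iterative DFS).
std::set<std::string> transitiveClosure(const CallGraph &G,
                                        const std::string &Entry) {
  std::set<std::string> Seen{Entry};
  std::vector<std::string> Worklist{Entry};
  while (!Worklist.empty()) {
    const std::string F = Worklist.back();
    Worklist.pop_back();
    auto It = G.find(F);
    if (It == G.end())
      continue; // declaration or external symbol: nothing to follow
    for (const std::string &Callee : It->second)
      if (Seen.insert(Callee).second)
        Worklist.push_back(Callee);
  }
  return Seen;
}

// One image per kernel; shared callees are duplicated across images.
std::map<std::string, std::set<std::string>>
splitPerKernel(const CallGraph &G, const std::vector<std::string> &Kernels) {
  std::map<std::string, std::set<std::string>> Images;
  for (const std::string &K : Kernels)
    Images[K] = transitiveClosure(G, K);
  return Images;
}
```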
Co-Authored-By: Claude
[flang][cuda][openacc] Add UseDevice attribute to model host_data use_device symbols (#195182)
Symbols appearing in `!$acc host_data use_device(...)` were previously
marked with `CUDADataAttr::Device`, which caused generic resolution to
fail.
Introduce a new CUDADataAttr::UseDevice enumerator and use it in
`CopySymbolWithDevice` / `CloneDerivedTypeForUseDeviceImpl` instead of
Device. The new attribute is:
1. Compatible with any dummy in AreCompatibleCUDADataAttrs.
2. Ranked in the matching distance table (GetMatchingDistance): prefers
Device dummies (0), accepts Managed/Unified (2), and allows host dummies
(3).
3. Not user-spellable: it is only set internally during OpenACC name
resolution, never parsed from source or written to module files.
4. Treated like Device for non-host-array checks, I/O restrictions, and
MLIR lowering (mapped to `cuf::DataAttribute::Device`).
[VPlan] Check for VInstruction in findCanonicalIVIncrement. (#195845)
There are cases where findCanonicalIVIncrement may discover a
non-VPInstruction add that matches the expected shape, but we expect a
VPInstruction. Add explicit check to fix crash.
[bazel] fix #193207 (#195898)
fix #193207
[llvm] Fix the DWARFLinkerParallel unit tests (#195902)
[UnitTests] Fix shared build. NFC (#195906)
[clang] Consistently store format specifiers in execution encoding (#195876)
`DecomposePrintfHandler::HandlePrintfSpecifier()` parses the format
string and collects specifiers into `Specs`. In most cases the collected
specifiers are in the execution encoding, but there were to places that
used string literals in "host" encoding.
Change them to use a part of the parsed `StringLiteral` instead so that
`Specs` always contain text in the execution encoding. This is achieved
by adding `getCharacters()` method to `OptionalAmount` class, following
`ConversionSpecifier::getCharacters()`.
This is to make #169803 smaller and is an NFC before that PR lands.
[PowerPC] Add AES Builtins (#186895)
Add the following AES builtins:
__vector_pair __builtin_aes_encrypt_paired (__vector_pair, __vector_pair, uint2);
__vector_pair __builtin_aes128_encrypt_paired (__vector_pair, __vector_pair);
__vector_pair __builtin_aes192_encrypt_paired (__vector_pair, __vector_pair);
__vector_pair __builtin_aes256_encrypt_paired (__vector_pair, __vector_pair);
__vector_pair __builtin_aes_decrypt_paired (__vector_pair, __vector_pair, uint2);
__vector_pair __builtin_aes128_decrypt_paired (__vector_pair, __vector_pair);
__vector_pair __builtin_aes192_decrypt_paired (__vector_pair, __vector_pair);
__vector_pair __builtin_aes256_decrypt_paired (__vector_pair, __vector_pair);
__vector_pair __builtin_aes_genlastkey_paired (__vector_pair, uint2);
__vector_pair __builtin_aes128_genlastkey_paired (__vector_pair);
__vector_pair __builtin_aes192_genlastkey_paired (__vector_pair);
__vector_pair __builtin_aes256_genlastkey_paired (__vector_pair);
vec_t __builtin_galois_field_mult (vec_t, vec_t, uint1);
vec_t __builtin_galois_field_mult_gcm (vec_t, vec_t);
vec_t __builtin_galois_field_mult_xts (vec_t, vec_t);
[lldb-dap] Fix crash in source request handler (#195847)
Check that the optional `source` argument has a value before getting the
source reference.
[RISCV][GISel][NFC] Fix file header comment in RISCVInlineAsmLowering.cpp (#195805)
[LoopInfo] Remove redundant LLVM_ABI from Loop member functions (#195863)
Remove redundant `LLVM_ABI` from `addStringLoopAttribute` and
`addIntLoopAttribute`.
Addresses post-merge feedback on #194676.
This patch was generated with the help of Claude and reviewed by a
human.
Co-authored-by: Claude Opus 4 (1M context) <noreply@anthropic.com>
[LifetimeSafety] Add support for iterator arithmetic (#195442)
This PR adds origin propagation through iterator arithmetic.
It also adds a few tests to check the new behavior.
Closes #190140
clang: Print effective triple in -ccc-print-bindings (#195914)
Previously this printed the original triple rather than the adjusted
triple for the tool invocation, which I found confusing.
[SSAF][PointerFlow] Add PointerFlow summary and extractor (#188654)
Implement a PointerFlow summary and extractor, which uses
EntityPointerLevel. An assignment is extracted as a pair of
EntityPointerLevels, representing a directed edge.
Edges extracted from assignments form a directed graph
encoding abstract pointer flow information.
rdar://172429193
---------
Co-authored-by: Balázs Benics <benicsbalazs@gmail.com>
Co-authored-by: Jan Korous <jkorous@apple.com>
[LV] Extend test coverage with predicated IVs and FORs. (#195908)
Add test coverage for cases where the induction phi itself is a live-out,
as well as various tests where we could generate either a predicated IV
or a first-order recurrence.
[flang] Implement SIMPLE procedure support (parsing, semantics, diagnostics) (#161285)
Implement support for the Fortran 2023 `SIMPLE` procedure specifier.
This change includes:
- Parsing and attribute registration for `SIMPLE` procedures
- Adding `SIMPLE` to procedure characteristics
- Semantic support for `SIMPLE`, including treating `SIMPLE` procedures
as `PURE` (`SIMPLE` implies `PURE`)
- Rejecting `SIMPLE` + `IMPURE` combinations
- Emitting and reading `SIMPLE` in mod files
- Adding parser, semantic, and modfile tests to validate `SIMPLE`
behavior
`SIMPLE` procedures satisfy `PURE` requirements, while `PURE` procedures
do not imply `SIMPLE`.
[PowerPC] Further refactor atomic loads (#194445)
Depending on the availability of the word-part feature, different code
is generated for 1 and 2 byte atomic loads. This change moves the
decision to use the word-part feature from C++ into TableGen patterns.
This is done via:
- move code from `EmitPartwordAtomicBinary()` into new function
`signExtendOperandIfUnknown()`
- decouple functions `EmitPartwordAtomicBinary()` and
`EmitAtomicBinary()`
- remove the size from the name of the pseudo instructions; instead,
introduce a pseudo instruction which is used in case the word-part
feature is missing
- update the handling of the pseudo instruction insertion accordingly
A side effect of this change is that the implementation requires 11 fewer
pseudo instructions.
[NFC][CIR] Pass StringRef to `getIntrinsicForClangBuiltin` directly (#195458)
`prefix` is already a StringRef, so pass it to
`getIntrinsicForClangBuiltin` and `getIntrinsicForMSBuiltin` directly.
Additionally, drop some redundant `llvm::` and `Intrinsic::` namespace
qualifiers.
[bazel][SandboxVec] Port 0beb9f45f273bdc2d620f6dd07517761d294d259 (#195928)
[IVDescriptors] Check for null SE before dereference in getMinMaxRecurrence. (#195798)
Some callers don't pass SCEV.
[SSAF] Fix -Wunused-variable in b73ce3e53fcb9f72c759139e93db7e16a813c3b2 (#195944)
There was a variable only used in an assertion in a call to addSummary.
Mark it [[maybe_unused]] given the call has side effects.
[flang] Fix FIR AliasAnalysis for zero-offset view chains (#192710)
When subroutine `f` below is inlined, the `ACCImplicitData` pass fails
to recognize that `b` is already covered by the enclosing `!$acc data
deviceptr`. The deviceptr clause operates on a box (`fir.embox` result)
while the inner `acc.serial` uses the underlying ref.
`fir::AliasAnalysis` traces both back through the full def-chain, where
an upstream sliced `fir.embox` (from `a(:,5)`) sets
`approximateSource=true`, causing `MayAlias` instead of `MustAlias`. The
pass falls back to implicit `copyin`/`copyout`, causing a segfault.
```fortran
module test_mod
  real, allocatable :: a(:,:)
contains
  subroutine f(b)
    real, dimension(*) :: b
    !$acc data deviceptr(b(1:100))
    !$acc serial
    print *, b(10)
    !$acc end serial
    !$acc end data
  end subroutine f
end module test_mod

program test
  use test_mod
  call alloc_a
  !$acc data present(a)
  !$acc host_data use_device(a)
  call f( a(:,5) )
  !$acc end host_data
  !$acc end data
end program
```
Fix: add `getZeroOffsetViewRoot()` as a short-circuit in `alias(Source,
Source, Value, Value)`. Before consulting `approximateSource`, it walks
each value backward through `FortranObjectViewOpInterface` ops with zero
offset (embox without slice, declare, box_addr) to find its root. If
both values share the same root, `MustAlias` is returned immediately,
avoiding the conservative `MayAlias` caused by distant
`approximateSource` flags.
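The short-circuit can be illustrated with a deliberately simplified, hypothetical model of a def-chain. `Value`, `DefKind`, and this `alias` function are stand-ins, not the FIR or MLIR API, and the fallback path is reduced to a plain `MayAlias`. Both values are walked back through zero-offset views, and a shared root yields `MustAlias` before any approximate-source information would be consulted.

```cpp
// Hypothetical, simplified model of an SSA def-chain.
enum class DefKind {
  Root,            // e.g. a block argument or allocation: the walk stops here
  ZeroOffsetView,  // e.g. embox without slice, declare, box_addr
  OffsetView       // e.g. a sliced embox: changes the address, walk stops
};

struct Value {
  DefKind kind;
  const Value *source;  // operand the view is taken of (null for Root)
};

enum class AliasResult { MayAlias, MustAlias };

// Walk backward through zero-offset views to the value they all refer to.
const Value *getZeroOffsetViewRoot(const Value *v) {
  while (v->kind == DefKind::ZeroOffsetView && v->source != nullptr)
    v = v->source;
  return v;
}

AliasResult alias(const Value &lhs, const Value &rhs) {
  // Short-circuit: the same zero-offset root means the same address.
  if (getZeroOffsetViewRoot(&lhs) == getZeroOffsetViewRoot(&rhs))
    return AliasResult::MustAlias;
  // Placeholder for the full analysis (which would consult
  // approximateSource and much more); here simply conservative.
  return AliasResult::MayAlias;
}
```

In this model a box and a raw address derived from the same sliced view share that view as their root, so they are reported as `MustAlias` even though the slice itself is not zero-offset.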
[VPlan] Add isFreeScalarIntrinsic helper (NFC). (#195929)
Split off from https://github.com/llvm/llvm-project/pull/195518.
[VPlan] Add VPlan printing test for call widening decisions. (NFC) (#195930)
Add VPlan-printing test for
https://github.com/llvm/llvm-project/pull/195518.
[flang][cuda] Fix unregistered allocator (#195924)
which then fail to add the constructor to `llvm.mlir.global_ctors`. This
leads to a runtime failure because the allocators for CUDA Fortran are not
registered.
[clang-tidy] Fix `implicit-bool-conversion` C ternary condition false positive (#195913)
Fixes #195604
[SSAF][WPA] Add "no-op" PointerFlow and UnsafeBufferUsage analysis (#193089)
Added 'no-op' PointerFlow and UnsafeBufferUsage analyses to convert
summary data into AnalysisResult, which DerivedAnalysis can then consume.
Also, refactored the PointerFlow and UnsafeBufferUsage serialization
for code sharing.
rdar://174874942
---------
Co-authored-by: Balázs Benics <benicsbalazs@gmail.com>
Co-authored-by: Jan Korous <jkorous@apple.com>
[gn] use action() instead of copy() for libcxx headers (#195948)
copy() doesn't handle file deletions. Use an action() that syncs the
output directory with the input list via a response file, removing files
that are no longer in the list.
This works because if files are added or removed, ninja's command line
tracking re-runs the script, and if contents of existing files change,
ninja's input mtime checking reruns it.
This also makes the remove_float_h workaround unnecessary.
Motivated by all the recent header removals in libc++.
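The sync step reduces to a set difference between the wanted file list and the current directory contents. The sketch below assumes a hypothetical `staleOutputs` helper and is not the actual gn script; it only shows which files such a step would delete.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: `wanted` is the header list read from the response
// file, `present` is what currently exists in the output directory.
// Returns the files a sync step would remove.
std::vector<std::string> staleOutputs(const std::vector<std::string> &wanted,
                                      const std::vector<std::string> &present) {
  const std::set<std::string> keep(wanted.begin(), wanted.end());
  std::vector<std::string> stale;
  for (const std::string &file : present)
    if (keep.count(file) == 0)
      stale.push_back(file);  // no longer listed: delete it
  return stale;
}
```

A plain `copy()` only ever adds or overwrites outputs, which is why a deleted header would otherwise linger in the output directory.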
[mlir][spirv][nfc] Use function argument type in TypeCapabilityVisitor switch (#195918)
Addresses post commit review from #195796.
[LLDB] Fix UBSan issue with ValueType enums. (#195540)
ValueTypeSyntheticMask, when bitwise OR'd with ValueType enums, produces
a value that is outside the valid range of the ValueType enum. This
causes UBSan errors when UBSan is set to check enum values. For example, if you
build LLDB with the CMake flags
-DCMAKE_CXX_FLAGS="-fsanitize=enum -fsanitize-trap=enum"
-DCMAKE_C_FLAGS="-fsanitize=enum -fsanitize-trap=enum"
and then run the LLDB test TestScripedFrameProvider, it crashes with
a SIGILL from UBSan.
This change fixes that by pulling ValueTypeSyntheticMask into the
ValueType enums, expanding the valid enum range and making the bitwise
OR'd values valid.
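A minimal sketch of why the fix works, using a hypothetical, simplified enum (`ValueKind` and its enumerators are illustrative, not LLDB's actual `ValueType` values). For an unscoped enum without a fixed underlying type, the valid values are those representable in the smallest bit-field that holds all enumerators, so adding the mask as an enumerator widens that range to cover every OR'd combination.

```cpp
// Hypothetical stand-in for a value-type enum; names and values are
// illustrative only.
//
// Without kSyntheticMask as an enumerator, the largest enumerator would be 4,
// so the valid range of this unscoped enum would be 0..7 and converting 0x102
// into it would be undefined behavior (what -fsanitize=enum traps on).
// With kSyntheticMask = 0x100 the valid range becomes 0..511.
enum ValueKind {
  kInvalid = 0,
  kGlobal = 1,
  kLocal = 2,
  kRegister = 4,
  kSyntheticMask = 0x100  // previously a separate constant outside the enum
};

// Tag a kind as synthetic; the result now lies inside the enum's range.
ValueKind makeSynthetic(ValueKind kind) {
  return static_cast<ValueKind>(kind | kSyntheticMask);
}

bool isSynthetic(ValueKind kind) { return (kind & kSyntheticMask) != 0; }

// Strip the synthetic bit to recover the base kind.
ValueKind baseKind(ValueKind kind) {
  return static_cast<ValueKind>(kind & ~kSyntheticMask);
}
```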
[mlir][acc] Improve implicit deviceptr detection for alias (#195934)
The ACCImplicitData pass can automatically use the deviceptr clause when a
variable is detected as device data. However, it was missing a check
for the variable's own `acc declare deviceptr` attribute.
clang: Avoid dummy LAST entry in OffloadArch (#195952)
Use this as an alias of the final entry, rather than its
own enum value. This will allow writing covered switches
that don't need to handle this case. This matches how
other places with an end enum entry handle this.
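A sketch of the aliasing pattern with a hypothetical `Arch` enum (not clang's actual `OffloadArch` enumerators). Because `LAST` shares the value of the final entry, a switch over the real entries is fully covered, and a `case Arch::LAST:` label would be rejected as a duplicate. One consequence is that loops over the enum use an inclusive bound.

```cpp
#include <string_view>

// Hypothetical architecture list, for illustration only.
enum class Arch {
  SM_70,
  SM_80,
  GFX90A,
  // Alias of the final entry rather than a distinct dummy value.
  LAST = GFX90A
};

std::string_view archName(Arch arch) {
  switch (arch) {  // covered for -Wswitch without any LAST case
  case Arch::SM_70:
    return "sm_70";
  case Arch::SM_80:
    return "sm_80";
  case Arch::GFX90A:
    return "gfx90a";
  }
  return "unknown";  // not reached for valid enumerators
}

// LAST is a real entry, so the count is LAST + 1 (inclusive bound).
constexpr int numArchs() { return static_cast<int>(Arch::LAST) + 1; }
```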
[MLIR][XeGPU] Support pointer/dynamic-memref sources in array-length optimization (#195872)
Extend `OptimizeCreateNdDescOp` to handle the two remaining
`create_nd_tdesc` source forms — `i64` pointer and dynamic-shape memref
— by forwarding the existing shape/strides operands through the general
builder. The memory region is unchanged by the rewrite; only the
`tensor_desc` view is narrowed along the FCD and tagged with
`array_length`.
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
[CIR] Add pass_object_size hidden parameter support (#191482)
Emit the hidden `i64` parameter that
`__attribute__((pass_object_size(N)))` requires. At call sites the size
is constant-folded when possible (e.g. `&a` → 4) and falls back to
`cir.objsize` / `@llvm.objectsize` otherwise (e.g. VLAs).
On the callee side, `buildFunctionArgList` now creates an
`ImplicitParamDecl` for each annotated parameter so that
`emitBuiltinObjectSize` can load the passed size instead of re-computing
it.
This also fixes the `llvm_unreachable("NYI")` in
`RequiredArgs::getFromProtoWithExtraSlots` and the `errorNYI` in
`appendParameterTypes` / `arrangeFreeFunctionLikeCall` that fired
whenever `hasExtParameterInfos()` was true.
New test: `clang/test/CIR/CodeGen/pass-object-size.c` (CIR / LLVM /
OGCG).
Made with [Cursor](https://cursor.com)
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
[CIR] Use SymbolUserMap in applyReplacements to fix quadratic behavior (#195883)
applyReplacements() previously called replaceAllSymbolUses() for each
replacement, which walks the entire module every time — O(R × M) for R
replacements and M operations. For C++ programs with heavy template
instantiation (e.g., Eigen), this quadratic behav…
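The difference between the two approaches can be sketched on a hypothetical, simplified IR in which each operation references exactly one symbol (`Op`, `Replacement`, and both functions are illustrative, not CIR or MLIR code). The naive version walks all operations per replacement; the indexed version walks them once to build a symbol-to-users map, in the spirit of MLIR's `SymbolUserMap`, and then touches only the affected users.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical IR: each operation references exactly one symbol.
struct Op {
  std::string symbolRef;
};

using Replacement = std::pair<std::string, std::string>;  // (old, new)

// Every replacement walks all M operations: O(R * M).
void applyReplacementsNaive(std::vector<Op> &ops,
                            const std::vector<Replacement> &reps) {
  for (const auto &[from, to] : reps)
    for (Op &op : ops)
      if (op.symbolRef == from)
        op.symbolRef = to;
}

// One walk builds the user index; each replacement then visits only the
// users of the replaced symbol.
void applyReplacementsIndexed(std::vector<Op> &ops,
                              const std::vector<Replacement> &reps) {
  std::unordered_map<std::string, std::vector<std::size_t>> users;
  for (std::size_t i = 0; i < ops.size(); ++i)
    users[ops[i].symbolRef].push_back(i);

  for (const auto &[from, to] : reps) {
    auto it = users.find(from);
    if (from == to || it == users.end())
      continue;
    std::vector<std::size_t> moved = std::move(it->second);
    users.erase(it);
    for (std::size_t idx : moved)
      ops[idx].symbolRef = to;
    // Keep the index current so a later replacement of `to` still finds
    // these users, matching the naive version's sequential behavior.
    std::vector<std::size_t> &dst = users[to];
    dst.insert(dst.end(), moved.begin(), moved.end());
  }
}
```

Moving the user list to the new symbol keeps chained replacements (such as `f -> g` followed by `g -> k`) equivalent to the naive sequential result.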
pedroMVicente pushed a commit to pedroMVicente/llvm-project that referenced this pull request on May 19, 2026
The intrinsics are based on the following [proposal](ARM-software/acle#428). These are:
```
svint16_t svdot_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm);
svint16_t svdot_n_s16_s8(svint16_t zda, svint8_t zn, int8_t zm);
svint16_t svdot_lane_s16_s8(svint16_t zda, svint8_t zn, svint8_t zm, uint64_t imm_idx);
svuint16_t svdot_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm);
svuint16_t svdot_n_u16_u8(svuint16_t zda, svuint8_t zn, uint8_t zm);
svuint16_t svdot_lane_u16_u8(svuint16_t zda, svuint8_t zn, svuint8_t zm, uint64_t imm_idx);
```
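To make the vector form concrete, here is a scalar reference model of `svdot_s16_s8`. It assumes the two-way dot-product behavior of the existing wider `svdot` forms carried over to 8-bit inputs: each 16-bit lane `i` of `zda` accumulates `zn[2i]*zm[2i] + zn[2i+1]*zm[2i+1]`, wrapping modulo 2^16. That reading of the proposal is an assumption, `dot_s16_s8_model` is a hypothetical name, `std::vector` stands in for a scalable vector, and the `_n` and `_lane` forms are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of svdot_s16_s8(zda, zn, zm) under the
// assumed two-way, non-saturating semantics described above.
std::vector<std::int16_t> dot_s16_s8_model(std::vector<std::int16_t> zda,
                                           const std::vector<std::int8_t> &zn,
                                           const std::vector<std::int8_t> &zm) {
  assert(zn.size() == 2 * zda.size() && zm.size() == zn.size());
  for (std::size_t i = 0; i < zda.size(); ++i) {
    // int8 x int8 products are computed in int, so they cannot overflow.
    const int sum = zn[2 * i] * zm[2 * i] + zn[2 * i + 1] * zm[2 * i + 1];
    // Accumulate modulo 2^16 via unsigned arithmetic (no signed-overflow UB).
    const auto wrapped = static_cast<std::uint16_t>(
        static_cast<std::uint16_t>(zda[i]) + static_cast<std::uint16_t>(sum));
    // Reinterpret as two's complement (guaranteed modular in C++20).
    zda[i] = static_cast<std::int16_t>(wrapped);
  }
  return zda;
}
```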
name: Add alpha support for 9.7 data processing intrinsics
about: feature proposal.
This change adds intrinsics for the following architectural features:
Checklist: (mark with `X` those which apply)
PR (do not bother creating the issue if all you want to do is
fix the bug yourself).
`SPDX-FileCopyrightText` lines on top of any file I have edited. Format is
`SPDX-FileCopyrightText: Copyright {year} {entity or name} <{contact informations}>` (Please update existing copyright lines if applicable. You can
specify year ranges with a hyphen, as in
`2017-2019`, and use commas to separate gaps, as in
`2018-2020, 2022`).
`Copyright` section of the sources of the specification I have edited (this will show up in the text
rendered in the PDF and other supported output formats). The
format is the same as described in the previous item.
tricky to set up on non-*nix machines). The sequence can be
found in the contribution
guidelines. Don't
worry if you cannot run these scripts on your machine; your
patch will be automatically checked in the Actions of the pull
request.
introduced in this PR in the section Changes for next
release of the section Change Control/Document history
of the document. Create Changes for next release if it does
not exist. Note that changes that do not modify the
content and rendering of the specifications (both HTML and PDF)
do not need to be listed.
correctness of the result in the PDF output (please refer to the
instructions on how to build the PDFs
locally).
`draftversion` is set to `true` in the YAML header of the sources of the specifications I have modified.
in the README page of the project.