Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for bf16(xN) and i1xN #140763
Some changes occurred in compiler/rustc_codegen_ssa |
|
Some changes occurred in compiler/rustc_codegen_gcc |
x86amx for i32x256 for AMX intrinsics
x86amx and i32x256 for AMX intrinsics
|
I think you can use |
|
That can be used to improve performance, but I am not really focusing on performance in this PR. For now I want to emphasize the correctness of the codegen.
|
Oh wait, I probably misunderstood your comment, you meant using the llvm declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance |
I think you can just focus on non-overloaded functions for this PR. Overloaded functions, and type checking of Rust function signatures against the LLVM-defined ones, can be subsequent PRs. @rustbot author
|
Reminder, once the PR becomes ready for a review, use |
|
@sayantn Taking the address of an intrinsic is invalid LLVM IR. |
I'm still a little busy, but I should be able to get to this sometime this week regardless.
|
The proposal makes sense to me. Great! My concern is that rustc supports multiple LLVM versions. What if we use an intrinsic that is deprecated in a newer LLVM? I also wonder whether it might make sense to implement these checks directly in LLVM rather than rustc. This could benefit other languages, and we wouldn't need to consider things like AutoUpgraded. If we'd like, we can provide a lint function that helps rustc show better diagnostics rather than |
|
@dianqk thanks for the review ❤️
(In stdarch, we use the latest LLVM intrinsics, but that doesn't cause problems in the Rust CI, which uses the lowest supported LLVM version. This is because all the functions in stdarch are marked
IMO it doesn't make much sense to add these checks to LLVM. For one, LLVM already has the opt-in
I didn't get your comment about AutoUpgrade. I am using AutoUpgrade to determine if it is deprecated or unknown (because |
| fn autocast<'ll>( |
Can this function be moved into check_call?
I don't think that is possible, because these autocasts etc should only be done for intrinsics. Also, I want to keep these changes as contained in intrinsics.rs as possible
Should equate_ty be renamed to can_autocast or something else? I don't think <2 x i1> equates to i8.
Do you think we could get rid of equate_ty(can_autocast)? Can we emit IntrinsicSignatureMismatch in autocast? This is not crucial. I have no clue which is better.
|
@dianqk thanks for the review, and sorry for the late reply.
No problems, idc about the names
Sorry, I don't get what you're trying to say here.
Thanks a lot, it should work. I'll let you know
I just copied whatever LLVM does when it |
Never mind. The current version is fine with me. I am just curious whether the |
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
|
Oh, I see. The |
Some updates on this: it worked perfectly, thanks a lot again! Moreover, I observed that the old zext shuffles produced code like (for
    kshiftlw k2, k0, 14
    kshiftrw k2, k2, 14
    kmovw eax, k2
which is what clang does too. But the new shuffles produce asm like
    kmovw eax, k0
    and al, 3
I have no idea which is faster, but they seem comparable.
Hmm, I have no idea either, but this can be changed back if this is an issue. |
|
@bors r+ |
|
@bors try jobs=i686-msvc-2 |
This PR changes how LLVM intrinsics are codegenned.
Explanation of the changes
Current procedure
This is the same for all functions; LLVM intrinsics are not treated specially.
f32 (f32) due to the Rust signature
Pros
Cons
-Zverify-llvm-ir to it will fail compilation). I would expect this code to not compile at all instead of generating invalid IR.
x86amx type, and (almost) all intrinsics that have vectors of i1 types) can't be linked to at all. This is a (major?) roadblock in the AMX and AVX512 support in stdarch.
-Zverify-llvm-ir won't complain. Eventually it will error out due to the non-existing function (courtesy of the linker). I don't think this is a behavior we want.
What this PR does
LLVMIntrinsicGetType to directly get the function type of the intrinsic from LLVM.
Note
This PR only focuses on non-overloaded intrinsics; overloaded ones can be done in a future PR.
Regardless, the undermentioned functionalities work for all intrinsics
AutoUpgraded by LLVM. If not, that means it is an invalid intrinsic, and we error out.
Pros
x86amx and injecting llvm.x86.cast.vector.to.tile and llvm.x86.cast.tile.to.vectors in callsite)
Note
I don't intend for these bypasses to be permanent. A better approach will be introducing a bf16 type in Rust, and allowing repr(simd) with bools to get Rust-native i1xNs. These are meant to be short-term, as I mentioned, "bypass"es. They shouldn't cause any major breakage even if removed, as link_llvm_intrinsics is perma-unstable.
This PR adds bypasses for bf16 (via i16), bf16xN (via i16xN) and i1xN (via iM, where M is the smallest power of 2 s.t. M >= N, unless N <= 4, where we use M = 8). This will unblock AVX512-VP2INTERSECT and a lot of bf16 intrinsics in stdarch. This PR also automatically destructures structs if the types don't exactly match (this is required for us to start emitting hard errors on mismatches).
Cons
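To make the i1xN bypass rule stated in the description concrete, here is a minimal sketch of how the stand-in integer width M could be computed from the lane count N. The function names `mask_int_width` and `mask_stand_in` are hypothetical and are not taken from the PR; the sketch only encodes the rule as worded above (smallest power of two that is at least N, with a floor of 8 bits).

```rust
/// Hypothetical helper (not from the PR): bit width `M` of the integer type
/// that stands in for an LLVM `<N x i1>` vector on the Rust side.
fn mask_int_width(lanes: u32) -> u32 {
    assert!(lanes > 0, "a vector needs at least one lane");
    // Smallest power of two >= lanes, but never narrower than 8 bits,
    // so 1, 2 and 4 lanes all map to an 8-bit integer.
    lanes.next_power_of_two().max(8)
}

/// Hypothetical helper: spell the stand-in type the way the description does.
fn mask_stand_in(lanes: u32) -> String {
    format!("i{}", mask_int_width(lanes))
}
```

Under this reading, `mask_stand_in(2)` gives `i8` and `mask_stand_in(16)` gives `i16`, which matches the `<2 x i1>` versus `i8` pairing discussed in the review comments.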
Possible ways to extend this to overloaded intrinsics (future)
Parse the mangled intrinsic name to get the type parameters
LLVM has a stable mangling of intrinsic names with type parameters (in LLVMIntrinsicCopyOverloadedName2), so we can parse the name to get the type parameters, and then just do the same thing.
Pros
Cons
TargetExt types or identified structs, their name is a part of the mangling, making it impossible to reverse. Even more complexities arise when there are unnamed identified structs, as LLVM adds more mangling to the names.
Use the IITDescriptor table and the Rust function signature
We can use the base name to get the IITDescriptors of the corresponding intrinsic, and then manually implement the matching logic based on the Rust signature.
Pros
TargetExt types. Also, fun fact, Rust exports all struct types as literal structs (unless it is emitting LLVM IR, then it always uses named identified structs, with mangled names)
Cons
llvm.sqrt.bf16 until we have bf16 types in Rust. Because if we are using u16s (or any other type) as bf16s, then the matcher will deduce that the signature is u16 (u16), not bf16 (bf16) (which would lead to an error because u16 is not a valid type parameter for llvm.sqrt), even though the intended type parameter is specified in the name.
IITDescriptorKinds
These 2 approaches might give different results for the same function. Let's take
The name-based approach will decide that the type parameter is bf16, and the LLVM signature is i1 (bf16), and will inject some bitcasts at callsite.
The IITDescriptor-based approach will decide that the LLVM signature is i1 (u16), and will see that the name given doesn't match the expected name (llvm.is.constant.u16), and will error out.
Reviews are welcome, as this is my first time actually contributing to rustc
@rustbot label T-compiler A-codegen A-LLVM
r? codegen