_BitInt implementation does not conform to x86-64 psABI regarding padding bits #62032
Comments
@llvm/issue-subscribers-clang-frontend
@llvm/issue-subscribers-backend-x86
This looks to me like a backend bug. The IR we output is (https://godbolt.org/z/3bTnhaqbf):

```llvm
define dso_local zeroext i8 @f(ptr nocapture noundef readonly %0) local_unnamed_addr #0 {
  %2 = load i4, ptr %0, align 1, !tbaa !6
  %3 = zext i4 %2 to i8
  ret i8 %3
}
```

and apparently the `zext` is folded away by the backend, which assumes the high bits of the `i4` load are already zero.
cc @AaronBallman - If I understand the Discourse thread above correctly, the psABI `_BitInt` might currently be unimplementable in LLVM?
Errrrr, I hope that's not the case, but it sure looks like LLVM is doing unexpected things by removing that `zext`.
CC @FreddyLeaf @phoebewang as they know far, far more about the x86 backend than I do.
Yes, it looks like the psABI is incompatible with LangRef:
Non-byte-sized types in LLVM do not have a specified in-memory ABI; loads and stores are only required to have consistent semantics. In practice, those semantics are to zero-extend values to the byte-sized type (but with caveats due to store elision). We would need to either change LangRef and the backend lowerings, or make the frontend lower non-byte-sized `_BitInt` stores to the next byte-sized width, the same as is done for `_Bool`.
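For reference, the `_Bool` precedent mentioned above looks roughly like this (a hand-written sketch, not actual compiler output): `_Bool` is `i1` in IR, but its stores are widened to a full, zero-extended byte, so the padding bits are explicitly zero.

```llvm
; sketch: storing a _Bool value %b -- the i1 is zero-extended and a
; full byte is stored, so the 7 padding bits are explicitly zeroed
define void @store_bool(ptr %p, i1 zeroext %b) {
  %ext = zext i1 %b to i8
  store i8 %ext, ptr %p, align 1
  ret void
}
```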
Isn't it loads that are the problem here rather than stores? IIUC, the problem described here is that some outside caller (not compiled by LLVM, or for example hand-written asm) isn't zero-extending before storing the `_BitInt` to memory. A function compiled with LLVM then currently assumes that the padding bits are zero when loading the `_BitInt` from memory, while there is no such guarantee according to the psABI.
I guess there isn't really any news about this? I'm trying to figure out if our downstream ABI should diverge from the standard and say that the padding must be filled with zeroes when storing a `_BitInt` to memory (rather than saying that the bits are unspecified). That feels a bit weird, but as long as LLVM assumes that the padding bits are zero when loading a `_BitInt`, it is scary to say anything else. I'm not sure how many places in LLVM actually make such assumptions. The one that I'm aware of is SelectionDAGLegalize::LegalizeLoadOps, as there is a comment about it there.
So maybe SelectionDAGLegalize::LegalizeLoadOps shouldn't make that assumption and should instead mask away the relevant bits. Then we might want to find the places where we unnecessarily zero-extend before stores, and maybe add a special mode that writes garbage to the padding instead, to help detect whether there are more places that assume the padding bits are zero when loading.
I've reached out to someone from the psABI team here at Intel for comments but they're on sabbatical for another few days, so it might be a bit before I hear anything concrete.
I heard back from @hjl-tools, and he said: This is a very old issue, not just related to `_BitInt`. When the psABI says the unused bits are undefined, the compiler is free to put any values in the unused field. If zeroing the unused bits results in better code, we may change the psABI to specify that.
If we want to make a psABI change, please don't mention LLVM/clang. Just make the proposal as a performance improvement issue.
IIUC, the padding bits do become zero in memory with LLVM today, but that behavior is rather implicit (at least when looking at the IR). The IR just says "store i5", which is legalized at instruction selection into a byte-sized store, and LLVM currently writes zeroes into the padding bits in general (without really knowing whether the destination is a `_BitInt` or not). Similarly, a load is just modelled as "load i5", and there is a general assumption that the padding bits are zero when legalizing the load into a load of full bytes. If the standard said explicitly that the padding bits must be zero, then I think (maybe) the LLVM IR should be explicit about that as well, so the frontend should probably emit the store of a 5-bit `_BitInt` with an explicit zext to i8 before a byte-sized store.
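As a sketch (hypothetical IR, not actual Clang output), such an explicit store-side lowering for a `_BitInt(5)` could look like:

```llvm
; explicit frontend lowering of a _BitInt(5) store: zero-extend to the
; next byte-sized type, then store the full byte, so the 3 padding
; bits are explicitly zeroed rather than implicitly by legalization
define void @store_bitint5(ptr %p, i5 %v) {
  %ext = zext i5 %v to i8
  store i8 %ext, ptr %p, align 1
  ret void
}
```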
Considering that such a zext is already added implicitly by the backend, it looks like it wouldn't cost much, although I'm not entirely sure what the optimization pipeline would think of it (here I'm thinking of load-to-store forwarding optimizations etc., but also cost calculations for inlining/vectorization etc.). Then maybe we should model the loads differently as well, making sure that we read full bytes. The LLVM IR doesn't really say how a non-byte-sized load works (i.e. how the value is aligned within the bytes it occupies in memory), so that is a weakness as well. But if we, for example, lowered the load as an i8 load followed by a trunc to i5, then it could be hard to eliminate that trunc unless we could express that the loaded value has zeroes in its most significant bits. I think we would need to add some way of annotating the load with that information. So even if LLVM currently behaves as if the padding bits are zero for a `_BitInt`, the modelling in IR that ensures such behavior is a bit weak. And making the modelling stronger, without impacting codegen, isn't trivial.
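The i8-load-plus-trunc lowering under discussion could look like this sketch (hypothetical IR):

```llvm
; explicit frontend lowering of a _BitInt(5) load: read the full byte,
; then truncate -- the trunc discards whatever the padding bits held,
; so no assumption about their contents is needed
define i5 @load_bitint5(ptr %p) {
  %byte = load i8, ptr %p, align 1
  %val = trunc i8 %byte to i5
  ret i5 %val
}
```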
My understanding is that the front end is generally responsible for generating IR that will conform to the ABI of the target (with some basic expectations about how the backend will handle various constructs). In this case, I don't think the zext before the store is necessary for the ABI, so the front end shouldn't need to generate it (but it's OK if the backend puts it in there). For the load, I agree with @bjope's suggestion of generating an i8 load and then truncating to i4. My initial thought was to insert an explicit mask, but it looks like that is unnecessary as the optimizer inserts it where it makes sense based on the truncs: https://godbolt.org/z/n5nbY3onq
@andykaylor LLVM will generally refuse to optimize IR that mixes memory operations of different non-byte sizes. The load and store sides need to be symmetric, either both being i4 with implicit ext/trunc, or both being i8 with explicit ext/trunc, but not a mix of both.
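To illustrate the symmetric case (a hypothetical IR sketch, with function names made up for the example): when both sides are widened to i8 with explicit ext/trunc, store-to-load forwarding sees matching byte-sized operations.

```llvm
; symmetric widening: both the store and the load use i8, with an
; explicit zext/trunc pair, so the optimizer can forward the stored
; value to the load instead of giving up on mixed-width operations
define i4 @roundtrip(ptr %p, i4 %v) {
  %ext = zext i4 %v to i8
  store i8 %ext, ptr %p, align 1
  %ld = load i8, ptr %p, align 1
  %tr = trunc i8 %ld to i4
  ret i4 %tr
}
```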
@nikic That makes sense. I was thinking of loads and stores in different places, but you're obviously right that the front end will need to extend the size so that we can handle cases where we can see the load and store together.
Note, after #91364 it is https://godbolt.org/z/dnK37dsYj.
clang version: 16.0.0
compiler flags: -std=c2x -O2
Observed behavior
Given the code at https://godbolt.org/z/Ev8sbhb84, the generated code for `f` and `g` does not mask out the "unused" bits from `*ptr`.
Expected behavior
The x86-64 psABI specifies that the unused bits within an object of `_BitInt(N)` type can take unspecified values: https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/3177443c4f5862d48f371d91ab36209f73cfe69c/x86-64-ABI/low-level-sys-info.tex#L297-299

This means that `*ptr` in both cases can have the object representation `0b00010000`, representing the signed/unsigned `_BitInt(N)` value `0`. But according to the generated code, this object representation is passed as-is to the returned `unsigned char` in both cases, which translates to the return value `16`. As `0b00010000` is a valid object representation of the value `0` here, I would expect both functions to return `0` in this case. In general this requires masking out the unused bits.
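To make the expected masking concrete, here is a small C sketch (not the original reproducer; the function names and the choice of `N = 4` are assumptions for illustration) of how a conforming reader must interpret a byte holding a `_BitInt(4)` object whose padding bits are unspecified:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: only the low 4 bits of the byte are value
 * bits of the _BitInt(4) object; the high 4 bits are padding that the
 * psABI leaves unspecified, so a reader must mask them away. */

unsigned char read_ubitint4(uint8_t byte) {
    return byte & 0x0F;               /* zero-extend: drop padding bits */
}

signed char read_sbitint4(uint8_t byte) {
    uint8_t v = byte & 0x0F;          /* keep the 4 value bits */
    /* if bit 3 (the sign bit of the 4-bit value) is set, sign-extend */
    return (v & 0x08) ? (signed char)(v | 0xF0) : (signed char)v;
}
```

With this masking, the object representation `0b00010000` (16) correctly reads back as the value 0 in both the signed and the unsigned case.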
Notes
It looks like the current implementation assumes that valid object representations have all zeroes for the unused bits, and other representations are trap representations.