
_BitInt implementation does not conform to x86-64 psABI regarding padding bits #62032


Closed
leni536 opened this issue Apr 9, 2023 · 19 comments
Labels
ABI Application Binary Interface backend:X86 miscompilation

Comments

@leni536

leni536 commented Apr 9, 2023

clang version: 16.0.0
compiler flags: -std=c2x -O2

Observed behavior

Given the following code:

unsigned char f(unsigned _BitInt(4)* ptr) {
    return *ptr;
}

unsigned char g(_BitInt(4)* ptr) {
    unsigned _BitInt(4) uint = *ptr;
    return uint;
}

The generated code for f and g does not mask out "unused" bits from *ptr:

f:                                      # @f
        movzx   eax, byte ptr [rdi]
        ret
g:                                      # @g
        movzx   eax, byte ptr [rdi]
        ret

https://godbolt.org/z/Ev8sbhb84

Expected behavior

The x86-64 psABI specifies that the unused bits within an object of _BitInt(N) type can take unspecified values:

The value of the unused bits beyond the width of the _BitInt(N) value but within the size of the _BitInt(N) are unspecified when stored in memory or register.

https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/3177443c4f5862d48f371d91ab36209f73cfe69c/x86-64-ABI/low-level-sys-info.tex#L297-299

This means that *ptr in both cases can have the object representation 0b00010000, representing the signed/unsigned _BitInt(4) value 0. But according to the generated code, this object representation is passed through as-is to the returned unsigned char in both cases, which translates to the return value 16.

As 0b00010000 is a valid object representation for representing the value 0 here, I would expect both functions to return 0 in this case.

In general this requires masking out the unused bits.

Notes

It looks like the current implementation assumes that valid object representations have all zeroes for the unused bits, and other representations are trap representations.

@royjacobson royjacobson added clang:frontend Language frontend issues, e.g. anything involving "Sema" miscompilation ABI Application Binary Interface and removed new issue labels Apr 10, 2023
@llvmbot
Member

llvmbot commented Apr 10, 2023

@llvm/issue-subscribers-clang-frontend

@royjacobson royjacobson added backend:X86 and removed clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 10, 2023
@llvmbot
Member

llvmbot commented Apr 10, 2023

@llvm/issue-subscribers-backend-x86

@royjacobson
Contributor

This looks to me like a backend bug. The IR we output is (https://godbolt.org/z/3bTnhaqbf):

define dso_local zeroext i8 @f(ptr nocapture noundef readonly %0) local_unnamed_addr #0 {
  %2 = load i4, ptr %0, align 1, !tbaa !6
  %3 = zext i4 %2 to i8
  ret i8 %3
}

and apparently the zext i4 %2 to i8 is optimized out?

@royjacobson
Contributor

@royjacobson
Contributor

cc @AaronBallman - If I understand the discourse thread above correctly, the psABI _BitInt might be currently unimplementable in LLVM?

@AaronBallman
Collaborator

Errrrr, I hope that's not the case, but it sure looks like LLVM is doing unexpected things by removing that zext (I think that might be an LLVM bug -- we're expecting non-multiple-of-8-bit integer types to Just Work from the frontend perspective).

@AaronBallman
Collaborator

CC @FreddyLeaf @phoebewang as they know far, far more about the x86 backend than I do.

@nikic
Contributor

nikic commented Apr 11, 2023

Yes, it looks like psABI is incompatible with LangRef:

When loading a value of a type like i20 with a size that is not an integral number of bytes, the result is undefined if the value was not originally written using a store of the same type.

When writing a value of a type like i20 with a size that is not an integral number of bytes, it is unspecified what happens to the extra bits that do not belong to the type, but they will typically be overwritten.

Non-byte-sized types in LLVM do not have a specified ABI, just that loads and stores have consistent semantics. In practice, those semantics are to zero-extend values to the byte-sized type (but with caveats due to store elision).

We would need to either change LangRef and backend lowerings, or make the frontend lower non-byte-sized BitInt stores to the next byte-sized width, the same as is done for _Bool.

@bjope
Collaborator

bjope commented Apr 19, 2023

We would need to either change LangRef and backend lowerings, or make the frontend lower non-byte-sized BitInt stores to the next byte-sized width, the same as is done for _Bool.

Isn't it loads that are the problem here rather than stores?
Stores normally zero extend when padding to a larger type (at least in the target generic part of SelectionDAG). And for the x86-64 psABI that isn't even a requirement. I.e. stores fulfil the psABI.

IIUC, the problem described here is if some outside caller (not compiled by LLVM, or for example hand-written asm) isn't zero-extending before storing the bitint type to memory. Then a function compiled with LLVM currently assumes that the padding bits are zero when loading the bitint from memory, while there is no such guarantee according to the psABI.

@bjope
Collaborator

bjope commented Nov 23, 2023

I guess there isn't really any news about this?

Trying to figure out whether our downstream ABI should diverge from the standard and say that padding must be filled with zeroes when storing a _BitInt to memory (rather than saying that the bits are unspecified). That feels a bit weird, but as long as LLVM assumes that the padding bits are zero when loading a _BitInt, it is scary to say anything else.

I'm not sure how many places in LLVM actually make such assumptions. The one I'm aware of is in SelectionDAGLegalize::LegalizeLoadOps, where there is this comment about it:

    // The extra bits are guaranteed to be zero, since we stored them that
    // way.  A zext load from NVT thus automatically gives zext from SrcVT.

So maybe SelectionDAGLegalize::LegalizeLoadOps shouldn't make that assumption, and should instead make sure to mask away the relevant bits.

And then we might want to try to find places where we unnecessarily zero-extend before stores, and maybe have some special mode where we instead write garbage to the padding, to help detect whether there are more places that assume the padding bits are zero when loading.

@AaronBallman
Collaborator

I've reached out to someone from the psABI team here at Intel for comments but they're on sabbatical for another few days, so it might be a bit before I hear anything concrete.

@AaronBallman
Collaborator

I heard back from @hjl-tools and he said:

This is a very old issue, not just related to _BitInt:
https://groups.google.com/g/ia32-abi/c/9H4BBrIdkmk/m/sjWw06ZPnS4J

When psABI says the unused bits are undefined, compiler is free to put any values in the unused field,
including 0. However, compiler shouldn’t assume the value of the unused bits are always 0. LLVM
producer behavior is psABI compliant. Just make sure that LLVM consumer also follows the psABI.

If zeroing the unused bits results in better code, we may change the psABI to specify that
the unused bits should be 0.

@hjl-tools
Contributor

If we want to make a psABI change, please don't mention LLVM/clang. Just make the proposal as a performance improvement issue.

@hjl-tools
Contributor

I opened: https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues/16

@bjope
Collaborator

bjope commented Feb 8, 2024

IIUC the padding bits become zero in memory (with LLVM) today, but that behavior is kind of implicit (at least when looking at the IR).

The IR would just say "store i5" and then it will be legalized at instruction selection into a byte-sized store. And LLVM currently writes zeroes to the padding bits in general (without really knowing whether the destination is a _BitInt or not). Similarly, a load would just be modelled as "load i5", and then there is a general assumption that the padding bits are zero when legalizing the load into a load of a whole number of bytes.

If the standard said explicitly that the padding bits should be zero, then I think (maybe) the LLVM IR should be explicit about that as well. So the IR for a 5-bit _BitInt store should probably be emitted like this by the frontend:

   %y = zext i5 %x to i8
   store i8 %y, ptr %p

Considering that such a zext is already added implicitly by the backend, it kind of looks like it wouldn't cost much. Although I'm not fully sure what the optimization pipeline would think about it (here I'm thinking about load->store forwarding optimizations etc., but also cost calculations for inlining/vectorization etc.).

Then maybe we should model the loads differently as well (making sure that we read full bytes). The LLVM IR doesn't really say how a non-byte-sized load works (i.e. how it is aligned within the occupied bytes in memory), so that is kind of a weakness as well. But if we, for example, were to do

   %x = load i8, ptr %p
   %y = trunc i8 %x to i5

then it could be hard to eliminate that trunc unless we could express that the loaded value has zeroes in the most significant bits. I think that we would need to add some way of annotating the load with such information.

So even if LLVM currently behaves as if the padding bits are zero for a _BitInt, the IR modelling that ensures such behavior is a bit weak. And making the modelling stronger, without impacting codegen, isn't trivial.

@andykaylor
Contributor

My understanding is that the front end is generally responsible for generating IR that will conform to the ABI of the target (with some basic expectations about how the backend will handle various constructs). In this case, I don't think the zext before the store is necessary for the ABI, so the front end shouldn't need to generate it (but it's OK if the backend puts it in there). For the load, I agree with @bjope's suggestion of generating an i8 load and then truncating to i4. My initial thought was to insert an explicit mask, but it looks like that is unnecessary as the optimizer inserts it where it makes sense based on the truncs: https://godbolt.org/z/n5nbY3onq

@nikic
Contributor

nikic commented Feb 15, 2024

@andykaylor LLVM will generally refuse to optimize IR that mixes memory operations of different non-byte sizes. The load and store sides need to be symmetric, either both being i4 with implicit ext/trunc, or both being i8 with explicit ext/trunc, but not a mix of both.

@andykaylor
Contributor

@nikic That makes sense. I was thinking of loads and stores in different places, but you're obviously right that the front end will need to extend the size so that we can handle cases where we can see the load and store together.

@Fznamznon
Contributor

Note, after #91364 it is https://godbolt.org/z/dnK37dsYj .

@cor3ntin cor3ntin closed this as completed Aug 7, 2024