
[clang] Use different memory layout type for _BitInt(N) in LLVM IR #91364


Merged
merged 27 commits into from
Jul 15, 2024

Conversation

Fznamznon
Contributor

@Fznamznon Fznamznon commented May 7, 2024

There are two problems with _BitInt prior to this patch:

  1. For at least some values of N, we cannot use LLVM's iN for the type of struct elements, array elements, allocas, global variables, and so on, because the LLVM layout for that type does not match the high-level layout of _BitInt(N).
    Example: Currently for i128:128 targets a correct implementation is possible either for __int128 or for _BitInt(129+) with lowering to iN, but not both, since we now have a correct implementation of __int128 in place after a21abc7.
    When this happens, opaque [M x i8] types are used, where M = sizeof(_BitInt(N)).
  2. LLVM doesn't guarantee any particular extension behavior for integer types whose width isn't a multiple of 8. For this reason, all _BitInt types now have an in-memory representation that is a whole number of bytes. For example, _BitInt(17) now has memory layout type i32.

This patch also introduces the concept of a load/store type and adds an API to CodeGenTypes that returns the IR type that should be used for load and store operations. This is particularly useful when a _BitInt ends up with an array of bytes as its memory layout type. For _BitInt(N), let M = sizeof(_BitInt(N)) and BITS = M * 8. Loads and stores of iBITS both (1) produce far better code from the backends and (2) are far more optimizable by IR passes than loads and stores of [M x i8].
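To make the size relationships concrete, here is a small, hedged sketch (hypothetical helpers, not clang's actual CodeGenTypes API) of how the memory layout size M and the load/store width BITS could be computed for an x86-64-like target where _BitInt wider than 64 bits is padded out to 64-bit chunks:

```c
#include <assert.h>

/* Hypothetical helpers, not clang's API: compute the byte size M of the
 * [M x i8] memory layout type for _BitInt(N) on an x86-64-like target. */
static unsigned bitint_mem_bytes(unsigned n) {
    /* N <= 64: smallest of the 1-, 2-, 4-, 8-byte integer types that fits. */
    if (n <= 8)  return 1;
    if (n <= 16) return 2;
    if (n <= 32) return 4;
    if (n <= 64) return 8;
    /* N > 64: padded out to whole 64-bit chunks. */
    return ((n + 63) / 64) * 8;
}

/* Width BITS of the iBITS type used for loads and stores. */
static unsigned bitint_loadstore_bits(unsigned n) {
    return bitint_mem_bytes(n) * 8;
}
```

Under these assumed rules, _BitInt(17) gets a 4-byte representation (load/store type i32), _BitInt(129) gets 24 bytes ([24 x i8], loaded and stored as i192), and _BitInt(257) gets 40 bytes, matching the [40 x i8] that appears in the tests below.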

Fixes #85139
Fixes #83419

Currently for i128:128 targets a correct implementation is possible either for
__int128 or for _BitInt(129+) with lowering to iN, but not both. Since we now
have a correct implementation of __int128, this patch attempts to fix codegen
issues by lowering _BitInt(129+) types to an array of i8 for "memory",
similarly to how it is done for bools now.

Fixes llvm#85139
Fixes llvm#83419
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels May 7, 2024
@llvmbot
Member

llvmbot commented May 7, 2024

@llvm/pr-subscribers-hlsl
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Mariya Podchishchaeva (Fznamznon)

Changes

Currently for i128:128 targets a correct implementation is possible either for __int128 or for _BitInt(129+) with lowering to iN, but not both. Since we now have a correct implementation of __int128 in place after a21abc7, this patch attempts to fix codegen issues by lowering _BitInt(129+) types to an array of i8 for "memory", similarly to how it is done for bools now.

Fixes #85139
Fixes #83419


Full diff: https://github.com/llvm/llvm-project/pull/91364.diff

6 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+8)
  • (modified) clang/lib/CodeGen/CGExprConstant.cpp (+12)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+7)
  • (modified) clang/lib/CodeGen/CodeGenTypes.cpp (+6)
  • (modified) clang/test/CodeGen/ext-int-cc.c (+1-1)
  • (modified) clang/test/CodeGen/ext-int.c (+93-4)
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index d96c7bb1e568..7e631e469a88 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1989,6 +1989,14 @@ llvm::Value *CodeGenFunction::EmitLoadOfScalar(Address Addr, bool Volatile,
     return EmitAtomicLoad(AtomicLValue, Loc).getScalarVal();
   }
 
+  if (const auto *BIT = Ty->getAs<BitIntType>()) {
+    if (BIT->getNumBits() > 128) {
+      // Long _BitInt has array of bytes as in-memory type.
+      llvm::Type *NewTy = ConvertType(Ty);
+      Addr = Addr.withElementType(NewTy);
+    }
+  }
+
   llvm::LoadInst *Load = Builder.CreateLoad(Addr, Volatile);
   if (isNontemporal) {
     llvm::MDNode *Node = llvm::MDNode::get(
diff --git a/clang/lib/CodeGen/CGExprConstant.cpp b/clang/lib/CodeGen/CGExprConstant.cpp
index 94962091116a..98ab1e23d128 100644
--- a/clang/lib/CodeGen/CGExprConstant.cpp
+++ b/clang/lib/CodeGen/CGExprConstant.cpp
@@ -1774,6 +1774,18 @@ llvm::Constant *ConstantEmitter::emitForMemory(CodeGenModule &CGM,
     return Res;
   }
 
+  if (const auto *BIT = destType->getAs<BitIntType>()) {
+    if (BIT->getNumBits() > 128) {
+      // Long _BitInt has array of bytes as in-memory type.
+      ConstantAggregateBuilder Builder(CGM);
+      llvm::Type *DesiredTy = CGM.getTypes().ConvertTypeForMem(destType);
+      auto *CI = cast<llvm::ConstantInt>(C);
+      llvm::APInt Value = CI->getValue();
+      Builder.addBits(Value, /*OffsetInBits=*/0, /*AllowOverwrite=*/false);
+      return Builder.build(DesiredTy, /*AllowOversized*/ false);
+    }
+  }
+
   return C;
 }
 
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index d84531959b50..717d47d20dea 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -5348,6 +5348,13 @@ Value *ScalarExprEmitter::VisitVAArgExpr(VAArgExpr *VE) {
     return llvm::UndefValue::get(ArgTy);
   }
 
+  if (const auto *BIT = Ty->getAs<BitIntType>()) {
+    if (BIT->getNumBits() > 128) {
+      // Long _BitInt has array of bytes as in-memory type.
+      ArgPtr = ArgPtr.withElementType(ArgTy);
+    }
+  }
+
   // FIXME Volatility.
   llvm::Value *Val = Builder.CreateLoad(ArgPtr);
 
diff --git a/clang/lib/CodeGen/CodeGenTypes.cpp b/clang/lib/CodeGen/CodeGenTypes.cpp
index e8d75eda029e..55c618677ddb 100644
--- a/clang/lib/CodeGen/CodeGenTypes.cpp
+++ b/clang/lib/CodeGen/CodeGenTypes.cpp
@@ -114,6 +114,12 @@ llvm::Type *CodeGenTypes::ConvertTypeForMem(QualType T, bool ForBitField) {
     return llvm::IntegerType::get(getLLVMContext(),
                                   (unsigned)Context.getTypeSize(T));
 
+  if (const auto *BIT = T->getAs<BitIntType>()) {
+    if (BIT->getNumBits() > 128)
+      R = llvm::ArrayType::get(CGM.Int8Ty,
+                               (unsigned)Context.getTypeSize(T) / 8);
+  }
+
   // Else, don't map it.
   return R;
 }
diff --git a/clang/test/CodeGen/ext-int-cc.c b/clang/test/CodeGen/ext-int-cc.c
index 001e866d34b4..83f20dcb0667 100644
--- a/clang/test/CodeGen/ext-int-cc.c
+++ b/clang/test/CodeGen/ext-int-cc.c
@@ -131,7 +131,7 @@ void ParamPassing3(_BitInt(15) a, _BitInt(31) b) {}
 // are negated. This will give an error when a target does support larger
 // _BitInt widths to alert us to enable the test.
 void ParamPassing4(_BitInt(129) a) {}
-// LIN64: define{{.*}} void @ParamPassing4(ptr byval(i129) align 8 %{{.+}})
+// LIN64: define{{.*}} void @ParamPassing4(ptr byval([24 x i8]) align 8 %{{.+}})
 // WIN64: define dso_local void @ParamPassing4(ptr %{{.+}})
 // LIN32: define{{.*}} void @ParamPassing4(ptr %{{.+}})
 // WIN32: define dso_local void @ParamPassing4(ptr %{{.+}})
diff --git a/clang/test/CodeGen/ext-int.c b/clang/test/CodeGen/ext-int.c
index 4cb399d108f2..a6a632bd985d 100644
--- a/clang/test/CodeGen/ext-int.c
+++ b/clang/test/CodeGen/ext-int.c
@@ -1,12 +1,19 @@
-// RUN: %clang_cc1 -triple x86_64-gnu-linux -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,CHECK64
-// RUN: %clang_cc1 -triple x86_64-windows-pc -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,CHECK64
-// RUN: %clang_cc1 -triple i386-gnu-linux -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,LIN32
-// RUN: %clang_cc1 -triple i386-windows-pc -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,WIN32
+// RUN: %clang_cc1 -std=c23 -triple x86_64-gnu-linux -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,CHECK64
+// RUN: %clang_cc1 -std=c23 -triple x86_64-windows-pc -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,CHECK64
+// RUN: %clang_cc1 -std=c23 -triple i386-gnu-linux -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,LIN32
+// RUN: %clang_cc1 -std=c23 -triple i386-windows-pc -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,WIN32
+
+// CHECK64: %struct.S1 = type { i17, [4 x i8], [24 x i8] }
+// CHECK64: %struct.S2 = type { [40 x i8], i32, [4 x i8] }
 
 //GH62207
 unsigned _BitInt(1) GlobSize1 = 0;
 // CHECK: @GlobSize1 = {{.*}}global i1 false
 
+// CHECK64: @__const.foo.A = private unnamed_addr constant { i17, [4 x i8], <{ i8, [23 x i8] }> } { i17 1, [4 x i8] undef, <{ i8, [23 x i8] }> <{ i8 -86, [23 x i8] zeroinitializer }> }, align 8
+// CHECK64: @BigGlob = {{.*}}global <{ i8, i8, [38 x i8] }> <{ i8 -68, i8 2, [38 x i8] zeroinitializer }>, align 8
+// CHECK64: @f.p = internal global <{ i8, i8, [22 x i8] }> <{ i8 16, i8 39, [22 x i8] zeroinitializer }>, align 8
+
 void GenericTest(_BitInt(3) a, unsigned _BitInt(3) b, _BitInt(4) c) {
   // CHECK: define {{.*}}void @GenericTest
   int which = _Generic(a, _BitInt(3): 1, unsigned _BitInt(3) : 2, _BitInt(4) : 3);
@@ -62,3 +69,85 @@ void Size1ExtIntParam(unsigned _BitInt(1) A) {
   // CHECK: store i1 %[[PARAM_LOAD]], ptr %[[IDX]]
   B[2] = A;
 }
+
+#if __BITINT_MAXWIDTH__ > 128
+struct S1 {
+  _BitInt(17) A;
+  _BitInt(129) B;
+};
+
+int foo(int a) {
+  // CHECK64: %A1 = getelementptr inbounds %struct.S1, ptr %B, i32 0, i32 0
+  // CHECK64: store i17 1, ptr %A1, align 8
+  // CHECK64: %B2 = getelementptr inbounds %struct.S1, ptr %B, i32 0, i32 2
+  // CHECK64: %0 = load i32, ptr %a.addr, align 4
+  // CHECK64: %conv = sext i32 %0 to i129
+  // CHECK64: store i129 %conv, ptr %B2, align 8
+  // CHECK64: %B3 = getelementptr inbounds %struct.S1, ptr %A, i32 0, i32 2
+  // CHECK64: %1 = load i129, ptr %B3, align 8
+  // CHECK64: %conv4 = trunc i129 %1 to i32
+  // CHECK64: %B5 = getelementptr inbounds %struct.S1, ptr %B, i32 0, i32 2
+  // CHECK64: %2 = load i129, ptr %B5, align 8
+  struct S1 A = {1, 170};
+  struct S1 B = {1, a};
+  return (int)A.B + (int)B.B;
+}
+
+struct S2 {
+  _BitInt(257) A;
+  int B;
+};
+
+_BitInt(257) bar() {
+  // CHECK64: define {{.*}}void @bar(ptr {{.*}} sret([40 x i8]) align 8 %[[RET:.+]])
+  // CHECK64: %A = alloca %struct.S2, align 8
+  // CHECK64: %0 = getelementptr inbounds { <{ i8, [39 x i8] }>, i32, [4 x i8] }, ptr %A, i32 0, i32 0
+  // CHECK64: %1 = getelementptr inbounds <{ i8, [39 x i8] }>, ptr %0, i32 0, i32 0
+  // CHECK64: store i8 1, ptr %1, align 8
+  // CHECK64: %2 = getelementptr inbounds { <{ i8, [39 x i8] }>, i32, [4 x i8] }, ptr %A, i32 0, i32 1
+  // CHECK64: store i32 10000, ptr %2, align 8
+  // CHECK64: %A1 = getelementptr inbounds %struct.S2, ptr %A, i32 0, i32 0
+  // CHECK64: %3 = load i257, ptr %A1, align 8
+  // CHECK64: store i257 %3, ptr %[[RET]], align 8
+  struct S2 A = {1, 10000};
+  return A.A;
+}
+
+void TakesVarargs(int i, ...) {
+  // CHECK64: define{{.*}} void @TakesVarargs(i32
+__builtin_va_list args;
+__builtin_va_start(args, i);
+
+_BitInt(160) A = __builtin_va_arg(args, _BitInt(160));
+  // CHECK64: %[[ARG:.+]] = load i160
+  // CHECK64: store i160 %[[ARG]], ptr %A, align 8
+}
+
+_BitInt(129) *f1(_BitInt(129) *p) {
+  // CHECK64: getelementptr inbounds [24 x i8], {{.*}} i64 1
+  return p + 1;
+}
+
+char *f2(char *p) {
+  // CHECK64: getelementptr inbounds i8, {{.*}} i64 24
+  return p + sizeof(_BitInt(129));
+}
+
+auto BigGlob = (_BitInt(257))700;
+// CHECK64: define {{.*}}void @foobar(ptr {{.*}} sret([40 x i8]) align 8 %[[RET1:.+]])
+_BitInt(257) foobar() {
+  // CHECK64: %A = alloca [40 x i8], align 8
+  // CHECK64: %0 = load i257, ptr @BigGlob, align 8
+  // CHECK64: %add = add nsw i257 %0, 1
+  // CHECK64: store i257 %add, ptr %A, align 8
+  // CHECK64: %1 = load i257, ptr %A, align 8
+  // CHECK64: store i257 %1, ptr %[[RET1]], align 8
+  _BitInt(257) A = BigGlob + 1;
+  return A;
+}
+
+void f() {
+  static _BitInt(130) p = {10000};
+}
+
+#endif

@Fznamznon Fznamznon requested a review from hvdijk May 7, 2024 17:46
Collaborator

@erichkeane erichkeane left a comment


This is unfortunate, and will likely result in the FPGAs needing to generate extra bits here, so this is somewhat harmful in that regard.

It seems to me this is a case where we're trying to work around an LLVM bug? Should we just be fixing that instead?

Collaborator

@efriedma-quic efriedma-quic left a comment


Maybe add a helper somewhere to check "is this type a bitint wider than 128 bits"?

// Long _BitInt has array of bytes as in-memory type.
ConstantAggregateBuilder Builder(CGM);
llvm::Type *DesiredTy = CGM.getTypes().ConvertTypeForMem(destType);
auto *CI = cast<llvm::ConstantInt>(C);
Collaborator


I'm not sure this cast is guaranteed to succeed? At least in some cases, we emit constant expressions involving a ptrtoint. Maybe at the widths in question, that can't happen, but this deserves a comment explaining what's going on.

Contributor Author


I've added a comment. I'm not able to get a ptrtoint in a constant expression involving a big _BitInt.

Collaborator

@momchil-velikov momchil-velikov Jul 9, 2024


How about a "small" _BitInt? The comment starts

// LLVM type doesn't match AST type only for big enough _BitInts,

and for AArch32 and AArch64 we are going to have non-matching LLVM types even for "small" _BitInts: for AArch32 because the ABI wants the padding bits of the in-memory representation to contain zero or the sign bit, and for both because we'd like to emit loads/stores in bigger chunks, e.g. i17 becomes a single i32 load/store, as opposed to two separate accesses to i16 and i8.

Contributor


The test case here is just going to be something like _SomeSplitBitIntType x = (unsigned long) &someVariable;. What code do we actually produce for this? Sometimes we'll be able to fall back on dynamic initialization, but that's not always an option.

Ideally, it's just invalid to do something like that. It certainly needs to be diagnosed if the integer type is narrower than the pointer, and wider is also problematic, although less so and in a different way.

Contributor Author


Well, it seems it doesn't depend on the size of the _BitInt. Using something like _SomeSplitBitIntType x = (unsigned long) &someVariable; either fails if I, for example, apply constexpr, or falls back on dynamic initialization. So I changed the comment to make it more generic.

@efriedma-quic
Collaborator

It seems to me this is a case where we're trying to work around an LLVM bug? Should we just be fixing that instead?

You mean, revert https://reviews.llvm.org/D86310 ? Making any changes in LLVM here is painful; I'd rather not revisit that. CC @hvdijk @rnk

@erichkeane
Collaborator

It seems to me this is a case where we're trying to work around an LLVM bug? Should we just be fixing that instead?

You mean, revert https://reviews.llvm.org/D86310 ? Making any changes in LLVM here is painful; I'd rather not revisit that. CC @hvdijk @rnk

I didn't, no, but I hadn't seen all that conversation.

Aaron has explained a bit more of the context here, and I'm finding myself pretty confused/out of the loop. As this is effectively all codegen, I suspect you plus your CCs are the best ones to review this. I don't see a problem with this except for the FPGA folks, though between:

1. FPGA folks rarely if ever using large types like this if they can help it,
2. the FPGA group being spun off from Intel, meaning the original stakeholders are all gone,
and 3. me no longer being at Intel,

I don't think I have strong feelings here.

@efriedma-quic
Collaborator

I don't think FPGA folks will run into any practical issue with this; it only impacts the in-memory types, and backends shouldn't really be using in-memory types for anything anyway.

@@ -1989,6 +1989,14 @@ llvm::Value *CodeGenFunction::EmitLoadOfScalar(Address Addr, bool Volatile,
return EmitAtomicLoad(AtomicLValue, Loc).getScalarVal();
}

if (const auto *BIT = Ty->getAs<BitIntType>()) {
if (BIT->getNumBits() > 128) {
Contributor


For a number of bits >64, <=128, LLVM's iN type will have identical representation to Clang _BitInt(N) but different alignment. I think this is fine, I think nothing needs their alignment to match Clang's, but could you double-check to make sure you agree?

Contributor Author


These types remain unchanged.

@hvdijk
Contributor

hvdijk commented May 7, 2024

Thanks for doing this, it's unfortunate that Clang is in a rather broken state with these types right now and it will be good to see improvement. I think the approach you're taking here is the only approach that will work.

@rnk
Collaborator

rnk commented May 7, 2024

I played with the idea of using LLVM packed structs (<{ i129 }>) to represent something like this, but they don't work the way I expected them to: https://godbolt.org/z/M6hMYYhax

LLVM DataLayout's idea of sizeof(i129) is still rounded up from 17 bytes to 32 bytes.

Using byte arrays for the in-memory type should work, so it's probably the best path forward.
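The rounding rnk observed can be sketched in C (a hedged model of DataLayout-style sizing on an x86-64-like target, not LLVM's actual implementation): the store size of iN is N rounded up to whole bytes, and the alloc size rounds that up to the type's ABI alignment, assumed here to be the next power of two capped at 16 bytes.

```c
#include <assert.h>

/* Store size of iN in bytes: N rounded up to whole bytes. */
static unsigned istore_size(unsigned n) { return (n + 7) / 8; }

/* Assumed ABI alignment in bytes: next power of two that covers N bits,
 * capped at 16 bytes (a model of an x86-64-like data layout). */
static unsigned iabi_align(unsigned n) {
    unsigned a = 1;
    while (a * 8 < n && a < 16)
        a *= 2;
    return a;
}

/* Alloc size: store size rounded up to the ABI alignment. */
static unsigned ialloc_size(unsigned n) {
    unsigned s = istore_size(n), a = iabi_align(n);
    return (s + a - 1) / a * a;
}
```

Under this model, i129 has a 17-byte store size but a 32-byte alloc size, matching the 17-to-32 rounding noted above, and i96 gets a 16-byte alloc size as mentioned later in the review.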

@rjmccall
Contributor

rjmccall commented May 7, 2024

Hmm. I think this is actually pretty different from the bool pattern. Suppose we're talking about _BitInt(N). Let BYTES := sizeof(_BitInt(N)), and let BITS := BYTES * 8.

The problem being presented here is this:

  1. For at least some values of N, we cannot use LLVM's iN for the type of struct elements, array elements, allocas, global variables, and so on, because the LLVM layout for that type does not match the high-level layout of _BitInt(N). The only available type that does match the memory layout appears to be [BYTES x i8].

However, it doesn't follow from the need to use [BYTES x i8] for memory layout that we have to use [BYTES x i8] for loads and stores. IIUC, loads and stores of both iN and iBITS are in fact required to only touch BYTES bytes and so should be valid. It is near-certain that loads and stores of either of those types would both (1) produce far better code from the backends and (2) be far more optimizable by IR passes than loads and stores of [BYTES x i8].

bool does run into (1) because of targets like PPC where sizeof(bool) == 4. However, we still use i8 as the in-memory type for bool on other targets. Partly, this is to discourage portability bugs where people write IR-gen code that doesn't handle the PPC pattern. But IIRC the main reason is actually to solve this other problem:

  2. LLVM doesn't guarantee any particular extension behavior for integer types that aren't a multiple of 8, but ABIs do generally require objects of type bool to have all bits valid.

I expect that problem (2) also applies to _BitInt.

The upshot is that code like _BitInt(129) x = v; needs to be emitted something like this:

  %x = alloca [24 x i8]      # assuming for the sake of argument that sizeof(_BitInt(129)) == 24
  %storedv = sext i129 %v to i192  # or zext depending on signedness
  store i192 %storedv, ptr %x

Edit: I originally defined BYTES as ceil(N/8), but it clearly has to be sizeof(_BitInt(N)), and I expect the ABI expects extension out to that size as well.

@rjmccall
Contributor

rjmccall commented May 7, 2024

If you want to do things that way, you will need to (1) generalize CodeGenTypes with a new API that will return this load/store type when applicable and (2) look at all the places we call ConvertTypeForMem, EmitToMemory, and EmitFromMemory to make sure they do the right things.

You definitely should not be hard-coding 128 in a bunch of places. The load/store type should always be iBITS, and the memory type should either be iBITS or [BYTES x i8] depending on whether the former has the right layout characteristics in the LLVM data layout.

@efriedma-quic
Collaborator

You're suggesting we should fork ConvertTypeForMem into two functions? So there are actually three types: the "register" type, the "load/store" type, and the "in-memory" type. I guess that makes sense from a theoretical perspective, but... as a practical matter, I'm not sure how many places need to call the proposed "ConvertTypeForLoadStore".

In EmitLoadOfScalar(), instead of checking for BitInt, you just unconditionally do Addr = Addr.withElementType(ConvertTypeForLoadStore(Ty));. Logical cleanup, I guess. In EmitStoreOfScalar, you don't really need the interface because you can assume the result of EmitToMemory() has the load/store type. And then... what else calls it?

@rjmccall
Contributor

rjmccall commented May 7, 2024

My experience is that compiler writers are really good at hacking in special cases to make their test cases work and really bad at recognizing that their case isn't as special as they think. There are three types already called out for special treatment in ConvertTypeForMem, of which two are handled in EmitFromMemory and only one is handled in EmitToMemory. I want to set up a pattern that the next person with this sort of problem can follow. It doesn't have to be exactly what I suggested above, but it should be a real pattern.

@Fznamznon
Contributor Author

Thank you everyone for the feedback. I'm working on applying.

if (const auto *BIT = Ty->getAs<BitIntType>()) {
if (BIT->getNumBits() > 128) {
// Long _BitInt has array of bytes as in-memory type.
llvm::Type *NewTy = ConvertType(Ty);
Collaborator


Shouldn't we be calling ConvertTypeForMem here?

Contributor Author

@Fznamznon Fznamznon May 29, 2024


The idea was to load not the array but iN, so ConvertType here was intentional. However, I'm updating this patch soon; it will use a special load/store type whose idea is described in #91364 (comment).

Collaborator


Oh, I see. It looks close to what we are trying to do with #93495, which is:

  • create in-memory representations according to the target ABI
  • improve efficiency of loads/stores, e.g. a load/store of i18 in LLVM must touch just 3 bytes, so a compiler would emit one 16-bit access and one 8-bit access, but if i18 comes from _BitInt(18) then a single 32-bit access would work better.
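The access-splitting point can be illustrated with a small, hedged C sketch (hypothetical helper names, little-endian host assumed): storing an 18-bit value with accesses that touch only its 3 significant bytes versus one 4-byte access of the padded i32 representation.

```c
#include <stdint.h>
#include <string.h>

/* Store an 18-bit value v into buffer p touching only 3 bytes:
 * one 16-bit access plus one 8-bit access (little-endian layout). */
static void store_i18_exact(unsigned char *p, uint32_t v) {
    uint16_t lo = (uint16_t)(v & 0xFFFF);
    memcpy(p, &lo, 2);                        /* bytes 0..1 */
    p[2] = (unsigned char)((v >> 16) & 0x3);  /* bits 16..17 */
}

/* Store the same value as a single 32-bit access of the padded type. */
static void store_i18_padded(unsigned char *p, uint32_t v) {
    uint32_t w = v & 0x3FFFF;   /* zero out the padding bits 18..31 */
    memcpy(p, &w, 4);           /* bytes 0..3, one access */
}
```

Both variants leave the same value in the 3 significant bytes; the padded variant additionally writes a defined value into the fourth byte, which is exactly the extension-behavior question discussed in this thread.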

Contributor Author


This patch was mostly intended to fix codegen issues with big _BitInt types (>128 bits for 64-bit targets); however, I'm adding the new idea of a load/store type, so that seems close.

@llvmbot llvmbot added HLSL HLSL Language Support clang:openmp OpenMP related changes to Clang labels May 29, 2024

github-actions bot commented May 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@rjmccall rjmccall left a comment


This is generally looking great, and I think it's ready to go as soon as you can finish the tests. (You said you weren't able to update all the tests — did you have questions about the remaining tests?)

I did have a thought, though. Are we confident that the in-memory layout that LLVM is using for these large integer types matches the layout specified by the ABI? I know this patch makes the overall sizes match, but there's also an endianness question. When LLVM stores an i96, I assume it always stores them using the overall endianness of the target; for example, on i386, it might do three 32-bit stores with the low 32 bits at offset 0, the middle 32 bits at offset 4, and the high 32 bits at offset 8. I just want to make sure that the ABI specification for _BitInt always matches that. In particular, I'm worried that it might do some middle-endian thing where it breaks the integer into chunks and then stores those chunks in little-endian order even on a big-endian machine. (That is generally the right thing to do for BigInt types because most arithmetic operations access the chunks in little-endian order, and doing adjacent memory accesses in increasing order is generally more architecture-friendly.)

@@ -107,17 +107,52 @@ llvm::Type *CodeGenTypes::ConvertTypeForMem(QualType T, bool ForBitField) {
return llvm::IntegerType::get(FixedVT->getContext(), BytePadded);
}

// If this is a bool type, or a bit-precise integer type in a bitfield
// representation, map this integer to the target-specified size.
Contributor


Let's keep this comment; we just need to update it a little:

  // If T is _Bool or a _BitInt type, ConvertType will produce an IR type
  // with the exact semantic bit-width of the AST type; for example,
  // _BitInt(17) will turn into i17.  In memory, however, we need to store
  // such values extended to their full storage size as decided by AST
  // layout; this is an ABI requirement.  Ideally, we would always use an
  // integer type that's just the bit-size of the AST type; for example, if
  // sizeof(_BitInt(17)) == 4, _BitInt(17) would turn into i32.  That is what's
  // returned by convertTypeForLoadStore.  However, that type does not
  // always satisfy the size requirement on memory representation types
  // described above.  For example, a 32-bit platform might reasonably set
  // sizeof(_BitInt(65)) == 12, but i96 is likely to have an alloc size
  // of 16 bytes in the LLVM data layout.  In these cases, we simply return
  // a byte array of the appropriate size.

Contributor Author


Added, thanks.

@AaronBallman
Collaborator

This is generally looking great, and I think it's ready to go as soon as you can finish the tests. (You said you weren't able to update all the tests — did you have questions about the remaining tests?)

I did have a thought, though. Are we confident that the in-memory layout that LLVM is using for these large integer types matches the layout specified by the ABI? I know this patch makes the overall sizes match, but there's also an endianness question. When LLVM stores an i96, I assume it always stores them using the overall endianness of the target; for example, on i386, it might do three 32-bit stores with the low 32 bits at offset 0, the middle 32 bits at offset 4, and the high 32 bits at offset 8. I just want to make sure that the ABI specification for _BitInt always matches that. In particular, I'm worried that it might do some middle-endian thing where it breaks the integer into chunks and then stores those chunks in little-endian order even on a big-endian machine. (That is generally the right thing to do for BigInt types because most arithmetic operations access the chunks in little-endian order, and doing adjacent memory accesses in increasing order is generally more architecture-friendly.)

FWIW, I was chasing down ABI documents yesterday, and found:

x86-64 (https://gitlab.com/x86-psABIs/x86-64-ABI):

• _BitInt(N) types are signed by default, and unsigned _BitInt(N) types are unsigned.
• _BitInt(N) types are stored in little-endian order in memory. Bits in each byte are allocated from right to left.
• For N <= 64, they have the same size and alignment as the smallest of (signed and unsigned) char, short, int, long and long long types that can contain them.
• For N > 64, they are treated as a struct of 64-bit integer chunks. The number of chunks is the smallest number that can contain the type. _BitInt(N) types are byte-aligned to 64 bits. The size of these types is the smallest multiple of the 64-bit chunks greater than or equal to N.
• The value of the unused bits beyond the width of the _BitInt(N) value but within the size of the _BitInt(N) are unspecified when stored in memory or register.

ARM 32-bit (https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst):

• _BitInt(N <= 64): smallest of the signed Fundamental Integral Data Types where byte-size*8 >= N. C2x only. Significant bits are allocated from the least significant end of the Machine Type. Non-significant bits within the Machine Type are sign-extended.
• unsigned _BitInt(N <= 64): smallest of the unsigned Fundamental Integral Data Types where byte-size*8 >= N. C2x only. Significant bits are allocated from the least significant end of the Machine Type. Non-significant bits within the Machine Type are zero-extended.
• _BitInt(N > 64): allocated as if a uint64_t[M] array where M*64 >= N; the last element contains the sign bit. C2x only. Significant bits are allocated from the least significant end of the Machine Type. The lower-addressed double-word contains the least significant bits of the type on a little-endian view and the most significant bits on a big-endian view. Non-significant bits within the last double-word are sign-extended.
• unsigned _BitInt(N > 64): allocated as if a uint64_t[M] array where M*64 >= N. C2x only. Significant bits are allocated from the least significant end of the Machine Type. The lower-addressed double-word contains the least significant bits of the type on a little-endian view and the most significant bits on a big-endian view. Non-significant bits within the last double-word are zero-extended.

ARM 64-bit (https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst):

| Type | Representation | Notes |
|------|----------------|-------|
| `_BitInt(N <= 128)` | Smallest of the signed Fundamental Integral Data Types where byte-size*8 >= N. | C2x only. Significant bits are allocated from the least significant end of the Machine Type. Non-significant bits within the Machine Type are unspecified. |
| `unsigned _BitInt(N <= 128)` | Smallest of the unsigned Fundamental Integral Data Types where byte-size*8 >= N. | C2x only. Significant bits are allocated from the least significant end of the Machine Type. Non-significant bits within the Machine Type are unspecified. |
| `_BitInt(N > 128)` | Mapped as if an unsigned __int128[M] array where M*128 >= N. The last element contains the sign bit. | C2x only. Significant bits are allocated from the least significant end of the Machine Type. The lower-addressed quad-word contains the least significant bits of the type in a little-endian view and the most significant bits in a big-endian view. Non-significant bits within the last quad-word are unspecified. |
| `unsigned _BitInt(N > 128)` | Mapped as if an unsigned __int128[M] array where M*128 >= N. | C2x only. Significant bits are allocated from the least significant end of the Machine Type. The lower-addressed quad-word contains the least significant bits of the type in a little-endian view and the most significant bits in a big-endian view. Non-significant bits within the last quad-word are unspecified. |
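The AAPCS64 rules follow the same shape with 128-bit units. A sketch of the arithmetic (alignments inferred from the "as if unsigned __int128[M]" layout, not stated verbatim in the quote):

```python
def aarch64_bitint_layout(n: int) -> tuple[int, int]:
    """Return (size, align) in bytes for _BitInt(n) per the quoted AAPCS64 rules."""
    if n <= 128:
        # Smallest fundamental integral type whose width covers n bits.
        for size in (1, 2, 4, 8, 16):
            if size * 8 >= n:
                return size, size
    # Mapped as if unsigned __int128[M] where M * 128 >= n.
    m = -(-n // 128)  # ceiling division
    return m * 16, 16

assert aarch64_bitint_layout(65) == (16, 16)   # fits a single __int128
assert aarch64_bitint_layout(129) == (32, 16)  # two quad-words
```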

The latest RISC-V, LoongArch, and CSKY ABI documents I could find do not mention _BitInt. I could not find a modern ABI document for PowerPC (power.org no longer appears to cover PowerPC), and the copy on the Internet Archive also does not mention _BitInt.

@Fznamznon Fznamznon requested a review from rjmccall July 11, 2024 16:07
@rjmccall
Contributor

Okay, so x86_64 describes it in byte terms and says they're little-endian, which is consistent with the overall target. Interestingly, it does not guarantee the content of the excess bits. The code-generation in this patch is consistent with that: the extension we do is unnecessary but allowed, and then we truncate it away after load. If we ever add some way to tell the backend that a truncation is known to be reversing a sign/zero-extension, we'll need to not set it on this target.

32-bit and 64-bit ARM describe it in terms of smaller units, but the units are expressly laid out according to the overall endianness of the target, which composes to mean that the bytes overall are also laid out according to that endianness.
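As a toy model of the scheme described above (not clang's actual codegen, just the reasoning): a widening store may write arbitrary excess bits, and a truncating load still recovers the original N-bit value.

```python
# Model: a signed _BitInt(N) stored in a BITS-wide container (BITS = sizeof * 8).
# `garbage` stands for whatever the ABI leaves in the excess bits.

def store(x: int, n: int, bits: int, garbage: int = 0) -> int:
    """Write x (a signed n-bit value) into a bits-wide container word."""
    word = x & ((1 << n) - 1)              # significant bits, two's complement
    word |= (garbage << n) & ((1 << bits) - 1)  # arbitrary excess-bit content
    return word

def load(word: int, n: int) -> int:
    """Truncate to n bits, then sign-extend back to a Python int."""
    word &= (1 << n) - 1
    if word >> (n - 1):                    # sign bit set
        word -= 1 << n
    return word

# The round-trip is correct no matter what the excess bits contain.
for garbage in (0, -1, 0x5A):
    assert load(store(-5, 17, 32, garbage), 17) == -5
```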

@rjmccall
Contributor

Given all that, I feel pretty comfortable relying on using LLVM's i96 stores and so on. I do worry some that we're eventually going to run into a target where the _BitInt ABI does not match what LLVM wants to generate for i96 load/store, but we should be able to generalize this so that targets can override the _BitInt operations pretty easily.

Contributor

@rjmccall rjmccall left a comment


LGTM

@nikic
Contributor

nikic commented Jul 12, 2024

> Okay, so x86_64 describes it in byte terms and says they're little-endian, which is consistent with the overall target. Interestingly, it does not guarantee the content of the excess bits. The code-generation in this patch is consistent with that: the extension we do is unnecessary but allowed, and then we truncate it away after load. If we ever add some way to tell the backend that a truncation is known to be reversing a sign/zero-extension, we'll need to not set it on this target.

FYI this already exists in the form of trunc nuw / trunc nsw. (Though it's not fully optimized yet.)
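A small model of the `trunc nsw` semantics as defined in the LLVM LangRef: the result is poison unless the truncated-away high bits all equal the result's sign bit, i.e. unless sign-extending the result reproduces the operand.

```python
POISON = object()  # sentinel standing in for LLVM's poison value

def trunc_nsw(value: int, src_bits: int, dst_bits: int):
    """Model of LLVM `trunc nsw` from src_bits to dst_bits."""
    src_mask = (1 << src_bits) - 1
    value &= src_mask
    result = value & ((1 << dst_bits) - 1)
    # Sign-extend the dst_bits result back up to src_bits.
    if result >> (dst_bits - 1):
        extended = (result - (1 << dst_bits)) & src_mask
    else:
        extended = result
    return result if extended == value else POISON

# -5 survives an i32 -> i17 trunc nsw; 0x12345 does not (it would flip sign).
assert trunc_nsw(-5 & 0xFFFFFFFF, 32, 17) == (-5 & 0x1FFFF)
assert trunc_nsw(0x12345, 32, 17) is POISON
```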

@Fznamznon Fznamznon changed the title [clang] Lower _BitInt(129+) to a different type in LLVM IR [clang] Use different memory layout type for _BitInt(N) in LLVM IR Jul 12, 2024
@momchil-velikov
Collaborator

This solves 5-6 issues we had downstream, many thanks!

@rjmccall
Contributor

> > Okay, so x86_64 describes it in byte terms and says they're little-endian, which is consistent with the overall target. Interestingly, it does not guarantee the content of the excess bits. The code-generation in this patch is consistent with that: the extension we do is unnecessary but allowed, and then we truncate it away after load. If we ever add some way to tell the backend that a truncation is known to be reversing a sign/zero-extension, we'll need to not set it on this target.
>
> FYI this already exists in the form of trunc nuw / trunc nsw. (Though it's not fully optimized yet.)

Ah, neat. Mariya, would you mind looking into setting this properly on the truncates we're doing here? It'd be fine to do that as a follow-up; no need to hold up this PR for it. You'll need some kind of target hook to tell us whether to set it or not. Probably that ought to go in the Basic TargetInfo just so all of the target-specific ABI configuration is done in one place.

@Fznamznon
Contributor Author

> Mariya, would you mind looking into setting this properly on the truncates we're doing here? It'd be fine to do that as a follow-up; no need to hold up this PR for it. You'll need some kind of target hook to tell us whether to set it or not. Probably that ought to go in the Basic TargetInfo just so all of the target-specific ABI configuration is done in one place.

Sure.

Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category HLSL HLSL Language Support