
[clang] Use different memory layout type for _BitInt(N) in LLVM IR #91364


Merged
merged 27 commits into from
Jul 15, 2024

Conversation

Fznamznon
Contributor

@Fznamznon Fznamznon commented May 7, 2024

There are two problems with _BitInt prior to this patch:

  1. For at least some values of N, we cannot use LLVM's iN for the type of struct elements, array elements, allocas, global variables, and so on, because the LLVM layout for that type does not match the high-level layout of _BitInt(N).
    Example: Currently for i128:128 targets a correct implementation is possible either for __int128 or for _BitInt(129+) with lowering to iN, but not both, since we now have a correct implementation of __int128 in place after a21abc7.
    When this happens, opaque [M x i8] types are used, where M = sizeof(_BitInt(N)).
  2. LLVM doesn't guarantee any particular extension behavior for integer types whose width isn't a multiple of 8. For this reason, all _BitInt types now have an in-memory representation that is a whole number of bytes. For example, _BitInt(17) now has memory layout type i32.

This patch also introduces the concept of a load/store type and adds an API to CodeGenTypes that returns the IR type that should be used for load and store operations. This is particularly useful when a _BitInt ends up with an array of bytes as its memory layout type. For _BitInt(N), let M = sizeof(_BitInt(N)) and BITS = M * 8. Loads and stores of iBITS both (1) produce far better code from the backends and (2) are far more optimizable by IR passes than loads and stores of [M x i8].
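To make the size relationships concrete, here is a small, hedged sketch (hypothetical helpers, not clang's actual CodeGenTypes API) of how the memory layout size M and the load/store width BITS could be computed for an x86-64-like target where _BitInt wider than 64 bits is padded out to 64-bit chunks:

```c
#include <assert.h>

/* Hypothetical helpers, not clang's API: compute the byte size M of the
 * [M x i8] memory layout type for _BitInt(N) on an x86-64-like target. */
static unsigned bitint_mem_bytes(unsigned n) {
    /* N <= 64: smallest of the 1-, 2-, 4-, 8-byte integer types that fits. */
    if (n <= 8)  return 1;
    if (n <= 16) return 2;
    if (n <= 32) return 4;
    if (n <= 64) return 8;
    /* N > 64: padded out to whole 64-bit chunks. */
    return ((n + 63) / 64) * 8;
}

/* Width BITS of the iBITS type used for loads and stores. */
static unsigned bitint_loadstore_bits(unsigned n) {
    return bitint_mem_bytes(n) * 8;
}
```

Under these assumed rules, _BitInt(17) gets a 4-byte representation (load/store type i32), _BitInt(129) gets 24 bytes ([24 x i8], loaded and stored as i192), and _BitInt(257) gets 40 bytes, matching the [40 x i8] that appears in the tests below.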

Fixes #85139
Fixes #83419

Currently for i128:128 targets a correct implementation is possible either for
__int128 or for _BitInt(129+) with lowering to iN, but not both. Since we now
have a correct implementation of __int128, this patch attempts to fix codegen
issues by lowering _BitInt(129+) types to an array of i8 for "memory",
similarly to how it is done for bools now.

Fixes llvm#85139
Fixes llvm#83419
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels May 7, 2024
@llvmbot
Member

llvmbot commented May 7, 2024

@llvm/pr-subscribers-hlsl
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Mariya Podchishchaeva (Fznamznon)

Changes

Currently for i128:128 targets a correct implementation is possible either for __int128 or for _BitInt(129+) with lowering to iN, but not both. Since we now have a correct implementation of __int128 in place after a21abc7, this patch attempts to fix codegen issues by lowering _BitInt(129+) types to an array of i8 for "memory", similarly to how it is done for bools now.

Fixes #85139
Fixes #83419


Full diff: https://github.com/llvm/llvm-project/pull/91364.diff

6 Files Affected:

  • (modified) clang/lib/CodeGen/CGExpr.cpp (+8)
  • (modified) clang/lib/CodeGen/CGExprConstant.cpp (+12)
  • (modified) clang/lib/CodeGen/CGExprScalar.cpp (+7)
  • (modified) clang/lib/CodeGen/CodeGenTypes.cpp (+6)
  • (modified) clang/test/CodeGen/ext-int-cc.c (+1-1)
  • (modified) clang/test/CodeGen/ext-int.c (+93-4)
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index d96c7bb1e568..7e631e469a88 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -1989,6 +1989,14 @@ llvm::Value *CodeGenFunction::EmitLoadOfScalar(Address Addr, bool Volatile,
     return EmitAtomicLoad(AtomicLValue, Loc).getScalarVal();
   }
 
+  if (const auto *BIT = Ty->getAs<BitIntType>()) {
+    if (BIT->getNumBits() > 128) {
+      // Long _BitInt has array of bytes as in-memory type.
+      llvm::Type *NewTy = ConvertType(Ty);
+      Addr = Addr.withElementType(NewTy);
+    }
+  }
+
   llvm::LoadInst *Load = Builder.CreateLoad(Addr, Volatile);
   if (isNontemporal) {
     llvm::MDNode *Node = llvm::MDNode::get(
diff --git a/clang/lib/CodeGen/CGExprConstant.cpp b/clang/lib/CodeGen/CGExprConstant.cpp
index 94962091116a..98ab1e23d128 100644
--- a/clang/lib/CodeGen/CGExprConstant.cpp
+++ b/clang/lib/CodeGen/CGExprConstant.cpp
@@ -1774,6 +1774,18 @@ llvm::Constant *ConstantEmitter::emitForMemory(CodeGenModule &CGM,
     return Res;
   }
 
+  if (const auto *BIT = destType->getAs<BitIntType>()) {
+    if (BIT->getNumBits() > 128) {
+      // Long _BitInt has array of bytes as in-memory type.
+      ConstantAggregateBuilder Builder(CGM);
+      llvm::Type *DesiredTy = CGM.getTypes().ConvertTypeForMem(destType);
+      auto *CI = cast<llvm::ConstantInt>(C);
+      llvm::APInt Value = CI->getValue();
+      Builder.addBits(Value, /*OffsetInBits=*/0, /*AllowOverwrite=*/false);
+      return Builder.build(DesiredTy, /*AllowOversized*/ false);
+    }
+  }
+
   return C;
 }
 
diff --git a/clang/lib/CodeGen/CGExprScalar.cpp b/clang/lib/CodeGen/CGExprScalar.cpp
index d84531959b50..717d47d20dea 100644
--- a/clang/lib/CodeGen/CGExprScalar.cpp
+++ b/clang/lib/CodeGen/CGExprScalar.cpp
@@ -5348,6 +5348,13 @@ Value *ScalarExprEmitter::VisitVAArgExpr(VAArgExpr *VE) {
     return llvm::UndefValue::get(ArgTy);
   }
 
+  if (const auto *BIT = Ty->getAs<BitIntType>()) {
+    if (BIT->getNumBits() > 128) {
+      // Long _BitInt has array of bytes as in-memory type.
+      ArgPtr = ArgPtr.withElementType(ArgTy);
+    }
+  }
+
   // FIXME Volatility.
   llvm::Value *Val = Builder.CreateLoad(ArgPtr);
 
diff --git a/clang/lib/CodeGen/CodeGenTypes.cpp b/clang/lib/CodeGen/CodeGenTypes.cpp
index e8d75eda029e..55c618677ddb 100644
--- a/clang/lib/CodeGen/CodeGenTypes.cpp
+++ b/clang/lib/CodeGen/CodeGenTypes.cpp
@@ -114,6 +114,12 @@ llvm::Type *CodeGenTypes::ConvertTypeForMem(QualType T, bool ForBitField) {
     return llvm::IntegerType::get(getLLVMContext(),
                                   (unsigned)Context.getTypeSize(T));
 
+  if (const auto *BIT = T->getAs<BitIntType>()) {
+    if (BIT->getNumBits() > 128)
+      R = llvm::ArrayType::get(CGM.Int8Ty,
+                               (unsigned)Context.getTypeSize(T) / 8);
+  }
+
   // Else, don't map it.
   return R;
 }
diff --git a/clang/test/CodeGen/ext-int-cc.c b/clang/test/CodeGen/ext-int-cc.c
index 001e866d34b4..83f20dcb0667 100644
--- a/clang/test/CodeGen/ext-int-cc.c
+++ b/clang/test/CodeGen/ext-int-cc.c
@@ -131,7 +131,7 @@ void ParamPassing3(_BitInt(15) a, _BitInt(31) b) {}
 // are negated. This will give an error when a target does support larger
 // _BitInt widths to alert us to enable the test.
 void ParamPassing4(_BitInt(129) a) {}
-// LIN64: define{{.*}} void @ParamPassing4(ptr byval(i129) align 8 %{{.+}})
+// LIN64: define{{.*}} void @ParamPassing4(ptr byval([24 x i8]) align 8 %{{.+}})
 // WIN64: define dso_local void @ParamPassing4(ptr %{{.+}})
 // LIN32: define{{.*}} void @ParamPassing4(ptr %{{.+}})
 // WIN32: define dso_local void @ParamPassing4(ptr %{{.+}})
diff --git a/clang/test/CodeGen/ext-int.c b/clang/test/CodeGen/ext-int.c
index 4cb399d108f2..a6a632bd985d 100644
--- a/clang/test/CodeGen/ext-int.c
+++ b/clang/test/CodeGen/ext-int.c
@@ -1,12 +1,19 @@
-// RUN: %clang_cc1 -triple x86_64-gnu-linux -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,CHECK64
-// RUN: %clang_cc1 -triple x86_64-windows-pc -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,CHECK64
-// RUN: %clang_cc1 -triple i386-gnu-linux -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,LIN32
-// RUN: %clang_cc1 -triple i386-windows-pc -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,WIN32
+// RUN: %clang_cc1 -std=c23 -triple x86_64-gnu-linux -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,CHECK64
+// RUN: %clang_cc1 -std=c23 -triple x86_64-windows-pc -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,CHECK64
+// RUN: %clang_cc1 -std=c23 -triple i386-gnu-linux -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,LIN32
+// RUN: %clang_cc1 -std=c23 -triple i386-windows-pc -O3 -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s --check-prefixes=CHECK,WIN32
+
+// CHECK64: %struct.S1 = type { i17, [4 x i8], [24 x i8] }
+// CHECK64: %struct.S2 = type { [40 x i8], i32, [4 x i8] }
 
 //GH62207
 unsigned _BitInt(1) GlobSize1 = 0;
 // CHECK: @GlobSize1 = {{.*}}global i1 false
 
+// CHECK64: @__const.foo.A = private unnamed_addr constant { i17, [4 x i8], <{ i8, [23 x i8] }> } { i17 1, [4 x i8] undef, <{ i8, [23 x i8] }> <{ i8 -86, [23 x i8] zeroinitializer }> }, align 8
+// CHECK64: @BigGlob = {{.*}}global <{ i8, i8, [38 x i8] }> <{ i8 -68, i8 2, [38 x i8] zeroinitializer }>, align 8
+// CHECK64: @f.p = internal global <{ i8, i8, [22 x i8] }> <{ i8 16, i8 39, [22 x i8] zeroinitializer }>, align 8
+
 void GenericTest(_BitInt(3) a, unsigned _BitInt(3) b, _BitInt(4) c) {
   // CHECK: define {{.*}}void @GenericTest
   int which = _Generic(a, _BitInt(3): 1, unsigned _BitInt(3) : 2, _BitInt(4) : 3);
@@ -62,3 +69,85 @@ void Size1ExtIntParam(unsigned _BitInt(1) A) {
   // CHECK: store i1 %[[PARAM_LOAD]], ptr %[[IDX]]
   B[2] = A;
 }
+
+#if __BITINT_MAXWIDTH__ > 128
+struct S1 {
+  _BitInt(17) A;
+  _BitInt(129) B;
+};
+
+int foo(int a) {
+  // CHECK64: %A1 = getelementptr inbounds %struct.S1, ptr %B, i32 0, i32 0
+  // CHECK64: store i17 1, ptr %A1, align 8
+  // CHECK64: %B2 = getelementptr inbounds %struct.S1, ptr %B, i32 0, i32 2
+  // CHECK64: %0 = load i32, ptr %a.addr, align 4
+  // CHECK64: %conv = sext i32 %0 to i129
+  // CHECK64: store i129 %conv, ptr %B2, align 8
+  // CHECK64: %B3 = getelementptr inbounds %struct.S1, ptr %A, i32 0, i32 2
+  // CHECK64: %1 = load i129, ptr %B3, align 8
+  // CHECK64: %conv4 = trunc i129 %1 to i32
+  // CHECK64: %B5 = getelementptr inbounds %struct.S1, ptr %B, i32 0, i32 2
+  // CHECK64: %2 = load i129, ptr %B5, align 8
+  struct S1 A = {1, 170};
+  struct S1 B = {1, a};
+  return (int)A.B + (int)B.B;
+}
+
+struct S2 {
+  _BitInt(257) A;
+  int B;
+};
+
+_BitInt(257) bar() {
+  // CHECK64: define {{.*}}void @bar(ptr {{.*}} sret([40 x i8]) align 8 %[[RET:.+]])
+  // CHECK64: %A = alloca %struct.S2, align 8
+  // CHECK64: %0 = getelementptr inbounds { <{ i8, [39 x i8] }>, i32, [4 x i8] }, ptr %A, i32 0, i32 0
+  // CHECK64: %1 = getelementptr inbounds <{ i8, [39 x i8] }>, ptr %0, i32 0, i32 0
+  // CHECK64: store i8 1, ptr %1, align 8
+  // CHECK64: %2 = getelementptr inbounds { <{ i8, [39 x i8] }>, i32, [4 x i8] }, ptr %A, i32 0, i32 1
+  // CHECK64: store i32 10000, ptr %2, align 8
+  // CHECK64: %A1 = getelementptr inbounds %struct.S2, ptr %A, i32 0, i32 0
+  // CHECK64: %3 = load i257, ptr %A1, align 8
+  // CHECK64: store i257 %3, ptr %[[RET]], align 8
+  struct S2 A = {1, 10000};
+  return A.A;
+}
+
+void TakesVarargs(int i, ...) {
+  // CHECK64: define{{.*}} void @TakesVarargs(i32
+__builtin_va_list args;
+__builtin_va_start(args, i);
+
+_BitInt(160) A = __builtin_va_arg(args, _BitInt(160));
+  // CHECK64: %[[ARG:.+]] = load i160
+  // CHECK64: store i160 %[[ARG]], ptr %A, align 8
+}
+
+_BitInt(129) *f1(_BitInt(129) *p) {
+  // CHECK64: getelementptr inbounds [24 x i8], {{.*}} i64 1
+  return p + 1;
+}
+
+char *f2(char *p) {
+  // CHECK64: getelementptr inbounds i8, {{.*}} i64 24
+  return p + sizeof(_BitInt(129));
+}
+
+auto BigGlob = (_BitInt(257))700;
+// CHECK64: define {{.*}}void @foobar(ptr {{.*}} sret([40 x i8]) align 8 %[[RET1:.+]])
+_BitInt(257) foobar() {
+  // CHECK64: %A = alloca [40 x i8], align 8
+  // CHECK64: %0 = load i257, ptr @BigGlob, align 8
+  // CHECK64: %add = add nsw i257 %0, 1
+  // CHECK64: store i257 %add, ptr %A, align 8
+  // CHECK64: %1 = load i257, ptr %A, align 8
+  // CHECK64: store i257 %1, ptr %[[RET1]], align 8
+  _BitInt(257) A = BigGlob + 1;
+  return A;
+}
+
+void f() {
+  static _BitInt(130) p = {10000};
+}
+
+#endif

@Fznamznon Fznamznon requested a review from hvdijk May 7, 2024 17:46
Collaborator

@erichkeane erichkeane left a comment


This is unfortunate, and will likely result in the FPGAs needing to generate extra bits here, so this is somewhat harmful in that regard.

It seems to me this is a case where we're trying to work around an LLVM bug? Should we just be fixing that instead?

Collaborator

@efriedma-quic efriedma-quic left a comment


Maybe add a helper somewhere to check "is this type a bitint wider than 128 bits"?

// Long _BitInt has array of bytes as in-memory type.
ConstantAggregateBuilder Builder(CGM);
llvm::Type *DesiredTy = CGM.getTypes().ConvertTypeForMem(destType);
auto *CI = cast<llvm::ConstantInt>(C);
Collaborator


I'm not sure this cast is guaranteed to succeed? At least in some cases, we emit constant expressions involving a ptrtoint. Maybe at the widths in question, that can't happen, but this deserves a comment explaining what's going on.

Contributor Author


I've added a comment. I'm not able to get a ptrtoint in a constant expression involving a big _BitInt.

Collaborator

@momchil-velikov momchil-velikov Jul 9, 2024


How about a "small" _BitInt? The comment starts

// LLVM type doesn't match AST type only for big enough _BitInts,

and for AArch32 and AArch64 we are going to have non-matching LLVM types even for "small" _BitInts: for AArch32 because the ABI wants the padding bits of the in-memory representation to contain zero or the sign bit, and for both because we'd like to emit loads/stores in bigger chunks, e.g. i17 becomes a single i32 load/store, as opposed to two separate accesses to i16 and i8.

Contributor


The test case here is just going to be something like _SomeSplitBitIntType x = (unsigned long) &someVariable;. What code do we actually produce for this? Sometimes we'll be able to fall back on dynamic initialization, but that's not always an option.

Ideally, it's just invalid to do something like that. It certainly needs to be diagnosed if the integer type is narrower than the pointer, and wider is also problematic, although less so and in a different way.

Contributor Author


Well, it seems it doesn't depend on the size of the _BitInt. Using something like _SomeSplitBitIntType x = (unsigned long) &someVariable; either fails if I, for example, apply constexpr, or falls back on dynamic initialization. So I changed the comment to make it more generic.

@efriedma-quic
Collaborator

It seems to me this is a case where we're trying to work around an LLVM bug? Should we just be fixing that instead?

You mean, revert https://reviews.llvm.org/D86310 ? Making any changes in LLVM here is painful; I'd rather not revisit that. CC @hvdijk @rnk

@erichkeane
Collaborator

It seems to me this is a case where we're trying to work around an LLVM bug? Should we just be fixing that instead?

You mean, revert https://reviews.llvm.org/D86310 ? Making any changes in LLVM here is painful; I'd rather not revisit that. CC @hvdijk @rnk

I didn't, no, but I hadn't seen all that conversation.

Aaron has explained a bit more of the context here, and I'm finding myself pretty confused/out of the loop. As this is effectively all codegen, I suspect you plus your CCs are the best ones to review this. I don't see a problem with this except for the FPGA folks, though between:

1. FPGA folks rarely if ever using large types like this if they can help it,
2. the FPGA group being spun off from Intel, meaning the original stakeholders are all gone,
and 3. me no longer being at Intel,

I don't think I have strong feelings here.

@efriedma-quic
Collaborator

I don't think FPGA folks will run into any practical issue with this; it only impacts the in-memory types, and backends shouldn't really be using in-memory types for anything anyway.

@@ -1989,6 +1989,14 @@ llvm::Value *CodeGenFunction::EmitLoadOfScalar(Address Addr, bool Volatile,
return EmitAtomicLoad(AtomicLValue, Loc).getScalarVal();
}

if (const auto *BIT = Ty->getAs<BitIntType>()) {
if (BIT->getNumBits() > 128) {
Contributor


For a number of bits >64, <=128, LLVM's iN type will have identical representation to Clang _BitInt(N) but different alignment. I think this is fine, I think nothing needs their alignment to match Clang's, but could you double-check to make sure you agree?

Contributor Author


These types remain unchanged.

@hvdijk
Contributor

hvdijk commented May 7, 2024

Thanks for doing this, it's unfortunate that Clang is in a rather broken state with these types right now and it will be good to see improvement. I think the approach you're taking here is the only approach that will work.

@rnk
Collaborator

rnk commented May 7, 2024

I played with the idea of using LLVM packed structs (<{ i129 }>) to represent something like this, but they don't work the way I expected them to: https://godbolt.org/z/M6hMYYhax

LLVM DataLayout's idea of sizeof(i129) is still rounded up from 17 bytes to 32 bytes.

Using byte arrays for the in-memory type should work, so it's probably the best path forward.
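The rounding rnk observed can be sketched in C (a hedged model of DataLayout-style sizing on an x86-64-like target, not LLVM's actual implementation): the store size of iN is N rounded up to whole bytes, and the alloc size rounds that up to the type's ABI alignment, assumed here to be the next power of two capped at 16 bytes.

```c
#include <assert.h>

/* Store size of iN in bytes: N rounded up to whole bytes. */
static unsigned istore_size(unsigned n) { return (n + 7) / 8; }

/* Assumed ABI alignment in bytes: next power of two that covers N bits,
 * capped at 16 bytes (a model of an x86-64-like data layout). */
static unsigned iabi_align(unsigned n) {
    unsigned a = 1;
    while (a * 8 < n && a < 16)
        a *= 2;
    return a;
}

/* Alloc size: store size rounded up to the ABI alignment. */
static unsigned ialloc_size(unsigned n) {
    unsigned s = istore_size(n), a = iabi_align(n);
    return (s + a - 1) / a * a;
}
```

Under this model, i129 has a 17-byte store size but a 32-byte alloc size, matching the 17-to-32 rounding noted above, and i96 gets a 16-byte alloc size as mentioned later in the review.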

@rjmccall
Contributor

rjmccall commented May 7, 2024

Hmm. I think this is actually pretty different from the bool pattern. Suppose we're talking about _BitInt(N). Let BYTES := sizeof(_BitInt(N)), and let BITS := BYTES * 8.

The problem being presented here is this:

  1. For at least some values of N, we cannot use LLVM's iN for the type of struct elements, array elements, allocas, global variables, and so on, because the LLVM layout for that type does not match the high-level layout of _BitInt(N). The only available type that does match the memory layout appears to be [BYTES x i8].

However, it doesn't follow from the need to use [BYTES x i8] for memory layout that we have to use [BYTES x i8] for loads and stores. IIUC, loads and stores of both iN and iBITS are in fact required to only touch BYTES bytes and so should be valid. It is near-certain that loads and stores of either of those types would both (1) produce far better code from the backends and (2) be far more optimizable by IR passes than loads and stores of [BYTES x i8].

bool does run into (1) because of targets like PPC where sizeof(bool) == 4. However, we still use i8 as the in-memory type for bool on other targets. Partly, this is to discourage portability bugs where people write IR-gen code that doesn't handle the PPC pattern. But IIRC the main reason is actually to solve this other problem:

  2. LLVM doesn't guarantee any particular extension behavior for integer types that aren't a multiple of 8, but ABIs do generally require objects of type bool to have all bits valid.

I expect that problem (2) also applies to _BitInt.

The upshot is that code like _BitInt(129) x = v; needs to be emitted something like this:

  %x = alloca [24 x i8]      # assuming for the sake of argument that sizeof(_BitInt(129)) == 24
  %storedv = sext i129 %v to i192  # or zext depending on signedness
  store i192 %storedv, ptr %x

Edit: I originally defined BYTES as ceil(N/8), but it clearly has to be sizeof(_BitInt(N)), and I expect the ABI expects extension out to that size as well.

@rjmccall
Contributor

rjmccall commented May 7, 2024

If you want to do things that way, you will need to (1) generalize CodeGenTypes with a new API that will return this load/store type when applicable and (2) look at all the places we call ConvertTypeForMem, EmitToMemory, and EmitFromMemory to make sure they do the right things.

You definitely should not be hard-coding 128 in a bunch of places. The load/store type should always be iBITS, and the memory type should either be iBITS or [BYTES x i8] depending on whether the former has the right layout characteristics in the LLVM data layout.

@efriedma-quic
Collaborator

You're suggesting we should fork ConvertTypeForMem into two functions? So there are actually three types: the "register" type, the "load/store" type, and the "in-memory" type. I guess that makes sense from a theoretical perspective, but... as a practical matter, I'm not sure how many places need to call the proposed "ConvertTypeForLoadStore".

In EmitLoadOfScalar(), instead of checking for BitInt, you just unconditionally do Addr = Addr.withElementType(ConvertTypeForLoadStore(Ty));. Logical cleanup, I guess. In EmitStoreOfScalar, you don't really need the interface because you can assume the result of EmitToMemory() has the load/store type. And then... what else calls it?

@rjmccall
Contributor

rjmccall commented May 7, 2024

My experience is that compiler writers are really good at hacking in special cases to make their test cases work and really bad at recognizing that their case isn't as special as they think. There are three types already called out for special treatment in ConvertTypeForMem, of which two are handled in EmitFromMemory and only one is handled in EmitToMemory. I want to set up a pattern that the next person with this sort of problem can follow. It doesn't have to be exactly what I suggested above, but it should be a real pattern.

@Fznamznon
Contributor Author

Thank you everyone for the feedback. I'm working on applying.

if (const auto *BIT = Ty->getAs<BitIntType>()) {
if (BIT->getNumBits() > 128) {
// Long _BitInt has array of bytes as in-memory type.
llvm::Type *NewTy = ConvertType(Ty);
Collaborator


Shouldn't we be calling ConvertTypeForMem here?

Contributor Author

@Fznamznon Fznamznon May 29, 2024


The idea was to load not the array but iN, so ConvertType here was intentional. However, I'm updating this patch soon; it will use a special load/store type whose idea is described in #91364 (comment).

Collaborator


Oh, I see. It looks close to what we are trying to do with #93495, which is:

  • create in-memory representations according to the target ABI
  • improve efficiency of loads/stores, e.g. a load/store of i18 in LLVM must touch just 3 bytes, so a compiler would emit one 16-bit access and one 8-bit access, but if i18 comes from _BitInt(18) then a single 32-bit access would work better.
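The access-splitting point can be illustrated with a small, hedged C sketch (hypothetical helper names, little-endian host assumed): storing an 18-bit value with accesses that touch only its 3 significant bytes versus one 4-byte access of the padded i32 representation.

```c
#include <stdint.h>
#include <string.h>

/* Store an 18-bit value v into buffer p touching only 3 bytes:
 * one 16-bit access plus one 8-bit access (little-endian layout). */
static void store_i18_exact(unsigned char *p, uint32_t v) {
    uint16_t lo = (uint16_t)(v & 0xFFFF);
    memcpy(p, &lo, 2);                        /* bytes 0..1 */
    p[2] = (unsigned char)((v >> 16) & 0x3);  /* bits 16..17 */
}

/* Store the same value as a single 32-bit access of the padded type. */
static void store_i18_padded(unsigned char *p, uint32_t v) {
    uint32_t w = v & 0x3FFFF;   /* zero out the padding bits 18..31 */
    memcpy(p, &w, 4);           /* bytes 0..3, one access */
}
```

Both variants leave the same value in the 3 significant bytes; the padded variant additionally writes a defined value into the fourth byte, which is exactly the extension-behavior question discussed in this thread.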

Contributor Author


This patch was mostly intended to fix codegen issues with big _BitInt types (>128 bits for 64-bit targets); however, I'm adding the new idea of a load/store type, so that seems close.

@llvmbot llvmbot added HLSL HLSL Language Support clang:openmp OpenMP related changes to Clang labels May 29, 2024

github-actions bot commented May 29, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@rjmccall rjmccall left a comment


This is generally looking great, and I think it's ready to go as soon as you can finish the tests. (You said you weren't able to update all the tests — did you have questions about the remaining tests?)

I did have a thought, though. Are we confident that the in-memory layout that LLVM is using for these large integer types matches the layout specified by the ABI? I know this patch makes the overall sizes match, but there's also an endianness question. When LLVM stores an i96, I assume it always stores them using the overall endianness of the target; for example, on i386, it might do three 32-bit stores with the low 32 bits at offset 0, the middle 32 bits at offset 4, and the high 32 bits at offset 8. I just want to make sure that the ABI specification for _BitInt always matches that. In particular, I'm worried that it might do some middle-endian thing where it breaks the integer into chunks and then stores those chunks in little-endian order even on a big-endian machine. (That is generally the right thing to do for BigInt types because most arithmetic operations access the chunks in little-endian order, and doing adjacent memory accesses in increasing order is generally more architecture-friendly.)

@@ -107,17 +107,52 @@ llvm::Type *CodeGenTypes::ConvertTypeForMem(QualType T, bool ForBitField) {
return llvm::IntegerType::get(FixedVT->getContext(), BytePadded);
}

// If this is a bool type, or a bit-precise integer type in a bitfield
// representation, map this integer to the target-specified size.
Contributor


Let's keep this comment; we just need to update it a little:

  // If T is _Bool or a _BitInt type, ConvertType will produce an IR type
  // with the exact semantic bit-width of the AST type; for example,
  // _BitInt(17) will turn into i17.  In memory, however, we need to store
  // such values extended to their full storage size as decided by AST
  // layout; this is an ABI requirement.  Ideally, we would always use an
  // integer type that's just the bit-size of the AST type; for example, if
  // sizeof(_BitInt(17)) == 4, _BitInt(17) would turn into i32.  That is what's
  // returned by convertTypeForLoadStore.  However, that type does not
  // always satisfy the size requirement on memory representation types
  // described above.  For example, a 32-bit platform might reasonably set
  // sizeof(_BitInt(65)) == 12, but i96 is likely to have an alloc size
  // of 16 bytes in the LLVM data layout.  In these cases, we simply return
  // a byte array of the appropriate size.

Contributor Author


Added, thanks.

@AaronBallman
Collaborator

This is generally looking great, and I think it's ready to go as soon as you can finish the tests. (You said you weren't able to update all the tests — did you have questions about the remaining tests?)

I did have a thought, though. Are we confident that the in-memory layout that LLVM is using for these large integer types matches the layout specified by the ABI? I know this patch makes the overall sizes match, but there's also an endianness question. When LLVM stores an i96, I assume it always stores them using the overall endianness of the target; for example, on i386, it might do three 32-bit stores with the low 32 bits at offset 0, the middle 32 bits at offset 4, and the high 32 bits at offset 8. I just want to make sure that the ABI specification for _BitInt always matches that. In particular, I'm worried that it might do some middle-endian thing where it breaks the integer into chunks and then stores those chunks in little-endian order even on a big-endian machine. (That is generally the right thing to do for BigInt types because most arithmetic operations access the chunks in little-endian order, and doing adjacent memory accesses in increasing order is generally more architecture-friendly.)

FWIW, I was chasing down ABI documents yesterday, and found:

x86-64 (https://gitlab.com/x86-psABIs/x86-64-ABI):

• _BitInt(N) types are signed by default, and unsigned _BitInt(N) types are unsigned.
• _BitInt(N) types are stored in little-endian order in memory. Bits in each byte are allocated from right to left.
• For N <= 64, they have the same size and alignment as the smallest of (signed and unsigned) char, short, int, long and long long types that can contain them.
• For N > 64, they are treated as a struct of 64-bit integer chunks. The number of chunks is the smallest number that can contain the type. _BitInt(N) types are byte-aligned to 64 bits. The size of these types is the smallest multiple of the 64-bit chunks greater than or equal to N.
• The value of the unused bits beyond the width of the _BitInt(N) value but within the size of the _BitInt(N) are unspecified when stored in memory or register.

ARM 32-bit (https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst):

• _BitInt(N <= 64): smallest of the signed Fundamental Integral Data Types where byte-size*8 >= N. C2x only. Significant bits are allocated from the least significant end of the Machine Type. Non-significant bits within the Machine Type are sign-extended.
• unsigned _BitInt(N <= 64): smallest of the unsigned Fundamental Integral Data Types where byte-size*8 >= N. C2x only. Significant bits are allocated from the least significant end of the Machine Type. Non-significant bits within the Machine Type are zero-extended.
• _BitInt(N > 64): allocated as if a uint64_t[M] array where M*64 >= N; the last element contains the sign bit. C2x only. Significant bits are allocated from the least significant end of the Machine Type. The lower-addressed double-word contains the least significant bits of the type on a little-endian view and the most significant bits on a big-endian view. Non-significant bits within the last double-word are sign-extended.
• unsigned _BitInt(N > 64): allocated as if a uint64_t[M] array where M*64 >= N. C2x only. Significant bits are allocated from the least significant end of the Machine Type. The lower-addressed double-word contains the least significant bits of the type on a little-endian view and the most significant bits on a big-endian view. Non-significant bits within the last double-word are zero-extended.

ARM 64-bit (https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst):

| Type | Representation | Notes |
|------|----------------|-------|
| `_BitInt(N <= 128)` | Smallest of the signed Fundamental Integral Data Types where byte-size*8 >= N. | C2x only. Significant bits are allocated from the least significant end of the Machine Type. Non-significant bits within the Machine Type are unspecified. |
| `unsigned _BitInt(N <= 128)` | Smallest of the unsigned Fundamental Integral Data Types where byte-size*8 >= N. | C2x only. Significant bits are allocated from the least significant end of the Machine Type. Non-significant bits within the Machine Type are unspecified. |
| `_BitInt(N > 128)` | Mapped as if an unsigned __int128[M] array where M*128 >= N. The last element contains the sign bit. | C2x only. Significant bits are allocated from the least significant end of the Machine Type. The lower-addressed quad-word contains the least significant bits of the type in a little-endian view and the most significant bits in a big-endian view. Non-significant bits within the last quad-word are unspecified. |
| `unsigned _BitInt(N > 128)` | Mapped as if an unsigned __int128[M] array where M*128 >= N. | C2x only. Significant bits are allocated from the least significant end of the Machine Type. The lower-addressed quad-word contains the least significant bits of the type in a little-endian view and the most significant bits in a big-endian view. Non-significant bits within the last quad-word are unspecified. |
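The AAPCS64 rules follow the same shape with 128-bit units. A sketch of the arithmetic (alignments inferred from the "as if unsigned __int128[M]" layout, not stated verbatim in the quote):

```python
def aarch64_bitint_layout(n: int) -> tuple[int, int]:
    """Return (size, align) in bytes for _BitInt(n) per the quoted AAPCS64 rules."""
    if n <= 128:
        # Smallest fundamental integral type whose width covers n bits.
        for size in (1, 2, 4, 8, 16):
            if size * 8 >= n:
                return size, size
    # Mapped as if unsigned __int128[M] where M * 128 >= n.
    m = -(-n // 128)  # ceiling division
    return m * 16, 16

assert aarch64_bitint_layout(65) == (16, 16)   # fits a single __int128
assert aarch64_bitint_layout(129) == (32, 16)  # two quad-words
```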

The latest RISC-V, LoongArch, and CSKY ABI documents I could find do not mention _BitInt. I could not find a modern ABI document for PowerPC (power.org no longer appears to cover PowerPC), and the copy on the Internet Archive also does not mention _BitInt.

@Fznamznon Fznamznon requested a review from rjmccall July 11, 2024 16:07
@rjmccall
Contributor

Okay, so x86_64 describes it in byte terms and says they're little-endian, which is consistent with the overall target. Interestingly, it does not guarantee the content of the excess bits. The code-generation in this patch is consistent with that: the extension we do is unnecessary but allowed, and then we truncate it away after load. If we ever add some way to tell the backend that a truncation is known to be reversing a sign/zero-extension, we'll need to not set it on this target.

32-bit and 64-bit ARM describe it in terms of smaller units, but the units are expressly laid out according to the overall endianness of the target, which composes to mean that the bytes overall are also laid out according to that endianness.
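As a toy model of the scheme described above (not clang's actual codegen, just the reasoning): a widening store may write arbitrary excess bits, and a truncating load still recovers the original N-bit value.

```python
# Model: a signed _BitInt(N) stored in a BITS-wide container (BITS = sizeof * 8).
# `garbage` stands for whatever the ABI leaves in the excess bits.

def store(x: int, n: int, bits: int, garbage: int = 0) -> int:
    """Write x (a signed n-bit value) into a bits-wide container word."""
    word = x & ((1 << n) - 1)              # significant bits, two's complement
    word |= (garbage << n) & ((1 << bits) - 1)  # arbitrary excess-bit content
    return word

def load(word: int, n: int) -> int:
    """Truncate to n bits, then sign-extend back to a Python int."""
    word &= (1 << n) - 1
    if word >> (n - 1):                    # sign bit set
        word -= 1 << n
    return word

# The round-trip is correct no matter what the excess bits contain.
for garbage in (0, -1, 0x5A):
    assert load(store(-5, 17, 32, garbage), 17) == -5
```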

@rjmccall
Contributor

Given all that, I feel pretty comfortable relying on using LLVM's i96 stores and so on. I do worry some that we're eventually going to run into a target where the _BitInt ABI does not match what LLVM wants to generate for i96 load/store, but we should be able to generalize this so that targets can override the _BitInt operations pretty easily.

Contributor

@rjmccall rjmccall left a comment


LGTM

@nikic
Contributor

nikic commented Jul 12, 2024

> Okay, so x86_64 describes it in byte terms and says they're little-endian, which is consistent with the overall target. Interestingly, it does not guarantee the content of the excess bits. The code-generation in this patch is consistent with that: the extension we do is unnecessary but allowed, and then we truncate it away after load. If we ever add some way to tell the backend that a truncation is known to be reversing a sign/zero-extension, we'll need to not set it on this target.

FYI this already exists in the form of trunc nuw / trunc nsw. (Though it's not fully optimized yet.)
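A small model of the `trunc nsw` semantics as defined in the LLVM LangRef: the result is poison unless the truncated-away high bits all equal the result's sign bit, i.e. unless sign-extending the result reproduces the operand.

```python
POISON = object()  # sentinel standing in for LLVM's poison value

def trunc_nsw(value: int, src_bits: int, dst_bits: int):
    """Model of LLVM `trunc nsw` from src_bits to dst_bits."""
    src_mask = (1 << src_bits) - 1
    value &= src_mask
    result = value & ((1 << dst_bits) - 1)
    # Sign-extend the dst_bits result back up to src_bits.
    if result >> (dst_bits - 1):
        extended = (result - (1 << dst_bits)) & src_mask
    else:
        extended = result
    return result if extended == value else POISON

# -5 survives an i32 -> i17 trunc nsw; 0x12345 does not (it would flip sign).
assert trunc_nsw(-5 & 0xFFFFFFFF, 32, 17) == (-5 & 0x1FFFF)
assert trunc_nsw(0x12345, 32, 17) is POISON
```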

@Fznamznon Fznamznon changed the title [clang] Lower _BitInt(129+) to a different type in LLVM IR [clang] Use different memory layout type for _BitInt(N) in LLVM IR Jul 12, 2024
@momchil-velikov
Collaborator

This solves 5-6 issues we had downstream, many thanks!

@rjmccall
Contributor

> > Okay, so x86_64 describes it in byte terms and says they're little-endian, which is consistent with the overall target. Interestingly, it does not guarantee the content of the excess bits. The code-generation in this patch is consistent with that: the extension we do is unnecessary but allowed, and then we truncate it away after load. If we ever add some way to tell the backend that a truncation is known to be reversing a sign/zero-extension, we'll need to not set it on this target.
>
> FYI this already exists in the form of trunc nuw / trunc nsw. (Though it's not fully optimized yet.)

Ah, neat. Mariya, would you mind looking into setting this properly on the truncates we're doing here? It'd be fine to do that as a follow-up; no need to hold up this PR for it. You'll need some kind of target hook to tell us whether to set it or not. Probably that ought to go in the Basic TargetInfo just so all of the target-specific ABI configuration is done in one place.

@Fznamznon
Contributor Author

> Mariya, would you mind looking into setting this properly on the truncates we're doing here? It'd be fine to do that as a follow-up; no need to hold up this PR for it. You'll need some kind of target hook to tell us whether to set it or not. Probably that ought to go in the Basic TargetInfo just so all of the target-specific ABI configuration is done in one place.

Sure.

Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category HLSL HLSL Language Support