-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[ubsan] Display correct runtime messages for negative _BitInt #96240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,6 +41,7 @@ | |
#include "llvm/IR/MatrixBuilder.h" | ||
#include "llvm/Passes/OptimizationLevel.h" | ||
#include "llvm/Support/ConvertUTF.h" | ||
#include "llvm/Support/Endian.h" | ||
#include "llvm/Support/MathExtras.h" | ||
#include "llvm/Support/Path.h" | ||
#include "llvm/Support/SaveAndRestore.h" | ||
|
@@ -64,6 +65,22 @@ static llvm::cl::opt<bool> ClSanitizeGuardChecks( | |
"ubsan-guard-checks", llvm::cl::Optional, | ||
llvm::cl::desc("Guard UBSAN checks with `llvm.allow.ubsan.check()`.")); | ||
|
||
//===--------------------------------------------------------------------===// | ||
// Defines for metadata | ||
//===--------------------------------------------------------------------===// | ||
|
||
// Those values are crucial to be the SAME as in ubsan runtime library. | ||
enum VariableTypeDescriptorKind : uint16_t { | ||
/// An integer type. | ||
TK_Integer = 0x0000, | ||
/// A floating-point type. | ||
TK_Float = 0x0001, | ||
/// An _BitInt(N) type. | ||
TK_BitInt = 0x0002, | ||
/// Any other type. The value representation is unspecified. | ||
TK_Unknown = 0xffff | ||
}; | ||
|
||
//===--------------------------------------------------------------------===// | ||
// Miscellaneous Helper Methods | ||
//===--------------------------------------------------------------------===// | ||
|
@@ -3298,22 +3315,40 @@ LValue CodeGenFunction::EmitPredefinedLValue(const PredefinedExpr *E) { | |
/// { i16 TypeKind, i16 TypeInfo } | ||
/// \endcode | ||
/// | ||
/// followed by an array of i8 containing the type name. TypeKind is 0 for an | ||
/// integer, 1 for a floating point value, and -1 for anything else. | ||
/// followed by an array of i8 containing the type name with extra information | ||
/// for BitInt. TypeKind is TK_Integer(0) for an integer, TK_Float(1) for a | ||
/// floating point value, TK_BitInt(2) for BitInt and TK_Unknown(0xFFFF) for | ||
/// anything else. | ||
llvm::Constant *CodeGenFunction::EmitCheckTypeDescriptor(QualType T) { | ||
// Only emit each type's descriptor once. | ||
if (llvm::Constant *C = CGM.getTypeDescriptorFromMap(T)) | ||
return C; | ||
|
||
uint16_t TypeKind = -1; | ||
uint16_t TypeKind = TK_Unknown; | ||
uint16_t TypeInfo = 0; | ||
bool IsBitInt = false; | ||
|
||
if (T->isIntegerType()) { | ||
TypeKind = 0; | ||
TypeKind = TK_Integer; | ||
TypeInfo = (llvm::Log2_32(getContext().getTypeSize(T)) << 1) | | ||
(T->isSignedIntegerType() ? 1 : 0); | ||
// Follow suggestion from discussion of issue 64100. | ||
// So we can write the exact amount of bits in TypeName after '\0' | ||
// making it <diagnostic-like type name>.'\0'.<32-bit width>. | ||
if (T->isSignedIntegerType() && T->getAs<BitIntType>()) { | ||
// Do a sanity checks as we are using 32-bit type to store bit length. | ||
assert(getContext().getTypeSize(T) > 0 && | ||
" non positive amount of bits in __BitInt type"); | ||
assert(getContext().getTypeSize(T) <= 0xFFFFFFFF && | ||
" too many bits in __BitInt type"); | ||
|
||
// Redefine TypeKind with the actual __BitInt type if we have signed | ||
// BitInt. | ||
TypeKind = TK_BitInt; | ||
IsBitInt = true; | ||
} | ||
} else if (T->isFloatingType()) { | ||
TypeKind = 1; | ||
TypeKind = TK_Float; | ||
TypeInfo = getContext().getTypeSize(T); | ||
} | ||
|
||
|
@@ -3324,6 +3359,20 @@ llvm::Constant *CodeGenFunction::EmitCheckTypeDescriptor(QualType T) { | |
DiagnosticsEngine::ak_qualtype, (intptr_t)T.getAsOpaquePtr(), StringRef(), | ||
StringRef(), std::nullopt, Buffer, std::nullopt); | ||
|
||
if (IsBitInt) { | ||
// The Structure is: 0 to end the string, 32 bit unsigned integer in target | ||
// endianness, zero. | ||
char S[6] = {'\0', '\0', '\0', '\0', '\0', '\0'}; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How is the trailing zero byte used? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is not used explicitly and serves as a guard byte. The idea is to have 2 consecutive zero bytes at the end of the line. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably a bit too late to the party and I struggle with following discussion history on GitHub, but what was the reasoning behind storing the bit width encoded in the name of the type versus parsing out the bit width from the name of the type itself in the runtime (e.g. parsing out the value 37 from '_BitInt(37)' instead of extracting the literal 37 from '_BitInt(37)\x00\x25\x00\x00\x00' )? I would imagine that the former would be easier since there wouldn't be any platform dependence when traversing the string, and I don't think it would incur too much of an extra performance hit (the solution might look a bit kludgey though) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was initial idea proposed in #64100.
|
||
const auto *EIT = T->castAs<BitIntType>(); | ||
uint32_t Bits = EIT->getNumBits(); | ||
llvm::support::endian::write32(S + 1, Bits, | ||
getTarget().isBigEndian() | ||
? llvm::endianness::big | ||
: llvm::endianness::little); | ||
StringRef Str = StringRef(S, sizeof(S) / sizeof(decltype(S[0]))); | ||
Buffer.append(Str); | ||
} | ||
|
||
llvm::Constant *Components[] = { | ||
Builder.getInt16(TypeKind), Builder.getInt16(TypeInfo), | ||
llvm::ConstantDataArray::getString(getLLVMContext(), Buffer) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
// REQUIRES: x86-registered-target | ||
// RUN: %clang -Wno-constant-conversion -Wno-array-bounds -Wno-division-by-zero -Wno-shift-negative-value -Wno-shift-count-negative -Wno-int-to-pointer-cast -fsanitize=array-bounds,enum,float-cast-overflow,integer-divide-by-zero,implicit-unsigned-integer-truncation,implicit-signed-integer-truncation,implicit-integer-sign-change,unsigned-integer-overflow,signed-integer-overflow,shift-base,shift-exponent -O0 -S -emit-llvm -o - %s | FileCheck %s | ||
|
||
// The runtime test checking the _BitInt ubsan feature is located in compiler-rt/test/ubsan/TestCases/Integer/bit-int.c | ||
|
||
#include <stdint.h> | ||
#include <stdio.h> | ||
Comment on lines
+6
to
+7
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please replace these includes with the declarations needed from them -- this makes the tests more stable because they won't be relying on whatever system CRT headers are installed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. Already working on it. Those are not really needed for non compiler-rt tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Created #104462. Testing it. |
||
|
||
uint32_t float_divide_by_zero() { | ||
float f = 1.0f / 0.0f; | ||
// CHECK: constant { i16, i16, [8 x i8] } { i16 1, i16 32, [8 x i8] c"'float'\00" } | ||
_BitInt(37) r = (_BitInt(37))f; | ||
// CHECK: constant { i16, i16, [20 x i8] } { i16 2, i16 13, [20 x i8] c"'_BitInt(37)'\00%\00\00\00\00\00" } | ||
return r; | ||
} | ||
|
||
uint32_t integer_divide_by_zero() __attribute__((no_sanitize("memory"))) { | ||
_BitInt(37) x = 1 / 0; | ||
// CHECK: constant { i16, i16, [32 x i8] } { i16 0, i16 10, [32 x i8] c"'uint32_t' (aka 'unsigned int')\00" } | ||
return x; | ||
} | ||
|
||
uint32_t implicit_unsigned_integer_truncation() { | ||
unsigned _BitInt(37) x = 2U; | ||
x += float_divide_by_zero(); | ||
x += integer_divide_by_zero(); | ||
x = x + 0xFFFFFFFFFFFFFFFFULL; | ||
// CHECK: constant { i16, i16, [23 x i8] } { i16 0, i16 12, [23 x i8] c"'unsigned _BitInt(37)'\00" } | ||
uint32_t r = x & 0xFFFFFFFF; | ||
return r; | ||
} | ||
|
||
uint32_t array_bounds() { | ||
_BitInt(37) x[4]; | ||
_BitInt(37) y = x[10]; | ||
// CHECK: constant { i16, i16, [17 x i8] } { i16 -1, i16 0, [17 x i8] c"'_BitInt(37)[4]'\00" } | ||
return (uint32_t)y; | ||
} | ||
|
||
uint32_t float_cast_overflow() { | ||
float a = 100000000.0f; | ||
_BitInt(7) b = (_BitInt(7))a; | ||
// CHECK: constant { i16, i16, [19 x i8] } { i16 2, i16 7, [19 x i8] c"'_BitInt(7)'\00\07\00\00\00\00\00" } | ||
return b; | ||
} | ||
|
||
_BitInt(13) implicit_signed_integer_truncation() { | ||
_BitInt(73) x = (_BitInt(73)) ~((~0UL) >> 1); | ||
return x; | ||
// CHECK: constant { i16, i16, [20 x i8] } { i16 2, i16 {{([[:xdigit:]]{2})}}, [20 x i8] c"'_BitInt(73)'\00I\00\00\00\00\00" } | ||
// CHECK: constant { i16, i16, [20 x i8] } { i16 2, i16 9, [20 x i8] c"'_BitInt(13)'\00\0D\00\00\00\00\00" } | ||
} | ||
|
||
uint32_t negative_shift1(unsigned _BitInt(37) x) | ||
__attribute__((no_sanitize("memory"))) { | ||
_BitInt(9) c = -2; | ||
return x >> c; | ||
// CHECK: constant { i16, i16, [19 x i8] } { i16 2, i16 9, [19 x i8] c"'_BitInt(9)'\00\09\00\00\00\00\00" } | ||
} | ||
|
||
uint32_t negative_shift2(unsigned _BitInt(37) x) | ||
__attribute__((no_sanitize("memory"))) { | ||
_BitInt(17) c = -2; | ||
return x >> c; | ||
// CHECK: constant { i16, i16, [20 x i8] } { i16 2, i16 11, [20 x i8] c"'_BitInt(17)'\00\11\00\00\00\00\00" } | ||
} | ||
|
||
uint32_t negative_shift3(unsigned _BitInt(37) x) | ||
__attribute__((no_sanitize("memory"))) { | ||
_BitInt(34) c = -2; | ||
return x >> c; | ||
// CHECK: constant { i16, i16, [20 x i8] } { i16 2, i16 13, [20 x i8] c"'_BitInt(34)'\00\22\00\00\00\00\00" } | ||
} | ||
|
||
uint32_t negative_shift5(unsigned _BitInt(37) x) | ||
__attribute__((no_sanitize("memory"))) { | ||
_BitInt(68) c = -2; | ||
return x >> c; | ||
// CHECK: constant { i16, i16, [20 x i8] } { i16 2, i16 {{([[:xdigit:]]{2})}}, [20 x i8] c"'_BitInt(68)'\00D\00\00\00\00\00" } | ||
} | ||
|
||
int main(int argc, char **argv) { | ||
// clang-format off | ||
uint64_t result = | ||
1ULL + | ||
implicit_unsigned_integer_truncation() + | ||
(uint32_t)array_bounds() + | ||
float_cast_overflow() + | ||
(uint64_t)implicit_signed_integer_truncation() + | ||
negative_shift1(5) + | ||
negative_shift2(5) + | ||
negative_shift3(5) + | ||
negative_shift5(5); | ||
// clang-format on | ||
printf("%u\n", (uint32_t)(result & 0xFFFFFFFF)); | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -103,6 +103,13 @@ class TypeDescriptor { | |
/// representation is that of bitcasting the floating-point value to an | ||
/// integer type. | ||
TK_Float = 0x0001, | ||
/// An _BitInt(N) type. Lowest bit is 1 for a signed value, 0 for an | ||
/// unsigned value. Remaining bits are log_2(bit_width). The value | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. bit_width of what? The main problem is that the _BitInt ABI is only defined in the x86-64 and aarch64 psABIs, for i686 the word was that the ABI should be whatever GCC implemented, but nobody updated the ia32 psABI yet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's an interesting problem. I have not changed the way _BitInts are encoded in the memory. My understanding is that might not be right approach to have too many meanings embedded into one field depending on the context. Please note we have only 15 bits here (lowest bit is a sign). Standard requires us to support N up to BITINT_MAXWIDTH which is quite high. It means it will not fit if we are to store the number of limbs as it is being suggested. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The new thing in this patch is that the size of the _BitInt type being described is stored after the type string. It can be read using Thus, to get the size of the type being described one should no longer use The new behavior of And unfortunatly there isn't much information about how those non-inlined values are store. The
Now we for example get things like |
||
/// representation is the integer itself if it fits into a ValueHandle, and | ||
/// a pointer to the integer otherwise. TypeName contains the true width | ||
/// of the type for the signed _BitInt(N) type stored after zero bit after | ||
/// TypeName as 32-bit unsigned integer. | ||
TK_BitInt = 0x0002, | ||
/// Any other type. The value representation is unspecified. | ||
TK_Unknown = 0xffff | ||
}; | ||
|
@@ -113,10 +120,15 @@ class TypeDescriptor { | |
return static_cast<Kind>(TypeKind); | ||
} | ||
|
||
bool isIntegerTy() const { return getKind() == TK_Integer; } | ||
bool isIntegerTy() const { | ||
return getKind() == TK_Integer || getKind() == TK_BitInt; | ||
} | ||
bool isBitIntTy() const { return getKind() == TK_BitInt; } | ||
|
||
bool isSignedIntegerTy() const { | ||
return isIntegerTy() && (TypeInfo & 1); | ||
} | ||
bool isSignedBitIntTy() const { return isBitIntTy() && (TypeInfo & 1); } | ||
bool isUnsignedIntegerTy() const { | ||
return isIntegerTy() && !(TypeInfo & 1); | ||
} | ||
|
@@ -125,6 +137,25 @@ class TypeDescriptor { | |
return 1 << (TypeInfo >> 1); | ||
} | ||
|
||
const char *getBitIntBitCountPointer() const { | ||
DCHECK(isBitIntTy()); | ||
DCHECK(isSignedBitIntTy()); | ||
// Scan Name for zero and return the next address | ||
const char *p = getTypeName(); | ||
while (*p != '\0') | ||
++p; | ||
// Return the next address | ||
return p + 1; | ||
} | ||
|
||
unsigned getIntegerBitCount() const { | ||
DCHECK(isIntegerTy()); | ||
if (isSignedBitIntTy()) | ||
return *reinterpret_cast<const u32 *>(getBitIntBitCountPointer()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will not work on strict alignment targets. I'd suggest instead of trying to reinterpret the bits memcpy them into u32 and return that, the compilers will optimize that properly. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well. I do agree with this statement, however do we have a way to know whether target has a strict alignment or not? Preferably on the define level or you would suggest use always use memcpy? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For example Value::getFloatValue() in ubsan_value.cpp is using internal_memcpy from sanitizer_libc. Maybe that is an option here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it isn't performance critical, guess using internal_memcpy is an option as well. |
||
else | ||
return getIntegerBitWidth(); | ||
} | ||
|
||
bool isFloatTy() const { return getKind() == TK_Float; } | ||
unsigned getFloatBitWidth() const { | ||
CHECK(isFloatTy()); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
// RUN: %clang -Wno-constant-conversion -Wno-array-bounds -Wno-division-by-zero -Wno-shift-negative-value -Wno-shift-count-negative -Wno-int-to-pointer-cast -O0 -fsanitize=alignment,array-bounds,bool,float-cast-overflow,implicit-integer-sign-change,implicit-signed-integer-truncation,implicit-unsigned-integer-truncation,integer-divide-by-zero,nonnull-attribute,null,nullability-arg,nullability-assign,nullability-return,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,unsigned-integer-overflow,unsigned-shift-base,vla-bound %s -o %t1 && %run %t1 2>&1 | FileCheck %s | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can merge these tests into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure. Previous reviewer asked me to move those cases into separate file and it makes sense to keep positive and negative cases separated. |
||
|
||
#include <stdint.h> | ||
#include <stdio.h> | ||
|
||
// In this test there is an expectation of assignment of _BitInt not producing any output. | ||
uint32_t nullability_arg(_BitInt(37) *_Nonnull x) | ||
__attribute__((no_sanitize("address"))) | ||
__attribute__((no_sanitize("memory"))) { | ||
_BitInt(37) y = *(_BitInt(37) *)&x; | ||
return (y > 0) ? y : 0; | ||
} | ||
|
||
// In this test there is an expectation of ubsan not triggeting on returning random address which is inside address space of the process. | ||
_BitInt(37) nonnull_attribute(__attribute__((nonnull)) _BitInt(37) * x) | ||
__attribute__((no_sanitize("address"))) | ||
__attribute__((no_sanitize("memory"))) { | ||
return *(_BitInt(37) *)&x; | ||
} | ||
|
||
// In this test there is an expectation of assignment of uint32_t from "invalid" _BitInt is not producing any output. | ||
uint32_t nullability_assign(_BitInt(7) * x) | ||
__attribute__((no_sanitize("address"))) | ||
__attribute__((no_sanitize("memory"))) { | ||
_BitInt(7) *_Nonnull y = x; | ||
int32_t r = *(_BitInt(7) *)&y; | ||
return (r > 0) ? r : 0; | ||
} | ||
|
||
// In those examples the file is expected to compile & run with no diagnostics | ||
// CHECK-NOT: runtime error: | ||
|
||
int main(int argc, char **argv) { | ||
// clang-format off | ||
uint64_t result = | ||
1ULL + | ||
nullability_arg((_BitInt(37) *)argc) + | ||
((uint64_t)nonnull_attribute((_BitInt(37) *)argc) & 0xFFFFFFFF) + | ||
nullability_assign((_BitInt(7) *)argc); | ||
// clang-format on | ||
printf("%u\n", (uint32_t)(result & 0xFFFFFFFF)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this whole patch limited to fixing the non-power-of-2 problem for signed BitInt? I couldn't see any test involving unsigned BitInt, and the check for signed here makes me think that we still got the same bug for type descriptors for unsigned BitInt.
If that is true, was there any specific reason for not fixing the type descriptors for unsigned BitInt as well directly?
If for example the reasoning is that the runtimes currently do not try to print the size anyway for any failures involving unsigned BitInt, maybe it had been worth mentioning that here. Because now it just looks broken to me that we still can't provide a correct type descriptor for unsigned BitInt. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole patch is limited to signed _BitInt only. Unsigned BitInts are not affected by the patch and did not have an issue to begin with. Please see the issue: #64100. The reason of the patch is to save and get an information about sign bit location. Unsigned _BitInt has no sign bit (To be very specific: it is implicit and always zero) and thus nothing needs to be located.
PS: Patch affects both power-of-2 and non-power-of-2 signed BitInts in spite the fact power-of-2 BitInts were not affected by the initial issue. This is done for the sake of uniformity and for the sake of not putting extra checks into runtime part.