-
Notifications
You must be signed in to change notification settings - Fork 79
Support bounds widening for conditionals within switch statements #805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Added support for widenening bounds for nt_array_ptr dereferences in while loops. For example, "while (*p) {}" would widen the bounds of p upon entry to the loop. In order to handle loops we initialize the In and Out sets of the entry block as empty. This is done so that we can handle loop back edges and unconditional jumps.
Added support for widenening bounds for nt_array_ptr dereferences in for loops. For example, "for (; *p; ) {}" would widen the bounds of p upon entry to the loop.
Added support for widenening bounds for nt_array_ptr dereferences in switch statements. For example, "switch (*p) {case 'a': break;}" would widen the bounds of p inside the "case 'a'" block.
This follows from the PR on for loops: #804 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have suggestions for improving and generalizing the handling of case statements.
Could you open a GitHub issue to allow widening for the default case in some circumstances? That should be a separate change. If we establish that another label in a switch statement tests for 0, then the default case will handle non-zero case, and the bounds can be widened there. Programmers will use all the different forms of case statements and may depend on that behavior.
clang/lib/Sema/BoundsAnalysis.cpp
Outdated
// Check if this is a switch case and whether it tests for a null | ||
// terminator, like "case '\0'". We cannot widen the bounds for the null | ||
// terminator case. | ||
bool IsSwitchCaseNullTerm = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that instead of checking for null cases and excluding them, you should check for non-null cases and include only them. This makes the analysis more robust if you miss detecting a null case. In the test you do below, the prgrammer could have written a 0 literal, instead of a character literal. This would have result in a widened bounds, which is incorrect.
You also need to generalize this to look for constant expressions as the label in the switch statement You can use the clang code for determining whether an expression is an integer constant expression. According to the C11 spec (section 6.8.1) a case statement's label is a constant expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My updated PR generalizes the checking of non-null case labels. Also added more unit tests for switch-cases.
Generalized the checking of non-null switch cases by checking for IntegerConstantExpr and ConstantExpr. Also added more unit tests for widening inside switch cases.
Opened issue: #818 |
Handle GNU range case statements is of the form "case LHS ... RHS".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. I have a few suggestions. I am concerned that for switch statements where the controlling expression has an integer type that requires more bits than int
, the code may lead to a compiler assertion.
clang/lib/Sema/BoundsAnalysis.cpp
Outdated
|
||
llvm::APSInt IntVal (Ctx.getTypeSize(Ctx.IntTy), 0); | ||
|
||
if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you casting E to a ConstantExpr? Can't you just use the inner test?
clang/lib/Sema/BoundsAnalysis.cpp
Outdated
|
||
bool BoundsAnalysis::CheckIsSwitchCaseNull(ElevatedCFGBlock *EB) { | ||
if (const auto *CS = dyn_cast_or_null<CaseStmt>(EB->Block->getLabel())) { | ||
llvm::APSInt Zero (Ctx.getTypeSize(Ctx.IntTy), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this may not be the correct type for zero. The C11 specification says in 6.8.4.2 that the type of the constant expression in a case expression shall be an integer constant expression. There are a number of different sizes of integer types in C, which are distinct from the int
type. The C11 specification says that integer promotions are applied to the controlling expression in the switch statement, and that the label expressions of the case statements are promoted to match that type.
I believe that the comparison functions for APSInt may expect the integers to have the same size. It's been a while since I used the API, so I amy be wrong. In that case, the test for inclusion of zero, as well as the uses of APSInt in GetSwitchCaseVal could both lead to internal compiler assertion failures, if the integer requires more bits than IntTy.
My suggestion is that you test out switches that use an unsigned int
, as well as an integer type larger than the default size of int
. I believe the default promotions rules for char
will widen everything out to int
, which is why the code is working for your test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added new tests for unsigned, UINT_MAX, INT_MAX and INT_MIN case labels. The compiler and GetSwitchCaseVal seem to handle these correctly. For example:
case 999999999999999999999999999
is internally represented as 11515845246265065471ULL
and we return detect that this is non-zero.
In case of case INT_MIN + INT_MIN
we do not widen as the compiler evaluates this to 0 with the warning: "overflow in expression; result is 0 with type 'int'".
clang/lib/Sema/BoundsAnalysis.cpp
Outdated
|
||
// Check if EB is on a true edge of pred. The false edge (including the | ||
// default case for a switch) is always the last edge in the list of | ||
// edges. So we check whether EB is on the last edge for pred. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wording of the comment "We check whether EB is on the last edge for pred" is confusing" with respect to the code. Aren't you checking that EB is not the last edge for pred?
if (Expr *E = GetTerminatorCondition(pred)) | ||
if (Expr *E = GetTerminatorCondition(pred)) { | ||
|
||
// Check if the pred ends in a switch statement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a TODO about handling the default case better and cite the GitHub issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
clang/lib/Sema/BoundsAnalysis.cpp
Outdated
// This is not an IntegerConstantExpr. It is a ConstantExpr and we get its | ||
// value by calling the getResultAsAPSInt() method. | ||
if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E)) | ||
return CE->getResultAsAPSInt(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getResultAsAPSInt can only be called on a ConstantExpr. So I have to cast E to ConstantExpr here.
clang/lib/Sema/BoundsAnalysis.cpp
Outdated
|
||
bool BoundsAnalysis::CheckIsSwitchCaseNull(ElevatedCFGBlock *EB) { | ||
if (const auto *CS = dyn_cast_or_null<CaseStmt>(EB->Block->getLabel())) { | ||
llvm::APSInt Zero (Ctx.getTypeSize(Ctx.IntTy), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added new tests for unsigned, UINT_MAX, INT_MAX and INT_MIN case labels. The compiler and GetSwitchCaseVal seem to handle these correctly. For example:
case 999999999999999999999999999
is internally represented as 11515845246265065471ULL
and we return detect that this is non-zero.
In case of case INT_MIN + INT_MIN
we do not widen as the compiler evaluates this to 0 with the warning: "overflow in expression; result is 0 with type 'int'".
if (Expr *E = GetTerminatorCondition(pred)) | ||
if (Expr *E = GetTerminatorCondition(pred)) { | ||
|
||
// Check if the pred ends in a switch statement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe GetSwitchCaseVal can be simplified. It shouldn't be removing implicit casts because these code narrow or widen the value in the case statement. Removing them changes how the matching in the switch statement occurs. I think tunction also has an unnecessary case for ConstantExpr.
Here is an example of code that demonstrates how a case statement could have a narrow cast applied to the constant value. As we discussed, the C specification requires that the type of the constant expression used in case statement be narrowed to the type of the controlling expression in the switch statement.
#include <stdio.h>
#include <stdlib.h>
void f(unsigned int a) {
switch (a) {
case 0x500000000LL:
printf("surprise\n");
break;
default:
break;
}
}
extern int main(int argc, char* argv[]) {
if (argc >= 2)
f(atoi(argv[1]));
}
On a 32-bit architecture, clang issues a warning on the case statement because the value overflows when converted to int type (the suffix LL
indicates a long long
constant).
c:\Samples\test.c:6:7: warning: overflow converting case value to switch condition type
(21474836480 to 0) [-Wswitch]
case 0x500000000LL:
^
1 warning generated.
If you run the resulting program with the argument 0
, you hit the surprise
case.
clang/lib/Sema/BoundsAnalysis.cpp
Outdated
llvm::APSInt BoundsAnalysis::GetSwitchCaseVal(const Expr *CaseExpr) { | ||
Expr *E = const_cast<Expr *>(CaseExpr); | ||
|
||
while (auto *ICE = dyn_cast<ImplicitCastExpr>(E)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should be removing implicit casts before computing the integer constant expression. The implicit cast could be truncating the constant declared by the case statement.
clang/lib/Sema/BoundsAnalysis.cpp
Outdated
// Note: According to C11 spec sections 6.6 and 6.8.1 a DeclRefExpr is not | ||
// considered a constant expr. But clang allows for this additional | ||
// extension. So we handle this here. | ||
if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this case is needed. isIntegerConstantExpr in ExprConstant.cpp calls CheckICE, which has a case that checks for ConstantExpr. It also calls expression evaluation functions that allow have cases for ConstantExpr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this case:
const int i = 1;
This is represented in AST as:
ConstantExpr 'const int' lvalue 1
-- DeclRefExpr 'const int' lvalue Var 'i' 'const int'
The case for ConstantExpr invokes CheckICE on the DeclRefExpr. Now DeclRefExprs only seem to be considered ICE for EnumConstantDecl or when the language is C++.
case Expr::DeclRefExprClass: {
if (isa<EnumConstantDecl>(cast<DeclRefExpr>(E)->getDecl()))
return NoDiag();
const ValueDecl *D = cast<DeclRefExpr>(E)->getDecl();
if (Ctx.getLangOpts().CPlusPlus &&
D && IsConstNonVolatile(D->getType())) {
...
}
return ICEDiag(IK_NotICE, E->getBeginLoc());
That's the reason I have a separate case for ConstantExpr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for clarifying this and giving this concrete example. Yes, it makes sense to keep this case. In test cases where this non-standard extension is used, could you call out that the usage is non-standard C, but allowed by clang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks David.
And as a result of keeping this case we also need to strip off casts because getResultAsAPSInt
does not recursively test subexprs.
llvm::APSInt ConstantExpr::getResultAsAPSInt() const {
switch (ConstantExprBits.ResultKind) {
case ConstantExpr::RSK_APValue:
return APValueResult().getInt();
case ConstantExpr::RSK_Int64:
return llvm::APSInt(llvm::APInt(ConstantExprBits.BitWidth, Int64Result()),
ConstantExprBits.IsUnsigned);
default:
llvm_unreachable("invalid Accessor");
}
}
I can move the stripping off casts to just above the call to getResultAsAPSInt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense to move stripping off casts earlier. I am still concerned that stripping off casts could cause our analysis to differ from what being tested at runtime. What does the code generation phase do for this case? How are they handling casts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant "move stripping off casts later".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overflown value is truncated to the valid integer range in clang ("lib/Sema/SemaStmt.cpp"):
llvm::APSInt ConvVal(Val);
AdjustAPSInt(ConvVal, UnpromotedWidth, UnpromotedSign);
AdjustAPSInt(ConvVal, Val.getBitWidth(), Val.isSigned());
if (ConvVal != Val)
S.Diag(Loc, diag::warn_case_value_overflow);
static void AdjustAPSInt(llvm::APSInt &Val, unsigned BitWidth, bool IsSigned) {
Val = Val.extOrTrunc(BitWidth);
Val.setIsSigned(IsSigned);
}
So in the AST there is no cast for the trunc/ext as the AST does not record the trunc/ext.
And the codegen simply receives the trunc/ext value. Here's the codegen for the following reduced test case:
void f(unsigned int a) {
switch (a) {
case 0x500000000LL:
break;
}
}
Codegen:
sub sp, sp, #4
str r0, [sp]
ldr r0, [sp]
cmp r0, #0
bne .LBB0_2
b .LBB0_1
.LBB0_1: @ %sw.bb
b .LBB0_2
.LBB0_2: @ %sw.epilog
add sp, sp, #4
As we can see the value of the label (cmp r0, #0) is 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thank you. I am glad to see the simplification for checking the values of switch cases.
Added support for widening bounds for nt_array_ptr dereferences in switch
statements. For example, "switch (*p) {case 'a': break;}" would widen the
bounds of p inside the "case 'a'" block.