Skip to content

Support bounds widening for conditionals within switch statements #805

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
May 9, 2020

Conversation

mgrang
Copy link

@mgrang mgrang commented Mar 9, 2020

Added support for widening bounds for nt_array_ptr dereferences in switch
statements. For example, "switch (*p) {case 'a': break;}" would widen the
bounds of p inside the "case 'a'" block.

Mandeep Singh Grang added 5 commits March 5, 2020 18:49
Added support for widenening bounds for nt_array_ptr dereferences in while
loops. For example, "while (*p) {}" would widen the bounds of p upon
entry to the loop.

In order to handle loops we initialize the In and Out sets of the entry block
as empty. This is done so that we can handle loop back edges and unconditional
jumps.
Added support for widenening bounds for nt_array_ptr dereferences in for
loops. For example, "for (; *p; ) {}" would widen the bounds of p upon
entry to the loop.
Added support for widenening bounds for nt_array_ptr dereferences in switch
statements. For example, "switch (*p) {case 'a': break;}" would widen the
bounds of p inside the "case 'a'" block.
@mgrang mgrang requested review from dtarditi and kkjeer March 9, 2020 04:55
@mgrang
Copy link
Author

mgrang commented Mar 9, 2020

This follows from the PR on for loops: #804

Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have suggestions for improving and generalizing the handling of case statements.

Could you open a GitHub issue to allow widening for the default case in some circumstances? That should be a separate change. If we establish that another label in a switch statement tests for 0, then the default case will handle non-zero case, and the bounds can be widened there. Programmers will use all the different forms of case statements and may depend on that behavior.

// Check if this is a switch case and whether it tests for a null
// terminator, like "case '\0'". We cannot widen the bounds for the null
// terminator case.
bool IsSwitchCaseNullTerm = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that instead of checking for null cases and excluding them, you should check for non-null cases and include only them. This makes the analysis more robust if you miss detecting a null case. In the test you do below, the prgrammer could have written a 0 literal, instead of a character literal. This would have result in a widened bounds, which is incorrect.

You also need to generalize this to look for constant expressions as the label in the switch statement You can use the clang code for determining whether an expression is an integer constant expression. According to the C11 spec (section 6.8.1) a case statement's label is a constant expression.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My updated PR generalizes the checking of non-null case labels. Also added more unit tests for switch-cases.

Mandeep Singh Grang added 2 commits April 6, 2020 14:48
Generalized the checking of non-null switch cases by checking for
IntegerConstantExpr and ConstantExpr. Also added more unit tests for widening
inside switch cases.
@mgrang
Copy link
Author

mgrang commented Apr 9, 2020

I have suggestions for improving and generalizing the handling of case statements.

Could you open a GitHub issue to allow widening for the default case in some circumstances? That should be a separate change. If we establish that another label in a switch statement tests for 0, then the default case will handle non-zero case, and the bounds can be widened there. Programmers will use all the different forms of case statements and may depend on that behavior.

Opened issue: #818

Mandeep Singh Grang added 2 commits April 9, 2020 14:19
Handle GNU range case statements is of the form "case LHS ... RHS".
Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. I have a few suggestions. I am concerned that for switch statements where the controlling expression has an integer type that requires more bits than int, the code may lead to a compiler assertion.


llvm::APSInt IntVal (Ctx.getTypeSize(Ctx.IntTy), 0);

if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you casting E to a ConstantExpr? Can't you just use the inner test?


bool BoundsAnalysis::CheckIsSwitchCaseNull(ElevatedCFGBlock *EB) {
if (const auto *CS = dyn_cast_or_null<CaseStmt>(EB->Block->getLabel())) {
llvm::APSInt Zero (Ctx.getTypeSize(Ctx.IntTy), 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this may not be the correct type for zero. The C11 specification says in 6.8.4.2 that the type of the constant expression in a case expression shall be an integer constant expression. There are a number of different sizes of integer types in C, which are distinct from the int type. The C11 specification says that integer promotions are applied to the controlling expression in the switch statement, and that the label expressions of the case statements are promoted to match that type.

I believe that the comparison functions for APSInt may expect the integers to have the same size. It's been a while since I used the API, so I amy be wrong. In that case, the test for inclusion of zero, as well as the uses of APSInt in GetSwitchCaseVal could both lead to internal compiler assertion failures, if the integer requires more bits than IntTy.

My suggestion is that you test out switches that use an unsigned int, as well as an integer type larger than the default size of int. I believe the default promotions rules for char will widen everything out to int, which is why the code is working for your test cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added new tests for unsigned, UINT_MAX, INT_MAX and INT_MIN case labels. The compiler and GetSwitchCaseVal seem to handle these correctly. For example:

case 999999999999999999999999999 is internally represented as 11515845246265065471ULL and we return detect that this is non-zero.

In case of case INT_MIN + INT_MIN we do not widen as the compiler evaluates this to 0 with the warning: "overflow in expression; result is 0 with type 'int'".


// Check if EB is on a true edge of pred. The false edge (including the
// default case for a switch) is always the last edge in the list of
// edges. So we check whether EB is on the last edge for pred.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wording of the comment "We check whether EB is on the last edge for pred" is confusing" with respect to the code. Aren't you checking that EB is not the last edge for pred?

if (Expr *E = GetTerminatorCondition(pred))
if (Expr *E = GetTerminatorCondition(pred)) {

// Check if the pred ends in a switch statement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a TODO about handling the default case better and cite the GitHub issue?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// This is not an IntegerConstantExpr. It is a ConstantExpr and we get its
// value by calling the getResultAsAPSInt() method.
if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E))
return CE->getResultAsAPSInt();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getResultAsAPSInt can only be called on a ConstantExpr. So I have to cast E to ConstantExpr here.


bool BoundsAnalysis::CheckIsSwitchCaseNull(ElevatedCFGBlock *EB) {
if (const auto *CS = dyn_cast_or_null<CaseStmt>(EB->Block->getLabel())) {
llvm::APSInt Zero (Ctx.getTypeSize(Ctx.IntTy), 0);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added new tests for unsigned, UINT_MAX, INT_MAX and INT_MIN case labels. The compiler and GetSwitchCaseVal seem to handle these correctly. For example:

case 999999999999999999999999999 is internally represented as 11515845246265065471ULL and we return detect that this is non-zero.

In case of case INT_MIN + INT_MIN we do not widen as the compiler evaluates this to 0 with the warning: "overflow in expression; result is 0 with type 'int'".

if (Expr *E = GetTerminatorCondition(pred))
if (Expr *E = GetTerminatorCondition(pred)) {

// Check if the pred ends in a switch statement.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe GetSwitchCaseVal can be simplified. It shouldn't be removing implicit casts because these code narrow or widen the value in the case statement. Removing them changes how the matching in the switch statement occurs. I think tunction also has an unnecessary case for ConstantExpr.

Here is an example of code that demonstrates how a case statement could have a narrow cast applied to the constant value. As we discussed, the C specification requires that the type of the constant expression used in case statement be narrowed to the type of the controlling expression in the switch statement.

#include <stdio.h>
#include <stdlib.h>

void f(unsigned int a) {
	switch (a) {
	case 0x500000000LL: 
		printf("surprise\n");
		break;
	default:
		break;
	}
}

extern int main(int argc, char* argv[]) {
	if (argc >= 2)
		f(atoi(argv[1]));
}

On a 32-bit architecture, clang issues a warning on the case statement because the value overflows when converted to int type (the suffix LL indicates a long long constant).

c:\Samples\test.c:6:7: warning: overflow converting case value to switch condition type
      (21474836480 to 0) [-Wswitch]
        case 0x500000000LL:
             ^
1 warning generated.

If you run the resulting program with the argument 0, you hit the surprise case.

llvm::APSInt BoundsAnalysis::GetSwitchCaseVal(const Expr *CaseExpr) {
Expr *E = const_cast<Expr *>(CaseExpr);

while (auto *ICE = dyn_cast<ImplicitCastExpr>(E))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should be removing implicit casts before computing the integer constant expression. The implicit cast could be truncating the constant declared by the case statement.

// Note: According to C11 spec sections 6.6 and 6.8.1 a DeclRefExpr is not
// considered a constant expr. But clang allows for this additional
// extension. So we handle this here.
if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this case is needed. isIntegerConstantExpr in ExprConstant.cpp calls CheckICE, which has a case that checks for ConstantExpr. It also calls expression evaluation functions that allow have cases for ConstantExpr.

Copy link
Author

@mgrang mgrang Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this case:
const int i = 1;

This is represented in AST as:
ConstantExpr 'const int' lvalue 1
-- DeclRefExpr 'const int' lvalue Var 'i' 'const int'

The case for ConstantExpr invokes CheckICE on the DeclRefExpr. Now DeclRefExprs only seem to be considered ICE for EnumConstantDecl or when the language is C++.

case Expr::DeclRefExprClass: {
     if (isa<EnumConstantDecl>(cast<DeclRefExpr>(E)->getDecl()))
       return NoDiag();
     const ValueDecl *D = cast<DeclRefExpr>(E)->getDecl();
     if (Ctx.getLangOpts().CPlusPlus &&
         D && IsConstNonVolatile(D->getType())) {
     ...
     }
     return ICEDiag(IK_NotICE, E->getBeginLoc());

That's the reason I have a separate case for ConstantExpr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying this and giving this concrete example. Yes, it makes sense to keep this case. In test cases where this non-standard extension is used, could you call out that the usage is non-standard C, but allowed by clang.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks David.
And as a result of keeping this case we also need to strip off casts because getResultAsAPSInt does not recursively test subexprs.

llvm::APSInt ConstantExpr::getResultAsAPSInt() const {
  switch (ConstantExprBits.ResultKind) {
  case ConstantExpr::RSK_APValue:
    return APValueResult().getInt();
  case ConstantExpr::RSK_Int64:
    return llvm::APSInt(llvm::APInt(ConstantExprBits.BitWidth, Int64Result()),
                        ConstantExprBits.IsUnsigned);
  default:
    llvm_unreachable("invalid Accessor");
  }
}

I can move the stripping off casts to just above the call to getResultAsAPSInt.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to move stripping off casts earlier. I am still concerned that stripping off casts could cause our analysis to differ from what being tested at runtime. What does the code generation phase do for this case? How are they handling casts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant "move stripping off casts later".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overflown value is truncated to the valid integer range in clang ("lib/Sema/SemaStmt.cpp"):

    llvm::APSInt ConvVal(Val);
    AdjustAPSInt(ConvVal, UnpromotedWidth, UnpromotedSign);
    AdjustAPSInt(ConvVal, Val.getBitWidth(), Val.isSigned());
    if (ConvVal != Val)
      S.Diag(Loc, diag::warn_case_value_overflow);

static void AdjustAPSInt(llvm::APSInt &Val, unsigned BitWidth, bool IsSigned) {
  Val = Val.extOrTrunc(BitWidth);
  Val.setIsSigned(IsSigned);
}

So in the AST there is no cast for the trunc/ext as the AST does not record the trunc/ext.
And the codegen simply receives the trunc/ext value. Here's the codegen for the following reduced test case:

void f(unsigned int a) {
  switch (a) {
  case 0x500000000LL:
    break;
  }
}

Codegen:

        sub     sp, sp, #4
        str     r0, [sp]
        ldr     r0, [sp]
        cmp     r0, #0
        bne     .LBB0_2
        b       .LBB0_1
.LBB0_1:                                @ %sw.bb
        b       .LBB0_2
.LBB0_2:                                @ %sw.epilog
        add     sp, sp, #4

As we can see the value of the label (cmp r0, #0) is 0.

Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you. I am glad to see the simplification for checking the values of switch cases.

@mgrang mgrang merged commit d2f589f into master May 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants