Support bounds widening for conditionals within switch statements #805

mgrang · 2020-03-09T04:54:44Z

Added support for widening bounds for nt_array_ptr dereferences in switch
statements. For example, "switch (*p) {case 'a': break;}" would widen the
bounds of p inside the "case 'a'" block.

Added support for widenening bounds for nt_array_ptr dereferences in while loops. For example, "while (*p) {}" would widen the bounds of p upon entry to the loop. In order to handle loops we initialize the In and Out sets of the entry block as empty. This is done so that we can handle loop back edges and unconditional jumps.

Added support for widenening bounds for nt_array_ptr dereferences in for loops. For example, "for (; *p; ) {}" would widen the bounds of p upon entry to the loop.

Added support for widenening bounds for nt_array_ptr dereferences in switch statements. For example, "switch (*p) {case 'a': break;}" would widen the bounds of p inside the "case 'a'" block.

mgrang · 2020-03-09T04:56:00Z

This follows from the PR on for loops: #804

dtarditi

I have suggestions for improving and generalizing the handling of case statements.

Could you open a GitHub issue to allow widening for the default case in some circumstances? That should be a separate change. If we establish that another label in a switch statement tests for 0, then the default case will handle non-zero case, and the bounds can be widened there. Programmers will use all the different forms of case statements and may depend on that behavior.

dtarditi · 2020-04-04T14:58:02Z

clang/lib/Sema/BoundsAnalysis.cpp

+    // Check if this is a switch case and whether it tests for a null
+    // terminator, like "case '\0'". We cannot widen the bounds for the null
+    // terminator case.
+    bool IsSwitchCaseNullTerm = false;


I think that instead of checking for null cases and excluding them, you should check for non-null cases and include only them. This makes the analysis more robust if you miss detecting a null case. In the test you do below, the prgrammer could have written a 0 literal, instead of a character literal. This would have result in a widened bounds, which is incorrect.

You also need to generalize this to look for constant expressions as the label in the switch statement You can use the clang code for determining whether an expression is an integer constant expression. According to the C11 spec (section 6.8.1) a case statement's label is a constant expression.

My updated PR generalizes the checking of non-null case labels. Also added more unit tests for switch-cases.

Generalized the checking of non-null switch cases by checking for IntegerConstantExpr and ConstantExpr. Also added more unit tests for widening inside switch cases.

mgrang · 2020-04-09T01:04:08Z

I have suggestions for improving and generalizing the handling of case statements.

Could you open a GitHub issue to allow widening for the default case in some circumstances? That should be a separate change. If we establish that another label in a switch statement tests for 0, then the default case will handle non-zero case, and the bounds can be widened there. Programmers will use all the different forms of case statements and may depend on that behavior.

Opened issue: #818

Handle GNU range case statements is of the form "case LHS ... RHS".

dtarditi

Overall looks good. I have a few suggestions. I am concerned that for switch statements where the controlling expression has an integer type that requires more bits than int, the code may lead to a compiler assertion.

dtarditi · 2020-04-14T20:28:06Z

clang/lib/Sema/BoundsAnalysis.cpp

+
+  llvm::APSInt IntVal (Ctx.getTypeSize(Ctx.IntTy), 0);
+
+  if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E)) {


Why are you casting E to a ConstantExpr? Can't you just use the inner test?

dtarditi · 2020-04-14T20:47:56Z

clang/lib/Sema/BoundsAnalysis.cpp

+
+bool BoundsAnalysis::CheckIsSwitchCaseNull(ElevatedCFGBlock *EB) {
+  if (const auto *CS = dyn_cast_or_null<CaseStmt>(EB->Block->getLabel())) {
+    llvm::APSInt Zero (Ctx.getTypeSize(Ctx.IntTy), 0);


I believe this may not be the correct type for zero. The C11 specification says in 6.8.4.2 that the type of the constant expression in a case expression shall be an integer constant expression. There are a number of different sizes of integer types in C, which are distinct from the int type. The C11 specification says that integer promotions are applied to the controlling expression in the switch statement, and that the label expressions of the case statements are promoted to match that type.

I believe that the comparison functions for APSInt may expect the integers to have the same size. It's been a while since I used the API, so I amy be wrong. In that case, the test for inclusion of zero, as well as the uses of APSInt in GetSwitchCaseVal could both lead to internal compiler assertion failures, if the integer requires more bits than IntTy.

My suggestion is that you test out switches that use an unsigned int, as well as an integer type larger than the default size of int. I believe the default promotions rules for char will widen everything out to int, which is why the code is working for your test cases.

I have added new tests for unsigned, UINT_MAX, INT_MAX and INT_MIN case labels. The compiler and GetSwitchCaseVal seem to handle these correctly. For example:

case 999999999999999999999999999 is internally represented as 11515845246265065471ULL and we return detect that this is non-zero.

In case of case INT_MIN + INT_MIN we do not widen as the compiler evaluates this to 0 with the warning: "overflow in expression; result is 0 with type 'int'".

dtarditi · 2020-04-14T20:55:29Z

clang/lib/Sema/BoundsAnalysis.cpp

+
+      // Check if EB is on a true edge of pred. The false edge (including the
+      // default case for a switch) is always the last edge in the list of
+      // edges. So we check whether EB is on the last edge for pred.


The wording of the comment "We check whether EB is on the last edge for pred" is confusing" with respect to the code. Aren't you checking that EB is not the last edge for pred?

dtarditi · 2020-04-14T20:55:59Z

clang/lib/Sema/BoundsAnalysis.cpp

-        if (Expr *E = GetTerminatorCondition(pred))
+        if (Expr *E = GetTerminatorCondition(pred)) {
+
+          // Check if the pred ends in a switch statement.


Could you add a TODO about handling the default case better and cite the GitHub issue?

mgrang · 2020-04-21T03:41:01Z

clang/lib/Sema/BoundsAnalysis.cpp

+  // This is not an IntegerConstantExpr. It is a ConstantExpr and we get its
+  // value by calling the getResultAsAPSInt() method.
+  if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E))
+    return CE->getResultAsAPSInt();


getResultAsAPSInt can only be called on a ConstantExpr. So I have to cast E to ConstantExpr here.

mgrang · 2020-04-21T03:47:01Z

clang/lib/Sema/BoundsAnalysis.cpp

+
+bool BoundsAnalysis::CheckIsSwitchCaseNull(ElevatedCFGBlock *EB) {
+  if (const auto *CS = dyn_cast_or_null<CaseStmt>(EB->Block->getLabel())) {
+    llvm::APSInt Zero (Ctx.getTypeSize(Ctx.IntTy), 0);


I have added new tests for unsigned, UINT_MAX, INT_MAX and INT_MIN case labels. The compiler and GetSwitchCaseVal seem to handle these correctly. For example:

case 999999999999999999999999999 is internally represented as 11515845246265065471ULL and we return detect that this is non-zero.

In case of case INT_MIN + INT_MIN we do not widen as the compiler evaluates this to 0 with the warning: "overflow in expression; result is 0 with type 'int'".

mgrang · 2020-04-21T03:47:54Z

clang/lib/Sema/BoundsAnalysis.cpp

-        if (Expr *E = GetTerminatorCondition(pred))
+        if (Expr *E = GetTerminatorCondition(pred)) {
+
+          // Check if the pred ends in a switch statement.


dtarditi

I believe GetSwitchCaseVal can be simplified. It shouldn't be removing implicit casts because these code narrow or widen the value in the case statement. Removing them changes how the matching in the switch statement occurs. I think tunction also has an unnecessary case for ConstantExpr.

Here is an example of code that demonstrates how a case statement could have a narrow cast applied to the constant value. As we discussed, the C specification requires that the type of the constant expression used in case statement be narrowed to the type of the controlling expression in the switch statement.

#include <stdio.h>
#include <stdlib.h>

void f(unsigned int a) {
	switch (a) {
	case 0x500000000LL: 
		printf("surprise\n");
		break;
	default:
		break;
	}
}

extern int main(int argc, char* argv[]) {
	if (argc >= 2)
		f(atoi(argv[1]));
}

On a 32-bit architecture, clang issues a warning on the case statement because the value overflows when converted to int type (the suffix LL indicates a long long constant).

c:\Samples\test.c:6:7: warning: overflow converting case value to switch condition type
      (21474836480 to 0) [-Wswitch]
        case 0x500000000LL:
             ^
1 warning generated.

If you run the resulting program with the argument 0, you hit the surprise case.

dtarditi · 2020-04-24T20:34:01Z

clang/lib/Sema/BoundsAnalysis.cpp

+llvm::APSInt BoundsAnalysis::GetSwitchCaseVal(const Expr *CaseExpr) {
+  Expr *E = const_cast<Expr *>(CaseExpr);
+
+  while (auto *ICE = dyn_cast<ImplicitCastExpr>(E))


I don't think you should be removing implicit casts before computing the integer constant expression. The implicit cast could be truncating the constant declared by the case statement.

dtarditi · 2020-04-24T20:38:51Z

clang/lib/Sema/BoundsAnalysis.cpp

+  // Note: According to C11 spec sections 6.6 and 6.8.1 a DeclRefExpr is not
+  // considered a constant expr. But clang allows for this additional
+  // extension. So we handle this here.
+  if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E))


I don't think this case is needed. isIntegerConstantExpr in ExprConstant.cpp calls CheckICE, which has a case that checks for ConstantExpr. It also calls expression evaluation functions that allow have cases for ConstantExpr.

Consider this case:
const int i = 1;

This is represented in AST as:
ConstantExpr 'const int' lvalue 1
-- DeclRefExpr 'const int' lvalue Var 'i' 'const int'

The case for ConstantExpr invokes CheckICE on the DeclRefExpr. Now DeclRefExprs only seem to be considered ICE for EnumConstantDecl or when the language is C++.

case Expr::DeclRefExprClass: { if (isa<EnumConstantDecl>(cast<DeclRefExpr>(E)->getDecl())) return NoDiag(); const ValueDecl *D = cast<DeclRefExpr>(E)->getDecl(); if (Ctx.getLangOpts().CPlusPlus && D && IsConstNonVolatile(D->getType())) { ... } return ICEDiag(IK_NotICE, E->getBeginLoc());

That's the reason I have a separate case for ConstantExpr.

Thank you for clarifying this and giving this concrete example. Yes, it makes sense to keep this case. In test cases where this non-standard extension is used, could you call out that the usage is non-standard C, but allowed by clang.

Thanks David.
And as a result of keeping this case we also need to strip off casts because getResultAsAPSInt does not recursively test subexprs.

llvm::APSInt ConstantExpr::getResultAsAPSInt() const { switch (ConstantExprBits.ResultKind) { case ConstantExpr::RSK_APValue: return APValueResult().getInt(); case ConstantExpr::RSK_Int64: return llvm::APSInt(llvm::APInt(ConstantExprBits.BitWidth, Int64Result()), ConstantExprBits.IsUnsigned); default: llvm_unreachable("invalid Accessor"); } }

I can move the stripping off casts to just above the call to getResultAsAPSInt.

It makes sense to move stripping off casts earlier. I am still concerned that stripping off casts could cause our analysis to differ from what being tested at runtime. What does the code generation phase do for this case? How are they handling casts?

I meant "move stripping off casts later".

The overflown value is truncated to the valid integer range in clang ("lib/Sema/SemaStmt.cpp"):

llvm::APSInt ConvVal(Val); AdjustAPSInt(ConvVal, UnpromotedWidth, UnpromotedSign); AdjustAPSInt(ConvVal, Val.getBitWidth(), Val.isSigned()); if (ConvVal != Val) S.Diag(Loc, diag::warn_case_value_overflow); static void AdjustAPSInt(llvm::APSInt &Val, unsigned BitWidth, bool IsSigned) { Val = Val.extOrTrunc(BitWidth); Val.setIsSigned(IsSigned); }

So in the AST there is no cast for the trunc/ext as the AST does not record the trunc/ext.
And the codegen simply receives the trunc/ext value. Here's the codegen for the following reduced test case:

void f(unsigned int a) { switch (a) { case 0x500000000LL: break; } }

Codegen:

sub sp, sp, #4 str r0, [sp] ldr r0, [sp] cmp r0, #0 bne .LBB0_2 b .LBB0_1 .LBB0_1: @ %sw.bb b .LBB0_2 .LBB0_2: @ %sw.epilog add sp, sp, #4

As we can see the value of the label (cmp r0, #0) is 0.

dtarditi

Looks great! Thank you. I am glad to see the simplification for checking the values of switch cases.

Mandeep Singh Grang added 5 commits March 5, 2020 18:49

Update comments

cc12d59

Update comments

afd4a6e

Support bounds widening for conditionals with for loops

0f5dae3

Added support for widenening bounds for nt_array_ptr dereferences in for loops. For example, "for (; *p; ) {}" would widen the bounds of p upon entry to the loop.

Support bounds widening for conditionals within switch statements

721ff58

Added support for widenening bounds for nt_array_ptr dereferences in switch statements. For example, "switch (*p) {case 'a': break;}" would widen the bounds of p inside the "case 'a'" block.

mgrang requested review from dtarditi and kkjeer March 9, 2020 04:55

mgrang mentioned this pull request Mar 9, 2020

Support bounds widening for conditionals with for loops #804

Closed

Change IntegralCast assert to if condition

8eff508

mgrang mentioned this pull request Mar 12, 2020

Use the widened bounds to update bounds in context #808

Closed

mgrang self-assigned this Mar 19, 2020

mgrang removed request for kkjeer and dtarditi March 20, 2020 17:39

dtarditi reviewed Apr 4, 2020

View reviewed changes

Mandeep Singh Grang added 2 commits April 6, 2020 14:48

Merge branch 'master' into switch_case

b751e06

Generalize the checking of non-null switch cases

bc3bef5

Generalized the checking of non-null switch cases by checking for IntegerConstantExpr and ConstantExpr. Also added more unit tests for widening inside switch cases.

Mandeep Singh Grang added 2 commits April 9, 2020 14:19

Handle GNU range case statements

2f62d82

Handle GNU range case statements is of the form "case LHS ... RHS".

Fix indentation in tests

b8e2bb8

dtarditi reviewed Apr 14, 2020

View reviewed changes

Simplify if condition and fix comments

dd6626b

mgrang commented Apr 21, 2020

View reviewed changes

Mandeep Singh Grang added 2 commits April 23, 2020 15:03

Change type of IntVal and Zero to aseExpr type

d7ddb1c

CCeate LHSZero and RHSZero of the correct bit widths

9f56487

dtarditi reviewed Apr 24, 2020

View reviewed changes

Mandeep Singh Grang added 2 commits April 28, 2020 16:00

Move stripping off casts after the call to isIntegerConstantExpr

b91fdba

Invoke EvaluateKnownConstInt to get the switch case value

d5a8601

dtarditi approved these changes May 8, 2020

View reviewed changes

mgrang merged commit d2f589f into master May 9, 2020

mgrang deleted the switch_case branch May 9, 2020 16:35


		llvm::APSInt IntVal (Ctx.getTypeSize(Ctx.IntTy), 0);

		if (const auto *CE = dyn_cast_or_null<ConstantExpr>(E)) {

Support bounds widening for conditionals within switch statements #805

Support bounds widening for conditionals within switch statements #805

Uh oh!

Conversation

mgrang commented Mar 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mgrang commented Mar 9, 2020

Uh oh!

dtarditi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mgrang commented Apr 9, 2020

Uh oh!

dtarditi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dtarditi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mgrang Apr 27, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dtarditi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mgrang commented Mar 9, 2020 •

edited

Loading

mgrang Apr 27, 2020 •

edited

Loading