Skip to content

Extend static bounds declaration checking#624

Merged
dtarditi merged 20 commits into
checkedc:masterfrom
ppashakhanloo:extend-static-analyzer
Jul 24, 2019
Merged

Extend static bounds declaration checking#624
dtarditi merged 20 commits into
checkedc:masterfrom
ppashakhanloo:extend-static-analyzer

Conversation

@ppashakhanloo

Copy link
Copy Markdown
Contributor

I have not added the tests yet.

Everytime in bounds declaration checking we encounter a comparison
between zero and an unsigned integer, we can use the fact that
0 <= any unsigned integer expression.

In this commit, the representation of ranges has been extended. Instead
of ConstantSizedRange, a new representation is introduced: BaseRange.

InRange() checks whether the range is constant-sized or
variable-sized, and compare lower/upper offsets based on that.

CreateBaseRange() is responsible for setting the correct RangeType to a
range.
@ppashakhanloo ppashakhanloo changed the title WIP: Extend static analyzer Extend static analyzer Jul 2, 2019

@dtarditi dtarditi left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this change. I think with variable-sized ranges, there are additional cases that you need to check. I think the code would benefit from checking upper/lower bounds independently and then combining the results. This would handle the additional cases and make the code simpler.

Comment thread lib/Sema/SemaBounds.cpp
// const2 as signed (APSInt) integers. They must have the same bitsize. A
// variable-sized range is a range that has the form (e1 + e2, e1 + e3), where
// e1, e2, and e3 are all expressions.
class BaseRange {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ranges are more complex than this comment indicates. A variable sized range has the form (e1 + e2, e1 + e3), where one or both of e2 and e3 are non-constant expressions.

I think it will be useful to state the invariant about the representation:

  • If a range is constant sized, the LowerOffsetExpr and UpperOffsetExpr should be null.
  • For a variable sized range, you'll need to think about to indicate whether to use the constant offset or the expression.

In theory, you could just check whether the expression is constant or not. We don't want to do this in practice because that's expensive to determine. So we want to compute this when handed the expression.

Comment thread lib/Sema/SemaBounds.cpp Outdated
}

bool IsConstantSizedRange() {
return BaseRangeType == RangeType::ConstantSized && !UpperOffsetExpr && !LowerOffsetExpr;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the check that UpperOffsetExpr and LowerOffsetExpr are null should be an assert. This is part of the invariant about the representation of ranges.

Comment thread lib/Sema/SemaBounds.cpp
bool IsConstantSizedRange() {
return BaseRangeType == RangeType::ConstantSized && !UpperOffsetExpr && !LowerOffsetExpr;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend adding a matching method IsVariableSizedRange().

Comment thread lib/Sema/SemaBounds.cpp

// Is R in range of this range?
ProofResult InRange(ConstantSizedRange &R, ProofFailure &Cause, EquivExprSets *EquivExprs) {
ProofResult InRange(BaseRange &R, ProofFailure &Cause, EquivExprSets *EquivExprs) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably change that comment at line 1754 for clarity: Is R a subrange of this range?

Comment thread lib/Sema/SemaBounds.cpp Outdated
Result = ProofResult::False;
}
} else { // at least one of the ranges is not constant-sized
Result = ProofResult::Maybe;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic needs to be reworked to check the lower and upper bound separately. For each bound, you can them come to the conclusion that:

  1. this range's lower bound <= R's lower bound (true, false, or maybe).
  2. R's upper bounds <= his range's upper bound (true, false, or maybe).

If 1 and 2 are true, the result is true.
If 1 or 2 are false, the result is false.
Otherwise the result is maybe.

For each bound involved in a ej <= ek, comparison, you need to handle the cases where

  1. ej and ek are constant
  2. ej is 0 and ek is an unsigned integer expression.

If you follow this approach, there is no need to handle variable vs. constant-sized ranges separately.

Comment thread lib/Sema/SemaBounds.cpp

// Does R partially overlap this range?
ProofResult PartialOverlap(ConstantSizedRange &R) {
ProofResult PartialOverlap(BaseRange &R) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be generalized.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed all the above comments (except this one) in my newest commit.
Can you please elaborate more on this comment?

@ppashakhanloo ppashakhanloo changed the title Extend static analyzer Extend static bounds declaration checking Jul 10, 2019

@dtarditi dtarditi left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have additional suggestions for the code.

Comment thread lib/Sema/SemaBounds.cpp Outdated
// If both ranges are constant-sized, the constant integer values of the offsets are compared.
// If at least one of the ranges is variable-sized, then in some special cases
// we can prove the lower bound is correct.
if (IsConstantSizedRange() && R.IsConstantSizedRange()) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of checking that the ranges are constant sized, how about checking that both lower bounds are constant? Then you can eliminate the last case at lines 1794 - 1796.

After reading this code, I think having fields named LowerOffsetExpr and LowerOffset (as well UpperOffsetExpr and UpperOffset) are confusing. The Expr suffix doesn't mean much. LowerOffsetVariable and LowerOffsetConstant instead. It is also worth introducing predicate functions IsLowerOffsetConstant and IsLowerOffsetVariable, as that makes the code easier to read.

Comment thread lib/Sema/SemaBounds.cpp Outdated
// They must have the same bitsize.
// In this case, UpperOffsetExpr and LowerOffsetExpr should be both null.
// - If one or both of e2 and e3 are non-constant expressions, the range is Variable-sized.
// TODO: elaborate more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please state the invariants for the representation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtarditi For
I will add the following lines to the comment:

  • for constant-sized range:
    More specifically: (UpperOffsetExpr == nullptr && LowerOffsetExpr == nullptr && BaseRangeKind == Kind::ConstantSized)
  • for variable-sized range:
    More specifically: ((UpperOffsetExpr != nullptr || LowerOffsetExpr != nullptr) && BaseRangeKind == Kind::VariableSized)

Is that enough?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that will be good.

Comment thread lib/Sema/SemaBounds.cpp Outdated
}
if (UpperOffset < R.UpperOffset) {
} else if (LowerOffsetExpr && R.LowerOffsetExpr &&
!EqualValue(S.Context, LowerOffsetExpr, R.LowerOffsetExpr, EquivExprs))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the case when the offset expressions are equal?

Comment thread lib/Sema/SemaBounds.cpp Outdated
// integer that will fit pointer width.
void SplitIntoBaseAndOffset(Expr *E, Expr *&Base,
llvm::APSInt &Offset) {
llvm::APSInt &Offset, Expr *&OffsetExpr,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding the extra parameter OnlyConstant, I suggest always letting this function always split expressions if possible and checking the results returned by the function. I think you will need to change the function to return a value indicating what it did.

@ppashakhanloo

Copy link
Copy Markdown
Contributor Author

@dtarditi I have made some new changes. The code is ready to be reviewed.

@dtarditi dtarditi left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good! I have a few more suggested changes.

Comment thread lib/Sema/SemaBounds.cpp Outdated
R->SetLower(LowerOffset);
R->SetUpper(UpperOffset);
return true;
if (LowerOffsetKind == BaseRange::Kind::ConstantSized && UpperOffsetKind == BaseRange::Kind::ConstantSized) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is OK for one of the upper bounds or lower bounds to be constant and the other to be variable. I think you should set the lower/upper bounds independently of each other:

if (LowerOffsetKind == BaseRange::Kind::ConstantSized) {
 …
} else if (LowerOffsetKind == BaseRange::Kind::VariableSize) {
 …
} else
  return BaseRange::KInd::Invalid

Comment thread lib/Sema/SemaBounds.cpp Outdated
if (UpperOffsetConstant.getExtValue() == 0 &&
R.UpperOffsetConstant.getExtValue() == 0)
return ProofResult::True;
if (SignsMatch(UpperOffsetConstant, R.UpperOffsetConstant) &&

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that you need to check that the signs match for the integer constants. We always represent integers using signed integers.

@ppashakhanloo

Copy link
Copy Markdown
Contributor Author

@dtarditi Thank you very much for your comments. New changes are ready for review.

@dtarditi dtarditi left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some additional suggestions.

Comment thread lib/Sema/SemaBounds.cpp Outdated
// R.LowerOffset is within this range, but R.UpperOffset is above the range
if (LowerOffset <= R.LowerOffset && R.LowerOffset < UpperOffset &&
UpperOffset < R.UpperOffset)
if (LowerOffsetConstant <= R.LowerOffsetConstant && R.LowerOffsetConstant < UpperOffsetConstant &&

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to check that the two ranges are constant-sized here.

Comment thread lib/Sema/SemaBounds.cpp Outdated
// Convert an expression E into a base and offset form.
// - If E is pointer arithmetic involving addition or subtraction of a
// constant integer, return the base and offset.
// - If E is a pointer arithmetic involving addition or subtraction of

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs to be updated. OnlyConstant is gone now.

return p;
}

_Array_ptr<int> test_f70(int c) : byte_count(c);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some tests for _Nt_array_ptr too?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtarditi Yes, I added some tests for _Nt_array_ptr and will push the changes shortly. There are more tests (for both nt_array_ptr and array_ptr) in CheckedC repo.

@ppashakhanloo

Copy link
Copy Markdown
Contributor Author

@dtarditi Thanks for the suggestions. I pushed the new changes to the branch.

@ppashakhanloo

ppashakhanloo commented Jul 20, 2019

Copy link
Copy Markdown
Contributor Author

@dtarditi I improved the comments of various functions. I hope they are precise enough!

@dtarditi dtarditi left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you.

@dtarditi dtarditi merged commit bf6b7ac into checkedc:master Jul 24, 2019
sulekhark pushed a commit that referenced this pull request Jul 8, 2021
…rectory. (#624)

* Follow new guideline: Avoid `cd DIR && FOO` in RUN lines.

Instead, put `FOO` on a separate run line. `cd` affects later RUN lines,
and `cd DIR && FOO` could mislead the reader that it doesn't and that's
why we needed to put both parts on one line. I added the proposed
guideline to #346.

I'm changing the existing tests before changing more tests to use `cd`
as part of this PR.

* Add `-o /dev/null` to clang commands that use `-` as input.

* Add -working-directory to clang commands with named input files.

* `cd` to a temporary directory for `3c -dump-stats` commands.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants