Skip to content

Add a dynamic check for pointer arithmetic of null pointers #647

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 35 commits into from

Conversation

mgrang
Copy link

@mgrang mgrang commented Jul 30, 2019

See issue #237

@mgrang
Copy link
Author

mgrang commented Jul 30, 2019

This is version 1 of the patch and does not yet handle all the cases. Please review this and give your suggestions. I will add tests cases when this patch finalizes.

Driving example:

void foo() {
  int *i = NULL;
  ++i; // No dynamic check added here.

  array_ptr<char> p : count(3) = NULL;
  ++p; // Dynamic check added here.

  array_ptr<char> q : count(3) = "abc";
  ++q; // Dynamic check added here.
}

if (type->isCheckedPointerType() || type->isCheckedArrayType()) {
LValueBaseInfo BaseInfo;
TBAAAccessInfo TBAAInfo;
Address Addr = CGF.EmitPointerWithAlignment(E->getSubExpr(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will result in E's subexpression being evaluated twice. This would be a problem for something where the subexpression might have a side-effect, such as (a[f()])++. I think you need to push the insertion of the check into the cases below. The only interesting cases are those where a pointer type can occur.

@@ -2427,6 +2427,15 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV,

// Next most common: pointer increment.
} else if (const PointerType *ptr = type->getAs<PointerType>()) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the spec, array_ptrs of function pointers are not allowed. So I guess we do not need a check in the case for function pointers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - we do not need a check for the case of function pointers.

@mgrang
Copy link
Author

mgrang commented Jul 31, 2019

This patch currently handles the case of pre/post inc/dec of array_ptrs. I am now working to extend this patch to handle binary ops as well. For example:

array_ptr<char> p = NULL;
p += 1;
p = p - 1;

@mgrang
Copy link
Author

mgrang commented Jul 31, 2019

Should we guard the insertion of non-null checks via a clang flag? The user can pass a flag like -fno-runtime-checks to skip insertion of run time checks for performance-critical code.

@dtarditi
Copy link
Member

Yes, I think a clang flag controlling isertion of these checks is a good idea. We'll need it so that we can examine the performance impact of adding the checks.

Added two new flags to control whether runtime checks should be added or not:
  -fcheckedc-runtime-checks
  -fno-checkedc-runtime-checks

Also added runtime time checks for pointer arithmetic using binary operations
like:
  ptr += 1
  ptr = ptr - 1;
@mgrang
Copy link
Author

mgrang commented Aug 1, 2019

Yes, I think a clang flag controlling isertion of these checks is a good idea. We'll need it so that we can examine the performance impact of adding the checks.

Added in latest patch set.

@@ -54,6 +54,9 @@ void CodeGenFunction::EmitDynamicNonNullCheck(const Address BaseAddr, const Qual
if (!getLangOpts().CheckedC)
return;

if (!CGM.getCodeGenOpts().CheckedCRuntimeChecks)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime checks for non-null will only be added if the flag -fcheckedc-runtime-checks is present. Since I have added this check here, it means this would affect all non-null checks (and not just those for pointer arithmetic). This is resulting in 3 unit test failures which we need to fix (by adding this flag to those tests). Is this behavior acceptable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be on by default, and something that we can disable.

@lenary
Copy link
Collaborator

lenary commented Aug 1, 2019

Nice!

@dtarditi
Copy link
Member

dtarditi commented Aug 1, 2019

I think you should rename the flag to indicate that it is specific to null checks. With the current name, someone might think we disable bounds checks too.

@@ -364,6 +364,9 @@ CODEGENOPT(BranchTargetEnforcement, 1, 0)
/// Whether to emit unused static constants.
CODEGENOPT(KeepStaticConsts, 1, 0)

/// Whether to add null ptr checks for checkedc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the nullptr checks to be emitted by default?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we want to emit them by default. Please see @dtarditi 's comment above.


def fcheckedc_null_ptr_checks : Flag<["-"], "fcheckedc-null-ptr-checks">,
Group<f_Group>, Flags<[CC1Option]>,
HelpText<"Enable runtime null ptr checks">;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "Enable runtime nullptr checks". Similarly, at other places.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I wanted to distinguish general "null pointers" from the C++ "nulllptr" keyword. That's the reason for the space in between. I guess I can spell it out as "null pointer".

@mgrang
Copy link
Author

mgrang commented Aug 2, 2019

This patch is causing 3 run time unit tests to assert. I am working on fixing it.

}

static Expr *peelOffOuterExpr(Expr *E) {
if (auto *ICE = dyn_cast<ImplicitCastExpr>(E))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we only handle these two types of exprs because I do not have a driving example for any other types of exprs.

CGF.EmitDynamicNonNullCheck(Addr, Ty);
return;

} else if (RV.isComplex()) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do not have any driving examples for complex and aggregate rvalues but I have added code here just in case. As an alternative, we could simply assert here and only handle scalar rvalues.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion: If you don't have motivating examples but have added the code for completeness, you can refer in comments to other parts of the codebase where such cases are being handled.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer present in the latest patch.

Mandeep Singh Grang and others added 2 commits August 6, 2019 17:19
- Added a missing comment for _Nt_Array_ptr.
- Fixed an incomplete sentence for _Array_ptr.
@mgrang
Copy link
Author

mgrang commented Aug 8, 2019

I ran LNT testsuite locally and do not see any issues due to this patch. Although there are 10 tests failing with errors like expected identifier or '('. These fail even w/o this patch.

if (B->isAdditiveOp() && B->getType()->isPointerType()) {
if (B->getLHS()->getType()->isPointerType()) {
return B->getLHS();
} else if (B->getRHS()->getType()->isPointerType()) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Coding style. In some of the other places like CGDynamicCheck.cpp, I'm seeing that for single statements under an if-else, we aren't using braces. Should we drop the braces from the single statements under if-else here?

For example:

if (!(BaseTy->isCheckedPointerType() || BaseTy->isCheckedArrayType()))
return;

@@ -2319,6 +2319,106 @@ static BinOpInfo createBinOpInfoFromIncDec(const UnaryOperator *E,
return BinOp;
}

static Expr *peelOffPointerArithmetic(const BinaryOperator *B) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I guess this function is getting the pointer type for expressions like (a + i) or (i + a), where 'a' is the pointer type and 'i' is the integer offset. Can we add a comment describing this?

Also, maybe we can rename the function to be more clear on the intent? Something like: getPointerTypeFromPointerOffsetExpression or something similar.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

peelOffPointerArithmetic has been adapted from a similar function in lib/StaticAnalyzer/Core/BugReporterVisitors.cpp. I wanted to keep the same name to make the association clear and also to help while grep'ping for the function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment describing the function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Sunny that the name isn't that meaningful.

return nullptr;
}

static Expr *peelOffOuterExpr(Expr *E) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we rename this function to: skipCasts or something similar, which expresses the intent more clearly.
  2. What about parenthesis? Do we create expressions for them in the ASTs as well? If so, we'd want to skip them as well.
  3. The check for: if (const auto *BO = dyn_cast(E)) can be more targeted, since you only handle additive operator. So, you can use: isAdditiveOp() here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. peelOffOuterExpr has been adapted from a similar function in lib/StaticAnalyzer/Core/BugReporterVisitors.cpp. I wanted to keep the same name to make the association clear and also to help while grep'ping for the function.

  2. Parenthesis are represented in the AST as node levels. For example: (a * b) + c is represented is:

                    +
                  /   \
                 *     c
               /   \
              a     b
  1. isAdditiveOp is already checked inside peelOffPointerArithmetic.

Copy link
Member

@dtarditi dtarditi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code needs to revised to not evaluate pointer-typed subexpressions twice. I think additional tests are needed.

I'm interested in data on the effect of this on code size and on th effectiveness of optimization. That can be done as a follow-up work item.

@@ -2319,6 +2319,106 @@ static BinOpInfo createBinOpInfoFromIncDec(const UnaryOperator *E,
return BinOp;
}

static Expr *peelOffPointerArithmetic(const BinaryOperator *B) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Sunny that the name isn't that meaningful.

@@ -2427,6 +2529,9 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV,

// Next most common: pointer increment.
} else if (const PointerType *ptr = type->getAs<PointerType>()) {
// Insert a dynamic check for arithmetic on null checked pointers.
emitDynamicNonNullCheck(CGF, E->getSubExpr());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller to this function already evaluated getSubExpr() to an lvalue LV. You are re-evaluating E->getSubExpr() here. That's semantically incorrect because the subexpression may have a side-effect. Earlier in the function, code to load the lvalue LV from has already been generated, with the runtime location of the result being stored in the variable value. The runtime check needs to be against value, and not re-evaluate the subexpression.

@@ -0,0 +1,60 @@
// RUN: %clang_cc1 %s -emit-llvm -o - | FileCheck %s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some checks for more complex lvalue-producing subexpressions:

  • array subscripting
  • members
  • subexpressions with side-effects.

@@ -3162,6 +3267,9 @@ static Value *emitPointerArithmetic(CodeGenFunction &CGF,
std::swap(pointerOperand, indexOperand);
}

// Insert a dynamic check for arithmetic on null checked pointers.
emitDynamicNonNullCheck(CGF, pointerOperand);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has similar problems, in that pointerOperand will be re-evaluated.

// Determine whether to emit a dynamic non-null check.
static void emitDynamicNonNullCheck(CodeGenFunction &CGF, Expr *E) {
if (!CGF.CGM.getCodeGenOpts().CheckedCNullPtrChecks)
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would structure this code differently. It's an optimization to not emit a dynamic null check if E can be proved at compile-time to be non-null. So I'd try to prove that E does not need a null check and return if the null check can be skipped. Then I would go into the general case. of adding a check.

Please improve the comments on this method. We should always explain what non-obvious arguments are. For example, in this case E is the pointer-typed expression in a pointer-arithmetic expression. We need to check the value of E is non-null, if E is a checked pointer type.

@@ -2427,6 +2433,9 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV,

// Next most common: pointer increment.
} else if (const PointerType *ptr = type->getAs<PointerType>()) {
// Insert a dynamic check for arithmetic on null checked pointers.
Copy link
Author

@mgrang mgrang Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about adding a compile time check to see if value is indeed null so that the runtime check can be elided. LLVM has a function called isKnownNonZero which is used in a couple of places in Clang but in my case it always returns 0.

Consider this code:

array_ptr<char> s = NULL;
++s;

This is the IR:

 %s = alloca i8*, align 8
 store i8* null, i8** %s, align 8
 %0 = load i8*, i8** %s, align 8
 %_Dynamic_check.non_null = icmp ne i8* %0, null

I guess we need to check what is getting stored in %s. If it is indeed non-null, then we can elide the check. Is this a valid approach?

@mgrang
Copy link
Author

mgrang commented Aug 14, 2019

This patch had merge conflicts with master. While trying to resolve the conflicts, git added other commits to this PR. At this point, I would simply create a new PR with the conflicts fixed. Please see https://github.com/microsoft/checkedc-clang/pull/663/files.

@dtarditi
Copy link
Member

Closing this PR because it is superseded by PR #663.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants