-
Notifications
You must be signed in to change notification settings - Fork 79
Add a dynamic check for pointer arithmetic of null pointers #647
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is version 1 of the patch and does not yet handle all the cases. Please review this and give your suggestions. I will add tests cases when this patch finalizes. Driving example:
|
lib/CodeGen/CGExprScalar.cpp
Outdated
if (type->isCheckedPointerType() || type->isCheckedArrayType()) { | ||
LValueBaseInfo BaseInfo; | ||
TBAAAccessInfo TBAAInfo; | ||
Address Addr = CGF.EmitPointerWithAlignment(E->getSubExpr(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will result in E's subexpression being evaluated twice. This would be a problem for something where the subexpression might have a side-effect, such as (a[f()])++
. I think you need to push the insertion of the check into the cases below. The only interesting cases are those where a pointer type can occur.
@@ -2427,6 +2427,15 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV, | |||
|
|||
// Next most common: pointer increment. | |||
} else if (const PointerType *ptr = type->getAs<PointerType>()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the spec, array_ptrs of function pointers are not allowed. So I guess we do not need a check in the case for function pointers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - we do not need a check for the case of function pointers.
This patch currently handles the case of
|
Should we guard the insertion of non-null checks via a clang flag? The user can pass a flag like |
Yes, I think a clang flag controlling isertion of these checks is a good idea. We'll need it so that we can examine the performance impact of adding the checks. |
Added two new flags to control whether runtime checks should be added or not: -fcheckedc-runtime-checks -fno-checkedc-runtime-checks Also added runtime time checks for pointer arithmetic using binary operations like: ptr += 1 ptr = ptr - 1;
Added in latest patch set. |
lib/CodeGen/CGDynamicCheck.cpp
Outdated
@@ -54,6 +54,9 @@ void CodeGenFunction::EmitDynamicNonNullCheck(const Address BaseAddr, const Qual | |||
if (!getLangOpts().CheckedC) | |||
return; | |||
|
|||
if (!CGM.getCodeGenOpts().CheckedCRuntimeChecks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime checks for non-null will only be added if the flag -fcheckedc-runtime-checks
is present. Since I have added this check here, it means this would affect all non-null checks (and not just those for pointer arithmetic). This is resulting in 3 unit test failures which we need to fix (by adding this flag to those tests). Is this behavior acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be on by default, and something that we can disable.
Nice! |
I think you should rename the flag to indicate that it is specific to null checks. With the current name, someone might think we disable bounds checks too. |
@@ -364,6 +364,9 @@ CODEGENOPT(BranchTargetEnforcement, 1, 0) | |||
/// Whether to emit unused static constants. | |||
CODEGENOPT(KeepStaticConsts, 1, 0) | |||
|
|||
/// Whether to add null ptr checks for checkedc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want the nullptr checks to be emitted by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we want to emit them by default. Please see @dtarditi 's comment above.
include/clang/Driver/Options.td
Outdated
|
||
def fcheckedc_null_ptr_checks : Flag<["-"], "fcheckedc-null-ptr-checks">, | ||
Group<f_Group>, Flags<[CC1Option]>, | ||
HelpText<"Enable runtime null ptr checks">; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "Enable runtime nullptr checks". Similarly, at other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I wanted to distinguish general "null pointers" from the C++ "nulllptr" keyword. That's the reason for the space in between. I guess I can spell it out as "null pointer".
This patch is causing 3 run time unit tests to assert. I am working on fixing it. |
lib/CodeGen/CGExprScalar.cpp
Outdated
} | ||
|
||
static Expr *peelOffOuterExpr(Expr *E) { | ||
if (auto *ICE = dyn_cast<ImplicitCastExpr>(E)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we only handle these two types of exprs because I do not have a driving example for any other types of exprs.
lib/CodeGen/CGExprScalar.cpp
Outdated
CGF.EmitDynamicNonNullCheck(Addr, Ty); | ||
return; | ||
|
||
} else if (RV.isComplex()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also do not have any driving examples for complex and aggregate rvalues but I have added code here just in case. As an alternative, we could simply assert here and only handle scalar rvalues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A suggestion: If you don't have motivating examples but have added the code for completeness, you can refer in comments to other parts of the codebase where such cases are being handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no longer present in the latest patch.
- Added a missing comment for _Nt_Array_ptr. - Fixed an incomplete sentence for _Array_ptr.
I ran LNT testsuite locally and do not see any issues due to this patch. Although there are 10 tests failing with errors like |
lib/CodeGen/CGExprScalar.cpp
Outdated
if (B->isAdditiveOp() && B->getType()->isPointerType()) { | ||
if (B->getLHS()->getType()->isPointerType()) { | ||
return B->getLHS(); | ||
} else if (B->getRHS()->getType()->isPointerType()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Coding style. In some of the other places like CGDynamicCheck.cpp, I'm seeing that for single statements under an if-else, we aren't using braces. Should we drop the braces from the single statements under if-else here?
For example:
if (!(BaseTy->isCheckedPointerType() || BaseTy->isCheckedArrayType()))
return;
lib/CodeGen/CGExprScalar.cpp
Outdated
@@ -2319,6 +2319,106 @@ static BinOpInfo createBinOpInfoFromIncDec(const UnaryOperator *E, | |||
return BinOp; | |||
} | |||
|
|||
static Expr *peelOffPointerArithmetic(const BinaryOperator *B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I guess this function is getting the pointer type for expressions like (a + i) or (i + a), where 'a' is the pointer type and 'i' is the integer offset. Can we add a comment describing this?
Also, maybe we can rename the function to be more clear on the intent? Something like: getPointerTypeFromPointerOffsetExpression or something similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
peelOffPointerArithmetic
has been adapted from a similar function in lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
. I wanted to keep the same name to make the association clear and also to help while grep'ping for the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment describing the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Sunny that the name isn't that meaningful.
lib/CodeGen/CGExprScalar.cpp
Outdated
return nullptr; | ||
} | ||
|
||
static Expr *peelOffOuterExpr(Expr *E) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Can we rename this function to: skipCasts or something similar, which expresses the intent more clearly.
- What about parenthesis? Do we create expressions for them in the ASTs as well? If so, we'd want to skip them as well.
- The check for: if (const auto *BO = dyn_cast(E)) can be more targeted, since you only handle additive operator. So, you can use: isAdditiveOp() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
peelOffOuterExpr
has been adapted from a similar function inlib/StaticAnalyzer/Core/BugReporterVisitors.cpp
. I wanted to keep the same name to make the association clear and also to help while grep'ping for the function. -
Parenthesis are represented in the AST as node levels. For example:
(a * b) + c
is represented is:
+
/ \
* c
/ \
a b
isAdditiveOp
is already checked insidepeelOffPointerArithmetic
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code needs to revised to not evaluate pointer-typed subexpressions twice. I think additional tests are needed.
I'm interested in data on the effect of this on code size and on th effectiveness of optimization. That can be done as a follow-up work item.
lib/CodeGen/CGExprScalar.cpp
Outdated
@@ -2319,6 +2319,106 @@ static BinOpInfo createBinOpInfoFromIncDec(const UnaryOperator *E, | |||
return BinOp; | |||
} | |||
|
|||
static Expr *peelOffPointerArithmetic(const BinaryOperator *B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Sunny that the name isn't that meaningful.
lib/CodeGen/CGExprScalar.cpp
Outdated
@@ -2427,6 +2529,9 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV, | |||
|
|||
// Next most common: pointer increment. | |||
} else if (const PointerType *ptr = type->getAs<PointerType>()) { | |||
// Insert a dynamic check for arithmetic on null checked pointers. | |||
emitDynamicNonNullCheck(CGF, E->getSubExpr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller to this function already evaluated getSubExpr() to an lvalue LV. You are re-evaluating E->getSubExpr() here. That's semantically incorrect because the subexpression may have a side-effect. Earlier in the function, code to load the lvalue LV from has already been generated, with the runtime location of the result being stored in the variable value
. The runtime check needs to be against value
, and not re-evaluate the subexpression.
@@ -0,0 +1,60 @@ | |||
// RUN: %clang_cc1 %s -emit-llvm -o - | FileCheck %s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some checks for more complex lvalue-producing subexpressions:
- array subscripting
- members
- subexpressions with side-effects.
lib/CodeGen/CGExprScalar.cpp
Outdated
@@ -3162,6 +3267,9 @@ static Value *emitPointerArithmetic(CodeGenFunction &CGF, | |||
std::swap(pointerOperand, indexOperand); | |||
} | |||
|
|||
// Insert a dynamic check for arithmetic on null checked pointers. | |||
emitDynamicNonNullCheck(CGF, pointerOperand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has similar problems, in that pointerOperand will be re-evaluated.
lib/CodeGen/CGExprScalar.cpp
Outdated
// Determine whether to emit a dynamic non-null check. | ||
static void emitDynamicNonNullCheck(CodeGenFunction &CGF, Expr *E) { | ||
if (!CGF.CGM.getCodeGenOpts().CheckedCNullPtrChecks) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would structure this code differently. It's an optimization to not emit a dynamic null check if E can be proved at compile-time to be non-null. So I'd try to prove that E does not need a null check and return if the null check can be skipped. Then I would go into the general case. of adding a check.
Please improve the comments on this method. We should always explain what non-obvious arguments are. For example, in this case E is the pointer-typed expression in a pointer-arithmetic expression. We need to check the value of E is non-null, if E is a checked pointer type.
@@ -2427,6 +2433,9 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV, | |||
|
|||
// Next most common: pointer increment. | |||
} else if (const PointerType *ptr = type->getAs<PointerType>()) { | |||
// Insert a dynamic check for arithmetic on null checked pointers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about adding a compile time check to see if value
is indeed null so that the runtime check can be elided. LLVM has a function called isKnownNonZero
which is used in a couple of places in Clang but in my case it always returns 0.
Consider this code:
array_ptr<char> s = NULL;
++s;
This is the IR:
%s = alloca i8*, align 8
store i8* null, i8** %s, align 8
%0 = load i8*, i8** %s, align 8
%_Dynamic_check.non_null = icmp ne i8* %0, null
I guess we need to check what is getting stored in %s
. If it is indeed non-null, then we can elide the check. Is this a valid approach?
Added two new flags to control whether runtime checks should be added or not: -fcheckedc-runtime-checks -fno-checkedc-runtime-checks Also added runtime time checks for pointer arithmetic using binary operations like: ptr += 1 ptr = ptr - 1;
This patch had merge conflicts with master. While trying to resolve the conflicts, git added other commits to this PR. At this point, I would simply create a new PR with the conflicts fixed. Please see https://github.com/microsoft/checkedc-clang/pull/663/files. |
Closing this PR because it is superseded by PR #663. |
See issue #237