Skip to content

Replace current_expr_value with expression temporaries. #561

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Sep 10, 2018

Conversation

dtarditi
Copy link
Member

@dtarditi dtarditi commented Sep 7, 2018

This change replaces current_expr_value in the Checked C clang IR with expression temporaries. An expression temporary is a temporary variable that holds the result of computing a subexpression of an expression. Use the expression temporaries to compute bounds for string literals and compound array literals. The bounds are used for static and dynamic checking.

The Checked C specification uses current_expr_value in its description of bounds inference. This leads to bounds inference steps having to adjust current_expr_value to offset the effect of an expression on a subexpression's value, when the subexpression's value is used in the bounds of an expression. For example, if the bounds of e1 are bounds(current_expr_value, current_expr_value + 5), then the bounds of e1 + e2 require subtracting the value of e2. The bounds of the parent expression are bounds(current_expr_value - e2, current_expr_value - e2). If e2 has side effects, it is not possible to recompute the value of e2. By using expression temporaries, we avoid these complications.

The clang AST has several existing forms of temporaries: CXXBindTemporaryExpr, MaterializeExpr, and OpaqueExpr. The first 2 are specialized for C++ and the third form is only used for temporaries that are "locally obvious". We don't generalize/refactor the existing classes because we would likely break something or make future merges from clang much more difficult.

Instead we create yet another class for temporaries called CHKCBindTemporaryExpr, modelled after CXXBindTemporaryExpr. CXXBindTemporaryExpr is specialized for inserting destructor calls. The class CHKCBindTemporaryExpr binds a temporary variable We use objects of type CHKCBindTemporaryExpr to represent the temporary. The binding class is matched with a class for using the value of an expression temporary. We use BoundsValueExpr for the use case.

We insert expression temporaries for array literals and compound array literals at the conversion of the array type to a pointer type (array-to-pointer decay, in clang terminology). During bounds inference, we look for the pattern of binding of an expression temporary whose subexpression is a possible-parenthesized literal, and use the temporary to construct the bounds.

Most of the changes here are boiler-plate changes related to adding a new IR node. There are a few interesting places:

  • We tried inserting the expression temporaries at the creation of literals instead of at array-to-pointer decays, but that didn't work well. There are lots of places in the compiler that assume they are operating on exactly a string literal, and they all had to be patched.
  • During code generation, we track the LLVM value object used to represent the result of evaluating the subexpression of a temporary binding. We create a map from the temporary binding to the value object. At uses, we use that information to obtain the value of the subexpression.
  • Temporary expression binding is a form of declaration. During AST TreeTransform.h, we track when a binding has been transformed so that we can transform the use too.
  • We don't expect uses of temporary expressions to appear during AST serialization. These are created by the compiler during bounds inference for expressions, and we don't serialize ASTs with these inferred bounds. If we ever need to do that, we'll to apply the same logic used for declarations to keep bindings/uses in sync.
  • We need to skip expression temporaries in some helper functions on expressions and in a few cases where expression temporaries now appear.

Testing:

  • Add new clang tests cases that check that the expected clang ASTs are synthesized for array literals and string literals, and that the expected LLVM IR is generated as well.
  • Add new runtime tests to the Checked C repo that check bounds checking of subscripting and bounds dereferences of string literals and compound array literals (such as "abcd"[index]`).
  • Existing automated tests pass, including Checked C tests, clang Checked C tests, and LNT testing.

dtarditi added 21 commits August 5, 2018 12:02
This change adds support during bounds declaration checking for
_Current_expr_value.   It fills in the missing bounds for string
and array literals by using _Current_expr_value.  It also adds
code generation support for _Current_expr_value.

Existing Checked C and clang Checked C tests pass.   We need to
add tests of inferred bounds for string/array literals and of
them LLVM IR generated for _Current_expr_value.
There are lots of places in the compiler that do structural matching on
a StringLiteral expression being present at a specific point in the AST.
We're now returning a CHKCBindTemporaryExprClass instead.  This changes
patches up a number of a places to skip the expression temporary.  There
are still about 26 failing Clang regression tests, though.
The insertion of temporary variables for string literals and compound
literals is more selective now.  We only insert them at conversions from
array types to pointers.  That means that we can revert a bunch of changes
that skip unexpected expression temporaries when looking at string literals.

Update the test results for inferred-bounds.c.

All clang tests pass now.
Copy link
Collaborator

@Prabhuk Prabhuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

dtarditi added a commit to checkedc/checkedc that referenced this pull request Sep 10, 2018
Add runtime tests of subscripting/dereferencing string literals and compound literals.  Examples of that are code like `"abcdef"[index]`, `*("abcdef" + index)`.   These are one of the last few cases where we weren't inserting runtime bounds checking required by the Checked C semantics.

This matches the Checked C clang compiler PR checkedc/checkedc-clang#561.

Most of the changes are for adding arguments to test invocations for the new cases.

Fix an error where tests that check dereferencing the result of pointer arithmetic weren't actually checking that.  We needed to add -DPOINTER_ARITHMETIC to the command line.    We have tests that come two variations: checking subscripting, or checking subscripting expanded into a dereference operation and pointer arithmetic.
@dtarditi dtarditi merged commit 80343a5 into master Sep 10, 2018
@dtarditi dtarditi deleted the issue205-part2 branch September 10, 2018 21:06
dtarditi added a commit that referenced this pull request Jan 4, 2019
…597)

This change implements inference of bounds for call expressions for cases where the bounds depend on the result of the call expressions.   This changes adds inference for `count` and `byte_count` return bounds, including cases where a bounds-safe interface adds these as return bounds.  An additional case that is not yet implemented is when the return bounds uses the `_Return_value` expression.

We didn't implement the inference before because we needed a way to represent the result of evaluating an expression.  In PR #561, we extended the clang IR with temporary variables.    A temporary variable can be bound to the result of an expression and then used later within the enclosing top-level expression.   This change documents the temporary variable extension to the IR made in PR #561.

This change adds temporary variables for call expressions where the bounds depends on the results of the call..   It moves inference for bounds for call expressions to a separate function that also takes the temporary variable (if any) that will hold the result of the call.   It fills in cases for count and byte_count.   It extends the lowering from clang IR to LLVM IR to handle the case where a scalar expression result is bound to a temporary variable.

Before this change, we were returning Bounds(any) for call expressions where we could not infer bounds.  This disabled bounds checking for the result of a call (for example, `f(len)[]5]` would not have bounds checking. It also allowed static checking of bounds declarations where the calls was the right-hand side of assignment or initializer to pass.

This change adds equality facts for initializers (T lhs = rhs) and assignment (lhs = rhs) where the right-hand side binds a temporary variable v.  We add the fact that lhs == v. 

With this change, the compiler now issues additional warnings about being unable to prove the validity of bounds.  This happens for some benchmarks in PtrDst and Olden.  The compiler doesn't understand when some pointer arithmetic expressions are equivalent.   I open issue #596 to track this.

I updated the Checked C clang rewriter to handle additional places where temporary variables can appear.

Testing:
- Update existing AST tests for bounds inference for calls to have the inferred bounds.
- Add new tests of checking bounds declarations to the Checked C clang repo.
- Update Checked C repo tests for cases where new warning messages appear.  Tagged them with GitHub issues.
- Passing existing Checked C, Checked C clang, and LLVM test suite suites.
sulekhark pushed a commit that referenced this pull request Jul 8, 2021
…20210430

Merge from Microsoft 2021-04-30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants