Description
This issue was copied from checkedc/checkedc-clang#561
This change replaces `current_expr_value` in the Checked C clang IR with expression temporaries. An expression temporary is a temporary variable that holds the result of computing a subexpression of an expression. We use expression temporaries to compute bounds for string literals and compound array literals. These bounds are used for static and dynamic checking.
The Checked C specification uses `current_expr_value` in its description of bounds inference. As a result, bounds inference steps have to adjust `current_expr_value` to offset the effect of an expression on a subexpression's value whenever the subexpression's value is used in the bounds of an expression. For example, if the bounds of `e1` are `bounds(current_expr_value, current_expr_value + 5)`, then the bounds of `e1 + e2` require subtracting the value of `e2`. The bounds of the parent expression are `bounds(current_expr_value - e2, current_expr_value - e2 + 5)`. If `e2` has side effects, it is not possible to recompute the value of `e2`. By using expression temporaries, we avoid these complications.
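The arithmetic above can be illustrated outside the compiler. The following is a minimal sketch in plain C++ (not Checked C and not compiler code). The names `Bounds`, `next_offset`, `add_with_temporary`, and `in_bounds` are hypothetical and exist only for this illustration. It shows that once the value of `e1` is captured in a temporary, the bounds of `e1 + e2` can be stated without evaluating a side-effecting `e2` a second time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a bounds(lower, upper) expression.
struct Bounds {
  const char *lower;
  const char *upper;
};

// Stand-in for a subexpression e2 with a side effect: every call returns a
// different value, so it cannot safely be recomputed.
static int call_count = 0;
int next_offset() { return ++call_count; }

// Computes p = e1 + e2 and reports the bounds of p through `out`.
// `tmp` plays the role of an expression temporary holding the value of e1,
// so the bounds are (tmp, tmp + 5) and e2 is evaluated exactly once.
const char *add_with_temporary(const char *e1, Bounds &out) {
  const char *tmp = e1;                  // bind the temporary
  const char *p = tmp + next_offset();   // e1 + e2, with e2 evaluated once
  out = Bounds{tmp, tmp + 5};            // bounds expressed via the temporary
  return p;
}

// The current_expr_value formulation would instead need
// (p - e2, p - e2 + 5), which means evaluating next_offset() again.
bool in_bounds(const char *p, const Bounds &b) {
  return b.lower <= p && p < b.upper;
}
```

Calling `add_with_temporary` on a 5-element buffer yields a pointer that can be checked with `in_bounds` while `next_offset` has run only once.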
The clang AST has several existing forms of temporaries: `CXXBindTemporaryExpr`, `MaterializeTemporaryExpr`, and `OpaqueValueExpr`. The first two are specialized for C++, and the third is only used for temporaries that are "locally obvious". We don't generalize or refactor the existing classes because we would likely break something or make future merges from clang much more difficult.
Instead, we create yet another class for temporaries called `CHKCBindTemporaryExpr`, modelled after `CXXBindTemporaryExpr`. `CXXBindTemporaryExpr` is specialized for inserting destructor calls. The class `CHKCBindTemporaryExpr` binds a temporary variable to the result of its subexpression. We use objects of type `CHKCBindTemporaryExpr` to represent the temporary. The binding class is paired with a class for using the value of an expression temporary: we use `BoundsValueExpr` for uses.
We insert expression temporaries for string literals and compound array literals at the conversion of the array type to a pointer type (array-to-pointer decay, in clang terminology). During bounds inference, we look for a binding of an expression temporary whose subexpression is a possibly parenthesized literal, and use the temporary to construct the bounds.
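The pattern match described above can be sketched with a toy AST. This is only an illustration: the types and functions below (`Expr`, `InferredBounds`, `ignore_parens`, `infer_literal_bounds`, and the helper constructors) are hypothetical and are not the clang classes. The element count used for the upper bound is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Toy AST node. Hypothetical; not a clang class.
struct Expr {
  enum Kind { StringLiteral, Paren, BindTemporary } kind;
  std::string text;            // used by StringLiteral
  std::unique_ptr<Expr> sub;   // used by Paren and BindTemporary
};

std::unique_ptr<Expr> make_literal(std::string s) {
  auto e = std::make_unique<Expr>();
  e->kind = Expr::StringLiteral;
  e->text = std::move(s);
  return e;
}

std::unique_ptr<Expr> wrap(Expr::Kind k, std::unique_ptr<Expr> sub) {
  auto e = std::make_unique<Expr>();
  e->kind = k;
  e->sub = std::move(sub);
  return e;
}

// Bounds of the form (use(temp), use(temp) + count).
struct InferredBounds {
  const Expr *temp;    // the binding node whose value the bounds refer to
  std::size_t count;   // number of elements covered by the bounds
};

// Skips any number of enclosing parentheses.
const Expr *ignore_parens(const Expr *e) {
  while (e && e->kind == Expr::Paren) e = e->sub.get();
  return e;
}

// Matches a temporary binding whose subexpression is a possibly
// parenthesized string literal. On a match, fills `out` and returns true.
bool infer_literal_bounds(const Expr *e, InferredBounds &out) {
  if (!e || e->kind != Expr::BindTemporary) return false;
  const Expr *inner = ignore_parens(e->sub.get());
  if (!inner || inner->kind != Expr::StringLiteral) return false;
  // Assumption of this sketch: the count includes the null terminator.
  out = InferredBounds{e, inner->text.size() + 1};
  return true;
}
```

A binding around `("abcd")` matches and produces bounds that refer to the binding node, while a bare literal with no binding does not match.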
Most of the changes here are boilerplate changes related to adding a new IR node. There are a few interesting places:
- We tried inserting the expression temporaries at the creation of literals instead of at array-to-pointer decays, but that didn't work well. There are lots of places in the compiler that assume they are operating on exactly a string literal, and they all had to be patched.
- During code generation, we track the LLVM value object used to represent the result of evaluating the subexpression of a temporary binding. We create a map from the temporary binding to the value object. At uses, we use that information to obtain the value of the subexpression.
- Temporary expression binding is a form of declaration. During AST tree transformation (TreeTransform.h), we track when a binding has been transformed so that we can transform the use too.
- We don't expect uses of temporary expressions to appear during AST serialization. These are created by the compiler during bounds inference for expressions, and we don't serialize ASTs with these inferred bounds. If we ever need to do that, we'll need to apply the same logic used for declarations to keep bindings and uses in sync.
- We need to skip expression temporaries in some helper functions on expressions and in a few cases where expression temporaries now appear.
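The code generation bookkeeping in the list above (a map from a temporary binding to the value produced for its subexpression) might look roughly like the sketch below. It uses plain integers in place of LLVM value objects, and `TempBinding` and `ToyCodeGen` are hypothetical names; nothing here is clang or LLVM API.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Toy stand-in for a temporary binding node. Hypothetical.
struct TempBinding {
  std::function<int()> subexpr;  // evaluating it may have side effects
};

// Toy stand-in for the code generator state. Hypothetical.
class ToyCodeGen {
  // Maps each temporary binding to the value produced for its subexpression.
  std::unordered_map<const TempBinding *, int> temp_values_;

public:
  // Emits the binding: evaluates the subexpression once and records the result.
  int emit_binding(const TempBinding &b) {
    int v = b.subexpr();
    temp_values_[&b] = v;
    return v;
  }

  // Emits a use: looks up the recorded value instead of re-evaluating.
  int emit_use(const TempBinding &b) const {
    auto it = temp_values_.find(&b);
    assert(it != temp_values_.end() && "use emitted before its binding");
    return it->second;
  }
};
```

Keying the map on the binding node means any number of uses can share one evaluation, which is the property that matters when the subexpression has side effects.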
Testing:
- Add new clang test cases that check that the expected clang ASTs are synthesized for compound array literals and string literals, and that the expected LLVM IR is generated as well.
- Add new runtime tests to the Checked C repo that check bounds checking of subscripting and dereferences of string literals and compound array literals (such as `"abcd"[index]`).
- Existing automated tests pass, including Checked C tests, clang Checked C tests, and LNT testing.