-
Notifications
You must be signed in to change notification settings - Fork 79
Add document for lvalue bounds checking generalization #970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
2b65190
Add lvalue generalization doc
10ada58
Fix typos
da10b50
Move description of initial ObservedBounds to the end of the ResetKil…
77394bc
Replace details of checking state updating methods with a brief list …
8de96e1
Rephrase identical lvalue work description
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
158 changes: 158 additions & 0 deletions
158
clang/docs/checkedc/Bounds-Checking-for-LValue-Expressions.md
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
# Bounds Checking: Generalizing from Variables to LValues | ||
|
||
## Problem Summary | ||
|
||
The current bounds checking algorithm checks that the inferred bounds of each | ||
in-scope variable imply the target bounds of the variable after each top-level | ||
CFG statement. This supports bounds checking in multiple assignments. | ||
For example: | ||
|
||
``` | ||
void f(array_ptr<int> small : count(1), | ||
array_ptr<int> medium : count(2), | ||
array_ptr<int> large : count(3)) { | ||
// At the end of medium = small: medium's inferred bounds are count(1). | ||
// These inferred bounds do not imply the target bounds of count(2). | ||
// At the end of medium = large: medium's inferred bounds are count(3). | ||
// These inferred bounds imply the target bounds of count(2). | ||
// The bounds are not checked until the end of the statement, when | ||
// the inferred bounds of count(3) imply the target bounds of count(2). | ||
medium = small, medium = large; | ||
} | ||
``` | ||
|
||
For other kinds of lvalue expressions (`*p`, `a.f`, etc.), bounds checking | ||
occurs immediately after checking an assignment to the expression. The goal | ||
is to generalize the work that was previously done for variables so that the | ||
bounds checking behavior is the same for all lvalue expressions. | ||
|
||
## Bounds Checking for Variables | ||
|
||
The compiler tracks the inferred bounds for each in-scope variable while | ||
traversing expressions. The following data structures and methods are relevant | ||
for tracking, updating, and using the inferred bounds: | ||
|
||
- **Tracking:** the `ObservedBounds` member of the `CheckingState` class. | ||
- **Updating:** the `TraverseCFG`, `GetIncomingBlockState`, | ||
`UpdateCtxWithWidenedBounds`, `GetDeclaredBounds`, `ResetKilledBounds`, | ||
and `UpdateAfterAssignment` methods. | ||
- **Using:** the `ValidateBoundsContext` and `RValueCastBounds` methods. | ||
|
||
## AbstractSet: LValue Expression Equality | ||
|
||
### Requirements for Bounds Checking | ||
|
||
For certain lvalue expressions `e1` and `e2`, updating the inferred bounds | ||
of `e1` should also update the inferred bounds of `e2`. For example: | ||
|
||
``` | ||
struct S { | ||
array_ptr<int> f : count(2); | ||
}; | ||
|
||
void f(struct S *a, array_ptr<int> p : count(3)) { | ||
(*a).f = 0, p = a->f; | ||
} | ||
``` | ||
|
||
At the end of `(*a).f = 0`, the inferred bounds of `(*a).f` are `bounds(any`). | ||
At the end of `p = a->f`, the inferred bounds `p` are the inferred bounds of | ||
`a->f`. The inferred bounds of `a->f` should be the same as the inferred bounds | ||
of `(*a).f`, since `(*a).f` and `a->f` are identical lvalue expressions. | ||
|
||
In general, for lvalue expressions `e1` and `e2`, updating the inferred bounds | ||
of `e1` should also update the bounds of `e2` if and only if `e1` and `e2` | ||
are guaranteed to be **identical lvalue expressions**. That is, if: | ||
|
||
1. `e1` and `e2` point to the same location in memory, and: | ||
2. `e1` and `e2` have the same range in memory. | ||
|
||
For the initial planned work for lvalue generalization, we will only generalize | ||
across lvalue expressions that we are able to determine as identical based on | ||
the definition stated above. Further, while determining identical lvalue | ||
expressions, we will initially ignore aliasing concerns and ignore lvalue | ||
expressions that do not fully overlap in memory. We may consider aliasing | ||
issues in future work. | ||
|
||
### ObservedBounds Keys | ||
|
||
Currently, the `ObservedBounds` map uses `VarDecl *` as its keys. This ensures | ||
that updating the inferred bounds for a `DeclRefExpr *` `x` updates the | ||
inferred bounds for all other `DeclRefExpr *` `y` where `x` and `y` have the | ||
same `VarDecl *`. However, `VarDecl *` will not work as a key for general | ||
lvalue expressions. | ||
|
||
If the `ObservedBounds` map uses `Expr *` as its keys, then updating the | ||
inferred bounds of `(*a).f` will not update the inferred bounds of `a->f` | ||
since `(*a).f` and `a->f` are distinct expressions in the Clang AST. | ||
Therefore, a different type is needed for the `ObservedBounds` keys. | ||
|
||
To support the new keys for `ObservedBounds`, we introduce an AbstractSet API. | ||
Given an lvalue expression `e`, the API should return a representation of the | ||
set containing all lvalue expressions that are identical to `e`. This | ||
representation will be the key in `ObservedBounds` that maps to the inferred | ||
bounds of `e` (as well as all other lvalue expressions that are identical to | ||
`e`). | ||
|
||
The AbstractSet representation of an lvalue expression `e` may use a canonical | ||
form of `e`. For example, the canonical form of `(*a).f` may be `a->f`. This | ||
canonicalization may use the | ||
[PreorderAST](https://github.com/microsoft/checkedc-clang/blob/master/clang/include/clang/AST/PreorderAST.h). | ||
|
||
## Work Item Overview | ||
|
||
1. Define the representation that the AbstractSet API returns. What will be | ||
the type of the representation? What are some examples of the representation | ||
for different kinds of expressions (`DeclRefExpr *`, `MemberExpr *`, etc.)? | ||
(As described above, the representation may use the `PreorderAST` for | ||
canonicalization. We may also consider using the comparison in | ||
[CanonBounds.cpp](https://github.com/microsoft/checkedc-clang/blob/master/clang/lib/AST/CanonBounds.cpp)). | ||
2. Implement the AbstractSet API for `DeclRefExpr *`. This is necessary to | ||
maintain the current behavior for checking variable bounds. | ||
3. Replace the current `VarDecl *` keys in `ObservedBounds` with the | ||
AbstractSet representation. This should result in no changes in compiler | ||
behavior (since only `DeclRefExpr *` have AbstractSet representations). | ||
4. Implement AbstractSet representations for other kinds of lvalue | ||
expressions (see below for the suggested prioritized expression kinds). | ||
This should result in the compiler behavior for the implemented lvalue | ||
expressions to be identical to the current behavior for variables. | ||
5. For each lvalue expression kind that has an AbstractSet representation, | ||
remove the current bounds checking behavior that deals with lvalue | ||
expressions that are not a `DeclRefExpr *`. For example, the | ||
`CheckBinaryOperator` method currently performs bounds checking for all | ||
assignments where the left-hand side is not a `DeclRefExpr *`. | ||
6. Track equality for lvalue expressions. Certain types of lvalue expressions, | ||
e.g. expressions such as `*p` and `a->f` which read memory via a pointer, | ||
are currently not allowed in `EquivExprs`. However, bounds checking relies | ||
on equality information. For example, in an assignment `a->f = e`, where `e` | ||
is an expression with inferred bounds of `(e, e + i)`, the compiler needs | ||
to know that `a->f` and `e` are equal. | ||
|
||
## LValue Expression Priorities | ||
|
||
The lvalue generalization work will be done by incrementally adding AbstractSet | ||
representations for each kind of lvalue expression. The bounds checker checks | ||
the following kinds of lvalue expressions (in the `CheckLValue` method): | ||
|
||
- `DeclRefExpr *`. Examples: `x`, `y`, `myvariable`. | ||
- `UnaryOperator *`. Examples: `*p`, `*(p + 1)`. | ||
- `ArraySubscriptExpr *`. Examples: `arr[0]`, `1[arr]`. | ||
- `MemberExpr *`. Examples: `a.f`, `a->f`. | ||
- `ImplicitCastExpr *`. Examples: `LValueBitCast(e)`. | ||
- `CHKCBindTemporaryExpr *`. Examples: `TempBinding({ 0 })`. | ||
|
||
The following priorities are proposed for AbstractSet implementations for each | ||
kind of lvalue expression. Priority 0 and Priority 1 expressions should be | ||
implemented before the April release of the Checked C compiler. | ||
|
||
- Priority 0: `DeclRefExpr *`. This is necessary to maintain the current | ||
bounds checking behavior for variables. It will also serve to test the | ||
implementation of the AbstractSet API since implementing it for | ||
`DeclRefExpr *` should not result in any compiler behavior changes. | ||
- Priority 1: `MemberExpr *`. This expression kind is more likely to be | ||
involved in bounds checking, since a `MemberExpr *` is more likely to have | ||
target bounds that are not `bounds(unknown)`. | ||
- Priority 2: `UnaryOperator *` and `ArraySubscriptExpr *`. These are more | ||
likely than `MemberExpr *` to have unknown target bounds. | ||
- Priority 3: `ImplicitCastExpr *` and `CHKCBindTemporaryExpr *`. These kinds | ||
are less common in example code. |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question: Are bounds on global/static/extern variables declared in the compilation unit, being checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are checked in
TraverseTopLevelVarDecl
. This method itself doesn't do anything to theCheckingState
members, includingObservedBounds
andEquivExprs
- it just initializes an emptyCheckingState
and checks the bounds immediately after traversing itsVarDecl *
argument.How much detail on bounds checking do you think would be helpful in this document? Initially I didn't include
TraverseTopLevelVarDecl
here since it doesn't directly interact withObservedBounds
(orEquivExprs
). I have another document in progress in #945 that describes bounds checking in general. I would propose that I move most of the detail in this "Bounds Checking for Variables" section into that document so that this document can focus more on lvalue generalization. The "Bounds Checking for Variables" section in this document can then just briefly list the places whereObservedBounds
is updated or used since this is relevant to changing theObservedBounds
keys, and reference the other bounds checking document for more detail. What are your thoughts on this?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that most of the detail on bounds checking can be in the other document (#945) and this document can focus on lvalue generalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I've moved most of this section into the #945 document and just included a list of methods where it will be necessary to change the keys of
ObservedBounds
in this section