Commit 4c4eed3

Merge branch 'master' into BigRefactor
2 parents eea4eaf + 3c571f4 commit 4c4eed3

35 files changed: +1519 -442 lines changed

clang/docs/checkedc/Bounds-Widening-for-Null-Terminated-Arrays.md

Lines changed: 37 additions & 36 deletions
Original file line number | Diff line number | Diff line change
@@ -31,19 +31,20 @@ in a basic block is taken into consideration when performing the analysis.
3131
4. **Intra-procedural:** The analysis is done on one function at a time.
3232

3333
## Dataflow Analysis Details
34-
For every basic block we compute the following sets: `In` and `Kill`. The `In`
34+
For every basic block, we compute the following sets: `In` and `Kill`. The `In`
3535
set for basic block `B` is denoted as `In[B]` and the `Kill` set is denoted as
3636
`Kill[B]`.
3737

38-
For every edge we compute the following sets: `Out` and `Gen`. The `Out` set on
39-
edge `Bi->Bj` is denoted as `Out[Bi][Bj]` and the Gen set is denoted as
38+
For every edge, we compute the following sets: `Out` and `Gen`. The `Out` set on
39+
edge `Bi->Bj` is denoted as `Out[Bi][Bj]` and the `Gen` set is denoted as
4040
`Gen[Bi][Bj]`.
4141

4242
### In[B]
4343
`In[B]` stores the mapping between an `_Nt_array_ptr` and its widened bounds
4444
inside block `B`. For example, given `_Nt_array_ptr V` with declared bounds
45-
`(low, high)`, `In[B]` would store the mapping `{V:i}`, where `i` is an unsigned
46-
integer and the bounds of `V` should be widened to `(low, high + i)`.
45+
`(V + low, V + high)`, `In[B]` would store the mapping `{V:i}`, where `i` is an
46+
unsigned integer implying that the bounds of `V` should be widened to
47+
`(V + low, V + high + i)`.
4748

4849
Dataflow equation:
4950
`In[B] = ∩ Out[B*][B], where B* ∈ pred(B)`.
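
The `In[B]` equation can be sketched as follows. This is only an illustration, not the code in `BoundsAnalysis`: `BoundsMap`, `Intersect` and `ComputeIn` are hypothetical names, pointers are identified by plain strings, and the intersection is assumed to keep the smaller offset when a pointer appears on both sides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a widened-bounds set: ntptr name -> offset i.
using BoundsMap = std::map<std::string, unsigned>;

// Keep only the pointers present in both maps, with the smaller offset.
BoundsMap Intersect(const BoundsMap &A, const BoundsMap &B) {
  BoundsMap Result;
  for (const auto &Entry : A) {
    auto It = B.find(Entry.first);
    if (It != B.end())
      Result[Entry.first] = std::min(Entry.second, It->second);
  }
  return Result;
}

// In[B] = intersection of Out[B*][B] over all predecessors B* of B.
// PredOuts holds one Out map per incoming edge.
BoundsMap ComputeIn(const std::vector<BoundsMap> &PredOuts) {
  if (PredOuts.empty())
    return {};
  BoundsMap In = PredOuts.front();
  for (std::size_t I = 1; I < PredOuts.size(); ++I)
    In = Intersect(In, PredOuts[I]);
  return In;
}
```

A pointer that is missing from the `Out` set of any one incoming edge therefore drops out of `In[B]`.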
@@ -57,43 +58,43 @@ Thus, `Kill[B]` stores the mapping between a statement `S` and `_Nt_array_ptr's`
5758
whose bounds are killed in `S`.
5859

5960
### Gen[Bi][Bj]
60-
Given `_Nt_array_ptr V` with declared bounds `(low, high)`, the bounds of `V`
61-
can be widened by 1 if `V` is dereferenced at the upper bound. This means that
62-
if there is an edge `Bi->Bj` whose edge condition is of the form `if (*(V +
63-
high + i))`, where `i` is an unsigned integer offset, the widened bounds
64-
`{V:i+1}` can be added to `Gen[Bi][Bj]`, provided we have already tested for
65-
pointer access of the form `if (*(V + high + i - 1))`.
61+
Given `_Nt_array_ptr V` with declared bounds `(V + low, V + high)`, the bounds
62+
of `V` can be widened by 1 if `V` is dereferenced at its current upper bound.
63+
This means that if there is an edge `Bi->Bj` whose edge condition is of the
64+
form `if (*(V + high + i))`, where `i` is an unsigned integer offset, the
65+
widened bounds `{V:i+1}` can be added to `Gen[Bi][Bj]`, provided we have
66+
already tested for pointer access of the form `if (*(V + high + (i - 1)))`.
6667

6768
For example:
6869
```
69-
_Nt_array_ptr<T> V : bounds (low, high);
70+
_Nt_array_ptr<T> V : bounds (V + low, V + high);
7071
if (*V) { // Ptr dereference is NOT at the current upper bound. No bounds widening.
71-
if (*(V + high)) { // Ptr dereference is at the current upper bound. Widen bounds by 1. New bounds for V are (low, high + 1).
72-
if (*(V + high + 1)) { // Ptr dereference is at the current upper bound. Widen bounds by 1. New bounds for V are (low, high + 2).
73-
if (*(V + high + 3)) { // Ptr dereference is *not* at the current upper bound. No bounds widening. Flag an error!.
72+
if (*(V + high)) { // Ptr dereference is at the current upper bound. Widen bounds by 1. New bounds for V are (V + low, V + high + 1).
73+
if (*(V + high + 1)) { // Ptr dereference is at the current upper bound. Widen bounds by 1. New bounds for V are (V + low, V + high + 2).
74+
if (*(V + high + 3)) { // Ptr dereference is NOT at the current upper bound. No bounds widening. Flag an error!
7475
```
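
The widening rule in the example above can be modelled with a single counter. The sketch below is an assumption-laden simplification: `TryWiden` is a hypothetical helper, `DerefOffset` stands for `i` in `*(V + high + i)`, and `WidenedBy` is how far the upper bound has already been widened past `V + high`.

```cpp
#include <cassert>

// Widen by 1 only when the dereference is exactly at the current upper bound,
// i.e. when the tested offset equals the amount already widened.
// Returns true and increments WidenedBy on success; otherwise leaves it alone.
bool TryWiden(unsigned DerefOffset, unsigned &WidenedBy) {
  if (DerefOffset != WidenedBy)
    return false; // Not at the current upper bound: no widening.
  ++WidenedBy;    // Bounds become (V + low, V + high + WidenedBy).
  return true;
}
```

Starting from `WidenedBy == 0`, testing offsets 0 and then 1 widens the bounds twice, while testing offset 3 next is rejected because offset 2 was never checked.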
7576

7677
### Out[Bi][Bj]
7778
`Out[Bi][Bj]` denotes the bounds widened by block `Bi` on edge `Bi->Bj`.
7879

7980
Dataflow equation:
80-
`Out[Bi][Bj] = (In[Bi] - Kill[Bi]) ∪ Gen[Bi][Bj]`
81+
`Out[Bi][Bj] = (In[Bi] - Kill[Bi]) ∪ Gen[Bi][Bj], where Bj ∈ succ(Bi)`.
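
A minimal sketch of the `Out` equation, under stated assumptions: `BoundsMap` and `ComputeOut` are hypothetical names, `Kill[Bi]` is flattened to a set of pointer names (the document keys it by statement), and an entry in `Gen[Bi][Bj]` is assumed to replace any entry for the same pointer that survives from `In[Bi]`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for a widened-bounds set: ntptr name -> offset i.
using BoundsMap = std::map<std::string, unsigned>;

// Out[Bi][Bj] = (In[Bi] - Kill[Bi]) U Gen[Bi][Bj]
BoundsMap ComputeOut(const BoundsMap &In, const std::set<std::string> &Kill,
                     const BoundsMap &Gen) {
  BoundsMap Out;
  // In[Bi] - Kill[Bi]: drop every pointer whose bounds are killed in Bi.
  for (const auto &Entry : In)
    if (!Kill.count(Entry.first))
      Out.insert(Entry);
  // Union with Gen[Bi][Bj]: assumed here to overwrite surviving entries.
  for (const auto &Entry : Gen)
    Out[Entry.first] = Entry.second;
  return Out;
}
```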
8182

8283
### Initial values of In and Out sets
8384

8485
To compute `In[B]`, we compute the intersection of `Out[B*][B]`, where `B*` are
8586
all preds of block `B`. When there is a back edge from block `B'` to `B` (for
86-
example in the case of loops), the Out set for block `B'` will be empty. As a
87+
example in the case of loops), the `Out` set for block `B'` will be empty. As a
8788
result, the intersection operation would always result in an empty set `In[B]`.
8889

89-
So to handle this, we initialize the In and Out sets for all blocks to `Top`.
90-
`Top` represents the union of the Gen sets of all edges. We have chosen the
91-
offsets of ptr variables in `Top` to be the max unsigned int. The reason behind
92-
this is that in order to compute the actual In sets for blocks we are going to
93-
intersect the Out sets on all the incoming edges of the block. And in that case
94-
we would always pick the ptr with the smaller offset. Choosing max unsigned int
95-
also makes handling `Top` much easier as we do not need to explicitly store edge
96-
info.
90+
So to handle this, we initialize the `In` and `Out` sets for all blocks to
91+
`Top`. `Top` represents the union of the `Gen` sets of all edges. We have
92+
chosen the offsets of ptr variables in `Top` to be `UINT_MAX`. The reason
93+
behind this is that in order to compute the actual `In` sets for blocks we are
94+
going to intersect the `Out` sets on all the incoming edges of the block. And
95+
in that case we would always pick the ptr with the smaller offset. Choosing
96+
`UINT_MAX` also makes handling `Top` much easier as we do not need to
97+
explicitly store edge info.
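
To see why `UINT_MAX` works as the offset in `Top`, consider the sketch below. The names `BoundsMap`, `MakeTop` and `Intersect` are hypothetical, and the intersection is assumed to keep the smaller offset, as the paragraph above describes.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a widened-bounds set: ntptr name -> offset i.
using BoundsMap = std::map<std::string, unsigned>;

// Top: every pointer that appears in any Gen set, mapped to UINT_MAX.
BoundsMap MakeTop(const std::vector<BoundsMap> &GenSets) {
  BoundsMap Top;
  for (const auto &Gen : GenSets)
    for (const auto &Entry : Gen)
      Top[Entry.first] = UINT_MAX;
  return Top;
}

// Keep only the pointers present in both maps, with the smaller offset.
BoundsMap Intersect(const BoundsMap &A, const BoundsMap &B) {
  BoundsMap Result;
  for (const auto &Entry : A) {
    auto It = B.find(Entry.first);
    if (It != B.end())
      Result[Entry.first] = std::min(Entry.second, It->second);
  }
  return Result;
}
```

Because every real offset is at most `UINT_MAX`, intersecting `Top` with a computed `Out` set returns that `Out` set unchanged, so a back edge that is still at `Top` does not erase the bounds arriving on the other edges. Intersecting with an empty set still yields an empty set.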
9798

9899
Thus, we have the following two equations for `Top`:
99100
```
@@ -107,16 +108,16 @@ In[B] = Top
107108
Out[Bi][Bj] = Top, where Bj ∈ succ(Bi)
108109
```
109110

110-
Now, we also need to handle the case where there is an unconditional jump into a
111-
block (for example, as a result of a `goto`). In this case, we cannot widen the
112-
bounds because we would not have tested the ptr dereference on the
113-
unconditional edge. So in this case we want the intersection (and hence the In
114-
set) to result in an empty set.
111+
Now, we also need to handle the case where there is an unconditional jump into
112+
a block (for example, as a result of a `goto`). In this case, we cannot widen
113+
the bounds because we would not have tested the ptr dereference on the
114+
unconditional edge. So in this case we want the intersection (and hence the
115+
`In` set) to result in an empty set.
115116

116-
So we initialize the In and Out sets of all blocks to `Top`, except the Entry
117-
block.
117+
So we initialize the `In` and `Out` sets of all blocks to `Top`, except the
118+
`Entry` block.
118119

119-
Thus, we have the following initial value for the Entry block:
120+
Thus, we have the following initial values for the `Entry` block:
120121
```
121122
In[Entry] = ∅
122123
Out[Entry][B*] = ∅, where B* ∈ succ(Entry)
@@ -128,12 +129,12 @@ The main class that implements the analysis is
128129
and the main function is `BoundsAnalysis::WidenBounds()`.
129130

130131
`WidenBounds` will perform the bounds widening for the entire function. We can
131-
then we can call `BoundsAnalysis::GetWidenedBounds` to retrieve the
132-
widened bounds for the current basic block.
132+
then call `BoundsAnalysis::GetWidenedBounds` to retrieve the widened bounds for
133+
the current basic block.
133134

134135
The approach used for implementing the analysis is the iterative worklist
135136
algorithm in which we keep adding blocks to a worklist as long as we do not
136-
reach a fixed point i.e.: as long as the Out sets for the blocks keep changing.
137+
reach a fixed point i.e.: as long as the `Out` sets for the blocks keep changing.
137138

138139
### Algorithm
139140
```

clang/include/clang/AST/PreorderAST.h

Lines changed: 59 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -21,38 +21,54 @@
2121
#include "clang/AST/Expr.h"
2222

2323
namespace clang {
24-
2524
using Result = Lexicographic::Result;
2625

27-
// Each binary operator of an expression results in a new node of the
28-
// PreorderAST. Each node contains the following fields:
26+
class Node {
27+
public:
28+
enum class NodeKind { BinaryNode, LeafExprNode };
2929

30-
// Opc: The opcode of the operator.
31-
// Vars: A list of variables in the sub expression.
32-
// Const: Constants of the sub expression are folded.
33-
// HasConst: Indicates whether there is a constant in the node. It is used to
34-
// differentiate between the absence of a constant and a constant value of 0.
35-
// Parent: A link to the parent node of the current node.
36-
// Children: The preorder AST is an n-ary tree. Children is a list of all the
37-
// child nodes of the current node.
30+
NodeKind Kind;
31+
Node *Parent;
32+
33+
Node(NodeKind Kind, Node *Parent) :
34+
Kind(Kind), Parent(Parent) {}
35+
};
3836

39-
struct Node {
37+
class BinaryNode : public Node {
38+
public:
4039
BinaryOperator::Opcode Opc;
41-
std::vector<const VarDecl *> Vars;
42-
llvm::APSInt Const;
43-
bool HasConst;
44-
Node *Parent;
4540
llvm::SmallVector<Node *, 2> Children;
4641

47-
Node(Node *Parent) :
48-
Opc(BO_Add), HasConst(false), Parent(Parent) {}
42+
BinaryNode(BinaryOperator::Opcode Opc, Node *Parent) :
43+
Node(NodeKind::BinaryNode, Parent),
44+
Opc(Opc) {}
45+
46+
static bool classof(const Node *N) {
47+
return N->Kind == NodeKind::BinaryNode;
48+
}
4949

5050
// Is the operator commutative and associative?
5151
bool IsOpCommutativeAndAssociative() {
5252
return Opc == BO_Add || Opc == BO_Mul;
5353
}
5454
};
5555

56+
class LeafExprNode : public Node {
57+
public:
58+
Expr *E;
59+
60+
LeafExprNode(Expr *E, Node *Parent) :
61+
Node(NodeKind::LeafExprNode, Parent),
62+
E(E) {}
63+
64+
static bool classof(const Node *N) {
65+
return N->Kind == NodeKind::LeafExprNode;
66+
}
67+
};
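
The `Kind` tag and the `classof` methods follow the LLVM-style RTTI pattern, which is what lets `llvm::isa<>` and `llvm::dyn_cast<>` work on `Node` pointers. The sketch below imitates that pattern without any LLVM headers. `DynCast` is a hypothetical local stand-in for `llvm::dyn_cast`, and the `int` and `const char *` members are placeholders for `BinaryOperator::Opcode` and `Expr *`.

```cpp
#include <cassert>

struct Node {
  enum class NodeKind { BinaryNode, LeafExprNode };
  NodeKind Kind;
  Node *Parent;
  Node(NodeKind Kind, Node *Parent) : Kind(Kind), Parent(Parent) {}
};

struct BinaryNode : Node {
  int Opc; // Placeholder for BinaryOperator::Opcode.
  BinaryNode(int Opc, Node *Parent)
      : Node(NodeKind::BinaryNode, Parent), Opc(Opc) {}
  static bool classof(const Node *N) {
    return N->Kind == NodeKind::BinaryNode;
  }
};

struct LeafExprNode : Node {
  const char *E; // Placeholder for Expr *.
  LeafExprNode(const char *E, Node *Parent)
      : Node(NodeKind::LeafExprNode, Parent), E(E) {}
  static bool classof(const Node *N) {
    return N->Kind == NodeKind::LeafExprNode;
  }
};

// Hypothetical stand-in for llvm::dyn_cast: checks the Kind tag through
// To::classof and returns nullptr when the node is of a different kind.
template <typename To> To *DynCast(Node *N) {
  return (N && To::classof(N)) ? static_cast<To *>(N) : nullptr;
}
```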
68+
69+
} // end namespace clang
70+
71+
namespace clang {
5672
class PreorderAST {
5773
private:
5874
ASTContext &Ctx;
@@ -62,13 +78,29 @@ namespace clang {
6278
Node *Root;
6379

6480
// Create a PreorderAST for the expression E.
65-
// @param[in] E is the sub expression which needs to be added to N.
66-
// @param[in] N is the current node of the AST.
67-
// @param[in] Parent is the parent node for N.
68-
void Create(Expr *E, Node *N = nullptr, Node *Parent = nullptr);
69-
70-
// Sort the variables in a node of the AST.
71-
// @param[in] N is current node of the AST.
81+
// @param[in] E is the sub expression to be added to a new node.
82+
// @param[in] Parent is the parent of the new node.
83+
void Create(Expr *E, Node *Parent = nullptr);
84+
85+
// Add a new node to the AST.
86+
// @param[in] N is the node to be added.
87+
// @param[in] Parent is the parent of the node to be added.
88+
void AddNode(Node *N, Node *Parent);
89+
90+
// Coalesce the BinaryNode with its parent.
91+
// @param[in] B is the current BinaryNode.
92+
// @param[in] Parent is the parent of the node to be coalesced.
93+
void CoalesceNode(BinaryNode *B, BinaryNode *Parent);
94+
95+
// Recursively coalesce binary nodes having the same commutative and
96+
// associative operator.
97+
// @param[in] N is current node of the AST. Initial value is Root.
98+
// @param[in] Changed indicates whether a node was coalesced. We need this
99+
// to control when to stop recursive coalescing.
100+
void Coalesce(Node *N, bool &Changed);
101+
102+
// Sort the children expressions in a binary node of the AST.
103+
// @param[in] N is current node of the AST. Initial value is Root.
72104
void Sort(Node *N);
73105

74106
// Check if the two AST nodes N1 and N2 are equal.
@@ -81,20 +113,13 @@ namespace clang {
81113
void SetError() { Error = true; }
82114

83115
// Print the PreorderAST.
84-
// @param[in] N is the current node of the AST.
116+
// @param[in] N is the current node of the AST. Initial value is Root.
85117
void PrettyPrint(Node *N);
86118

87119
// Cleanup the memory consumed by node N.
88-
// @param[in] N is the current node of the AST.
120+
// @param[in] N is the current node of the AST. Initial value is Root.
89121
void Cleanup(Node *N);
90122

91-
// A DeclRefExpr can be a reference either to an array subscript (in which
92-
// case it is wrapped around a ArrayToPointerDecay cast) or to a pointer
93-
// dereference (in which case it is wrapped around an LValueToRValue cast).
94-
// @param[in] An expression E.
95-
// @return Returns a DeclRefExpr if E is a DeclRefExpr, otherwise nullptr.
96-
DeclRefExpr *GetDeclOperand(Expr *E);
97-
98123
public:
99124
PreorderAST(ASTContext &Ctx, Expr *E) :
100125
Ctx(Ctx), Lex(Lexicographic(Ctx, nullptr)), OS(llvm::outs()),

clang/include/clang/Basic/DiagnosticSemaKinds.td

Lines changed: 5 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -10164,20 +10164,21 @@ def err_bounds_type_annotation_lost_checking : Error<
1016410164

1016510165
def warn_bounds_declaration_invalid : Warning<
1016610166
"cannot prove declared bounds for %1 are valid after "
10167-
"%select{assignment|initialization|statement}0">,
10167+
"%select{assignment|decrement|increment|initialization|statement}0">,
1016810168
InGroup<CheckBoundsDeclsUnchecked>;
1016910169

1017010170
def warn_checked_scope_bounds_declaration_invalid : Warning<
1017110171
"cannot prove declared bounds for %1 are valid after "
10172-
"%select{assignment|initialization|statement}0">,
10172+
"%select{assignment|decrement|increment|initialization|statement}0">,
1017310173
InGroup<CheckBoundsDeclsChecked>;
1017410174

1017510175
def error_bounds_declaration_invalid : Error<
1017610176
"declared bounds for %1 are invalid after "
10177-
"%select{assignment|initialization|statement}0">;
10177+
"%select{assignment|decrement|increment|initialization|statement}0">;
1017810178

1017910179
def err_unknown_inferred_bounds : Error<
10180-
"inferred bounds for %0 are unknown after statement">;
10180+
"inferred bounds for %1 are unknown after "
10181+
"%select{assignment|decrement|increment|initialization|statement}0">;
1018110182

1018210183
def note_declared_bounds : Note<
1018310184
"(expanded) declared bounds are '%0'">;

clang/include/clang/Sema/BoundsAnalysis.h

Lines changed: 15 additions & 22 deletions
Original file line number | Diff line number | Diff line change
@@ -116,9 +116,6 @@ namespace clang {
116116
EdgeBoundsTy Gen, Out;
117117
// The Kill set for the block.
118118
StmtDeclSetTy Kill;
119-
// The set of all variables used in bounds expr for each ntptr in the
120-
// block.
121-
BoundsVarTy BoundsVars;
122119

123120
// To compute In[B] we compute the intersection of Out[B*->B], where B*
124121
// are all preds of B. When there is a back edge from block B' to B (for
@@ -156,9 +153,9 @@ namespace clang {
156153
// to lookup ElevatedCFGBlock from CFGBlock.
157154
BlockMapTy BlockMap;
158155

159-
// A set of all ntptrs in scope. Currently, we simply collect all ntptrs
160-
// defined in the function.
161-
DeclSetTy NtPtrsInScope;
156+
// The mapping of all ntptrs in the function and all variables occurring in
157+
// the bounds expr for each ntptr.
158+
BoundsVarTy NtPtrsInScope;
162159

163160
// To compute In[B] we compute the intersection of Out[B*->B], where B* are
164161
// all preds of B. When there is a back edge from block B' to B (for
@@ -246,14 +243,13 @@ namespace clang {
246243
// @param[in] Dest block for the edge for which the Gen set is updated.
247244
void FillGenSet(Expr *E, ElevatedCFGBlock *EB, ElevatedCFGBlock *SuccEB);
248245

249-
// Uniformize the expr, fill Gen set and get variables used in bounds expr
250-
// for the ntptr.
246+
// Uniformize the expr, fill Gen set for the edge EB->SuccEB.
251247
// @param[in] E is an ntptr dereference or array subscript expr.
252248
// @param[in] Source block for the edge for which the Gen set is updated.
253249
// @param[in] Dest block for the edge for which the Gen set is updated.
254-
void FillGenSetAndGetBoundsVars(const Expr *E,
255-
ElevatedCFGBlock *EB,
256-
ElevatedCFGBlock *SuccEB);
250+
void FillGenSetForEdge(const Expr *E,
251+
ElevatedCFGBlock *EB,
252+
ElevatedCFGBlock *SuccEB);
257253

258254
// Collect all variables used in bounds expr E.
259255
// @param[in] E represents the bounds expr for an ntptr.
@@ -264,6 +260,11 @@ namespace clang {
264260
// Assign the widened bounds from the ElevatedBlock to the CFG Block.
265261
void CollectWidenedBounds();
266262

263+
// Extract the terminating sub-expression from the expression E.
264+
// @param[in] E is the expression from which we need to extract the terminating sub-expression.
265+
// @return The terminating sub-expression from the expression E.
266+
Expr *GetTerminatorCondition(const Expr *E) const;
267+
267268
// Get the terminating condition for a block. This could be an if condition
268269
// of the form "if(*(p + i))".
269270
// @param[in] B is the block for which we need the terminating condition.
@@ -297,17 +298,9 @@ namespace clang {
297298
// Get the DeclRefExpr from an expression E.
298299
// @param[in] An expression E which is known to be either an LValueToRValue
299300
// cast or an ArrayToPointerDecay cast.
300-
// @return The DeclRefExpr from the expression E.
301+
// @return The DeclRefExpr from the expression E or nullptr.
301302
DeclRefExpr *GetDeclOperand(const Expr *E);
302303

303-
// A DeclRefExpr can be a reference either to an array subscript (in which
304-
// case it is wrapped around a ArrayToPointerDecay cast) or to a pointer
305-
// dereference (in which case it is wrapped around an LValueToRValue cast).
306-
// @param[in] An expression E.
307-
// @return Whether E is an expression containing a reference to an array
308-
// subscript or a pointer dereference.
309-
bool IsDeclOperand(const Expr *E);
310-
311304
// Make an expression uniform by moving all DeclRefExpr to the LHS and all
312305
// IntegerLiterals to the RHS.
313306
// @param[in] E is the expression which should be made uniform.
@@ -343,10 +336,10 @@ namespace clang {
343336
// @return The intersection of sets A and B.
344337
template<class T> T Intersect(T &A, T &B) const;
345338

346-
// Compute the union of sets A and B.
339+
// Compute the union of sets A and B and widen the bounds where applicable.
347340
// @param[in] A is a set.
348341
// @param[in] B is a set.
349-
// @return The union of sets A and B.
342+
// @return The union of sets A and B containing the widened bounds.
350343
template<class T> T Union(T &A, T &B) const;
351344

352345
// Compute the set difference of sets A and B.

clang/include/clang/Sema/Sema.h

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -5186,8 +5186,10 @@ class Sema {
51865186

51875187
enum BoundsDeclarationCheck {
51885188
BDC_Assignment,
5189+
BDC_Decrement,
5190+
BDC_Increment,
51895191
BDC_Initialization,
5190-
BDC_Statement
5192+
BDC_Statement,
51915193
};
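
The order of these enumerators has to match the order of the alternatives in `%select{assignment|decrement|increment|initialization|statement}0`, because `%select` picks an alternative by the integer value of the argument. The sketch below illustrates that correspondence. `SelectText` is a hypothetical helper, not part of Clang's diagnostics engine.

```cpp
#include <cassert>
#include <string>

enum BoundsDeclarationCheck {
  BDC_Assignment,
  BDC_Decrement,
  BDC_Increment,
  BDC_Initialization,
  BDC_Statement,
};

// Hypothetical helper mirroring what %select{...}0 does with the enum value:
// the enumerator is used directly as an index into the alternatives.
std::string SelectText(BoundsDeclarationCheck Kind) {
  static const char *const Alternatives[] = {
      "assignment", "decrement", "increment", "initialization", "statement"};
  return Alternatives[Kind];
}
```

Inserting `BDC_Decrement` and `BDC_Increment` in the middle of the enum is why the same commit adds `decrement|increment` at the same positions in each diagnostic string.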
51925194

51935195
/// \brief Check that address-of operation is not taking the
