Major refactoring of array bounds code and some performance improvements #282

Merged: Machiry merged 6 commits into master from iss281 on Oct 27, 2020

Conversation

Machiry (Collaborator) commented Oct 3, 2020

  • Implemented a technique similar to Locksmith, where the bounds of a variable are the intersection of the bounds of all array variables that flow into it.

  • Added a flag to disable the array bounds heuristics: -disable-arr-hu (they are enabled by default, but this flag can be used to disable them).

  • Previously, we used the same node for all occurrences of a constant number; for example, for the code below:

    int n = 8;
    int l = 8;
    

    We would construct our graph as n <-> 8 <-> l, which wrongly leads us to believe that n is reachable into (i.e., flows into) l, which is not true.

    We changed the code to create a new node for every occurrence of a constant and thereby avoid this.

  • Added a cache to the breadth-first search: we call BFS on the same node many times, so it is better to cache the reachability information (provided that the graph didn't change); see the sketch after this list.

  • This also fixes: Should pick constant bound, but chooses parameter #281
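
For the BFS cache, here is a minimal self-contained C++ sketch of the idea (BFSCache and the graph type here are simplified stand-ins, not the actual 3C classes):

#include <cstdint>
#include <map>
#include <queue>
#include <set>

using BoundsKey = uint64_t;

// Simplified graph: adjacency map plus a reachability cache keyed by the BFS start node.
struct CachedGraphSketch {
  std::map<BoundsKey, std::set<BoundsKey>> Succs;
  std::map<BoundsKey, std::set<BoundsKey>> BFSCache;

  // Returns every node reachable from Start, computing the BFS at most once
  // per start node as long as the graph is not mutated in between.
  const std::set<BoundsKey> &getReachable(BoundsKey Start) {
    auto It = BFSCache.find(Start);
    if (It != BFSCache.end())
      return It->second;                      // cache hit: reuse the earlier BFS result
    std::set<BoundsKey> Reachable;
    std::queue<BoundsKey> Work;
    Work.push(Start);
    while (!Work.empty()) {
      BoundsKey Cur = Work.front();
      Work.pop();
      for (BoundsKey Next : Succs[Cur])
        if (Reachable.insert(Next).second)
          Work.push(Next);
    }
    return BFSCache[Start] = Reachable;       // insert into BFS cache
  }

  // Any mutation must invalidate the cache ("provided that the graph didn't change").
  void addEdge(BoundsKey From, BoundsKey To) {
    Succs[From].insert(To);
    BFSCache.clear();
  }
};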

Array bounds inference:

Every variable (both pointers and scalars) is identified by a unique number, i.e., a BoundsKey. There is a mapping from each BoundsKey to a corresponding ProgramVar*, which contains information about the program scope, variable name, etc.
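
For intuition, the mapping can be pictured roughly as follows (a sketch with assumed field names; the real ProgramVar and scope classes carry more information):

#include <cstdint>
#include <map>
#include <string>

using BoundsKey = uint64_t;          // unique id per variable (representation assumed)

// Assumed shape of the per-variable record.
struct ProgramVarSketch {
  std::string VarName;               // e.g. "n", "arr"
  std::string Scope;                 // e.g. "function foo", "global", "struct S"
  bool IsConstant;                   // constants also get keys (a fresh one per occurrence, per this PR)
};

// The mapping described above: each BoundsKey resolves to its ProgramVar*.
static std::map<BoundsKey, ProgramVarSketch *> ProgramVarMap;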

The main entry point in this class is bool AVarBoundsInfo::performFlowAnalysis(ProgramInfo *PI). As the name suggests, this performs flow-based (i.e., Locksmith-style) analysis to determine the bounds of all the array variables.

It has a fixed-point loop: at each iteration, we try to find the possible bounds for every array variable that does not yet have bounds from its predecessors, using the following function:

bool AvarBoundsInference::predictBounds(BoundsKey K,
                                        std::set<BoundsKey> &Neighbours,
                                        AVarGraph &BKGraph)

where K is the array variable whose bounds need to be computed, Neighbours is the set of its predecessors, and BKGraph is the graph that contains all the flow information. The possible bounds for K are stored in CurrIterInferBounds.
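
A minimal, runnable sketch of this fixed-point loop together with the intersection rule from the first bullet above; FlowGraphSketch and inferOnce are simplified stand-ins, not the real performFlowAnalysis/predictBounds:

#include <cstdint>
#include <iostream>
#include <map>
#include <set>
#include <string>

using BoundsKey = uint64_t;
using Bound = std::string;                           // e.g. "count(c)" (sketch only)
using BoundsMap = std::map<BoundsKey, std::set<Bound>>;

struct FlowGraphSketch {
  std::map<BoundsKey, std::set<BoundsKey>> Preds;    // predecessors per node
};

// One iteration: for every array variable without bounds, intersect the bound
// sets of all predecessors that already have bounds.
bool inferOnce(const std::set<BoundsKey> &NeedBounds,
               const FlowGraphSketch &G, BoundsMap &Known) {
  bool Changed = false;
  for (BoundsKey K : NeedBounds) {
    if (Known.count(K))
      continue;                                      // already has bounds
    std::set<Bound> Candidate;
    bool First = true;
    for (BoundsKey P : G.Preds.at(K)) {
      auto It = Known.find(P);
      if (It == Known.end())
        continue;                                    // this predecessor has no bounds yet
      if (First) { Candidate = It->second; First = false; continue; }
      std::set<Bound> Inter;
      for (const Bound &B : Candidate)
        if (It->second.count(B))
          Inter.insert(B);
      Candidate = Inter;                             // intersection of all incoming bounds
    }
    if (!First && !Candidate.empty()) {
      Known[K] = Candidate;
      Changed = true;
    }
  }
  return Changed;
}

int main() {
  // p (key 1) and q (key 2) both flow into parameter x (key 3); both have count(c).
  FlowGraphSketch G;
  G.Preds[3] = {1, 2};
  BoundsMap Known = {{1, {"count(c)"}}, {2, {"count(c)"}}};
  std::set<BoundsKey> Need = {3};
  while (inferOnce(Need, G, Known)) {}               // iterate to a fixed point
  std::cout << "x gets " << *Known[3].begin() << "\n";  // prints: x gets count(c)
}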

Later we call convergeInferredBounds, which picks the best bounds from the possible bound values with the following preference: count bounds take priority over byte bounds, and variables take priority over constants. If there are still multiple choices (i.e., multiple variables or constants), we give up.
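
A sketch of that preference order (illustrative types; exactly how the two tie-breakers nest is an assumption of this sketch, not taken from the implementation):

#include <optional>
#include <set>
#include <string>

// Assumed shape of a candidate bound.
struct CandidateBound {
  enum Kind { Count, ByteCount };
  Kind BKind;
  bool LengthIsVariable;                             // true for count(n), false for count(8)
  std::string Text;                                  // printable form, e.g. "count(n)"
  bool operator<(const CandidateBound &O) const { return Text < O.Text; }
};

// Count bounds take priority over byte bounds, variables over constants; if the
// best preference level still has several choices, give up (return nullopt).
std::optional<CandidateBound>
pickBestBound(const std::set<CandidateBound> &Cands) {
  for (CandidateBound::Kind K : {CandidateBound::Count, CandidateBound::ByteCount})
    for (bool WantVariable : {true, false}) {
      std::set<CandidateBound> Matching;
      for (const CandidateBound &C : Cands)
        if (C.BKind == K && C.LengthIsVariable == WantVariable)
          Matching.insert(C);
      if (Matching.size() == 1)
        return *Matching.begin();                    // unique best choice
      if (Matching.size() > 1)
        return std::nullopt;                         // multiple choices: give up
    }
  return std::nullopt;                               // nothing usable
}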

We first propagate all the bounds information from explicit declarations and mallocs.
For other variables that do not have any choice of bounds, we use potential bounds choices (FromPB); these are the variables that serve as upper bounds on an index variable used in an array indexing operation.
For example:

if (i < n) {
   ...arr[i]...
}

In the above case, we use n as a potential count bound for arr.
Note: we only use potential bounds for a variable when none of its predecessors have bounds.
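
A small sketch of how such a potential ("FromPB") count bound could be recorded and looked up; the map and helper names here are hypothetical:

#include <cstdint>
#include <map>
#include <set>

using BoundsKey = uint64_t;

// For each array variable's key, the keys of variables observed as upper bounds
// of an index used to subscript that array.
static std::map<BoundsKey, std::set<BoundsKey>> PotentialCntBounds;

// Called when the analysis sees the pattern:  if (i < n) { ... arr[i] ... }
// ArrKey is arr's key and UpperKey is n's key.
void recordPotentialCountBound(BoundsKey ArrKey, BoundsKey UpperKey) {
  PotentialCntBounds[ArrKey].insert(UpperKey);
}

// Consulted only when no predecessor of ArrKey supplied bounds through the flow
// graph (cf. the performWorkListInference calls that pass the extra true flag).
const std::set<BoundsKey> *getPotentialCountBounds(BoundsKey ArrKey) {
  auto It = PotentialCntBounds.find(ArrKey);
  return It == PotentialCntBounds.end() ? nullptr : &It->second;
}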

Function of interest:

AvarBoundsInference::getReachableBoundKeys(const ProgramVarScope *DstScope,
                                                BoundsKey FromVarK,
                                                std::set<BoundsKey> &PotK,
                                                AVarGraph &BKGraph,
                                                bool CheckImmediate)

The function getReachableBoundKeys finds all the BoundsKeys (i.e., variables) in scope DstScope that are reachable from FromVarK in the graph BKGraph. All the reachable bounds keys are stored in PotK.
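
A rough sketch of that scope-filtered reachability, simplified to a plain scope-name match; the real function also deals with context-sensitive scopes and the CheckImmediate flag:

#include <cstdint>
#include <map>
#include <queue>
#include <set>
#include <string>

using BoundsKey = uint64_t;
using ScopeName = std::string;                       // stand-in for ProgramVarScope

struct ScopedGraphSketch {
  std::map<BoundsKey, std::set<BoundsKey>> Succs;    // flow edges
  std::map<BoundsKey, ScopeName> ScopeOf;            // scope of each variable
};

// Collect, into PotK, every key reachable from FromVarK whose variable is
// visible in DstScope (simplified here to an exact scope-name match).
void getReachableKeysInScope(const ScopedGraphSketch &G, BoundsKey FromVarK,
                             const ScopeName &DstScope,
                             std::set<BoundsKey> &PotK) {
  std::set<BoundsKey> Seen{FromVarK};
  std::queue<BoundsKey> Work;
  Work.push(FromVarK);
  while (!Work.empty()) {
    BoundsKey Cur = Work.front();
    Work.pop();
    auto S = G.ScopeOf.find(Cur);
    if (S != G.ScopeOf.end() && S->second == DstScope)
      PotK.insert(Cur);                              // reachable and in the target scope
    auto E = G.Succs.find(Cur);
    if (E == G.Succs.end())
      continue;
    for (BoundsKey Next : E->second)
      if (Seen.insert(Next).second)
        Work.push(Next);
  }
}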

// Insert into BFS cache.
if (BFSCache.find(Start) == BFSCache.end()) {
  std::set<Data> ReachableNodes;
  ReachableNodes.clear();

Member:

Why do you need to call clear() here? Isn't just creating the variable calling its constructor, which will ensure that it's empty?

Machiry (Collaborator, Author):

Agree. Will remove it.


// Here x bounds will be c
void foo(int *x, int c) {
//CHECK: void foo(_Array_ptr<int> x : count(c), int c) {

Member:

Seems odd that this should be c since the index of 3 is not at all related to c. I.e., the index is unconditional.

Member:

Maybe this is because of the heuristic (which we are not disabling)?

Machiry (Collaborator, Author):

This is not a heuristic. It comes from all the incoming array pointers, i.e., p and q: all the calls to this function pass an array to x and its bounds to c. We do not worry about the indexing of x, as Checked C dynamic checks will take care of that.

x[3] = c;
}

// Here x bounds is c but the violates bounds.

Member:

Huh?

//CHECK: _Array_ptr<int> q1 : count(8) = malloc<int>(sizeof(int)*8);

int n = 8;
int l;

Member:

This is pretty weird, since l is not initialized. Maybe make l the parameter to bar (this function)?

Machiry (Collaborator, Author):

Will initialize l to something. I just wanted to check Variation 3.


// Variation 3
foo3(q2,l);
foo3(q1,28);

Member:

What about also adding a test where you pass in n as the second parameter, i.e., as if it were q's length, even though it's associated with p? The result should be that it's not treated as the length.

Machiry (Collaborator, Author):

Sure. Will add it.

@@ -32,7 +32,8 @@ const StructScope *StructScope::getStructScope(std::string StName) {
   if (AllStScopes.find(TmpS) == AllStScopes.end()) {
     AllStScopes.insert(TmpS);
   }
-  return &(*AllStScopes.find(TmpS));
+  const auto &SS = *AllStScopes.find(TmpS);
+  return &SS;

Member:

Returning &x where x is a local variable is bad, isn't it? Same with what's below.

Machiry (Collaborator, Author):

It's a reference. So returning &SS, where SS is a reference, is as good as returning a pointer.
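
A tiny standalone illustration of that point (not the 3C code; internScope is a made-up name):

#include <cassert>
#include <set>
#include <string>

// Returning &X where X is a *reference* yields the address of the referenced
// object (here an element owned by the long-lived set), not of a local copy.
static std::set<std::string> AllScopes;

const std::string *internScope(const std::string &Name) {
  AllScopes.insert(Name);
  const auto &SS = *AllScopes.find(Name);            // SS refers into AllScopes
  return &SS;                                        // valid while the element stays in the set
}

int main() {
  const std::string *A = internScope("struct S");
  const std::string *B = internScope("struct S");
  assert(A == B);                                    // both point at the same stored element
}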

@@ -10,13 +10,21 @@
//

Member:

Seems like you completely changed this file. I'll look at it from scratch. Can you say at a high level what it does (I think it does the propagation/inference process) and what its entry point is? Then I will start looking at it from that entry point.

Machiry (Collaborator, Author):

Will do.

Machiry requested a review from mwhicks1 on October 5, 2020 23:41
// Flow inference from context sensitive keys to original keys.
performWorkListInference(ArrNeededBounds, this->CtxSensProgVarGraph, ABI);
// Now, by using potential bounds.
performWorkListInference(ArrNeededBounds, this->CtxSensProgVarGraph, ABI, true);

Member:

I am wondering: Why do we need different graphs? Why not always use the context-sensitive graph?

Machiry (Collaborator, Author):

@@ -1071,36 +1201,49 @@ bool AVarBoundsInfo::performFlowAnalysis(ProgramInfo *PI) {
removeBounds(TBK, FlowInferred);

Member:

This is part of keepHighestPriorityBounds -- what makes a possible bound high priority?

Machiry (Collaborator, Author):

This is the priority order we use:

// Regular flow inference.
performWorkListInference(ArrNeededBounds, this->ProgVarGraph, ABI);
// Flow inference using potential bounds.
performWorkListInference(ArrNeededBounds, this->ProgVarGraph, ABI, true);

Member:

What is a "potential bound" (FromPB)?

Machiry (Collaborator, Author):

As mentioned in the pull request:

> For other variables that do not have any choice of bounds, we use potential bounds choices (FromPB); these are the variables that serve as upper bounds on an index variable used in an array indexing operation.
> For example:
>
> if (i < n) {
>    ...arr[i]...
> }
>
> In the above case, we use n as a potential count bound for arr.
> Note: we only use potential bounds for a variable when none of its predecessors have bounds.

bool
AvarBoundsInference::convergeInferredBounds() {
  bool FoundSome = false;
  for (auto &InfABnds : CurrIterInferBounds) {

Member:

Where does InfABnds get set? I didn't notice it in the performXXX functions.

Machiry (Collaborator, Author):

It gets set in the predictBounds function.

mwhicks1 (Member) left a comment:

When I run the tests on my Mac, I get two unexpected failures.

********************
Failing Tests (2):
    Clang :: CheckedCRewriter/basic.c
    Clang :: CheckedCRewriter/multivardecls.c

  Expected Passes    : 349
  Expected Failures  : 7
  Unexpected Failures: 2

Are these unexpected failures to be expected? :)

mwhicks1 (Member):

I tried this example:

void foo(int *p) {
  p[2] = 1;
}
void bar(int n) {
  int *x = malloc(sizeof(int) * n);
  int y[10];
  foo(x);
  foo(y);
}

It incorrectly rewrites the p count to be 10 even though x, which is passed in, has a length that depends on an arbitrary n. It seems to me that you want to be looking at constraints inside of foo, to notice that count(p) > 2. Then, if you don't have a consistent set of flows in to foo (which you don't, since 10 and n both flow there as possible lengths), you can use the interior constraints to choose a size. Here, it would be count(3). Is this possible to do?

mwhicks1 (Member):

Another question. For this example:

void foo(void) {
  int x[10];
  int *y = x;
  int *z = y;
  z[1] = 1;
}

It produces this graph:

[graph image]

Why are the edges bidirectional? We have bidirectional edges in the ptyp graph to make sense of the kind of pointer it is, but now that we've determined that type, I'm wondering why we don't have single-direction edges for lengths?

mwhicks1 (Member):

As a consequence of the above, I notice that if we add the following line at the end of foo:

  z = malloc<int>(sizeof(int)*5);

then y no longer gets a count added on, and z gets 5. The latter is right, but I would want the former to be 10, still?

Maybe the reason they have to be bidirectional is somehow due to being able to mutate the variables?

mwhicks1 (Member):

A bunch of other random questions:

  • What is a BoundsKey doing? It seems like it’s the core of a constraint, like a VarAtom; is that right?
  • What is a ProgramVar? Is it literally always a variable, or can it be, for example, a constant? If the latter, perhaps it’s the wrong name?
  • What's a CtxFunctionArgScope exactly? I think of scopes as lexical blocks in which variable names are resolved. So a StructScope is the set of fields inside a struct definition; a FunctionParamScope is the set of parameters in a function definition; a FunctionScope is the set of variables defined within a function (including its parameters); etc. There is no particular scope for a context-sensitive call, is there? Maybe you are just looking at the variables that are used at a particular call site? E.g., if I make a call foo(a,b) at one place and foo(b,c) at another, there would be two scopes, with a and b in the first and b and c in the second?
  • What is a GlobalScope used for?

Machiry (Collaborator, Author) commented Oct 14, 2020

> As a consequence of the above, I notice that if we add the following line at the end of foo:
>
>   z = malloc<int>(sizeof(int)*5);
>
> then y no longer gets a count added on, and z gets 5. The latter is right, but I would want the former to be 10, still?
>
> Maybe the reason they have to be bidirectional is somehow due to being able to mutate the variables?

Yes, this is a consequence of having bidirectional edges and wanting the inferred bounds to be consistent.
So the bounds of y become conflicting (10 vs. 5) and we give up.
However, we treat malloc as seed bounds (i.e., ground truth), similar to declared bounds, and hence z gets its bounds.

Machiry (Collaborator, Author) commented Oct 14, 2020

> Another question. For this example:
>
> void foo(void) {
>   int x[10];
>   int *y = x;
>   int *z = y;
>   z[1] = 1;
> }
>
> It produces this graph:
>
> [graph image]
>
> Why are the edges bidirectional? We have bidirectional edges in the ptyp graph to make sense of the kind of pointer it is, but now that we've determined that type, I'm wondering why we don't have single-direction edges for lengths?

I wanted the inferred bounds to be consistent, i.e., for a given array pointer p, all other array pointers that have bounds and are related to it (flow into p or flow from p) should agree on the bounds.

We can do better once we have flow-sensitive bounds, which Checked C will support eventually!

Machiry closed this Oct 14, 2020
Machiry reopened this Oct 14, 2020

Machiry (Collaborator, Author) commented Oct 14, 2020

> I tried this example:
>
> void foo(int *p) {
>   p[2] = 1;
> }
> void bar(int n) {
>   int *x = malloc(sizeof(int) * n);
>   int y[10];
>   foo(x);
>   foo(y);
> }
>
> It incorrectly rewrites the p count to be 10 even though x, which is passed in, has a length that depends on an arbitrary n. It seems to me that you want to be looking at constraints inside of foo, to notice that count(p) > 2. Then, if you don't have a consistent set of flows in to foo (which you don't, since 10 and n both flow there as possible lengths), you can use the interior constraints to choose a size. Here, it would be count(3). Is this possible to do?

This is because, at the call site foo(x), the context-sensitive argument does not have bounds.
We see x(n) -> x_ctx_sens_p(unknown bounds) -> p and y(10) -> y_ctx_sens_p(10) -> p.

Our inference looks for pointers that have bounds; in the above case, we only see y_ctx_sens_p as an array that has bounds, and we try to infer bounds from it.

Machiry (Collaborator, Author) commented Oct 14, 2020

> When I run the tests on my Mac, I get two unexpected failures.
>
> ********************
> Failing Tests (2):
>     Clang :: CheckedCRewriter/basic.c
>     Clang :: CheckedCRewriter/multivardecls.c
>
>   Expected Passes    : 349
>   Expected Failures  : 7
>   Unexpected Failures: 2
>
> Are these unexpected failures to be expected? :)

Not expected. Will take a look.

Machiry (Collaborator, Author) commented Oct 16, 2020

> When I run the tests on my Mac, I get two unexpected failures.
>
> ********************
> Failing Tests (2):
>     Clang :: CheckedCRewriter/basic.c
>     Clang :: CheckedCRewriter/multivardecls.c
>
>   Expected Passes    : 349
>   Expected Failures  : 7
>   Unexpected Failures: 2
>
> Are these unexpected failures to be expected? :)
>
> Not expected. Will take a look.

Looks like it is missing header files. Can you see the exact error you are getting? You can use llvm-lit -v to get all the warnings.

mwhicks1 (Member):

@jackastner found and fixed the multivardecls.c bug. It's been merged as of yesterday.

For the basic.c one, I see this as the expected output in basic1:

char data _Nt_checked[17] = "abcdefghijklmnop";

but on my machine I get

char data _Checked[17] = "abcdefghijklmnop";

mwhicks1 (Member):

OK, merge when ready.

My last request: your comment at the top is very helpful in describing how things work, so it would be useful to add it as comments in the code. For example, you could add a comment near performFlowAnalysis noting that it is the entry point, and use the opportunity to describe how it works there.

Machiry merged commit c554433 into master Oct 27, 2020
Machiry added a commit that referenced this pull request Oct 28, 2020
This reverts commit c554433, reversing
changes made to 0f8d6dc.
Machiry added a commit that referenced this pull request Oct 29, 2020
Major refactoring of array bounds code and some performance improvements
mattmccutchen-cci added a commit that referenced this pull request Dec 5, 2020
* Bogus warning suppressed

* Fix to work on Mac

* Update generated tests for whitespace changes

* Fix #204

* Fix #205

FunctionDecl replacement now only replaces the return/parameters when
they have changed. This allows macros to appear in these locations as
long as they don't require rewriting.

* Handle defines used as basetypes of checked pointers

Just inline the define for now.

* Remove text file

* syncing with master

* Refactoring GatherTool.cpp

* Initial logging of root cause diagnostics

* Add some comments to PVConstraint constructor

* Nicer root cause reported for invalid cast

* Correctly differentiate void* and vararg root causes

* Undo modification to PersistentSourceLocation

* Remove commented lines

* Comment tweaks

* Fixing issue 239

* Fix

* Testing repeated runs

* Fix regression in lua and ptr_dist

* Actually fix lua this time

* Initial fix for #255

* Add more const qualifiers

* Move PersistentConstraint methods into ProgramInfo

* Store persistent expression constraints in new map

These were previously stored the Variables map that holds declaration
constraint variables.

* Variables map now associates declaration with a single constraint

* Return option type for getVariable instead of nullptr

* Remove commented out code that is now truly dead

* Context sensitive array bounds

* Fix RUN commands in test

* Handle edge cases for PSL collision

* Code cleanup and tweaks

* Update comment in PersistentSourceLocation

* Add a test case for parameter declarations in macro

* Warnings

* Another test case for WILDness of macros

* Use my own option type for nicer behavior with abstract base class

* Addressing comments

* Adding refs

* Move BaseType code into a static method

* Split single declaration and multi-decl rewriting into separate methods

* Initial implementation of new multi-var-decl rewriting

* Add tests for multi-declarations

* Update main.yml

Checking out correct commit hash of bear

* Fixing running unit tests (They never ran)

* Update main.yml

* Update main.yml

* Use doDeclRewrite for single declaration rewriting

* Cleanup unneeded templating

* Comments

* Insert initializer for arrays of pointers

* Update test for array initializer

* Add EOF check in getNextCommaOrSemicolon

* Remove dead method

* Don't treat global vars / struct fields on same line as multi-decl

* Comments and naming

* Fix bug caused by lack of space between consecutive decls

* Fixing issue #276

* Minor fixes

* Changed error message

* tidying

* Fixing use-after-free in ProgramVar

* Adding TODOs

* Changing branch names from BigRefactor to master

* Updating the links to correct computation org

* Extract duplicated code into a new method

* Major refactoring of array bounds code and some performance improvements

* More 3C changes prior to 5C itype removal

* Handling comments

* Add infrastructure to handle separate 5C specific code

* Add flags for 5C features guarded by ifdef

* Changed artifacts retention days

We will retain artifacts only for 3 days.

* Adding ctx sensitive keys

* Minor refactoring

* Remove directories that contain nothing but old duplicate 3C tests.

Part of #286.

* Avoid generating extra atom on allocator function calls

* constrainToWild only needs to constrain outermost pointer

The inner pointers are constrained by implication constraints. This
helps to improve the root cause analysis.

* Fix bug in multi-declaration rewriting

The set of declarations in a multi-decl was not explicitly ordered by
position in the multi-declaration. This meant declarations could sometimes
be rewritten out of order, potentially crashing 3c.

* Adding stats

* Include ExprConstraintVars in pointer source location map

* Add constraints to wild when PtrType solver fails

Previously the affected atoms were assigned to in the solution, without
updating the constraint graph. This change means the updated solution is
reflected in the constraint graph and root cause analysis.

* Output diagnostics for constraints not mapped to declaration

Constraints on expressions such as c-style casts are now output in root
cause of wildness diagnostics.

* Add source locations to constraints for better root cause reporting

* Improve root cause diagnostic for invalid casts

Invalid casts between pointers with different base types will now only
produce one warning.

* Add test file for root cause of wildness

* More root cause tests

* Add reason for unsafe allocator calls

* Add WILD counts to output

* Implement per-pointer statistic output

Old methods are renamed to reflect that they work with atoms instead of
pointers.

* Fix to propagate wildness cause between levels of pointers

* Cleanup computeIterimConstraintState

* Avoid using WILD as an atom inside constraint variables

This was messing up the BFS in root cause analysis

* Fix bug when root cause is on parameter of checked function pointer

* Comments; renaming.

* Update warning message

* Remove unused conflicts set in returns/params

* Remove some duplicated code pertaining to root cause source locations

* Add method for obtaining constrained VarAtom

* Avoid allocating empty sets in ImpMap

* Fix root cause reporting when cause is on function pointer parameter

* Add comments

* Rename method to getFreshGEQ

* Fix for issue #293

Instead of considering a pointer "originally checked" if any one of the
pointer levels in it is checked in the source, characterize each pointer
level individually.

* Add tests for converting partially checked pointers

* Test for partially checked function pointer

* Added comments

* FunctionVariableConstraint return and param vars changed to PVConstraints

* Handle partially checked extern functions

* Add tests for partially checked array pointers

* Revert "Merge pull request #282 from correctcomputation/iss281"

This reverts commit c554433, reversing
changes made to 0f8d6dc.

* Fixing array length inference performance issue

* Revert "Fixing array length inference performance issue"

* Merge pull request #282 from correctcomputation/iss281

Major refactoring of array bounds code and some performance improvements

* Fixing array length inference performance issue

* Temporarily add the rename scripts for archival.

* Fix typos and inconsistencies in old names to let automated replacements
work.

* git mv 'clang/docs/checkedc/CheckedCConvert.md' 'clang/docs/checkedc/3C.md'

* git mv 'clang/tools/cconv-standalone' 'clang/tools/3c'

* git mv 'clang/tools/3c/utils/cc_conv' 'clang/tools/3c/utils/port_tools'

* git mv 'clang/include/clang/CConv/CCGlobalOptions.h' 'clang/include/clang/CConv/3CGlobalOptions.h'

* git mv 'clang/test/CheckedCRewriter' 'clang/test/3C'

* repl_basename_pfx 'CConvert' '3C' '^clang'

* repl_basename_pfx 'CConv' '3C' '^clang'

* repl '\<cconv-standalone\>' '3c'

* repl '\<clang/tools/3c/utils/cc_conv\>' 'clang/tools/3c/utils/port_tools' '^\.github/workflows/main\.yml$'

* repl '\<CCGlobalOptions\>' '3CGlobalOptions'

* repl '\<CheckedCRewriter\>' '3C'

* repl '\<CConv(ert)?' '3C' '/CMakeLists\.txt$'

* repl '#include "clang/CConv/' '#include "clang/3C/' '\.(h|cpp)$'

* repl '(#include.*["/])CConv(ert)?' '\13C' '\.(h|cpp)$'

* repl '//=--CConvert([^-]*-)' '//=--3C\1------'

* repl '//=--CConv([^-]*-)' '//=--3C\1---'

* repl '\<LLVM_CLANG_TOOLS_EXTRA_CLANGD_CCONVERT' 'LLVM_CLANG_TOOLS_EXTRA_CLANGD_3C' '\.(h|cpp)$'

* repl '\<_CCGLOBALOPTIONS_H\>' '_3CGLOBALOPTIONS_H' '\.(h|cpp)$'

* repl '\<_CCONV([A-Z]*)_H\>' '_3C\1_H' '\.(h|cpp)$'

* repl '\<cconv(build|scripts)\>' '3c-\1' '^\.github/workflows/main\.yml$'

* repl '\<CCInterface\>' '_3CInterface' '\.(h|cpp)$'

* repl '\<cconv(CollectAndBuildInitialConstraints|CloseDocument)\>' '_3C\1' '\.(h|cpp)$'

* repl '\<CConv(Interface|Inter|Main|Sec|DiagInfo|LSPCallBack)\>' '_3C\1' '\.(h|cpp)$'

* repl '\<CCONVSOURCE\>' '_3CSOURCE' '\.(h|cpp)$'

* repl '\<(Command|ExecuteCommandParams)::CCONV_' '\1::_3C_' '\.(h|cpp)$'

* repl '\<CCONV_' '_3C_' '^clang-tools-extra/clangd/Protocol\.h$'

* repl '\<ConvertCategory\>' '_3CCategory' '\.(h|cpp)$'

* repl '\<CConvert(Options|ManualFix|Diagnostics)\>' '_3C\1' '\.(h|cpp)$'

* repl '\<ccConvertManualFix\>' '_3CManualFix' '\.(h|cpp)$'

* repl '\<ccConvResultsReady\>' '_3CResultsReady' '\.(h|cpp)$'

* repl '\<CheckedCConverter\>' '_3C'

* repl '\<executeCConvCommand\>' 'execute3CCommand' '\.(h|cpp)$'

* repl '\<sendCConvMessage\>' 'send3CMessage' '\.(h|cpp)$'

* repl '\<IsCConvCommand\>' 'Is3CCommand' '\.(h|cpp)$'

* repl '\<(clear|report)CConvDiagsForAllFiles\>' '\13CDiagsForAllFiles' '\.(h|cpp)$'

* repl '\<INTERACTIVECCCONV\>' 'INTERACTIVE3C'

* repl '"CConv_(RealWild|AffWild)"' '"3C_\1"' '\.(h|cpp)$'

* repl '"cconv\.(onlyThisPtr|applyAllPtr)"' '"3c.\1"' '\.(h|cpp)$'

* repl '\("CConv(:)? ' '("3C\1 ' '\.(h|cpp)$'

* repl '\<runCheckedCConvert\>' 'run3C'

* repl '\<icconv\>' 'clangd3c'

* repl '\<checked_c_convert_bin\>' '_3c_bin'

* repl '\<checked-c-convert\>' '3c'

* repl 'Checked C rewriter tool' '3C' '^clang/test/3C/'

* repl '\<cconvClangDaemon\>' '3cClangDaemon' '/CMakeLists\.txt$'

* repl '\<[Cc]?[Cc][Cc]onv\>' '3C' '(^|/)3[Cc]|^clang-tools-extra/clangd|^clang/include/clang'

* repl 'fcheckedc_convert_tool' 'f3c_tool'

* repl 'fcheckedc-convert-tool' 'f3c-tool'

* Adding more flags

* Text alignment fixes.

* Remaining edits that aren't worth automating.

* Delete the rename scripts again.

* Removing CheckedCRewriter added due to merging

* Delete the actions file from the correctcomputation/checkedc-clang repo.

We need to do this before sending the PR to Microsoft (see #302).  The
actions are currently broken for several reasons and we cannot afford to
wait to get them working in the new repository first.

* Disable clangd3c build target, which is unmaintained and broken.

* Rename the clang/lib/3C target to clang3C, following convention.

This neatly resolves the confusing near-collision between the executable
target (renamed from cconv-standalone to 3c in #299) and the library
target (renamed from CConv to 3C).

* Rename two identifiers missed in #299.

Specifically, AsCCCommands and ExecuteCCCommand.  These are used only in
clangd3c, which we're about to disable, but the intent of #299 was to
process the whole source tree.  These do not appear in the currently
existing 5C-specific code.

This change intentionally does not address:

- Local variables (except one as requested by Mike).  They aren't
  important and their naming varies too much to make it easy to find all
  of them.

- Name collisions between fields or local variables and their types.  In
  my tests so far, a field collision causes a compile error, while a
  local variable collision doesn't.  I saw the clangd3c build fail due
  to field collisions (we know there are none in 3c because it builds),
  and this led me to discover some local variable collisions in both
  clangd3c and 3c.  But we don't care about clangd3c build failures
  right now, and there isn't an easy way to find /all/ local variable
  collisions, so I don't see a strong motivation to fix the few in 3c
  that I happen to know about.

To make sure I didn't miss anything else, I conducted a more thorough
search than before for "cc" or "c.c" in the diff from LLVM master.  Some
common false positives could be excluded, but I thought that would be
more work than just reviewing them.

RANGE=$(git merge-base llvm/master origin/master)..origin/master
PAT='cc|c[^<(/.a-z]c'
git diff --name-only $RANGE >names
grep -Ei "$PAT" --color=always names | less -R
while read f; do git diff --line-prefix="$f:" $RANGE -- $f; done <names >diffs-with-filenames
GREP_COLORS=cx=37:mt= grep --color=always -B3 -A3 -Ei ":[-+].*($PAT)" diffs-with-filenames | less -R

(Then use the case-insensitive "less" search command on $PAT.)

* update-includes.py: Find the default header location relative to
update-includes.py itself.

Bonus: Make update-includes.py directly executable.

* Add new 3C readmes and a link from the root readme.

* Fix some typos and unintended non-ASCII characters in the new readmes.

* Delete more tests that are obsolete (according to Aravind).

This leaves us with the "regression tests" in clang/test/3C (which are
documented in clang/docs/checkedc/3C/CONTRIBUTING.md) as the only tests
of 3C in this source tree.  We have some external tests that we plan to
document when they are in good enough shape to be run by the public.

* New readmes: Replace two spaces at the end of a sentence with one.

I'm in the minority, and the inconsistency is bound to become
distracting.  Also, there's no good way for an automated rewrapping tool
to know whether a period at the end of a line should get one space or
two when it gets moved.  (Emacs solves this by not breaking after a
period followed by a single space, but the irregular breaking looks
terrible to me.)

* Wrap 3C Markdown text to 80 characters as requested by Microsoft.

* Add 3C .clang-format files.

* Remove a semicolon after the body of ~FunctionVariableConstraint that
confused clang-format.

* clang-format all files covered by the new 3C .clang-format files.

* clang-format 3C files in clangd.

The absence of "ReflowComments: false" (which we use for the 3C
directories) made no difference in these files.

* Format 3C code in non-3C clangd files.

In essence, I ran clang-format and kept only the changes to 3C code,
except that I did fix two problems in non-3C code that had clearly been
introduced by 3C development.

* Fix long lines in comments in 3C C++ code.

* Add a period at the end of a comment as requested by Microsoft.

* clang/lib/3C/*.cpp: Move the #include of the main module header above
other #includes of 3C headers.

This partially applies item 1 from
https://llvm.org/docs/CodingStandards.html#include-style in order to
catch missing dependencies among headers.  Full conformance to this
guideline will come later.

* Fix broken dependency of AVarGraph.h on a declaration of AVarBoundsInfo
as an incomplete type, exposed by the previous commit.

I chose to do so by including ABounds.h since that's what
AVarBoundsInfo.h does, though simply declaring AVarBoundsInfo might have
worked.

* Add missing final newlines to all 3C files.

* Fix nits in 3C .clang-format files.

* Add 3C .clang-tidy files and documentation.

* Add all readability-identifier-naming suppressions for the _3C prefix.

This and the upcoming bulk clang-tidy work will include the unmaintained
clangd3c code because it seems that a modest amount of additional work
can avoid adding many more breakages on top of the ones it already has
(mostly from the upcoming bulk rename).

* Finish applying clang-tidy to 3C (for now).

- Update clang/docs/checkedc/3C/clang-tidy.md with much more information
  about the process. Note that some warnings remain that we haven't yet
  decided what to do about.

- Disable misc-no-recursion rule for now because it looks like we make
  significant intentional use of recursion.

- Remove blank lines between #include lines so that the
  `llvm-include-order` check will sort all the lines. Minor tweaks to
  nearby formatting.

- Run `clang-tidy --fix` multiple times, then `clang-format`, then fix
  up formatting mistakes from `llvm-else-after-return` as described in
  clang-tidy.md.

- Manually rename some elements where we're not happy with what the
  default `readability-identifier-naming` fix did.

- Get the clangd3c target to build and enable it again! After clang-tidy
  fixed some broken references without being asked, there was only one
  real error left, which was straightforward to fix. We still think
  clangd3c doesn't work, but as long as it builds, it will be easier for
  us to include it in bulk code edits like this one if we want to.

* Remove unused local variables and private fields.

* 3C CONTRIBUTING.md: Add link to more information about 5C.

Co-authored-by: Michael Hicks <[email protected]>
Co-authored-by: John Kastner <[email protected]>
Co-authored-by: Shilpa Roy <[email protected]>
Co-authored-by: Aravind Machiry <[email protected]>
Co-authored-by: Aaron Eline <[email protected]>
Co-authored-by: John Kastner <[email protected]>
Co-authored-by: John Kastner <[email protected]>
Co-authored-by: Michael Hicks <[email protected]>
Co-authored-by: Machiry Aravind Kumar <[email protected]>
Machiry deleted the iss281 branch January 22, 2022 15:36
Successfully merging this pull request may close these issues:

  • Should pick constant bound, but chooses parameter (#281)

3 participants