-
Notifications
You must be signed in to change notification settings - Fork 5
Major refactoring of array bounds code and some performance improvements #282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
// Insert into BFS cache. | ||
if (BFSCache.find(Start) == BFSCache.end()) { | ||
std::set<Data> ReachableNodes; | ||
ReachableNodes.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to call clear()
here? Isn't just creating the variable calling its constructor, which will ensure that it's empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Will remove it.
|
||
// Here x bounds will be c | ||
void foo(int *x, int c) { | ||
//CHECK: void foo(_Array_ptr<int> x : count(c), int c) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems odd that this should be c
since the index of 3
is not at all related to c
. I.e., the index is unconditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is because of the heuristic (which we are not disabling) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a heuristic. It is from all the incoming array pointers i.e., p
, q
. i.e., all the calls to this function pass array to x
and bounds to c
. We do not worry about the indexing x
as Checked C dynamic checks will take care of that.
x[3] = c; | ||
} | ||
|
||
// Here x bounds is c but the violates bounds. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh?
//CHECK: _Array_ptr<int> q1 : count(8) = malloc<int>(sizeof(int)*8); | ||
|
||
int n = 8; | ||
int l; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty weird, since l
is not initialized. Maybe make l
the parameter to bar
(this function) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will initialize l
to something. I just wanted to check Variation 3
|
||
// Variation 3 | ||
foo3(q2,l); | ||
foo3(q1,28); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about also a test where you pass in n
as the second parameter, ie. as if it was q
's length, even though it's associated with p
? The result should be that it's not treated as the length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Will add it.
@@ -32,7 +32,8 @@ const StructScope *StructScope::getStructScope(std::string StName) { | |||
if (AllStScopes.find(TmpS) == AllStScopes.end()) { | |||
AllStScopes.insert(TmpS); | |||
} | |||
return &(*AllStScopes.find(TmpS)); | |||
const auto &SS = *AllStScopes.find(TmpS); | |||
return &SS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning &x
where x
is a local variable is bad, isn't it? Same with what's below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a reference. So, returning &SS
where SS
is a reference is as good as returning a pointer.
@@ -10,13 +10,21 @@ | |||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you completely changed this file. I'll look at it from scratch. Can you say at a high level what it does (I think it does the propagation/inference process) and what its entry point is? Then I will start looking at it from that entry point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
clang/lib/CConv/AVarBoundsInfo.cpp
Outdated
// Flow inference from context sensitive keys to original keys. | ||
performWorkListInference(ArrNeededBounds, this->CtxSensProgVarGraph, ABI); | ||
// Now, by using potential bounds. | ||
performWorkListInference(ArrNeededBounds, this->CtxSensProgVarGraph, ABI, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering: Why do we need different graphs? Why not always use the context-sensitive graph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be better explained in a call. Reference: https://drive.google.com/file/d/1U4_OLmur0OPlr3iq4CF6GFARv7EK70dg/view?usp=sharing
@@ -1071,36 +1201,49 @@ bool AVarBoundsInfo::performFlowAnalysis(ProgramInfo *PI) { | |||
removeBounds(TBK, FlowInferred); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of keepHighestPriorityBounds
-- what makes a possible bounds high priority?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the priority order we use:
enum BoundsPriority { |
clang/lib/CConv/AVarBoundsInfo.cpp
Outdated
// Regular flow inference. | ||
performWorkListInference(ArrNeededBounds, this->ProgVarGraph, ABI); | ||
// Flow inference using potential bounds. | ||
performWorkListInference(ArrNeededBounds, this->ProgVarGraph, ABI, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "potential bound" (fromPB
) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the pull request:
For other variables that do not have any choice of bounds, we use potential bounds choices (FromPB
), these are the variables that are upper bounds to an index variable used in an array indexing operation.
For example:
if (i < n) {
...arr[i]...
}
In the above case, we use n
as a potential count bounds for arr.
Note: we only use potential bounds for a variable when none of its predecessors have bounds.
clang/lib/CConv/AVarBoundsInfo.cpp
Outdated
bool | ||
AvarBoundsInference::convergeInferredBounds() { | ||
bool FoundSome = false; | ||
for (auto &InfABnds : CurrIterInferBounds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does InfABnds
get set? I didn't notice it in the performXXX
functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets set in predictBounds
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run the tests on my Mac, I get two unexpected failures.
********************
Failing Tests (2):
Clang :: CheckedCRewriter/basic.c
Clang :: CheckedCRewriter/multivardecls.c
Expected Passes : 349
Expected Failures : 7
Unexpected Failures: 2
Are these unexpected failures to be expected? :)
I tried this example:
It incorrectly rewrites the |
Another question. For this example:
It produces this graph: Why are the edges bidirectional? We have bidirectional edges in the ptyp graph to make sense of the kind of pointer it is, but now that we've determined that type, I'm wondering why we don't have single-direction edges for lengths? |
As a consequence of the above, I notice that if we add the following line at the end of
then Maybe the reason they have to be bidrectional is somehow due to being able to mutate the variables? |
A bunch of other random questions:
|
Yes, this is a consequence of having bi-directional edges and we want the inferred bounds to be consistent. |
I wanted the inferred bounds to be consistent i.e., for a given array pointer We can do better if we have flow-sensitive bounds...which will be supported by Checked C eventually! |
This is because at call site Our inference looks for pointers that have bounds, here, in this above case, we only see |
Not expected. Will take a look. |
Looks like it is missing header files. Can you see the exact error you are seeing? you can use |
@jackastner found and fixed the For the
but on my machine I get
|
OK, merge when ready. My last request is that your comment at the top is very helpful in describing how things work. It would be useful to add it as comments in the code. For example, you could add a comment at near |
Major refactoring of array bounds code and some performance improvements
* Bogus warning suppressed * Fix to work on Mac * Update generated tests for whitespace changes * Fix #204 * Fix #205 FunctionDecl replacment now only replaces the return/parameters when they have changed. This alows macros to appear in these locations as long as they don't require rewritting. * Handle defines used as basetypes of checked pointers Just inline the define for now. * Remove text file * syncing with master * Refactoring GatherTool.cpp * Initial logging of root cause diagnostics * Add some comments to PVConstraint constructor * Nicer root cause reported for invalid cast * Correctly differentiate void* and vararg root causes * Undo modification to PersistentSourceLocation * Remove commented lines * Comment tweaks * Fixing issue 239 * Fix * Testing repeated runs * Fix regression in lua and ptr_dist * Actually fix lua this time * Initial fix for #255 * Add more const qualifiers * Move PersistentConstraint methods into ProgramInfo * Store persistent expression constraints in new map These were previously stored the Variables map that holds declaration constraint variables. * Variables map now associates declaration with a single constraint * Return option type for getVariable instead of nullptr * Remove commented out code that is now truly dead * Context sensitive array bounds * Fix RUN commands in test * Handle edge cases for PSL collision * Code cleanup and tweaks * Update comment in PersistentSourceLocation * Add a test case for parameter declarations in macro * Warnings * Another test case for WILDness of macros * Use my own option type for nicer behavior with abstract base class * Addressing comments * Adding refs * Move BaseType code into a static method * Split single declaration and multi-decl rewriting into separate methods * Initial implementation of new mutli-var-decl rewriting * Add tests for multi-declarations * Update main.yml Checking out correct commit hash of bear * Fixing running unit tests (They never ran) * Update main.yml * Update main.yml * Use doDeclRewrite for single declaration rewriting * Cleanup unneeded templating * Comments * Insert initializer for arrays of pointers * Update test for array initializer * Add EOF check in getNextCommaOrSemicolon * Remove dead method * Don't treat global vars / struct fields on same line as multi-decl * Comments and naming * Fix bug caused by lack of space between consecutive decls * Fixing issue #276 * Minor fixes * Changed error message * tidying * Fixing user-after-free in ProgramVar * Adding TODOs * Changing branch names from BigRefactor to master * Updating the links to correct computation org * Extract duplicated code into a new method * Major refactoring of array bounds code and some performance improvements * More 3C changes prior to 5C itype removal * Handling comments * Add infrastructure to handle separate 5C specific code * Add flags for 5C features guarded by ifdef * Changed artifacts retention days We will retain artifacts only for 3 days. * Adding ctx sensitive keys * Minor refactoring * Remove directories that contain nothing but old duplicate 3C tests. Part of #286. * Avoid generating extra atom on allocator function calls * constrainToWild only needs to constrain outermost pointer The inner pointers are constrained by implication constraints. This helps to improve the root cause analysis. * Fix bug in multi-declaration rewriting The set of declarations in a multi-decl was not explicitly ordered by position in multi-declaration. This meant declaration could sometimes be rewritten out of order, potentially crashing 3c. * Adding stats * Include ExprConstraintVars in pointer source location map * Add constraints to wild when PtrType solver fails Previously the effected atoms were assigned to in the solution, without updating the constraint graph. This change means the update solution is reflected in the constraint graph and root cause analysis. * Output diagnostics for constraints not mapped to declaration Constraints on expressions such as c-style casts are now output in root cause of wildness diagnostics. * Add source locations to constraints for better root cause reporting * Improve root cause diagnostic for invalid casts Invalid casts between pointers with different base types will now only produce one warning. * Add test file for root cause of wildness * More root cause tests * Add reason for unsafe allocator calls * Add WILD counts to output * Implement per-pointer statistic output Old methods are renamed to reflect that they works with atoms instead of pointers * Fix to propagate wildness cause between levels of pointers * Cleanup computeIterimConstraintState * Avoid using WILD as an atom inside constraint variables This was messing up the BFS in root cause analysis * Fix bug when root cause is on parameter of checked function pointer * Comments; renaming. * Update warning message * Remove unused conflicts set in returns/params * Remove some duplicated code pertaining to root cause source locations * Add method for obtaining constrained VarAtom * Avoid allocating empty sets in ImpMap * Fix root cause reporting when cause is on function pointer parameter * Add comments * Rename method to getFreshGEQ * Fix for issue #293 Instead of considering a pointer "originally checked" if any one of the pointer levels in it is checked in the source, characterize each pointer level individually. * Add tests for converting paritaly checked pointers * Test for partially checked function pointer * Added comments * FunctionVariableConstraint return and param vars changed to PVConstraints * Handle partialy checked extern functions * Add tests for partialy checked array pointers * Revert "Merge pull request #282 from correctcomputation/iss281" This reverts commit c554433, reversing changes made to 0f8d6dc. * Fixing array legth inference performance issue * Revert "Fixing array length inference performance issue" * Merge pull request #282 from correctcomputation/iss281 Major refactoring of array bounds code and some performance improvements * Fixing array legth inference performance issue * Temporarily add the rename scripts for archival. * Fix typos and inconsistencies in old names to let automated replacements work. * git mv 'clang/docs/checkedc/CheckedCConvert.md' 'clang/docs/checkedc/3C.md' * git mv 'clang/tools/cconv-standalone' 'clang/tools/3c' * git mv 'clang/tools/3c/utils/cc_conv' 'clang/tools/3c/utils/port_tools' * git mv 'clang/include/clang/CConv/CCGlobalOptions.h' 'clang/include/clang/CConv/3CGlobalOptions.h' * git mv 'clang/test/CheckedCRewriter' 'clang/test/3C' * repl_basename_pfx 'CConvert' '3C' '^clang' * repl_basename_pfx 'CConv' '3C' '^clang' * repl '\<cconv-standalone\>' '3c' * repl '\<clang/tools/3c/utils/cc_conv\>' 'clang/tools/3c/utils/port_tools' '^\.github/workflows/main\.yml$' * repl '\<CCGlobalOptions\>' '3CGlobalOptions' * repl '\<CheckedCRewriter\>' '3C' * repl '\<CConv(ert)?' '3C' '/CMakeLists\.txt$' * repl '#include "clang/CConv/' '#include "clang/3C/' '\.(h|cpp)$' * repl '(#include.*["/])CConv(ert)?' '\13C' '\.(h|cpp)$' * repl '//=--CConvert([^-]*-)' '//=--3C\1------' * repl '//=--CConv([^-]*-)' '//=--3C\1---' * repl '\<LLVM_CLANG_TOOLS_EXTRA_CLANGD_CCONVERT' 'LLVM_CLANG_TOOLS_EXTRA_CLANGD_3C' '\.(h|cpp)$' * repl '\<_CCGLOBALOPTIONS_H\>' '_3CGLOBALOPTIONS_H' '\.(h|cpp)$' * repl '\<_CCONV([A-Z]*)_H\>' '_3C\1_H' '\.(h|cpp)$' * repl '\<cconv(build|scripts)\>' '3c-\1' '^\.github/workflows/main\.yml$' * repl '\<CCInterface\>' '_3CInterface' '\.(h|cpp)$' * repl '\<cconv(CollectAndBuildInitialConstraints|CloseDocument)\>' '_3C\1' '\.(h|cpp)$' * repl '\<CConv(Interface|Inter|Main|Sec|DiagInfo|LSPCallBack)\>' '_3C\1' '\.(h|cpp)$' * repl '\<CCONVSOURCE\>' '_3CSOURCE' '\.(h|cpp)$' * repl '\<(Command|ExecuteCommandParams)::CCONV_' '\1::_3C_' '\.(h|cpp)$' * repl '\<CCONV_' '_3C_' '^clang-tools-extra/clangd/Protocol\.h$' * repl '\<ConvertCategory\>' '_3CCategory' '\.(h|cpp)$' * repl '\<CConvert(Options|ManualFix|Diagnostics)\>' '_3C\1' '\.(h|cpp)$' * repl '\<ccConvertManualFix\>' '_3CManualFix' '\.(h|cpp)$' * repl '\<ccConvResultsReady\>' '_3CResultsReady' '\.(h|cpp)$' * repl '\<CheckedCConverter\>' '_3C' * repl '\<executeCConvCommand\>' 'execute3CCommand' '\.(h|cpp)$' * repl '\<sendCConvMessage\>' 'send3CMessage' '\.(h|cpp)$' * repl '\<IsCConvCommand\>' 'Is3CCommand' '\.(h|cpp)$' * repl '\<(clear|report)CConvDiagsForAllFiles\>' '\13CDiagsForAllFiles' '\.(h|cpp)$' * repl '\<INTERACTIVECCCONV\>' 'INTERACTIVE3C' * repl '"CConv_(RealWild|AffWild)"' '"3C_\1"' '\.(h|cpp)$' * repl '"cconv\.(onlyThisPtr|applyAllPtr)"' '"3c.\1"' '\.(h|cpp)$' * repl '\("CConv(:)? ' '("3C\1 ' '\.(h|cpp)$' * repl '\<runCheckedCConvert\>' 'run3C' * repl '\<icconv\>' 'clangd3c' * repl '\<checked_c_convert_bin\>' '_3c_bin' * repl '\<checked-c-convert\>' '3c' * repl 'Checked C rewriter tool' '3C' '^clang/test/3C/' * repl '\<cconvClangDaemon\>' '3cClangDaemon' '/CMakeLists\.txt$' * repl '\<[Cc]?[Cc][Cc]onv\>' '3C' '(^|/)3[Cc]|^clang-tools-extra/clangd|^clang/include/clang' * repl 'fcheckedc_convert_tool' 'f3c_tool' * repl 'fcheckedc-convert-tool' 'f3c-tool' * Adding more flags * Text alignment fixes. * Remaining edits that aren't worth automating. * Delete the rename scripts again. * Removing CheckedCRewriter added due to merging * Delete the actions file from the correctcomputation/checkedc-clang repo. We need to do this before sending the PR to Microsoft (see #302). The actions are currently broken for several reasons and we cannot afford to wait to get them working in the new repository first. * Disable clangd3c build target, which is unmaintained and broken. * Rename the clang/lib/3C target to clang3C, following convention. This neatly resolves the confusing near-collision between the executable target (renamed from cconv-standalone to 3c in #299) and the library target (renamed from CConv to 3C). * Rename two identifiers missed in #299. Specifically, AsCCCommands and ExecuteCCCommand. These are used only in clangd3c, which we're about to disable, but the intent of #299 was to process the whole source tree. These do not appear in the currently existing 5C-specific code. This change intentionally does not address: - Local variables (except one as requested by Mike). They aren't important and their naming varies too much to make it easy to find all of them. - Name collisions between fields or local variables and their types. In my tests so far, a field collision causes a compile error, while a local variable collision doesn't. I saw the clangd3c build fail due to field collisions (we know there are none in 3c because it builds), and this led me to discover some local variable collisions in both clangd3c and 3c. But we don't care about clangd3c build failures right now, and there isn't an easy way to find /all/ local variable collisions, so I don't see a strong motivation to fix the few in 3c that I happen to know about. To make sure I didn't miss anything else, I conducted a more thorough search than before for "cc" or "c.c" in the diff from LLVM master. Some common false positives could be excluded, but I thought that would be more work than just reviewing them. RANGE=$(git merge-base llvm/master origin/master)..origin/master PAT='cc|c[^<(/.a-z]c' git diff --name-only $RANGE >names grep -Ei "$PAT" --color=always names | less -R while read f; do git diff --line-prefix="$f:" $RANGE -- $f; done <names >diffs-with-filenames GREP_COLORS=cx=37:mt= grep --color=always -B3 -A3 -Ei ":[-+].*($PAT)" diffs-with-filenames | less -R (Then use the case-insensitive "less" search command on $PAT.) * update-includes.py: Find the default header location relative to update-includes.py itself. Bonus: Make update-includes.py directly executable. * Add new 3C readmes and a link from the root readme. * Fix some typos and unintended non-ASCII characters in the new readmes. * Delete more tests that are obsolete (according to Aravind). This leaves us with the "regression tests" in clang/test/3C (which are documented in clang/docs/checkedc/3C/CONTRIBUTING.md) as the only tests of 3C in this source tree. We have some external tests that we plan to document when they are in good enough shape to be run by the public. * New readmes: Replace two spaces at the end of a sentence with one. I'm in the minority, and the inconsistency is bound to become distracting. Also, there's no good way for an automated rewrapping tool to know whether a period at the end of a line should get one space or two when it gets moved. (Emacs solves this by not breaking after a period followed by a single space, but the irregular breaking looks terrible to me.) * Wrap 3C Markdown text to 80 characters as requested by Microsoft. * Add 3C .clang-format files. * Remove a semicolon after the body of ~FunctionVariableConstraint that confused clang-format. * clang-format all files covered by the new 3C .clang-format files. * clang-format 3C files in clangd. The absence of "ReflowComments: false" (which we use for the 3C directories) made no difference in these files. * Format 3C code in non-3C clangd files. In essence, I ran clang-format and kept only the changes to 3C code, except that I did fix two problems in non-3C code that had clearly been introduced by 3C development. * Fix long lines in comments in 3C C++ code. * Add a period at the end of a comment as requested by Microsoft. * clang/lib/3C/*.cpp: Move the #include of the main module header above other #includes of 3C headers. This partially applies item 1 from https://llvm.org/docs/CodingStandards.html#include-style in order to catch missing dependencies among headers. Full conformance to this guideline will come later. * Fix broken dependency of AVarGraph.h on a declaration of AVarBoundsInfo as an incomplete type, exposed by the previous commit. I chose to do so by including ABounds.h since that's what AVarBoundsInfo.h does, though simply declaring AVarBoundsInfo might have worked. * Add missing final newlines to all 3C files. * Fix nits in 3C .clang-format files. * Add 3C .clang-tidy files and documentation. * Add all readability-identifier-naming suppressions for the _3C prefix. This and the upcoming bulk clang-tidy work will include the unmaintained clangd3c code because it seems that a modest amount of additional work can avoid adding many more breakages on top of the ones it already has (mostly from the upcoming bulk rename). * Finish applying clang-tidy to 3C (for now). - Update clang/docs/checkedc/3C/clang-tidy.md with much more information about the process. Note that some warnings remain that we haven't yet decided what to do about. - Disable misc-no-recursion rule for now because it looks like we make significant intentional use of recursion. - Remove blank lines between #include lines so that the `llvm-include-order` check will sort all the lines. Minor tweaks to nearby formatting. - Run `clang-tidy --fix` multiple times, then `clang-format`, then fix up formatting mistakes from `llvm-else-after-return` as described in clang-tidy.md. - Manually rename some elements where we're not happy with what the default `readability-identifier-naming` fix did. - Get the clangd3c target to build and enable it again! After clang-tidy fixed some broken references without being asked, there was only one real error left, which was straightforward to fix. We still think clangd3c doesn't work, but as long as it builds, it will be easier for us to include it in bulk code edits like this one if we want to. * Remove unused local variables and private fields. * 3C CONTRIBUTING.md: Add link to more information about 5C. Co-authored-by: Michael Hicks <[email protected]> Co-authored-by: John Kastner <[email protected]> Co-authored-by: Shilpa Roy <[email protected]> Co-authored-by: Aravind Machiry <[email protected]> Co-authored-by: Aaron Eline <[email protected]> Co-authored-by: John Kastner <[email protected]> Co-authored-by: John Kastner <[email protected]> Co-authored-by: Michael Hicks <[email protected]> Co-authored-by: Machiry Aravind Kumar <[email protected]>
Implemented the technique similar to the locksmith, where the bounds of a variable is the intersection of the bounds of all array variables that flows into the variable.
Added a flag to disable array bounds heuristics:
-disable-arr-hu
(By default, they are enabled but we can use this flag to disable them).We used same node for all ocurrences of constant numbers, for example, for the below code:
We would construct our graph as:
n <-> 8 <-> l
, this wrongly leads us to believe thatn
is reachable (i.e., flows) intol
. Which is not true.We changed the code to create a new node for every occurrence of constants and thereby avoid this.
Added cache to breadth-first search, realized that we call BFS on the same node many times and it is better to cache this information (provided that the graph didn't change).
This also fixes: Should pick constant bound, but chooses parameter #281
Array bounds inference:
Every variable (both pointers and scalars) are identified by a unique number i.e.,
BoundsKey
, There is a mapping from eachBoundsKey
to the correspondingProgramVar*
which contains information about program scope, variable name, etc.The main entry point in this class is:
bool AVarBoundsInfo::performFlowAnalysis(ProgramInfo *PI)
: This as the name suggests performs flow (i.e.,LockSmith
based) analysis to determine the bounds of all the array variables.It has a fixed point loop, at each iteration, we try to find the possible bounds for all array variables (that do not have bounds) from its predecessors using the following function:
where
K
is the array variable whose bounds need to be computed andNeighbours
are the predecessors andBKGraph
is the graph that contains all the flow information. The possible bounds forK
will be stored intoCurrIterInferBounds
.Later we call
convergeInferredBounds
that will pick the best bounds from the possible bound values, with the following preference:count
will be the priority overbyte
bounds. Variables will be given priority over constants. If there are multiple choices (i.e., variables or constants) we give up.We first propagate all the bounds information from explicit declarations and
mallocs
.For other variables that do not have any choice of bounds, we use potential bounds choices (
FromPB
), these are the variables that are upper bounds to an index variable used in an array indexing operation.For example:
In the above case, we use
n
as a potential count bounds forarr
.Note: we only use potential bounds for a variable when none of its predecessors have bounds.
Function of interest:
The function:
getReachableBoundKeys
finds all the BoundsKeys (i.e., variables) in scopeDstScope
that are reachable fromFromVarK
in the graphBKGraph
. All the reachable bounds key will be stored inPotK
.