Rust: update docs #19280
A few more libraries we model (partial support, but then I don't think we can claim to fully model any libraries). Sorry, I probably missed a few of these in the list I sent you before.
We also have models for the standard io library coming in #19304; I should update that PR once this is merged, or vice versa.
Co-authored-by: Geoffrey White <[email protected]>
The two docs + other changes look great to me!
Some minor suggestions below.
"CodeQL library for Rust" doesn't go into a lot of detail. I think that's OK at this stage, but it would be nice to have more example code in future.
I haven't checked the links, references to notes, and generally that everything fits together correctly - we really need a full preview for that and/or to be ready to correct any problems quickly after this goes live.
You should probably get a review from the docs team next?
docs/codeql/codeql-language-guides/analyzing-data-flow-in-rust.rst
exists(Function f, CallExpr call, int index |
  call.getArg(index) = node.asExpr().getExpr() and
  call.getStaticTarget() = f and
  f.getParam(index).getPat().(IdentPat).getName().getText() = "password"
We should strongly consider adding helper predicates `getParamName` and `getArgName` to make code like this a bit cleaner?
Good idea! I already cleaned up getting by index (it was `f.getParamList().getParam(index)`), but a `getParamByName` seems a good addition too!
Yep, I noticed that. 👍
Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program
to exploit it. Related to data flow, is the taint-tracking library, which finds how data can *influence* other values
I'm not clear what "it" is, and I think these few words are superfluous anyway.
Suggested change:
- Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program
- to exploit it. Related to data flow, is the taint-tracking library, which finds how data can *influence* other values
+ Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program. Related to data flow is the taint-tracking library, which finds how data can *influence* other values
This hasn't been addressed yet.
I was sure I did, sorry!
## Overview

<!-- autogenerated CWE coverage table will be added below -->
I'm appreciative of the automation here!
...
}
You can use the predicates ``exprNode`` and ``parameterNode`` to map from expressions and parameters to their data-flow node:
I actually don't think those predicates exist yet.
Reviewing on behalf of the docs team. I've added editorial suggestions, but aside from that, this looks good to me ✅
Local data flow
---------------

Local data flow tracks the flow of data within a single method or callable. Local data flow is easier, faster, and more precise than global data flow. Before looking at more complex tracking, you should always consider local tracking because it is sufficient for many queries.
Suggested change:
- Local data flow tracks the flow of data within a single method or callable. Local data flow is easier, faster, and more precise than global data flow. Before looking at more complex tracking, you should always consider local tracking because it is sufficient for many queries.
+ Local data flow tracks the flow of data within a single method or callable. Local data flow is easier, faster, and more precise than global data flow. Before using more complex tracking, consider local tracking, as it is sufficient for many queries.
...
}
Styling nit: remove the extra new line here.
Please feel free to ignore this comment if it should stay as is.
Note that since ``asExpr`` maps from data-flow to control-flow nodes, you then need to call the ``getExpr`` member predicate on the control-flow node to map to the corresponding AST node,
for example by writing ``node.asExpr().getExpr()``.
Suggested change:
- Note that since ``asExpr`` maps from data-flow to control-flow nodes, you then need to call the ``getExpr`` member predicate on the control-flow node to map to the corresponding AST node,
- for example by writing ``node.asExpr().getExpr()``.
+ Note that because ``asExpr`` maps from data-flow to control-flow nodes, you need to call the ``getExpr`` member predicate on the control-flow node to map to the corresponding AST node. For example, you can write ``node.asExpr().getExpr()``.
A control-flow graph considers every way control can flow through code, consequently, there can be multiple data-flow and control-flow nodes associated with a single expression node in the AST.

The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``.
You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``.
Suggested change:
- You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``.
+ You can apply the predicate recursively by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``.

The local taint tracking library is in the module ``TaintTracking``.
Like local data flow, a predicate ``localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)`` holds if there is an immediate taint propagation edge from the node ``nodeFrom`` to the node ``nodeTo``.
You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localTaint``.
Suggested change:
- You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localTaint``.
+ You can apply the predicate recursively by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localTaint``.
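
For readers of this thread, the difference between a local data flow step and a local taint step may be easier to see on the analyzed Rust code itself. The following is a minimal sketch, not taken from the PR; the function names `pass_through` and `decorate` are hypothetical, and the comments describe what the two analyses would be expected to report under the definitions quoted above, not verified query results.

```rust
/// Value-preserving flow: `input` reaches the return value unchanged,
/// so each assignment below corresponds to a local data flow step.
/// (Hypothetical function, for illustration only.)
fn pass_through(input: String) -> String {
    let copy = input; // same value, moved into `copy`
    let alias = copy; // same value again, moved into `alias`
    alias
}

/// Taint-only flow: the result is influenced by `input` but is a new value.
/// Assumption: formatting is treated as a taint step, not a data flow step.
fn decorate(input: &str) -> String {
    // The returned string is not a copy of `input`, so plain data flow
    // would not be expected to connect them, while taint tracking would.
    format!("Hello, {}!", input)
}
```

In `pass_through`, following data flow steps transitively links the parameter to the returned value; in `decorate`, only the taint relation would be expected to link them.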
Using local sources
~~~~~~~~~~~~~~~~~~~

When exploring local data flow or taint propagation between two expressions as above, you would normally constrain the expressions to be relevant to your investigation.
Suggested change:
- When exploring local data flow or taint propagation between two expressions as above, you would normally constrain the expressions to be relevant to your investigation.
+ When exploring local data flow or taint propagation between two expressions, such as in the previous example, you typically constrain the expressions to those relevant to your investigation.
The next section gives some concrete examples, but first it's helpful to introduce the concept of a local source.

A local source is a data-flow node with no local data flow into it.
As such, it is a local origin of data flow, a place where a new value is created.
This includes parameters (which only receive values from global data flow) and most expressions (because they are not value-preserving).
The class ``LocalSourceNode`` represents data-flow nodes that are also local sources.
It comes with a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``.
Suggested change:
- The next section gives some concrete examples, but first it's helpful to introduce the concept of a local source.
- A local source is a data-flow node with no local data flow into it.
- As such, it is a local origin of data flow, a place where a new value is created.
- This includes parameters (which only receive values from global data flow) and most expressions (because they are not value-preserving).
- The class ``LocalSourceNode`` represents data-flow nodes that are also local sources.
- It comes with a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``.
+ The next section provides concrete examples, but first introduces the concept of a local source.
+ A local source is a data-flow node with no local data flow into it.
+ It is a local origin of data flow, a place where a new value is created.
+ This includes parameters (which only receive values from global data flow) and most expressions (because they are not value-preserving).
+ The class ``LocalSourceNode`` represents data-flow nodes that are also local sources.
+ It includes a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``.
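
As a rough illustration of the local source definition quoted in this suggestion, the Rust sketch below marks which expressions would plausibly count as local sources. The function `shout` is hypothetical and the annotations are a reading of the definition above, not output from the library.

```rust
/// Hypothetical function used only to annotate local sources.
fn shout(name: &str) -> String {
    // `name` is a parameter: it only receives values through calls
    // (global flow), so it would be a local source.
    let upper = name.to_uppercase();
    // The call result is a newly created value, so it would also be a
    // local source; it is not a value-preserving copy of `name`.
    let alias = upper;
    // `alias` holds the same value as `upper`: this is local data flow
    // from an existing source, so `alias` would not be a local source.
    alias
}
```

Under that reading, the call result would be expected to flow to the use of `alias` on the last line, while the parameter `name` would flow only to the receiver of `to_uppercase`.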
codeql-library-for-rust
analyzing-data-flow-in-rust

- :doc:`CodeQL library for Rust <codeql-library-for-rust>`: When you're analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.
Suggested change:
- - :doc:`CodeQL library for Rust <codeql-library-for-rust>`: When you're analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.
+ - :doc:`CodeQL library for Rust <codeql-library-for-rust>`: When analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.
CodeQL library for Rust
=================================

When you're analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.
Suggested change:
- When you're analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.
+ When analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.
Unfortunately this will only give the expression in the argument, not the values which could be passed to it.
So we use local data flow to find all expressions that flow into the argument:
Suggested change:
- Unfortunately this will only give the expression in the argument, not the values which could be passed to it.
- So we use local data flow to find all expressions that flow into the argument:
+ Unfortunately, this only returns the expression used as the argument, not the possible values that could be passed to it. To address this, you can use local data flow to find all expressions that flow into the argument.
DataFlow::localFlow(source, sink)
select source, sink
We can vary the source, for example, making the source the parameter of a function rather than an expression. The following query finds where a parameter is used for the file creation:
Suggested change:
- We can vary the source, for example, making the source the parameter of a function rather than an expression. The following query finds where a parameter is used for the file creation:
+ You can vary the source by making the source the parameter of a function instead of an expression. The following query finds where a parameter is used in file creation:
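
To give the two file-creation queries something concrete to point at, here is a small Rust sketch of the code shapes they would be expected to match. This is an illustration only: the excerpt does not say which file-creation function the docs use, so `std::fs::File::create` is an assumption, and the function names and file names are hypothetical.

```rust
use std::fs::File;
use std::io;
use std::path::PathBuf;

/// Expression as source: the argument expression is `target`, but the
/// value originates two lines earlier and reaches the call by local flow.
fn create_log() -> io::Result<File> {
    let path: PathBuf = std::env::temp_dir().join("codeql_example.log"); // origin of the value
    let target = path; // local data flow step
    File::create(target) // sink: first argument of the file-creating call
}

/// Parameter as source: the parameter `path` flows directly into the call.
fn create_at(path: PathBuf) -> io::Result<File> {
    File::create(path)
}
```

Matching only on the argument expression would report `target` in `create_log`; following local flow backwards is what would reach the `join` call that actually builds the path.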
Global taint tracking is to global data flow what local taint tracking is to local data flow.
That is, global taint tracking extends global data flow with additional non-value-preserving steps.
The global taint tracking library uses the same configuration module as the global data flow library. You can perform taint flow analysis using ``TaintTracking::Global``:
Suggested change:
- Global taint tracking is to global data flow what local taint tracking is to local data flow.
- That is, global taint tracking extends global data flow with additional non-value-preserving steps.
- The global taint tracking library uses the same configuration module as the global data flow library. You can perform taint flow analysis using ``TaintTracking::Global``:
+ Global taint tracking relates to global data flow in the same way that local taint tracking relates to local data flow.
+ In other words, global taint tracking extends global data flow with additional non-value-preserving steps.
+ The global taint tracking library uses the same configuration module as the global data flow library. You can perform taint flow analysis using ``TaintTracking::Global``:
- Since this is a taint-tracking query, the ``TaintTracking::Global`` module is used.
- The ``isSource`` predicate defines sources as any ``StringLiteralExpr``.
- The ``isSink`` predicate defines sinks as arguments to a ``CallExpr`` called "password".
- The sources and sinks may need tuning to a particular use, for example, if passwords are represented by a type other than ``String`` or passed in arguments of a different name than "password".
Suggested change:
- - The sources and sinks may need tuning to a particular use, for example, if passwords are represented by a type other than ``String`` or passed in arguments of a different name than "password".
+ - The sources and sinks may need to be adjusted for a particular use. For example, passwords might be represented by a type other than ``String`` or passed in arguments with a different name than "password".
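
For context on the source and sink definitions in the bullets above, the following Rust sketch shows the kind of code the constant-password query appears intended to flag. It is illustrative only: `connect` and both callers are hypothetical, and the comments state what the query would be expected to report given the `isSource` and `isSink` descriptions, not confirmed results.

```rust
/// Hypothetical API. The sink condition matches on the parameter name
/// `password`, so the second argument of any call to this function is a sink.
fn connect(user: &str, password: &str) -> bool {
    !user.is_empty() && !password.is_empty()
}

/// Direct case: a string literal (source) is passed as `password` (sink).
/// The literal "admin" is also a source, but it flows to `user`, not a sink.
fn login_with_constant() -> bool {
    connect("admin", "hunter2")
}

/// Indirect case: the password is derived from literals by formatting.
/// Assumption: this is a taint step, so taint tracking would be needed
/// to connect the literals to the `password` argument.
fn login_with_derived() -> bool {
    let secret = format!("{}-{}", "hunter2", 2024);
    connect("admin", &secret)
}
```

Renaming the parameter to something like `pwd` would be expected to hide both calls from the query, which is the tuning concern raised in the last bullet.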
CodeQL ships with a library for analyzing Rust code. The classes in this library present the data from a CodeQL database in an object-oriented form and provide
abstractions and predicates to help you with common analysis tasks.

The library is implemented as a set of CodeQL modules, that is, files with the extension ``.qll``. The
Suggested change:
- The library is implemented as a set of CodeQL modules, that is, files with the extension ``.qll``. The
+ The library is implemented as a set of CodeQL modules, which are files with the extension ``.qll``. The
import rust
The CodeQL libraries model various aspects of Rust code. The above import includes the abstract syntax tree (AST) library, which is used for locating program elements
to match syntactic elements in the source code. This can be used for example to find values, patterns and structures.
Suggested change:
- to match syntactic elements in the source code. This can be used for example to find values, patterns and structures.
+ to match syntactic elements in the source code. This can be used to find values, patterns, and structures.
The CodeQL libraries model various aspects of Rust code. The above import includes the abstract syntax tree (AST) library, which is used for locating program elements
to match syntactic elements in the source code. This can be used for example to find values, patterns and structures.

The control flow graph (CFG) is imported using
Suggested change:
- The control flow graph (CFG) is imported using
+ The control flow graph (CFG) is imported using:
The CFG models the control flow between statements and expressions, for example whether one expression can
be evaluated before another expression, or whether an expression "dominates" another one, meaning that all paths to an
expression must flow through another expression first.
Suggested change:
- The CFG models the control flow between statements and expressions, for example whether one expression can
- be evaluated before another expression, or whether an expression "dominates" another one, meaning that all paths to an
- expression must flow through another expression first.
+ The CFG models the control flow between statements and expressions. For example, it can determine whether one expression can
+ be evaluated before another expression, or whether an expression "dominates" another one, meaning that all paths to an
+ expression must flow through another expression first.
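
The notion of dominance in this paragraph can be illustrated on a small piece of Rust. The sketch below is not from the PR; `describe` is a hypothetical function and the numbered comments mark which expressions dominate which, following the definition quoted above.

```rust
/// Hypothetical function annotated with dominance relationships.
fn describe(n: i32) -> String {
    let checked = n.abs(); // (1) evaluated on every path: dominates (2), (3) and (4)
    let label = if checked > 100 {
        "large" // (2) only reached on the `true` branch
    } else {
        "small" // (3) only reached on the `false` branch
    };
    // (4) is reachable through either branch, so neither (2) nor (3)
    // dominates it, but (1) does.
    format!("{} is {}", checked, label)
}
```

A query that needs a check to have happened before a use would typically ask for this relationship: (1) qualifies as a guard for (4), while (2) and (3) do not.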
be evaluated before another expression, or whether an expression "dominates" another one, meaning that all paths to an
expression must flow through another expression first.

The data flow library is imported using
Suggested change:
- The data flow library is imported using
+ The data flow library is imported using:
Data flow tracks the flow of data through the program, including through function calls (interprocedural data flow) and between steps in a job or workflow.
Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program. Related to data flow is the taint-tracking library,
which finds how data can *influence* other values in a program, even when it is not copied exactly.
Suggested change:
- Data flow tracks the flow of data through the program, including through function calls (interprocedural data flow) and between steps in a job or workflow.
- Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program. Related to data flow is the taint-tracking library,
- which finds how data can *influence* other values in a program, even when it is not copied exactly.
+ Data flow tracks the flow of data through the program, including across function calls (interprocedural data flow) and between steps in a job or workflow.
+ Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program. The taint-tracking library is related to data flow,
+ and helps you find how data can *influence* other values in a program, even when it is not copied exactly.
This should be kept unmerged until we get to the public preview phase.