Rust: update docs #19280

Open: wants to merge 25 commits into main

@redsun82 (Contributor) commented Apr 11, 2025

This should be kept unmerged until we get to the public preview phase.

@redsun82 changed the title from "Rust: start preparing documentation changes" to "Rust: update supported languages and frameworks" on Apr 11, 2025
@geoffw0 (Contributor) left a comment

A few more libraries we model (partial support, but then I don't think we can claim to fully model any libraries). Sorry I probably missed a few of these in the list I sent you before.

We also have models for the standard io library coming in #19304. I should update that PR once this is merged, or vice versa.

aibaars previously approved these changes Apr 15, 2025
geoffw0 previously approved these changes May 2, 2025
@github-actions bot added the Rust label on Jun 6, 2025
@github-actions bot removed the Rust label on Jun 10, 2025
@redsun82 changed the title from "Rust: update supported languages and frameworks" to "Rust: update docs" on Jun 10, 2025
@redsun82 redsun82 marked this pull request as ready for review June 12, 2025 15:35
@redsun82 redsun82 requested a review from a team as a code owner June 12, 2025 15:35
@redsun82 redsun82 requested a review from a team as a code owner June 13, 2025 11:03
@github-actions bot added the Rust label on Jun 13, 2025
@geoffw0 (Contributor) left a comment

The two docs + other changes look great to me!

Some minor suggestions below.

"CodeQL library for Rust" doesn't go into a lot of detail. I think that's OK at this stage, but it would be nice to have more example code in future.

I haven't checked the links, references to notes, and generally that everything fits together correctly - we really need a full preview for that and/or to be ready to correct any problems quickly after this goes live.

You should probably get a review from the docs team next?

    exists(Function f, CallExpr call, int index |
      call.getArg(index) = node.asExpr().getExpr() and
      call.getStaticTarget() = f and
      f.getParam(index).getPat().(IdentPat).getName().getText() = "password"
Contributor:

We should strongly consider adding helper predicates getParamName and getArgName to make code like this a bit cleaner?

Contributor Author:

Good idea! I already cleaned up getting by index (it was f.getParamList().getParam(index)), but a getParamByName seems a good addition too!

Contributor:

Yep, I noticed that. 👍

Comment on lines 42 to 43
Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program
to exploit it. Related to data flow, is the taint-tracking library, which finds how data can *influence* other values
Contributor:

I'm not clear what "it" is, and I think these few words are superfluous anyway.

Suggested change:
Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program. Related to data flow is the taint-tracking library, which finds how data can *influence* other values

Contributor:

This hasn't been addressed yet.

Contributor Author:

I was sure I did, sorry!


## Overview

<!-- autogenerated CWE coverage table will be added below -->
Contributor:

I'm appreciative of the automation here!

...
}
You can use the predicates ``exprNode`` and ``parameterNode`` to map from expressions and parameters to their data-flow node:
Contributor:

I actually don't think those predicates exist yet.

@redsun82 added the ready-for-doc-review label on Jun 17, 2025
@sunbrye left a comment

Reviewing on behalf of the docs team. I've added editorial suggestions, but aside from that, this looks good to me ✅

Local data flow
---------------

Local data flow tracks the flow of data within a single method or callable. Local data flow is easier, faster, and more precise than global data flow. Before looking at more complex tracking, you should always consider local tracking because it is sufficient for many queries.
Suggested change:
Local data flow tracks the flow of data within a single method or callable. Local data flow is easier, faster, and more precise than global data flow. Before using more complex tracking, consider local tracking, as it is sufficient for many queries.

...
}

Styling NIT: Remove the extra new line here.

Please feel free to ignore this comment if it should stay as is

Comment on lines +47 to +48
Note that since ``asExpr`` maps from data-flow to control-flow nodes, you then need to call the ``getExpr`` member predicate on the control-flow node to map to the corresponding AST node,
for example by writing ``node.asExpr().getExpr()``.
Suggested change:
Note that because ``asExpr`` maps from data-flow to control-flow nodes, you need to call the ``getExpr`` member predicate on the control-flow node to map to the corresponding AST node. For example, you can write ``node.asExpr().getExpr()``.

A control-flow graph considers every way control can flow through code; consequently, there can be multiple data-flow and control-flow nodes associated with a single expression node in the AST.

The predicate ``localFlowStep(Node nodeFrom, Node nodeTo)`` holds if there is an immediate data flow edge from the node ``nodeFrom`` to the node ``nodeTo``.
You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``.
Suggested change:
You can apply the predicate recursively by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localFlow``.


The local taint tracking library is in the module ``TaintTracking``.
Like local data flow, a predicate ``localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo)`` holds if there is an immediate taint propagation edge from the node ``nodeFrom`` to the node ``nodeTo``.
You can apply the predicate recursively, by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localTaint``.
Suggested change:
You can apply the predicate recursively by using the ``+`` and ``*`` operators, or you can use the predefined recursive predicate ``localTaint``.

Using local sources
~~~~~~~~~~~~~~~~~~~

When exploring local data flow or taint propagation between two expressions as above, you would normally constrain the expressions to be relevant to your investigation.
@sunbrye commented Jun 19, 2025

Suggested change:
When exploring local data flow or taint propagation between two expressions, such as in the previous example, you typically constrain the expressions to those relevant to your investigation.

Comment on lines +87 to +93
The next section gives some concrete examples, but first it's helpful to introduce the concept of a local source.

A local source is a data-flow node with no local data flow into it.
As such, it is a local origin of data flow, a place where a new value is created.
This includes parameters (which only receive values from global data flow) and most expressions (because they are not value-preserving).
The class ``LocalSourceNode`` represents data-flow nodes that are also local sources.
It comes with a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``.
Suggested change:
The next section provides concrete examples, but first introduces the concept of a local source.
A local source is a data-flow node with no local data flow into it.
It is a local origin of data flow, a place where a new value is created.
This includes parameters (which only receive values from global data flow) and most expressions (because they are not value-preserving).
The class ``LocalSourceNode`` represents data-flow nodes that are also local sources.
It includes a useful member predicate ``flowsTo(DataFlow::Node node)``, which holds if there is local data flow from the local source to ``node``.

codeql-library-for-rust
analyzing-data-flow-in-rust

- :doc:`CodeQL library for Rust <codeql-library-for-rust>`: When you're analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.
Suggested change:
- :doc:`CodeQL library for Rust <codeql-library-for-rust>`: When analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.

CodeQL library for Rust
=================================

When you're analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.
Suggested change:
When analyzing Rust code, you can make use of the large collection of classes in the CodeQL library for Rust.

Comment on lines +108 to +109
Unfortunately this will only give the expression in the argument, not the values which could be passed to it.
So we use local data flow to find all expressions that flow into the argument:
Suggested change:
Unfortunately, this only returns the expression used as the argument, not the possible values that could be passed to it. To address this, you can use local data flow to find all expressions that flow into the argument.

DataFlow::localFlow(source, sink)
select source, sink
We can vary the source, for example, making the source the parameter of a function rather than an expression. The following query finds where a parameter is used for the file creation:
Suggested change:
You can vary the source by making the source the parameter of a function instead of an expression. The following query finds where a parameter is used in file creation:
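
For illustration, the Rust sketch below shows the kind of code a parameter-to-file-creation query is meant to match. The query text itself is not quoted in this thread, so the function name `create_report` and the choice of `std::fs::File::create` as the file-creation call are assumptions made for the example.

```rust
use std::fs::File;
use std::io;

// Hypothetical function: `path` is a parameter, so it acts as the source.
fn create_report(path: &str) -> io::Result<File> {
    // Value-preserving step: `target` holds the same value as `path`.
    let target = path;
    // The argument of the file-creation call is the sink; local data flow
    // connects it back to the parameter `path`.
    File::create(target)
}

fn main() -> io::Result<()> {
    // Write into the system temp directory so the example has no lasting effect
    // on the working directory.
    let path = std::env::temp_dir().join("codeql_local_flow_example.txt");
    create_report(&path.to_string_lossy())?;
    Ok(())
}
```

Here the flow stays inside one function, which is why local data flow should be enough to relate the parameter to the call argument.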

Comment on lines +186 to +188
Global taint tracking is to global data flow what local taint tracking is to local data flow.
That is, global taint tracking extends global data flow with additional non-value-preserving steps.
The global taint tracking library uses the same configuration module as the global data flow library. You can perform taint flow analysis using ``TaintTracking::Global``:
Suggested change:
Global taint tracking relates to global data flow in the same way that local taint tracking relates to local data flow.
In other words, global taint tracking extends global data flow with additional non-value-preserving steps.
The global taint tracking library uses the same configuration module as the global data flow library. You can perform taint flow analysis using ``TaintTracking::Global``:

- Since this is a taint-tracking query, the ``TaintTracking::Global`` module is used.
- The ``isSource`` predicate defines sources as any ``StringLiteralExpr``.
- The ``isSink`` predicate defines sinks as arguments of a ``CallExpr`` that are passed to a parameter named "password".
- The sources and sinks may need tuning to a particular use, for example, if passwords are represented by a type other than ``String`` or passed in arguments of a different name than "password".
Suggested change:
- The sources and sinks may need to be adjusted for a particular use. For example, passwords might be represented by a type other than ``String`` or passed in arguments with a different name than "password".
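
As a rough sketch of what the bullets above describe, the following Rust program contains a string literal that reaches a parameter named `password`. The names `login` and `check` are hypothetical, and the comments describe what the query would be expected to report under the `isSource` and `isSink` definitions above, not verified analysis output.

```rust
// Hypothetical API: the second parameter is named `password`, which is what
// the `isSink` predicate keys on.
fn login(user: &str, password: &str) -> bool {
    // Toy comparison so that the example runs on its own.
    user == "admin" && password == "hunter2"
}

fn check() -> bool {
    // Source: a string literal (`StringLiteralExpr` in the AST).
    let secret = "hunter2";
    // Sink: `secret` is the argument bound to the parameter `password`.
    // The literal "admin" is bound to `user`, so it is not a sink.
    login("admin", secret)
}

fn main() {
    println!("login succeeded: {}", check());
}
```

If passwords were wrapped in a dedicated type, or the parameter had another name, neither the source nor the sink definition would match, which is the tuning the last bullet refers to.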

CodeQL ships with a library for analyzing Rust code. The classes in this library present the data from a CodeQL database in an object-oriented form and provide
abstractions and predicates to help you with common analysis tasks.

The library is implemented as a set of CodeQL modules, that is, files with the extension ``.qll``. The
Suggested change:
The library is implemented as a set of CodeQL modules, which are files with the extension ``.qll``. The

import rust
The CodeQL libraries model various aspects of Rust code. The above import includes the abstract syntax tree (AST) library, which is used for locating program elements
to match syntactic elements in the source code. This can be used for example to find values, patterns and structures.
Suggested change:
to match syntactic elements in the source code. This can be used to find values, patterns, and structures.

The control flow graph (CFG) is imported using
Suggested change:
The control flow graph (CFG) is imported using:

Comment on lines +31 to +33
The CFG models the control flow between statements and expressions, for example whether one expression can
be evaluated before another expression, or whether an expression "dominates" another one, meaning that all paths to an
expression must flow through another expression first.
Suggested change:
The CFG models the control flow between statements and expressions. For example, it can determine whether one expression can
be evaluated before another expression, or whether an expression "dominates" another one, meaning that all paths to an
expression must flow through another expression first.

The data flow library is imported using
Suggested change:
The data flow library is imported using:

Comment on lines +41 to +43
Data flow tracks the flow of data through the program, including through function calls (interprocedural data flow) and between steps in a job or workflow.
Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program. Related to data flow is the taint-tracking library,
which finds how data can *influence* other values in a program, even when it is not copied exactly.
Suggested change:
Data flow tracks the flow of data through the program, including across function calls (interprocedural data flow) and between steps in a job or workflow.
Data flow is particularly useful for security queries, where untrusted data flows to vulnerable parts of the program. The taint-tracking library is related to data flow,
and helps you find how data can *influence* other values in a program, even when it is not copied exactly.
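
To make the distinction between data flow and taint tracking concrete, here is a minimal Rust sketch. The function names are hypothetical, and whether a given analysis treats `format!` as a taint step depends on the library's models, which this thread does not confirm.

```rust
// Value-preserving: the returned reference is exactly the input value,
// so plain data flow is enough to connect `input` to the result.
fn pass_through(input: &str) -> &str {
    let alias = input;
    alias
}

// Not value-preserving: the result is a new string that merely contains the
// input. The input *influences* the result, which is the situation taint
// tracking is designed for.
fn build_query(input: &str) -> String {
    format!("SELECT * FROM users WHERE name = '{}'", input)
}

fn main() {
    let name = "bob";
    println!("{}", pass_through(name));
    println!("{}", build_query(name));
}
```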

Labels: documentation, ready-for-doc-review, Rust
5 participants