-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Rust: Diff-informed queries: phase 3 (non-trivial locations) #20081
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enables diff-informed mode on Rust security queries that select non-trivial locations (locations other than dataflow source or sink). The change adds diff-informed query capabilities to improve incremental analysis performance by only analyzing code changes that are relevant to the query results.
- Adds
observeDiffInformedIncrementalMode()
predicate to enable diff-informed mode on 8 security queries - Implements custom location override in
AccessAfterLifetime.ql
to return the actual selected locations - Maintains existing query logic while adding incremental analysis optimization
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
AccessInvalidPointer.ql | Adds diff-informed mode enablement |
AccessAfterLifetime.ql | Adds diff-informed mode with custom location override for target variables |
UncontrolledAllocationSize.ql | Adds diff-informed mode enablement |
CleartextLogging.ql | Adds diff-informed mode enablement |
CleartextTransmission.ql | Adds diff-informed mode enablement |
SqlInjection.ql | Adds diff-informed mode enablement |
TaintedPath.ql | Adds diff-informed mode enablement |
RegexInjection.ql | Adds diff-informed mode enablement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than rust/access-after-lifetime-ended
these are very straightforward.
I have a couple of new Rust queries in progress, should I make similar changes to those before they are merged? And is there anything I should do to test if I do that (beyond letting CI run as usual)?
Location getASelectedSourceLocation(DataFlow::Node source) { | ||
exists(Variable target, DataFlow::Node sink | result = target.getLocation() | | ||
isSink(sink) and | ||
AccessAfterLifetime::dereferenceAfterLifetime(source, sink, target) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not check the flowPath
exists as well?
I guess it may not be required for correctness actually ... but dereferenceAfterLifetime
is bindingset[source, sink]
for performance reasons, I'm a little bit concerned that not checking the flowPath
here may greatly increase the amount we compute for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, we can't check flowPath
because that would lead to non-monotonic recursion (getASelected…Location
is negated somewhere in flowPath
).
DCA shows a slight (probably within noise range) increase in neon-empty-diff
, so it's possible this has a negative impact.
Maybe an alternative is to include the rest of the clauses from the where
, which might reduce the search space before calling dereferenceAfterLifetime
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the unsafe
block check in particular may narrow things down a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the updated commit, I've used the other clauses from the where
. Running the test for this query appears to be slightly faster:
BEFORE:
Completed in 2m16s (extract 1m16s comp 17.7s eval 34.8s).
Completed in 2m31s (extract 1m42s comp 3.6s eval 37.9s).
AFTER:
Completed in 2m19s (extract 1m22s comp 17.5s eval 28.9s).
Completed in 2m5s (extract 1m20s comp 9s eval 29.1s).
Sure, you can go ahead and enable diff-informed on new queries. Just make sure that they have qlref tests, because that's a prerequisite for the |
276effe
to
83fe9e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This PR enables diff-informed mode on queries that select a location other than dataflow source or sink. This entails adding a non-trivial location override that returns the locations that are actually selected.
Prior work includes PRs like #19663, #19759, and #19817. This PR uses the same patch script as those PRs to find candidate queries to convert to diff-enabled. This is the final step in mass-enabling diff-informed queries on all the languages.
Commit-by-commit reviewing is recommended.