
Apply rules to annotate candidates (in addition to the ML part of the filter-rank module) #150


Closed
Riruk opened this issue May 11, 2021 · 12 comments


@Riruk
Collaborator

Riruk commented May 11, 2021

We could use hard-coded rules to identify commits that are relevant to security fixes.

For example, if a commit message states that the commit fixes a particular CVE, that commit is a very strong candidate for the vulnerability fix.

I will create a PR for the implementation, and I propose using this issue to discuss which hard-coded rules we could create.
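
A minimal sketch of the CVE rule described above, for discussion only. The function name `references_cve` and the regular expression are assumptions made for illustration; they are not taken from the prospector code base or from the PR.

```python
import re

# Assumed pattern for CVE identifiers such as CVE-2021-12345.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def references_cve(commit_message: str, cve_id: str) -> bool:
    """Return True if the commit message mentions the given CVE identifier.

    Hypothetical helper: the comparison is case-insensitive and requires
    the full identifier to match, not just a prefix of it.
    """
    mentioned = {match.upper() for match in CVE_PATTERN.findall(commit_message)}
    return cve_id.upper() in mentioned
```

A rule like this would only flag the candidate; whether a match should short-circuit the ML ranking or just raise the score is left open here.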

@Riruk
Collaborator Author

Riruk commented May 11, 2021

The initial work is done in the context of the following pull request: #149

@copernico
Contributor

Another rule could be something like:

  • the advisory mentions a file (or class), and a given candidate is the only one (or one of the few) that touches that file (or class)
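
One possible way to express this rule, as a sketch. It assumes the changed files of each candidate are already available as a mapping from commit id to paths; the function name, the `max_candidates` threshold, and the matching on file name or file stem are all hypothetical choices.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def candidates_touching_file(
    mentioned: str,
    changed_files: Dict[str, List[str]],
    max_candidates: int = 3,
) -> List[str]:
    """Return the commits that touch the file (or class) named in the advisory.

    The result is empty when no commit matches or when more than
    `max_candidates` commits match, since the rule is only meaningful
    if the candidate is the only one, or one of the few.
    """
    name = mentioned.lower()
    hits = []
    for commit_id, paths in changed_files.items():
        for path in paths:
            pure = PurePosixPath(path)
            # Match either the full file name ("User.java") or the stem ("User").
            if name in (pure.name.lower(), pure.stem.lower()):
                hits.append(commit_id)
                break
    return hits if 0 < len(hits) <= max_candidates else []
```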

@copernico changed the title from "Use hard-coded rules before actual ML part of the filter-rank module" to "Use hard-coded rules before (or in addition to) the ML part of the filter-rank module" on May 11, 2021
@copernico
Contributor

Another obvious one:

  • the Advisory mentions the commit at hand

NOTE: this is not conclusive per se (some advisories point to commits that only contain changelog changes, not the actual code fixes).
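
A sketch of how this check could look, assuming the advisory is available as plain text and that commits are referenced by full or abbreviated hexadecimal SHA. The names below are hypothetical, and, as noted above, a match should be treated as a hint rather than proof.

```python
import re

# Assumed shape of a commit reference: 7 to 40 hexadecimal characters.
SHA_PATTERN = re.compile(r"\b[0-9a-f]{7,40}\b", re.IGNORECASE)


def advisory_mentions_commit(advisory_text: str, commit_sha: str) -> bool:
    """Return True if the advisory contains the commit SHA or an abbreviation of it."""
    sha = commit_sha.lower()
    return any(
        sha.startswith(token.lower())
        for token in SHA_PATTERN.findall(advisory_text)
    )
```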

@copernico
Contributor

As for naming: I suggest we simply call these "rules" (implying "manually defined" or "handcrafted"), unless the context makes that ambiguous.

@copernico
Contributor

Hi @Riruk, any progress on this issue?

@Riruk
Collaborator Author

Riruk commented May 17, 2021

Hi @copernico, I was a bit busy last week with a paper. It has been submitted now, so I will get back to working on the prospector tool starting tomorrow.

@Riruk
Collaborator Author

Riruk commented May 25, 2021

What about adding a rule "contains text in commit message"?

@copernico
Contributor

> What about adding a rule "contains text in commit message"?

Would the "text" be a keyword extracted from the advisory? If so, yes, that is definitely a useful rule.
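
To make the idea concrete, here is a rough sketch of a "contains text in commit message" rule in which the text is a keyword taken from the advisory. The tokenizer, the stopword list, and the minimum keyword length are placeholder assumptions; a real implementation would likely need a better keyword extractor.

```python
import re
from typing import Set

# Placeholder stopword list, for illustration only.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "from", "was", "are", "via"}
TOKEN_PATTERN = re.compile(r"[a-z][a-z0-9_]+")


def extract_keywords(advisory_text: str, min_length: int = 4) -> Set[str]:
    """Return lower-cased advisory tokens that are long enough and not stopwords."""
    tokens = TOKEN_PATTERN.findall(advisory_text.lower())
    return {t for t in tokens if len(t) >= min_length and t not in STOPWORDS}


def matching_keywords(advisory_text: str, commit_message: str) -> Set[str]:
    """Return the advisory keywords that also occur in the commit message."""
    message_tokens = set(TOKEN_PATTERN.findall(commit_message.lower()))
    return extract_keywords(advisory_text) & message_tokens
```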

@copernico
Contributor

Work continues in #161

@copernico
Contributor

I think we need to work out how the results will be presented to the user before proceeding.
I propose that, after the candidates are obtained from Git and processed to compute their features, we apply one or more "analysis" steps that produce "annotations" to be attached to the candidates. The user can then inspect the results by looking at which annotations are attached to each candidate.

Examples:

commit X1 from repository R

  • Reason: token "user" in advisory matches the path "src/main/whatever/User.java" changed in the commit
  • Reason: the commit is in tag "v1.2.0" but not in the subsequent "v1.2.1"

Note: the reasons are human readable, but are constructed automatically based on the candidate features and annotations.
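
The proposal above could be sketched roughly as follows. Everything here is hypothetical: the `Candidate` class, its fields, and the two analysis functions are illustrative names, not the data model used in prospector.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List


@dataclass
class Candidate:
    """Hypothetical candidate commit with features and attached annotations."""
    commit_id: str
    repository: str
    changed_paths: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)


def annotate_path_matches(candidate: Candidate, advisory_text: str) -> None:
    """Annotate the candidate when an advisory token equals a changed file's stem."""
    tokens = set(re.findall(r"[a-z0-9_]+", advisory_text.lower()))
    for path in candidate.changed_paths:
        stem = PurePosixPath(path).stem.lower()
        if stem in tokens:
            candidate.annotations.append(
                f'token "{stem}" in advisory matches the path "{path}" changed in the commit'
            )


def annotate_tag_gap(candidate: Candidate, ordered_tags: List[str]) -> None:
    """Annotate the candidate when it is in a tag but not in the next one."""
    for earlier, later in zip(ordered_tags, ordered_tags[1:]):
        if earlier in candidate.tags and later not in candidate.tags:
            candidate.annotations.append(
                f'the commit is in tag "{earlier}" but not in the subsequent "{later}"'
            )


def render(candidate: Candidate) -> str:
    """Build the human-readable report from the attached annotations."""
    lines = [f"commit {candidate.commit_id} from repository {candidate.repository}"]
    lines += [f"  • Reason: {reason}" for reason in candidate.annotations]
    return "\n".join(lines)
```

Each analysis step only appends to `annotations`, so steps can be added or removed independently, and the report is generated from whatever annotations happen to be attached.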

@copernico
Contributor

One point I'm not quite sure how to handle is the conceptual distinction between the annotations above and the features computed with the extract_* functions. In some sense, they look the same. Maybe the annotations are simply what the user will see?

@copernico changed the title from "Use hard-coded rules before (or in addition to) the ML part of the filter-rank module" to "Apply rules to annotate candidates (in addition to the ML part of the filter-rank module)" on May 31, 2021
@Riruk
Collaborator Author

Riruk commented Jun 8, 2021

Work continues in #175


2 participants