Skip to content

Refactor CommitWithFeatures based on the new "rule" mechanism #190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
copernico opened this issue Jun 21, 2021 · 3 comments
Closed

Refactor CommitWithFeatures based on the new "rule" mechanism #190

copernico opened this issue Jun 21, 2021 · 3 comments

Comments

@copernico
Copy link
Contributor

Now that we have rules, that are basically used to make commit features (or the result of some reasoning on those features) visible to the user, maybe some of the computation that we do upfront to produce certain features could be pushed to rules.
See also: #150 (comment)

@amilankovich-slab @Riruk @chicxurug What are your thoughts on this?

@Riruk
Copy link
Collaborator

Riruk commented Jun 21, 2021

I definitely support this idea. Otherwise, we perform double computation: first in feature_extractor (to compute the result of a rule), second in the filter_rank module (to generate the explanations).

I'd propose that the output of feature_extractor returns all the necessary information for generating the explanation. For example, extract_changes_relevant_path rule could return the list of changed paths (in oppose to the current version, when it returns whether there exists at least one changed path).

@copernico
Copy link
Contributor Author

@Riruk indeed, many features that are now booleans should return the 'raw' information instead wherever possible, so that the logic of how to convert that into a meaningful label for the user is differed until the presentation step.

@chicxurug
Copy link
Collaborator

chicxurug commented Jun 21, 2021

I also agree with this. These are the current feature extractor functions and here are the proposal how to change their return values. Do you agree? Also, what should be the return values instead of the "?"s?

  • extract_references_vuln_id (bool) -> stays like this
  • extract_time_between_commit_and_advisory_record (int) -> stays like this
  • extract_changes_relevant_path (bool) -> return the list of changed paths
  • extract_other_CVE_in_message (bool) -> list of all CVE ids mentioned the commit
  • extract_is_close_to_advisory_date (bool) -> remove that (it is more like a rule)
  • extract_referred_to_by_nvd (bool) ->list of all the references containing commit hash
  • extract_referred_to_by_pages_linked_from_advisories (bool) ->list of all the external pages that contain references to the commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants