-
-
Notifications
You must be signed in to change notification settings - Fork 600
Fix false positive detection heuristics #4009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix false positive detection heuristics #4009
Conversation
@AyanSinhaMahapatra This should fix the issue with the false positive license detection heuristics. #4005 It feels like there should be a test for this. If so, can you help me write one? It seemed a bit more involved and I wanted to get this out quickly. |
7212179
to
35029f3
Compare
There is a failing test |
35029f3
to
ee32157
Compare
Update: I scanned |
@AyanSinhaMahapatra and @pombredanne Would be great to get your feedback on this one. The bug currently blocking us from updating. It would seem that it basically renders the |
Friendly ping @AyanSinhaMahapatra. It has been a few months, would be nice to get your feedback :) |
@alexzurbonsen apologies for the quite large delay on this and other PR reviews, looking at this now 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexzurbonsen Thanks++ for spotting and fixing this bug, even though we had the right approach, we were not detecting false positives correctly because of this bug.
For now a simple test with a stripped version of the file at https://github.com/steinwurf/boost/blob/ade3189e2c03fd975dbfa667a4f49e98a49d2fdf/boost/assign/ptr_list_of.hpp#L196 (where gpl one word matches are correctly detected as clues) should suffice as a test to protect against any further regressions. Your test expectation updates also correct one such case from
scancode-toolkit/tests/licensedcode/data/plugin_license/scan/ffmpeg-LICENSE.md
Lines 100 to 101 in ee32157
are incompatible with the GPLv2 and v3. We do not know for certain if their | |
licenses are compatible with the LGPL. |
In future we probably need to add one test case per case in the implementation of the is_false_positive
function, i.e. better unit tests, which the detection.py
lacks 😅 I will open a follow up issue on this.
This fixes #4005 and #3270 too (thanks @alok1304 for pointing this out.)
@@ -1,4 +1,41 @@ | |||
{ | |||
"headers": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header portion is not required, we remove this to reduce test expectation updates.
Due to wrong handling of any and all functions license matches are categorized as having full relevance or copyrights, even if they do not. This leads to a regression in false positive detection. Correct the any and all conditions to correctly detect copyrights and relevance. Signed-off-by: alexzurbonsen <[email protected]>
ee32157
to
9872108
Compare
Hey @AyanSinhaMahapatra, thanks for the prompt review. I removed the header. Do you want me to add the regression test against the boost file you mention? |
@alexzurbonsen we have this tested allright, so no worries. Merging! |
467449a
into
aboutcode-org:develop
Thanks++ for adding this. |
Context
The newly introduced conditions on rule relevance and copyrights for false positive detections, see f9863e61, contain a bug that makes them detect copyrights and rule relevance if matches do not even have a
matched_text
or the rule doesn't even have arelevance
attribute.Fixes #4005 (at least for the example I am looking at)
Summary
Correct the any and all conditions to correctly detect copyrights and relevance.
Test plan
I reran the scan of the file from #4005
and get a scancode.json that has the same contents as on v32.2.1.
scancode_fix.json
Tasks
Run tests locally to check for errors.