Skip to content

Fix false positive detection heuristics #4009

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

alexzurbonsen
Copy link
Contributor

@alexzurbonsen alexzurbonsen commented Dec 9, 2024

Context

The newly introduced conditions on rule relevance and copyrights for false positive detections, see f9863e61, contain a bug that makes them detect copyrights and rule relevance if matches do not even have a matched_text or the rule doesn't even have a relevance attribute.

Fixes #4005 (at least for the example I am looking at)

Summary

Correct the any and all conditions to correctly detect copyrights and relevance.

Test plan

I reran the scan of the file from #4005

scancode -l <your path to boost repo>/boost/boost/assign/ptr_list_of.hpp --json scancode.json

and get a scancode.json that has the same contents as on v32.2.1.

scancode_fix.json

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

@alexzurbonsen
Copy link
Contributor Author

alexzurbonsen commented Dec 9, 2024

@AyanSinhaMahapatra This should fix the issue with the false positive license detection heuristics. #4005

It feels like there should be a test for this. If so, can you help me write one? It seemed a bit more involved and I wanted to get this out quickly.

@alexzurbonsen alexzurbonsen force-pushed the fix-regression-false-positive-detection branch from 7212179 to 35029f3 Compare December 9, 2024 10:54
@alexzurbonsen alexzurbonsen marked this pull request as ready for review December 9, 2024 10:55
@alexzurbonsen
Copy link
Contributor Author

There is a failing test tests/licensedcode/test_plugin_license_detection.py::test_complicated_license_text_from_ffmpeg It seems that it is relying on the buggy logic. I would be glad for some help on this one @AyanSinhaMahapatra, since I am not sure yet, what the appropriate change is.

@alexzurbonsen alexzurbonsen force-pushed the fix-regression-false-positive-detection branch from 35029f3 to ee32157 Compare December 10, 2024 17:54
@alexzurbonsen
Copy link
Contributor Author

alexzurbonsen commented Dec 10, 2024

Update: I scanned plugin_license/scan/ffmpeg-LICENSE.md with the scancode version from this branch and pasted the result into the expected file. The changes seem reasonable to me.

@alexzurbonsen
Copy link
Contributor Author

@AyanSinhaMahapatra and @pombredanne Would be great to get your feedback on this one. The bug currently blocking us from updating. It would seem that it basically renders the is_false_positive function useless, since it always returns false, and thus torpedoes false positive detection.

@alexzurbonsen
Copy link
Contributor Author

Friendly ping @AyanSinhaMahapatra. It has been a few months, would be nice to get your feedback :)

@AyanSinhaMahapatra
Copy link
Member

@alexzurbonsen apologies for the quite large delay on this and other PR reviews, looking at this now 😅

Copy link
Member

@AyanSinhaMahapatra AyanSinhaMahapatra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexzurbonsen Thanks++ for spotting and fixing this bug, even though we had the right approach, we were not detecting false positives correctly because of this bug.

For now a simple test with a stripped version of the file at https://github.com/steinwurf/boost/blob/ade3189e2c03fd975dbfa667a4f49e98a49d2fdf/boost/assign/ptr_list_of.hpp#L196 (where gpl one word matches are correctly detected as clues) should suffice as a test to protect against any further regressions. Your test expectation updates also correct one such case from

are incompatible with the GPLv2 and v3. We do not know for certain if their
licenses are compatible with the LGPL.

In future we probably need to add one test case per case in the implementation of the is_false_positive function, i.e. better unit tests, which the detection.py lacks 😅 I will open a follow up issue on this.

This fixes #4005 and #3270 too (thanks @alok1304 for pointing this out.)

@@ -1,4 +1,41 @@
{
"headers": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header portion is not required, we remove this to reduce test expectation updates.

Due to wrong handling of any and all functions license matches are
categorized as having full relevance or copyrights, even if they
do not. This leads to a regression in false positive detection.

Correct the any and all conditions to correctly detect copyrights
and relevance.

Signed-off-by: alexzurbonsen <[email protected]>
@alexzurbonsen alexzurbonsen force-pushed the fix-regression-false-positive-detection branch from ee32157 to 9872108 Compare March 17, 2025 19:14
@alexzurbonsen
Copy link
Contributor Author

Hey @AyanSinhaMahapatra, thanks for the prompt review. I removed the header.

Do you want me to add the regression test against the boost file you mention?

@AyanSinhaMahapatra
Copy link
Member

@alexzurbonsen we have this tested allright, so no worries. Merging!

@AyanSinhaMahapatra AyanSinhaMahapatra merged commit 467449a into aboutcode-org:develop Apr 16, 2025
36 of 38 checks passed
@AyanSinhaMahapatra
Copy link
Member

Thanks++ for adding this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regression: GPL false positive license detections with v32.3.0
2 participants