Description
Context
We are reporting too many false-positive license detections. We need to fix this!
Problem
There are several false-positive cases, but they boil down to these types:
- False detections from very short and weak license detection rules that are matched exactly, such as:
  - a URL or a project name, such as a URL to a well-known AGPL-licensed project, which is not always a sign of AGPL, as in False positive AGPL detection from a mere URL #2877
  - the detection of the word `GPL` in a binary, as in Tracing "Start Line" of ScanCode report back to the Binary file. #2874
  - the detection of the longer `may not be modified`, as in False-positive `proprietary-license` finding in Guava source code #2865
- Detection of a license text or notice fragment that is too weak to represent a bona fide license detection on its own.
- Detection of longer unknown license references, such as:
  - a "license introduction" (as in "This is licensed under....") that may be noisy when followed by a bona fide license notice or text.
  - a reference to the license in a file (as in "See file COPYING for license") where we can follow the reference
- Lack of proper detection of a structured license tag found in a package manifest, which is returned as an unknown license
- When fragments of the same license are detected with only copyrights added in between, as in license detection: Add the nunit license #2859
- When sequences of SPDX license ids are found in license detection tools
- Please add yours!
Solution elements
We could treat mere clues such as these separately and report them apart from license detections: they can offer interesting insight in some cases, but alone they are too weak to be considered a license detection.
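One way to picture this split is a small filter that routes each match either to a list of detections or to a list of clues. The sketch below is illustrative only: the `Match` class, its fields, and both thresholds are assumptions made for this example, not ScanCode's actual data model or tuned values.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical, simplified stand-in for a license match."""
    license_key: str
    matched_length: int  # number of matched words
    rule_relevance: int  # 0-100: how strongly the rule indicates the license


# Assumed thresholds, chosen only for illustration.
MIN_LENGTH = 5
MIN_RELEVANCE = 80


def split_clues(matches):
    """Return (detections, clues); weak matches are kept but reported apart."""
    detections, clues = [], []
    for match in matches:
        too_short = match.matched_length < MIN_LENGTH
        too_weak = match.rule_relevance < MIN_RELEVANCE
        # Only a match that is both very short and weak is demoted to a clue.
        if too_short and too_weak:
            clues.append(match)
        else:
            detections.append(match)
    return detections, clues


matches = [
    Match("agpl-3.0", matched_length=1, rule_relevance=50),      # a bare URL
    Match("gpl-1.0-plus", matched_length=1, rule_relevance=50),  # the word "GPL"
    Match("mit", matched_length=160, rule_relevance=100),        # a full text
]
detections, clues = split_clues(matches)
```

With these made-up inputs, only the `mit` match stays a detection; the two one-word matches are still available to the user, but as clues.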
The upcoming two-step process, where license matches are grouped into a license detection, is another approach to consider. We could detect patterns of license matches that can be resolved into a single detection, for instance a license intro followed by a license notice.
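As a rough sketch of that grouping step, the code below merges matches that sit close together into one detection and lets a license intro be absorbed by the notice that follows it. The `Match` and `Detection` classes, the `is_intro` flag, and the `MAX_LINE_GAP` threshold are hypothetical names introduced for this example; they are not ScanCode's API.

```python
from dataclasses import dataclass, field


@dataclass
class Match:
    """Hypothetical, simplified license match with its line span."""
    license_key: str
    start_line: int
    end_line: int
    is_intro: bool = False  # e.g. "This is licensed under ..."


@dataclass
class Detection:
    """A group of nearby matches reported as a single result."""
    matches: list = field(default_factory=list)

    def license_keys(self):
        # An intro carries no license of its own: prefer the other matches,
        # and fall back to the intro only when nothing else was grouped.
        solid = [m.license_key for m in self.matches if not m.is_intro]
        keys = solid or [m.license_key for m in self.matches]
        return list(dict.fromkeys(keys))  # de-duplicate, keep order


# Assumed maximum number of lines between two matches of one detection.
MAX_LINE_GAP = 3


def group_matches(matches):
    """Group matches by line proximity into a list of Detection objects."""
    detections = []
    current = None
    for match in sorted(matches, key=lambda m: m.start_line):
        if current is not None:
            gap = match.start_line - current.matches[-1].end_line
            if gap <= MAX_LINE_GAP:
                current.matches.append(match)
                continue
        current = Detection(matches=[match])
        detections.append(current)
    return detections


matches = [
    Match("unknown-license-reference", 1, 1, is_intro=True),
    Match("apache-2.0", 2, 10),
    Match("mit", 40, 45),
    Match("mit", 48, 60),  # same license, copyright lines in between
]
detections = group_matches(matches)
```

Here the intro and the notice that follows it collapse into one `apache-2.0` detection, and the two `mit` fragments separated by a few lines become a single `mit` detection. Proximity alone is a deliberately naive criterion; a real implementation would likely need more pattern-specific rules.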
The scancode-analyzer heuristics and ML-based detection of false positives are another approach.