License file not detected after scanning
The package license is named correctly and is at the root, but scanning did not pick it up.
License file not detected with only two scanners used: https://clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/4.0.5-rc3
This version had three scanners used and the license file was detected: https://clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/2.1.1
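The difference between the two definitions above comes down to which tools contributed to each one. A minimal sketch of that check, assuming a definition is available as a parsed JSON dictionary whose `described.tools` list holds entries of the form `name/version`. The field layout, the expected tool names, and the sample data are assumptions for illustration, not taken from this thread:

```python
# Hypothetical helper: report which expected harvest tools are absent from a definition.
EXPECTED_TOOLS = {"clearlydefined", "licensee", "scancode"}  # assumed tool names


def missing_tools(definition: dict) -> set:
    """Return the expected tools that did not contribute to this definition."""
    tools = definition.get("described", {}).get("tools", [])
    # Each entry is assumed to look like "name/version"; keep only the name part.
    seen = {entry.split("/", 1)[0] for entry in tools}
    return EXPECTED_TOOLS - seen


# Made-up sample shaped like a partially harvested definition.
partial = {"described": {"tools": ["clearlydefined/1.0.0", "licensee/1.0.0"]}}
print(sorted(missing_tools(partial)))
```

A non-empty result would flag a component as a candidate for reharvesting.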
https://clearlydefined.io/definitions/pod/cocoapods/-/FirebaseInstanceID/7.2.0
https://clearlydefined.io/definitions/pod/cocoapods/-/FirebaseCore/6.10.2
https://clearlydefined.io/definitions/nuget/nuget/-/EPPlus/5.0.0-beta
https://clearlydefined.io/definitions/nuget/nuget/-/Blazorise.Charts/0.9.3.6
https://clearlydefined.io/definitions/maven/mavencentral/org.flywaydb/flyway-core/7.7.2
Partially harvested package missed "LICENSE" in "Files" section - https://clearlydefined.io/definitions/pod/cocoapods/-/Flipper-Boost-iOSX/1.76.0.1.11
An example of all scanners being run but the LICENSE.html in "Files" was not marked "NOASSERTION" - https://clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/core/23.1.1
At the file level, the scanners also missed the reference to "@license Commercial." (example: server-side-row-model.cjs.js - https://clearlydefined.io/file/6af929081ba05472bf83e5d8fe13bdbc84f598ce261ea87a4775c10fcc7d4223). ClearlyDefined marked that file with "MIT," which is accurate - there are references to MIT in that file. However, I would have expected the tooling to put "MIT AND NOASSERTION" for that file given the @license Commercial reference. Here's the definition page: https://clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/server-side-row-model/23.1.1.
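The expectation described above can be sketched as follows: every license reference found in a file contributes to the file's expression, and a reference with no SPDX identifier contributes NOASSERTION instead of being dropped. This only illustrates the expected outcome, not how the ClearlyDefined tooling is implemented; the function name, the small set of known identifiers, and the input format are hypothetical.

```python
KNOWN_SPDX_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause"}  # tiny illustrative subset


def file_license_expression(references: list) -> str:
    """Join per-file license references with AND, mapping unknown ones to NOASSERTION."""
    terms = []
    for ref in references:
        term = ref if ref in KNOWN_SPDX_IDS else "NOASSERTION"
        if term not in terms:  # keep first-seen order, drop duplicates
            terms.append(term)
    # An empty list of references yields an empty string (nothing detected).
    return " AND ".join(terms)


print(file_license_expression(["MIT", "Commercial"]))  # MIT AND NOASSERTION
```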
Another example - on this one (https://clearlydefined.io/definitions/nuget/nuget/-/RequireJS/2.1.14), scanners did not pick up license info on the files (e.g. https://clearlydefined.io/file/7a10cbce6ec24bc6769d7a8de5f99620c1635dbead60712225404dde13d424f9). That file should have said "NOASSERTION" because it has license info on the file. Or better, it would have said MIT OR BSD-3-Clause.
Instead, it's just blank.
Partially harvested, 1 scanner, MIT license: https://clearlydefined.io/definitions/git/github/xiang90/probing/43a291ad63a214a207fefbf03c7d9d78b703162b
Another example of the scanner not detecting the LICENSE file - https://clearlydefined.io/definitions/git/github/fluent/fluentd-kubernetes-daemonset/fcdf045fec92b70f40c05cba3a00117ed0c11547
I believe the majority of these will be fixed by me upgrading the Crawler infrastructure (as detailed in #841).
So far, these components have been successfully reharvested with all three scanning tools:
DONE - https://clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/4.0.5-rc3
DONE - https://clearlydefined.io/definitions/pod/cocoapods/-/FirebaseInstanceID/7.2.0
DONE - https://clearlydefined.io/definitions/pod/cocoapods/-/FirebaseCore/6.10.2
DONE - https://clearlydefined.io/definitions/nuget/nuget/-/EPPlus/5.0.0-beta
DONE - https://clearlydefined.io/definitions/nuget/nuget/-/Blazorise.Charts/0.9.3.6
DONE - https://clearlydefined.io/definitions/nuget/nuget/-/RequireJS/2.1.14
DONE - https://clearlydefined.io/definitions/maven/mavencentral/org.flywaydb/flyway-core/7.7.2
And these are in process or about to be harvested:
https://clearlydefined.io/definitions/pod/cocoapods/-/Flipper-Boost-iOSX/1.76.0.1.11
I will update here with the results!
Update - all components that hadn't been scanned by all three scanners have successfully been reharvested!
I have not yet addressed the ones that had all the scanners run on them, but showed what appeared to be an incorrect license.
After the scancode upgrade, of the 14 components reported here, the following components seemed to be resolved in my local environment and the dev server:
- https://dev.clearlydefined.io/definitions/pod/cocoapods/-/FirebaseCore/6.10.2
  Declared: Apache-2.0
  Discovered: Apache-2.0, MIT
- https://dev.clearlydefined.io/definitions/nuget/nuget/-/Blazorise.Charts/0.9.3.6
  Declared: MIT
  Discovered: MIT
- https://dev.clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/server-side-row-model/23.1.1
  Expected "MIT AND NOASSERTION" for the license file
  Declared: NOASSERTION
  Discovered: MIT, MIT AND NOASSERTION (as expected)
- https://dev.clearlydefined.io/definitions/nuget/nuget/-/RequireJS/2.1.14
  Files>content>scripts>require.js is displayed as "MIT OR BSD-3-Clause"
- https://dev.clearlydefined.io/definitions/git/github/xiang90/probing/43a291ad63a214a207fefbf03c7d9d78b703162b
  All scanners triggered.
  Declared: MIT
  Discovered: MIT
- https://dev.clearlydefined.io/definitions/git/github/quobyte/api/9cfd29338dd9fdaaf956b7082e5550aab5fe3841
  Declared: BSD-3-Clause
  Discovered: BSD-3-Clause
- https://dev.clearlydefined.io/definitions/git/github/prometheus-junkyard/tsdb/d48a5e2d5c34116dfcbc7b935c66157847b2d8b5
  Declared: Apache-2.0
  Discovered: Apache-2.0, Apache-2.0 AND BSD-2-Clause AND NO
- https://dev.clearlydefined.io/definitions/git/github/fluent/fluentd-kubernetes-daemonset/fcdf045fec92b70f40c05cba3a00117ed0c11547
  Declared: Apache-2.0
  Discovered: Apache-2.0
- https://dev.clearlydefined.io/definitions/nuget/nuget/-/EPPlus/5.0.0-beta
  "LICENSE" present (vs missing prior) in "Files" section
  Declared: NOASSERTION
  Discovered: LGPL-2.0-or-later AND PolyForm-Noncommercial-1.0.0 OR NOASSERTIONNOASS
Scancode result missing for the following components:
- pod/cocoapods/-/FirebaseInstanceID/7.2.0
- pod/cocoapods/-/Flipper-Boost-iOSX/1.76.0.1.11 (only in my local dev env, was successfully scanned on the dev server)
License still missing in "Declared" for the following 4 components:
- https://dev.clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/4.0.5-rc3
  Declared: NOASSERTION
  Discovered: Apache-2.0, Apache-2.0 AND MIT AND WTFPL OR MIT
- https://dev.clearlydefined.io/definitions/maven/mavencentral/org.flywaydb/flyway-core/7.7.2
  Declared: NOASSERTION
  Discovered: Apache-2.0
- https://dev.clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/core/23.1.1
  Declared: NOASSERTION
  Discovered: MIT, MIT AND NOASSERTION
- https://dev.clearlydefined.io/definitions/nuget/nuget/-/EPPlus/5.0.0-beta
  Declared: NOASSERTION
  Discovered: LGPL-2.0-or-later AND PolyForm-Noncommercial-1.0.0 OR NOASSERTION, NOASSERTION, NOASSERTION AND PolyForm-Noncommercial-1.0.0, PolyForm-Noncommercial-1.0.0
Another case of "License file not detected": https://clearlydefined.io/definitions/crate/cratesio/-/mpmc/0.1.6 has "NOASSERTION" declared in the Licensed section, while BSD-2-Clause-Views is listed as discovered.
For composer components, the files are packaged as one directory in the root directory. The license file does not seem to be picked up. Test case: https://clearlydefined.io/definitions/composer/packagist/mmucklo/krumo/0.7.0
Test case: https://clearlydefined.io/definitions/composer/packagist/colinmollenhour/cache-backend-redis/1.14.4
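One way to picture the composer case: if the archive root contains nothing but a single directory, a root-only search for license files finds nothing, while a search that first steps into that directory does. The sketch below illustrates the idea on a list of relative paths. The helper names, the set of recognized license file names, and the sample listing are assumptions, and this is not the crawler's actual logic.

```python
from pathlib import PurePosixPath

LICENSE_NAMES = {"license", "license.txt", "license.md", "copying"}  # assumed names


def effective_root(paths):
    """Return the single wrapper directory if every path lives under one, else ''."""
    tops = {PurePosixPath(p).parts[0] for p in paths if PurePosixPath(p).parts}
    has_root_files = any(len(PurePosixPath(p).parts) == 1 for p in paths)
    if len(tops) == 1 and not has_root_files:
        return tops.pop()
    return ""


def root_license_files(paths):
    """List license files located directly under the effective package root."""
    root = effective_root(paths)
    found = []
    for p in paths:
        parts = PurePosixPath(p).parts
        rel = parts[1:] if root else parts  # skip the wrapper directory, if any
        if len(rel) == 1 and rel[0].lower() in LICENSE_NAMES:
            found.append(p)
    return found


# Made-up listing shaped like a package wrapped in one top-level directory.
files = ["wrapper-dir/LICENSE", "wrapper-dir/src/main.php"]
print(root_license_files(files))
```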
Scancode correctly declared it as "BSD-3-Clause-Modification", which was converted to NoAssertion by SPDX.normalize. "BSD-3-Clause-Modification" was introduced in spdx-license-list 3.17, so spdx needs to be updated to address this case.
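The effect described above can be illustrated with a small sketch: a normalizer that only accepts identifiers present in its bundled license list will turn a valid but newer identifier into NOASSERTION until the list is updated. The function and the two tiny license lists below are hypothetical stand-ins, not the actual SPDX.normalize implementation or real list contents.

```python
def normalize(identifier: str, license_list: set) -> str:
    """Return the identifier if the license list knows it, else NOASSERTION."""
    return identifier if identifier in license_list else "NOASSERTION"


# Hypothetical excerpts of a license list before and after an update.
old_list = {"MIT", "BSD-3-Clause"}
new_list = old_list | {"BSD-3-Clause-Modification"}

print(normalize("BSD-3-Clause-Modification", old_list))  # NOASSERTION
print(normalize("BSD-3-Clause-Modification", new_list))  # BSD-3-Clause-Modification
```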
@capfei @ariel11 I triggered a harvest on dev for all the components discussed here, so that the definitions can be verified without curation.
- The following look resolved. Could you please confirm?
  https://dev.clearlydefined.io/definitions/pod/cocoapods/-/FirebaseCore/6.10.2
  https://dev.clearlydefined.io/definitions/nuget/nuget/-/Blazorise.Charts/0.9.3.6
  https://dev.clearlydefined.io/definitions/git/github/xiang90/probing/43a291ad63a214a207fefbf03c7d9d78b703162b
  https://dev.clearlydefined.io/definitions/git/github/quobyte/api/9cfd29338dd9fdaaf956b7082e5550aab5fe3841
  https://dev.clearlydefined.io/definitions/git/github/prometheus-junkyard/tsdb/d48a5e2d5c34116dfcbc7b935c66157847b2d8b5
  https://dev.clearlydefined.io/definitions/git/github/fluent/fluentd-kubernetes-daemonset/fcdf045fec92b70f40c05cba3a00117ed0c11547
  https://dev.clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/4.0.5-rc3
  https://dev.clearlydefined.io/definitions/nuget/nuget/-/EPPlus/5.0.0-beta
  https://dev.clearlydefined.io/definitions/pod/cocoapods/-/FirebaseInstanceID/7.2.0
  https://dev.clearlydefined.io/definitions/pod/cocoapods/-/Flipper-Boost-iOSX/1.76.0.1.11
- The following packages are marked commercial in package.json and in the registry data; scancode also reported commercial for them:
  https://dev.clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/server-side-row-model/23.1.1
  https://dev.clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/core/23.1.1
  CD marked them as "NoAssertion" because there is no SPDX identifier for "commercial". Curation might be the way to address this type of component. What do you think?
- https://dev.clearlydefined.io/definitions/nuget/nuget/-/RequireJS/2.1.14: the license is not declared as an SPDX identifier in the registry and the license file is not present in the root directory (best practice), so it is hard for scanners to detect. This may be another case for curation?
- Still outstanding:
  https://dev.clearlydefined.io/definitions/maven/mavencentral/org.flywaydb/flyway-core/7.7.2
@qtomlinson Yes, I looked at these and verified that what came back as declared matches the license information in the components.
Thank you for your work!
@capfei Fix for maven (point 4) has been merged and deployed on dev: https://dev.clearlydefined.io/definitions/maven/mavencentral/org.flywaydb/flyway-core/7.7.2. The declared license is no longer NOASSERTION.
Thanks @elrayle for review!