clearlydefined/service

License file not detected after scanning

Opened this issue · 17 comments

The package license file is named correctly and is at the root, but scanning did not pick it up.

License file not detected when only two scanners were run: https://clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/4.0.5-rc3

This version was processed by all three scanners, and the license file was detected: https://clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/2.1.1

A partially harvested package is missing "LICENSE" in the "Files" section: https://clearlydefined.io/definitions/pod/cocoapods/-/Flipper-Boost-iOSX/1.76.0.1.11

An example where all scanners were run, but LICENSE.html in "Files" was not marked "NOASSERTION": https://clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/core/23.1.1

The scanners also missed the "@license Commercial" reference at the file level (example: server-side-row-model.cjs.js - https://clearlydefined.io/file/6af929081ba05472bf83e5d8fe13bdbc84f598ce261ea87a4775c10fcc7d4223). ClearlyDefined marked that file as "MIT," which is accurate, since the file does contain references to MIT. However, given the "@license Commercial" reference, I would have expected the tooling to report "MIT AND NOASSERTION" for that file. Here's the definition page: https://clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/server-side-row-model/23.1.1.
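The expected file-level result above can be illustrated with a minimal sketch. It assumes a hypothetical `file_expression` helper and a made-up `KNOWN_IDS` set standing in for the SPDX license list; this is not ClearlyDefined's actual code. Every license reference found in a file is kept, and a reference that is not a recognized SPDX identifier (such as "Commercial") becomes NOASSERTION instead of being silently dropped.

```python
# Hypothetical sketch only: not ClearlyDefined's implementation.
# KNOWN_IDS is a tiny illustrative subset of the SPDX license list.
KNOWN_IDS = {"MIT", "Apache-2.0", "BSD-3-Clause"}


def file_expression(detected):
    """Combine the licenses detected in one file into an AND expression.

    Detections that are not known SPDX ids (e.g. "Commercial") become
    NOASSERTION instead of being dropped. Returns None if nothing was found.
    """
    terms = []
    for lic in detected:
        term = lic if lic in KNOWN_IDS else "NOASSERTION"
        if term not in terms:  # de-duplicate while keeping first-seen order
            terms.append(term)
    return " AND ".join(terms) if terms else None


print(file_expression(["MIT", "Commercial"]))  # MIT AND NOASSERTION
print(file_expression(["MIT", "MIT"]))         # MIT
print(file_expression([]))                     # None
```

This sketch only builds conjunctions. A dual-license notice like the "MIT OR BSD-3-Clause" case in the next example would need the choice to be expressed with OR, which would require parsing the notice itself.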

Another example: on this one (https://clearlydefined.io/definitions/nuget/nuget/-/RequireJS/2.1.14), the scanners did not pick up license information on the files (e.g. https://clearlydefined.io/file/7a10cbce6ec24bc6769d7a8de5f99620c1635dbead60712225404dde13d424f9). That file should at least have been marked "NOASSERTION" because it contains license information; better still, it would have been marked "MIT OR BSD-3-Clause".
Instead, it's just blank

Update: all components that had not been scanned by all three scanners have been successfully reharvested!

I have not yet addressed the ones that had all scanners run on them but still showed what appeared to be an incorrect license.

After the scancode upgrade, the following of the 14 components reported here appear to be resolved in my local environment and on the dev server:

  1. https://dev.clearlydefined.io/definitions/pod/cocoapods/-/FirebaseCore/6.10.2
    Declared: Apache-2.0
    Discovered: Apache-2.0, MIT

  2. https://dev.clearlydefined.io/definitions/nuget/nuget/-/Blazorise.Charts/0.9.3.6
    Declared: MIT
    Discovered: MIT

  3. https://dev.clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/server-side-row-model/23.1.1
    Expected "MIT AND NOASSERTION" for the license file
    Declared: NOASSERTION
    Discovered: MIT, MIT AND NOASSERTION (as expected)

  4. https://dev.clearlydefined.io/definitions/nuget/nuget/-/RequireJS/2.1.14
    Files>content>scripts>require.js is displayed as "MIT OR BSD-3-Clause"

  5. https://dev.clearlydefined.io/definitions/git/github/xiang90/probing/43a291ad63a214a207fefbf03c7d9d78b703162b
    All scanners were triggered.
    Declared: MIT
    Discovered: MIT

  6. https://dev.clearlydefined.io/definitions/git/github/quobyte/api/9cfd29338dd9fdaaf956b7082e5550aab5fe3841
    Declared: BSD-3-Clause
    Discovered: BSD-3-Clause

  7. https://dev.clearlydefined.io/definitions/git/github/prometheus-junkyard/tsdb/d48a5e2d5c34116dfcbc7b935c66157847b2d8b5
    Declared: Apache-2.0
    Discovered: Apache-2.0, Apache-2.0 AND BSD-2-Clause AND NO

  8. https://dev.clearlydefined.io/definitions/git/github/fluent/fluentd-kubernetes-daemonset/fcdf045fec92b70f40c05cba3a00117ed0c11547
    Declared: Apache-2.0
    Discovered: Apache-2.0

  9. https://dev.clearlydefined.io/definitions/nuget/nuget/-/EPPlus/5.0.0-beta
    "LICENSE" is now present in the "Files" section (it was previously missing)
    Declared: NOASSERTION
    Discovered: LGPL-2.0-or-later AND PolyForm-Noncommercial-1.0.0 OR NOASSERTION, NOASS

Scancode result missing for the following components:

  1. pod/cocoapods/-/FirebaseInstanceID/7.2.0
  2. pod/cocoapods/-/Flipper-Boost-iOSX/1.76.0.1.11 (missing only in my local dev environment; it was successfully scanned on the dev server)

License still missing in "Declared" for the following 4 components:

  1. https://dev.clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/4.0.5-rc3
    Declared: NOASSERTION
    Discovered: Apache-2.0, Apache-2.0 AND MIT AND WTFPL OR MIT
  2. https://dev.clearlydefined.io/definitions/maven/mavencentral/org.flywaydb/flyway-core/7.7.2
    Declared: NOASSERTION
    Discovered: Apache-2.0
  3. https://dev.clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/core/23.1.1
    Declared: NOASSERTION
    Discovered: MIT, MIT AND NOASSERTION
  4. https://dev.clearlydefined.io/definitions/nuget/nuget/-/EPPlus/5.0.0-beta
    Declared: NOASSERTION
    Discovered: LGPL-2.0-or-later AND PolyForm-Noncommercial-1.0.0 OR NOASSERTION, NOASSERTION, NOASSERTION AND PolyForm-Noncommercial-1.0.0, PolyForm-Noncommercial-1.0.0

Another case of "License file not detected" can be found at https://clearlydefined.io/definitions/crate/cratesio/-/mpmc/0.1.6. "NOASSERTION" was declared in the Licensed section, while BSD-2-Clause-Views is listed as discovered.

For Composer components, the files are packaged in a single directory under the root, and the license file does not seem to be picked up. Test case: https://clearlydefined.io/definitions/composer/packagist/mmucklo/krumo/0.7.0

Test case: https://clearlydefined.io/definitions/composer/packagist/colinmollenhour/cache-backend-redis/1.14.4
Scancode correctly reported it as "BSD-3-Clause-Modification", which was converted to NOASSERTION by SPDX.normalize. "BSD-3-Clause-Modification" was introduced in version 3.17 of the SPDX license list, so the spdx dependency needs to be updated to handle this case.
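This failure mode can be illustrated with a minimal sketch: a normalizer only recognizes identifiers that exist in the license list it was built against, so a newly added identifier falls through to NOASSERTION until that list is updated. `normalize`, `LIST_BEFORE_3_17`, and `LIST_3_17` are hypothetical names, and the sets are tiny stand-ins for the real SPDX lists; this is not the actual SPDX.normalize code.

```python
# Hypothetical sketch only: not the real SPDX.normalize implementation.
# The two sets stand in for SPDX license lists before and after 3.17;
# they are tiny illustrative subsets, not the actual lists.
LIST_BEFORE_3_17 = {"MIT", "BSD-3-Clause"}
LIST_3_17 = LIST_BEFORE_3_17 | {"BSD-3-Clause-Modification"}


def normalize(license_id, known_ids):
    """Return the canonical form of license_id, or NOASSERTION if unknown.

    SPDX identifiers are matched case-insensitively, so the lookup is done
    on lower-cased keys and the canonical casing is returned.
    """
    canonical = {known.lower(): known for known in known_ids}
    return canonical.get(license_id.strip().lower(), "NOASSERTION")


print(normalize("BSD-3-Clause-Modification", LIST_BEFORE_3_17))  # NOASSERTION
print(normalize("bsd-3-clause-modification", LIST_3_17))  # BSD-3-Clause-Modification
```

The scanner output is identical in both calls; only the list version changes the result, which is why updating the dependency (rather than the scanner) addresses this case.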

@capfei @ariel11 I triggered a harvest on dev for all the components discussed here so that the definitions can be verified without curation.

  1. The following look resolved. Could you please confirm?
    https://dev.clearlydefined.io/definitions/pod/cocoapods/-/FirebaseCore/6.10.2
    https://dev.clearlydefined.io/definitions/nuget/nuget/-/Blazorise.Charts/0.9.3.6
    https://dev.clearlydefined.io/definitions/git/github/xiang90/probing/43a291ad63a214a207fefbf03c7d9d78b703162b
    https://dev.clearlydefined.io/definitions/git/github/quobyte/api/9cfd29338dd9fdaaf956b7082e5550aab5fe3841
    https://dev.clearlydefined.io/definitions/git/github/prometheus-junkyard/tsdb/d48a5e2d5c34116dfcbc7b935c66157847b2d8b5
    https://dev.clearlydefined.io/definitions/git/github/fluent/fluentd-kubernetes-daemonset/fcdf045fec92b70f40c05cba3a00117ed0c11547
    https://dev.clearlydefined.io/definitions/nuget/nuget/-/ReportGenerator/4.0.5-rc3
    https://dev.clearlydefined.io/definitions/nuget/nuget/-/EPPlus/5.0.0-beta
    https://dev.clearlydefined.io/definitions/pod/cocoapods/-/FirebaseInstanceID/7.2.0
    https://dev.clearlydefined.io/definitions/pod/cocoapods/-/Flipper-Boost-iOSX/1.76.0.1.11

  2. The following packages are marked commercial in package.json and in the registry data, and scancode also reported them as commercial:
    https://dev.clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/server-side-row-model/23.1.1
    https://dev.clearlydefined.io/definitions/npm/npmjs/@ag-grid-enterprise/core/23.1.1

CD (ClearlyDefined) marked them as "NOASSERTION" because there is no SPDX identifier for "commercial". Curation might be the way to handle this type of component. What do you think?

  3. https://dev.clearlydefined.io/definitions/nuget/nuget/-/RequireJS/2.1.14, the license is not declared as an SPDX identifier in the registry, and the license file is not in the root directory (where best practice puts it), which makes it hard for scanners to detect. Could this be another case for curation?

  4. Still outstanding:
    https://dev.clearlydefined.io/definitions/maven/mavencentral/org.flywaydb/flyway-core/7.7.2

capfei commented

@qtomlinson Yes, I looked at these and verified that what came back as declared matches the license information in the components.

Thank you for your work!

@capfei The fix for Maven (point 4) has been merged and deployed on dev: https://dev.clearlydefined.io/definitions/maven/mavencentral/org.flywaydb/flyway-core/7.7.2. The declared license is no longer NOASSERTION.
Thanks @elrayle for review!