qurator-spk/dinglehopper

Check licenses of used libraries

mikegerber opened this issue · 24 comments

dinglehopper is Apache-licensed. Every dependency that is used as a library needs to have a compatible license, e.g. BSD, MIT, Apache or public domain. GPL-licensed programs that are only run as tools, not linked to, seem to be fine. See also #48 for a relevant discussion.

Checklist from requirements*.txt:

  • click
  • jinja2
  • lxml
  • uniseg
  • numpy
  • colorama
  • MarkupSafe
  • ocrd >= 2.20.1
  • attrs
  • multimethod == 1.3
  • tqdm
  • pytest
  • pytest-flake8
  • pytest-cov
  • pytest-mypy
  • black

click: BSD License (BSD-3-Clause)

jinja2: BSD License (BSD-3-Clause)

lxml: BSD License (BSD)

uniseg: MIT License (MIT)

numpy: BSD License (BSD)

colorama: BSD License (BSD)

MarkupSafe: BSD License (BSD-3-Clause)

ocrd: Apache License 2.0

attrs: MIT License (MIT)

multimethod: Apache Software License

tqdm: MIT License, Mozilla Public License 2.0

pytest: MIT License (MIT)

pytest-flake8: BSD License (BSD License)

(We are also not linking to it.)

pytest-cov: BSD License (MIT)

(We are also not linking to it.)

pytest-mypy: MIT License (MIT)

(We are also not linking to it.)

black: MIT License (MIT)

(We are also not linking to it.)

To the best of my knowledge, all libraries used have compatible licenses. 👍

@b2m expressed interest in creating a CI job to regularly check for licensing problems (#48), so I am reopening.

My notes:

  • pip-licenses --allow-only="MIT License;BSD License;Apache" (the addition of Apache is untested) seems to be an interesting approach
    • I like that it's a whitelist
    • I like that it also checks transitively (by checking everything installed by pip)
  • It would be nice to keep this out of the usual test suite due to (possible) network activity (I haven't checked where it gets the license info from, though)
  • I might do the check in ocrd-galley eventually, because the builds there are network heavy already, but it does not hurt to explore software options in this project's CI
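To illustrate the whitelist idea from the notes above, here is a minimal sketch that reads license information from the metadata of everything installed in the current environment, which also covers transitive dependencies and needs no network access. It is not how pip-licenses is implemented. The prefix matching, the strict "every declared license must be allowed" rule, and the helper names (`declared_licenses`, `is_allowed`, `find_violations`, `installed_packages`, `main`) are assumptions made for this example.

```python
from importlib import metadata

# Allowlist taken from the note above. Matching by prefix is an assumption
# of this sketch, not necessarily how pip-licenses compares names.
ALLOWED = ("MIT License", "BSD License", "Apache")


def declared_licenses(dist):
    """Collect the license names a distribution declares in its metadata."""
    names = []
    for classifier in dist.metadata.get_all("Classifier") or []:
        # Trove classifiers look like "License :: OSI Approved :: MIT License".
        if classifier.startswith("License ::"):
            names.append(classifier.split("::")[-1].strip())
    license_field = dist.metadata.get("License")
    if not names and license_field:
        # Fall back to the free-text License field if no classifier is set.
        names.append(license_field.strip())
    return names


def is_allowed(names, allowed=ALLOWED):
    """True if at least one license is declared and all of them are allowed."""
    return bool(names) and all(
        any(name.startswith(prefix) for prefix in allowed) for name in names
    )


def find_violations(packages, allowed=ALLOWED):
    """Return the sorted names of packages that fail the allowlist."""
    return sorted(
        name for name, names in packages.items() if not is_allowed(names, allowed)
    )


def installed_packages():
    """Map every installed distribution to its declared licenses."""
    return {
        (dist.metadata["Name"] or "<unknown>"): declared_licenses(dist)
        for dist in metadata.distributions()
    }


def main():
    """Print offending packages and return a shell-style exit code."""
    violations = find_violations(installed_packages())
    for name in violations:
        print(f"license not on the allowlist: {name}")
    return 1 if violations else 0
```

Under this strict rule a package that declares several licenses, such as tqdm with "MIT License, Mozilla Public License 2.0", would be flagged and would need either an extra allowlist entry or a per-package exception.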

(Keeping it short as I am on my free day actually 😜 )

b2m commented

> (Keeping it short as I am on my free day actually 😜 )

Free day? Same as yesterday and the day before yesterday... 😉

I have three proposals, let me know which one(s) you'd like to try:

licensed

  • https://github.com/github/licensed
  • Provided by GitHub
  • Available as GitHub Action
  • Supports multiple technologies
  • Integrated in GitHub with Pull Requests and Branches
  • Using configuration files

LicenseFinder

pip-licenses

My thoughts:

  • If it is ok to only check Python dependencies I would give pip-licenses a try as the setup is quite simple.
  • If you want to check the licenses of other technologies as well (like the JavaScript dependencies in dinglehopper =) I would try licensed, as the integration in GitHub already is provided.

> I have three proposals, let me know which one(s) you'd like to try:

While having support for JavaScript is certainly interesting, the tools seem to require Bower/Yarn/or npm(?) for that, and switching to that is maybe a bit overkill for the three JS dependencies :) (Might do it anyway because of #2 someday.)

pip-licenses seems to be a simple solution for Python dependencies, so maybe try that first 👍 If it can do the license checking offline from a previously set up venv, that would be the best case.

> pip-licenses
> * Supports only whitelisting

I thought --fail-on supports blacklisting, but I'd prefer whitelisting anyway.

> My thoughts:
> * If it is ok to only check Python dependencies I would give pip-licenses a try as the setup is quite simple.
> * If you want to check the licenses of other technologies as well (like the JavaScript dependencies in dinglehopper =) I would try licensed, as the integration in GitHub already is provided.

๐Ÿ‘ Note that we're currently using CircleCI and while I'm not super passionate about it I am super passionate about not switiching CI systems every few months ;-)

> pip-licenses seems to be a simple solution for Python dependencies, so maybe try that first +1 If it can do the license checking offline from a previously set up venv, that would be the best case.

It seems to work offline!

> pip-licenses seems to be a simple solution for Python dependencies, so maybe try that first +1 If it can do the license checking offline from a previously set up venv, that would be the best case.
>
> It seems to work offline!

In other words: From my point of view, this makes it suitable for the normal test suite
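Since the check works offline, it could be wired into the normal test suite roughly as follows. This is only a sketch: it shells out to the pip-licenses command quoted earlier in this thread, assumes that pip-licenses exits with a non-zero status when it finds a license outside the list, and skips the test if the tool is not installed. The helper names `build_command` and `run_license_check` are made up for this example.

```python
import shutil
import subprocess

# Allowlist from the notes earlier in this thread ("Apache" was untested there).
ALLOW_ONLY = "MIT License;BSD License;Apache"


def build_command(allow_only=ALLOW_ONLY):
    """Build the pip-licenses invocation as an argument list (no shell quoting needed)."""
    return ["pip-licenses", f"--allow-only={allow_only}"]


def run_license_check(allow_only=ALLOW_ONLY):
    """Run pip-licenses in the current environment.

    Returns None if pip-licenses is not installed, otherwise a tuple
    (ok, output) where ok reflects the exit status of the command.
    """
    if shutil.which("pip-licenses") is None:
        return None
    result = subprocess.run(
        build_command(allow_only), capture_output=True, text=True
    )
    return result.returncode == 0, result.stdout + result.stderr


def test_licenses():
    """pytest entry point: fail if any installed package is not on the allowlist."""
    outcome = run_license_check()
    if outcome is None:
        import pytest  # imported lazily so the helpers work without pytest

        pytest.skip("pip-licenses is not installed")
    ok, output = outcome
    assert ok, output
```

Passing the arguments as a list avoids having to quote the semicolons in the allowlist for a shell.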

b2m commented

> While having support for JavaScript is certainly interesting, the tools seem to require Bower/Yarn/or npm(?) for that, and switching to that is maybe a bit overkill for the three JS dependencies :) (Might do it anyway because of #2 someday.)

Yes (packaging tool), Yes (overkill) and Yes (switching someday) =)

> I thought --fail-on supports blacklisting, but I'd prefer whitelisting anyway.

No idea why I had this in my notes... I struck it out in my original comment.

๐Ÿ‘ Note that we're currently using CircleCI and while I'm not super passionate about it I am super passionate about not
switiching CI systems every few months ;-)

Why switching? Just use all in parallel 😆


Regarding the integration of pip-licenses:

  1. The "cleanest" way would be to introduce version pinning (maybe with the help of pip-tools) and only run license-checks when requirements.txt changes.
  2. The "fastest" way (regarding integration) would be to run a license check as an extra step after each test run on each Python version.
  3. A compromise would be to have a license-check workflow restricted e.g. to the master branch.

I've added a pre-commit hook for this in 3233dbc.