numpy/numpy-release

Improve how we bundle extra licenses into wheels


Right now we concatenate the licenses for libgfortran et al. into the main LICENSE.txt file, here:

# Update license: append the platform-specific bundled-library licenses
echo "" >> "$NUMPY_SRC_DIR/LICENSE.txt"
echo "----" >> "$NUMPY_SRC_DIR/LICENSE.txt"
echo "" >> "$NUMPY_SRC_DIR/LICENSE.txt"
if [[ $RUNNER_OS == "Linux" ]]; then
    cat "$PROJECT_DIR/tools/wheels/LICENSE_linux.txt" >> "$NUMPY_SRC_DIR/LICENSE.txt"
elif [[ $RUNNER_OS == "macOS" ]]; then
    cat "$PROJECT_DIR/tools/wheels/LICENSE_osx.txt" >> "$NUMPY_SRC_DIR/LICENSE.txt"
elif [[ $RUNNER_OS == "Windows" ]]; then
    cat "$PROJECT_DIR/tools/wheels/LICENSE_win32.txt" >> "$NUMPY_SRC_DIR/LICENSE.txt"
fi

That is quite ugly. It'd be much better to use PEP 639's support for separate license files and add them directly into .dist-info. @oscarbenjamin shared a script that can do this in a comment on numpy/numpy#29535.

The script is here, and it is "called" from the pyproject.toml. Nice indeed.

Feel free to copy/adapt the script and include under whichever of numpy's licenses/copyright is most convenient (I didn't realise there were so many!).

Looking at it now, one way the script could go wrong is if the text License-File: appears anywhere in the long free-text description at the bottom of the METADATA file (the whole of python-flint's README is in there).
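To illustrate that failure mode, here is a minimal sketch (not the python-flint script) that reads License-File entries only from the header section of METADATA, using the standard library's email parser. It assumes the description is stored as the message body after the first blank line; the package name and file names are hypothetical.

```python
from email import message_from_string

# Hypothetical METADATA content. The headers end at the first blank line;
# the long description (the README) follows as the message body.
METADATA = """\
Metadata-Version: 2.4
Name: example-package
Version: 1.0
License-File: LICENSE.txt
License-File: licenses/LICENSE_libgfortran.txt

Example package
===============

A README that happens to mention the text
License-File: not-a-real-header.txt
in its body.
"""


def license_files(metadata_text: str) -> list[str]:
    """Return the License-File header values, ignoring the description body."""
    msg = message_from_string(metadata_text)
    # get_all() only looks at headers, so body text is never matched.
    return msg.get_all("License-File") or []


print(license_files(METADATA))
```

A plain line-by-line search for "License-File:" would also pick up the line inside the README, which is the problem described above.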

Looking now, the packaging library has support for license expressions and metadata files, so it would probably be better to use that rather than parsing the files manually:
https://packaging.pypa.io/en/stable/licenses.html
https://packaging.pypa.io/en/stable/metadata.html

The wheel package is awkward in that it only has a CLI, but I'm not sure there is any better alternative. You can't just unzip and rezip, because you need to update the file hashes in RECORD.
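As a rough sketch of why a plain unzip and rezip is not enough: each RECORD line holds the file's path, its sha256 digest (urlsafe base64, padding stripped), and its size in bytes, so editing METADATA invalidates its entry. The helper name and file contents below are hypothetical, and the sketch skips the CSV quoting that paths containing commas would need.

```python
import base64
import hashlib


def record_entry(archive_path: str, data: bytes) -> str:
    """Build one RECORD line: path, sha256 digest, and size in bytes."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses urlsafe base64 with the trailing "=" padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{archive_path},sha256={encoded},{len(data)}"


# Hypothetical METADATA content before and after adding a license entry.
original = b"Metadata-Version: 2.4\nName: example-package\n"
edited = original + b"License-File: licenses/LICENSE_extra.txt\n"

path = "example_package-1.0.dist-info/METADATA"
print(record_entry(path, original))
print(record_entry(path, edited))  # different hash and size
```

Any newly added license file needs its own RECORD line as well, which is the bookkeeping that repacking with the wheel CLI takes care of.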

That script is good enough for python-flint right now though. The proper solution would be a dedicated tool for this, integrated with auditwheel et al. It should also verify that each of the bundled libs is accounted for by one of the licenses, in case auditwheel sucks up some random other file from the manylinux image (which is also a security issue). The one thing I might add in python-flint is something that verifies there is nothing unexpected in python-flint.libs after running auditwheel.
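A check like that could look something like the sketch below, which lists the files under a wheel's .libs directory and reports any whose name does not start with an expected prefix. Everything here is an assumption for illustration: the function name, the allowlist, the directory name, and the library file names are hypothetical, and matching by name prefix is only one possible policy.

```python
import io
import zipfile


def unexpected_bundled_libs(wheel_file, libs_dir, allowed_prefixes):
    """Return files under libs_dir whose names match none of allowed_prefixes."""
    unexpected = []
    with zipfile.ZipFile(wheel_file) as zf:
        for name in zf.namelist():
            # Skip entries outside the libs directory and directory entries.
            if not name.startswith(libs_dir + "/") or name.endswith("/"):
                continue
            filename = name.rsplit("/", 1)[-1]
            if not filename.startswith(tuple(allowed_prefixes)):
                unexpected.append(name)
    return unexpected


# Build a tiny in-memory stand-in for a repaired wheel (hypothetical names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("example_package.libs/libgfortran-abc123.so.5", b"")
    zf.writestr("example_package.libs/libmystery-def456.so.1", b"")
    zf.writestr("example_package/__init__.py", b"")

print(unexpected_bundled_libs(buf, "example_package.libs",
                              ["libgfortran", "libquadmath"]))
```

Run in CI after auditwheel, a non-empty result could fail the build, so that a newly vendored library cannot ship without someone adding its license.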