FHPythonUtils/LicenseCheck

Bug: failed parsing requirements.txt

NicolaDonelli opened this issue · 1 comment

Bug

The newly released version introduced a bug in how requirements.txt files are read.

Describe the bug

With a requirements file of the form (generated by pip-tools):

#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile --no-emit-index-url --output-file=requirements/tmp.txt subset.in
#
aiohttp==3.8.5
    # via
    #   -c requirements/requirements_dev.txt
    #   langchain
aiosignal==1.3.1
    # via
    #   -c requirements/requirements_dev.txt
    #   aiohttp
anyio==3.7.1
    # via
    #   -c requirements/requirements_dev.txt
    #   starlette
async-timeout==4.0.2
    # via
    #   -c requirements/requirements_dev.txt
    #   aiohttp
    #   langchain
attrs==21.4.0
    # via
    #   -c requirements/requirements_dev.txt
    #   aiohttp
certifi==2023.7.22
    # via
    #   -c requirements/requirements_dev.txt
    #   requests
cfg-load==0.9.0
    # via
    #   -c requirements/requirements_dev.txt
    #   py4ai-core
charset-normalizer==3.2.0
    # via
    #   -c requirements/requirements_dev.txt
    #   aiohttp
    #   requests
click==8.1.6
    # via
    #   -c requirements/requirements_dev.txt
    #   nltk
    #   uvicorn
cramjam==2.6.2
    # via
    #   -c requirements/requirements_dev.txt
    #   fastparquet
dataclasses-json==0.5.13
    # via
    #   -c requirements/requirements_dev.txt
    #   langchain
deprecated==1.2.14
    # via
    #   -c requirements/requirements_dev.txt
    #   py4ai-core
dnspython==2.4.1
    # via
    #   -c requirements/requirements_dev.txt
    #   pinecone-client
exceptiongroup==1.1.2
    # via
    #   -c requirements/requirements_dev.txt
    #   anyio
faiss-cpu==1.7.4
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
fastapi==0.100.1
    # via
    #   -c requirements/requirements_dev.txt
    #   fastapi-utils
    #   microservices-core
    #   microservices-hexagonal
fastapi-utils==0.2.1
    # via
    #   -c requirements/requirements_dev.txt
    #   microservices-core
    #   microservices-hexagonal
fastparquet==2023.7.0
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
filelock==3.12.2
    # via
    #   -c requirements/requirements_dev.txt
    #   huggingface-hub
    #   torch
    #   transformers
frozenlist==1.4.0
    # via
    #   -c requirements/requirements_dev.txt
    #   aiohttp
    #   aiosignal
fsspec==2023.6.0
    # via
    #   -c requirements/requirements_dev.txt
    #   fastparquet
    #   huggingface-hub
h11==0.14.0
    # via
    #   -c requirements/requirements_dev.txt
    #   uvicorn
hexagonal-core @ git+https://bitbucket.org/3rdplace/hexagonal-core@v0.0.2
    # via
    #   -c requirements/requirements_dev.txt
    #   hexagonal-repository-core
    #   microservices-hexagonal
hexagonal-repository-core @ git+https://bitbucket.org/3rdplace/hexagonal-repository-core@v0.0.2
    # via
    #   -c requirements/requirements_dev.txt
    #   hexagonal-repository-langchain
hexagonal-repository-langchain @ git+https://bitbucket.org/3rdplace/hexagonal-repository-langchain@v0.0.3
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
    #   hexagonal-repository-pinecone
hexagonal-repository-pinecone @ git+https://bitbucket.org/3rdplace/hexagonal-repository-pinecone@v0.0.3
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
huggingface-hub==0.16.4
    # via
    #   -c requirements/requirements_dev.txt
    #   sentence-transformers
    #   transformers
idna==3.4
    # via
    #   -c requirements/requirements_dev.txt
    #   anyio
    #   requests
    #   yarl
jinja2==3.1.2
    # via
    #   -c requirements/requirements_dev.txt
    #   torch
joblib==1.3.1
    # via
    #   -c requirements/requirements_dev.txt
    #   nltk
    #   scikit-learn
langchain==0.0.246
    # via
    #   -c requirements/requirements_dev.txt
    #   hexagonal-repository-langchain
    #   hexagonal-repository-pinecone
langsmith==0.0.15
    # via
    #   -c requirements/requirements_dev.txt
    #   langchain
loguru==0.7.0
    # via
    #   -c requirements/requirements_dev.txt
    #   pinecone-client
markupsafe==2.1.3
    # via
    #   -c requirements/requirements_dev.txt
    #   jinja2
marshmallow==3.20.1
    # via
    #   -c requirements/requirements_dev.txt
    #   dataclasses-json
microservices-core @ git+https://bitbucket.org/3rdplace/microservice-core@v0.0.1
    # via
    #   -c requirements/requirements_dev.txt
    #   microservices-hexagonal
microservices-hexagonal @ git+https://bitbucket.org/3rdplace/microservice-hexagonal@v0.0.2
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
mpmath==1.3.0
    # via
    #   -c requirements/requirements_dev.txt
    #   sympy
mpu[io]==0.23.1
    # via
    #   -c requirements/requirements_dev.txt
    #   cfg-load
multidict==6.0.4
    # via
    #   -c requirements/requirements_dev.txt
    #   aiohttp
    #   yarl
mypy-extensions==1.0.0
    # via
    #   -c requirements/requirements_dev.txt
    #   typing-inspect
networkx==3.1
    # via
    #   -c requirements/requirements_dev.txt
    #   torch
nltk==3.8.1
    # via
    #   -c requirements/requirements_dev.txt
    #   sentence-transformers
numexpr==2.8.4
    # via
    #   -c requirements/requirements_dev.txt
    #   langchain
numpy==1.25.1
    # via
    #   -c requirements/requirements_dev.txt
    #   fastparquet
    #   langchain
    #   numexpr
    #   pandas
    #   pinecone-client
    #   scikit-learn
    #   scipy
    #   sentence-transformers
    #   torchvision
    #   transformers
openapi-schema-pydantic==1.2.4
    # via
    #   -c requirements/requirements_dev.txt
    #   langchain
packaging==23.1
    # via
    #   -c requirements/requirements_dev.txt
    #   fastparquet
    #   huggingface-hub
    #   marshmallow
    #   transformers
pandas==2.0.3
    # via
    #   -c requirements/requirements_dev.txt
    #   fastparquet
    #   py4ai-core
pillow==10.0.0
    # via
    #   -c requirements/requirements_dev.txt
    #   torchvision
pinecone-client==2.2.2
    # via
    #   -c requirements/requirements_dev.txt
    #   hexagonal-repository-pinecone
py4ai-core==1.0.0
    # via
    #   -c requirements/requirements_dev.txt
    #   hexagonal-core
    #   hexagonal-repository-core
    #   hexagonal-repository-langchain
    #   microservices-core
    #   microservices-hexagonal
pydantic==1.10.12
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
    #   fastapi
    #   fastapi-utils
    #   hexagonal-repository-langchain
    #   hexagonal-repository-pinecone
    #   langchain
    #   langsmith
    #   microservices-core
    #   microservices-hexagonal
    #   openapi-schema-pydantic
    #   py4ai-core
python-dateutil==2.8.2
    # via
    #   -c requirements/requirements_dev.txt
    #   pandas
    #   pinecone-client
python-multipart==0.0.6
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
pytz==2023.3
    # via
    #   -c requirements/requirements_dev.txt
    #   cfg-load
    #   mpu
    #   pandas
pyyaml==6.0.1
    # via
    #   -c requirements/requirements_dev.txt
    #   cfg-load
    #   huggingface-hub
    #   langchain
    #   pinecone-client
    #   transformers
regex==2023.6.3
    # via
    #   -c requirements/requirements_dev.txt
    #   nltk
    #   transformers
requests==2.31.0
    # via
    #   -c requirements/requirements_dev.txt
    #   cfg-load
    #   huggingface-hub
    #   langchain
    #   langsmith
    #   pinecone-client
    #   torchvision
    #   transformers
safetensors==0.3.1
    # via
    #   -c requirements/requirements_dev.txt
    #   transformers
scikit-learn==1.3.0
    # via
    #   -c requirements/requirements_dev.txt
    #   sentence-transformers
scipy==1.11.1
    # via
    #   -c requirements/requirements_dev.txt
    #   py4ai-core
    #   scikit-learn
    #   sentence-transformers
sentence-transformers==2.2.2
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
sentencepiece==0.1.99
    # via
    #   -c requirements/requirements_dev.txt
    #   sentence-transformers
six==1.16.0
    # via
    #   -c requirements/requirements_dev.txt
    #   cfg-load
    #   python-dateutil
sniffio==1.3.0
    # via
    #   -c requirements/requirements_dev.txt
    #   anyio
sqlalchemy==1.4.49
    # via
    #   -c requirements/requirements_dev.txt
    #   fastapi-utils
    #   langchain
starlette==0.27.0
    # via
    #   -c requirements/requirements_dev.txt
    #   fastapi
sympy==1.12
    # via
    #   -c requirements/requirements_dev.txt
    #   torch
tenacity==8.2.2
    # via
    #   -c requirements/requirements_dev.txt
    #   langchain
threadpoolctl==3.2.0
    # via
    #   -c requirements/requirements_dev.txt
    #   scikit-learn
tokenizers==0.13.3
    # via
    #   -c requirements/requirements_dev.txt
    #   transformers
tomli==2.0.1
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
    #   hexagonal-core
    #   hexagonal-repository-core
    #   hexagonal-repository-langchain
    #   hexagonal-repository-pinecone
    #   microservices-core
    #   microservices-hexagonal
    #   py4ai-core
torch==2.0.1
    # via
    #   -c requirements/requirements_dev.txt
    #   sentence-transformers
    #   torchvision
torchvision==0.15.2
    # via
    #   -c requirements/requirements_dev.txt
    #   sentence-transformers
tqdm==4.65.0
    # via
    #   -c requirements/requirements_dev.txt
    #   huggingface-hub
    #   nltk
    #   pinecone-client
    #   sentence-transformers
    #   transformers
transformers==4.31.0
    # via
    #   -c requirements/requirements_dev.txt
    #   sentence-transformers
typing-extensions==4.7.1
    # via
    #   -c requirements/requirements_dev.txt
    #   -r subset.in
    #   fastapi
    #   hexagonal-core
    #   hexagonal-repository-core
    #   hexagonal-repository-langchain
    #   hexagonal-repository-pinecone
    #   huggingface-hub
    #   microservices-core
    #   microservices-hexagonal
    #   pinecone-client
    #   py4ai-core
    #   pydantic
    #   torch
    #   typing-inspect
    #   uvicorn
typing-inspect==0.9.0
    # via
    #   -c requirements/requirements_dev.txt
    #   dataclasses-json
tzdata==2023.3
    # via
    #   -c requirements/requirements_dev.txt
    #   pandas
tzlocal==5.0.1
    # via
    #   -c requirements/requirements_dev.txt
    #   mpu
urllib3==2.0.4
    # via
    #   -c requirements/requirements_dev.txt
    #   pinecone-client
    #   requests
uvicorn==0.23.1
    # via
    #   -c requirements/requirements_dev.txt
    #   microservices-core
    #   microservices-hexagonal
wrapt==1.15.0
    # via
    #   -c requirements/requirements_dev.txt
    #   deprecated
yarl==1.9.2
    # via
    #   -c requirements/requirements_dev.txt
    #   aiohttp

# The following packages are considered to be unsafe in a requirements file:
# setuptools

the current code at lines 89-90 of licensecheck/get_deps.py

	for req in reqPath.read_text("utf-8").strip().split("\n"):
		reqs.add(resolveReq(req))

fails to resolve any requirement, because every line (blank lines and comment lines included) is passed to the new resolveReq function, defined at line 35 of the same file:

resolveReq = lambda req: pkg_resources.Requirement.parse(req).project_name.lower()

while the old implementation:

with open(reqPath, encoding="utf-8") as requirementsTxt:
	for req in requirements.parse(requirementsTxt):
		reqs.add(str(req.name).lower())

worked correctly.

Suggested solution

Replace lines 89-90 of licensecheck/get_deps.py

    for req in reqPath.read_text("utf-8").strip().split("\n"):
        reqs.add(resolveReq(req))

with:

    for req in reqPath.read_text("utf-8").strip().split("\n"):
        if len(req.strip()) > 0 and not req.strip().startswith("#"):
            reqs.add(resolveReq(req))

or, probably better, go back to the requirements module, which already parses requirements files correctly.
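
As a rough illustration of the filtering idea, here is a minimal standard-library sketch run against a shortened excerpt of the file above. The function name requirement_names and the regex are assumptions made for this example; the regex is only a hypothetical stand-in for resolveReq and is not what licensecheck actually does.

```python
import re

# Hypothetical stand-in for resolveReq: capture the leading project name
# and stop at the first character that cannot be part of a name
# (e.g. "=", "[", or whitespace).
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_names(text: str) -> set[str]:
    """Return lowercased project names, skipping blank and comment lines."""
    names = set()
    for line in text.splitlines():
        stripped = line.strip()
        # pip-compile emits indented "# via" annotations and blank lines;
        # neither is a requirement, so skip them before parsing.
        if not stripped or stripped.startswith("#"):
            continue
        match = _NAME.match(stripped)
        if match:
            names.add(match.group(1).lower())
    return names


sample = """\
#
# This file is autogenerated by pip-compile with Python 3.10
#
aiohttp==3.8.5
    # via
    #   -c requirements/requirements_dev.txt
    #   langchain
hexagonal-core @ git+https://bitbucket.org/3rdplace/hexagonal-core@v0.0.2
    # via
    #   -c requirements/requirements_dev.txt
    #   microservices-hexagonal
mpu[io]==0.23.1
    # via
    #   -c requirements/requirements_dev.txt
    #   cfg-load

# The following packages are considered to be unsafe in a requirements file:
# setuptools
"""

print(sorted(requirement_names(sample)))
# prints ['aiohttp', 'hexagonal-core', 'mpu']
```

Lines that do not start with a name character, such as option lines beginning with "-", are silently skipped by this sketch. A real fix would probably want a proper requirements parser rather than a regex.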

Thanks for this. And good spot! I'll include a more complex requirements file and write more regression tests!