nexB/aboutcode-toolkit

`check` command is too slow in version 7.0.x

tdruez opened this issue · 4 comments

$ pip install aboutcode_toolkit==6.0.0
$ time about check ./thirdparty/

-> 0.295s total

$ pip install aboutcode-toolkit==7.0.2
$ time about check ./thirdparty/

-> 34.902s total

We went from a super fast check operation to one that is now over 100 times slower.
When you check the validity of your files often, for example in pre-commit hooks, this is not a great user experience.

This issue started in version 7.0.0.

$ pip install aboutcode-toolkit==7.1.1

Traceback (most recent call last):
  File "bin/about", line 5, in <module>
    from attributecode.cmd import about
  File "lib/python3.9/site-packages/attributecode/cmd.py", line 35, in <module>
    from attributecode.attrib import check_template
  File "lib/python3.9/site-packages/attributecode/attrib.py", line 32, in <module>
    from attributecode.attrib_util import multi_sort
  File "lib/python3.9/site-packages/attributecode/attrib_util.py", line 18, in <module>
    from jinja2.filters import pass_environment
ImportError: cannot import name 'pass_environment' from 'jinja2.filters' (lib/python3.9/site-packages/jinja2/filters.py)
make: *** [check] Error 1

It looks like aboutcode-toolkit depends on functions that were not available in Jinja before version 3.0.0:
https://github.com/pallets/jinja/blob/1b714c7e82c73575d1dba48f560db07fe9a5cb74/CHANGES.rst#version-300

The requirements should be updated to reflect that: https://github.com/nexB/aboutcode-toolkit/blob/develop/setup.cfg#L65
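Until the requirement declares a minimum version (for example `jinja2 >= 3.0.0`), an older Jinja can still be installed alongside the toolkit. The sketch below shows one way to detect that case up front instead of failing with the ImportError above. The helper names `parse_version` and `jinja_is_compatible` are hypothetical and not part of aboutcode-toolkit; the only fact relied on is that `pass_environment` was introduced in Jinja 3.0.0.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# `pass_environment` first appeared in Jinja 3.0.0.
MIN_JINJA = (3, 0, 0)


def parse_version(text: str) -> tuple:
    """Turn an 'X.Y.Z' version string into a comparable tuple of ints.

    Assumption: only the leading numeric components are used, so
    pre-release suffixes such as '3.0.0rc1' are ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    # Missing minor/patch components count as 0, so '3.1' == '3.1.0'.
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def jinja_is_compatible(text: str) -> bool:
    """Return True if this Jinja version provides `pass_environment`."""
    return parse_version(text) >= MIN_JINJA


if __name__ == "__main__":
    try:
        installed = version("Jinja2")
    except PackageNotFoundError:
        print("Jinja2 is not installed")
    else:
        status = "OK" if jinja_is_compatible(installed) else "too old, need >= 3.0.0"
        print(f"Jinja2 {installed}: {status}")
```

Declaring the minimum version in the package requirements is still the real fix; a runtime check like this only turns a confusing traceback into a clearer message.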


$ pip install aboutcode_toolkit==6.0.0
$ time about check ./thirdparty/

-> 0.29s total

$ pip install aboutcode-toolkit==7.1.1
$ time about check ./thirdparty/

-> 24.63s total

This is still about 85 times slower than the performance of version 6.0.0.

I will look into the Jinja dependencies.

Regarding performance:
in v6.0.0, the check command only validates the basic "syntax" with the collect_inventory function:
https://github.com/nexB/aboutcode-toolkit/blob/v6.0.0/src/attributecode/cmd.py#L485
It does not check the license values, i.e. it will pass even if the license_expression field contains an invalid key such as "mitt".

In the current version, on the other hand, it validates the license keys used in license_expression, and that is why it takes so much longer:
https://github.com/nexB/aboutcode-toolkit/blob/develop/src/attributecode/cmd.py#L720
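To make the difference between the two behaviors concrete, here is a minimal sketch of a check split into a cheap syntax pass and an optional license-key pass. Everything in it is hypothetical: `check_about`, `license_keys`, the hard-coded `KNOWN_LICENSE_KEYS` set, and the assumption that `about_resource` and `name` are the required fields. It only shows where the extra work sits; it does not reproduce why the real validation in the toolkit is slow.

```python
import re

# Hypothetical stand-in for the real license data used by the toolkit.
KNOWN_LICENSE_KEYS = {"mit", "apache-2.0", "gpl-2.0", "bsd-new"}

# Operators that may appear in a license expression (assumed subset).
OPERATORS = {"and", "or", "with"}


def license_keys(expression: str) -> list:
    """Split a license expression into lowercase license keys, dropping operators."""
    tokens = re.findall(r"[A-Za-z0-9.+\-]+", expression)
    return [t.lower() for t in tokens if t.lower() not in OPERATORS]


def check_about(data: dict, validate_licenses: bool = True) -> list:
    """Return a list of error messages for one parsed ABOUT record."""
    errors = []
    # Syntax-level pass: the kind of check v6.0.0 limited itself to.
    for field in ("about_resource", "name"):
        if not data.get(field):
            errors.append(f"missing required field: {field}")
    # Semantic pass: every key in license_expression must be a known license.
    if validate_licenses and data.get("license_expression"):
        for key in license_keys(data["license_expression"]):
            if key not in KNOWN_LICENSE_KEYS:
                errors.append(f"unknown license key: {key}")
    return errors
```

With a record whose license_expression is "mitt", `check_about(record)` reports `unknown license key: mitt`, while `check_about(record, validate_licenses=False)` returns no errors, matching the v6.0.0 behavior described above.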

I would then suggest adding an option to skip license validation.

Good suggestion. I will add an option for that.
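A rough sketch of what such an opt-out could look like at the command-line level. The `--no-license-check` flag and the `build_parser`/`run_check` helpers are hypothetical, and argparse is used only to keep the example self-contained; it is not how the `about` CLI is actually wired. The point is that the flag simply toggles the expensive license pass while the syntax pass always runs.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical `check` subcommand."""
    parser = argparse.ArgumentParser(prog="about check")
    parser.add_argument("location", help="path to the ABOUT files to check")
    # Hypothetical flag: skip the slow license-key validation step.
    parser.add_argument(
        "--no-license-check",
        action="store_true",
        help="only validate ABOUT file syntax; do not validate license keys",
    )
    return parser


def run_check(argv=None) -> dict:
    """Parse arguments and report which validation passes would run."""
    args = build_parser().parse_args(argv)
    return {
        "location": args.location,
        "syntax": True,  # the cheap syntax pass always runs
        "licenses": not args.no_license_check,
    }
```

Making license validation the default and the skip an explicit opt-out keeps the stricter behavior of 7.x for regular use, while fast pre-commit hooks can pass the flag.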