oss-review-toolkit/ort

Improve Python support to adhere to Python versions

fviernau opened this issue · 18 comments

For Python projects it is hard (impossible) to determine to which Python version they apply.
ORT has some mechanism which tries (1) Python 2.x and (2) Python 3. In particular, for the latter Python 3.6 is used.

When installing dependencies via pip install -r requirements.txt from pypi.org, pip considers only the dependencies which are compatible with the used Python version, 3.6.

For example in a Python 3.7 project this can lead to the following two issues:

  1. A dependency cannot be resolved by ORT, while it is actually resolvable using the appropriate Python version. This
    happens if none of the versions allowed by the constraints is compatible with Python 3.6.
  2. Even worse, the wrong dependency version is resolved. This happens e.g. if the newest 3.6-compatible version allowed by
    the constraints does not equal the newest 3.7-compatible version. In the worst case, the project uses the
    latest version of a dependency while ORT resolves a really old (Python 3.6 compatible) version of that dependency.
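
To make the two failure modes concrete, here is a minimal sketch of how the interpreter version changes the outcome. The release table and the `newest_compatible` helper are hypothetical and only model a simple ">=X.Y" Python requirement per release; this is not how pip or ORT implement resolution.

```python
# Hypothetical release data for one package:
# release version -> minimum Python version it supports.
RELEASES = {
    "1.0.0": (2, 7),
    "1.5.0": (3, 5),
    "2.0.0": (3, 6),
    "3.0.0": (3, 7),
}


def version_key(version):
    """Turn "1.5.0" into (1, 5, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def newest_compatible(releases, python_version, minimum=None):
    """Return the newest release usable on python_version, or None.

    `minimum` models a ">=" constraint from requirements.txt.
    """
    candidates = [
        version
        for version, min_python in releases.items()
        if min_python <= python_version
        and (minimum is None or version_key(version) >= version_key(minimum))
    ]
    return max(candidates, key=version_key, default=None)


# Issue 2: different interpreters silently resolve different versions.
print(newest_compatible(RELEASES, (3, 6)))  # 2.0.0
print(newest_compatible(RELEASES, (3, 7)))  # 3.0.0

# Issue 1: the constraint is only satisfiable on Python 3.7.
print(newest_compatible(RELEASES, (3, 6), minimum="3.0.0"))  # None
```

The same requirement therefore either fails outright or resolves to an older release, depending only on which interpreter runs the resolution.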

At least for setup.py-based projects, there might be hints which Python version to use. For example, the classifiers may contain something like

classifiers=[
    "Programming Language :: Python",
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
],

This at least tells you the minimum version to use. Similarly, a line like

python_requires=">=3.6",

might be contained. However, both types of entries are a bit inconvenient to parse.
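
For illustration, a rough sketch of what extracting a version hint from these two kinds of entries could look like once the values are available as strings. The helper names are hypothetical, and the `python_requires` handling deliberately covers only plain `>=` clauses; a real implementation would need a full version-specifier parser.

```python
import re

# Matches classifiers that name a concrete "major.minor" version.
CLASSIFIER_RE = re.compile(r"^Programming Language :: Python :: (\d+)\.(\d+)$")


def versions_from_classifiers(classifiers):
    """Return the sorted (major, minor) versions named by the classifiers."""
    versions = []
    for classifier in classifiers:
        match = CLASSIFIER_RE.match(classifier.strip())
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return sorted(versions)


def lower_bound_from_python_requires(spec):
    """Return (major, minor) of the first ">=" clause, or None if there is none."""
    for clause in spec.split(","):
        match = re.match(r"^\s*>=\s*(\d+)(?:\.(\d+))?", clause)
        if match:
            return (int(match.group(1)), int(match.group(2) or 0))
    return None


classifiers = [
    "Programming Language :: Python",
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
]
print(versions_from_classifiers(classifiers)[0])     # (3, 6)
print(lower_bound_from_python_requires(">=3.6"))     # (3, 6)
```

Clauses such as `~=3.6` or `!=3.0.*` are simply ignored by this sketch, which is part of why these entries are awkward to handle outside of Python tooling.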

But even if the Python version to use were known, another problem is that you actually need that version to be installed. At least for our Dockerfile it's impractical to install all sorts of Python versions beforehand, just in case they might be needed. Maybe bootstrapping the right Python version on the fly would be an option, but doing that right for all supported platforms is a lot of effort.

A completely different idea that I have in mind is to generally switch over to using GraalVM for ORT, and leverage its Python support to run pip programmatically, and somehow "pretending" to be the right Python version.

Ping @pombredanne, being the Python guy here, do you have any good idea how to solve this problem?

@sschuberth one way is to rely on python_requires=">=3.6" in a modern setup.cfg or setup.py. Classifiers are weak information otherwise.

If you want to go the dynamic pip route, in any case pip would resolve based on its current interpreter version and not some other version of Python. Note that pip now uses this bundled resolver https://github.com/sarugaku/resolvelib which you could also invoke directly and be in control of which OS/Python version/interpreter/arch combo you are emulating.

With that said there is no absolute right answer as each of these combos may resolve to different deps versions and different packages, unless you work from a lockfile, as each dep may be tagged like here

So to recap: each combo of

  • Python interpreter (CPython, Pypy)
  • Python version (3.6 to 3.9 at least today)
  • OS (Windows, Linux, Mac, etc)
  • Architecture (x86, x86_64, ARM, etc)
    ... may yield a different set of:
  • built/bundled dependencies (included in a built "wheel")
  • external dependencies (resolved and fetched from PyPI)
    ... which furthermore can change over time, unless pinned/locked down.
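
As a small illustration of the combo above, this sketch collects the values for the interpreter it runs on, using the standard library calls that PEP 508 defines for the corresponding environment markers. The `current_target` function name is made up for this example.

```python
import platform
import sys


def current_target():
    """Describe the running interpreter using PEP 508 marker names."""
    return {
        # Interpreter, e.g. "cpython" or "pypy".
        "implementation_name": sys.implementation.name,
        # Python version as "major.minor".
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        # OS, e.g. "Linux", "Windows" or "Darwin".
        "platform_system": platform.system(),
        # Architecture, e.g. "x86_64".
        "platform_machine": platform.machine(),
    }


target = current_target()
for name, value in target.items():
    print(f"{name}: {value}")
```

A resolver that wants to emulate another environment would have to substitute all of these values, not just the Python version.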

That said, this is nothing really specific to Python AFAIK, just made more visible here.

However, both types of entries are a bit inconvenient to parse.

ScanCode knows how to parse these alright AFAIK.

Note that pip now uses this bundled resolver https://github.com/sarugaku/resolvelib

Does pip really use that library as-is? I thought I read somewhere that it's rather the other way around, with resolvelib being a stand-alone reimplementation of pip's (new) resolver mechanism.

However, both types of entries are a bit inconvenient to parse.

ScanCode knows how to parse these alright AFAIK.

Sure, because ScanCode is written in Python itself 😋

Does pip really use that library as-is?

See https://github.com/pypa/pip/tree/0ffff034f376feef189cf32cfba56ddd3a472c70/src/pip/_vendor/resolvelib

The resolvelib resolver library is "vendored" as-is in pip from what I can see (and I reckon I did study the code ;) ). FWIW, the vendoring is because pip cannot have external dependencies itself and needs things bundled to be able to bootstrap standalone.

Sure, because ScanCode is written in Python itself

Actually that's not really an advantage when we do static analysis. There is of course a clear and obvious advantage to using Python overall ;) but in the case of Python manifests proper, setup.cfg is an .ini format, and except for setup.py, which needs some code lexing and finicking (as Ruby does, and as Groovy and Kotlin would), most everything else is either RFC822-, TOML- or JSON-formatted, so reasonably easy to get to.
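
To illustrate the .ini point, a minimal sketch that reads the Python hints from a setup.cfg with nothing but the standard library. The sample file content and the `read_python_hints` helper are made up for this example; any INI parser in another language would do the same job.

```python
import configparser

# Hypothetical setup.cfg content, inlined to keep the example self-contained.
SETUP_CFG = """\
[metadata]
name = example
classifiers =
    Programming Language :: Python :: 3.6
    Programming Language :: Python :: 3.7

[options]
python_requires = >=3.6
"""


def read_python_hints(text):
    """Return (python_requires, classifiers) from setup.cfg text."""
    # Interpolation is disabled so that "%" in values is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    python_requires = parser.get("options", "python_requires", fallback=None)
    raw = parser.get("metadata", "classifiers", fallback="")
    # Multi-line values arrive as one string with embedded newlines.
    classifiers = [line.strip() for line in raw.splitlines() if line.strip()]
    return python_requires, classifiers


python_requires, classifiers = read_python_hints(SETUP_CFG)
print(python_requires)  # >=3.6
print(classifiers)
```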

Actually that's not really an advantage when we do static analysis.

Except that you can more easily use the same Python libraries / functions as pip itself when parsing setup.py.

There is of course a clear and obvious advantage to use Python overall ;)

You came to the wrong place for that statement 😄 When we meet in person again, I'll explain to you why I basically like Python's syntax, but its ecosystem (and esp. dependency management) just sucks. I mean, there obviously are not even means to just query the transitive dependency tree without actually downloading all the binaries (that you neither need nor are interested in) and pretending to do a build (which is not what you want to do). But let's continue that discussion elsewhere 😉

Any progress on this?

Why is this "inconvenient to parse"?

classifiers=[
    "Programming Language :: Python",
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
]

There is always the nice ast package you can use without executing the parsed code...
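
A minimal sketch of that idea, assuming the setup.py passes literal values directly to a `setup(...)` call. The sample source and the `setup_keywords` helper are made up for this example; values that are computed at runtime are skipped rather than evaluated.

```python
import ast

# Hypothetical setup.py content, inlined to keep the example self-contained.
SETUP_PY = '''
from setuptools import setup

setup(
    name="example",
    python_requires=">=3.6",
    classifiers=[
        "Programming Language :: Python",
        "Programming Language :: Python :: 3.6",
    ],
)
'''


def setup_keywords(source, wanted=("python_requires", "classifiers")):
    """Collect literal keyword arguments of setup() without running the code."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handles both setup(...) and setuptools.setup(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        for keyword in node.keywords:
            if keyword.arg in wanted:
                try:
                    found[keyword.arg] = ast.literal_eval(keyword.value)
                except (ValueError, TypeError):
                    pass  # Not a plain literal, e.g. a variable or function call.
    return found


hints = setup_keywords(SETUP_PY)
print(hints["python_requires"])  # >=3.6
print(hints["classifiers"])
```

This only helps from within Python, of course; a tool written in another language would still have to shell out to an interpreter or lex the file itself.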

I ran into this issue with the analyzer, which was running Python 3.6 while the project was developed for a newer Python version. It had requirements like numpy~=1.20.3, which requires at least Python 3.7:

07:54:22.146 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running 'virtualenv --version' in '/'...
07:54:22.202 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.analyzer.managers.Pip - Resolving PIP dependencies for '/project/requirements.txt'...
07:54:22.203 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.analyzer.managers.Pip - Creating a virtualenv for the 'project' project directory...
07:54:22.206 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running 'python3 /tmp/python_compatibility9712707363750178.py -d /project' in '/'...
07:54:29.849 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.analyzer.managers.Pip - Trying to install dependencies using Python 3...
07:54:29.852 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running 'virtualenv /tmp/ort-project-virtualenv122357601095579656 -p /usr/bin/python3' in '/project'...
07:54:39.758 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running '/tmp/ort-project-virtualenv122357601095579656/bin/pip --trusted-host pypi.org --trusted-host pypi.python.org install pip==18.0' in '/project'...
07:54:41.992 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running '/tmp/ort-project-virtualenv122357601095579656/bin/pip --trusted-host pypi.org --trusted-host pypi.python.org install pipdeptree==0.13.2' in '/project'...
07:54:42.769 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running '/tmp/ort-project-virtualenv122357601095579656/bin/pip --trusted-host pypi.org --trusted-host pypi.python.org install --no-warn-conflicts --prefer-binary -r requirements.txt' in '/project'...
07:54:45.662 [DefaultDispatcher-worker-3] ERROR org.ossreviewtoolkit.utils.ProcessCapture - Running '/tmp/ort-project-virtualenv122357601095579656/bin/pip --trusted-host pypi.org --trusted-host pypi.python.org install --no-warn-conflicts --prefer-binary -r requirements.txt' in '/project' failed with exit code 1:
  Could not find a version that satisfies the requirement numpy~=1.20.3 (from -r requirements.txt (line 13)) (from versions: 1.3.0, 1.4.1, 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.6.2, 1.7.0, 1.7.1, 1.7.2, 1.8.0, 1.8.1, 1.8.2, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.10.0.post2, 1.10.1, 1.10.2, 1.10.4, 1.11.0, 1.11.1, 1.11.2, 1.11.3, 1.12.0, 1.12.1, 1.13.0rc1, 1.13.0rc2, 1.13.0, 1.13.1, 1.13.3, 1.14.0rc1, 1.14.0, 1.14.1, 1.14.2, 1.14.3, 1.14.4, 1.14.5, 1.14.6, 1.15.0rc1, 1.15.0rc2, 1.15.0, 1.15.1, 1.15.2, 1.15.3, 1.15.4, 1.16.0rc1, 1.16.0rc2, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.17.0rc1, 1.17.0rc2, 1.17.0, 1.17.1, 1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.18.0rc1, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4, 1.18.5, 1.19.0rc1, 1.19.0rc2, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5)
No matching distribution found for numpy~=1.20.3 (from -r requirements.txt (line 13))
You are using pip version 18.0, however version 21.1.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

07:54:45.663 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.analyzer.managers.Pip - Falling back to trying to install dependencies using Python 2...
07:54:45.665 [DefaultDispatcher-worker-3] INFO  org.ossreviewtoolkit.utils.ProcessCapture - Running 'virtualenv /tmp/ort-project-virtualenv1856848556700345242 -p /usr/bin/python2' in '/project'...

Considering that the analyzer is creating a virtual environment, it would be nice if it could select a Python version.
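
As a rough sketch of what such a selection could look like: the log above shows the analyzer calling `virtualenv <dir> -p /usr/bin/python3`, so picking a version would mostly mean passing a different interpreter path to `-p`. The `virtualenv_command` helper and its parameters are hypothetical and not part of ORT; it assumes interpreters are installed under names like `python3.7` on the PATH.

```python
import shutil


def virtualenv_command(env_dir, python_version, which=shutil.which):
    """Build a virtualenv command for a "major.minor" Python version.

    Returns None if no matching interpreter is found on the PATH.
    `which` is injectable so the lookup can be replaced in tests.
    """
    interpreter = which(f"python{python_version}")
    if interpreter is None:
        return None
    return ["virtualenv", env_dir, "-p", interpreter]


command = virtualenv_command("/tmp/ort-project-virtualenv", "3.7")
if command is None:
    print("No python3.7 interpreter found on PATH")
else:
    print(" ".join(command))
```

This still requires the requested interpreter to be installed, which is exactly the limitation discussed earlier in this thread.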

For my case I'll try running the analyzer with a newer Python version and running the Scanner with Python 3.6 as is required by Scancode toolkit. This will make it difficult to have an all-in-one Docker image though.

For my case I'll try running the analyzer with a newer Python version and running the Scanner with Python 3.6 as is required by Scancode toolkit. This will make it difficult to have an all-in-one Docker image though.

The current version of ScanCode TK supports all Python versions from 3.6 to 3.9, so I am not sure what the issue is. We release the app bundle with support for Python 3.6 only for now; maybe that's what is used here?

To help the discussion, I looked into the available Python versions in Ubuntu, as that is currently used in the Dockerfile:
[image: available Python versions per Ubuntu release]

For example the information on the default Python3 package in Focal can be found on the package page

Considering that not all Python3.6+ versions are available in Bionic, perhaps we can create multiple Docker image variants for the analyzer with different tags?

But considering the discussion above, it isn't necessary to have a complete Python installation, as long as the packages can be resolved for a specified Python version. That would make it far more flexible.

@pombredanne my bad, I assumed this was still also an issue with ScanCode TK as there was another issue on that. Sorry for speaking bad about it then. I'll try switching to Focal to see if the entire setup works. If ScanCode is not holding us back, we could create multiple Docker image versions for specific Python versions for now.

@nicorikken re:

Sorry for speaking bad about it then

You did not! :P

we could create multiple Docker image versions for specific Python versions for now.

I chatted with @sschuberth about this a few weeks ago and there was an alternative... but I forgot which one!

I started reworking the Docker image to Ubuntu 20.04, but now I see @heliocastro has already made this effort and has also worked on multiple Python versions: #3613 #3902. I'll look into that work to see if it unblocks me. But on this issue, if we can avoid relying on system-wide Python versions for resolving Python dependencies, that would be great.

Great work on ORT, it's an awesome project! I'm struggling with this issue. Wouldn't it be easier to be able to configure the project's Python version explicitly, for example with an ENV variable as @heliocastro proposes in #3902, or with some other specific scanner YAML configuration?

I just updated the new Docker image with the final solution using pyenv, so please jump over to #3902 to take a look.
With pyenv we can now install all the versions you want; I just set one, 3.8.11, as the default.
The README in the commit explains how to install extra versions, after which pyenv can be used as usual.

Some updates that are likely relevant here: https://github.com/nexB/python-inspector is now out and has been designed specifically to be integrated in ort. See also nexB#1 that we are refining there first before submitting to ort proper here.

With https://github.com/nexB/python-inspector you can point it to requirements, pass it a target Python version and a target OS/architecture, and it will resolve the dependency tree as pip does, but without installing any of the packages. The target Python version and OS can differ from the runtime Python and OS. It internally uses the same resolution library as pip (resolvelib) and a pip requirements parser that we extracted from pip and that is now in use in several other tools, including CycloneDX.

@mnonnenmacher @fviernau can we already close this as 1208225 made the Python version configurable?