pypa/pip

New resolver takes a very long time to complete

nijel opened this issue Β· 192 comments

nijel commented

What did you want to do?

One of CI jobs for Weblate is to install minimal versions of dependencies. We use requirements-builder to generate the minimal version requirements from the ranges we use normally.

The pip install -r requirements-min.txt command seems to loop infinitely after some time. This started happening with 20.3; before that, it worked just fine.

Output

Requirement already satisfied: google-auth<2.0dev,>=1.21.1 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate==3.0.0->-r requirements-min.txt (line 63)) (1.23.0)
Requirement already satisfied: pytz>dev in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from celery[redis]==4.4.5->-r requirements-min.txt (line 3)) (2020.4)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate==3.0.0->-r requirements-min.txt (line 63)) (1.52.0)
Requirement already satisfied: six>=1.9.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from bleach==3.1.1->-r requirements-min.txt (line 1)) (1.15.0)
Requirement already satisfied: protobuf>=3.12.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate==3.0.0->-r requirements-min.txt (line 63)) (3.14.0)
Requirement already satisfied: grpcio<2.0dev,>=1.29.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate==3.0.0->-r requirements-min.txt (line 63)) (1.33.2)
Requirement already satisfied: google-auth<2.0dev,>=1.21.1 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate==3.0.0->-r requirements-min.txt (line 63)) (1.23.0)
Requirement already satisfied: pytz>dev in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from celery[redis]==4.4.5->-r requirements-min.txt (line 3)) (2020.4)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate==3.0.0->-r requirements-min.txt (line 63)) (1.52.0)
Requirement already satisfied: six>=1.9.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from bleach==3.1.1->-r requirements-min.txt (line 1)) (1.15.0)
Requirement already satisfied: protobuf>=3.12.0 in /opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages (from google-api-core[grpc]<2.0.0dev,>=1.22.0->google-cloud-translate==3.0.0->-r requirements-min.txt (line 63)) (3.14.0)

This seems to repeat forever (well for 3 hours so far, see https://github.com/WeblateOrg/weblate/runs/1474960864?check_suite_focus=true)

Additional information

Requirements file triggering this: requirements-min.txt

It takes quite some time until it gets to the above loop. There is most likely something problematic in the dependency set...

I'm going to use this issue to centralize incoming reports of situations that seemingly run for a long time, instead of having each one end up in its own issue or scattered around.

@jcrist said in #8664 (comment)

Note: I was urged to comment here about our experience from twitter.

We (prefect) are a bit late on testing the new resolver (only getting around to it with the 20.3 release). We're finding that install times are now in the 20+ min range (I've actually never had one finish), previously this was at most a minute or two. The issue here seems to be in the large search space (prefect has loads of optional dependencies, for CI and some docker images we install all of them) coupled with backtracking.

I enabled verbose logs to try to figure out what the offending package(s) were but wasn't able to make much sense of them. I'm seeing a lot of retries for some dependencies with different versions of setuptools, as well as different versions of boto3. For our CI/docker builds we can add constraints to speed things up (as suggested here), but we're reluctant to increase constraints in our setup.py as we don't want to overconstrain downstream users. At the same time, we have plenty of novice users who are used to doing pip install prefect[all_extras] - telling them they need to add additional constraints to make this complete in a reasonable amount of time seems unpleasant. I'm not sure what the best path forward here is.

I've uploaded verbose logs from one run here (killed after several minutes of backtracking). If people want to try this themselves, you can run:

pip install "git+https://github.com/PrefectHQ/prefect.git#egg=prefect[all_extras]"

Any advice here would be helpful - for now we're pinning pip to 20.2.4, but we'd like to upgrade once we've figured out a solution to the above. Happy to provide more logs or try out suggestions as needed.

Thanks for all y'all do on pip and pypa!

These might end up being resolved by #9185

Thanks, @dstufft.

I'll mention here some useful workaround tips from the documentation -- in particular, the first and third points may be helpful to folks who end up here:

  • If pip is taking longer to install packages, read Dependency resolution backtracking for ways to reduce the time pip spends backtracking due to dependency conflicts.

  • If you don’t want pip to actually resolve dependencies, use the --no-deps option. This is useful when you have a set of package versions that work together in reality, even though their metadata says that they conflict. For guidance on a long-term fix, read Fixing conflicting dependencies.

  • If you run into resolution errors and need a workaround while you’re fixing their root causes, you can choose the old resolver behavior using the flag --use-deprecated=legacy-resolver. This will work until we release pip 21.0 (see Deprecation timeline).
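
For example (assuming a requirements.txt; both flags are as described in the tips above):

# install a known-good set of pins without resolving dependencies
pip install --no-deps -r requirements.txt

# or temporarily fall back to the old resolver (available until pip 21.0)
pip install --use-deprecated=legacy-resolver -r requirements.txt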

nijel commented

For my case, the problematic behavior can be reproduced much faster with pip install 'google-cloud-translate==3.0.0' 'requests==2.20.0' 'setuptools==36.0.1', so it sounds like #9185 might improve it.

The legacy resolver bails out on this quickly with: google-auth 1.23.0 requires setuptools>=40.3.0, but you'll have setuptools 36.0.1 which is incompatible.

One other idea toward this is, stopping after 100 backtracks (or something) with a message saying "hey, pip is backtracking due to conflicts on $package a lot".

I wonder how much time is taken up by downloading and unzipping versus actually taking place in the resolver itself?

I wonder how much time is taken up by downloading and unzipping versus actually taking place in the resolver itself?

Most of it, last I checked. Unless we're hitting some very bad graph situation, in which case... 🀷 the users are better off giving pip the pins.

I'm having our staff fill out that google form wherever they can, but I just want to mention that pretty much all of our builds are experiencing issues with this. Things that worked fine and had a build time of about 90 seconds are now timing out our CI builds. In theory we could increase the timeout, but we're paying for these machines by the minute, so having all of our builds take a huge amount of time longer is a painful choice. We've switched over to enforcing the legacy resolver on all of our builds for now.

As a general note to users reaching this page, please read https://pip.pypa.io/en/stable/user_guide/#dependency-resolution-backtracking.

I was asked to add some more details from twitter, so here are some additional thoughts. Right now the four solutions to this problem are:

  1. Just wait for it to finish
  2. Use trial and error methods to reduce versions checked using constraints
  3. Record and reuse those trial error methods in a new "constraints.txt" file
  4. Reduce the number of supported versions "during development"

Waiting it out is literally too expensive to consider

This solution seems to rely on downloading an epic ton of packages. In the era of the cloud, this means:

  • Larger harddrives are needed to store the additional packages
  • More bandwidth is consumed downloading these packages
  • It takes longer to process everything due to the need to decompress these packages

These all cost money, although the exact balance will depend on the packages (people struggling with a beast like tensorflow might choke on the hard drive and bandwidth, while people with smaller packages just get billed for the build time).

What's even more expensive is the developer time wasted during an operation that used to take (literally) 90s that now takes over 20 minutes (it might take longer but it times out on our CI systems).

We literally can't afford to use this dependency resolution system.

Trial and error constraints are extremely burdensome

This adds a whole new set of processes to everyone's dev cycle where not only do they have to do the normal dev work, but now they need to optimize the black box of this resolver. Even the advice on the page is extremely trial and error, basically saying to start with the first package giving you trouble and continue iterating until your build times are reasonable.

Adding more config files complicates an already overcomplicated ecosystem.

Right now we already have to navigate the differences between setup.py, requirements.txt, setup.cfg, and pyproject.toml, and now adding in constraints.txt just adds even more burden (and confusion) on maintaining python packages.

Reducing versions checked during development doesn't scale

Restricting versions during development but releasing the package without those constraints means that the users of that package are going to have to reinvent those constraints themselves during development. If I install a popular package my build times could explode until I duplicate their efforts. There's no way to share those constraints other than copy/paste methods, which adds to the maintenance burden.

What this is ultimately going to result in is people not using constraints at all, instead limiting the dependency versions directly, based not on actual compatibility but on a mix of compatibility and build times. This will make it harder to support smaller packages in the long term.

Most of it, last I checked.

Might be a good reason to prioritize pypi/warehouse#8254

Might be a good reason to prioritize pypi/warehouse#8254

Definitely. And a sdist equivalent when PEP 643 is approved and implemented.

This solution seems to rely on downloading an epic ton of packages

It doesn't directly rely on downloading, but it does rely on knowing the metadata for packages, and for various historical reasons, the only way to get that data is currently by downloading (and in the case of source distributions, building) the package.

That is a huge overhead, although pip's download cache helps a lot here (maybe you could persist pip's cache in your CI setup?). On the plus side, it only hits hard in cases where there are a lot of dependency restrictions (where the "obvious" choice of the latest version of a package is blocked by a dependency from another package), and it's only tended to be really significant in cases where there is no valid solution anyway (although this is not always immediately obvious - the old resolver would happily install invalid sets of packages, so the issue looks like "old resolver worked, new one fails" when it's actually "old one silently broke stuff, new one fails to install instead").

This doesn't help you address the issue, I know, but hopefully it gives some background as to why the new resolver is behaving as it is.

@tedivm please look into using pip-tools to perform dependency resolution as a separate step from deployment. It's essentially point 4 -- "local" dependency resolution with the deployment only seeing pinned versions.

Actually, it would be an interesting experiment to see. These pathological cases that people are experimenting with: if they let the resolver complete once, persist the cache, and then try again, is it faster? If it's still hours long even with a cache, then that suggests pypi/warehouse#8254 isn't going to help much.

I don't know what we're doing now exactly, but I also wonder if it would make sense to stop exhaustively searching the versions after a certain point. This would basically be a trade off of saying that we're going to start making assumptions about how dependencies evolve over time. I assume we're currently basically starting with the latest version, and iterating backwards one version at a time, is that correct? If so, what if we did something like:

  1. Iterate backwards one version at a time until we fail resolution X times.
  2. Start a binary search, cut the remaining candidates in half and try with that.
    2a. If it works, start taking the binary search towards the "newer" side (cut that in half, try again, etc).
    2b. If it fails, start taking the binary search towards the "older" side (cut that in half, try again, etc).

This isn't exactly the correct use of a binary search, because the list of versions isn't really "sorted" in that way, but it would kind of function similarly to git bisect? The biggest problem with it is it will skip over good versions if the latest N versions all fail, and the older half of versions all fail, but the middle half are "OK".
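
As a rough sketch of that idea (all names here are made up, and works() stands in for a full resolution attempt, which is much more involved in practice):

def bisect_candidates(versions, works):
    """Bisect a newest-first version list for the newest version that works.

    Mirrors git bisect: a failure moves the window older, a success newer.
    As noted above, this misses "islands" of good versions in the middle.
    """
    lo, hi = 0, len(versions) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            best = versions[mid]
            hi = mid - 1   # success: see if anything newer also works
        else:
            lo = mid + 1   # failure: move toward older versions
    return best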

Another possible idea is instead of a binary search, do a similar idea but instead of bucketing the version set in halves, try to bucket them into buckets that match their version "cardinality". IOW, if this has a lot of major versions, bucket them by major version, if it has few major versions, but a lot of minor versions, bucket it by that, etc. So that you divide up the problem space, then start iterating backwards trying the first (or the last?) version in each bucket until you find one that works, then constrain the solver to just that bucket (and maybe one bucket newer if you're testing the last version instead of first?).
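
A minimal sketch of the bucketing part, assuming PEP 440-style versions (the depth rule is just one possible reading of "cardinality"; packaging is the library pip vendors for version handling):

from packaging.version import Version

def bucket_versions(versions):
    """Group a newest-first list of version strings into buckets by major
    version, or by (major, minor) when there's only one major release line."""
    parsed = [Version(v) for v in versions]
    majors = {v.release[0] for v in parsed}
    depth = 1 if len(majors) > 1 else 2   # components in the bucket key
    buckets = {}
    for v in parsed:
        buckets.setdefault(v.release[:depth], []).append(str(v))
    return buckets  # then try one candidate per bucket before descending

print(bucket_versions(["2.1.0", "2.0.8", "1.9.3", "1.0.0"]))
# {(2,): ['2.1.0', '2.0.8'], (1,): ['1.9.3', '1.0.0']}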

I dunno, it seems like exhaustively searching the space is the "correct" thing to do if you want to always come up with the answer if one exists anywhere, but if we can't make that fast enough, even with changes to warehouse etc, we could probably try to be smart about using heuristics to narrow the search space, under the assumption that version ranges typically don't change that often and when they do, they don't often change every single release.

Maybe if we go into heuristics mode, we emit a warning that we're doing it, suggest people provide more information to the solver, etc. Maybe provide a flag like --please-be-exhaustive-its-ok-ill-wait to disable the heuristics.

Maybe we're already doing this and I'm just dumb :)

We're not doing it, and you're not dumb :-) But it's pretty hard to do stuff like that - most resolution algorithms I've seen are based on the assumption that getting dependency data is cheap (many aren't even usable by pip because they assume all dependency info is available from the start). So we're getting into "designing new algorithms for well-known hard CS problems" territory :-(

Another possible idea is instead of a binary search, do a similar idea but instead of bucketing the version set in halves, try to bucket them into buckets that match their version "cardinality". IOW, if this has a lot of major versions, bucket them by major version, if it has few major versions, but a lot of minor versions, bucket it by that, etc.

Some resolvers I surveyed indeed do this, especially from ecosystems that promote semver heavily (IIRC Cargo?) since major version bumps there imply more semantics, so this is at least a somewhat charted territory.

The Python community does not generally adhere to semver that strictly, but we may still be able to do it, since the resolver never promised to return the best solution, only a good enough one (i.e. if both 2.0.1 and 1.9.3 satisfy, the resolver does not have to choose 2.0.1).

The other part is how we handle failure-to-build. With our current processes, we'd have to get build deps and do the build (or at best call prepare_metadata_for_build_wheel) to get the info.

With binary search-like semantics, we'd have to be lenient about build failures and allow pip to attempt-to-use a different version of the package on failures (compared to outright failing as we do today).

Maybe provide a flag like --please-be-exhaustive-its-ok-ill-wait to disable the heuristics.

I think stopping after we've backtracked 100+ times and saying "hey, this is taking too long. Help me by reducing versions of $packages, or tell me to try harder with --option." is something we can do relatively easily now.

If folks are on board with this, let's pick a number (I've said 100, but I pulled that out of the air) and add this in?
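
A minimal sketch of what that cut-off could look like inside a backtracking loop (the number 100, the state object's methods, and ResolutionTooDeep are all illustrative, not pip internals):

class ResolutionTooDeep(Exception):
    """Raised when the resolver gives up after too many backtracks."""

def resolve(state, max_backtracks=100):
    """Drive a backtracking resolver, bailing out after max_backtracks.

    `state` is any object with complete()/next_candidate()/pin()/backtrack();
    it stands in for pip's real resolution state, which is more involved.
    """
    backtracks = 0
    conflict_counts = {}  # package name -> times it forced a backtrack
    while not state.complete():
        candidate = state.next_candidate()
        if candidate is not None:
            state.pin(candidate)
            continue
        culprit = state.backtrack()   # dead end: undo the last pin
        conflict_counts[culprit] = conflict_counts.get(culprit, 0) + 1
        backtracks += 1
        if backtracks > max_backtracks:
            worst = max(conflict_counts, key=conflict_counts.get)
            raise ResolutionTooDeep(
                f"pip is backtracking a lot due to conflicts on {worst}; "
                "add constraints, or raise the limit to keep trying.")
    return state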

Do we have a good sense of whether these cases where it takes a really long time to solve are typically cases where there is no answer and it's taking a long time to exhaustively search the space because our slow time per candidate means it takes hours... or are these cases where there is a successful answer, but it just takes us a while to get there?

nijel commented

@dstufft in my case, there was no suitable solution (see #9187 (comment)). I guessed which dependencies might be problematic, and with a reduced set of packages it doesn't take that long and produces the expected error. With the full requirements-min.txt it didn't complete in hours.

With nearly 100 pinned dependencies, the space to search is enormous, and pip ends up (maybe) infinitely printing "Requirement already satisfied:" when trying to search for some solution (see https://github.com/WeblateOrg/weblate/runs/1474960864?check_suite_focus=true for the long log, it was killed after some hours). I just realized that the CI process is slightly more complex than what I've described - it first installs packages based on the ranges, then generates a list of minimal versions and tries to adjust the existing virtualenv. That's probably where the "Requirement already satisfied" logs come from.

The problematic dependency chain in my case was:

  • google-cloud-translate==3.0.0 from command line
  • setuptools==36.0.1 from command line
  • google-api-core[grpc] >= 1.22.0, < 2.0.0dev from google-cloud-translate==3.0.0
  • google-auth >= 1.19.1, < 2.0dev from google-api-core
  • setuptools>=40.3.0 from google-auth (any version in the range)

In the end, I think the problem is that it tries to find a solution in areas where there can't be any. With a full pip cache:

$ time pip install  google-cloud-translate==3.0.0 setuptools==36.0.1
...

real	0m6,206s
user	0m5,136s
sys	0m0,242s
$ time pip install  google-cloud-translate==3.0.0 setuptools==36.0.1 requests==2.20.0
...

real	0m28,724s
user	0m25,162s
sys	0m0,283s

In this case, adding requests==2.20.0 (which can be installed without any problem with either of the dependencies) multiplies the time nearly five times. This is caused by pip looking at different chardet and certifi versions for some reason.

Do we have a good sense of whether these cases where it takes a really long time to solve are typically cases where there is no answer and it's taking a long time to exhaustively search the space because our slow time per candidate means it takes hours... or are these cases where there is a successful answer, but it just takes us a while to get there?

I'm pretty sure in prefect's case with [all_extras] it's because no solution exists, but I haven't yet been able to determine what the offending package(s) are. At some point I'll sit down and iteratively add dependencies on to the base install until things slow down, just need to find the time.

Tips on interpreting the logs might be useful here - I can see what packages pip is searching through, but it's not clear what constraint is failing, leading to this search.


Regarding the few comments above about giving up after a period or using heuristics/assumptions about version schemes - for most things I've worked on, a valid install is usually:

  • All packages use the most recent versions (e.g. most recent A, B, and C)
  • Except if some dependency's most recent release breaks, in which case we usually fix things pretty quick to make it work and use a fairly recent release of the broken one (e.g. latest A and B, C is 1 or 2 releases old).

Rarely will the install I'm looking for be "the most recent versions of A and B, plus a release of C from 3 years ago". The one case where I might want this is if I'm debugging something, or trying to recreate an old environment, but in that case I'd usually specify that I want C=some-old-version directly rather than having the solver do it for me.

@brainwane asked me to post my case here from #9126. TLDR: the new resolver is (only) 3x slower in my case.

Basically, I use
pip list --format freeze | sed 's/==.*//' | xargs --no-run-if-empty pip install --upgrade --upgrade-strategy eager
to convert my manual environment (after adding and removing packages, upgrading, downgrading, trying out things) into something that is as up to date as possible. That failed with the old resolver, but works great with the new one. It upgrades old packages that can be upgraded, and downgrades packages that are too new for some other package.

The only thing I wondered about is how much slower the new resolver was. It's about a factor of 3 (42 vs 13 seconds, using pip==20.3 with and without --use-deprecated legacy-resolver). I thought that maybe network requests would be the main issue, but pip list --outdated takes only about 20s with the exact same number of GET requests (125). I was wondering how pip could spend ~30s just on resolving versions, but again, in the context of this thread, I begin to understand what the problem is.

Feel free to use or ignore this comment as you see fit ;)

> time pip list --outdated
Package           Version Latest Type
----------------- ------- ------ -----
gast              0.3.3   0.4.0  wheel
grpcio            1.32.0  1.33.2 wheel
h5py              2.10.0  3.1.0  wheel
lazy-object-proxy 1.4.3   1.5.2  wheel
protobuf          3.13.0  3.14.0 wheel

real    0m19.373s
user    0m19.718s
sys     0m0.721s


> time pip list --format freeze | sed 's/==.*//' | xargs --no-run-if-empty pip install --upgrade --upgrade-strategy eager

[...]

real    0m41.655s
user    0m38.308s
sys     0m1.786s


> time pip list --format freeze | sed 's/==.*//' | xargs --no-run-if-empty pip install --upgrade --upgrade-strategy eager
> --use-deprecated legacy-resolver

[...]

Successfully installed gast-0.4.0 grpcio-1.33.2 h5py-3.1.0 lazy-object-proxy-1.5.2 protobuf-3.14.0

real    0m13.064s
user    0m10.804s
sys     0m0.391s


> time pip list --format freeze | sed 's/==.*//' | xargs --no-run-if-empty pip install --upgrade --upgrade-strategy eager

[...]

Successfully installed gast-0.3.3 grpcio-1.32.0 h5py-2.10.0 lazy-object-proxy-1.4.3 protobuf-3.13.0

real    0m42.860s
user    0m39.015s
sys     0m2.000s

Do we have a good sense of whether these cases where it takes a really long time to solve are typically cases where there is no answer and it's taking a long time to exhaustively search the space because our slow time per candidate means it takes hours... or are these cases where there is a successful answer, but it just takes us a while to get there?

Well, it works with the old resolver without error but not with the new one- does that answer the question?

That is a huge overhead, although pip's download cache helps a lot here (maybe you could persist pip's cache in your CI setup?). On the plus side, it only hits hard in cases where there are a lot of dependency restrictions (where the "obvious" choice of the latest version of a package is blocked by a dependency from another package), and it's only tended to be really significant in cases where there is no valid solution anyway (although this is not always immediately obvious - the old resolver would happily install invalid sets of packages, so the issue looks like "old resolver worked, new one fails" when it's actually "old one silently broke stuff, new one fails to install instead").

This doesn't help you address the issue, I know, but hopefully it gives some background as to why the new resolver is behaving as it is.

We do persist the cache, but it literally never finishes.

I do understand how the resolver works, but my point is that understanding it doesn't make the problem go away. This level of overhead is literally orders of magnitude more than the previous version's - or any other package manager's out there.

I understand the legacy decisions that had to be supported here, but frankly, until the issue of performance is addressed, this version of the resolver should not be the default. PyPI should be sending out the already computed dependency data, not forcing us to build dozens of packages over and over again to generate the same data that hundreds of other people are also regenerating. I understand that this is in the roadmap, pending funding, but it's my opinion that this resolver is not ready for production until this issue is addressed.

I have to leave my thoughts here: I agree with @tedivm that this resolver is not ready for production. The UX of having pip run for tens of minutes with no useful output is terrible. Right now pip is producing an ungodly amount of duplicative text (which is probably slowing down the search): Requirement already satisfied: ....

If the resolver fails on the first attempt (I assume pip tries to install the latest versions), I think pip should print out the constraint violations immediately. Or add options to limit the search to N attempts, or only make M attempts for a given package. And maybe after some number of attempts pip should print the situation with the least amount of constraint violations. As it stands I have to just Ctrl-C pip when it runs for too long (10 minutes is too long) and I get no useful information from having waited.

Installing my Python packages takes too long, and then the Jenkins CI/CD pipeline fails after 2 hrs.

Collecting amqp==2.5.2
14:27:30 Downloading amqp-2.5.2-py2.py3-none-any.whl (49 kB)
14:27:30 Collecting boto3==1.16.0
14:27:30 Downloading boto3-1.16.0-py2.py3-none-any.whl (129 kB)
14:27:30 Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /root/.local/lib/python3.6/site-packages (from boto3==1.16.0->gehc-edison-ai-container-support==3.5.0) (0.10.0)
14:27:30 Requirement already satisfied: s3transfer<0.4.0,>=0.3.0 in /root/.local/lib/python3.6/site-packages (from boto3==1.16.0->gehc-edison-ai-container-support==3.5.0) (0.3.3)
14:27:30 Collecting botocore==1.19.26
14:27:30 Downloading botocore-1.19.26-py2.py3-none-any.whl (6.9 MB)
14:27:30 Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /root/.local/lib/python3.6/site-packages (from botocore==1.19.26->gehc-edison-ai-container-support==3.5.0) (2.8.1)
14:27:30 Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /root/.local/lib/python3.6/site-packages (from boto3==1.16.0->gehc-edison-ai-container-support==3.5.0) (0.10.0)
14:27:30 Requirement already satisfied: urllib3<1.27,>=1.25.4 in /root/.local/lib/python3.6/site-packages (from botocore==1.19.26->gehc-edison-ai-container-support==3.5.0) (1.25.11)
14:27:30 Collecting celery==5.0.2
14:27:30 Downloading celery-5.0.2-py3-none-any.whl (392 kB)
14:27:30 INFO: pip is looking at multiple versions of botocore to determine which version is compatible with other requirements. This could take a while.
14:27:31 INFO: pip is looking at multiple versions of boto3 to determine which version is compatible with other requirements. This could take a while.
14:27:31 INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
14:27:31 INFO: pip is looking at multiple versions of amqp to determine which version is compatible with other requirements. This could take a while.

Why is pip looking at multiple versions?

Is there any resolution for this?

@Nishanksingla I see one item in the output you have copied here:

14:27:31 INFO: pip is looking at multiple versions of to determine which version is compatible with other requirements. This could take a while.

Is that literally what pip output, or did you remove the name of a package?

Also, I recommend that you take a look at the tips and guidance in this comment.

I updated the comment; GitHub was not showing <Python from Requires-Python>

Well, it works with the old resolver without error but not with the new one- does that answer the question?

No. The old resolver would regularly resolve to a set of dependencies that violated the dependency constraints. The new resolver is slower, in part, because it stops doing that, and part of the work to stop doing that makes things slower (partially for reasons that are unique to the history of Python packaging).

PyPI should be sending out the already computed dependency data, not forcing us to build dozens of packages over and over again to generate the same data that hundreds of other people are also regenerating. I understand that this is in the roadmap, pending funding, but it's my opinion that this resolver is not ready for production until this issue is addressed.

This is not actually possible in all cases.

Basically we have wheels, which have statically computed dependency information. Getting it currently requires downloading a wheel from PyPI and extracting the information from that wheel. We currently have plans to fix that in Warehouse.

Wheels are the easy case... the problem then comes down to sdists. Historically, sdists can have completely dynamic dependency information, something like this:

import random

from setuptools import setup

setup(
    install_requires=[random.choice(["urllib3", "requests"])]
)

is a completely valid (although silly) setup.py file, where it isn't possible to pre-compute the set of dependencies. A more serious example would be one that introspects the current running environment, and adjusts the set of dependencies based on what it's discovered about the current environment. This could sometimes be as mundane as the OS or the Python version (which in modern times we have static ways to express, but not everyone is using those yet), or things like what C libraries exist on a system, or something like that.
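
For example, a setup.py along these lines is legal, and its dependencies can only be known by running it on the target machine (the packages named are just for illustration):

import sys

from setuptools import setup

setup(
    # Computed at build time on whatever machine runs setup.py; modern
    # environment markers could express this particular case statically,
    # but nothing forces a project to use them.
    install_requires=(
        ["pywin32"] if sys.platform == "win32" else ["python-daemon"]
    ),
)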

Thus for sdists, we end up with some cases that could have static dependency information (but currently don't, though we have a plan for it) and some that cannot and will not, and in the latter cases backtracking through those choices is basically always going to be slow.

So our hope here is to speed up the common cases by having what static sets of dependencies we can compute be available as part of the repository API, but there's always a chance that some project can exist in a state that triggers this slow behavior even with those improvements (and this can happen with all package managers that use a system like this; however, Python is in a worse position because of our dynamic dependency capability).

I think it's probably true that more people are hitting the "bad case" than expected, which is one of the reasons I asked whether these slow runs eventually end with a resolved set of dependencies, or with an unresolvable set of dependencies. If they are typically errors, then it makes sense to just bomb out sooner with an error message, because our heuristic can be "if we have to backtrack more than N times, we're probably heading towards a failure". If they typically end with success, but it just takes a while to get there, then that suggests it would be better to invest in trying to make our heuristics for picking candidates smarter in some way, to try to arrive at a solution faster.

One thing which is surprising me is that I am not getting this issue on my system when I install my requirements.txt with pip 20.3 and python3.6 in a virtual environment.
But for the same requirements.txt I am getting the issue in my Jenkins pipeline.
Any ideas?

Well, it works with the old resolver without error but not with the new one- does that answer the question?
No. The old resolver would regularly resolve to a set of dependencies that violated the dependency constraints. The new resolver is slower, in part, because it stops doing that, and part of the work to stop doing that makes things slower (partially for reasons that are unique to the history of Python packaging).

When I say "without error" I was speaking literally - the violation you're saying could happen did not. Normally when it breaks things it says so - like you get a message saying something like "Package wants X but we installed Y instead". I am explicitly saying that we got no such message.

When we run the legacy resolver on the same set of dependencies as the new resolver, the legacy resolver comes back with a valid working set of dependencies in one minute and twenty-nine seconds, while the new resolver fails after timing out our CI systems with 20 minutes of nothing.

Any ideas?

Environmental differences perhaps? It's possible to have dependencies conditional on the environment that pip is running in (Python version, OS, platform, etc).

Tips on interpreting the logs might be useful here - I can see what packages pip is searching through, but it's not clear what constraint is failing, leading to this search.

There is an undocumented + unsupported option that I'd added for my own personal debugging: PIP_RESOLVER_DEBUG. No promises that it'll be in future releases or that there won't be a performance hit, but right now, you can probably use that. Moar output! :)

Normally when it breaks things it says so - like you get a message saying something like "Package wants X but we installed Y instead". I am explicitly saying that we got no such message.

Oh interesting! Are you sure you're not suppressing the error message? (there's a CLI option, env var or config file that can do this -- pip config list would help identify the last two)

If not, could you post reproduction instructions in a Github Gist perhaps, and link to that from here?

PS: I've worked on/written each of the components here - the warning, the old resolver and the new one, and AFAIK what you're describing shouldn't be possible unless I've derped real hard and no one else has noticed. ;)

My experience is the following:

  1. The new resolver with backtracking is straightforwardly too slow to use (would be very helpful if there were a flag to just hard fail it as soon as it starts to backtrack), so the obvious workaround is just to snapshot dependencies that we know work from a legacy pip freeze into a constraints.txt file as a stopgap. (God knows how we're going to regenerate that file, but that's a problem for another day).

  2. Uh oh, looks like we still have a conflict even though we know that the versions work, but luckily the project we depend on has fixed its dependencies on master, so let's just depend on the git URL. Ahh, cool, that doesn't work (#8210), those belong in requirements.txt.

  3. A few more issues, including a hard failure on bad metadata for a .post1 version (#9085 apparently didn't fix, or thinks this is a real failure) -- so now I'm manually editing the constraints.txt and adding comments explaining that this file is going to need to be maintained by hand going forwards.

  4. Everything seems resolved, and now I'm in an apparently infinite loop (who knows, I stopped it after 33k lines were printed to stdout) in which the following lines are printed over and over and over again:

Requirement already satisfied: google-auth<2.0dev,>=0.4.0 in /Users/max/.virtualenvs/internal/lib/python3.7/site-packages (from google-api-core<1.24,>=1.16.0->dbt-bigquery@ git+https://github.com/fishtown-analytics/dbt.git#egg=dbt-bigquery&subdirectory=plugins/bigquery->-r python_modules/elementl-data/requirements.txt (line 4)) (1.23.0)
Requirement already satisfied: google-auth<2.0dev,>=0.4.0 in /Users/max/.virtualenvs/internal/lib/python3.7/site-packages (from google-api-core<1.24,>=1.16.0->dbt-bigquery@ git+https://github.com/fishtown-analytics/dbt.git#egg=dbt-bigquery&subdirectory=plugins/bigquery->-r python_modules/elementl-data/requirements.txt (line 4)) (1.23.0)
Requirement already satisfied: six>=1.14.0 in /Users/max/.virtualenvs/internal/lib/python3.7/site-packages (from dbt-bigquery@ git+https://github.com/fishtown-analytics/dbt.git#egg=dbt-bigquery&subdirectory=plugins/bigquery->-r python_modules/elementl-data/requirements.txt (line 4)) (1.15.0)
Requirement already satisfied: requests<2.24.0,>=2.18.0 in /Users/max/.virtualenvs/internal/lib/python3.7/site-packages (from dbt-core@ git+https://github.com/fishtown-analytics/dbt.git#egg=dbt-core&subdirectory=core->-r python_modules/elementl-data/requirements.txt (line 3)) (2.23.0)
Requirement already satisfied: pytz>=2015.7 in /Users/max/.virtualenvs/internal/lib/python3.7/site-packages (from Babel>=2.0->agate<2,>=1.6->dbt-core@ git+https://github.com/fishtown-analytics/dbt.git#egg=dbt-core&subdirectory=core->-r python_modules/elementl-data/requirements.txt (line 3)) (2020.4)
Requirement already satisfied: googleapis-common-protos<1.53,>=1.6.0 in /Users/max/.virtualenvs/internal/lib/python3.7/site-packages (from dbt-bigquery@ git+https://github.com/fishtown-analytics/dbt.git#egg=dbt-bigquery&subdirectory=plugins/bigquery->-r python_modules/elementl-data/requirements.txt (line 4)) (1.6.0)
Requirement already satisfied: setuptools>=34.0.0 in /Users/max/.virtualenvs/internal/lib/python3.7/site-packages (from google-api-core<1.24,>=1.16.0->dbt-bigquery@ git+https://github.com/fishtown-analytics/dbt.git#egg=dbt-bigquery&subdirectory=plugins/bigquery->-r python_modules/elementl-data/requirements.txt (line 4)) (50.3.2)
Requirement already satisfied: six>=1.14.0 in /Users/max/.virtualenvs/internal/lib/python3.7/site-packages (from dbt-bigquery@ git+https://github.com/fishtown-analytics/dbt.git#egg=dbt-bigquery&subdirectory=plugins/bigquery->-r python_modules/elementl-data/requirements.txt (line 4)) (1.15.0)

Baffling. I've attached the requirements.txt, and constraints.txt. The setup.py runs as follows:

from setuptools import setup

setup(
    install_requires=[
        "boto3",
        "dagster_aws",
        "dagster_dbt",
        "dagster_gcp",
        "dagster_pandas",
        "dagster_slack",
        "dagster",
        "dagstermill",
        "dbt",
        "google-cloud-bigquery",
        "idna",
        "nltk",
        "pandas",
        "pybuildkite",
        "requests",
        "slackclient",
        "snowflake-sqlalchemy",
        "tenacity",
    ],
)

requirements.txt
constraints.txt

@mgasner I imagine you'd benefit from adopting pip-tools, and performing the dependency graph construction and dependency management as a separate step from installation. :)

Oh interesting! Are you sure you're not suppressing the error message? (there's a CLI option, env var or config file that can do this -- pip config list would help identify the last two)

This is in CircleCI so it's not trivial to run the command, but I've seen these messages before in CircleCI with these containers and we're not overriding things so I have no reason to believe we're suppressing anything.

If not, could you post reproduction instructions in a Github Gist perhaps, and link to that from here?

I do appreciate you all looking into it, and will definitely try to help replicate it- but since it involves some private libraries of ours (pulled from github repos) I'll have to put some effort in and can't promise it'll be quick.

@pradyunsg Yes, it's crystal clear that using the dependency resolver to resolve dependencies is a nonstarter, but that is not the issue I encountered here -- that's the starting point.

A note to everyone reporting problems here:

Hi. I'm sorry you're having trouble right now. Thank you for sharing your report with us. We're working on the multiple intertwined problems that people are reporting to us.

(If you don't mind, please also tell us what could have happened differently so you could have tested and caught and reported this during the resolver beta period.)

FYI: PEP-643 (Metadata for Package Source Distributions) has been approved. πŸš€

So earlier I predicted that this would force people to stop supporting valid versions of packages simply because of the dependency issues, not because of any actual programmatic problem with them. This is already happening:

[Screenshot omitted: Screen Shot 2020-12-02 at 12 05 03 PM]

This change is going to push people into being far, far more restrictive in the supported versions and that's going to have ramifications that I really hope people have considered.

This change is going to push people into being far, far more restrictive in the supported versions and that's going to have ramifications that I really hope people have considered.

The hope is that people would provide more reasonably strict requirements than the collective community traditionally prefers. It is very rare, when users ask for "requests" (for example), that really any version of requests would do; but Python packaging tools traditionally "help" the user out by naively settling on the newest possible version. My hope is that Python package users and maintainers alike would be able to provide more context when a requirement is specified; this would help all users, maintainers, and tool developers to provide a better co-operating environment.

One thing that might be worth considering is whether the reports of long resolution times share any common traits - the most obvious thing being a particular set of "troublesome" packages. I've seen botocore come up a lot in reports and I wonder whether it's got an unusually large number of releases, or has made more incompatible changes than other packages?

Obviously, it's not practical for us (the pip developers) to investigate packages on a case by case basis, but we need something more specific to get traction on the problem.

Maybe we could instrument pip to dump stats ("tried X versions of project A, Y versions of project B, ..., before failing/succeeding"), to a local file somewhere that we ask people to upload? But that's mostly what's in the log anyway, and it's less useful unless people let the command run to completion, so maybe it wouldn't be much additional help.

#9187 (comment)

One other idea toward this is, stopping after 100 backtracks (or something) with a message saying "hey, pip is backtracking due to conflicts on $packages a lot".

Let's do this -- and pick a number for this. And allow the user to pick a higher number from the CLI?

One thing to consider is how we count toward that number. Say X depends on Y. X==2.0 is pinned, Y is backtracked three times and ultimately all versions fail, so X is backtracked and pinned to X==1.0, where Y is backtracked another two times and finally finds a working version. Does Y now have a backtrack count of 3 or 5? I can think of reasons why either may be better than the other.
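
To make the two counting options concrete, a toy sketch (the event list just encodes the X/Y example above; none of this is pip's real bookkeeping):

from collections import Counter

# Backtrack events from the example: Y fails three times under X==2.0,
# X itself is then backtracked and re-pinned, and Y fails twice more.
events = ["Y", "Y", "Y", "X", "Y", "Y"]

# Option 1: a global, cumulative count -- Y ends at 5.
cumulative = Counter(events)

# Option 2: reset Y's running count whenever its parent X is re-pinned,
# and remember the longest run -- Y ends at 3.
longest_run, run = Counter(), Counter()
for pkg in events:
    if pkg == "X":
        run["Y"] = 0      # parent changed: Y's candidates are fresh again
    run[pkg] += 1
    longest_run[pkg] = max(longest_run[pkg], run[pkg])

print(cumulative["Y"], longest_run["Y"])   # 5 3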

nijel commented

I've seen botocore come up a lot in reports and I wonder whether it's got an unusually large number of releases, or has made more incompatible changes than other packages?

It indeed has an unusually high number of releases, it's being released nearly daily, see https://pypi.org/project/botocore/#history

It indeed has an unusually high number of releases, it's being released nearly daily

And as an example, it depends on python-dateutil>=2.1,<3.0.0. So if you try to install python-dateutil 3.0.0 and botocore, pip will have to backtrack through every release of botocore before it can be sure that there isn't one that works with dateutil 3.0.0.

Fundamentally, that's the scenario that's causing these long runtimes. We can't rule out that a version of botocore from years ago allowed any version of python-dateutil (even though 3.0.0 probably didn't even exist back then and in practice won't work with it), so we have to check. And worse still, if an ancient version of botocore does have an unconstrained dependency on python-dateutil, we could end up installing it with dateutil 3.0.0, and have a system that, while technically consistent, doesn't actually work.

The best fix is probably for the user to add a constraint telling pip not to consider versions of botocore earlier than some release that the user considers "recent". But pip can't reasonably invent such a constraint.
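
For illustration, such a user-supplied constraint could look like this (the cut-off release is the user's judgement call, which is exactly the part pip can't invent):

# constraints.txt -- the cut-off version here is illustrative
botocore>=1.19.0

pip install -r requirements.txt -c constraints.txt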

I've seen botocore come up a lot in reports and I wonder whether it's got an unusually large number of releases, or has made more incompatible changes than other packages?

The AWS packages are indeed released frequently, and probably the fact that they pin so strictly is the cause of the extensive backtracking. So it seems not only too-loose but also too-strict requirement specifications can cause problems for the resolver. There are some tricks to force conflicts fast (e.g. choosing the next package to solve based on the fewest versions still viable, and choosing packages that have the fewest dependencies, ref sdispater/mixology#5).
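
For illustration, the first trick could look something like this (a classic minimum-remaining-values ordering; the dict shapes are assumptions, not pip's internals):

def next_package(viable_versions, dependency_counts):
    """Pick the package to pin next: fewest viable versions first, ties
    broken by fewest known dependencies, so conflicts surface quickly."""
    return min(
        viable_versions,
        key=lambda pkg: (len(viable_versions[pkg]),
                         dependency_counts.get(pkg, 0)),
    )

# e.g. with botocore pinned exactly by awscli, botocore gets solved first:
print(next_package(
    {"botocore": ["1.17.44"], "boto3": ["1.14.44", "1.14.43", "1.14.42"]},
    {"botocore": 4, "boto3": 3},
))  # botocore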

The classic exploding example is combining a preferred boto3 version with any version of awscli, because boto3 is restrictive on botocore, and awscli is restrictive on botocore as well.

Some libraries try to solve this issue by providing extras_require, e.g. aiobotocore[awscli,boto3]==1.1.2 will enforce exact pins (awscli==1.18.121 boto3==1.14.44) that are known to be compatible with each other (botocore being the deciding factor here).

If either is requested without pin however, the resolver will have to consider a huge amount of versions to find one that requests overlapping botocore versions.

$ pipgrip --tree boto3==1.14.44 awscli==1.18.121
boto3==1.14.44 (1.14.44)
β”œβ”€β”€ botocore<1.18.0,>=1.17.44 (1.17.44)
β”‚   β”œβ”€β”€ docutils<0.16,>=0.10 (0.15.2)
β”‚   β”œβ”€β”€ jmespath<1.0.0,>=0.7.1 (0.10.0)
β”‚   β”œβ”€β”€ python-dateutil<3.0.0,>=2.1 (2.8.1)
β”‚   β”‚   └── six>=1.5 (1.15.0)
β”‚   └── urllib3<1.26,>=1.20 (1.25.11)
β”œβ”€β”€ jmespath<1.0.0,>=0.7.1 (0.10.0)
└── s3transfer<0.4.0,>=0.3.0 (0.3.3)
    └── botocore<2.0a.0,>=1.12.36 (1.17.44)
        β”œβ”€β”€ docutils<0.16,>=0.10 (0.15.2)
        β”œβ”€β”€ jmespath<1.0.0,>=0.7.1 (0.10.0)
        β”œβ”€β”€ python-dateutil<3.0.0,>=2.1 (2.8.1)
        β”‚   └── six>=1.5 (1.15.0)
        └── urllib3<1.26,>=1.20 (1.25.11)
awscli==1.18.121 (1.18.121)
β”œβ”€β”€ botocore==1.17.44 (1.17.44)
β”‚   β”œβ”€β”€ docutils<0.16,>=0.10 (0.15.2)
β”‚   β”œβ”€β”€ jmespath<1.0.0,>=0.7.1 (0.10.0)
β”‚   β”œβ”€β”€ python-dateutil<3.0.0,>=2.1 (2.8.1)
β”‚   β”‚   └── six>=1.5 (1.15.0)
β”‚   └── urllib3<1.26,>=1.20 (1.25.11)
β”œβ”€β”€ colorama<0.4.4,>=0.2.5 (0.4.3)
β”œβ”€β”€ docutils<0.16,>=0.10 (0.15.2)
β”œβ”€β”€ pyyaml<5.4,>=3.10 (5.3.1)
β”œβ”€β”€ rsa<=4.5.0,>=3.1.2 (4.5)
β”‚   └── pyasn1>=0.1.3 (0.4.8)
└── s3transfer<0.4.0,>=0.3.0 (0.3.3)
    └── botocore<2.0a.0,>=1.12.36 (1.17.44)
        β”œβ”€β”€ docutils<0.16,>=0.10 (0.15.2)
        β”œβ”€β”€ jmespath<1.0.0,>=0.7.1 (0.10.0)
        β”œβ”€β”€ python-dateutil<3.0.0,>=2.1 (2.8.1)
        β”‚   └── six>=1.5 (1.15.0)
        └── urllib3<1.26,>=1.20 (1.25.11)

@ddelange thanks for the analysis!

There are some tricks to force conflicts fast

We're exploring that option in #9211

The classic exploding example is combining a preferred boto3 version with any version of awscli, because boto3 is restrictive on botocore, and awscli is restrictive on botocore as well.

Unfortunately, I can't think of any way to address this without the package maintainers helping somehow (or users explicitly, and manually, constraining what versions they are willing to let pip consider).

Maybe we need a mechanism to mark versions as "too old to be worth considering by default". But that would need packaging standards to define how that information is exposed, and package maintainers to manage that information, so in practice I doubt it would be practical.

FYI: PEP-643 (Metadata for Package Source Distributions) has been approved. πŸš€

Ignoring the more platform-specific/legacy etc packages, would it theoretically become possible for pip to fetch all .whl.METADATA files for every version of a package in one big call to PyPI?

With proper caching both on pypa/warehouse servers side and on the pip-user side, it could be a huge speedup. As you mentioned earlier:

most resolution algorithms I've seen are based on the assumption that getting dependency data is cheap

@ddelange If the major cost is downloading+building packages, then yes. See pypi/warehouse#8254. :)

Edit: @dstufft discussed this at some length in #9187 (comment).
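
For what it's worth, PyPI's existing JSON API already exposes declared dependencies for releases that uploaded that metadata, which gives a feel for how cheap this lookup could be compared to downloading wheels (a minimal sketch with no error handling; requires_dist can be missing or null for some projects):

import json
from urllib.request import urlopen

def requires_dist(project, version):
    # One small JSON request instead of downloading and unpacking a wheel.
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urlopen(url) as resp:
        return json.load(resp)["info"].get("requires_dist") or []

print(requires_dist("boto3", "1.14.44"))
# e.g. ['botocore (<1.18.0,>=1.17.44)', 'jmespath (<1.0.0,>=0.7.1)', ...]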

2. Start a binary search, cut the remaining candidates in half and try with that.
    2a. If it works, start taking the binary search towards the "newer" side (cut that in half, try again, etc).
    2b. If it fails, start taking the binary search towards the "older" side (cut that in half, try again, etc).

This isn't exactly the correct use of a binary search, because the list of versions isn't really "sorted" in that way, but it would kind of function similarly to git bisect? The biggest problem with it is it will skip over good versions if the latest N versions all fail, and the older half of versions all fail, but the middle half are "OK".

I wonder if a noisy binary search/probabilistic bisection algorithm could make this approach more robust. https://github.com/choderalab/thresholds/blob/master/thresholds/bisect.py

And as an example, it depends on python-dateutil>=2.1,<3.0.0. So if you try to install python-dateutil 3.0.0 and botocore, pip will have to backtrack through every release of botocore before it can be sure that there isn't one that works with dateutil 3.0.0.

The best fix is probably for the user to add a constraint telling pip not to consider versions of botocore earlier than some release that the user considers "recent". But pip can't reasonably invent such a constraint.

Apologies if this has already been thought of and nixed

I wonder if pip could make an assumption here though.

Given packages A and B, where A depends on B: if A version 10 supports B <= 5, I think pip could assume that versions of A < 10 don't support versions of B > 5. In my experience, packages rarely use upper limits on dependencies. And when they do, those limits rarely decrease (usually maintainers bump versions up, not down). It seems unlikely that A==9 would support B<=6 and the next release would decrement that to B<=5. And if it did, the user could still get pip to solve this environment by providing an explicit constraint.

I think this would help the case where pip keeps backtracking on botocore versions - it can check the most recent version, see it doesn't support python-dateutil 3.0.0, and bail out early (since it assumes older versions of botocore don't support a newer version of python-dateutil).
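
A minimal sketch of that early bail-out, assuming the heuristic holds (packaging is the library pip vendors for requirement parsing; the function name is made up):

from packaging.requirements import Requirement

def worth_backtracking(newest_release_requirement, wanted_version):
    """Heuristic: if the newest release of A already rules out this version
    of B, assume all older releases of A rule it out too and stop early."""
    return wanted_version in Requirement(newest_release_requirement).specifier

# botocore's newest release declares python-dateutil>=2.1,<3.0.0, so there
# is no point walking back through hundreds of older botocore releases:
print(worth_backtracking("python-dateutil>=2.1,<3.0.0", "3.0.0"))  # False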

Maybe we need a mechanism to mark versions as "too old to be worth considering by default". But that would need packaging standards to define how that information is exposed, and package maintainers to manage that information, so in practice I doubt it would be practical.

@pfmoore I had a number of early failures where the dependency resolver was wandering back over a dozen package versions and crashing out when trying to get package metadata from packages so old that their setup.py wasn't compatible with Python 3...

  1. I'd like to hope the dependency resolver is being smart enough to not download a version that it knows is Python 2 only... not sure exactly the best way how, but just stating my sentiment on the matter.
  2. Giving up before downloading the n+1th version of a package (for an arbitrary reasonable value of n), with some kind of informative message about version constraints in the event of a failure, is probably smarter than downloading 100+ packages until the build times out.

Not specifically addressed at you @pfmoore, just my general feedback after dealing with (and researching into) the new resolver in the latest published release of pip for several (>6) hours today.

  1. The issue behind most people's issues seems to be less a case of "it couldn't find a valid solution" and more that the new resolver is being pathologically persistent in its effort to resolve things. It feels like the new resolver has no safeguards against obvious infinite loops and pathological worst-case running times. Why would anyone want to let pip run for over an hour as default behaviour? There are several suggestions along the lines of a "fail fast" or "give up early" option, which, when put alongside the issue of the resolver fetching dependency information for a single package going back 50 versions (see my lxml example in the next point) as it gets slower and slower over the course of an hour on one CI build... it doesn't feel like this was ready to be made the default.

  2. The only way I was able to get even close to acceptable performance was with --use-feature=fast-deps, but even this is a partial mitigation, as the dependency resolver will happily walk back so far that it can no longer get the information this way, and starts doing things the slow way again. My best example being lxml, which used the fast path from version 4.6.1 down to 3.7.2. Then the slower download-the-package-tarball method was in use from 3.7.1 onwards, until that CI run timed out after 60 minutes, having walked all the way back to 2.0.8.

  3. CI systems for docker containers frequently contain no pip cache, due to the limitations of various combinations of available docker build tools. Regardless of other issues going on, measuring performance of the resolver must include some kind of performance testing for this case: reasonable-size projects of at least 100 total dependencies with a 100% clean build environment, nothing cached.

    • For interested people troubleshooting their own performance problems with the new resolver, I've found one partial mechanism around this: https://stackoverflow.com/questions/58018300/using-a-pip-cache-directory-in-docker-builds. But it relies on experimental new features, so I'd caution anyone against relying on it. And even with this to pull the cache out from the docker environment onto the host, there is the issue of limitations imposed by CI systems that reside outside of the docker build itself... such as in my own case, where even if I did pull a pip cache out of the container onto the build host filesystem, the CI service would throw that filesystem content away within 60 minutes of the build box being idle.

Edit- fixed typos

Apologies if this has already been thought of and nixed

It's certainly something that's been implicit in our discussions, but we may never have explicitly covered it. So thanks for bringing it up.

I wonder if pip could make an assumption here though.

The problem is, this is very dangerous ground. There are a huge number of assumptions pip could make which would simplify things, but experience has shown that whenever we do make such assumptions, we find some package that breaks them. And "making an invalid assumption" is pretty much guaranteed to be reported to us as "a bug" πŸ™‚

Your example seems very reasonable, but what if a package released a version supporting python-dateutil <= 3.0, but then added a feature that relied on behaviour that was removed in dateutil 3.0, so they released their next version depending on python-dateutil < 3 as a short term approach while they tried to implement their feature in a way that worked with newer versions of dateutil?

I think if we started using heuristics like this, we'd need to add ways to allow users to control if they are enabled. And that gets complex very fast.

I like the idea in principle (I wish projects wouldn't do weird things that make my job as a pip developer hard πŸ™‚) but the practicalities will quite likely make it infeasible.

@pfmoore Along the lines of what I was getting at in point 3 of my comment #9187 (comment) - a "simple default" heuristic where "default package version backtrack depth = N", plus a command line flag to override this with a user-defined value of M, would solve some of the issue with botocore and other packages that walk back far too many versions. For the new resolver, I think that simple heuristics like this, with well documented defaults, helpful error messages, and easy configuration, could go a long way to mitigating the bug reports that could come from more complicated "smart" approaches.

I think if we started using heuristics like this, we'd need to add ways to allow users to control if they are enabled. And that gets complex very fast.

A fair point. I wholly relate to the "too many knobs" problem on an OSS project. I understand that making assumptions comes at a cost here, and trust your (and the pip team's) judgement. That said, I think the assumption above is a fair one given a well-designed escape hatch. Perhaps:

  • Users that need an older version could set a constraint for that version? This changes the workaround for the new resolver from forcing users to explicitly set lower bounds on requirements (to speed things up) to setting upper bounds on the dependencies that need an older version due to a dependency upper-bound downgrade.

  • Pip might still make the assumption described above, but also backtrack a small amount as needed (with a configurable number of backtracks). This makes the assumption that decreases in a dependency's max version are rare and short-lived (e.g. release a patch release that downgrades the dep, fix it in the next version or two, and remove the cap again) and would solve:

    what if a package released a version supporting python-dateutil <= 3.0, but then added a feature that relied on behaviour that was removed in dateutil 3.0, so they released their next version depending on python-dateutil < 3 as a short term approach while they tried to implement their feature in a way that worked with newer versions of dateutil?

That said, I haven't thought about this nearly as much as y'all have. Just my 2 cents.

@jcrist's proposal makes the same assumption that @dstufft's proposal for binary search does: that the minimum required dependencies of a package are usually ordered across the versions of the package.

I suppose the question is whether the cases where this is not true occur frequently enough that the resulting resolver speed-ups are not worth it.

As absurd as it might sound... I wonder if anyone has used all the available package metadata on PyPI to perform an actual analysis of the situation. It should be possible to discover a reasonable lower bound for how often such decreases in a dependency maximum version actually happen.

I'm not able to respond as fast as the suggestions are coming in now, so I'll leave this discussion for now and review later. But can I just say thanks to everyone for the constructive and helpful suggestions. This is a really tricky problem to get right, and new ideas and perspectives are really helpful!

As absurd as it might sound... I wonder if anyone has used all the available package metadata on PyPI to perform an actual analysis of the situation. It should be possible to discover a reasonable lower bound for how often such decreases in a dependency maximum version actually happen.

Not absurd at all. I've an ongoing piece of work trying to collect that data for analysis. The problem is, I've got the stuff available from the PyPI JSON API, and that's been a pretty big job to collect. But getting dependency data means downloading every wheel on PyPI, as well as building all the sdists (and even then, for sdists I only get metadata that applies to my system). I've yet to even work out where to begin with that task. Even "just pick representative packages" isn't practical, as I wouldn't have looked at "botocore" and assumed it was an important package, if I hadn't seen it come up here so often.

(If anyone has a PyPI mirror and could extract the metadata files from all the wheels, and publish just that data somewhere, that would be very useful).

So yes, it's a reasonable suggestion, but the logistics of downloading the whole of PyPI in order to get the data make it a lot harder than you'd think. (The pip devs don't have any privileged access to PyPI that might make this easier, unfortunately).
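
For what it's worth, the per-wheel part of that extraction is mechanically simple once you have the files, since a .whl is just a zip archive with a *.dist-info/METADATA file inside. A rough sketch (the directory-of-wheels argument is assumed for illustration; acquiring the wheels in the first place is the expensive part):

# Dump the Requires-Dist lines from every wheel in a directory.
import sys
import zipfile
from pathlib import Path

def wheel_metadata(path):
    # A wheel is a zip archive containing <name>-<version>.dist-info/METADATA.
    with zipfile.ZipFile(path) as whl:
        for name in whl.namelist():
            if name.endswith(".dist-info/METADATA"):
                return whl.read(name).decode("utf-8", errors="replace")
    raise ValueError(f"no METADATA in {path}")

if __name__ == "__main__":
    for whl in sorted(Path(sys.argv[1]).glob("*.whl")):
        meta = wheel_metadata(whl)
        deps = [line for line in meta.splitlines() if line.startswith("Requires-Dist:")]
        print(whl.name, deps)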

I'm going to let things accumulate overnight and see how the conversation has moved on. But for now... @pfmoore (and anyone else curious... it's a bit late for me to start running this sort of thing at this hour of night), there are a few starting points I found while looking into whether this had been attempted, from other people who have done PyPI analysis "the hard way" in the past.

... edit: Last thought, for anyone looking into this idea, don't forget to check out how pip does the dependency lookup for --use-feature=fast-deps because it will likely make a big difference in how easily you get the metadata for so many versions.

@techdragon These posts are mostly from a fair few years ago, and PyPI has significantly more packages now than it did before (here's a post from early 2019, showing the growth curve: https://blog.adafruit.com/2019/03/13/growth-in-the-python-ecosystem-python/). It's roughly exponential growth.

Many of these techniques now require significantly more resources than they used to. I've had some visibility into a non-public effort to do this, and even with the biggest single EC2 machine from AWS, it was non-trivial and was called off before anything useful came out of it.

On the issue of analyzing (and possibly validating) archival metadata from packages on PyPI, folks may be interested in pypi/warehouse#474 (comment) and pypa/packaging-problems#264 , as listed in the "Audit and update package metadata" item in the list of fundable packaging projects.

(If anyone has a PyPI mirror and could extract the metadata files from all the wheels, and publish just that data somewhere, that would be very useful).

fwiw: I scraped PyPI in September 2019 and August 2020, using distlib to get the dependencies of each package, and dumped the results to a file. The results are messy and incomplete (I don't remember how distlib works out the dependencies, so I don't know exactly how incomplete), and I don't have time right now to clean them up, but that's now available here if it's useful to anyone: https://github.com/sersorrel/pypi-stats

The hope is that people would provide more reasonably strict requirements than the collective community traditionally prefers. It is very rare, when users ask for "requests" (for example), that really any version of requests would do; but Python packaging tools traditionally "help" the user out by naively settling on the newest possible version. My hope is that Python package users and maintainers alike would be able to provide more context when a requirement is specified; this would help all users, maintainers, and tool developers to provide a better co-operating environment.

This is reasonable at the top: capping the version makes sense, since you don't know whether new versions are going to work (and this is obviously easier for packages that follow semantic versioning).

The problem is in the other direction: more restrictive limits (we only support the latest three versions of library X, for example) lower the compatibility window between packages. If you've got one package that works with ~v1 but is set to only >v1.46, and another package that's capped at v1.40, you won't be able to find a match at all.

That's fine if it's for legitimate reasons, but if it's only happening to avoid dealing with excessive issues from the package manager then it seems like a problem. This is a simplified example: in the real world, where people depend on more than two libraries and maintainers (who often work for free) stick to supporting only a minimal set of versions, it'll be far worse and result in a lot of cases where pip won't be able to find matches.

ashb commented

Even "just pick representative packages" isn't practical, as I wouldn't have looked at "botocore" and assumed it was an important package

https://pypistats.org/top

ashb commented

What @tedivm said is true -- by encouraging module developers to more tightly pin their requirements, you are asking for Version Hell where two modules are just impossible to install together, even if they would actually work.

Seriously PIP maintainers - please roll back to the previous resolver and come back with a better one when all those problems get resolved. PLEASE. There is no way you are going to resolve all those issues. There is no way it can be done.

Just treat it as a learning opportunity. There is no problem with trying, but when you see it does not work, there is no shame in withdrawing.

Hi, @potiuk! Thank you for sharing your advice and opinion. We will take it into consideration.

We ran a beta period for the new pip resolver to solicit bug reports, starting with the release of pip 20.2 in July. Folks who are reporting new issues to us: it would really help us if you could also tell us what could have happened differently so you could have tested and caught and reported the problems you're seeing during the pip resolver beta period. No matter whether we rip out the legacy resolver as planned, keep it around for longer, or do something else, your response on that point will help us with future rollouts.

I actually shared the Airflow story here already, with a full explanation of why we tried, but were not able, to test the new resolver beforehand: #9203 (comment)

@potiuk Thank you for doing that! Other participants here - please do follow @potiuk's lead.

And to be very frank - this was not a complaint of any sort. It was just a piece of advice. I DO understand how hard a job you have keeping half of the internet working! It was really a suggestion, and an observation as a user, that there is no way you can fix it now, looking at the type and amount of problems we see. It's simply a realistic (I think) assessment of the situation, suggesting a course of action you might take.

I do appreciate all the work you put into it! It's just not going to work this time, and I think it's good to face the reality.

ashb commented

A second to both of @potiuk's points -- I've long bemoaned the lack of a "proper" solver in pip so I can't wait for the kinks to get worked out of this.

And the developer equivalent of HugOps to everyone on the pip team!

Maybe the backtracker should stop at 2 backtracks by default?

Would it make sense to raise by default (and print out the test case) if total elapsed time exceeds a default threshold, as well?

(Note that the "generate an infinite chain of packages and slowloris" worst case here is still unbounded when there's only a backtrack limit)

@westurner no, because not everyone has fast internet connections and/or computers, so time based methods would make pip unusable by default for such users.

Number of backtracks would be a better heuristic.

This is a long thread, with many different opinions and suggestions. I'll throw my own experience in here too.

At work, I'm using the new resolver and it works very well. We have ~70 complex direct dependencies, including large ML frameworks such as pytorch, tensorflow, etc and also boto3 and the usual suspects like requests, Click, Flask. It resolves to a full dependency graph of ~267 dependencies successfully.

For production systems, I am very much of the opinion that dependencies should be locked ahead of time. A developer working on a project needs to understand the impact when they update a dependency or add a new dependency. I realize this workflow or opinion is not the only one, and I'm not too interested in discussing the pros or cons either. I'm only sharing what works well in my experience and what I believe leads to deterministic and good outcomes.

To lock our dependencies, we have very positive results using pip-tools which was mentioned earlier.

Our workflow is as follows:

  • Developer modifies requirements.in
  • Developer runs a script that ultimately invokes python3.7 -m piptools compile
  • Developer observes the resolution results and commits the locked requirements.txt file to the repo
  • CI only needs to use the locked requirements.txt file which is already guaranteed to resolve in a reasonable time

I recognize this only works for "end-users" and not other projects such as Airflow which need to have "looser" dependencies. However, I do think projects such as awscli, boto, and airflow have some responsibility for setting reasonable constraints on their dependencies. Other language ecosystems such as Java and JavaScript are able to do this successfully; why shouldn't Python?

I'd also like to bring attention to some data that may assist with some static analysis of PyPI.
https://github.com/DavHau/pypi-deps-db

The link above is from mach-nix which tries very hard to make Python pleasant to use in Nix. I'm not suggesting the Python ecosystem needs to move towards such extreme mechanisms, but perhaps that database will help if somebody is looking to analyse things.

Reasonable default limits for a tool that should not download an infinite sequence of code to execute with install-level permissions:

  • number of backtracks
    • 100?
  • total runtime
    • 1hr?
  • number of packages
    • 1000?

These could be overridden with pip.conf and/or PIP_ environment variables.

We absolutely should make attempts to bound the resource consumption (bandwidth, cpu time) of the worst cases.

Pinning dependencies is a workaround which can introduce delay in time to patch for critical severity issues: with (e.g. SemVer) constraints, users needn't be locked to old versions.

(Edit) Though, to be fair, pip doesn't claim to manage the full software distribution lifecycle: pip install -U could break things or not work; we should not expect users to run pip install regularly or even again (until a container is rebuilt with --no-cache, TBH)

Other language ecosystems such as Java and JavaScript are able to do this successfully; why shouldn't Python?

JavaScript dependencies tend to be fairly unrestricted - I don't think locking down versions is the right answer here (although it'll be the one forced on people if this issue isn't resolved). They don't solve this problem by only supporting a couple of versions per package; they've managed to avoid the problem altogether by not supporting dynamic dependencies - every version of every package has a single set of dependencies.

This is the big difference - other languages don't force you to build the package to find out what the dependencies are; instead you just hit an API and get a list. That makes this process a lot less resource-consuming. Python allows you to do all sorts of ridiculous things when it comes to your dependencies (you could literally decide to depend on a specific package only on Wednesdays, for example), and as a result we're stuck between picking a resolver that doesn't technically work and one that literally doesn't work.
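
For what it's worth, PyPI's JSON API does expose declared dependencies for many releases via the info.requires_dist field - with the big caveat that it's whatever metadata the build tool uploaded, so it is frequently null or incomplete (especially for sdist-only releases), which is exactly the gap being described. A quick sketch:

# Query PyPI's JSON API for a release's declared dependencies.
# Caveat: requires_dist is often None, because the metadata is not verified.
import json
import urllib.request

def requires_dist(project, version=None):
    url = (f"https://pypi.org/pypi/{project}/json" if version is None
           else f"https://pypi.org/pypi/{project}/{version}/json")
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["info"]["requires_dist"]

print(requires_dist("requests"))  # e.g. ['certifi>=2017.4.17', ...] or None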

(Edit) Though, to be fair, pip doesn't claim to manage the full software distribution lifecycle: pip install -U could break things or not work; we should not expect users to run pip install regularly or even again (until a container is rebuilt with --no-cache, TBH)

πŸ‘ Yes. OS patches need to happen as well. pip isn't responsible for everything. Patching can still be managed. It is certainly possible to run a CI job (weekly for example) that updates and locks dependencies moving from requirements.in -> requirements.txt via python3.7 -m piptools compile --upgrade. In any case, I didn't claim to have a solution for the general problem, just trying to help people having trouble now. It helps "end-users" more than a general solution for every package out there.

Python allows you to do all sorts of ridiculous things when it comes to your dependencies (you could literally decide to depend on a specific package only on Wednesdays, for example), and as a result we're stuck between picking a resolver that doesn't technically work and one that literally doesn't work.

πŸ‘ Yes, although, there are movements towards more static metadata such as the previously mentioned PEP-643. None of these will happen quickly. Even with such a solution, a good resolver will be desirable.

Anyway, I'm unsubscribing now. I appreciate all the hard work everyone has put in here. I hope things don't get rolled back or anything drastic like that.

#9187 (comment) is good advice. As well:

  • Fix the depgraph constraints for the specific requirements combinations which apparently aren't halting
  • Heuristically determine whether an "infinite playlist" of packages will halt.
    • Pip's early advantage over lots of calls to easy_install in a platform-unportable Makefile or shell script: download all packages (and, crucially, execute each package's setup.py) before building anything
  • Bound local and network resource consumption by applying exceptional thresholds to tracing metrics that should also be useful for monitoring performance regressions and optimizations

Backtracking is not a bug, or an unexpected behaviour. It is part of the way pip’s dependency resolution process works.

IMHO it was a mistake to introduce this mechanism. Instead a very simple mechanism should have been used:

  1. always download and install the latest versions of everything
  2. if anything breaks, it is the package authors' responsibility to fix it

There is an alternative to it, if you think that breakage should be automatically prevented:

modify Python and its standards:

  1. introduce a db of pins. The db of pins is an SQLite DB containing the following information for each package: its name, its deps' names, its deps' version pins, and the packages dependent on it. Everything is related using integer ids, which are the same as SQLite rowids. There is a special version called latest, which means the package developer is saying "it must break if it is incompatible, I'll fix it, but please break when it must break, don't mask the issue".
  2. support multiple versions of packages. The import machinery consults the db of pins for each package.
  3. introduce a concept of integration tests. Integration tests are very small and fast tests allowed to rely only on the unittest module for testing. No pytest, nose, hypothesis, etc. They must be registered in a manifest under an integration_tests key and must be a part of a package, unlike usual tests, which mustn't be.
  4. introduce a mechanism to programmatically override pins at runtime in importlib.
  5. introduce a <= specifier in package metadata and treat it as a recommendation: "this was the maximum version it worked with for the author of the package. All versions below this one and above the minimum version are assumed to be working." So, it will never prevent a package from being installed anymore; its effect will be to serve as the lower bound.
  6. stop processing < specifiers

modify pip:

  1. For a specific package and its specific dep version pin, pip should have a mechanism to run tests, loading the module with the newest installed version of the dep. If it works, it updates the db. If it doesn't, it either searches linearly or bisects. The strategy should usually be defined in package metadata; if it is not, it defaults to bisect. It should also be overridable with a CLI flag. Then unreferenced lib versions are uninstalled.

  2. Introduce a mechanism for installing different lib versions side-by-side. When a lib is upgraded, it is installed side-by-side with the old versions. Then pip tries to upgrade its pin version for all the packages depending on it, in topological order.

  3. introduce a pip command to manually trigger an attempt to upgrade all dep pins of a package's deps to the max versions the package works with.

  4. Optionally we can modify setuptools to update <= specifiers by running the tests and writing the pins to the metadata.

ashb commented

I recognize this only works for "end-users" and not other projects such as Airflow which need to have "looser" dependencies. However, I do think projects such as awscli, boto, airflow have some responsibility for setting reasonable constraints on their dependencies

What do you define as reasonable? To me, reasonable is "the largest possible range of versions that we can work with" -- that way the user has the greatest chance of using Airflow with packages that we haven't tested. The same should be true of any library (vs a final application) -- if a library pins its dependencies too strictly, it becomes impossible to use it with specific combos of packages.

And as @westurner points out -- if a library pins dependencies to a specific version, it means that it and all of its downstream packages need to release a new version when ever a security fix to an upstream project is released. That is simply not workable.

Fully agree with @ashb here. We've spent a lot of time in Airflow on making our dependencies work well, and we do everything mentioned here. We built our CI (a very complex system, which @ashb has often - rightfully - complained about the complexity of), but I relentlessly pushed some of that through to get where we are now, which responds to many of the things mentioned:

  1. we automatically update the base Python images we build on, to get the latest OS/security base patches. Whenever a new version of the Python images is released, our images automatically rebuild at the next master build, and new Airflow base images are pushed to dockerhub using those security patches.

  2. we keep our requirements in setup.py as open as possible. Airflow is a "platform", not an "end user application", and we expect that people will install their own libraries, dependencies and whatnot. We cannot pin those requirements as "required".

  3. we "constrain" the requirements to a "known good" set of constraints and use --constraint for all the PRs, because many times in the past transitive dependencies started to break many PRs of many people at the same time - this way we get stability of PRs.

  4. we AUTOMATICALLY update our "known good" set of constraints - whenever we push a commit to master, we rebuild our images and run pip install --upgrade --upgrade-strategy eager - then we run all tests, then we run pip check, and only when all that passes do we update the constraint files to the new ones. And with > 420 dependencies it happens DAILY.

  5. Last but not least - we also need to provide our users a way to install base airflow in a reproducible way. Due to the open-ended nature of the deps (see point 2 above), we had many cases in the past when someone releasing a dependency of a dependency broke our installation. And we figured out a solution which joins both worlds - open-ended limits but also fixed constraints. Our official installation command for airflow is (http://airflow.apache.org/docs/apache-airflow/stable/installation.html):

AIRFLOW_VERSION=1.10.12
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
# For example: 3.6
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
# For example: https://raw.githubusercontent.com/apache/airflow/constraints-1.10.12/constraints-3.6.txt
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

We have been using this approach successfully for quite some time, but it is a hell of a lot of complexity, and possibly 20-30% of my time over the last year was spent on building and maintaining it (I am one of the lucky people who gets paid for full-time open-source work, which is not very common as well).

And I think many other projects would love to use a similar approach. Not only boto/awscli. There are many Python projects that are not a "complete standalone application" - those are tools or platforms that expect other libraries to be installed and have their own dependencies that cannot be pinned due to conflicts. This is not good, and I think a good way of solving it would be to build in an easy way to manage a similar workflow. I'd love to be able to do something like I described above without inventing it all and developing it from scratch.

I think the new resolver is a step back. It does not solve any of those problems above, it is sometimes orders of magnitude slower than the old one, it has heuristic behavior which means that you cannot provide any stability when even one dependency changes, it is broken in many places (including the horrible #9203, where any popular package's maintainers can break everyone's installation in left-pad style), it goes into endless loops when by mistake you have conflicts in your constraints, and it does not recognise extras in wheel dependencies.

I think really the change should be rolled back, and the PIP maintainers should go back to the drawing board and think about how to improve the workflow rather than introduce this huge breaking thing. This is a huge distraction to the whole community IMHO. I think trying to fix all these problems under fire right now is a really bad idea - as I understand it, the goal of this change was a nice refactor and clean code, but if you are going to patch all those problems now, what you end up with will be a total mess and some spaghetti-code of quick bugfixes that you will not untangle for years. So quite the opposite of what your intentions were.

Thanks to everyone for their perspectives here. We're looking at the implications and trying to take them into account, but I will note that we've also had a lot of feedback from people who are very happy with the new resolver, where it's solved long-standing and frustrating problems for them. So please don't feel that we're ignoring the cases where the new resolver is suboptimal, but we have to balance usability for all of our users, and that's a difficult thing to get right. Simply rolling back would harm at least as many projects as continuing to improve the new resolver incrementally.

We also have to acknowledge that the current packaging ecosystem has flaws which make any sort of resolver problematic - we would love to have dependency information available for all packages, statically and cheaply. We're working on that, but it's a slow process, and while it would be straightforward to mandate static metadata for probably 99% (warning: made up number) of projects on PyPI, it's extremely difficult to get the remaining few projects to change - and breaking them would provoke the same sort of response as we're seeing here.

So please be patient with us. It's not ideal that we're seeing these issues in a live release of pip, but we've put a lot of work into publicity, pre-releases and advance testing. It didn't catch everything, and we're asking for feedback from people to try to improve that process in future, but we are doing our best to minimise disruption. (Not eliminate it, that's not possible with a change of this scale, as I hope everyone appreciates).

as I understand it, the goal of this change was a nice refactor and clean code

I need to correct you here. Pip's developers developed and rolled out this change because so many other improvements are blocked on it, such as:

and because it would fix so many dependency issues for our users, such as:

and, in our larger ecology, because the old behavior causes installation problems, examples including:

(Folks here might also be interested in GitHub issue #988 where this was initially requested, an in-depth explanation by Sebastian Awwad of the problem & our approach, and issue #6536 where we worked on planning the rollout.)

The nonprofit Python Software Foundation was able to get USD $407,000 in funding (in total) from Mozilla Open Source Support and the Chan Zuckerberg Initiative to hire contractors to work on the new pip resolver and related user experience issues because so many users need or want it. (If you're interested, you can take a look at our meeting notes from this year's work, as well as our blog and forum and mailing list posts, YouTube video, and podcast interviews we did to publicize the changes.) We did not decide to do this simply for the sake of a refactor and clean code; we did it to unblock a lot of features and bugfixes for pip and for Python packaging more generally, and to fix pain that many users currently have. And some users are already reporting happiness as a result of the changes (example, example, example, example.)

This does not mean that the rest of the things you have said are invalid! But I figure people here should be clear on why we have made this change.

Thanks for the explanation. I perfectly understand it's not only the refactor, and it was not my intention to imply that.

I do appreciate the hard work you do - as I build on top of it - and I do understand a number of people will benefit from it. However, my suggestion is to - maybe - just withdraw for the time being, and come back when the issues are worked out - learning from the experience, knowing the issues now, with all the "problematic" projects that you can use as a testing ground (I am super happy to test it with Airflow).

I do not want to discard the work you've done. I simply point out that, from those issues, there were possibly at least a few architectural decisions that could be changed as a result of seeing the problems the new resolver creates (on top of solving a number of others). Even more so if your resources are strapped - if I were to make a decision for your team (I am not, of course - this is merely a suggestion from a worried user) - I would rather roll back and iron out all those issues (again - using the help of all the people who actually raised the issues) in a calm atmosphere, giving yourself quality time to think and do it properly, rather than act under fire from all those projects out there. This is what I mean by "back to the drawing board". It's easier to stand by the board and think while you do not have to put out the fire at the same time.

When I look at the situation and how many different problems you have, this is going to take days if not weeks to resolve. And you will get new users banging on your door more and more every day, I believe. I think most of the serious issues you have come from those early adopters who run their CI automation and upgrade PIP automatically (a lot of people who actually run open-source projects and are your users), but soon you will get the "casual" users - who will find out about all those problems tomorrow, over the weekend and the week after, and this will grow. This is my worry for you. And you can - surely - take the risk, but from what I see, and from a number of conversations I had with other people seeing the same problems, what you see now is just the tip of the iceberg.

Another one is the strain you are putting on other projects like ours. We are releasing 1.10.14 just now, and we've managed to update the documentation and warn people to use 20.2.4 - but you know how it is with documentation. The errors people will get when they do not downgrade will leave them completely confused, as they have no idea about the pip upgrade. Also - we can't really update documentation for past versions - and it has started to affect ALL our released packages. We are also yet to see the flood of people complaining, and we will have to direct them to #9203 every single time. And we have no unlimited resources either.

Now, you actually know what the impact of the change is. And you have to make a deliberate decision about what to do. Bear in mind that your decision not to roll back affects not only your time, but also heavily impacts the time of people from possibly many, many projects, who will have to bear the consequences of your decision.

And also bear in mind that the longer you delay that decision, the worse. If it happens in a week or so, you will have to deal with those who have already started to use the new resolver and whose problems have been solved. Which might put you in the difficult position of having a really bad or a very bad decision to make.

It's just for your consideration. I will not tell you what to do; you have to make your own decision and live with the consequences. But if I were in your position, I'd simply roll back now and come back when those "teething issues" are solved. Happy to help with testing - now that we managed to get non-conflicting constraints last week, it will be an easy task to do.

I think the correlation is the other way around. The old implementation is very sensitive to things like ordering, and it is not uncommon for people to exploit this to "fix" resolution. But since people are already doing that, maybe we could mimic the behaviour by resolving the requirements in a similar ordering.

We also have to acknowledge that the current packaging ecosystem has flaws which make any sort of resolver problematic - we would love to have dependency information available for all packages, statically and cheaply. We're working on that, but it's a slow process, and while it would be straightforward to mandate static metadata for probably 99% (warning: made up number) of projects on PyPI, it's extremely difficult to get the remaining few projects to change - and breaking them would provoke the same sort of response as we're seeing here.

Personally I think this change to the default resolver should have waited until those issues are resolved, but obviously the cat is out of the bag now. This isn't the end of the world as long as we have workarounds - which the legacy resolver gives us - so can the PIP team commit to not removing the legacy resolver until these issues are resolved?

But how are we to know the issues exist if we don't release the resolver as the default? 🙂 The resolver has already gone through a period that almost begs people to test it, with the most involved publicity drive I have ever seen in the Python community. And the fact we still end up here is an indication that more delays to the public release would likely not help; the only way to gain real-world usage is to force the process, unfortunately, since most people simply won't use the resolver (and are likely not even aware of its existence!) unless it's made the default.

can the PIP team commit to not removing the legacy resolver until these issues are resolved?

This is the plan.

But how are we to know the issues exist if we don't release the resolver as the default? The resolver has already gone through a period that almost begs people to test it, with the most involved publicity drive I have ever seen in the Python community. And the fact we still end up here is an indication that more delays to the public release would likely not help; the only way to gain real-world usage is to force the process, unfortunately, since most people simply won't use the resolver (and are likely not even aware of its existence!) unless it's made the default.

I dunno, maybe what it would take is to identify, say, the 300 projects on PyPI with the biggest number of dependencies and reach out to them to cooperate and coordinate testing? Otherwise what you are talking about is dangerously close to testing in production.

I think it's easy to blame the users for not having done their part. You have to be a little empathetic and see the perspective of the user. With > 420 dependencies for Airflow, I bet we'd fit in the list. And we got hell-busy the last 2 months preparing for the 2.0 release, which is THE big release - the one that we worked on for 2 years.

And if I knew the exact date of the release, I could have sped up my "conflict resolution" and flagged the problems earlier. Remember that we have our own projects; we are not continuously monitoring what's going on in PIP, and we are not following the PIP discussion lists. Pip is just a tool that works. I do not even know what kind of publicity you are talking about, to be honest :).

If pip works - that's great, but you cannot expect people to track what's happening in the project. The only indication was the warning that at some point the switch would happen. I took notice and started working on conflict resolution, but by the time I finished, it was about when you released, and I did not even know when it was going to happen. I guess the vast majority of people who now have problems have a similar story.

As I wrote above - it's your decision, but you have to be aware of the consequences and how it impacts other projects.

But how are we to know the issues exist if we don’t release the resolver as the default?

The first thing that comes to mind is that you could scrape GitHub for all Python projects with requirements files and look at which ones fail to install and why. It seems like asking end users to do due diligence for such a centrally important project is a bit backwards. Great power -> great responsibility and all that.

300 projects is a lot! It feels unfair to expect the pip maintainers, who are a few individuals with barely enough time to work on improving pip itself, to individually ask hundreds of their users to test upcoming changes. Surely that's the point of things like the pypi-announce mailing list, which this change was advertised on in July? (This sucks for everyone; putting the blame on one group of people is not helping.)

With > 420 dependencies for Airflow, I bet we'd fit in the list.

You would, but at the end of August distlib identified only 336 dependencies for apache-airflow. Hopefully the discrepancy there shows that it might not be as simple to identify the people who might be impacted by this change as you think.

@potiuk Please do tell us how you get news about coming changes to software that you depend on, so that in the future you can subscribe to the channels that will inform you better, or so that folks in Python packaging can put the right venues on their checklists.

Since we did know ahead of time that it is hard to get users' attention about infrastructure that they depend on, and we recognized that many people assume that they do not need to keep apprised of changes coming in pip, we did publicity and outreach (some of which is catalogued in #8511 or this wiki page), such as:

  • A message in pip output saying:

ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

(The announcements of future changes also attracted comment on Hacker News, on the r/python and related subreddits, and elsewhere on news websites in Japanese, Chinese, Russian, and other languages that I can only read via translation services.)

But of course there is room to improve, which is why I ask: what information sources do you pay attention to about software you depend on?

And per the conversation about funding in comments in #9203 (comment) : more money would definitely change the equation regarding what levels of publicity, pre-release testing, etc. we could support in the future. As an example: Pradyun was able to use some publicly available requirements.txt files to run some speed tests #8664 (comment) that helped us improve the speed of the new resolver, but we currently have very limited access to CI and related services, and a faster and more robust CI service would help us improve our automated testing capability by leaps and bounds (#7279 has some more on that and I can find more issues etc. if that's of interest). And with more money we could fund time to do more in-depth and systematic outreach to individual projects.

ashb commented

@brainwane I think part of the problem is that there are always going to be users/projects like us who just don't get that involved in the "python" ecosystem, so unless it's in pip itself, we'll miss it.

And speaking for myself:

airflow ❯ pip --version
pip 19.3.1 from /home/ash/.virtualenvs/airflow/lib/python3.7/site-packages/pip (python 3.7)

I just haven't upgraded, cos I generally don't unless something is broken. I'm not sure there was much more you could have done, other than go around creating issues in each of the projects with problems -- which clearly isn't a realistic solution.

But of course there is room to improve, which is why I ask: what information sources do you pay attention to about software you depend on?

@brainwane None. Almost everyone only goes looking for information after something stops working. I don't think the problem here between developers and users is in how to distribute news better. I think the problem is rather expecting blog posts to solve this in the first place. Signing up for the mailing lists and user groups and blogs of every tool a person ever uses is not tenable.

FWIW, I'm excited about the future potential of this resolver. I just landed here because suddenly all of my CI builds started timing out after 5 hours because of infinite loops during setup and it wasn't happening yesterday.

GitHub has an excellent API. A tiny script that iterates through https://api.github.com/search/repositories?q=language:python&sort=stars&order=desc, calls git clone --depth=1 on each repo, creates a fresh venv, looks for requirements.txt and calls pip install -r requirements.txt (or looks for setup.py and calls pip install .), and then adds an entry to a log file if the install fails or takes more than an hour to run would probably do the trick.
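
As an untested outline of that crawl (pagination, API auth, rate limits, the setup.py fallback, and Windows paths are all omitted; the one-hour timeout mirrors the suggestion above):

import json
import subprocess
import sys
import tempfile
import urllib.request
from pathlib import Path

SEARCH = ("https://api.github.com/search/repositories"
          "?q=language:python&sort=stars&order=desc")

def top_repos():
    # One page of the search API; real use would paginate and authenticate.
    with urllib.request.urlopen(SEARCH) as resp:
        return json.load(resp)["items"]

def try_install(repo):
    with tempfile.TemporaryDirectory() as tmp:
        src, venv = Path(tmp, "src"), Path(tmp, "venv")
        subprocess.run(["git", "clone", "--depth=1", repo["clone_url"], str(src)],
                       check=True)
        reqs = src / "requirements.txt"
        if not reqs.exists():
            return "no requirements.txt"
        subprocess.run([sys.executable, "-m", "venv", str(venv)], check=True)
        try:
            done = subprocess.run(
                [str(venv / "bin" / "pip"), "install", "-r", str(reqs)],
                timeout=3600)  # log anything that takes over an hour
            return "ok" if done.returncode == 0 else "install failed"
        except subprocess.TimeoutExpired:
            return "timed out after 1h"

if __name__ == "__main__":
    for repo in top_repos():
        print(repo["full_name"], "->", try_install(repo))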

@ashb I hear you. I hope that, if nothing else, as a result of this discussion, you (or someone at your company) subscribes to pypi-announce, which is a very low-traffic list for Python packaging announcements that we started a few years ago while overhauling the Python Package Index and have been trying to publicize since then.

I agree regarding how difficult it would have been to do mass outreach to GitHub repositories: Hand-creating issues in literally hundreds of repositories is not something that was realistic under our budget, and using automated tools to mass-create those issues would have gotten us a lot of pushback for being spammy and reduced our credibility for future GitHub issues we opened with those projects.

[Now: speaking as an individual, not as a person who sometimes has a contract relationship with the PSF]

I also recommend that anyone who depends deeply on a piece of open source software, but doesn't want to pay a lot of attention to it on an ongoing basis, hire a vendor to keep an eye out for upcoming developments on their behalf.

[now, with all my hats on]

To everyone who's been trying to work with us in this issue and help get a clearer sense of problems and possible next steps, and who's been nice about it: Thanks for your responses and for your empathy; I really appreciate them.

I hope that, if nothing else, as a result of this discussion, you (or someone at your company) subscribes to pypi-announce

How many other mailing lists should I preemptively sign up for today as well? Because I use thousands of different open source software projects on a daily basis. Probably so do you? This kind of thing is going to happen. Breakage happens. It's unfortunate if nobody pushed early for automatic preemptive testing that doesn't require end users to do it, but that happens too. But saying "you should have signed up for the mailing list" is weird.