Relaxing / Ignoring constraints during dependency resolution
stonebig opened this issue · 125 comments
What's the problem this feature will solve?
Putting together packages that have incompatible constraints by default.
Indeed:
- constraints are often meant by the package maintainer as:
. "accepting complaints on this known/focus set of packages",
. you're on your own for support if you deviate, but it is not necessarily bad.
- packages rarely focus on the same versions of complementary packages.
==> The new resolver may create more problems than solutions when trying to build an environment with a large set of packages.
Describe the solution you'd like
Be able to deliberately ignore some constraints
Wish:
- we can add "relax" rules to overrule packages that are too strict for our needs, because we know what we want:
pip install Spyder --relax relaxrules.r
, with relax file below meaning:
. if you want PyQt5, it must be 5.14.x
. if you want Jedi, it must be >=0.16
PyQt5~=5.14
Jedi>=0.16
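A minimal sketch of how such a relax file could be read, assuming one rule per line in the form shown above. Both the file format and the function name parse_relax_rules are hypothetical; pip has no --relax option.

```python
import re

# Hypothetical format: one "<name><operator><version>" rule per line.
_RULE = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(~=|==|!=|<=|>=|<|>)\s*(\S+)\s*$"
)


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_relax_rules(text):
    """Map each normalized project name to its (operator, version) override."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = _RULE.match(line)
        if match is None:
            raise ValueError(f"unrecognised relax rule: {line!r}")
        name, operator, version = match.groups()
        rules[normalize(name)] = (operator, version)
    return rules


rules = parse_relax_rules("PyQt5~=5.14\nJedi>=0.16\n")
print(rules)
```

A resolver honouring these rules would then substitute the override wherever another package declares a requirement on the same project; that part is not sketched here.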
Alternative Solutions
Today:
- I have to manually recompile "too strict" packages from source to work around this,
- or I would have to build one virtualenv per package,
- or I would only be able to use a specific Python distribution, with much older packages and Python version.
Additional context
Maintaining WinPython
** Current pip check **
- datasette 0.39 has requirement Jinja2~=2.10.3, but you have jinja2 2.11.2.
- astroid 2.3.3 has requirement wrapt==1.11.*, but you have wrapt 1.12.1.
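To make the two complaints above concrete, here is a rough sketch of why the installed versions fail the declared specifiers. It is a hand-rolled simplification that only handles plain numeric versions with the same number of components; real tools rely on the full PEP 440 rules (for example through the packaging library). The helper names are made up for this illustration.

```python
def parse(version):
    """Split a plain numeric version such as '2.11.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(installed, wanted):
    """Check '~=wanted': at least `wanted`, with the same leading components."""
    have, want = parse(installed), parse(wanted)
    prefix = want[:-1]  # ~=2.10.3 means >=2.10.3 and ==2.10.*
    return have >= want and have[: len(prefix)] == prefix


def prefix_match(installed, pattern):
    """Check '==pattern' where pattern ends in '.*', e.g. '1.11.*'."""
    prefix = parse(pattern[: -len(".*")])
    return parse(installed)[: len(prefix)] == prefix


print(compatible_release("2.11.2", "2.10.3"))  # False: 2.11 is outside 2.10.*
print(prefix_match("1.12.1", "1.11.*"))  # False: 1.12 is outside 1.11.*
```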
** Current workaround **
- Spyder manually recompiled to accept :
. PyQt5-5.14.2 (as pip doesn't have a "long term support" of PyQt5-5.12, so the fresher version is safer)
other wishes:
- a basic GUI on Pip (tkinter or web) would still be nice, to have a better view of all the coming version conflicts.
@stonebig Would you mind separating the other wishes into their own issues? It would be much easier to discuss them that way.
As for relaxing, I am honestly uncomfortable with having such an impactful feature handy for the general audience. I was by chance also having a similar discussion in the Pipenv tracker, and both the Pipenv one and yours are exactly situations where I personally think the maintainers have a valid point in restricting the versions, and a user should not be able to override them easily. It's still useful to have this option somewhere since indeed there are packages with incorrect metadata out there, but pip is too low in the packaging management realm to implement the feature IMO.
Ok, separating other wishes in a few minutes.
this is moved on another issue:
- having the beautiful "pipdeptree" features in the standard pip:
. a beautiful description of what package needs (or is needed by) what package at what version,
. the possibility to get that programmatically as JSON answers.
Thanks for filing this @stonebig! I've gone ahead and re-titled this issue to be more clearly scoped.
We have seen multiple groups of users express interest in a feature like this. @pfmoore @uranusjr and I have had this come up in our discussions during our work on the resolver, and we are aware of this user need.
We don't know how exactly this would work and what approach we'd be taking here -- we're gonna visit this specific topic at a later date, once the new resolver implementation is at feature parity with the current resolver.
a basic GUI on Pip (tkinter or web) would still be nice, to have a better view of all the coming version conflicts.
This is a completely separate request, and can be built outside of pip and doesn't need to be built into pip. If someone wants to build this outside of pip and later propose bringing it into pip (with clear reasoning for why it can't live outside pip), that'd be perfect. I don't think pip's maintainers are going to be developing/integrating this into pip, and I welcome others to try to build such tooling on-top-of or outside of pip.
I think there has been a "pip GUI" project undertaken as part of IDLE in the past, but I don't have the time to take a look right now. :)
I hope that in the new resolver project, easy to use functions will be provided to facilitate emergence of a GUI project
Building a distribution like WinPython is quite simple:
- download in a dedicated directory all the wheels (and versions) you want,
- then pip install -r requirement.txt
- then one by one, try to fix all the problems:
  - missing wheels,
    - pip download --dest,
    - or cgohlke site of wonders
    - or github/gitlab (/pip-forge one day ?)
  - non-existing wheels,
    - do-compile-yourself (often fails, like for cartopy, or for recent Python versions)
    - raise issues to package maintainer
  - wheels whose beloved versions of dependencies mutually contradict
    - ask the maintainer to relax or upgrade his/her dependencies (very slow process)
    - recompile it yourself without the annoying constraint,
    - go back to an older version (with potential security or known issues that have long since been fixed)
    - or drop the wheel.
I dream of a way to reverse the problem:
- showing package maintainers how their 'too restrictive' constraints make them incompatible with the rest of the world (hence a GUI or a pypi website feature ?):
  - give your requirements.txt
  - specify your "beloved" package,
  - the site/gui tells you what fits / what contradicts / what downgrade your package imposes
- or add a third kind of constraint on dependencies in the wheel specification:
  - supported constraints (you can speak of a problem to the maintainers when you have this "set"),
  - a "support_requires" next to "install_requires" and "extra_requires" ?
Just to note that, while I agree that over-restrictive requirements can be an issue¹, this is a fairly specialised use case. It's not that dependencies can't clash, but that putting together a Python distribution involves including (and managing) a lot of libraries that potentially have no "natural" reason to expect to be used together. So dependency clashes that the library maintainers haven't anticipated/haven't seen before are likely to be more common.
Using --no-deps and manually managing dependencies for problem packages is one option here. It's tricky without some means of identifying where the problems lie, though - we're hoping to give good error reporting for dependency clashes in the new resolver, but how to best express the information is something we don't really know yet, so that may be something that will need to be improved over time. (It might also be possible for a 3rd party tool to help here - dependency resolution and dependency graph analysis and visualisation are somewhat different problems, and separate tools may be able to focus on the different aspects of the problem.)
It's also entirely possible that pip could have options to ignore or relax certain dependency constraints. As a general problem, it could be hard to get a good UI for this (we're currently explicitly doing user research into what users want from the new dependency resolution - @ei8fdb you may want to invite @stonebig to get involved in that, if they aren't already). And I worry that while such a feature would be invaluable for specialists like @stonebig, it could easily be abused by naive users ("Cannot install X because Y is installed" - "just say --ignore-dependency=Z") and generate more confusion than it addresses - that's a further trade-off that we need to consider.
Sorry, there's no immediate answers in this, but hopefully it adds some context to the issue and explains what we're looking at when deciding how to address it.
I should also point out that this may not be something that makes it into the initial release of the resolver. Correct behaviour while satisfying the declared dependencies has to be the first priority, as I'm sure you'll understand. So --no-deps or recompiling with altered dependencies may remain the best answer for the short term.
¹ I've made the argument myself that libraries should avoid over-restricting dependencies.
There are some more use cases outlined in python-poetry/poetry#697
The use cases noted in the poetry issues are good to have as examples of where strict dependency resolution can cause issues, but I'm in agreement with @sdispater that ignoring declared dependency data is very dangerous and not usually the right way to handle this issue.
If a project declares certain dependency data then there are three possibilities:
- They are correct, and using a different version of the dependency is going to cause errors.
- They are correct, but only certain types of usage will cause errors. It's not really an installer's job to make this judgement, but users who have reviewed the code in detail and can be sure that they will never hit the cases that cause errors may want to override this decision. This seems to me that it should be a fairly rare situation, and the users involved can be assumed to be expert (as they are willing to trust their analysis over the declared dependencies and the resolver's calculations).
- They are wrong, and you should file a bug against the project asking them to relax the dependency. Obviously, projects may not accept such a bug report, but then we're in the same situation as any other case where a bug gets left unfixed. Users can make their own local fix, or find a workaround.
In pip's case, pip install --no-deps and manually handling the process of installing the correct dependencies is an available approach for working around such issues. It's awkward, and not for the faint hearted, but IMO we don't want to make it too easy for people to ignore declared dependencies (for the same reason that heavy machinery has safety guards...)
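As a rough illustration of that manual process, the sketch below only builds the two pip invocations and prints them. The helper name no_deps_plan is made up; the package versions are the ones from the pip check output quoted earlier in the thread. With --no-deps, every dependency of the problem package has to be listed by hand in the second step, not just the conflicting one.

```python
import sys


def no_deps_plan(problem_package, chosen_dependencies):
    """Build the pip commands for a manual, two-step install.

    Step 1 installs the problem package without its declared dependencies.
    Step 2 installs the dependency versions the user picked by hand.
    """
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + ["--no-deps", problem_package],
        pip + list(chosen_dependencies),
    ]


plan = no_deps_plan("datasette==0.39", ["Jinja2==2.11.2"])
for command in plan:
    # Pass each list to subprocess.run(command, check=True) to actually run it.
    print(" ".join(command))
```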
If there is a genuine need for dependency data overrides, that pip has to address, then I would argue that the need is not limited to a single tool, and should be standardised - maybe a "local metadata override file", whose format is standardised and can be implemented by any tool that does dependency resolution (pip, poetry, pipenv, ...). This would mean that users can declare such overrides once, and not be tied to a single tool's implementation. It also means that any edge cases and potential risks can be identified and addressed once, rather than having every project go through the same process.
Adding my use case from #8307 for ignoring a pinned sub-dependency when that dependency is the thing being developed locally.
In Jinja, I started pinning development dependencies with pip-compile from pip-tools. One of the development dependencies is Sphinx to build the docs. Sphinx has a dependency on (the latest release of) Jinja, so pip-compile adds a pin for Jinja. I want to provide one pip command for new contributors to set up their development environment with the pinned dev dependencies and Jinja in editable mode. I want this command to remain simple, so that new contributors can have an easy time getting started.
$ pip install -r requirements/dev.txt -e .
However, the pinned sub-dependency on Jinja takes precedence over the direct flag to install in editable mode, so Jinja2==2.11.2 is installed instead of Jinja2==3.0.0.dev0. This causes tests to fail, because they import the old version instead of the development version that new tests are written for.
I have a similar issue with Click. It has a dev dependency on pip-tools, which has a dependency on Click. A few new contributors were confused because pip list showed that Click was indeed installed, but tests were insisting that new things were not importable and failing.
I see -e . as a direct command to use the local version rather than any pinned version. I see -e . written out on the command line as more direct than a pinned sub-dependency pulled from a file. I don't see a legitimate case where the user asks for a local editable install but pip refuses because there's also a pinned dependency for that library. -e . is a direct request to develop at the local version, regardless of the fact that something might depend on a released version.
If you don't mind, I'm going to leave the question of how you view -e . for now. I see your point, and it has some merits, but I want to explore your underlying use case a bit further before tackling solutions, so that I'm sure I understand it.
You say you have the latest production version of Jinja pinned in your requirements/dev.txt. But that says to me "in order to have a correct development environment set up, you must have Jinja2==2.11.2". That's clearly not the case, as it appears that the in-development version of Jinja works just as well (as otherwise your preferred outcome, that the local copy takes precedence, will cause failures). So why not have Jinja2>=2.11.2 in your requirements file? That surely gives you the expected outcome while still allowing installation of the in-development version?
I wonder if the problem here is that your workflow, or the tools you are using, are resulting in over-strict pinning, which means that having Jinja2>=2.11.2 as a requirement is harder than it needs to be. I can understand that, but I want to confirm if that is the limitation here, or if there's some more fundamental problem that I'm not understanding yet.
Neither pip-tools nor Dependabot (which uses pip-tools) have the capability of doing anything but pinning exact dependencies. Both those projects are fairly common now, it's why I chose them. Plenty of other projects will be using them, I'm just in the more unique case that I develop projects that the projects I depend on depend on.
I'm not really clear what pip-tools could do here, since it's designed to pin exact versions. Jinja isn't a direct dependency of itself, all there is in the template file is Sphinx. Anything pip-tools does also needs to be understood by Dependabot, otherwise we lose automation. If you have any input about that, I opened an issue a while ago before I opened an issue here: jazzband/pip-tools#1150.
Yup, I definitely agree with sdispater on the principle of the thing. The one thing I'll note is that it's easier for poetry to be principled than pip: currently the solution for all problems on that thread is to fallback to pip. Both hoping for timely upstream releases and using --no-deps are more painful (whether or not the user is expert), so not supporting an easy workaround should be seen as eating into pip's churn budget. Obviously, you're in a better place than I am to judge whether pip should afford that :-)
In terms of what standardised overrides could look like, the poetry issue also had some ideas. Someone linked to https://classic.yarnpkg.com/en/docs/selective-version-resolutions/#toc-why-would-you-want-to-do-this describing yarn's solution that could be useful to refer to.
Neither pip-tools nor Dependabot (which uses pip-tools) have the capability of doing anything but pinning exact dependencies.
OK, cool. So (without making a comment on the importance of addressing this issue) I think it's fair to characterise this as asking for pip to provide a way to work around the limitations of pip-tools and/or Dependabot.
Thanks for clarifying.
currently the solution for all problems on that thread is to fallback to pip
Yes, but you have to remember, that what you're falling back to is relying on a buggy resolver in pip. Pip has never had a solution for this issue, all it's ever had is bugs that mean that people can get it to do things that aren't correct - and by failing to enforce constraints, pip has encouraged the community to think that ignoring constraints is OK, rather than being more accurate when specifying constraints.
And yes, this is sophistry, and the reality is that people do rely on pip's current behaviour. And we do take that very seriously. But we also have people complaining about the problems that pip's buggy resolver causes, and we have to balance the two priorities. It's hard to credibly say "we've decided that we won't fix known bugs because the buggy behaviour is useful"...
this should be seen as eating into pip's churn budget. Obviously, you're in a better place than I am to judge whether pip should afford that
Oh, boy, are we aware of that. In all seriousness, thanks for acknowledging that this is a difficult trade-off. One of the things we're looking at with the new resolver implementation is trying to bundle these sorts of things together, so there's one well-defined move to a more "correct"¹ behaviour, rather than a trickle of breakages that leave users with an extended but continually changing "getting there" phase. Hopefully that strategy will turn out OK. It won't please everyone, but I doubt anything can do that.
One irony here is that a lot of what we're doing is focused on making the bits that make up pip more reusable and standardised, so that building alternatives to pip is a viable idea. And a lot of the tolerance people have for churn is because there isn't really a good alternative to pip.
In terms of what standardised overrides could look like
Getting back to more practical considerations, thanks for the link to yarn. I don't know if any of the other pip devs have looked at yarn (I suspect someone has) but it's certainly worth seeing how they deal with this sort of thing.
For information, as part of the funded pip improvements work, we also have some dedicated UX specialists looking at how to improve pip's user interface, and this is one of the problems they will be working on (under issue #8452 linked above). So I'm sure they will be following up on this in some detail.
¹ Yes, "correct" is in the eye of the beholder, here, unfortunately.
I think it's fair to characterise this as asking for pip to provide a way to work around the limitations of pip-tools and/or Dependabot.
While one way to demonstrate this issue is with these tools in their current state, the issue is with pip ignoring a command to install in editable mode, instead preferring a dependency resolution that is not useful.
As I said, I was (deliberately, for the sake of understanding the use case) ignoring your view on how -e should be interpreted.
Let's put it this way. The behaviour you want is available right now if you were able to use >= requirements. But you can't use that type of requirement, so you have no options with existing tools.
As you suggest, one possible way of getting the behaviour you want without >= constraints would be to reinterpret -e as meaning "Install this and ignore all other requirements". However, please understand that this is not "pip's current behaviour". It may look similar, but what pip is actually doing at the moment, is picking one requirement to satisfy and ignoring all others. When the requirement picked is -e, you get behaviour that is useful to you, but when different types of constraints are involved, this results in broken installs. It's a known, long-standing issue that we have always described as a "bug", not as an implementation choice. We've fixed this bug in the new resolver, so that pip now takes into account all requirements equally. But in doing so, your convenient workaround for your problem, exploiting that bug to your advantage, no longer works.
Please understand, I'm not against the idea that if someone requests an explicit distribution file (whether a local directory, or a sdist or a wheel, whether with -e or without) then they want that to be installed. That makes perfect sense to me, and it's actually what the new resolver does. What's less obvious is how pip should react when given two contradictory requirements ("I want this file, but I also want a version that's different than what this file gives"). You're saying ignore version specifiers if an explicit file is given. Pip's new resolver says report the problem and ask the user to fix it. This discussion is about maybe giving the user a way to control the choice without needing to fix the sources, but leave the default as "report the problem".
I don't know if any of the other pip devs have looked at yarn (I suspect someone has)
I hadn't. That's basically the same model as I've mentioned in discussions about/for dependency resolution "overrides" in pip.
FWIW, I do think we need to figure out how important it is for users, especially those who've been using pip's buggy behavior as a fallback to solve their dependency resolution woes, to have this override capability. All the ideas I've had for providing some mechanism to the users are non-trivial to implement, and even if the functionality is well understood + implementable, I have no idea how we should be exposing this to the users.
W.r.t. the churn budget, I think that's primarily what we'll learn during the beta period, where we'll ask users to test the new resolver and help us figure out what to do on this topic (and others); all while keeping us from eating into too much of our churn budget, since these are users clearly opting into testing beta functionality.
I do think we'll have to resolve this appropriately before making it the default (as indicated by where we've put this in the tracking board), and the understanding gained from the user testing during the beta, will be pretty important in that. :)
For the purposes of being precise, I believe that an actionable version of @davidism's suggestion would be:
- If an editable requirement is provided, pip should ignore any version specifier requirements for the same project.
Some possible variations on this:
- Rather than editables, extend this to all direct links (pip install my/local/project or pip install path/to/project.whl)
- Rather than silently ignoring version constraints, warn and ignore if version constraints that won't be satisfied are encountered.
There may be other possible variations with different trade-offs.
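The "warn and ignore" variation could look roughly like the sketch below. This is not how pip's resolver is implemented; drop_pins_for_editables is a hypothetical helper that works on plain requirement strings, and the example input mirrors the Jinja case described earlier in the thread.

```python
import re
import warnings


def normalize(name):
    """Normalize a project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def drop_pins_for_editables(requirements, editable_names):
    """Return requirements without specifiers that target an editable project.

    `requirements` is a list of strings such as "Jinja2==2.11.2".
    `editable_names` holds the project names installed with -e.
    """
    editables = {normalize(name) for name in editable_names}
    kept = []
    for requirement in requirements:
        # The project name is everything before the first specifier character.
        name = re.split(r"[\s<>=!~;\[]", requirement, maxsplit=1)[0]
        if normalize(name) in editables:
            warnings.warn(
                f"ignoring {requirement!r}: project is installed as editable"
            )
            continue
        kept.append(requirement)
    return kept


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # keep this demo quiet
    remaining = drop_pins_for_editables(["Sphinx", "Jinja2==2.11.2"], ["jinja2"])
print(remaining)  # ['Sphinx']
```

Whether the warning should instead be an error by default is exactly the trade-off under discussion.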
@pfmoore thanks for your patience, your further explanations clarified things for me.
@davidism Not at all, pip's resolver has been broken for so long that it's really hard to untangle what counts as "broken" and what is "behaviour that people need that accidentally worked". This isn't the only place where we'll need to look very carefully at the transition, and how we handle "urgent feature requests exposed because no-one realised that what they were relying on were bugs".
Getting good involvement from users like yourself is crucial to getting that transition right, so your help is much appreciated.
I just wish all of the people relying on undefined behaviour of pip were doing things as unreasonable as this: https://xkcd.com/1172/ - it'd be much easier to not worry about it
Python's packaging tools in particular have fallen victim to Hyrum's Law, and this is really just another case of it. It is unlikely that the resolver lands without breaking some non zero number of workflows/installs. The only thing we can really do is try to figure them out as much as possible before hand, figure out which ones we do not plan to support, which ones we want to continue to support in the same way, or which ones we want to provide some new mechanism for supporting.
I suspect we're going to get a lot of noise at first once the resolver lands, but that's pretty much always the case when you go from nonstrict to strict behavior.
Another use case here. pypi does not have any way for multiple packages to "provide" the same resource. It is not uncommon on pypi to have the same package packaged in a different way under multiple names like:
- psycopg2 and psycopg2-binary
- opencv-python-headless and opencv-python
It should be possible to choose which one to use to fulfil a library requirement at the project level.
Another example from yesterday, preparing a build of WinPython.
When I include the latest and freshest possible Tensorflow, it asks me to:
- downgrade to Scipy-1.4.1 (7 months old)
- downgrade to numpy<1.19.0 (1 month old, removing a pile of technical debt)
pip check
tensorflow-cpu 2.3.0rc2 has requirement numpy<1.19.0,>=1.16.0, but you have numpy 1.19.1+mkl.
tensorflow-cpu 2.3.0rc2 has requirement scipy==1.4.1, but you have scipy 1.5.2.
with pip of today:
- I can recompile simple wheels, like Spyder, as:
  - Spyder doesn't want PyQt-5.15, only PyQt5-5.12, because it's not available on conda-forge,
  - but I can test that it works well enough and securely enough for my narrow use case
- I can't recompile complex wheels like Tensorflow, but pip lets me ignore Tensorflow's limitations, just warning me,
Dilemma to be created by the pip of tomorrow:
- either I bow to the slowest package development cycle and don't use numpy-1.19.1 / Scipy-1.5.2:
  - dragging my feet with technical debt (not using new features/value created 7 months ago or more)
  - slowing upgrades when the Python cycle itself is accelerating, and context is moving faster with the pandemic
- or I drop some important packages that I can't relax with my bare hands.
With regards to TensorFlow: for the scipy dependency, this is definitely over-constrained and will be removed in the next release, see:
- PRs: tensorflow/tensorflow#41865, tensorflow/tensorflow#41866, tensorflow/tensorflow#41867
- Issues: tensorflow/tensorflow#40884, tensorflow/tensorflow#35709, tensorflow/tensorflow#40789
For the numpy dependency, this version of numpy apparently has a breaking ABI change that the TensorFlow project is not prepared to migrate to, but should be eventually fixed. I filed tensorflow/tensorflow#41902 to ensure the TensorFlow maintainers are aware; if you are currently using TensorFlow with numpy >= 1.19.0 please leave a thumbs-up there.
I think ultimately, instead of having pip be able to relax its constraints, we should embrace this friction as a forcing function to get projects with less-than-ideal dependency specifications to either fix them, or work towards relaxing them, as it will improve the overall ecosystem.
I hope you're right, and it would go towards a strictness more compatible with "conda".... yet pip is not a distro, so a "--relax" option would help soften the transition on the first year.
It is nice for abstractions to have escape hatches for when they break down. Take for example Django's ORM, which --- at least in the early days --- as a design decision only covered 80% of use cases and encouraged "dropping down" to SQL for the remaining 20%. When people deny that the escape hatches should exist, it is often framed in moralistic terms: anything that does not use the abstraction correctly is wrong and should be fixed. A consenting adults approach which allows escape hatches dispenses with simple moralistic arguments and instead seeks to provide maximum utility without attempting to dictate "best practices" which are supposed to apply against unseen and unknown contexts:
In this case, the escape hatch would:
- Allow papering over problems with the dependencies of individual packages;
- Allow papering over edge cases in the ecosystem such as a shortcoming in the specification of dependencies where no metapackage or supplies-type mechanism exists and so it is impossible to have multiple packages fulfill the same dependency name.
If the escape hatch is not added, it does not mean that the whole packaging ecosystem will magically improve. Instead, users faced with an unresponsive upstream will be forced to make their own ad-hoc escape hatches such as adding manual instructions to READMEs, manual installation shell scripts, usage of git submodules, and passive-aggressive forks which are almost immediately unmaintained.
@frankier Thanks for sharing your thoughts.
As I see it, the "escape hatch" would be sticking with pip 20.2.
Some further thoughts:
The more lenient framework you have in mind makes sense for "victimless crimes" where no one other than the people involved are affected. However, pip's maintainers have to deal with support requests from users who get tangled up in incompatible dependencies and resolution conflicts. Also, the fact that we can't depend on the user's installation being consistent blocks the development of a lot of features which we, and many users, want. Check out the "We need to finish the resolver because so many other improvements are blocked on it" section there for several examples, such as adding an "upgrade-all" command to pip.
If you or others are volunteering to donate a bunch of money so that pip can hire multiple full-time maintainers, or you or others are donating your services to maintain your proposed "escape hatch" and/or respond to the user support queries pertaining to it, then please let us know, as that changes the equation! Currently, the only reason anyone's being paid to work on pip is that we wrote some grant proposals and got some money that will run out at the end of the year.
I'd also like to know of the unresponsive upstreams that have at least, say, 100+ users and that completely ignore those users telling them "the upcoming version of pip simply will not install your package". I think we'll learn more in the next few weeks of the beta to see how many of those there are. If there are scores of such packages then that will influence our decisionmaking -- and, I hope, help people and companies that depend on those packages decide to invest in and rejuvenate them.
Another use case here. pypi does not have any way for multiple packages to "provide" the same resource. It is not uncommon on pypi to have the same package packaged in a different way under multiple names like:
- psycopg2 and psycopg2-binary
- opencv-python-headless and opencv-python
This seems to me like something that the upstreams could work on fixing on their side; what do opencv and psycopg2 say about the upcoming change to pip's dependency resolver?
It should be possible to choose which one to use to fulfil a library requirement at the project level.
Could you please file this as a separate issue so we can discuss it separately? Thanks!
I think you already know that I don't have any resources to offer. Incidentally it's long-tail projects including those that have never received any funding which would benefit most from this feature, while projects with 100+ users or backed by Google will surely adapt. You will receive very skewed information if you only ask libraries and upstream-level projects since this issue is about giving more control to downstream projects. Upstream projects will either be responsive and not mind, or else not respond. Nevertheless when framed as a matter of priorities it's indisputable.
I have filed the issue about virtual packages here: #8669
We have a tough situation here and I'd love thoughts from @chrahunt @xavfernandez and other pip maintainers.
My current thinking: people who need an escape hatch within pip 20.3 should use --use-deprecated=legacy-resolver. Per the deprecation timeline they will then have three months (till pip 21.0 comes out in January) to get upstreams to get their houses in order.
With this and #8836 (where the root issue is the other way around: tightening constraints of an already-released package), I've been thinking about proposing a configuration format for pip to consume. But I don't think it has a chance to be finished before 20.3 even if we agree to do that.
We already have a mechanism to tighten constraints -- using constraint files.
@uranusjr that would be a config file to relax some requirements? Is there an example of such a mechanism in other ecosystems? If that does not exist elsewhere maybe it's a sign that it is "only" something we need to manage the transition period and the legacy that was allowed by the old resolver?
In that case might it be simpler to just keep the old resolver around as --use-deprecated=legacy-resolver for a longer period, to avoid introducing a new configuration file concept that we'd need to maintain forever?
Composer (PHP) has a replace section that allows the user to override a package in the dependency graph. You can replace any package with anything (even with "nothing", i.e. remove it), thus relaxing dependency specifications.
I am not keen to go that far, however, since I still feel it's not best to make such a powerful tool easily accessible. A better approach IMO would be to simplify the artifact-vendoring process for users. It is actually pretty easy (conceptually) to modify a package's dependencies: you just crack open the wheel, and modify its METADATA and RECORD. So the procedure I have in mind is something like
- Create a tool (either part of pip or standalone) that takes a configuration file that specifies distributions to patch, and files in each distribution to patch.
- The tool would download distributions into a directory (say ./vendor/) and apply the patches, re-calculating RECORD accordingly.
- The tool would generate a file for pip install to consume (a la "URL constraints" #8253 or a custom index/find-links page).
- Now the user can do pip install -c vendor/constraints.txt ... or pip install --find-links vendor/index.html to use the patched distribution files.
This would make the necessary overriding process easy to use for people who know what they're doing, but keep what's done transparent and obvious (the user can see they're now installing local, patched packages) to signal the implied responsibility to them.
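To make the "crack open the wheel" step concrete, here is a minimal Python sketch of what such a tool might do for a single wheel. It is not an existing tool or a pip API: the names record_hash and patch_requires_dist are hypothetical, project-name matching is simplified (no name normalization), and signature files that may accompany RECORD are ignored.

```python
import base64
import csv
import hashlib
import io
import re
import zipfile


def record_hash(data: bytes) -> str:
    """Hash in the form RECORD uses: sha256, urlsafe base64, padding stripped."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def patch_requires_dist(wheel_bytes: bytes, name: str, new_requirement: str) -> bytes:
    """Hypothetical helper: replace the Requires-Dist entry for `name`, then rebuild RECORD."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as src:
        files = {info.filename: src.read(info.filename) for info in src.infolist()}
    metadata_path = next(p for p in files if p.endswith(".dist-info/METADATA"))
    record_path = next(p for p in files if p.endswith(".dist-info/RECORD"))

    pattern = re.compile(r"requires-dist:\s*([A-Za-z0-9][A-Za-z0-9._-]*)", re.IGNORECASE)
    in_headers = True
    patched = []
    for line in files[metadata_path].decode("utf-8").splitlines():
        if not line.strip():
            in_headers = False  # a blank line ends the headers; the rest is the description
        match = pattern.match(line) if in_headers else None
        if match and match.group(1).lower() == name.lower():
            line = "Requires-Dist: " + new_requirement
        patched.append(line)
    files[metadata_path] = ("\n".join(patched) + "\n").encode("utf-8")

    # RECORD lists path, hash and size for every file except RECORD itself.
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path, data in files.items():
        if path != record_path:
            writer.writerow([path, record_hash(data), len(data)])
    writer.writerow([record_path, "", ""])
    files[record_path] = out.getvalue().encode("utf-8")

    # Write the patched files back out as a new wheel archive.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for path, data in files.items():
            dst.writestr(path, data)
    return buf.getvalue()
```

The patched bytes would then be written into the vendor directory under the original wheel filename, so the user can see that a local, modified distribution is being installed.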
I like this idea - it makes the process reasonably straightforward while making it obvious that you're changing packages. Also, by making it a standalone tool we can iterate on the UI at a much faster pace than if it were part of pip, and there's more options for competing tools with different approaches to appear.
I'd love to see an ecosystem of tools like this that work with pip, removing the endless pressure to cram everything into pip itself.
I'd like to just add some thoughts on this.
One of the things I love about pip is that if I ask it to pip install foo==x bar==y it will just do it.
Even if there is a version conflict somewhere in dependencies.
It is good to be warned about the conflict but in some cases it is very, very useful to just force the installation.
Some kind of --force flag that would mean "I know what I am doing, install anyway" would be highly appreciated.
Agree that probably having a strict behaviour by default is a good idea, though.
We have a tough situation here and I'd love thoughts from @chrahunt @xavfernandez and other pip maintainers.
I personally think that pip should provide an escape hatch.
A possible solution could be a --force option that only applies for user provided requirements in the form of:
- dep==version matching a single version,
- dep===version,
- direct_url / dep @ direct_url,
- -e direct_url
Let's call those, "forced" packages.
During the resolution process:
- if a "not-forced" package has a dependency on a "forced" package, it must be honored (and a ResolutionError is expected if no solution is found)
- if a "forced" package has a dependency on another "forced" package, it can be safely ignored (but a Warning is still expected if the dependency isn't honored)
In the infamous tea/coffee example where:
- tea 1.0.0 depends on water<1.12
- coffee 1.0.0 depends on water>=1.12
this would give:
- pip install tea coffee would error
- pip install tea==1.0 coffee==1.0 would error (no water can honor both "<1.12" & ">=1.12")
- pip install tea==1.0 coffee==1.0 water==1.13 --force would succeed with a warning
This seems both explainable and (possibly, from someone that did not work on the new resolver) might not be that hard to implement. This would also mean that any old frozen requirements.txt with broken dependencies would still be installable with the new resolver, simply by adding a --force option.
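The "forced" rule can be illustrated with a small Python sketch. This is not a resolver and not pip code: it only classifies violated dependencies in an already-chosen set of versions as errors or warnings. The names DEPENDENCIES and check_forced_install are hypothetical, version comparison is a toy numeric one, and treating a forced package's dependency on a non-forced package as binding is an assumption, since the proposal does not spell that case out.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq}

# Hypothetical dependency metadata for the tea/coffee example.
DEPENDENCIES = {
    ("tea", "1.0"): [("water", "<", "1.12")],
    ("coffee", "1.0"): [("water", ">=", "1.12")],
    ("water", "1.13"): [],
}


def parse_version(text):
    """Toy version parser: '1.13' -> (1, 13)."""
    return tuple(int(part) for part in text.split("."))


def check_forced_install(pins, forced):
    """Classify violated dependencies as errors or warnings.

    pins: name -> version for every package that would be installed.
    forced: names the user pinned explicitly while passing the force option.
    """
    errors, warnings = [], []
    for name, version in pins.items():
        for dep, op, bound in DEPENDENCIES[(name, version)]:
            if OPS[op](parse_version(pins[dep]), parse_version(bound)):
                continue  # dependency is honored
            message = (f"{name} {version} requires {dep}{op}{bound}, "
                       f"but {dep} {pins[dep]} is selected")
            if name in forced and dep in forced:
                warnings.append(message)  # forced -> forced: ignored, warn only
            else:
                errors.append(message)    # every other edge must be honored
    return errors, warnings


pins = {"tea": "1.0", "coffee": "1.0", "water": "1.13"}
print(check_forced_install(pins, forced={"tea", "coffee", "water"}))
print(check_forced_install(pins, forced={"tea", "coffee"}))
```

With all three packages forced, the tea/water conflict is reported as a warning only. If water is not forced, the same conflict is an error, which matches the second case in the example above.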
I realize we're still talking about this hypothetically, but I want to try to steer the conversation away from the flag name --force if possible, as it seems much too vague. Something like --force-dependency-conflicts or --allow-dependency-conflicts would be a significant improvement IMO.
What would be the result of the following?
pip install tea coffee --force-dependency-conflicts
EDIT: (assuming there is no combination of tea and coffee version that satisfies the constraints)
I agree that --force is a little vague. Maybe --ignore-user-conflicts?
What would be the result of the following?
pip install tea coffee --force-dependency-conflicts
It would error saying that it found no version of water matching both coffee and tea requirements.
I would also suggest a warning along the lines of You specified option --force-dependency-conflicts but did not provide any pinned requirement.
--override install-thing-and-skip-resolution==1.0
If we do this, I'm more inclined towards something like @dstufft's suggestion, where people can say "install precisely this thing, I'll accept the consequences". It feels more precise, and easier to explain. A broader flag that says "force things to install" seems like we'd end up having to guess what constraint it's OK to violate, and whatever we choose, someone will object.
Making the user specify explicitly is much clearer, at the cost that it makes the user choose. But I'm OK with that - anyone wanting to override the resolver should know what they are doing.
In terms of implementation, I think --override NAME SPECIFIER could be implemented by making the code that generates dependency information for the resolver replace every requirement for project NAME with NAME SPECIFIER. The devil is in the details, but I quite like the idea of a mechanism that's both easy to describe and easy to implement.
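As a rough illustration of that rewrite step, here is a minimal Python sketch. It does not reflect pip's internals: apply_override is a hypothetical name, and dependencies are modelled as plain (name, specifier) tuples rather than pip's requirement objects.

```python
def apply_override(dependencies, override_name, override_specifier):
    """Replace every requirement on `override_name` with the overriding specifier.

    dependencies: list of (project name, specifier string) tuples, as a stand-in
    for whatever structure feeds dependency information to the resolver.
    """
    return [
        (name, override_specifier if name == override_name else specifier)
        for name, specifier in dependencies
    ]


# Hypothetical metadata: tea depends on water<1.12 and sugar>=2.0.
tea_dependencies = [("water", "<1.12"), ("sugar", ">=2.0")]
print(apply_override(tea_dependencies, "water", "==1.13"))
```

The resolver would then only ever see the overriding specifier for that project, whatever the package metadata says.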
The problem I have with flags like this is what pip should do afterwards. For example, say I did
pip install some-package --override some-dependency==1.0
to install some-package and its dependencies. What should pip do when I later try this?
pip install some-package
Should it "override the override" (or report a conflict), or should pip "remember" the previous override and show "requirements already satisfied"? pip does not currently have a storage to hold the override information, so we'll need to design some new things if we go that route. OTOH the command line flag approach is likely not scalable if we require the user to re-supply the override every time they re-install packages, and we'll need some kind of requirements.txt equivalent for the feature to be practically useful.
maybe in the requirement file, you could add an "override" keyword ?:
requirement.txt:
package1;==1.2.3;override
#whatever_other_packages_want_for_package1, version package1-1.2.3 will be installed and used for other dependency resolution
If a new pip install:
- doesn't include that requirement line "override", the overriding is lifted,
- doesn't let pip guess what to do to get back to a consistent install, pip stops. That suggests that when removing an "override", the next pip move is to explicitly install a new version of the no-longer-problematic package.
You may also use the walrus operator to specify the override? (only strict overrides are "managed")
package1;:=1.2.3;
there can be also the "do not install" overriding or "do not touch":
package1;:=1.2.3;
# package1-1.2.3 is the version force decided to be installed per the resolver (if it can resolves the rest)
package2;:=donotinstall;
# package2 is totally ignored per pip install resolver (as if it doesn't exist)
package3;:=donottouch;
#package3 currently installed package is supposed the overriding version decided per pip install resolver
that may allow:
- no additional flag or tricky resolver hack,
- limit the hack to strict versions of packages, as the "knowing-what-she_or_he-wants" person is supposed to use this for a limited time/case
The problem I have with flags like this is what pip should do afterwards.
Fail. You chose to install an inconsistent set of packages, pip will error out if it has to deal with an inconsistent environment.
I know that's harsh, but to me, that's fundamentally what "I know what I'm doing" implies. You have deliberately made your environment inconsistent, and so pip will no longer be able to correctly resolve dependencies. We do nothing special to support this situation, behaviour would be exactly the same as if you'd built an environment with the old resolver that failed pip check and were now trying to install into it. If that's not the "desired behaviour" then I think people who say "I want to be able to override dependencies like I can with the old resolver" need to explain what they mean more clearly.
I'd assume that either people want this for "throwaway" virtualenvs, which they rebuild from scratch each time, or they understand that they'll need to manually manage the mess they've made from now on.
maybe in the requirement file, you could add an "override" keyword ?
See above. I'm very, very strongly against having any sort of way to record an "override" option persistently (beyond pip's standard config file and environment variable mechanisms for specifying command line options). Once we start doing this, we have opened up the whole question of designing a "language" that lets the user describe a system configuration that violates the dependencies declared by packages, and that's a huge and complex problem that shouldn't be a pip implementation detail¹.
If you want absolute control over the state of your system, you can have a requirements file that says:
foo==1.0
bar==2.0
baz==1.0
# Include *all* requirements, including dependencies.
You build that environment using
virtualenv .venv
.venv\Scripts\activate
pip install --no-deps -r requirements.txt
To change the environment, modify requirements.txt, delete the venv and rebuild it.
There are (or could be) tools that let you generate such a requirements file. I believe pip-compile does something like this, although whether it has any way to allow you to ignore dependencies, I don't know.
(Maybe I could be persuaded that allowing --no-deps to be specified in a requirements file is a reasonable feature request).
¹ If there's a need for such a language, I'd argue that it should be discussed and agreed as a new packaging PEP, defining a standard format that all tools (pip, pipenv, poetry, etc) could use to define an "environment that violates package dependency metadata". This is significant enough that I don't want it to be an implementation-defined feature of pip.
A broader flag that says "force things to install" seems like we'd end up having to guess what constraint it's OK to violate, and whatever we choose, someone will object.
My list of
- dep==version matching a single version,
- dep===version,
- direct_url/dep @ direct_url,
- -e direct_url
was quite natural to build (at least for me).
Making the user specify explicitly is much clearer, at the cost that it makes the user choose.
But in the case of a pip install A==X.Y or pip install -r frozen_requirements.txt the user has already chosen and is asking to install exactly those versions. Mandating that the user relists the requested versions in another option seems a little bit overkill.
In terms of implementation, I think --override NAME SPECIFIER could be implemented by making the code that generates dependency information for the resolver replace every requirement for project NAME with NAME SPECIFIER.
This would needlessly break some environments.
If A-1.0 requires B-1.0 and A-2.0 requires B-2.0, pip install A --override B 1.0 would likely install A==2.0 (since it is the latest) with your solution while A==1.0 B==1.0 would likely be preferable.
Hence my slightly more complicated solution of only ignoring dependencies information between 2 "forced" packages.
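The difference between the two behaviours can be shown with a toy Python sketch. Neither function reflects pip's resolver: INDEX is a hypothetical package index for the A/B example, and the names resolve_blanket_override and resolve_respecting_pins are made up for illustration.

```python
# Hypothetical index for the example: each version maps to its pinned dependencies.
INDEX = {
    "A": {"1.0": {"B": "1.0"}, "2.0": {"B": "2.0"}},
    "B": {"1.0": {}, "2.0": {}},
}


def newest_first(name):
    """Versions of `name`, newest first (toy numeric comparison)."""
    return sorted(INDEX[name],
                  key=lambda v: tuple(int(p) for p in v.split(".")),
                  reverse=True)


def resolve_blanket_override(root, overrides):
    """Overridden requirements are rewritten, so they never influence
    which version of `root` is picked."""
    version = newest_first(root)[0]
    chosen = {root: version}
    for dep, wanted in INDEX[root][version].items():
        chosen[dep] = overrides.get(dep, wanted)
    return chosen


def resolve_respecting_pins(root, pins):
    """Dependency metadata stays intact: pick the newest version of `root`
    whose dependencies agree with the user's pins."""
    for version in newest_first(root):
        deps = INDEX[root][version]
        if all(pins.get(dep, wanted) == wanted for dep, wanted in deps.items()):
            return {root: version, **deps}
    raise LookupError(f"no version of {root} is compatible with {pins}")


print(resolve_blanket_override("A", {"B": "1.0"}))
print(resolve_respecting_pins("A", {"B": "1.0"}))
```

The blanket override selects A 2.0 together with B 1.0, while keeping A's metadata in play selects A 1.0 with B 1.0, which is the outcome argued for above.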
You build that environment using
virtualenv .venv
.venv\Scripts\activate
pip install --no-deps -r requirements.txt
Oh, if --no-deps allows specifying conflicting dependencies (and my tests seem to confirm that), that's also a good escape hatch.
My current thinking: people who need an escape hatch within pip 20.3 should use --use-deprecated=legacy-resolver. Per the deprecation timeline they will then have three months (till pip 21.0 comes out in January) to get upstreams to get their houses in order.
I discussed this in a meeting today with @pradyunsg. While using pip 20.3, users who want more flexible, legacy-style behavior can use the --use-deprecated=legacy-resolver flag. During the October-to-December timeframe, pip developers can make further policy decisions on the possibility of an "install precisely this thing, I'll accept the consequences" feature, and, if they decide to make one, design, implement, and test it.
Quick update on this:
For the numpy dependency, this version of numpy apparently has a breaking ABI change that the TensorFlow project is not prepared to migrate to, but should be eventually fixed. I filed tensorflow/tensorflow#41902 to ensure the TensorFlow maintainers are aware; if you are currently using TensorFlow with numpy >= 1.19.0, please leave a 👍 there.
This was resolved in tensorflow/tensorflow@aafe25d and will be available in the next release.
Speaking as Python maintainer of Nixpkgs here. Now having a proper dependency resolver with pip is great. However, as mentioned in this issue, there are always incorrect/unnecessary pinning in packages.
Being able to enforce a certain version is a requirement for large integrators if they wish to automate the update process of their package set. In Nixpkgs such a feature would save quite some time, and I imagine in other distros as well (or even more considering some of their processes).
In Nixpkgs we have several thousand Python packages, which we currently blindly upgrade to the latest, and then spend many hours adjusting to get them working, for the vast majority of packages. This is unfortunately very manual work. As an example, see NixOS/nixpkgs#105368. After that initial work there will still be many regressions in leaf packages that require fixing up.
Some really good suggestions have been made in this thread. Some remarks:
- --no-deps is good for the installing phase, but not really for the resolving phase. We need to get a listing of the resolved versions, taking into account the overrides. We don't actually want to install. We need to be able to work with the resolver.
- There is the constraints file; however, it is meant to further limit, without the possibility to override.
Create a tool (either part of pip or standalone) that takes a configuration file that specify distributions to patch, and files in each distribution to patch.
If we separate the resolving and the installing, then the resolving could keep the version info in memory, without patching. After resolution, it could install using --no-deps which is how distros need to install with pip anyway.
It was mentioned UX investigations are done, which is good. Given the world is moving towards a more declarative approach, I would argue it could be fine to have the override possibility only in file format, and not using cli options. Essentially, we end up with our abstract requirements including overrides in one file, and specific requirements in the form of a lock file in another. Which gets us back to the discussion on the lock file.
I'm very worried about the upcoming removal of legacy resolver support.
So far, I had a workflow for heavily dependent projects : using a higher level tool like Poetry to resolve and pin dependencies when possible, and go back to lower level pip to install problematic dependencies (or those requiring special options/compilations).
Conflicts were detected, but only reported as warnings, and it was fine ; project testing ensured that compatibility was there, anyway.
But if pip also refuses to relax dependencies, what tool can we resort to?
Sometimes pypi dependencies just can't be taken into account anymore.
For example, I use my django-compat-patcher to automatically restore backwards compatibility on Django framework core.
Then, all Django dependencies that heavily constrain their Django versions don't need it anymore, since they'll automatically work with newer versions (and if they don't, it's my responsibility).
How can I tell pip that I want THIS latest Django version, whatever dozens of other dependencies might assert in their own setup.py? Without abandoning all dependencies at once with "--no-deps", nor requesting other maintainers to relax their constraints just for my own use case?
I guess having a strict resolution algorithm by default is fine, but project maintainers should be able to make it issue warnings instead of errors, or to pinpoint some dependencies as proposed above.
I proposed a solution in #8076 (comment) that does not require any involvement in pip's development at all. I wrote down more about the idea here but never got the time to implement it. If this affects you deeply, please lend a hand to make the tool happen.
@uranusjr An intermediate tool is indeed a solution, but would the approach of GERB also work for Pypi packages that are not wheels but source archives?
On another subject, I've read lots of times that overriding dependencies shouldn't be an option left available to common users, but - like - when did we stop being "all consenting adults here"?
would the approach of GERB also work for Pypi packages that are not wheels but source archives
If you have the source, you can manually change the dependency metadata. So why would you need a tool? With a wheel you may not be able to build yourself, which is why a tool to modify the built wheel is useful. There's no such problem with sources, as they are useless anyway unless you can build.
I've read lots of times that overriding dependencies shouldn't be an option left available to common users, but - like - when did we stop being "all consenting adults here"?
"Consenting adults" applies here, certainly. If you want to modify the metadata of existing wheels, you can do so. But the "consenting adults" principle doesn't imply that we make it easy for people to do things that are generally not needed or advisable. It just says that we don't go out of our way to prohibit people who want to (and are aware of and willing to accept the consequences) from doing so.
Please also see my July 30, 2020 comment in this issue, particularly:
The more lenient framework you have in mind makes sense for "victimless crimes" where no one other than the people involved are affected. However, pip's maintainers have to deal with support requests from users who get tangled up in incompatible dependencies and resolution conflicts. Also, the fact that we can't depend on the user's installation being consistent blocks the development of a lot of features which we, and many users, want. Check out the "We need to finish the resolver because so many other improvements are blocked on it" section there for several examples, such as adding an "upgrade-all" command to pip.
If you or others are volunteering to donate a bunch of money so that pip can hire multiple full-time maintainers, or you or others are donating your services to maintain your proposed "escape hatch" and/or respond to the user support queries pertaining to it, then please let us know, as that changes the equation! Currently, the only reason anyone's being paid to work on pip is that we wrote some grant proposals and got some money that will run out at the end of the year.
That has now happened; as far as I know, no one is currently paid to maintain pip. It's all volunteers.
If you have the source, you can manually change the dependency metadata. So why would you need a tool?
From what I've understood, if a bunch of dependencies have incorrect (for my use case) requirements, I have to fork them all, and pin these new commits in my lock file, and then maintain these forks. Or I have to give up the pip resolver entirely, and manually pin the dozens, the hundreds of (sub)dependencies in a lock file to be used with "--no-deps". That's the kind of process which would be worth a tool or pip option too, imo ^^
Also, the fact that we can't depend on the user's installation being consistent blocks the development of a lot of features which we, and many users, want
Actually I don't understand how these new features (like upgrade-all) will deal with pinned git-installs of forks - which are for now the official workaround to quickly solve deadlock situations. Don't these git forks falsely overlap with the package names and package versions of Pypi?
"Consenting adults" applies here, certainly. If you wat to modify the metadata of existing wheels, you can do so. But the "consenting adults" principle doesn't imply that we make it easy for people to do things that are generally not needed or advisable. It just says that we don't go out of our way to prohibit people who want to (and are aware of and willing to accept the consequences) from doing so.
If "consenting adults" just means that one is not prohibited (?) from forking or hacking modules to achieve one's goals, then every open-source ecosystem on Earth follows this philosophy ;)
I thought more of a "end users keep control of all vital levers, even risky ones" meaning, but maybe I'm overinterpreting the quote.
That's the "generally not needed or advisable" that I'm worried about; most projects I've crossed recently were blinking with pip warnings about conflicts, although everything worked perfectly fine. And lots of legitimate use cases have already been cited (https://classic.yarnpkg.com/en/docs/selective-version-resolutions/#toc-why-would-you-want-to-do-this), cases that are probably why Yarn, Composers, Gopkg and others allow dependency overrides. For my own usecases, I just see no solutions for now, except remaining on an old pip version.
I'd love to rely on upper-level tools to handle the problem for me, but these projects, too, seem unwilling to implement options so that users can solve conflicts directly ("Ability to override/ignore sub-dependencies" python-poetry/poetry#697).
That has now happened; as far as I know, no one is currently paid to maintain pip. It's all volunteers.
If the lack of "dependency override" option is motivated by a lack of workforce/funding, or by the objective of diminishing the support pressure on Pip by redirecting users to misc lib authors, everyone can understand it. I thought our donations to PSF also partly went to the packaging effort? (even though, considered the current context, money mustn't flow around)
For now I'm still trying to understand the pros/cons arguments about the different kinds of dependency overrides. Numerous users are complaining about appearing dependency hells, yet adding options is in several comments deemed unnecessary or even-more-harmful, so I'm confused.
Is there some kind of PEP, of rationale, somewhere, detailing the choices done regarding the new resolver (lots of algos are possible it seems), the dismissed alternatives for solving conflicts, etc?
I only found announcements on the web, no in-depth analysis of the subject.
Some common guidance on how to properly constrain requirements, for lib authors, would certainly be precious.
From what I've understood, if a bunch of dependencies have incorrect (for my use case) requirements, I have to fork them all, and pin these new commits in my lock file, and then maintain these forks. Or I have to give up the pip resolver entirely, and manually pin the dozens, the hundreds of (sub)dependencies in a lock file to be used with "--no-deps". That's the kind of process which would be worth a tool or pip option too, imo ^^
I would say you're better off doing this nonetheless. Maintainers of these packages with "incorrect dependencies" are not stupid. It is my experience that project maintainers tend to know their projects better than users. If you can't convince them to change the dependencies to fit your need, it's more likely that either you're wrong, or the project has different goals from how you use it. Either way, the project is not really for you, so you'd better fork or start something different.
Imho it's not just a matter of βincorrect dependenciesβ. If it was so, the problems would probably solve itself after a bit of hard times. But I think dependency constraints, especially in a dynamic language, can't be taken as absolute assertions:
- libraries are supposed to restrict dependencies to the set needed to work FULLY. But by deciding not to use some part of a library/framework (ex. some database backend, or worker runtime), a careful user can easily extend its compatibility range to much wider values ; the "extras" system was meant for that, but authors are not necessarily willing to put on such additional work and complexity
- it's not clear if libs use constraints to express "this is the range of officially supported dependencies, we don't provide support outside this", or "this is just a known good working set, your mileage may vary with other versions", or "the lib won't work with versions below or above this range, we checked it"; these semantics would alas require different metadata than just "install_requires"
- like I noted above, adding compatibility layers can completely change the constraints actually needed by dependencies - even if these constraints are initially perfectly tuned by lib authors
What generally worsens the problem is probably the lack of "guidance" for lib authors, that I mentioned above. There is now a good bunch of documents and tutorials about setting up Python environments (with pip, poetry, pipenv, pip-tools...), but I don't remember finding official recommendations about how to properly constrain dependencies for a library. Setuptools docs don't expand on the subject either (https://setuptools.readthedocs.io/en/latest/userguide/dependency_management.html).
Theoretically, a compatibility matrix should be tested with a tox/nox-like tool, which would check that the test suite of the lib passes with all authorized versions of dependencies, and only with these versions. In a perfect world, even the combinatorial explosion of all dependencies would be checked (since in rare cases, sibling dependencies can interfere).
But as far as I know, no such thing is used in the Python ecosystem (I don't know about others); dependency compatibility seems just out of scope of continuous integration, probably due to the weight of such a process (and this is not a criticism - just an observation). When Tox is used, it's to check Python versions, and sometimes versions of the main framework (Django, Flask...), never more.
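To give a sense of the weight of such a process, the compatibility matrix mentioned above can be enumerated with a few lines of Python. This is only a sketch: the SUPPORTED table and its version lists are made-up illustration data, and compatibility_matrix and to_pins are hypothetical helper names, not part of tox or nox.

```python
import itertools

# Hypothetical versions a library claims to support for three dependencies.
SUPPORTED = {
    "django": ["2.2", "3.0", "3.1"],
    "requests": ["2.23", "2.24"],
    "jinja2": ["2.10", "2.11"],
}


def compatibility_matrix(supported):
    """Yield every combination of supported versions as a name -> version dict."""
    names = sorted(supported)
    for combo in itertools.product(*(supported[name] for name in names)):
        yield dict(zip(names, combo))


def to_pins(environment):
    """Render one combination as pinned requirement lines."""
    return [f"{name}=={version}" for name, version in sorted(environment.items())]


environments = list(compatibility_matrix(SUPPORTED))
print(len(environments), "environments to test")
print(to_pins(environments[0]))
```

Each combination would be one environment in which the full test suite has to run, and the count is the product of the list lengths, so it grows quickly with every additional dependency.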
So library authors typically improvise: they seem to originally use current lib versions as "minimum requirements", forbid the next major version (or leave the maximum version undefined, which can be a problem too), and then wait for user complaints to tweak requirements.
Most of these remarks are hopefully corner cases of dependency management; and proper semantic versioning (or more precisely, backwards compatibility) seem to prevent a good part of troubles; but they push me to the opinion that project developers would be better with easy (even if dangerous) tools to quickly solve temporary dependency hells themselves. [Note: finding funding for such features is another matter, I agree]
On a side note : has the idea been discussed (and what was the outcome?) of letting SOME entries only of pip requirements be marked as "--no-deps"?
That's the "all or nothing" approach of the dependency resolver which sounds alarming, but if this resolver can still apply to most dependencies, it's a big win for project maintainers. (And it'll avoid custom install hacks, like having both a requirements.txt and an extra_requirement_extras_nodeps.txt in project root)
has the idea been discussed (and what was the outcome?) of letting SOME entries only of pip requirements be marked as "--no-deps"?
Yes.
Please do open the comments that GitHub has hidden due to this being a long discussion, if you haven't already. I'll particularly flag #8076 (comment).
has the idea been discussed (and what was the outcome?) of letting SOME entries only of pip requirements be marked as "--no-deps"?
FWIW I've personally not completely given up on #8076 (comment) (if the implementation isn't too complex/patchy and with a more explicit option name to be bikeshedded) but am also unlikely to have much time to work on it in the coming months so don't hold your breath :)
Thanks for the pointers
See above. I'm very, very strongly against having any sort of way to record an "override" option persistently (beyond pip's standard config file and environment variable mechanisms for specifying command line options). Once we start doing this, we have opened up the whole question of designing a "language" that lets the user describe a system configuration that violates the dependencies declared by packages, and that's a huge and complex problem that shouldn't be a pip implementation detail¹.
I'm not a packaging veteran by any means, but I don't understand how forcing some dependencies at project level will create a huge and complex problem.
To me, it just excludes them from all considerations about dependency resolution, as if they had become top-level dependencies, which no other dependency of the project relied on (though Pip could still warn about them).
I don't realize which other work-in-progress on Pip would be blocked by these workarounds - I'd love some examples if anyone has some ^^'
Note that I'm talking about project-level dependencies, not library-level - because then solving conflicts between "forced dependencies" declared by libraries themselves, would indeed become another piece of cake.
Licensing issues have been cited as another reason for dependency constraints - I must say I hadn't thought about that - but it adds up to the fact that dependency constraints signify waaay too many things at the same time, and we can't know what they mean (legal issues ? real incompatibility ? disclaimer ? good-working-set ?) without digging changelogs.
@xavfernandez Indeed such a "forced package" semantic would probably do the trick too, and remain close to what other package managers provide as escape hatches. :)
I don't understand how forcing some dependencies at project level will create a huge and complex problem.
I guess the only good answer for you is likely to be, give it a try and see what you think. The pip developers have all struggled to come up with a workable solution to this question (or in @xavfernandez' case, with the time to implement the solution he has hopes for), but it's possible we're too close to the problem, so if you have ideas, it's possible you'll find an approach we haven't thought of.
it just excludes them from all considerations about dependency resolution, as if they had become top-level dependencies, which no other dependency of the project relied on
I'm not 100% sure I know what you mean by "top-level dependencies", but I assume you mean "requirements specified by the user, as opposed to requirements introduced from dependency metadata of other packages". If so, then they are no more "excluded from dependency resolution" than any other dependency. The resolver considers and satisfies all requirements - that's its job. Ignoring constraints is a change to the resolver, and it doesn't matter which constraints you're ignoring, because there's no such behaviour anywhere at the moment.
Note that I'm talking about project-level dependencies, not library-level
Can you explain, in terms of the abstractions used in pip's code, what the difference is between "project-level" and "library-level" dependencies? We don't have that distinction in the code, and I have no idea what you mean by it.
dependency constraints signify waaay too many things at the same time
They really don't. If package A depends on B>1.0 then that signifies precisely one thing, that you can't install a version of B that is 1.0 or older alongside A and have a working system. People try to use them to imply other things ("A is only supported with B>1.0", or "A hasn't been tested with B<=1.0 so I have no idea if it works or not", etc), which means that sometimes the user wants to ignore the stated dependencies. But that's a different matter, and means that there's an additional problem of trying to work out a good UI that lets users (who might not even know where a particular constraint came from) express what they want to do, without ignoring more than they mean to. Pip can't help with this, because pip has no way of knowing why any given dependency might be unreliable.
These are the sorts of complexities that any implementation would have to consider, so hopefully this gives you a better idea of why it's not as straightforward as you hoped...
I'm not a packaging veteran by any means, but I don't understand how forcing some dependencies at project level will create a huge and complex problem.
pip does not have a notion of "projects"; you can use it to manage dependencies in a project, but that is only one usage of pip, and the notion cannot be universally applied to pip since it'd affect every kind of pip usage.
This is exactly why my preference is to handle this outside of pip. Overriding dependency information only makes sense at either the user-project or per-requirement level. pip cannot do the former, and the latter (both Xavier's --force and my gerb proposals) looks very cumbersome at the project level because the overrides need to be repeated every time you do something related to the package.
I guess the only good answer for you is likely to be, give it a try and see what you think.
I guess that if experienced pip developers have encountered big stumbling blocks, I'll only reinvent the wheel or fall into the same pitfalls ^^'
I might even come up with a working PR, only to hear "sorry, your new option would make important upcoming feature XYZ undoable".
that's why I'm trying to deal with these two things very separately:
1. Is a feature, theoretically, a good idea?
2. How hard is this feature to implement and maintain, practically?
I know next to nothing about the internals of pip, so for now I wanted to focus on (1), i.e. gather data about whether implementing "override" options like other package managers (yarn, Composer, Gopkg and others) would be beneficial, or would endanger other aspects of the packaging ecosystem. But so far I'm unable to decipher what concerns (1) and what concerns (2) in our discussions. There are strong warnings about some approaches, but I can't yet grasp the rationales behind them.
Summarizing the (1) pros and cons, ad abstracto, of each approach (via command line arguments, via requirements.txt file, via constraints.txt file, via another separate "enforced_packages.txt" file, via separate programs...) would pave the way out of dead-ends and pitfalls. Adding (2) implementation notes, about the current data model, and how much refactoring is needed for this and that, would then help see what the most promising approach is.
Clearly, I'm not thinking that any of all this is easy. I'm just trying to evaluate what was tried, what is a good idea, and what is a bad idea, to have a better view of the situation, and how much work is needed. (This feature would probably make a nice Gsoc-style project, but the internships I sponsor this year are already fully booked on other Foss projects, and so am I)
I'm not 100% sure I know what you mean by "top-level dependencies", but I assume you mean "requirements specified by the user, as opposed to requirements introduced from dependency metadata of other packages". If so, then they are no more "excluded from dependency resolution" than any other dependency.
OK, I assumed there were notions of direct/indirect dependencies, as mentioned in some pip docs, on which the resolver could be tweaked, but then that must be obsolete content.
They really don't. If package A depends on B>1.0 then that signifies precisely one thing, that you can't install a version of B that is older than 1.0 alongside A and have a working system.
That's one semantics for it; alas, in the absence of clear guidelines, I'm not even sure that it's the most common interpretation.
Take for example the problem of "maximum version". If package AA currently works with dependency BB>=1.0, what must this package AA declare as constraint on BB?
- If AA declares BB<2.0, and BB 2.0 ends up being compatible with AA too, the constraint on BB is needlessly tight, and AA will need a release just to fix that (and since not many libraries maintain multiple major versions simultaneously, new incompatibilities might occur).
- And if AA remains lax, and only specifies BB>=1.0, then the release of a backwards-incompatible BB 2.0 might break AA in case of mass-upgrade (though it depends on the precise resolution algo, about which I know nothing).
I guess that in such cases, lib authors do it just how they feel like, for now.
pip does not have a notion of "projects"; you can use it to manage dependencies in a project, but that is only one usage of pip, and the notion cannot be universally applied to pip since it'd affect every kind of pip usage.
I'm not sure what other "pip usages" we're referring to here. In my daily work, I only know 2 cases : either I'm gradually tinkering with new packages in a "global" python environment (until it's completely incoherent); or I'm deploying projects, each in their own virtualenv, each with a well defined set of requirements (all pinned, if possible).
The announcements of the new resolver have made it clear that the first case would not always work. In the same way, I think it'd be understandable if some "dependency overrides" options were only functional in some use cases of pip (that's what other languages do with their package manager, IIRC).
I have run into a particularly egregious example of where the lack of ability to override dependency conflicts becomes a major blocker in your workflow.
- python-dev-tools specifies a version for Sphinx: Sphinx<3.0.0,>=2.4.0
- Another internal package specifies a Sphinx version: Sphinx>=3.0.0
- Sphinx is a documentation tool, it has nothing to do with the daily use of these packages
- Yet, I am unable to install them in the same environment with a modern version of pip
If I was to specify --no-deps and manually specify all requirements myself, I have taken on the responsibility of managing the requirements for my package and all dependent packages, which could be 100+ packages.
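The conflict above can be sketched as an empty intersection of two version ranges. This is an illustration only, not how pip's resolver is written: each requirement is modelled as a half-open interval of version tuples, and the helper names are made up for the example.

```python
# Each requirement is modelled as a half-open interval [low, high) of
# version tuples; high=None stands for "no upper bound".


def intersect(a, b):
    """Intersect two (low, high) ranges of version tuples."""
    low = max(a[0], b[0])
    highs = [h for h in (a[1], b[1]) if h is not None]
    high = min(highs) if highs else None
    return low, high


def is_empty(rng):
    """A range is empty when its lower bound reaches its upper bound."""
    low, high = rng
    return high is not None and low >= high


dev_tools = ((2, 4, 0), (3, 0, 0))  # Sphinx<3.0.0,>=2.4.0
internal = ((3, 0, 0), None)        # Sphinx>=3.0.0

combined = intersect(dev_tools, internal)
print(is_empty(combined))
```

No Sphinx version falls in both ranges, so a resolver that honours both declarations has nothing to pick, regardless of whether Sphinx is ever imported at runtime.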
Sphinx is a documentation tool, it has nothing to do with the daily use of these packages
Do you use either (or both) python-dev-tools or sphinx>=3.0.0 to only build documentation? If that's the case, you likely should have different virtual environments, one for building documentation, one for "daily use", etc. This is why virtual environments exist in the first place (different environments for different things) and what task runner tools like tox and nox are built for.
This is why virtual environments exist in the first place (different environments for different things) and what task runner tools like tox and nox are built for.
It is an incorrect solution to the problem. Venv is a useful tool for testing, but not for solving this problem of version incompatibility: eventually someone will need the newest versions of both incompatible packages simultaneously, and venvs won't help them. Venvs also have a large disk overhead (at least when I tried them, in 2016 if I remember right, on an old PC with an HDD, creating one took a long time and the HDD made a lot of noise in the process). The correct solution is to make the latest versions of everything compatible with each other and just use them. If something is incompatible with the latest version of something else, it is a bug, and bugs must be fixed.
The much less correct one is to allow simultaneous installation of multiple versions of packages, and take measures for deduplication, both on-disk and in-memory.
But maintainers of the software may be either unwilling to accept patches or just too busy to find time to do that. So maintaining one's own fork just to fix a bug in metadata because someone added a < condition is just ridiculous.
The problem, as you see, is more social than technical. It cannot be solved by technical means, but it can sometimes be worked around by them, and the simple case of a < clause is just such a case. Of course, if a maintainer is a really stubborn person who wants to prevent users of his package from upgrading at all costs, he can add runtime checks to his packages, but we don't assume that most maintainers are like that.
Sphinx is a documentation tool, it has nothing to do with the daily use of these packages
Do you use either (or both) python-dev-tools or sphinx>=3.0.0 to only build documentation? If that's the case, you likely should have different virtual environments, one for building documentation, one for "daily use", etc. This is why virtual environments exist in the first place (different environments for different things) and what task runner tools like tox and nox are built for.
Sphinx is only used to build documentation, however, the packages python-dev-tools and the internal package (both maintained by people other than myself), specify different versions of the dependency they use to build documentation. So they cannot be installed in the same environment with the latest pip, regardless of whether it is a virtual environment or not, and regardless of whether or not I am building the documentation for the packages. Their requirements conflict, but only on the (in this case irrelevant) dependency required for building documentation.
Related topic for pipenv. I like what is said in pypa/pipenv#4530 (comment). This is essentially also why we cannot really use external lock files in Nixpkgs; if the tooling that generated them can't override a version, it wouldn't allow us to deal with security updates.
If you have a fully locked environment, run pip with --no-deps.
Just re-upping this issue. It's been years.
Another use case you might be interested in. The latest version of apache beam requires numpy version < 1.22, however, I have packages requiring features of numpy 1.22.1 that also require apache beam. For my use case, apache beam works just fine with numpy 1.22.1, however the requirements of this library explicitly specify numpy<1.22.
When apache beam deploys to something like Dataflow, you don't have control over the pip operation that executes on each worker, so hacking this with custom pip commands and options is not an option. The numpy version conflict causes the pipeline to fail when I know it would work fine if pip was to install the packages anyway.
I don't have many good options here with pip failing on a version conflict. I can contact the apache beam developers, however, then I will have to wait for another release IF they decide to relax the numpy requirement, though this issue is blocking work today.
I think the core issue here is that the latest version of pip completely fails all package installation when one conflict is unresolved, rather than creating a best-attempt environment. The only solution I can fathom in this scenario is to change the docker image used in Dataflow to use an older version of pip.
Furthermore, the version backtracking in later versions of pip increases the Dataflow pipeline setup time, which amounts to real $$$ spent.
Another use case you might be interested in. The latest version of apache beam requires numpy version < 1.22, however, I have packages requiring features of numpy 1.22.1 that also require apache beam. For my use case, apache beam works just fine with numpy 1.22.1, however the requirements of this library explicitly specify numpy<1.22.
If the Apache Beam maintainers are explicitly saying that it only works with numpy<1.22, but Apache Beam users disagree, that sounds like a conversation that needs to happen between those maintainers and the users. I don't think pip is at fault here: it's doing exactly what the Apache Beam maintainers have asked it to do.
When apache beam deploys to something like Dataflow, you don't have control over the pip operation that executes on each worker, so hacking this with custom pip commands and options is not an option. The numpy version conflict causes the pipeline to fail when I know it would work fine if pip was to install the packages anyway.
This sounds like Dataflow needs a feature to skip resolution entirely and install a fully-specified set of requirements. There are lots of other reasons this would be desirable as well: installation would be faster, it would allow for hash checking, etc.
I think the core issue here is that the latest version of pip completely fails all package installation when one conflict is unresolved, rather than creating a best-attempt environment.
Everything is a tradeoff. The fact that pip's resolver used to create a best-attempt environment was an incredible source of issues and confusion for many years (see #988). Ultimately the community decided that it was better to have a resolver that correctly resolves dependency specifications than it was to have a resolver that makes a best attempt -- the sheer amount of time, resources and energy that went into this change should be an indicator of which is generally preferable.
That doesn't mean it's perfect, and it doesn't mean that any issues experienced as a result (like yours) aren't valid, but it does mean that overall pip's current behavior is better for the Python ecosystem, that pip users have fewer issues overall, and that we have more options for resolving issues that do arise.
I don't think time and effort going into a change is a completely reliable indicator of how useful or effective or preferable that change is, though I do appreciate that many people worked hard on this. I think the issues and confusion over the past few years on best-attempt environments were likely raised by a vocal minority of pip users (there is no good survey around this, so I'm guessing, but there are a LOT of pip users!), while for many many other pip users, pip was working just fine, hence no need to raise issues or have a discussion, and so they weren't part of this discussion earlier on.
I agree that there should be a discussion around Apache Beam loosening their requirements. But pip's current draconian approach means that work that uses the latest pip version is completely blocked until this discussion is resolved, and these discussions can take a long time, particularly when they are between organizations. I imagine overall that this would slow down and push back release dates for products using the latest version of pip worldwide. Of course there is always the question of tech debt, but it should be on the user to diligently follow up and resolve version issues on the timeline that they can be resolved, rather than on the tool to enforce them, and the user to wait, blocked, until they are resolved.
Accepting explicit version overrides for conflicts would solve the issue. It would be a clear indication of "I know there is a conflict of dependencies, but I tested this combination for my application and it works fine, so pip, please do as I say."
There is a difference between "correct" way of solving it in the upstream and "we need to meet deadlines" kind of things.
Accepting explicit version overrides for conflicts would solve the issue. It would be a clear indication of "I know there is a conflict of dependencies, but I tested this combination for my application and it works fine, so pip, please do as I say."
That's exactly what --no-deps is for, which was already mentioned in #8076 (comment).
The issue there is that many cloud systems run pip for you, so you don't have interactive access to pip commands. I think that's reasonable that the user specifies dependencies, and the cloud deployment system handles installing them. It's nice to have the requirements and the installation of those requirements separated.
There are cases where a specific package may require a custom version override of a dependency. In this case it makes more sense to override this via the specifications in the requirements, rather than changing the commands / system used to install those requirements.
I suspect that is what @RafalSkolasinski may be referring to above. Accepting the explicit version specification in the requirements, rather than running a posthoc pip command with the --no-deps flag.
The other option (which at this stage I'm not particularly fond of, but open to debate) is that all the cloud deployment systems that use pip start exposing pip parameters and custom pip commands through their interface.
The issue there is that many cloud systems run pip for you, so you don't have interactive access to pip commands.
So pip is designed as a command line utility, where control is exercised via program arguments. If cloud systems expect you to use pip in a way that prohibits supplying command line arguments, then it's on them for not giving you access to all of the necessary pip features, not on pip for not giving you a way to do what you want.
By the way, nearly all pip options can be set via environment variables, so if the cloud provider lets you set environment variables before running pip, maybe you could work around their limitations by using that option.
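As a rough sketch of that workaround: pip's documentation describes environment variables of the form `PIP_<UPPER_LONG_NAME>`, with dashes replaced by underscores. The helper name below is made up, the value "1" is assumed to enable a boolean flag, and the command is only what a platform is assumed to run on your behalf (it is built here, not executed).

```python
import os
import sys


def pip_env_name(long_option):
    """Map a long pip option such as '--no-deps' to its environment variable name."""
    return "PIP_" + long_option.lstrip("-").replace("-", "_").upper()


# Environment that would be handed to the deployment system before it runs pip.
env = dict(os.environ)
env[pip_env_name("--no-deps")] = "1"

# The command the platform is assumed to run itself; shown, not executed here.
command = [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"]
print(pip_env_name("--no-deps"), command[1:])
```

Whether a given cloud service lets you set the environment of the process that runs pip is something to check per provider.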
It's possible to consider a new use case "pip is used in a limited environment that only allows X, Y and Z to be supplied". That would be a new feature that may or may not be reasonable to add to pip's existing set of features. But it would be important in that case to precisely define what features X, Y and Z are allowed, and confirm that the list is common across all cloud providers (we're not going to add features to support each individual cloud provider separately). So far, though, no-one has explained precisely what "someone using pip via a cloud provider" is allowed access to, so we're all guessing here.
Would it help, for instance, if --no-deps were allowed in requirements files? Because #9948 is tracking that.
I suspect that is what @RafalSkolasinski may be referring to above. Accepting the explicit version specification in the requirements, rather than running a posthoc pip command with the --no-deps flag.
Yup, exactly this.
As far as I understand it, --no-deps means "Don't install package dependencies." and that's probably not quite what we are after. If the package with conflicting dependencies has 20 dependencies in total and only a few are in conflict, we'd still want to leave resolution of most of them to pip, only providing a resolution where it is needed.
Using --no-deps would require me to specify all 20 dependencies manually and that's neither convenient nor desired.
Would it help, for instance, if --no-deps were allowed in requirements files? Because #9948 is tracking that.
IMO, what would help would be an option to flag a dependency on package X as forced. If, for example, packages A and B have conflicting dependencies on X, I could then specify the version of X, mark it as forced, and have pip roll out everything as I want it.
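A sketch of what such a "forced" marker could mean, purely as an illustration: pip has no such option today, and the data layout, the package names A, B, X and the function name are all hypothetical. The idea is that a forced entry replaces whatever any package declares for that name, while every other declared requirement is left for the resolver to handle as usual.

```python
# Declared requirements, keyed by the package that declares them.
# Package names and specifiers are illustrative placeholders.
declared = {
    "A": {"X": "<2.0"},
    "B": {"X": ">=2.0"},
    "app": {"A": "", "B": ""},
}

# Hypothetical user-supplied overrides: "whatever anyone says about X, use this".
forced = {"X": "==2.1"}


def apply_forced(declared, forced):
    """Return a copy of `declared` where forced names replace declared specifiers."""
    result = {}
    for package, requirements in declared.items():
        result[package] = {
            name: forced.get(name, spec) for name, spec in requirements.items()
        }
    return result


effective = apply_forced(declared, forced)
print(effective["A"]["X"], effective["B"]["X"])
```

Only the entries for X change; the requirements that `app` places on A and B stay as declared, which is the difference from `--no-deps`.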
Using --no-deps would require me to specify all 20 dependencies manually and that's neither convenient nor desired.
Furthermore, this would require you to not only specify the 20 dependencies, but take on responsibility for updating them and maintaining them yourself. When it may only be one dependency that is the issue.
By the way, nearly all pip options can be set via environment variables, so if the cloud provider lets you set environment variables before running pip, maybe you could work around their limitations by using that option.
^ This is great!
So far, though, no-one has explained precisely what "someone using pip via a cloud provider" is allowed access to, so we're all guessing here.
Here is airflow's example.
Dataflow installs requirements directly from a setup.py or requirements.txt file. I'm not sure where the use of pip is buried in the Apache Beam or Dataflow services code, though; it would take some digging.
I'm sure there are many other cloud services that also use pip. These are the ones I've been engaged with most recently.
I think I am with the pip maintainers here. Being able to run a single pip install with conflicting dependencies is not something pip should ever do. The resolver is not perfect due to problem space complexity (and it's less and less so with each new version of pip), but if Apache Beam has a strict requirement for numpy, then, well, there is no reason it should not be followed. And if you need it relaxed, you should discuss it with Apache Beam, not pip. They likely have a good reason for that.
And if you are really sure you can relax those requirements, you have other options. For example, you can fork apache-beam in your own repo, write a script to automatically update the limits it has in setup.py, and release a private apache-beam-relaxed-numpy package and use that one. No problem with that: the Apache Beam licence allows it and you can publish it, either in your private repo or even on PyPI. There are no legal/technical limits for that.
The airflow example is also very customizable for you without having a custom package. Python Virtualenv is literally wrapping your python code, so you can not only provide your own requirement file to start with, you can also run a pip install --no-deps command inside your Python task. There is totally no problem with that and no need to add functionality to pip for that.
And if you need it relaxed, you should discuss it with Apache Beam, not pip. They likely have a good reason for that.
It's good practice in many packages to include a maximum version limit in your python dependencies so that versions don't change out from underneath you when a breaking change occurs that may break your package. It is then on the package maintainers to test the latest version and update these dependency version requirements in their package on their schedule, as their dependencies update (in-turn on the dependency maintainer's schedule). Unfortunately, package maintainers / developers always have limited resources, and their own release schedules. It may be a very high priority for a particular package user, but has to be weighed against other priorities, and may take some time to be updated.
A very easy solution for that user would be to temporarily override the dependency requirements via pip, take on any testing / risk for that dependency in the mean-time, and then revert it once the dependency has updated. If this functionality was available in pip it would be a lot less overhead than forking the package, changing the one dependency, taking on the entirety of package maintenance / updates themselves, until the maintainers have updated the package on their schedule.
At this stage, I don't see why pip wouldn't provide this functionality to override requirement specs, as long as it is explicit enough for users to understand that in doing so they are overriding the recommendations of their dependency. Other languages' package managers provide avenues to overcome dependency version conflicts. For example, npm allows multiple versions to be installed simultaneously, while gradle allows you to explicitly ignore transitive dependencies.
so you can not only provide your own requirement file to start with, you can also run pip install --no-deps command inside your Python task
You can provide your own requirements file, but recent pip versions will fail to install the packages unless all package dependency requirements can agree. Running custom pip install commands inside a python script running in that environment that pip is modifying, is a bit of cruft I would prefer to avoid.
It's good practice in many packages to include a maximum version limit in your python dependencies so that versions don't change out from underneath you when a breaking change occurs that may break your package.
Actually this is a good practice only for applications. For libraries (like apache-beam) the general consensus is that they should not be upper-bounded unless there is a good reason. There are even some solutions (like the poetry package manager) built around that premise. It's up to the package maintainers to decide on the limits.
Running custom pip install commands inside a python script running in that environment that pip is modifying, is a bit of cruft I would prefer to avoid.
Well, you will have to run a custom command anyway if you want a new switch to be added. What's the difference between adding a switch to an existing command and adding a new command? You said yourself this is useful with "experimenting" only. Being able to run custom commands with --no-deps gives you much more control for the experimentation because you can literally install any packages in any combination. I do not see how adding your feature to pip would help with that.
One more comment (after a bit of thinking about it): ultimately, it is your decision, as the person who wants to install conflicting dependencies, to break the limits that were designed by the package maintainers and resolved by pip.
If you want to follow this path, you should do it consciously and bear the burden of added complexity. I personally think pip should not even make it easier, precisely to make sure that this is done only when you really know what you are doing. Yes, it should be possible (and it is, with --no-deps), but it should not be easy. There should be a significant barrier to overcome if you want to do it.
With the exception of popular, actively maintained projects, my experience has been that package authors do not define dependency version ranges with the level of care pip maintainers assume. The recursive nature of transitive dependencies and the velocity of updates throughout the package ecosystem makes it challenging for even fastidious maintainers to set timely and ideal requirements ranges.
In nearly every case where we have run into conflicting transitive dependencies, we could find a non-conflicting version of the dependency that's compatible with both packages. However, it's an enormous struggle to get these fixes accepted into package repos. We've submitted issues and PRs, and it typically takes months before they get any attention from maintainers, if ever. We've tried forking repos, but each one amounts to a semi permanent maintenance burden on our team.
All this extra work accumulates to a significant amount of developer time addressing these issues. Unless we expect the rate of change of Python packages to slow down dramatically, this will result in an endless game of Whac-a-Mole for anyone with a sizable set of requirements. The suggestion that this is somehow good for the Python ecosystem seems to willfully ignore obvious practical difficulties teams face building Python software.
I simply don't understand PyPA's insistence on this point, since they certainly haven't presented any evidence that it's having a positive impact, while there's an abundance of evidence that it's materially hurting developers.
PyPA's position against dependency resolution overrides is harmful and contrary to the Python community's long held philosophy that "we are all consenting adults." The only reasonable way to address this complex problem is to give us basic tools to control how things work in our own systems.
I ask in advance for forgiveness for the frustration, but the PyPA's insistence on this point is just crazy.
Have they ever seen any other package manager for any other language?
If you work for a web agency with hundreds or thousands of web projects, you will be able to understand my current state of mind about this :(
Edited: I see that there is no "PyPA's insistence" on this, sorry.
the PyPA's insistence on this point is just crazy
Please be a little more moderate in your tone. I appreciate you might be frustrated, but so are we by people repeatedly suggesting that we haven't thought this through.
And if you look back through this thread, you'll see that the pip maintainers have not been insisting that we won't solve this issue, but we have been trying to get clarity from people as to what they expect (with limited success - most people don't really get beyond "I need to install X and I wish pip would stop telling me it's incompatible", which is a great starting point, but not enough to build an implementation from).
Have they ever seen any other package manager for any other language?
To be honest, my personal familiarity with other languages' packaging ecosystems is limited. But in what I have seen, I'm not aware of any mechanism for doing what you say. Can you give me an example of one, and suggest how their design would translate to Python?
Some starter things to consider (and explain how other language ecosystems handle them):
- How do you even find out what to override? We see pip invocations that involve hundreds of packages, with incredibly complex dependency trees. We try to give informative errors, but they don't necessarily tell the full picture. For example, pip may refuse to install X because A and B depend on different versions of X. The user might test and confirm that both A and B work with a particular version of X, and want to force that version - but there may be another package C, deep in the dependency tree, which doesn't work with this version of X. And the user has no way of knowing this, because they've just told pip to ignore constraints on X...
- If someone installs a broken combination of packages by overriding a dependency, how do we handle later installs that add to that environment, or attempt to upgrade something?
- How do we support people with such broken environments (they often won't remember how they got into the state they are in)?
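The first question above can be illustrated with a small sketch. The package names A, B, C and the version numbers are invented, and the constraints are given pre-parsed so that the example stays short; the point is only that forcing a version of X silences every declared constraint on it, including ones the user never looked at.

```python
import operator

# Constraints on X declared by packages in the tree, already parsed into
# (comparison, version-tuple) pairs. Names and versions are made up.
constraints_on_x = {
    "A": (operator.lt, (2, 0)),
    "B": (operator.ge, (2, 0)),
    "C": (operator.lt, (2, 1)),  # deep in the tree, easy to overlook
}


def violated_by(forced_version, constraints):
    """List the packages whose declared constraint the forced version breaks."""
    return sorted(
        name
        for name, (compare, bound) in constraints.items()
        if not compare(forced_version, bound)
    )


print(violated_by((2, 1), constraints_on_x))
```

A user who tested A and B against X 2.1 would still be overriding C's constraint without having checked it, so any override design probably needs to report every constraint it is ignoring, not just the ones named in the error message.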
We're not insisting that we won't implement a solution, it's just that no-one has yet offered a design (much less an implementation) that solves the problem. And yes, I'll be completely honest, we're not spending a lot of time on trying to address this ourselves - we have plenty of other areas that need work as well, and not much resource to work on them.
BTW I have written some code (as was proposed by @uranusjr; unfortunately I haven't pushed it to GH yet since it is unfinished) that just patches wheels to remove the < constraints. But it doesn't make sense if we cannot patch the archive in place (we surely don't want to repack hundreds of MiBs of archive contents just to patch 1 small file). https://github.com/KOLANICH-libs/libzip.py can be a solution: it is ctypes-based bindings to ziplib, which seems to support patching archives in place. We can probably use it to patch archives in place right now, but I think we should write a drop-in replacement for ZipArchive using this lib as the actual engine dealing with the archive. When it is finished I guess I can integrate it into the tool.
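The metadata side of such a patch (as opposed to the in-place archive rewriting) might look roughly like the sketch below. This is not the unpublished tool mentioned above; the function name is hypothetical, and it deliberately refuses anything with environment markers, extras or parentheses rather than guessing how to handle them.

```python
import re


def drop_upper_bounds(requirement):
    """Remove '<' and '<=' clauses from a simple requirement string."""
    if any(ch in requirement for ch in ";()[]"):
        raise ValueError("markers, extras and parentheses are not handled here")
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)", requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    name, spec = match.group(1), match.group(2)
    kept = [
        clause.strip()
        for clause in spec.split(",")
        if clause.strip() and not clause.strip().startswith("<")
    ]
    return name + ",".join(kept)


print(drop_upper_bounds("Sphinx<3.0.0,>=2.4.0"))
print(drop_upper_bounds("numpy<1.22"))
```

Lower bounds and exact pins are left alone, so a pin such as `wrapt==1.11.*` would pass through unchanged.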
Yet another idea is a setuptools plugin. Setuptools plugins are called by both setuptools and poetry. In the case of using PEP 621 I wonder if they can mess with the stuff parsed from there, so maybe just installing such a plugin when building a wheel can partially solve the issue in another place.
@pfmoore I think https://yarnpkg.com is a good starting point.
I think it's great to have the opportunity to count on different development cultures to solve complex problems, without having to find the solution yourself.
I could provide some examples of conflict resolution using yarn, if you think that could be helpful and if you think you can spend some time on this.
Anyway, if you have already decided that pip's current behavior is the correct one, I promise that I will try to better manage my frustration arising from this issue
Anyway, if you have already decided that pip's current behavior is the correct one, I promise that I will try to better manage my frustration arising from this issue
I repeat - we haven't decided any such thing, but it will need someone to do the work (i.e., manage the discussion and submit a PR) to move things forward.
if you think you can spend some time on this.
Nope, I have no personal interest in doing anything about this. I'd be willing to consider reviewing a PR should someone produce one, but that's all.
Ok... it's worth noting that this issue is one of the most commented issues in the pip repo, so I think it should have a higher priority... but that is just an opinion
Thanks for your valuable work and thanks for listening.
PyPA's position against dependency resolution overrides is harmful and contrary to the Python community's long held philosophy that "we are all consenting adults."
- There is no such thing as a "PyPA position" on this (the PyPA is a volunteer group and doesn't really function like that).
- No one has stated that this is a bad idea and that we should not do this.
Beyond that, if folks bother to click "show hidden items" they'll notice that basically every pip maintainer has been involved in this discussion and contributed to the discussion in the direction of figuring out how to do this -- the problem isn't that people do not want to do this. Rather, there are a few big open design questions here and no one has stepped up to do the work of figuring out the answers for them. No maintainer currently has the bandwidth to drive this. See also #8076 (comment).
@pfmoore's comment from a few hours ago is an appropriate summary of the state of affairs -- #8076 (comment). For the folks interested in seeing this happen, consider contributing to the discussion toward figuring out how to do this -- instead of writing comments about the "PyPA's position" or "PyPA's insistence".
I'll reiterate that Yarn's approach and NPM's approach have been mentioned in the discussion already.
I didn't realize that pip was run entirely by volunteers.
Thank you very much for your great work.
I think we can continue using some workarounds for "legacy" projects with unmaintained or poorly maintained dependencies (e.g. forking them, creating pull requests, …). Thanks again.
For the folks interested in seeing this happen, consider contributing to the discussion toward figuring out how to do this
Would any PyPA devs be willing to distill/catalog the suggestions they've received so far along with problems/challenges to implementing them (e.g. in a separate structured document)? I realize this is more work for you, but it's difficult to pick up on all these specific important bits scattered throughout the long and overlapping conversations on this topic. I'm not sure exactly where you're stuck in this process or why, and I think it would be a huge help to everyone who wants to contribute to be able to understand this better.
The problem we repeatedly encounter is that a package declares a dependency range we find unnecessarily strict, either for our application only or in general, and we want to override it. For example, I want to be able to define something equivalent to e.g. "snowflake-connector-python is behind on the max cryptography version, we need latest security update now and we know it works fine. pip should pretend that snowflake-connector-python has defined cryptography>=3.1.0,<38.0.0 instead of cryptography>=3.1.0,<37.0.0." I could see this being implemented in a number of different ways.
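To make the request concrete, here is a minimal sketch of what "pretend the package declared something else" could mean. It is plain Python and not pip's behaviour or API: the `DECLARED` and `OVERRIDES` tables and both helper functions are hypothetical names invented for illustration, and the requirement strings are the ones from the example above.

```python
# Hypothetical sketch: none of these names exist in pip.
# Declared dependencies as the package publishes them (from the example above).
DECLARED = {
    "snowflake-connector-python": ["cryptography>=3.1.0,<37.0.0"],
}

# User-supplied overrides: (package, dependency name) -> replacement requirement.
OVERRIDES = {
    ("snowflake-connector-python", "cryptography"): "cryptography>=3.1.0,<38.0.0",
}


def dependency_name(requirement: str) -> str:
    """Return the project name at the start of a simple requirement string."""
    for i, ch in enumerate(requirement):
        if ch in "<>=!~ ;[":
            return requirement[:i]
    return requirement


def effective_requirements(package: str) -> list[str]:
    """Declared requirements for `package`, with any matching override swapped in."""
    return [
        OVERRIDES.get((package, dependency_name(req)), req)
        for req in DECLARED.get(package, [])
    ]


print(effective_requirements("snowflake-connector-python"))
```

The point of keying the override on the (package, dependency) pair is that only the one declaration I disagree with is replaced; everything else the package declares is left alone.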
Would any PyPA devs be willing to distill/catalog the suggestions they've received so far along with problems/challenges to implementing them (e.g. in a separate structured document)?
Why does it need to be a PyPA dev? The discussion is all here in this thread, so anyone interested could read it and provide a summary. There's nothing that's happened in private. (There may be some older related issues that would be worth reviewing, but I can assure you my memory isn't good enough to find them, and my ability to search github issues is notoriously bad...)
For example, I want to be able to define something equivalent to e.g. "snowflake-connector-python is behind on the max cryptography version, we need latest security update now and we know it works fine.
I'm going to reiterate some of the comments made previously, not so much as a matter of "you should do this", but to explore why people don't use the various options that currently exist, as a way of understanding the design constraints better.
- Why not just wait for an update to snowflake-connector-python? In the case you describe, it's a security issue, which is something of a special case - are there any other cases where it's justified to rush a solution?
- Why isn't it sufficient to ask the snowflake-connector-python maintainers to update their constraints? Again, if it's a security problem, then either they will be responsive to the request, or you have other questions to ask about how appropriate it is to use them in a security-sensitive project. Are projects really so bad at responding to requests like this that it's a common issue?
- How can you be sure you're right when you say breaking the dependency is safe? You say you've tested, but what if the downstream maintainers are aware of a subtlety that you're not? You might be introducing vulnerabilities while trying to remove them - what would you do if that happened? Revert the change and stick with the published dependencies? If that's a viable option in that case, why isn't it a viable option now?
- What's wrong with installing snowflake-connector-python, then replacing cryptography by installing the version you want? Pip will complain about a broken environment, but will still do the install.
- Assuming we've established that it's urgent to get a fix, and it's not possible to wait for downstream, then why can't you build a local wheel of snowflake-connector-python, with no change apart from the dependency metadata, and use that? If the concern is that it's too fiddly, would having a tool that automated the process help, or would the existence of such a tool expose the fact that "it's too fiddly" is actually a proxy for some other constraint which you're not sharing?
- Assuming you get the dependency overridden, is this something you'll just do once, or is it something you'll need to incorporate into your build processes and automations? Assuming you need to incorporate it into your workflow, what happens when snowflake-connector-python releases a new version with fixed dependencies? How will you notice that? Will you need to change all of your workflows back to remove the override?
That's just a short list of the questions that immediately occur to me when looking at this feature request. While you may well say "none of these options work for me", and I'm fine with that, if you can't articulate clearly why they don't work for you, how can we design another approach without risking the possibility that it would also not work for you?
There's plenty of other design questions that need answering, but these start by helping to understand why we need a new feature at all, and why existing options aren't sufficient.
And in case anyone is wondering why I'm pushing back so hard, I'm really not. In the sort of situations where I use pip, all of the workarounds I've suggested above would be fine for me. I may be a pip maintainer, but I don't have experience of all the myriad ways people use pip - so I have to go with what I do know, and ask questions where that knowledge isn't enough. I think a lot of the frustration people express here comes from a sense of "why don't the pip maintainers understand why this is a problem" - and that's the answer, it often isn't a problem to us, and we don't understand the aspects of your processes that make it such a problem for you. So please help us understand, rather than assuming we're being difficult because we "don't think this is an important issue", or "don't care about what users need"...
Would any PyPA devs be willing to distill/catalog the suggestions they've received so far along with problems/challenges.
@willsthompson Just a comment on this thread (it's a fascinating read). I think this is actually most of the work to be done. Once someone reviews, catalogues, and summarises the problems, proposals and design questions, distilling them from all the discussions, and gets people to understand them and involves others, the actual decision-making process and implementation will be easy.
And this can be done by anyone. It does not have to be a PyPA developer. This is the nature of an open-source project like this, run by volunteers. If you badly want to make things happen, volunteer and do it. So if you really want this to happen, roll your sleeves up, create a structured document, invite others to collaborate, and lead it to a successful agreement. This is the bulk of the work to be done here. There is absolutely no reason PyPA developers and maintainers should be responsible for that. They have the power to vote and make the final decisions, but that does not mean that you cannot do the bulk of the work organising it. You seem to have pretty good ideas about how to do it, so why don't you? In open-source, voluntary projects, "talking the talk" at most adds value to the discussion, but "walking the walk" yourself is the only way to actually move things forward.
Why does it need to be a PyPA dev? The discussion is all here in this thread, so anyone interested could read it and provide a summary.
It doesn't have to be, but no one has a better understanding of the high-level design principles from which I assume PyPA devs are drawing their objections, so you're simply in the better position to summarize accurately and clearly. I've also read these discussions and similar ones in poetry and pip-tools, and I get the sense there are more fundamental differences of opinion over what's good for the Python ecosystem. But I'm happy to be wrong about that!
- Why not just wait for an update to snowflake-connector-python? ... it's a security issue, which is something of a special case
Security is a special case, but it's a very common special case. The most common general case is simply any conflicting transitive dependencies, e.g. these cases:
1. A transitive dependency conflicts with a top-level requirement, e.g. aws-data-wrangler requires pyarrow=7.0, but I require pyarrow directly and want to specify 8.0 to pick up a bugfix made after 7.
2. Two requirements have conflicting transitive dependencies, e.g. snowflake-connector-python requires pyarrow<6.1, but aws-data-wrangler requires pyarrow=7.0.
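A small sketch of why both cases leave a strict resolver with nothing to install. This is a deliberate simplification, not how pip's resolver is implemented: versions are plain tuples, constraints are predicates, and the candidate list is made up for illustration rather than taken from PyPI.

```python
# Simplified sketch: versions are tuples, constraints are predicates.
# The candidate list is invented for illustration.
CANDIDATES = [(6, 0), (7, 0), (8, 0)]


def allowed(constraints, candidates=CANDIDATES):
    """Return the candidates that satisfy every constraint."""
    return [v for v in candidates if all(ok(v) for ok in constraints)]


# Case 1: a transitive pin (pyarrow == 7.0) vs. my direct requirement (pyarrow == 8.0).
case_1 = allowed([lambda v: v == (7, 0), lambda v: v == (8, 0)])

# Case 2: two packages disagree (pyarrow < 6.1 vs. pyarrow == 7.0).
case_2 = allowed([lambda v: v < (6, 1), lambda v: v == (7, 0)])

# Dropping the transitive pin in case 1 is what an override would amount to.
case_1_overridden = allowed([lambda v: v == (8, 0)])

print(case_1, case_2, case_1_overridden)
```

In both cases the set of acceptable versions is empty until one of the constraints is relaxed or removed, which is exactly the decision I want to be able to record explicitly.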
- Why isn't it sufficient to ask the snowflake-connector-python maintainers to update their constraints? ... Are projects really so bad at responding to requests like this that it's a common issue?
In my experience, yes. Security issues tend to get more priority, but even (or maybe especially) very active projects have internal release processes and priorities they're juggling, and months is generally a good turnaround time. For less active projects, you're lucky to get a response in months or at all. I don't know why anyone would find this surprising. It seems perfectly normal and expected for busy dev teams, but also unacceptable as a primary solution to this problem.
- How can you be sure you're right when you say breaking the dependency is safe?
In the same way that the authors can: we perform automated and manual testing. Obviously I can get this wrong, but so can they! Just because they're probably right doesn't mean they're always right, and if they're not always right and I know it, shouldn't I have some recourse to resolve it myself without maintaining forks or waiting indefinitely?
You might be introducing vulnerabilities while trying to remove them...
Yes, there is always a general risk of breaking things, but in this example we have already acknowledged that things are broken now, prompting the need for an override. Could I break them more? Yes, but I can do this with a fork as well, which pip docs suggest. It's just much more cumbersome than official pip support [1].
... what would you do if that happened? Revert the change and stick with the published dependencies? If that's a viable option in that case, why isn't it a viable option now?
That's not typically how things work. I rush a patch to fix a bug, introducing a new bug... I don't revert because I still need that first patch, I patch the patch. Another way to look at it is that even if I do introduce a new vulnerability, at least it's unknown vs. the known one I'm patching. This is the degenerate case, but it's still better than an unpatched known vulnerability.
- What's wrong with installing snowflake-connector-python, then replacing cryptography by installing the version you want?
This works for case 1 but not case 2 (from above), and addressing this problem "out of band" makes it harder to track the reason for the override, and I would guess only gets worse as the number of these increases. An official override option would help users make the most minimal override needed, while also helping them remove it when it's no longer necessary.
- ... why can't you build a local wheel of snowflake-connector-python ... If the concern is that it's too fiddly ...
Yes, that's too fiddly to the extent that it requires manual steps, and I'm not sure how this would work with CI. But while I would prefer a pip-native solution, a tool to automate otherwise fiddly solutions would still be welcome. Our main objective is to unblock our development and release process when new conflicts are introduced by upgrading packages. We have already considered many tools (pip-tools, poetry, dephell), but all have the same problem of strict dependency resolution requirements. Integrating a new tool into our build seems fine to me if the end result is a centralized solution for overrides.
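For reference, the "local wheel with nothing changed apart from the dependency metadata" workaround could be automated roughly like this. It is a sketch under stated assumptions, not a finished tool: `record_line` and `relax_wheel` are names made up here, the wheel is assumed to follow the standard layout with `*.dist-info/METADATA` and `*.dist-info/RECORD`, the requirement text must match the `Requires-Dist` line exactly, and signed RECORD files are not handled.

```python
import base64
import hashlib
import zipfile


def record_line(name: str, data: bytes) -> str:
    """Build a RECORD entry: path, urlsafe-base64 sha256 without padding, size."""
    digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest()).rstrip(b"=")
    return f"{name},sha256={digest.decode()},{len(data)}"


def relax_wheel(src: str, dst: str, old_req: str, new_req: str) -> None:
    """Copy wheel `src` to `dst`, replacing one Requires-Dist line in METADATA."""
    with zipfile.ZipFile(src) as zin:
        infos = zin.infolist()
        files = {info.filename: zin.read(info) for info in infos}

    meta_name = next(n for n in files if n.endswith(".dist-info/METADATA"))
    record_name = next(n for n in files if n.endswith(".dist-info/RECORD"))

    old_line = f"Requires-Dist: {old_req}".encode()
    new_line = f"Requires-Dist: {new_req}".encode()
    if old_line not in files[meta_name]:
        raise ValueError(f"{old_req!r} not found in {meta_name}")
    files[meta_name] = files[meta_name].replace(old_line, new_line)

    # RECORD stores a hash of METADATA, so that entry has to be refreshed too.
    lines = [
        record_line(meta_name, files[meta_name])
        if line.startswith(meta_name + ",")
        else line
        for line in files[record_name].decode().splitlines()
    ]
    files[record_name] = ("\n".join(lines) + "\n").encode()

    # Reusing the original ZipInfo objects keeps member order and file attributes.
    with zipfile.ZipFile(dst, "w") as zout:
        for info in infos:
            zout.writestr(info, files[info.filename])
```

The wheel file name does not encode dependencies, so the patched copy can keep its name as long as it is written to a different directory and installed from that path. Like the tool mentioned in this thread, this copies the whole archive rather than patching it in place.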
- Assuming you get the dependency overridden, is this something you'll just do once, or is it something you'll need to incorporate into your build processes and automations?
We would need to incorporate it anywhere we build the app.
... what happens when snowflake-connector-python releases a new version with fixed dependencies?
I think an ideal override option would give you a signal that your override is obsolete - and/or that the overridden dependency has changed in any way. Alternatively, you could pin the overrides to the top-level req versions. e.g. I say: For snowflake-connector-python versions less than or equal to 2.25, override with pyarrow=7.0. Less precise, but still useful.
Will you need to change all of your workflows back to remove the override?
No more than when I need to change all my workflows to change or add a requirement. Presumably overrides, like requirements, would be piped in from a config document in version control, and the workflows would only need to be updated once, to accommodate the override config.
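To illustrate what I mean by a version-scoped override that can signal when it has gone stale, here is a rough sketch. Everything in it is hypothetical: pip has no such config format, the field names are invented, and the package versions and requirement strings are just the ones used as examples earlier in this comment.

```python
# Hypothetical override entries, e.g. loaded from a file kept in version control.
# Field names are invented for this sketch; pip has no such format.
OVERRIDES = [
    {
        "package": "snowflake-connector-python",
        "up_to": (2, 25),           # apply only to versions <= 2.25
        "replaces": "pyarrow<6.1",  # the declared requirement we expect to see
        "with": "pyarrow==7.0",
    },
]


def check_override(rule, version, declared):
    """Classify one rule against the version and requirements being installed."""
    if version > rule["up_to"]:
        return "out of scope: package is newer than the override covers"
    if rule["replaces"] not in declared:
        return "obsolete: upstream requirement changed, review the override"
    return "active: using " + rule["with"]


rule = OVERRIDES[0]
print(check_override(rule, (2, 24), ["pyarrow<6.1"]))
print(check_override(rule, (2, 25), ["pyarrow<8.1"]))
print(check_override(rule, (2, 26), ["pyarrow<8.1"]))
```

Recording the requirement being replaced, and not only the replacement, is what makes the "obsolete" signal possible: as soon as upstream changes its declaration, the rule no longer matches and can be flagged for removal.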
If you want to drill down into anything else, lmk. Also, this article provides a very thorough account of the types of problems we run into building python apps, along with detailed examples. Much better than I have done here, so I highly recommend reading it for a better understanding: https://iscinumpy.dev/post/bound-version-constraints/
Footnotes
1. I had an idea for a tool that would perform overrides by hooking into your Github account, automatically creating forks for every override, patch requirements to point to the fork, then continue with your build. This is obviously absurd, but it could exist, and I think it demonstrates that a less onerous method to perform a practically equivalent solution is reasonable.
Here is my tool for unpinning dependencies: https://github.com/KOLANICH-tools/unpin.py
It currently has a flaw: it copies the archive, because the underlying library dealing with zip does so. The lib author has refused to do anything about it because it contradicts his vision of that lib. IDK if he is willing to accept a PR about that, but I guess my next try can be creating similar bindings, but for miniz.
Is there anything else you would like to know?
Not really, no.
Please let me know how I can help move this conversation forward.
You could summarise, as I suggested. You responded to say that "you're simply in the better position to summarize accurately and clearly", but that doesn't mean it has to be a pip developer. And it's pretty obvious by now that no pip developer wants to spend the time doing this (I know I don't). So unless you want to do that summary, there's not much you can do except wait for someone else to put in the effort.
Honestly, though, the only way this is going to move forward is if someone proposes a design and explains how it addresses the points already made here. Ideally in the form of a PR, as otherwise we'll only get stalled again when someone says "why don't we do it like such-and-such", and everyone seems OK with the idea, and people start saying "so how do we get this implemented"...