New resolver: “Nameless” constraint
uranusjr opened this issue · 44 comments
Spawned from #8209.
According to test_install_distribution_union_with_constraints
, this is a valid constraint:
C:\Temp\packages\foo[bar]
This is quite difficult to deal with. We can store the constraint information in a separate mapping (with the URL as key), but I’m not sure how it can be fetched. Here’s how we get the relevant constraint to use:
def find_matches(self, requirement):
# type: (Requirement) -> Sequence[Candidate]
constraint = self._constraints.get(requirement.name, SpecifierSet())
return requirement.find_matches(constraint)
I guess we can do something like:
def find_matches(self, requirement):
# type: (Requirement) -> Sequence[Candidate]
constraint = self._constraints.get(requirement.name, SpecifierSet())
if isinstance(requirement, ExplicitRequirement) and
isinstance(requirement.candidate, (LinkCandidate, EditableCandidate)):
constraint &= self._url_constraints.get(
requirement.candidate.link.url, SpecifierSet(),
)
return requirement.find_matches(constraint)
but we probably can agree this looks super wrong. Some additional abstraction is needed here.
One question - do we have any indication that anyone actually uses this functionality? It looks incredibly weird, and I don't see what the purpose of such a constraint is. (I can technically work it out, but why would anyone use it?)
I'd be reasonably comfortable taking the approach of (at least initially) dropping support for constructs like this in the new resolver, and seeing if anyone complains.
It would affect the rollout process and how we handle backward compatibility, though, because if someone does pop up saying they use this, we'd need a workaround for them while we work out how to add it. It also changes the project goal, as "all tests must pass" won't be quite right in that case.
@pradyunsg @brainwane Any thoughts? Is there any value in asking the user survey population "do you use constraints and if so how?" Or is that too small a sample to be worth it?
PS I'm not going to review the rest of the constraint-related issues until after the weekend, but I wanted to add my thoughts to this one to give people a chance to respond.
I vaguely remember being taught about mentioned constraints as a workaround to pip’s inability to merge extras (the feature described in #8211), but I assume many such usages are not needed anymore since the new resolver does not have the problem. As for direct URL in constraints, no idea at all.
This is definitely a good case to delibreately not include the feature because YAGNI and see what happens IMO.
100% on board for not implementing this, flagging this as a "this no longer works with the new resolver" line item somewhere, and seeing if anyone comes complaining about this during the beta. :)
Constraints were not a work-around for merging extras. They were added by Openstack because they want to make sure that the entirety of openstack is installable together, without having to either maintain 20 separate requirements.txt
files OR forcing the install of literally all of Openstack.
See #2731 for a tiny bit more information. We would definitely want to reach out to the Openstack community to see if they're still using it.
Actually I think that the feature described in #8211 is wrong? At least the inteded purpose of constraint files were to not cause anything extra to be installed, but to only apply additional constraints to whatever would ultimately get installed.
It is my understanding as well. Sorry if my writing is causing misunderstandings, please feel free to add to/modify the issue.
We would definitely want to reach out to the Openstack community to see if they're still using it.
Openstack are actively testing the constraints implementation in the new resolver (@dhellmann contacted us to help out) so based on your comment, I'd suggest that we wait for feedback from them on whether their testing showed up any issues, and review these issues in the light of that feedback).
I wonder if these tests were mistakenly added from a "completeness" point of view ("because you can add any requirement in a constraints file, we need to decide what foo[bar]
means in a constraints file") rather than focusing on what was requested ("we should only allow version specifiers in a constraints file, so other types of requirement should be invalid or at least not supported")?
As far as I know, OpenStack always uses package names, one rule with == or === and a version per constraint entry, and optionally an expression indicating which version of python needs the constraint (I can never remember the name of that feature, sorry). This allows the test jobs to "pin" to the same versions of a package across tests that have different combinations of applications installed.
The global constraints file for OpenStack is at https://opendev.org/openstack/requirements/src/branch/master/upper-constraints.txt if you want to see an example.
Most projects also have a per-project constraints file using the lower bounds for their requirements, used for a separate unit test job and possibly an application-specific functional test job. We do not run the integration tests with the lower bounds constraints. For an example of one of these files, have a look at https://opendev.org/openstack/nova/src/branch/master/lower-constraints.txt There are going to be a lot of those, since every repo will have its own.
I have asked specifically about nameless constraints on the mailing list http://lists.openstack.org/pipermail/openstack-discuss/2020-May/014790.html
/cc @prometheanfire
we currently do not allow nameless constraints or requirements. As I understand it, those are git/http based uris? we set permit_urls=False
@dhellmann @prometheanfire Could you aldo help verify whether there are currently projects depending on the extras behaviour, as described in #8211?
I'm a bit unsure about the precise meaning of 'nameless' and if this is coupled to the discussion around extras. However I will outline the way openstack-ansible uses constraints files to install a mixture of packages and source code just to make sure that this is captured.
Example is here http://paste.openstack.org/show/793386/ but short story is we use constraints like "git+https://opendev.org/openstack/glance@6e3ced8251cd6e273aa73f553a24fc475b219db5#egg=glance" and put URLs to remote constraints in constraints files all the time.
@jrosser So I'm not at all clear what such a constraint (a URL) would mean in practice. Constraints as I understand them, constrain a project by requiring a certain version or equivalent - so would your quoted example be saying that glance must be installed from precisely that URL - and hence that pip install glance -c <that constraint file>
would fail because you're not requesting the right URL?
The constraint as we use it says "install this package, but take the source code from the URL at the SHA or branch specified". Here is an example I just did which installs ansible from our fork which has a bugfix applied:-
(test) ubuntu@bionic-test:~$ cat constraints.txt
git+https://github.com/bbc/ansible@add-filters-os_router-os_subnet#egg=ansible
(test) ubuntu@bionic-test:~$ pip install ansible --constraint constraints.txt
Collecting ansible from git+https://github.com/bbc/ansible@add-filters-os_router-os_subnet#egg=ansible (from -c constraints.txt (line 1))
Cloning https://github.com/bbc/ansible (to add-filters-os_router-os_subnet) to /tmp/pip-build-yjrh1go0/ansible
Collecting PyYAML (from ansible->-c constraints.txt (line 1))
Cache entry deserialization failed, entry ignored
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/64/c2/b80047c7ac2478f9501676c988a5411ed5572f35d1beff9cae07d321512c/PyYAML-5.3.1.tar.gz (269kB)
100% |████████████████████████████████| 276kB 2.3MB/s
Collecting cryptography (from ansible->-c constraints.txt (line 1))
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/58/95/f1282ca55649b60afcf617e1e2ca384a2a3e7a5cf91f724cf83c8fbe76a1/cryptography-2.9.2-cp35-abi3-manylinux1_x86_64.whl (2.7MB)
100% |████████████████████████████████| 2.7MB 367kB/s
Collecting jinja2 (from ansible->-c constraints.txt (line 1))
Cache entry deserialization failed, entry ignored
Cache entry deserialization failed, entry ignored
<SNIP>
So long as this all keeps working with the new resolver, it's all OK.
Ah, that's not how constraints in the new resolver work right now (and it's not how I read the docs). Looks like there's some extra complexity here we need to look into :-(
@jrosser I'm curious about why that is being done via the constraints list rather than a regular requirement specification. Are there cases where some of the requirements depend on each other?
@pfmoore IIUC, the current implementation of constraints is either limited to exact version matches or the constraint entry overrides all other version rules. The effect is when we list foo==version in a constraint list, we get exactly version of foo, even if that version would have conflicted with the requirements of a package that depends on foo version. That's probably not ideal behavior, and maybe the resolver should consider all of the version specifiers and fail if there is a conflict.
@dhellmann The new resolver certainly won't do that (by design) - the current implementation allows NAME OP VERSION
for any operation (<
, >=
, ==
etc). That restriction is applied to the versions found as part of the normal dependency resolution, limiting the versions found to only those that match the constraint.
There's no provision in the current implementation for the new resolver for any form of constraint that isn't a spec (i.e., of the form above). Adding the "named URL" form that @jrosser described would be very different, as it's introducing a source for the package that otherwise would not have been available, and disabling all other sources. Even if we do implement this (it's not necessarily going to be straightforward to do so) we still wouldn't install it if the result would conflict with other requirements/dependencies.
A key feature of the new resolver is that it will only ever result in a set of candidates to be installed that satisfy the supplied requirements (and their dependencies)1.
1 There's still a possible situation where we could get an inconsistency with other packages - see #7744 for more detail - but that's not relevant here.
@dhellmann looks like the same installation of ansible can be achieved through a requirements spec. This isn't a good enough example to prove that we can do a mixture of downloaded packages, specific things from source and have openstack upper-contraints apply across the whole lot.
(test) debian@rsyslog-test:~$ cat reqs.txt
ansible @ git+https://github.com/bbc/ansible@add-filters-os_router-os_subnet#egg=ansible
(test) debian@rsyslog-test:~$ pip install -r reqs.txt
Collecting ansible@ git+https://github.com/bbc/ansible@add-filters-os_router-os_subnet#egg=ansible from git+https://github.com/bbc/ansible@add-filters-os_router-os_subnet#egg=ansible (from -r reqs.txt (line 1))
Cloning https://github.com/bbc/ansible (to revision add-filters-os_router-os_subnet) to /tmp/pip-install-inkmidaw/ansible
Branch 'add-filters-os_router-os_subnet' set up to track remote branch 'add-filters-os_router-os_subnet' from 'origin'.
Switched to a new branch 'add-filters-os_router-os_subnet'
Collecting PyYAML (from ansible@ git+https://github.com/bbc/ansible@add-filters-os_router-os_subnet#egg=ansible->-r reqs.txt (line 1))
Using cached https://files.pythonhosted.org/packages/64/c2/b80047c7ac2478f9501676c988a5411ed5572f35d1beff9cae07d321512c/PyYAML-5.3.1.tar.gz
Collecting cryptography (from ansible@ git+https://github.com/bbc/ansible@add-filters-os_router-os_subnet#egg=ansible->-r reqs.txt (line 1))
Using cached https://files.pythonhosted.org/packages/58/95/f1282ca55649b60afcf617e1e2ca384a2a3e7a5cf91f724cf83c8fbe76a1/cryptography-2.9.2-cp35-abi3-manylinux1_x86_64.whl
<SNIP>
I'd have to defer to Jesse P who developed a lot of this for openstack-ansible to know if there was a reason the source was specified in constraints rather than requirements.
Yea, I agree that name @ URL is not a constraint, but a requirement. If the given URL does not work, well, the resolution should fail. That's exactly what making it a requirement would do. Passing this as a constraint isn't correct and "worked" because of a fun bunch of technical debt in pip quirks (overtly optimistic resolver, bad abstractions etc).
Part of the problem here is with how constraints got implemented. They're represented with the same object within pip as requirements, and have no distinction other than a single boolean. That was fine then because requirements could only be name-op-version or name. However, this meant that we accidentally started allowing URLs in constraints when we added URLs to requirements! That's a bug we didn't know about until now!
Now, that stuff still worked because the older dependency resolver "upgraded" constraints into a thing to install (by flipping the boolean and setting another attribute), which is fine in the optimistic world view of that resolver (which also includes I won't ever see dependency conflicts, so, it wasn't exactly sound).
The new resolver doesn't work like that (as @pfmoore noted above) and that sort of behaviour is really tricky to reproduce in a model where "limits on what can be installed" and "thing to install" are (correctly) represented by different objects.
I'm strongly opposed to bringing this bug over to the new resolver, since the usecase is already covered by regular requirements already as far as I can tell from the descriptions above.
I think I agree with not carrying the URL handling behavior over.
I definitely agree that constraints should not be installed unless they are listed as a requirement.
I think I agree with not carrying the URL handling behavior over.
Before going all the way, I would like to understand if there is some other way to express that if any requirement introduces a need for a package, it should be installed from source instead of from a dist.
For example, if I am installing two applications that depend on a library and I want to install the library from source as part of a CI job, how do I do that?
--no-binary common-library
I understand this thread has drifted from nameless constraints to considering the removal of constraints as URL. If this is the case, please read on, otherwise apologies for the noise.
I think constraints are important to have (compared to requirements) when you consider top level requirements and REQUESTED
(#8026). As a consequence I also think URLs should remain allowed for constraints.
The typical use case for URL constraints is to install a patched version of a dependency, while not making the dependency a top level requirement.
Imagine you have a project with such a setup.py
:
from setuptools import setup
setup(
name="test",
install_requires=["packaging"],
)
And this constraints.txt
:
packaging @ git+https://github.com/pypa/packaging@20.1`
[I'm using a tag here as an example, a real life use case would be a PR or a bug fix branch.
Also this example is trivial, as one can argue that I could put the URL in install_requires
. However with indirect dependencies it makes real sense.]
If you pip install -c constraints.txt .
, that specific branch of packaging
will be installed, but REQUESTED
will not be set for it.
The same can be achieved with pip install -r constraints.txt .
, but then packaging
would be marked as a user requested top level requirement.
Similarly if you remove packaging
from install_requires
and reinstall in a fresh venv, packaging
will not be installed at all when constraints are used (since it is not a dependency), while it would still be installed when passing the URL via -r
.
IMO this use case for constraints is important and should be preserved. It will become even more important when we implement a better uninstall that considers user requested top level requirements.
I have enough of a problem with the idea of REQUESTED
that I'm not particularly swayed by this argument, to be honest. Is there are real situation where someone would be using REQUESTED
and could benefit from this?
Implementing URLs as constraints would be tricky - it really doesn't fit well with how the new resolver handles constraints. I don't think it would be impossible, but I'd like to see some solid examples where people are using the existing functionality, and can't achieve the same result any other way, before going down that route.
I'm basically in agreement with @pradyunsg here, The documentation says "Constraints files are requirements files that only control which version of a requirement is installed" - that implies to me that the fact that URLs work is an accident of the implementation. That documentation was written in the same commit (bb0b429, addressing #2731) that constraints were implemented, so I think it's acceptable to take it as definitive. It's also worth noting that test_install_distribution_union_with_constraints
wasn't in that original commit - it was added in commit dc8e7f0, addressing #3046 and #3189, neither of which was about constraints having any form other than a version restriction. And that test doesn't actually test a situation that was described in the relevant issues (it tests a constraint having an extra, when the issues were about having a requirement with an extra along with a (normal) constraint, in fact #3046 describes putting the extra into the constraint file as a workaround for the bug, not the correct behaviour.
Having done that digging, I'm now certain that the intended use for constraints was only ever to supply version restrictions, and that's what we should limit the new implementation to.
The URL use of constraints feels more like a feature request for some kind of resolver override. Like a "I don't care what the resolver says, I want this version" override feature, whereas constraints are more like "please feed this additional information into the resolver".
"Constraints files are requirements files that only control which version of a requirement is installed"
A direct URL is a way to specify a version, even if it requires generating metadata before being usable in the resolver.
I don't know why nor how that feature came into existence but it is coherent and useful.
Even if you ignore REQUESTED, I for one use that regularly in applications: pip freeze > constraints.txt
and later, after dependencies have evolved, pip install -e . -c constraints.txt
to have a fresh environment with only the dependencies that are used. No need for pipenv nor pip-tools. I must certainly not be alone doing that and I think I've seen comments going that way in the pip issue tracker (can't find it right now).
Constraints are basic primitives for dependency management and there is no reason they should not support direct URLs. A direct URL is just another way than ==
to say we want to constrain to a specific version, just one that will not come from an index.
This specifying constraints with URL feels a bit like how people use dependency_links
, but is actually in a way we want it to be (i.e. specified at the install site), without needing to add awkward requirements, which dependency_links
advocates have been complained about. I can see we suggest something like
pip install my-package -c my-package-constraints.txt
as a succinct (enough?) way to replace dependency_links
. So I agree it is a useful feature to allow PEP 508 name @ url
formats in the constraints file, even though the reason may be different than it’s original purposes.
And I feel that would be what people expect, even those who want to use dependency_links
.
So... folks arguing that constraints should allow URLs, I'm curious:
constraints.txt: pip @ https://github.com/pypa/pip/archive/master.zip
requirements.txt: pip
What should be the result of this?
From what I can tell, the options are:
- rejecting everything -- because it's never going to get that link from the index?
- pick https://github.com/pypa/pip/archive/master.zip -- adding it to the pool to acceptable "candidates"... and making that the only acceptable candidate?
- something else entirely!?
pick https://github.com/pypa/pip/archive/master.zip -- adding it to the pool to acceptable "candidates"... and making that the only acceptable candidate?
Yes, that, without doubt for me.
Some more cases to illustrate:
- constraints
pip==20.0
with requirementspip
should install pip 20.0. - constraints
pip==20.0
with requirementspip>20
should error (no solution found). - constraints
pip==20.0
with requirementssomeotherpkg
should work, ending up not installingpip
because it is not a requirement nor a dependency of a requirement.
With pip@ ...
constraints that, when preparing metadata, yield the same name and versions as the 3 cases above, the result should be the same
OK, but the fundamental difference between the pip==20.0
constraints you show, and the pip @ ...
constraint, is that the former only considers candidates that would have been included regardless of the presence of the constraints. The pip @
constraint, as described, changes the list of candidates to consider. And that to me is what makes this a different feature, rather than a constraint.
I'm happy to accept that there might be a need for a feature that says "override the finder to only return the following candidate(s) for this project" - but I don't think it should be part of the constraint mechanism, regardless of what historical accidents of implementation exist in the current constraint code.
However, I think there's a bigger question here, which is more of a project management one. Is the current funded work on the resolver expected to deliver a result that matches bug-for-bug the current implementation, or do we have the scope to fix issues with the current behaviour? I don't know the answer to that, but I'll raise it at today's resolver project meeting.
Basically, I'm certain that I don't want this feature in constraints long-term, although I'd be happy if someone wanted to add an equivalent feature that was scoped as "a way to override the finder". But if the new resolver is expected, as initially delivered, to match the existing constraint behaviour, we may need to add a temporary hack to the code to support this behaviour in the interim (I would, however, argue for adding a deprecation warning, to start the process of moving to a cleaner solution).
/cc @brainwane for background before the meeting, if you get a chance :-)
The pip @ constraint, as described, changes the list of candidates to consider.
Can you elaborate that part? I don't understand.
I would think the only difference between pkg==
and pkg@
constraints is that the former has the version readily available in the specifier while the latter requires preparing metadata to get the version.
Is there anything that structurally prevents doing the download and prepare_metadata early (or when needed by the resolver) for direct URL requirements or constraints so the version is available soon enough in the process?
Budget and planning constraints are another thing of course, that I totally understand. Although I guess that should be put aside while discussing the mid-long term desired feature set.
pip install PKG
sees the sdists and wheels on PyPI that relate to PKG. The finder in pip fetches data about those, and passes that information onto the resolver to decide which one to use. (The old resolver's logic is a lot more tangled, but conceptually that's how it works. The new resolver makes that separation cleanly).
In particular, that invocation does not see https://github.com/OWNER/PKG/archive/master.zip.
A constraint file (as described in the docs, and as implemented in the new resolver) removes items from the list of available candidates, to constrain the options available to the resolver when deciding what to install.
The proposed behaviour for pkg @
"constraints" has an additional effect, in that it changes the list of candidates being considered, to add https://github.com/OWNER/PKG/archive/master.zip (and by implication, remove all the other options that the finder had identified).
It's adding that extra candidate to the list that isn't a "constraint" as far as I can see, but rather an "extension" of the list.
Budget and planning constraints are another thing of course, that I totally understand. Although I guess that should be put aside while discussing the mid-long term desired feature set.
Agreed.
Thanks for the explanation, @pfmoore. I better understand the implementation complexity.
The feature as I described it above remains coherent, though :)
How about as an alternative, rather than (mis-)using constraints files for this, we had a separate feature sources.txt
that allowed the user to specify where a package came from - one source per package, always a URL, so basically a file containing a list of direct URLs that would be used in place of the normal finder lookup via indexes etc?
I'm sure this would immediately be subject to massive scope creep, as people decide they want to use it for all sorts of things where a custom index would be a better option, but maybe we could keep it under control 🙂
The part that I still don't get is why direct URLs would be considered a misuse of constraints. I mean when you view them as just another way to specify a version it all makes sense, no?
So a name=>url map such as sources.txt
would be easily generated by extracting direct URLs from constraints? Assuming they are named of course, and naming them is an acceptable price to pay to use that with the new resolver.
The part that I still don't get is why direct URLs would be considered a misuse of constraints. I mean when you view them as just another way to specify a version it all makes sense, no?
Sure. But if you see them as just a way to specify a version (by which I assume you mean version number), then there's no mechanism by which pip would know to consider that URL for installation, as it's not in the list of stuff the finder will locate. So a constraint of https://github.com/pypa/pip/archive/20.1.zip would be effectively identical to a constraint of pip == 20.1
, which I'm pretty sure is not what you intend...
It's important here not to get misled by the current implemenation, which treats constraints as (very broadly) just ordinary requirements with a "do not install" flag set. That's not conceptually how they work, though, and in a very practical sense, it's not possible for the new resolver to implement them that way (I know, I tried...)
Assuming they are named of course, and naming them is an acceptable price to pay to use that with the new resolver.
Unnamed constraints are absolutely not allowed, IMO, as how would we know if they applied, if there's no name to match against what we're installing? And again, preparing a source dist just to say "so this is foo? Nope, we aren't interested in that" isn't reasonable. (It's less unreasonable in the current implementation, but again, don't get misled by that).
would be easily generated by extracting direct URLs from constraints?
Depends on your definition of "easy". :)
Needing to "backfeed" potential sources for a name in the new resolver's (correct) model of separating requirement vs installable thing is... definitely not easy in my book.
I'm happy to accept that there might be a need for a feature that says "override the finder to only return the following candidate(s) for this project" - but I don't think it should be part of the constraint mechanism, regardless of what historical accidents of implementation exist in the current constraint code.
+1
Part of the issue here is that "URLs in contraints" was not added intentionally, and has never undergone the "what can it be used for, how does that interact with other options we provide" etc discussions we have when adding functionality -- that's what's happening now. I don't think we should go unplanned functionality -> use cases, but rather use case -> functionality.
That said, it's already an undocumented thing you can do with pip, and Hyrum's Law applies here -- we'd need to do a deprecation cycle if we remove it, but (IMO) that's a separate conversation of what it means and whether that is a good solution for the problem it's solving.
How about as an alternative, rather than (mis-)using constraints files for this, we had a separate feature
sources.txt
that allowed the user to specify where a package came from - one source per package, always a URL, so basically a file containing a list of direct URLs that would be used in place of the normal finder lookup via indexes etc?I'm sure this would immediately be subject to massive scope creep, as people decide they want to use it for all sorts of things where a custom index would be a better option, but maybe we could keep it under control 🙂
I sort of like this. It meets the requirements I have, which could be described as "force installation from an unpublished location because you're running in a CI integration job that is trying to decide whether to publish the thing being tested". Regardless of whether that thing is eventually published to a custom index or PyPI, until the tests pass it isn't ready to be published at all.
Now, the unpublished thing may not have a predictable version. It's not likely to be tagged, so it will have some version like 10.2.1-55 or something, and I'm not even sure it's possible to put that into a constraint file. It's certainly inconvenient to have to update the constraint to something like ">10.2.1" every time a release is tagged. If you have a few repos, it's not a big deal. When you have 10s of dozens, manual processes don't work.
Selection doesn't have to be done via constraints, of course. If there's some other way to do it, that's fine. It could be the source list you describe, or it could even be a single local filesystem path so that if a package is found there then that version must be used. The CI job could pre-build wheels of everything it wants to install from source and drop them all into the same place. The key is that it's not just "a" source, we need some way to express that it is the only allowed source for a given package.
If that suggestion seems like it would suit people's requirements, let's thrash out the details in a new feature request PR. I'll try to get some time to write up the proposal, but it will certainly be a few days at least, and may be longer (I have quite a lot on my plate at the moment) so ping me if I forget, or someone else can start things going if they want.
@pfmoore I'm not convinced we really need to expose a new concept such as source.txt
to users.
I understand "link constraints" or "link overrides" have to be handled differently than version constraints in the resolver implementation, but for the users they can both be expressed in the same constraints files. I'd think that name=>link map can be built transparently for the user by filtering out link constraints at the very beginning of the resolution process.
Unnamed constraints are absolutely not allowed
If everyone agrees on this point, I think we're good on this issue's goals. :)
Let's close this issue and move the conversation about URL requirements/sources.txt to a separate new issue?
I'm gonna close this issue now -- please feel free to write a comment here, if you'd like to dispute with the conclusion here (unnamed constraints are not allowed).