jazzband/pip-tools

Always remove extras in compiled files

FlorentJeannot opened this issue Β· 45 comments

Issue

I am opening this issue to discuss if the extras should be kept or removed in the compiled files.

@atugushev made a good summary of the situation:

Currently, we have direct references without extras and pinned packages with extras in requirements.txt, which looks wrong and should be synced in some single way.

We would like to get feedback from the community, do you think we should keep them or remove them and why?

My opinion is that we should remove them, since pip-compile is already listing all packages needed for a project, it seems redundant to me to specify it twice (one time in the extra and one time as a top-level dependency). Also, it's in theory possible to install more packages than those specified in the requirements.txt via the extras. In my opinion the generated .txt file of pip-compile should act as a lock file. The only advantage I can see is that we can easily inspect which dependencies are using extras.

@AndydeCleyre said that the order of installation could matter in some cases such as GDAL which requires numpy to be installed first. I checked if having the extra (gdal[numpy]) in the .txt file was making a difference, and I found that it was not working. You can read this gist if you want to have a look at the tests I've done (there's a conclusion at the end if you don't want to read it all).

Links

Some links about the discussion around this:

Samples from dependency management tools

The goal is to show you the output of different management tools when the project specifies extras. This may help you make a decision on the issue.

For each tool, I installed gdal[numpy]==3.2.2.

pip

Command: pip freeze > requirements.txt

The file contains:

GDAL==3.2.2
numpy==1.22.3

pip-tools

Command: pip-compile

The file contains:

gdal[numpy]==3.2.2
    # via -r requirements.in
numpy==1.22.3
    # via gdal

Pipenv

Command: pipenv lock -r > requirements.txt

The file contains:

-i https://pypi.org/simple
gdal[numpy]==3.2.2
numpy==1.22.3

Poetry

Command: poetry export -o requirements.txt

The file contains:

gdal==3.2.2
numpy==1.22.3; python_version >= "3.8"

Pros and cons

I'll try to collect all your feedbacks to update these lists.

Reasons to keep the extras:

  • We can clearly see which dependencies are using extras (FlorentJeannot)
  • #1613 (comment)

Reasons to remove the extras:

  • In theory, it's possible that pip-sync or pip could install more packages than what is listed in a .txt file because of extras. I think the output of pip-compile should act as a lock file, so it should only install what's specified in the .txt file. (FlorentJeannot)
  • It's redundant. Packages specified in the extras are also in the top-level dependencies. (FlorentJeannot)
  • #1613 (comment)

@FlorentJeannot You can remove extras from the output file with the --strip-extras option. This was added in version 6.2.0 to allow creating constraint-compatible requirements files. For example, I have my requirements-dev.in headed with the line -c requirements.txt, so that development requirements don't try to have dependencies that are incompatible with the compiled dependencies for production.

https://github.com/jazzband/pip-tools/releases/tag/6.2.0

This is an important use-case for me, so I wouldn't personally mind having this become the default, but the output you're looking for is already possible to obtain.

  • I think it is possible for a package to change its installation behavior based on whether an optional dependency, implied via an extra-group, is already installed. e.g. coolproj[alternate-file-layout]
  • @LouisAumaitre might want to comment here; the current inclusion of extras makes it invalid as a constraints file, when using the backtracking resolver

I'm going to copy my comment from #1539 here, which sums up my current thoughts on this:


As long as we are offering --strip-extras and not offering its negative, I'd guess that the default output line format would include extras (where this PR currently strips them).

Now that the constraints syntax is stricter (with the backtracking resolver), I expect it will be much more common for folks to need files without the extras. So I would support a separate PR to do that by default, while offering a new option to include them, e.g. --no-strip-extras/--include-extras.

I think it is possible for a package to change its installation behavior based on whether an optional dependency, implied via an extra-group, is already installed. e.g. coolproj[alternate-file-layout]

Oooh, this one is rough. If the order of installation of packages matters, that is a real challenge. One that I'd rather nobody ever have to think about. I don't think it'd ever be possible to make that behavior intuitive.

@ryanhiebert I agree.

I tried to install GDAL here which depends on numpy. It was really painful to have it working with pip, and I don't think there is an easy solution to reproduce that with pip-tools.

@FlorentJeannot from this link from your GDAL gist discussion, I'd say it might be best for us to ignore the ordering thing. It's not intended behavior of setuptools, so we wouldn't want to encourage that type of bad behavior from packages.

What is your motivation behind wanting to create a flag that preserves the current behavior? I'd be fine just changing the behavior entirely, but my perspective may not be seeing an important constraint.

@ryanhiebert I just wanted to emphasize your last message with this example (that it's a real challenge).

I first suggested to remove the extras in the "compiled" files, because I didn't see the point to have them, and since pip freeze is still not doing it, then I was thinking that it's just not needed.

Then @AndydeCleyre told me that there is an order of installation when we declare an extra to a package. GDAL was mentioned in another thread about this installation order, so I wanted to try it out by myself to see what happens when we try to install GDAL with different package management tools.

Now that I've tried it, my opinion is that packages which depend on an order of installation is something tricky (and it also seems to be a rare thing). The way @AndydeCleyre made it work is not trivial and it's not working for me with pip>=22 because the installation order with extras in this version has changed.

So I still think we should not have extras by default in the "compiled" files. We could have an --include-extras, but why would need that? The extras in the .in files seem enough in my opinion.

gdal turned out to be a false example here, because they are trying to control build time behavior based on the installed package set, whereas the extras only guarantee installation order, not whether extra-specified deps are installed at build time.

Agreed with both of you, @FlorentJeannot and @AndydeCleyre . So far as I can see, I think it would be fine to remove extras be the only behavior, and deprecate the --strip-extras flag entirely.

@AndydeCleyre , my question about motivation was intended for you (though I wasn't keeping good track of who I was responding to). Is there some important constraint, other than install order (which we've shown is about a concern that setuptools says should not be considered), that suggests that we should keep the ability to include extras somehow that I'm not seeing?

And you answered that question on the PR linked earlier. I'm also fine with keeping some flag and just changing the default behavior.

@FlorentJeannot thanks a lot for this awesome analysis and detailed summary!

I'm in favor of stripping extras in requirements.txt:

  • once requirements.in is compiled there is (should be) no difference in installation result whether the requirements.txt was with or without extras
  • requirements.txt without extras can be used as a constraint file in the layered workflow. Currently, users have to run pip-compile --strip-extras
  • fewer bytes and less distracting info in requirements.txt
  • requirements.txt should look more like pip freeze, where there are no extras

@atugushev I agree we should start stripping extras by default, but

... once requirements.in is compiled there is (should be) no difference in installation result whether the requirements.txt was with or without extras

What about my comment here?

What about my comment here?

@AndydeCleyre the link does not show the comment. Could you quote here?

@AndydeCleyre the link does not show the comment. Could you quote here?


I'm not saying it's good or common practice, but my understanding is that extras can be used to enforce installation order, and the set of packages already installed can be used by setup.py's install to follow different code paths accordingly.

A hypothetical example:

  • we have a package, mypackage
  • it defines an extra, interactive
  • the same package author also provides a kind of dummy package, mypackage-interactive-installation, required by mypackages's interactive extra
  • during mypackage's install, if and only if mypackage-interactive-installation is already installed, the user is prompted to make some choices which will affect the installation

@AndydeCleyre thanks! That looks like a shoot in the foot πŸ˜„ While I understand there are setup.py hackers (historically) in the wild, however, we would never satisfy their needs due to the dynamic nature of setup.py builds. As far as I see the Python world slowly moving towards static metadata (hello setup.cfg/pyproject.toml), I don't see any reason why we shouldn't encourage that.

I agree with @atugushev

It sounds to me like we have a rough consensus that that changing the default to strip extras is likely appropriate. I think what it needs now is someone to take a stab at making it happen with a pull request. Whether the existing behavior remains is up to the implementer and those that review the pull request. It is possible that having the backward compatibility option (that I'd prefer calling --include-extras if present) could make it easier to agree to change the default when the pull request is created, but its also true that I can't think of a use-case where I'd ever want to use it rather than try to fix what I'd likely consider a broken library.

@ryanhiebert PR already exists (#1608) but it does not include the --include-extras that you suggest, for now.

Oh nice, thank you for letting me see that. How do we draw out some consensus of action at this point?

Proposal πŸ‘€:

  • Release 1:
    • Strip extras by default
    • Deprecate --strip-extras
    • Add flag for including extras
  • Release 2:
    • Remove --strip-extras

Proposal πŸš€:

  • Release 1:
    • Strip extras always
    • Deprecate --strip-extras
  • Release 2:
    • Remove --strip-extras

Also keep in mind that whether we want them to or not, there are definitely teams out there using their own parsers on the output of pip-compile, for their particular build processes.

I'll wait for a comment from @atugushev before making any change in my PR.

I'm still in favor of "always stripping extras" and vote for 2nd proposal. My motivation:

  • pip-tools resolves all dependencies, having extras nothing changes (except in rare casees)
  • consistency with pip freeze
  • compatibility with constraints files pip install -c constraints.txt
  • less code

The only difference between options 1 and 2 suggested by @AndydeCleyre is whether to add --include-extras. It seems to me that there are use cases where it would be useful to include the extras, and in those cases completely removing the ability to keep extras would force using alternative solutions.

Unless the maintenance burden is unusually high, I don't see why you'd want to remove this existing feature.

FWIW, if pip-compile did always remove extras from the compiled requirements, then I wouldn't have ended up filing pip/#11599.

But I think preserving the extras in the compiled requirements is useful for understanding how the dependency tree was calculated, as I noted over in the pip issue.

Why pip-tools should strip extras from pip devs - pypa/pip#11599 (comment).

Since pip-compile --resolver backtracking always strips extra and will be the default resolver in 7.0.0 we can make the legacy resolver consistent with backtracking in this version and deprecate --strip-extras (2nd proposal). Is that alright?

Or leave legacy resolver as is and just deprecate the option.

Using --resolver=backtracking does not currently strip extras. Again: I filed that bug against pip after switching, per the recommendation in the newly-added warning in pip-compile, to --resolver=backtracking.

Why pip-tools should strip extras from pip devs - pypa/pip#11599 (comment).

AFAIU, that comment is specifically giving the rationale for excluding extras from valid constraints files, which is not the only type of file generated by pip-tools.

@AndydeCleyre

AFAIU, that comment is specifically giving the rationale for excluding extras from valid constraints files

That's correct. I'd like to mention that under the hood pip-tools passes requirements.txt (if exists) as -с requirements.txt to pip's resolver:

# Pass compiled requirements from `requirements.txt`
# as constraints to resolver

and strips extras:

ireq.extras = set() # pip does not support extras in constraints


which is not the only type of file generated by pip-tools.

So essentially requirements.txt is a constraint file and the pip-tools' resolver already follows the comment pypa/pip#11599 (comment), except it injects extras to requirements.txt back again inconsistently.

I am going to rename this ticket to "Removal of strip-extras option as being implicit", so we can start implementing it. If I read correctly most of the people agreed on going into that direction, especially as keeping them is a serious maintenance burden and a can of worms.

acompa commented

Out of curiosity, has this been earmarked for a future pip-tools release? 7.0? Is it blocked by #1755?

I once again find myself wanting this to be the default. I think there's consensus on it, from what I can tell, so we may only be waiting on a pull request to manage the --strip-extras and --include-extras` options and give a deprecation notice when neither is given explicitly, so that whenever we're ready to do another major release it will be ready to switch the default.

I don't think that it is blocked on anything, @acompa , so this really may be just a motivated pull request away. Are you the person to do it? Maybe I will be? We'll see who among those who care about this gets around to it first. My expectation is that it would not be a significant undertaking.

Given that the backtracking resolver always strips extras, and it's been the default since version 7.0.0, it makes sense to deprecate the --strip-extras option and update the code accordingly to remove extras from requirements.txt in the next minor release.

Is the plan to allow folks to include extras if they want to, or remove that possible workflow entirely?

Is the plan to allow folks to include extras if they want to, or remove that possible workflow entirely?

I suggest that we remove the workflow entirely, as was suggested and voted for in the second proposal.

Given that the backtracking resolver always strips extras, and it's been the default since version

@atugushev is this right? I’m on version 7, and I ended up back here because I got an extra in my requirements file.

Just checked, and I don't believe that strip extras is the default, even on >7. Notice the pyjwt[crypto] dependency, which has the extra.

$ pip-compile --version
pip-compile, version 7.2.0
$ echo django-allauth > sample.in
$ pip-compile sample.in
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
#    pip-compile sample.in
#
asgiref==3.7.2
    # via django
certifi==2023.7.22
    # via requests
cffi==1.15.1
    # via cryptography
charset-normalizer==3.2.0
    # via requests
cryptography==41.0.3
    # via pyjwt
defusedxml==0.7.1
    # via python3-openid
django==4.2.4
    # via django-allauth
django-allauth==0.54.0
    # via -r sample.in
idna==3.4
    # via requests
oauthlib==3.2.2
    # via requests-oauthlib
pycparser==2.21
    # via cffi
pyjwt[crypto]==2.8.0
    # via django-allauth
python3-openid==3.2.0
    # via django-allauth
requests==2.31.0
    # via
    #   django-allauth
    #   requests-oauthlib
requests-oauthlib==1.3.1
    # via django-allauth
sqlparse==0.4.4
    # via django
typing-extensions==4.7.1
    # via asgiref
urllib3==2.0.4
    # via requests

We've seen a couple people chime in since that original 2-option proposal that they'd like to be able to keep the extras in the output. I think we're settled that strip-extras should be the default, but unless there are implementation reasons that it makes it difficult to keep both, I think it wisest that we not pull the rug out from under other users that may wish to have the extras in some cases.

That said, rather than having extras in the specified dependencies, I think they would better fit in the compiled requirements file that they were included in the via comment lines. We don't currently do that, it would meet some of the desire to have extras that has been expressed, and I don't anticipate anyone objecting to such a change. If we got that, it might lessen objection to removing the ability to have the extras in the pip-significant output entirely.

@ryanhiebert

is this right? I’m on version 7, and I ended up back here because I got an extra in my requirements file

I mad a mistake, that’s my bad. Thanks for providing the reproducer. Much appreciated.

The original issue was that pip-compile includes extras in regular dependencies, but doesn’t do so for direct references (aka <name> @ <ur>l) in output file. Then pip released an update where reworked constraints files format by forbidding extras there. In response, pip-tools added --strip-extras option to address this.


We've seen a couple people chime in since that original 2-option proposal that they'd like to be able to keep the extras in the output. I think we're settled that strip-extras should be the default, but unless there are implementation reasons that it makes it difficult to keep both, I think it wisest that we not pull the rug out from under other users that may wish to have the extras in some cases.

Making strip-extras=true the default makes sense because, in a layered workflow, we pass requirements as constraint files (via -c <file>) and pip forbids extras there. So, if we want a smooth transition, we should:

  1. Introduce --no-strip-extras to satisfy those who desire extras in the output file.
  2. Add extras to direct reference dependencies (merge this: #1582) to sync the behaviour.
  3. Show a warning that pip-tools will switch the default to strip-extras=true if users haven't explicitly passed the --strip-extras or --no-strip-extras options. This way, users will be aware of and can adapt to future changes.

Feel free to submit a PR. You'll have my support with the review.


That said, rather than having extras in the specified dependencies, I think they would better fit in the compiled requirements file that they were included in the via comment lines. We don't currently do that, it would meet some of the desire to have extras that has been expressed, and I don't anticipate anyone objecting to such a change. If we got that, it might lessen objection to removing the ability to have the extras in the pip-significant output entirely.

Yes, pip-compile have never supported extras in via annotations, and this seems like a separate enhancement issue. If someone wants to jump in and add support, please feel free to open a separate issue and submit a PR. I'd be glad to provide support with the review.


Again, my apologies for making incorrect assumptions. I'm going to close my PR #1953 since it's no longer relevant.

Yes, pip-compile have never supported extras in via annotations, and this seems like a separate enhancement issue. If someone wants to jump in and add support, please feel free to open a separate issue

Is this #1577?

Is this #1577?

Great, it already exists! Thanks @AndydeCleyre πŸ™πŸ»

I haven't seen this elsewhere, so I'll ask here:

How does removing extras from compiled files interact with --generate-hashes and then using --require-hashes in pip install?

I haven't seen this elsewhere, so I'll ask here:

How does removing extras from compiled files interact with --generate-hashes and then using --require-hashes in pip install?

Works great in my experience. It’s important to also use pip install --no-deps, but that isn’t unique to these flags.

pjz commented

That said, rather than having extras in the specified dependencies, I think they would better fit in the compiled requirements file that they were included in the via comment lines.

I'd like to second this, as it will keep the semantic reasoning behind the extras at least somewhat intact in the generated files.