pypa/pip

Adopting "working" scheme for every run

pradyunsg opened this issue · 28 comments

Just carrying over my idea in #1056 (comment), for proper dedicated discussion.

AFAIK, there are 3 possible schemes for packages:

  • "system" - for system/global packages (--system)
  • "user" - for packages installed in user space (--user)
  • "local" - for virtualenv packages (--local)

pip enforces a "working" scheme on every run. Outside a virtualenv, the default working scheme would be "system" (It should really be "user", that's another issue #1668). Inside a virtualenv, the default working scheme should be "local". Passing --system or --user or --local overrides the working scheme.

Only packages in the same scheme as the working scheme can be modified. By modifying, I mean installing or uninstalling a package. Trying to modify a package in a different scheme is not allowed and pip would print a message and error out.

So, modifying a package in system scheme with a "user" working scheme is not allowed. Nor is modifying a package in user scheme with a "system" working scheme. Neither are the other permutations with "local".

I think this results in a pretty simple behaviour model.
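A minimal sketch of the behaviour model described above, to make the rule concrete. The names `working_scheme`, `check_can_modify` and `SchemeMismatchError` are made up for illustration and are not part of pip; whether a virtualenv is active is passed in as a plain boolean.

```python
SCHEMES = ("system", "user", "local")


class SchemeMismatchError(Exception):
    """Raised when a package lives outside the working scheme."""


def working_scheme(in_venv, override=None):
    """Pick the working scheme for one run (hypothetical helper, not pip API)."""
    if override is not None:
        # An explicit --system / --user / --local always wins.
        if override not in SCHEMES:
            raise ValueError(f"unknown scheme: {override!r}")
        return override
    # Default from the proposal: "local" inside a virtualenv, else "system".
    return "local" if in_venv else "system"


def check_can_modify(package_scheme, working):
    """Allow install/uninstall only when the package is in the working scheme."""
    if package_scheme != working:
        raise SchemeMismatchError(
            f"package is in the {package_scheme!r} scheme, "
            f"but the working scheme is {working!r}"
        )
```

With this shape, flipping the default outside a virtualenv from "system" to "user" (#1668) would be a one-line change in `working_scheme`.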

As @ncoghlan pointed out, this would need some logic to understand that user installs shadow system ones. Also, I would slightly change this to spell it like --scheme {global,user,venv} because I like how this signifies exclusiveness of the behaviours better than plain flags do and is consistent with some internal configuration stuff.

Additionally, while doing this:

  • preparatory work of #1668 can also be done, meaning that defaulting to user would just be flipping the default (something like what we did for --upgrade-strategy).
  • addresses #1056 by requiring the user to specify the scheme that they are working with.

/ping @dstufft @pfmoore

I would slightly change this to spell it like --scheme {global,user,venv}

Oh, and this would let you set a default scheme for yourself using the configuration toolchain.

@dstufft @pfmoore I would like to know what you think about this. :)

In principle this sounds like a reasonable thing to do - but I'm afraid I've got some personal priorities at the moment that mean I don't really have time to think through the implications.

So count me as in favour in principle, but not really able to provide a detailed review at the moment, sorry.

Cool. Thanks! :)

Just noting that I briefly thought that --scheme may not be a good name for this option, due to the potential collision with the concept of installation schemes in sysconfig: https://docs.python.org/3/library/sysconfig.html#installation-paths

However, I subsequently realised that these uses are actually the same use case - the new pip level option is just a helper to select the desired scheme without having to specify the exact platform appropriate scheme name as defined in sysconfig.
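To illustrate the "helper to select the platform-appropriate scheme" idea, here is a rough sketch that maps the proposed pip-level names onto sysconfig scheme names. The mapping itself is an assumption (it ignores cases such as macOS framework builds, distro-patched schemes, and the dedicated "venv" scheme in newer Pythons), and `sysconfig_scheme_for` / `site_packages_for` are hypothetical names.

```python
import os
import sysconfig


def sysconfig_scheme_for(pip_scheme):
    """Map a pip-level scheme name to a sysconfig scheme name (assumed mapping)."""
    if pip_scheme == "user":
        # "posix_user" on POSIX, "nt_user" on Windows.
        return f"{os.name}_user"
    if pip_scheme in ("global", "venv"):
        # Both use the prefix-based layout; only the prefix differs.
        return "posix_prefix" if os.name == "posix" else "nt"
    raise ValueError(f"unknown scheme: {pip_scheme!r}")


def site_packages_for(pip_scheme):
    """Return the pure-Python library directory for the mapped scheme."""
    return sysconfig.get_path("purelib", sysconfig_scheme_for(pip_scheme))
```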

Linking to #2418 since I somehow always forget there was an attempt at making --user behaviour default.


concept of installation schemes in sysconfig

This is, actually, nice that the names here and there match. :)

This came up in #4809 where I suggested being able to pip list only packages related to one scope (now it combines both system and user packages, making it impossible to say which package comes from where).

I've gone ahead and made a PR for this -- #4871.

Thoughts @dstufft, @xavfernandez?

Maybe Barry Warsaw (his name's on the Debian patch to pip; #1668 would be fixed as a part of that PR) should be pinged for this discussion?

How does --target mode of operation relate to schemes? It feels like --target essentially is another type of scheme and treating it as such would potentially solve many bugs pertaining to --target option.

Like @piotr-dobrogost, I also think --target needs to be a scheme on its own. This would solve my problem #5686 where packages installed via pip install --target are not manageable at the moment.

I think that's a valid request.

However, I'd say we "promote" it to a scheme after the initial refactoring/functionality change needed to do this for existing install locations that are schemes.

Oh, and this would let you set a default scheme for yourself using the configuration toolchain.

Note that we'd have to be careful here, as the current default (if you don't specify anything) is "if you're in a virtualenv, use local, otherwise use system". But if a user wants to set a default of "if you're in a virtualenv, use local, otherwise use user" (which is likely the most common need), the config system won't help - it'll let them say "always use user", but not make that conditional on whether they are in a virtualenv or not.

It's arguable that users can put user in their config file, and explicitly use --scheme=local on the command line when they want to install to a virtualenv, but I don't think that addresses the use case of people who simply want to say "default to user".

One possibility (given that local only makes sense if you have a virtualenv active) is to allow the scheme to be local,user, local,system, local, user, or system (where the cases with two entries mean "use local if you're in a virtualenv, otherwise fall back to the other option", and a base local means "use local if you're in a virtualenv, otherwise error"). It feels a bit clunky to me, so I'm open to better spellings, but I prefer this idea to any solution that makes specifying something in a config file work differently than specifying it in an environment variable or on the command line.
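For concreteness, the comma-separated spelling could be resolved roughly like this. This is only a sketch of the rule as stated in this comment; `resolve_scheme_spec` is a hypothetical name and none of this exists in pip.

```python
# The five spellings listed above, as tuples of their comma-separated parts.
VALID_SPECS = {
    ("local", "user"),
    ("local", "system"),
    ("local",),
    ("user",),
    ("system",),
}


def resolve_scheme_spec(spec, in_venv):
    """Resolve a spec such as "local,user" to a single scheme name."""
    parts = tuple(part.strip() for part in spec.split(","))
    if parts not in VALID_SPECS:
        raise ValueError(f"invalid scheme spec: {spec!r}")
    if parts[0] != "local":
        # Plain "user" or "system": used as-is, virtualenv or not.
        return parts[0]
    if in_venv:
        return "local"
    if len(parts) == 2:
        # Not in a virtualenv: fall back to the second entry.
        return parts[1]
    raise RuntimeError("scheme 'local' requires an active virtualenv")
```

The same string would mean the same thing in a config file, an environment variable, or on the command line, which is the property argued for above.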

I haven't been closely tracking the evolution of the working schemes design, but what if the user and system schemes were defined as being aliases for the local scheme when an active virtualenv is detected, and there were separate force-user and force-system schemes to say "use the named scheme even when an active virtualenv is detected"?

Updated to add: Functionally, this is the same thing @pfmoore suggested, but the spellings are different:

  • local,user -> just user
  • local,system -> just system
  • local -> venv-local (error if no active venv detected)
  • user -> force-user
  • system -> force-system
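The alias-based spelling in the list above could be sketched as follows. Again this is hypothetical: `resolve_scheme_alias` is an illustrative name, and the five option names are the ones proposed in this comment, not released pip options.

```python
def resolve_scheme_alias(name, in_venv):
    """Resolve one of the alias-style names to a concrete scheme."""
    if name in ("user", "system"):
        # Plain names are aliases for "local" when a virtualenv is active.
        return "local" if in_venv else name
    if name == "venv-local":
        if not in_venv:
            raise RuntimeError("'venv-local' requires an active virtualenv")
        return "local"
    if name in ("force-user", "force-system"):
        # Forced names ignore the virtualenv entirely.
        return name[len("force-"):]
    raise ValueError(f"unknown scheme: {name!r}")
```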

(I initially had some comments here about those names being semantic changes, but then I remembered that this option has never actually been released yet)

Yeah, that's a reasonable option too. Ultimately, it'll be about what people feel is the most "natural" formulation (which should have the shortest name) and I'm not really qualified to comment on that as I pretty much never use anything other than "local" myself.

One thing we do want to consider in terms of semantic changes is how the current (default and --user) behaviour ends up being spelled under these proposals (mine: local,system and user respectively, Nick's: system and force-user).

It's also worth remembering that the original statement on this issue simply named the 3 schemes: system, local and user. It proposed a way of forcing any one of them, but offered no way of naming the default behaviour (which is context dependent, based on whether you're in a venv). By moving to having names for various combinations, we're going beyond that original scope. The reason for this is that people have expressed a strong desire for the combination "local if in a venv, else user". But that's the only case that has been explicitly requested, so we should be careful not to over-generalise too heavily here.

Personally, I think my approach has the advantage of being an "obvious" generalisation (add a fallback if the choice of local isn't valid) while still supporting the local,user use case. But Nick's (with the exception that I prefer "local" over the more verbose "venv-local") has simpler to understand names. The mathematician in me prefers my proposal, the end user in me prefers Nick's 😄

@pradyunsg suggested using global instead of system which makes sense. Actually I would go even further and would suggest interpreter as the scope of installation in this case is interpreter-wide and there can be many interpreters installed in the system whereas global suggests something unique in the scope of the system.

but what if the user and system schemes were defined as being aliases for the local scheme when an active virtualenv is detected, and there were separate force-user and force-system schemes to say "use the named scheme even when an active virtualenv is detected"?

The idea of using local scheme by default (and discarding options choosing any other scheme) when working in the context of virtualenv seems right. As to working in other than local scheme in the context of virtualenv; is such an option really needed? What's the use case? What if system/global/interpreter and user schemes were allowed only outside of virtualenv?

@pradyunsg suggested using global instead of system which makes sense. Actually I would go even further and would suggest interpreter as the scope of installation in this case is interpreter-wide and there can be many interpreters installed

I think 'interpreter' is liable to confusion, because environments typically have a Python interpreter inside them (and we recommend using path/to/env/bin/python -m pip to explicitly install into an environment). So I'd take 'interpreter' to mean the environment, not a global installation.

This also relates to another question: I don't know a good general way to distinguish an 'environment' from a systemwide installation; they are both based on an installation prefix with a standard organisation of folders under that. For flit install, I don't try to conceptually separate env vs global, but I pick whether to do a --user install based on whether the library directory is writable. This means that a standard non-sudo install won't try to act systemwide, but a sudo install defaults to systemwide rather than a user install for root (which is probably not what you want).
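The writability heuristic described here can be approximated in a few lines. This is not flit's actual code, just a sketch of the idea; `should_use_user_install` is a hypothetical name.

```python
import os
import sysconfig


def should_use_user_install(lib_dir=None):
    """Return True when the library directory is not writable by this user."""
    if lib_dir is None:
        # Site-packages of the running interpreter (env or systemwide alike).
        lib_dir = sysconfig.get_path("purelib")
    # os.access also returns False for a missing directory, so that case
    # is treated as "not writable" here.
    return not os.access(lib_dir, os.W_OK)
```

Run as root, the check nearly always reports the directory as writable, which is the sudo caveat mentioned above.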

I just had a chance to read through this after being pointed here. How about having in addition to the original scheme=global,user,local a second config option local_site=global,user? The latter does not necessarily need exposing to command-line as it's more of a system configuration option. It would only affect behaviour when local is used, otherwise it would be ignored.

I am really happy to see that someone is working on this issue, which has been haunting me since the dawn of time.

As a note, when implementing it please allow configuration of a sorted preference list so I can configure pip to install: in virtualenv if any, fallback to user, fallback to system (or other order).

Ideally I would like to see a behavior that works like this:

  • install package in current virtualenv, if any
  • try to install on system if you have permissions
  • try to install on user
  • fail if any attempt failed
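A possible shape for such a preference list, reading the last bullet as "fail once every candidate has been tried". Everything here is hypothetical: `pick_install_scheme` is an illustrative name, and writability is passed in as a plain mapping rather than probed on disk.

```python
def pick_install_scheme(preference, in_venv, writable):
    """Return the first usable scheme from an ordered preference list.

    preference: scheme names in the order to try, e.g. ["local", "system", "user"]
    writable:   mapping of scheme name -> whether we have permission to install there
    """
    for scheme in preference:
        if scheme == "local" and not in_venv:
            # No virtualenv active, so "local" is not a candidate.
            continue
        if writable.get(scheme, False):
            return scheme
    raise RuntimeError(f"no usable scheme among {list(preference)!r}")
```

For example, `pick_install_scheme(["local", "system", "user"], in_venv=False, writable={"system": False, "user": True})` skips "local" and "system" and settles on "user".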

Some distributions may set up pip in such a way that it avoids overriding packages installed using their distro package manager (yum/dnf/apt-get/...). If I am not wrong, Debian or Ubuntu did something cool where the destination of distro-installed packages does not match the pip one and Python looks in both, avoiding the risk of install/uninstall conflicts.

One thing to keep in mind: please consider default behaviour very well because pip commands are often saved in scripts and files that the user may not be able to modify to make it work. We want to succeed without having to alter the codebase, if possible.

I personally think that fallback logic is too complicated. If you're operating under a badly configured distro, use virtualenv.

@nanonyme The fallback has nothing to do with what you call a "badly" configured distro. Here is a very simple use case: you have a bash script that calls "pip install foo", which is needed for testing your code. You want to make this script usable regardless of whether the user is inside a virtualenv or not.

The script could be called by tox, so it would be inside a virtualenv; it may be called by a user outside a virtualenv, or by a user after they activated a virtualenv.

All these use cases are not only valid but also widespread. At this moment it requires adding a lot of extra logic inside that bash script in order to detect virtualenv presence and decide which params to give to pip.

I gave the bash script just as an example but in practice users may not even have a script and only a configuration line that accepts a "command to run", something quite common on CI systems (see travis.ini). Those CI systems may run that code inside or outside a virtualenv.

If pip is not able to detect and make use of a virtualenv, it makes any usage of it much harder, because the user would be forced to write wrappers around that code in order to make it work in various contexts. Writing these wrappers is not even possible in some cases (or just hard and ugly due to the need to cope with multiple levels of quoting in order to use bash to implement that missing logic).
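To show the kind of wrapper logic being described, here is a small Python sketch (rather than bash) that detects a virtualenv and decides whether to pass --user. It only builds the command and does not run it; `pip_install_command` is a hypothetical helper, and choosing --user outside a virtualenv is just one possible policy.

```python
import sys


def in_virtualenv():
    """Best-effort check for an active virtual environment."""
    # venv makes sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def pip_install_command(requirement, in_venv=None):
    """Build a pip install command suited to the current context."""
    if in_venv is None:
        in_venv = in_virtualenv()
    cmd = [sys.executable, "-m", "pip", "install"]
    if not in_venv:
        # Outside a virtualenv, avoid touching systemwide packages.
        cmd.append("--user")
    cmd.append(requirement)
    return cmd
```

Every script or CI "command to run" entry needing this today has to carry its own copy of such logic, which is the burden the comment above is pointing at.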

Basically with above you have

  1. Explicit global, ignore virtualenv
  2. Explicit user, ignore virtualenv
  3. Explicit local, local set to global, virtualenv with fallback to global
  4. Explicit local, local set to user, virtualenv with fallback to user

The fallback sequences are far simpler this way
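Those four cases reduce to a very small function. This is a sketch of the two-option idea only; `effective_scheme` is a hypothetical name, and `scheme` / `local_site` are the option names suggested earlier in this thread, not existing pip settings.

```python
def effective_scheme(scheme, local_site, in_venv):
    """Combine the two proposed options into the scheme actually used."""
    if scheme in ("global", "user"):
        # Cases 1 and 2: explicit choice, virtualenv and local_site ignored.
        return scheme
    if scheme == "local":
        if local_site not in ("global", "user"):
            raise ValueError(f"invalid local_site: {local_site!r}")
        # Cases 3 and 4: virtualenv if active, else fall back to local_site.
        return "local" if in_venv else local_site
    raise ValueError(f"unknown scheme: {scheme!r}")
```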

Not sure if I missed it, how would this interact with the set of packages available for import in a setup.py-based install, or a PEP 517 install using --no-build-isolation?

Also, is the assumption that for determining whether a dependency is already installed we'd have some of these schemes "inherit" the packages from the packages that would be available in another? Like

  • user considers packages in system
  • local considers packages in user and system only if system site packages is enabled
  • system doesn't consider any others
  • target doesn't consider any others (or it may be configurable - I think we've seen use cases going both ways)
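Spelled out as code, the inheritance asked about in the list above might look like this. It is only a restatement of the question, not a description of what pip does; `visible_schemes` is a hypothetical name and "target" is treated as a scheme purely for the sake of the example.

```python
def visible_schemes(scheme, system_site_packages=False):
    """List the schemes whose packages count as already installed."""
    if scheme == "system":
        return ["system"]
    if scheme == "user":
        # User installs shadow system ones, so both are visible.
        return ["user", "system"]
    if scheme == "local":
        # A virtualenv sees user/system only with system site packages enabled.
        if system_site_packages:
            return ["local", "user", "system"]
        return ["local"]
    if scheme == "target":
        return ["target"]
    raise ValueError(f"unknown scheme: {scheme!r}")
```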

No changes on that front. This would only affect how a package is installed / unpacked, not how it's built.

Things that are currently importable at build time, would stay importable -- basically anything that's on sys.path when running.

Poking here for feedback on #7164.

It looks like @takluyver in #7164 (comment) has a nice plan for stopping some of the damage of defaulting to write to system wide packages. #7002 That should hopefully stop much of the damage experienced by newbies. Thank you, thankyou, thanksssss to all involved! 🎈🎉

(Re?)iterating that this change is still relevant in a post #7002 world, since this is an internal refactoring-related issue in pip.

I'd expect us to start by making refactors to decouple the scheme from the various parts of our codebase (like @chrahunt has started!) and once that's completely/fairly done; we'd start working on the user-facing changes.

robeke commented

I agree this will hopefully improve the current inconsistent behavior. For example using Python 3.11.3 venv with pip 22.3.1 on Linux, I am able to install a package with the --user option, but if I attempt to uninstall the same package using the same venv/pip, it refuses indicating package is "outside environment" and "Can't uninstall 'mypackage'. No files were found to uninstall." I would expect the uninstall to work given the install was permitted.

hmkim commented

If you have a sudo account, use this command.
sudo dnf remove python3-requests

For example, Amazon Linux 2023 has this issue.