pypa/pip

Splitting a package into two packages leads pip to corrupt the packages when upgrading to the new versions

abadger opened this issue Β· 18 comments

(Forgive me if this is a duplicate... It seems like a problem that should have been reported in the past [I've heard that the transition from docker-py to docker resulted in similar problems]. I couldn't find any search terms that found an existing bug report so I'm opening a new one)

Environment

  • pip version: 20.1.1
  • Python version: 3.7
  • OS: Fedora Linux

Description
The ansible-2.9.x package is one large, monolithic package. ansible-2.10.x splits this package into two:

  • an ansible-base package which contains the executables (not entrypoints, if that matters) and the site-packages/ansible library with most of the files which were in ansible-2.9.x.
  • an ansible package which contains all of the addons to the ansible-base package. These are installed into site-packages/ansible_collections.

The ansible-2.10.x package has a dependency on the ansible-base packages. This way people who only need the minimal functionality can pip install ansible-base (>=2.10.0b1). People who just want to get the same experience as ansible-2.9.x can pip install ansible (>=2.10.0a1) .

A clean install works fine:

pip uninstall ansible
pip install ansible==2.10.0a1
# Results in ansible-base-2.10.0b1 and ansible-2.10.0a1 installing correctly.
# Looking at the filesystem will find that all of the correct files are present.
# ansible --version will work

The problem is upgrades:

pip install ansible==2.9.10
pip install ansible==2.10.0a1
# Results in ansible-base-2.10.0b1 and ansible-2.10.0a1 being installed but with missing files
# The files which were in ansible-2.9.10 are erased from the filesystem,
# leaving ansible-base-2.10.0b1 in a broken state
# The reason is that pip first installs ansible-base-2.10.0a1 and then, when it uninstalls ansible-2.9.10,
# it doesn't realize that the ansible-base files have overwritten the same named files in ansible-2.9.10
# and that they should not be removed.

Expected behavior

I expect that upgrading the package will lead to a system where both new packages are installed with all of the files that the new packages include.

How to Reproduce

  1. python3.7 -m pip install --user ansible==2.9.10
  2. python3.7 -m pip install --user ansible==2.10.0a1
  3. ls -al ~/.local/bin/ansible
    • ls: cannot access '/home/badger/.local/bin/ansible': No such file or directory
  4. ls -al ~/.local/lib/python3.7/site-packages/ansible/init.py
    • ls: cannot access '/home/badger/.local/lib/python3.7/site-packages/init.py': No such file or directory

Output

[pts/145@peru /var/tmp/testing]$ python3.7 -m pip install --user ansible==2.9.10                          (08:10:37)
Processing /home/badger/.cache/pip/wheels/bb/e6/9a/05f0b546d96bc1da05865504c3481a4fd3a1b3fd48a38d53a1/ansible-2.9.10-py3-none-any.whl
Requirement already satisfied: PyYAML in /home/badger/.local/lib/python3.7/site-packages (from ansible==2.9.10) (5.3.1)
Requirement already satisfied: jinja2 in /home/badger/.local/lib/python3.7/site-packages (from ansible==2.9.10) (2.11.2)
Requirement already satisfied: cryptography in /home/badger/.local/lib/python3.7/site-packages (from ansible==2.9.10) (2.3.1)
Requirement already satisfied: MarkupSafe>=0.23 in /home/badger/.local/lib/python3.7/site-packages (from jinja2->ansible==2.9.10) (1.1.1)
Requirement already satisfied: cffi!=1.11.3,>=1.7 in /home/badger/.local/lib/python3.7/site-packages (from cryptography->ansible==2.9.10) (1.11.5)
Requirement already satisfied: idna>=2.1 in /home/badger/.local/lib/python3.7/site-packages (from cryptography->ansible==2.9.10) (2.8)
Requirement already satisfied: asn1crypto>=0.21.0 in /home/badger/.local/lib/python3.7/site-packages (from cryptography->ansible==2.9.10) (0.24.0)
Requirement already satisfied: six>=1.4.1 in /home/badger/.local/lib/python3.7/site-packages (from cryptography->ansible==2.9.10) (1.15.0)
Requirement already satisfied: pycparser in /home/badger/.local/lib/python3.7/site-packages (from cffi!=1.11.3,>=1.7->cryptography->ansible==2.9.10) (2.19)
Installing collected packages: ansible
Successfully installed ansible-2.9.10
[pts/145@peru /var/tmp/testing]$ python3.7 -m pip install --user ansible==2.10.0a1                        (08:11:10)
Processing /home/badger/.cache/pip/wheels/1a/52/25/c9e776b2df588061cd9dc831740416edb7e276f5ffe7a8d8d3/ansible-2.10.0a1-py3-none-any.whl
Processing /home/badger/.cache/pip/wheels/dd/db/f0/0890f4f13dd6446092ba5d76f55d62528b52ca6ab74ea871d4/ansible_base-2.10.0b1-py3-none-any.whl
Requirement already satisfied: jinja2 in /home/badger/.local/lib/python3.7/site-packages (from ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (2.11.2)
Requirement already satisfied: cryptography in /home/badger/.local/lib/python3.7/site-packages (from ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (2.3.1)
Requirement already satisfied: packaging in /home/badger/.local/lib/python3.7/site-packages (from ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (20.4)
Requirement already satisfied: PyYAML in /home/badger/.local/lib/python3.7/site-packages (from ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (5.3.1)
Requirement already satisfied: MarkupSafe>=0.23 in /home/badger/.local/lib/python3.7/site-packages (from jinja2->ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (1.1.1)
Requirement already satisfied: asn1crypto>=0.21.0 in /home/badger/.local/lib/python3.7/site-packages (from cryptography->ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (0.24.0)
Requirement already satisfied: idna>=2.1 in /home/badger/.local/lib/python3.7/site-packages (from cryptography->ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (2.8)
Requirement already satisfied: six>=1.4.1 in /home/badger/.local/lib/python3.7/site-packages (from cryptography->ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (1.15.0)
Requirement already satisfied: cffi!=1.11.3,>=1.7 in /home/badger/.local/lib/python3.7/site-packages (from cryptography->ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (1.11.5)
Requirement already satisfied: pyparsing>=2.0.2 in /home/badger/.local/lib/python3.7/site-packages (from packaging->ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (2.4.7)
Requirement already satisfied: pycparser in /home/badger/.local/lib/python3.7/site-packages (from cffi!=1.11.3,>=1.7->cryptography->ansible-base<2.11,>=2.10.0.dev1->ansible==2.10.0a1) (2.19)
Installing collected packages: ansible-base, ansible
  Attempting uninstall: ansible
    Found existing installation: ansible 2.9.10
    Uninstalling ansible-2.9.10:
      Successfully uninstalled ansible-2.9.10
Successfully installed ansible-2.10.0a1 ansible-base-2.10.0b1
[pts/145@peru /var/tmp/testing]$ ls -al ~/.local/bin/ansible                                              (08:12:13)
ls: cannot access '/home/badger/.local/bin/ansible': No such file or directory
[pts/145@peru /var/tmp/testing]$ ls -al ~/.local/lib/python3.7/site-packages/ansible/__init__.py          (08:14:19)
ls: cannot access '/home/badger/.local/lib/python3.7/site-packages/ansible/__init__.py': No such file or directory
[pts/145@peru /var/tmp/testing]$ ls -al ~/.local/lib/python3.7/site-packages/ansible/modules/ping.py      (08:14:29)
-rw-r--r--. 1 badger badger 2090 Jun 28 08:11 /home/badger/.local/lib/python3.7/site-packages/ansible/modules/ping.py

System package managers fix this issue by keeping track of what files are owned by the packages being installed in the "transaction" (a transaction is roughly, all the packages that are being installed and uninstalled as a result of what the user requested at the command line). Then, when they uninstall any older versions of the packages, they know not to uninstall the files which were freshly installed as part of the transaction.

Just one more note... I didn't really expect it to help but I tried pip install --user --unstable-feature=resolver ansible==2.10.0a1 just in case the new resolver had an affect. it does not.

The reason for this will be the sequence of events pip goes through:

  1. Determine that ansible is being upgraded.
  2. The new version needs ansible-base, so work on that. No ansible-base present yet, so we just install it.
  3. Now back to ansible, it's already there, so uninstall the old version.
  4. Install the new ansible.

Step 3 will remove stuff that's been moved from ansible to ansible-base.

The easiest workaround, as you've discovered, is to uninstall and do a clean install.

Pip can't easily address this with its current architecture, as it treats each install independently. The only approach I can see that might work is to do all the uninstalls first, then the installs. And keep everything as one transaction, so a failed install in one package can back everything out cleanly. But my gut feeling is that this would be quite a major change.

Yep. And uninstalling everything first opens up a different class of bugs (Probably smaller but they do exist). Any package which would require one of the uninstalled packages in order to install could fail to install.

System packages need deps installed for pre and post scripts. I can't remember if those have finally been approved for python packages or not, though.

python packages would additionally need deps installed for building the package from an sdist in some cases but perhaps that's mitigated by the isolated build feature?

System package managers do all the installs first and then do all the uninstalls so that they don't have to deal with the above problems. But they use the transaction to record what files have just been installed so they don't erase them in the uninstall step. Without a transaction concept right now, this would be an even bigger change to pip than merely doing all the uninstalls before the installs.

Any package which would require one of the uninstalled packages in order to install could fail to install.

That sounds more like a bug in the project's pyproject.toml not specifying build dependencies correctly. But that's a separate issue and we shouldn't sidetrack the discussion of this issue with that.

It is related to the solution. But I'm not sure how much of a system packager knowledge maps to what python packaging currently or in the near future, does.

  • If you have to install a dep to build, then it needs to be specified in pyproject.toml. But you could still end up with those deps conflicting: (for example building foo with bar-1.0; but you have to run quux with bar-2.0). Maybe isolated builds mitigate this and we are free to disregard the non-isolated build case?

  • If python packaging has scripts that can be run pre or post package install or package uninstall, then when you uninstall it can break those scripts as well. I know this was talked about in the past but I don't know if it was rejected or if it's scope is limited (rpm system packages allow a matrix of (pre, post), (install, uninstall), with upgrade having to be handled as a special case in all of those [so really, there's 8 cases that rpms handle... and that's not counting other possible targets that are mostly optimizations of those eight cases]). if there's no plans for these types of scripts, then we don't have to worry about this case when solving this bug.

If neither of those use cases is an issue than moving uninstall to the front of a transaction work queue should work just as well as tracking the files to prevent uninstallation of ones that were installed in this transaction. I just wanted to point them out because I lack the knowledge of whether either of those use cases exist or not.

Build time dependencies specified in pyproject.toml are installed in isolated environments (not the one the built package ends up being installed into), and erased after the build step. Python packaging also has no way to provide pre- and post-installation scripts. So the two problems you mentioned seem entirely non-existent to me.

Cool. So yes, we should be able to add a simple transaction concept where uninstall of everything in the transaction is the first step and then install is the second step. (i say simple in comparison to system package manager's transaction concept but I do acknowledge that it's a big change from what pip currently does).

FFY00 commented

I think the first step here would be to get pip to error out on file conflicts. This seems fairly easy, just read RECORD and check if any of the files are already on the system. @pfmoore is this still difficult to integrate given pip's architecture?

Uh, it’s not that simple, since file-overwriting is a needed feature for pkgutil- and pkg_resources-style namespaces. There are other (admittedly obscure) use cases, but legacy namespace packages alone is a good enough reason pip cannot ban overwriting easily, at least not before dropping Python 2 support and a long deprecation period allowing people to migrate.

fbidu commented

Hello all, a quick FYI - the current way to reproduce this bug is

pip install --user ansible==2.9.10
ANSIBLE_SKIP_CONFLICT_CHECK=1 pip install --user ansible==2.10.0a1

Because of ansible/ansible#70529

pip install --user ansible==2.9.10
ANSIBLE_SKIP_CONFLICT_CHECK=1 pip install --user ansible==2.10.0a1

ANSIBLE_SKIP_CONFLICT_CHECK=1 is not supposed to fix anything, just to assess the person launching the command has understood he will end up with a broken installation.

@bmillemathias yes, but the previous repro doesn't fully demonstrate the problem anymore. And so @fbidu has correctly posted the new reproducer for whoever attempts to dig into it in the future.

fbidu commented

@bmillemathias yes, what @webknjaz said. I'm participating on the packaging sprint at EuroPython and was studying this issue to try to come up with a failing test case for #8579

Now that we dropped Python 2 (for a while now!) maybe it’s time to revisit this. There are two routes to take here:

  1. Error out if a package wants to override an already existing file.
  2. Make pip keep track of all installed paths and which package(s) each file belongs to. Only delete a file if a referencing packages are uninstalled.

Option 1 is obviously simpler, but also more restrictive and would potentially break existing use cases and cause user frustration.

@uranusjr I'm convinced that the better fix would be implementing transactions and splitting the removal stage from the installation. This way, when it's pre-calculated which packages need to be removed from disk, they all could be deleted first and only after that, the new package versions would be placed there.

@webknjaz Having properly transactional behaviour almost certainly would be an improvement. I hinted at that option in a previous comment, but as I said there, I suspect it would be a pretty large change. I don't know of anyone currently working on that, though, and I feel that we shouldn't dismiss the possibility of making a simpler fix in the meantime.

FWIW, if we ever end up working on this, I reckon we'd also want to consider how this interacts with #4625 -- it might make sense to solve them together.