pypa/pip

Install packages in parallel


What's the problem this feature will solve?

This is to improve the performance of pip.

For example, looking at #12613 (comment) for a large install: even with resolving, downloading, and building sdists in the mix, installing takes over 8% of the time. As resolving becomes faster, downloads run in parallel, and (hopefully) more wheels replace sdists, installing will become a larger part of the total time.

Describe the solution you'd like

After the resolve, downloads, and sdist builds have completed, the installs could run in parallel.
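As a rough illustration of the idea (not pip's actual code), the install step could be fanned out over a thread pool once everything has been resolved, downloaded, and built. Here `install_one` is a hypothetical callable standing in for whatever installs a single wheel, and `install_in_parallel` and `max_workers` are names made up for this sketch:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional


def install_in_parallel(
    wheels: Iterable[str],
    install_one: Callable[[str], None],  # hypothetical: installs a single wheel
    max_workers: int = 4,
) -> Dict[str, Optional[BaseException]]:
    """Run install_one for every wheel on a thread pool.

    Returns a mapping of wheel -> exception raised while installing it,
    or None if that install succeeded.
    """
    results: Dict[str, Optional[BaseException]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every install up front, remembering which future is which wheel.
        futures = {pool.submit(install_one, wheel): wheel for wheel in wheels}
        for future in as_completed(futures):
            # exception() returns None when the call finished without raising.
            results[futures[future]] = future.exception()
    return results
```

Collecting failures per wheel rather than raising on the first one is just one possible design choice; it would let the caller report every failed install at the end.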

Alternative Solutions

Keep as is.

Additional context

This would obviously require a PR from someone. I think it would need a full complement of tests covering parallel installs across different kinds of packages (e.g. multiple editables installed at the same time, editables alongside regular installs, etc.).

uv has already implemented this successfully; judging by their issue tracker, this has been the last problematic part of making things parallel/concurrent.


See also #8187 (comment).

Thanks, I hadn't seen that before. I'll have a good read through and see whether this is a straight-up duplicate, and whether anything can be done to get the existing work landed in pip.

As the author of the linked comment, I'll add that the key new development is that uv has implemented parallel installs. It would be interesting to know how they designed things. It's quite possible that pip could learn some useful lessons.

I've not looked at how uv implements this at all, so the following is pure speculation, but if I had to guess, I'd imagine they have the following things in their favour:

  1. They may well have designed from the start for parallel tasks. One concern I have for pip is getting reporting right, for instance, because we have [1] some stateful code that handles getting indentation correct, which might be broken by multiple threads.
  2. Rust has better thread safety than Python, so there's likely a class of issues that uv simply can't encounter (at least, not by accident).
  3. To be blunt, they may just not have worried about pathological cases. For example, installing two wheels in parallel, which both contain the same filename but with different content, is a potential race condition (writing the file itself and RECORD). But it's unlikely in practice, so maybe uv ignored the possibility. Pip has a larger user base, and a longer history of dealing with weird errors, so we may well simply be (for better or worse) more paranoid over things like this.
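To make the race in point 3 concrete: one conceivable guard (purely a sketch, not something pip or uv is known to do) would be to compare the file lists of all wheels before starting any parallel install, and fall back to serial installation for wheels that claim the same path. The names `wheel_paths` and `find_conflicts` are invented for this example, and it assumes archive member names map directly onto installed paths, which ignores the relocation of `.data/` entries:

```python
import zipfile
from collections import defaultdict
from typing import Dict, Iterable, List


def wheel_paths(wheel_file) -> List[str]:
    """Return the file entries of a wheel (a zip archive), skipping directories."""
    with zipfile.ZipFile(wheel_file) as zf:
        return [name for name in zf.namelist() if not name.endswith("/")]


def find_conflicts(manifests: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each path claimed by more than one wheel to the wheels claiming it.

    `manifests` maps a wheel name to the paths it would write.
    """
    owners: Dict[str, List[str]] = defaultdict(list)
    for wheel, paths in manifests.items():
        # set() so that a path listed twice in one wheel is not a false positive.
        for path in set(paths):
            owners[path].append(wheel)
    return {path: sorted(wheels) for path, wheels in owners.items() if len(wheels) > 1}
```

An empty result would mean no two wheels write the same file, so the file-content race cannot occur between them; any wheels that do appear in the result could be installed one after another instead.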

Footnotes

  1. Or at least we used to; I haven't looked at that code since we started using rich...