pypa/pip

Release beta versions of pip?

astrojuanlu opened this issue · 39 comments

After the recent problems of pip 20.0 (kudos to the developers for the quick fix!) I remembered the issues that arose when 19.0 was released with PEP 517 support.

There is a comment from @pfmoore about not doing prereleases:

If you mean do we need a prerelease, I don't think so. It's not something we've done that often in the past, and the PEP 517 changes should default back to legacy processing, so I'd be OK just going for a normal release. In my experience, very few people actually test the betas anyway.

Including a remark that aged somewhat badly:

I'm not against a beta (I think it would be better if we did have a standard process of doing prereleases) but I don't think the PEP 517 changes warrant one on their own.

I wonder if the pip developers would reconsider producing one beta or release candidate before the actual release to prevent this kind of sudden breakage. Even if a tiny percentage of users actually tests the beta, perhaps there would be more chances of catching these bugs earlier.

The important point in my comment above is

In my experience, very few people actually test the betas anyway.

We used to release betas, and got essentially no feedback for what was at the time quite a lot of work. Our release process is a lot more automated now, so releasing betas might be useful. But I still wonder how many people would actually test them.

Conversely, when we do have a release issue, it seems to hit everyone's production CI within minutes - from what we can tell, no-one is testing the new releases before putting them on their production systems - even though we say in the release announcement "We recommend that users test the release before deploying in production, as with any other software release"¹. I understand that the typical workflow (involving creating a virtualenv which pulls in the latest released pip) makes it way too easy to just take the latest version, and very fiddly to properly pin things, but even so, post-release bug reports are typically the first time we see any widespread use of a new pip version.

So:

  1. I think it's a good suggestion that we try releasing betas and see if it helps.
  2. IMO, the user community should consider alternative workflows that make it easier to test new pip releases before using them in production. This is probably a good topic for discussion at the PyCon packaging summit - I'll suggest that it gets added to the agenda.

Including a remark that aged somewhat badly

Most of my comments that you quoted aged pretty badly :-( I look quite naive when you look back on those comments - I probably was at the time.

To be fair, I think we're all continually learning just how many different ways people use the flexibility Python's tools allow. And how that can cause problems 🙁 An ecosystem like rust or go, with strict conventions about how projects are structured and built, would make it much easier to avoid the sorts of issues we've been hitting recently. Ah well.

¹ This comment is intended as friendly advice, even though it could be read as a passive-aggressive "it's not our fault if you don't test stuff" disclaimer. We do understand it's hard to do, we just don't really have any better suggestions to offer 🙁

Probably worth flagging, we are actively exploring what it would take to set up a sustainable beta process (one that has a good story for end users of pip, to actually get some testing) as part of #6536.

If we're able to come up with a good process, I'm guessing/hoping that we'd have development, UX, and communications needed for it, as part of https://wiki.python.org/psf/Pip2020DonorFundedRoadmap.

I'd like to share that it's possible to run pip at master, which you could do at any time interval before pip's predictable release process:

$ pip-run -q git+https://github.com/pypa/pip -- -m pip -V
pip 20.1.dev0 from /var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/pip-run-1qj9ln0y/pip (python 3.8)

You could simply pip install that version in whatever environment you would be testing beta versions, perhaps gating that on some period prior to the release.

Fair point @jaraco. Perhaps the devs don't even need to make a beta release, just send an email to python-announce saying "please test pip @ master these days, we plan to make the next release next week". Or have a GitHub issue where interested users can subscribe for notifications of upcoming releases (like trio does for backwards-incompatible changes). This simplifies the burden on the developer side, while giving users a chance of testing from master when a release is about to come to detect bugs.

Also, first time I see pip-run, nice package :)

messa commented

I think that the recent breakage was so huge because people use the freshest, newest pip, even a completely new "untested" major release, instead of relying on what they already have in their distro or some pinned version (or just a major version) that is known to work, as is usual elsewhere when dealing with dependencies.

Why do people do this? Maybe because pip itself prints a WARNING every time: "You are using pip version 19.2.3, however version 20.0.1 is available" :)

@messa That's a good point. With that message, we are encouraging regular updates by users. That was (originally) done deliberately, as we otherwise find that users are not upgrading, and then are finding that older versions of pip don't work for them (no support for newer features like PEP 517 or manylinux2010/2014, etc). That can be just as much a breaking change for end users.

I'm not sure how we balance these two concerns. Would it help if the warning said "Please plan to upgrade to 20.0.1", or similar? Or is there other wording that could get the message across that we want you to upgrade, but you need to do so in a controlled manner, just as with any other software?

I assume most people affected by the breakage were doing something like this on their CI:

$ python -m pip install -U pip  # setuptools?

So they are not affected by the wording of the message. After 20.0 breakage, they surely did something like this:

$ python -m pip install -U "pip<20.0"

At this point, users will very slowly learn that 20.0.1 fixed the original issue and will gradually lift the upper bound. Until the next time, that is...

messa commented

@Juanlu001 Exactly my case 🙂

So they are not affected by the wording of the message.

They are. They see it on their desktop so they make sure even the CI is up to date.

then are finding that older versions of pip don't work for them

This is the reason why I personally did pip install -U pip wheel almost everywhere (CI, Dockerfiles, deploy scripts...). But maybe it is not necessary anymore.

@pfmoore The warning could be displayed if the current pip version is older than some threshold.

Update: I see that even the GitHub Action default template for Python apps starts with python -m pip install --upgrade pip 🤔
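One possible reading of the threshold idea above, as a minimal sketch: only show the upgrade warning once the installed pip is older than some cutoff. The function name, the 90-day default, and the dates in the usage note are all hypothetical and are not taken from pip.

```python
from datetime import datetime, timedelta, timezone


def should_warn(installed_released_at, now, max_age=timedelta(days=90)):
    """Return True if the installed version is older than the cutoff.

    max_age is a made-up default for illustration; pip does not define one.
    """
    return now - installed_released_at > max_age
```

For example, with a hypothetical "now" of 2020-01-22, an install released on 2019-08-25 would trigger the warning, while one released on 2020-01-01 would not.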

pypi/warehouse#726 would be, IMO, perfect for our use case. :)

That was (originally) done deliberately, as we otherwise find that users are not upgrading, and then are finding that older versions of pip don't work for them (no support for newer features like PEP 517 or manylinux2010/2014, etc). That can be just as much a breaking change for end users.

@pfmoore here's another idea: Pip could stop unconditionally showing this message immediately after the release and go with some sort of a roll-out.
It'd work like this:

  • when the newest version was released less than 6 hours ago (for example), randomly include that version in the warnings with 5% chance
  • if it's more than 10 hours, increase the chance to 10%
  • and so on, use 100% chance for 48+ hours (or more, probably adjustable/discussable)
  • early adopters and CIs still do pip install -U pip in their automated flows most of the time anyway and a lot of Pips will get upgraded during this rollout period but there will be less collateral damage for less experienced folks.
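The roll-out described in the list above could be sketched roughly as follows. Only the 5%, 10% and 100% steps come from the comment; the 24-hour step stands in for the "and so on" part and is an assumption, as is treating the unspecified gap between 6 and 10 hours as still 5%. The names are hypothetical and this is not how pip behaves.

```python
import random

# (minimum hours since release, chance of mentioning the new version).
# The 24-hour entry is a hypothetical placeholder for "and so on".
ROLLOUT_STEPS = [
    (0, 0.05),
    (10, 0.10),
    (24, 0.50),  # assumption, not from the thread
    (48, 1.00),
]


def mention_probability(hours_since_release):
    """Return the chance of including the newest version in the warning."""
    probability = 0.0
    for min_hours, step_probability in ROLLOUT_STEPS:
        if hours_since_release >= min_hours:
            probability = step_probability
    return probability


def should_mention(hours_since_release, rng=random):
    """Randomly decide whether to mention the newest version right now."""
    return rng.random() < mention_probability(hours_since_release)
```

Passing a seeded `random.Random` instance as `rng` makes the decision reproducible, which would matter if the behaviour were ever tested.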

Besides this, Pip could somehow use info from Dependabot that collects stats on things it sends PRs for: https://dependabot.com/compatibility-score/?dependency-name=pip&package-manager=pip&version-scheme=semver.

@webknjaz Honestly, that seems like way too much complexity for this. After all, we're only speculating that people upgrade aggressively because of the warning. I suspect that the fact that virtualenv defaults to the latest versions of pip, setuptools and wheel is at least as likely to be the important factor here (people test in CI, tests use tox, tox uses virtualenv...).

I'm only partially following, but I don't think a beta release or testing master would have necessarily caught this issue anyways yea? The most recent issue was due to a dirty build directory when producing the final artifact that got released as 20.0. That means it wasn't related at all to the state of the repository, but was instead the state of the machine that the release process was run on. It's entirely possible (likely even) that the same machine would have been used to produce the beta release, but it's also possible that the state didn't exist a week prior to the release and thus the beta release would have worked fine, while the real release would have been broken.

That's not to say we haven't had cases where issues with a release would have been caught in a beta release, but I just don't think that this case is one where a beta release would have helped anyways.

The only way that a beta release would have helped us here, is if we had some mechanism in place for promoting the exact binary we produced as a beta, to a final production build-- which we don't really have currently (and doing so is difficult given version numbers are baked into the produced "binary").

@dstufft correct. I'm treating this discussion as a general "maybe we should rethink offering betas" question. But it definitely wouldn't have helped in the case of the 20.0 issue.

if we had some mechanism in place for promoting the exact binary we produced as a beta, to a final production build

That's what pypi/warehouse#726 would be, IIUC.

Aye, the two-phase upload would have been the key here. However, I've also been thinking about the problem of how to get folks testing against pip pre-releases (if you were to start doing them again), and remembered this post from Brett Cannon encouraging folks to enable CI tests against CPython development branches in Travis CI: https://snarky.ca/how-to-use-your-project-travis-to-help-test-python-itself/

For projects that use a dependency updating service like requires.io, the suggestion would be straightforward: explicitly list pip as a development dependency, so the updater service triggers for new pip releases, rather than unconditionally updating pip on every build.

For folks that don't use an updater service, the idea that Brett's post inspired is having a CI job defined that checks for a pip pre-release (or potentially any dependency pre-release), exits fast if it doesn't find one, but otherwise auto-executes the project's full test suite against the dependency pre-release.
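A minimal sketch of the "exit fast unless there is a pre-release" check, using only the standard library. It assumes the PyPI JSON endpoint at https://pypi.org/pypi/pip/json and a deliberately simplified version parser that understands only plain release numbers with an optional a/b/rc suffix; a real job would more likely rely on a full PEP 440 implementation. All function names are hypothetical.

```python
import json
import re
import urllib.request

# Simplified matcher: release segment plus an optional a/b/rc suffix.
# Dev, post, epoch and local versions are ignored in this sketch.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = 3


def parse(version):
    """Return a sortable key for a version string, or None if unsupported."""
    match = _VERSION_RE.match(version)
    if match is None:
        return None
    release = [int(part) for part in match.group(1).split(".")]
    # Drop trailing zeros so that "20.1" and "20.1.0" compare equal.
    while len(release) > 1 and release[-1] == 0:
        release.pop()
    if match.group(2) is None:
        return (tuple(release), (_FINAL_RANK, 0))
    return (tuple(release), (_PRE_RANK[match.group(2)], int(match.group(3))))


def pending_prerelease(versions):
    """Return the newest pre-release newer than every final release, or None."""
    parsed = [(parse(v), v) for v in versions]
    parsed = [(key, v) for key, v in parsed if key is not None]
    finals = [key for key, _ in parsed if key[1][0] == _FINAL_RANK]
    newest_final = max(finals, default=None)
    candidates = [
        (key, v)
        for key, v in parsed
        if key[1][0] != _FINAL_RANK
        and (newest_final is None or key > newest_final)
    ]
    return max(candidates)[1] if candidates else None


def fetch_versions(project="pip"):
    """Fetch all published version strings for a project from PyPI."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url) as response:
        return list(json.load(response)["releases"])
```

A CI job could call `pending_prerelease(fetch_versions())`, stop immediately when it returns `None`, and otherwise install that version and run the full test suite.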

The one thing about that is that it's not as straightforward as just tossing pip into requirements-dev.txt or whatever, because you need to upgrade pip before installing anything else, in two completely different invocations. You can do it for sure, but it requires a tad more care.

@dstufft True, so even folks that otherwise use an updater service might need to handle pip pre-release testing in a dedicated CI job anyway.

What you said can work, you just have to make sure that you're doing something like:

pip install -r requirements-pip.txt && pip install -r requirements-dev.txt

Instead of just tacking it into requirements-dev.txt or whatever. They also probably want to ensure that virtualenv is configured to use --no-download.
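The same two-step install can be sketched in Python for setups that drive CI from a script. The file names are the ones used in the command above; the helper names are hypothetical.

```python
import subprocess
import sys


def install_commands(pip_requirements="requirements-pip.txt",
                     dev_requirements="requirements-dev.txt"):
    """Build two separate pip invocations: pip itself first, then the rest."""
    pip = [sys.executable, "-m", "pip", "install"]
    return [pip + ["-r", pip_requirements], pip + ["-r", dev_requirements]]


def run_install(commands):
    """Run each command in order, stopping at the first failure (like &&)."""
    for command in commands:
        subprocess.run(command, check=True)
```

Running the second command in a fresh process is the point of the split: the newly installed pip, not the one that was running during the first step, handles the remaining requirements.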

Per #7951 we are planning to release a beta version of 20.1 this week. Leaving this open as we learn how to do betas in general and try to iron out a better process for making and publicizing them.

I see in #8100 that the release will actually be 20.1b1... Does that mean it is effectively a beta? 😅

Hi @astrojuanlu. We're going to release a beta first as 20.1b1, and then after that, we'll release 20.1.

@astrojuanlu you may be interested in reading https://www.python.org/dev/peps/pep-0440/ to understand better how versions comparison works in the Python ecosystem. Specifically, a/b/rc-suffixed versions are considered as pre-releases by pip and Warehouse (PyPI).

Thanks folks, I know how versioning works 😅 but initially I misread @brainwane's comment as "Per #7951 we are planning to release a beta version of 20.1 this week" and that's why I got surprised by the actual beta release. Sorry for the noise 🙏

Regarding:

Leaving this open as we learn how to do betas in general and try to iron out a better process for making and publicizing them.

The fact that I could do python -m pip install -U --pre pip is already good enough for me ❤️ For publicizing it a bit more, I go back to my idea of having a pinned issue where people can subscribe, which is surely less overhead than crafting a mail announcement.

@astrojuanlu we do make email announcements. #7951 (comment)

Hi @astrojuanlu . I appreciate your concern and desire to save us from effort. However, when we want to publicize something to get people to test it, we need to get the word out to a big mix of people, including people who have never dropped by our repository and who don't subscribe to Python packaging-specific news mediums. So we update relevant GitHub issues, send out mail announcements, tweet, contact Python-related news outlets, etc. It inherently takes substantial effort to get the word out about a beta and that is one of the reasons I'm working on the project right now -- to do those things. (We are in the midst of doing these things about this specific beta now and will probably mention them in #7951 as we do them.)

If we, in the future, end up again in a situation where no one is being paid to work on pip, then probably the minimum we would do to publicize a beta is

  • use an issue on GitHub
  • email distutils-sig
  • post to discuss.python.org
  • tweet

Does that help you understand the general approach we're using?

Thanks to you @brainwane for your constructive comment. Yes, this was very helpful! I was missing more information than I thought, and wrongly assumed that the pip developers were only making small steps towards doing full-fledged beta releases - but in fact, having made the announcement in all those channels is ✨ awesome! I have nothing else to contribute to this issue :)

Thanks, @astrojuanlu.

In IRC on the 21st, a few people said:

[Pradyun] It feels weird to not have the issue tracker light up immediately after a release. I guess, that's how betas work?
[Paul Ganssle] Yeah no one's likely to see that until you do a proper deployment. Sad state of affairs.

Of course, this was before we had done the publicity work listed in #7951 (comment).

@pradyunsg Could you talk a little more about what usually happens after a pip release, in terms of approximately how many new issues you get in the first 30-90 minutes after a pip release, and how many new issues you get in the first 5-7 days after a pip release? I would like to better understand what you mean by "immediately" and "light up". And this will help us calibrate how much publicity work we do, and in what venues, in order to get manual testing of this beta and of future betas.

@pradyunsg mentioned that we did get a few issues reported regarding this beta:

Huge thanks to all who have helped by testing the beta release! During this beta, we were able to identify and fix regressions as well as improve newly added functionality, based on reports and feedback from our testers. You can find more details in the changelog.

By my reckoning, between the beta release and just now when you published pip 20.1, we received the following bug reports:

And we got some feedback in email on python-dev and a tiny bit on Twitter.

For comparison, let's look at what bug reports we get in the next eight days, and what kind of usage and testing was necessary to elicit them. And that will help us get better testing for future prereleases.

Could you talk a little more about what usually happens after a pip release, in terms of approximately how many new issues you get in the first 30-90 minutes after a pip release, and how many new issues you get in the first 5-7 days after a pip release?

In case of pip 20.0, the breakage was severe (you couldn't run pip) and we had reports within 5 minutes from users about the failures. By about 20 minutes, there was enough of a flood that I was actually ignoring incoming comments while working on a fix. In a more not-breaking-the-world release, most of our users upgrade within a week or two (at least according to PyPI numbers) and we get anywhere from 10-50 issues filed during the initial burst of "new release" issues, that usually lasts about 5-7 days. Some of them are bugs in the changes / new functionality that get fixed in bug-fix releases.

most of our users upgrade within a week or two (at least according to PyPI numbers)

This can be seen clearly at https://pepy.tech/project/pip.

FWIW, as noted in #8165 (comment), in-place builds ended up being super disruptive to users, and also broke manylinux wheel builds. :)

I'm now in the same camp as @dstufft and I don't think doing betas just-before-the-release is going to be useful for pip's release workflow.

Donald wrote:

I've long believed that beta releases for tools like pip are of marginal benefit. When we've tried to do them in the past we've rarely caught many if any issues, even show stopping ones, until a final release was cut. That's why I stopped doing them when I was still releasing pip and instead tried to be available to quickly react to broken releases. Attempting to phase large scale changes in over time is also a good pattern (with many ways to handle it like dark reads, opt in or opt out flags, etc).

I acknowledge Donald's experience. And I agree that finding technical means to phase large changes in over time is one of several useful strategies. I'm also totally open to moving around our schedules so that the beta of a pip version goes out more like a month before the release instead of a week before.

However, in order to know whether past performance would be a good guide to future results, we need to check whether we are comparing apples to apples.

To me, a "beta" means:

  • release a version whose version number, per PEP 440, denotes that it's a beta prerelease
  • make a manual test plan that has specific areas, workflows, OSes/versions/environments, etc. that we want people to play with, e.g., in their CI setups
  • publicize the beta via various announcement channels/media and encourage bug reports, working especially to publicize it to people who don't usually pay attention to Python packaging stuff (see #7951 (comment) and subsequent comments for examples)

Did we actually do all of those things in the past, in the experiences Donald is referring to? I think we didn't. As I recall, we generally released a version with a beta version number and mentioned it in places where Python packaging experts and enthusiasts pay attention. We didn't have the resources we currently have, so we didn't do as much test plan development or beta publicizing.

As far as I know, we have now done a single pip beta that included something of a manual test plan and some publicity to users. It sounds like it wasn't as useful as you'd like, @pradyunsg, but I believe that means we need to iterate.

For example, one thing I would like to try:

  • looking at bugs that got filed immediately after recent pip releases
  • replying to those issues or emailing those users and asking them to join a beta announcement mailing list

That announcement list would grow over time; it might take a year for it to really start paying off, but I believe it would.

With 20.2b1 released as a feature preview (GH-8280), could/should we make it a habit? (IMHO beta is not the most correct term for it, but until 20.2 I don't think we can go back to dev* releases.) This would spare willing users from having to build from master (e.g. GH-8363) and we'd get more synchronized feedback.

If this is a good idea, we'd also need to find out how frequent the snapshots should be. Personally I don't think a regular time interval is a good idea, since the number of maintainers we have for pip is not large enough for a uniform roll-out of changes. I believe a snapshot every time there's an important bug fix or feature (for experimental features) would be really nice.

Disclaimer: not only the new resolver but also my GSoC project (parallel download) would also benefit from having quick feedback from a diverse user-base, so my opinion is quite biased.

Hi @McSinyx -- you mentioned,

not only the new resolver but also my GSoC project (parallel download) would also benefit from having quick feedback from a diverse user-base, so my opinion is quite biased.

Could you point to your project plan (I presume it's somewhere public) for your GSoC project? How much time did you build into that plan for publicizing your work and recruiting testers?

(Knowing that, plus looking at the upcoming schedule for donor-funded work, will help us figure out how many human-hours we have to create and publicize prereleases and get useful testing out of prereleases.)

@brainwane, if everything goes as planned (which is really rare), I'd love to have early feedback on week 4. Currently I'm also eager to know if multithreading/processing is portable enough to roll out to everyone, e.g. to revert the reversion of GH-7962.

How much time did you build into that plan for publicizing your work and recruiting testers?

TBH I did not include that in the plan. As written in the disclaimer, while I might benefit from having early feedback, I want to make sure it is communicated that I don't want prereleases if they make the current resolver work more time-consuming.

@McSinyx Thanks. I see that Week 4 is June 22-28. In a resolver meeting earlier this week we decided that we'll reconvene in a few days, on June 3rd, and try to come up with a date for the next beta (hopefully early June). That way we'd have at least 3 weeks of publicizing and testing before a July release of pip 20.2. So, if that works out, that may also help with getting feedback for your project.

In order for people in this thread to feel eager for future prereleases, I think we need to address the concerns brought up (as I did, a little, in #7628 (comment)). For instance, could you help answer the question I asked in #7628 (comment), to summarize how many bug reports we got in the following 8 days, and what kind of usage and testing was necessary to elicit them?

I see you have a few buffer weeks in the plan. Would you be willing to revise your plan, and use a few days of that buffer time to help with the following?

  • make a manual test plan that has specific areas, workflows, OSes/versions/environments, etc. that we want people to play with, e.g., in their CI setups
  • look at bugs that got filed immediately after recent pip releases, and reply to those issues or email those users and ask them to join a beta announcement mailing list

Doing these things will help make the prerelease more effective - we'll gather more bug reports that we can fix before the canonical release.

@brainwane, thank you for the pointer to the meeting summary, it gives me a better idea about what's going on.

For instance, could you help answer the question I asked in #7628 (comment), to summarize how many bug reports we got in the following 8 days, and what kind of usage and testing was necessary to elicit them?

I suppose that you mean the bug reports we got from the latest beta (May 21)? I've gathered these:

Most of these came from the users who are willing to help out testing the new resolver, which is called for by the UX team I think.

Would you be willing to revise your plan, and use a few days of that buffer time to help with the following? [...]

My short answer is yes. The longer and more meaningful answer is that I'm new to planning things, and the plan is purely the things I wish to get done in the timeline. By the end of this month I should be able to have a list of use cases and platforms that I am especially doubtful about, and I participate on the GitHub tracker on a regular basis. I am unaware of any beta announcement mailing list though, could you please give me a pointer?

What I proposed in this issue earlier was to have a testing release stream that eager users can track. I imagine that it would be beneficial for pip's scheduled releases too, but the aim was to allow better synchronization between the loving-shiny-thing users and the core devs. I was asking a question (whether it's a good idea) rather than pushing for it.

@brainwane I think the bug report summary was covered in #7628 (comment).