python-trio/trio

Better tooling for releases and development

njsmith opened this issue ยท 17 comments

Let's have an issue for collecting discussion about improving our tooling for release automation and similar.

Release notes

I think it would be nice to move to towncrier, and possibly adopt the twisted rule that non-trivial PRs should always close an issue. (The idea is that the issue records what the problem is and what was considered for fixing it, while PRs are used for discussing a particular candidate patch.) Basically the way this works is that whenever you submit a PR, you make a new file in a special directory that contains a description of that change. This way you can generate your user-oriented release notes as you go, in a standard format, without hitting merge conflicts. (Then you also need the rule that PRs have to actually contain one of these files, possibly enforced by a linting bot.)

Alternatives include openstack's reno which seems substantially more complicated (it involves yaml and a special tool that contributors need to use to generate the file, etc.). CPython had a long discussion about this and decided to go with their own tool that they can tweak to match the peculiarities of their workflow, but it's a useful survey (if very long).

[Edit: #262 set things up to use towncrier for release notes on master; let's see how it goes.]

Blue sky

The Rust community has this great bot that does their merging called "homu" (or sometimes "bors", but the modern implementation is "homu"). The way it works is that when a change has passed review, the reviewer tells the bot to go ahead and merge it, and then the bot makes sure the tests pass on current master before merging. (With github status checks, the normal thing is: (1) PR submitted, (2) status checks test that it works against current master, (3) other stuff gets merged into master, (4) PR gets merged into master. So the actual commits on master are generally not tested until after they get merged, which is kind of absurd if you think about it.)

Unfortunately homu itself doesn't currently support our setup with appveyor+travis. It looks like this is the PR for that: barosl/homu#134 (sitting unmerged for ~16 months now, woo. maybe the servo/homu fork is the real one now?)

(I'd also be nervous about using the hosted version of homu while this bug remains unfixed.)

What I'd really love is a bot that works like homu, but whose sequence goes like:

  • Reviewer says "let's do it"
  • Bot checks out the PR
  • Bot automatically fixes formatting (probably by running yapf over everything)
  • Bot tests the reformatted PR
  • If tests pass, bot merges reformatted PR into mainline
  • If some appropriate condition is met, bot tags this commit as a release and uploads to PyPI.

Rationale for the formatting thing: that it's lovely to just not have to worry about formatting during code review, and it's not reasonable to expect casual contributors to set up precommit hooks. But you shouldn't merge code you haven't tested! So the reformatting should be done before the gating tests.

Rationale for having the bot handle releases: traditionally, F/OSS projects handle trust issues by having an inner circle of devs who have commit bits and have a lot of power over the project (e.g., the technical ability to quietly insert backdoors!), and so join this circle there's a long period of building trust. With trio we're trying a model of development where instead of having strict barriers to who's allowed to commit, we give out commit bits like candy, but use branch protection to make sure everything happens in the open through PRs, so if anything nefarious is going on there are lots of people who will notice. But... there's then a second level of special privileges, which is who has permissions on PyPI. For obvious reasons I don't want to just give this out to everyone. But I don't want to be the single point of failure, either, and for several reasons it would be nice not to have to drop this responsibility on some carefully-selected group either. Instead, it would be great if releases could somehow go through our usual PR workflow.

I'm not sure what the "appropriate condition" would be: something like @bot: ship this as v0.2.2 and if the tests all pass and everything then it does that? Maybe with someone else needing to sign off?

...I guess thinking a bit more, it might make more sense to do something like, you file an issue called "v0.2.0 release tracking" and say @bot: release 193428efabde39 as v0.2.0, and it checks out that branch, runs towncrier to generate the news (maybe setting up a branch in the process?), builds the release, merges the generated news back to mainline, tags it, uploads to pypi. While giving updates in the issue, and tells you if something went wrong.

Note

This stuff is extremely not the biggest priority right now, when we're still trying to figure out basic APIs and it's not clear whether trio will get traction. But since I had some thoughts, I figured I'd record them here where they can be picked up again later :-).

I'd be happy if we copied the continuous release tooling and process from Hypothesis: https://hypothesis.works/articles/continuous-releases/

  • Every pull request that touches the source gets a release. No exceptions. (you may want to move or exempt tests though, and docs may but need not cause a release)
  • It does the normal github status checks thing, which has worked very well in practice for Hypothesis (zero problems out of >100 PRs/releases)
  • It's entirely built out of in-repo scripts, with no external bot or service. You just click merge, and CI does its thing - which now includes releasing if the tests pass on master.

TLDR; this wouldn't scale up to CPython or Rust but it works very well for Hypothesis and I expect would for Trio too, if you like the style.

@Zac-HD That's an interesting idea! I really like Hypothesis's approach, and also I'm really excited to see someone actually trying it for real :-). But I see two big differences between Trio and Hypothesis, which may or may not affect the viability of this approach:

  • Trio hands out the commit bit very very easily, which is great to bring people in, but means that we have some relatively inexperienced committers and part of how we try to encourage them to hit the green button is by reassuring them that we can always revert things. That's a little harder when hitting the green button immediately pushes the change to PyPI :-). It also means that there's a somewhat increased chance of vandals or malicious folks pushing intentionally bad code, which is not a big deal if it just goes into master and we revert it immediately, but is not so great if someone can push that all the way to PyPI and onto end user computers before it can be reverted. I'm not really that worried about this; vandals and malicious contributors are incredibly rare in F/OSS. But I think about it, like, a little bit :-).

  • Trio is at a different place in its lifecycle than Hypothesis: in particular, our APIs are in much more flux. One practical consequence is that if we were to do the Hypothesis release script thing, then we'd need a solid way to mark things as deprecated in "whatever release this ends up being". This would probably be a nice feature anyway, and complement towncrier (Twisted's incremental has this feature), but this would make it mandatory. A more fuzzy consequence is... well, the 0.2.0 release had some giant new features that ended up getting put together over multiple PRs, and intermediate releases would have been bad? Of course we want to avoid this anyway, and I'm hoping we never have any more releases that are as big as 0.2.0, but maybe there's more of a chance than for Hypothesis? I'm kind of going back and forth on this.

These are both more like, vague worries than an actual argument either way. And this is very much a question of like, what works better for us, so there's no one true right answer. I'd definitely be interested to hear what other contributors think. Would auto-releasing have freaked you out when you first got the commit bit? Would more frequent releases make you more excited to contribute? Do you think users would prefer deprecations to trickle out one day at a time, or to get a chunk of them every month or two in a larger release?

Bot automatically fixes formatting

I'd be nervous about anything unseen being merged. I think it would be better if a bot tested each commit and automatically issued PRs against the PR branch to fix any formatting problems, or just pushed directly to the PR branch so no intervention was necessary. At least that way the changes could be reviewed

Well, frankly I'd rather require the original author fix formatting, pylint etc. errors by amending the original PR. (You can't do that automatically.)
I don't like revision histories that are cluttered with formatting changes.

I was thinking it could be done if the PR author allowed edits from maintainers:

I don't like revision histories that are cluttered with formatting changes

A lot of PRs include a lot of rubbish wip etc commits and asking less experienced contributors to squash them themselves can scare/drive away contributors. My solution to that problem is to squash-and-merge all PRs such that every single commit on the master branch is guaranteed to pass the tests as they were at the time... but this discussion might be getting a little off-topic! :P

I like Zac's suggestion, and I'm excited by the prospect of dropping a formal release ceremony, but I share some of the already explained worries, especially on this:

A more fuzzy consequence is... well, the 0.2.0 release had some giant new features that ended up getting put together over multiple PRs, and intermediate releases would have been bad? Of course we want to avoid this anyway, and I'm hoping we never have any more releases that are as big as 0.2.0, but maybe there's more of a chance than for Hypothesis?

If a merged PR becomes to mean "a change worth releasing", does that mean that every single PR is supposed to improve the overall quality of the project? Because that's what I expect from subsequent releases. But maybe my expectations are off and this is the kind of assumption that I need to change. Still... it seems hard not to suggest the latest release to a newcomer, and PyPI/pip are biased towards the latest release.

It just seems to put a little more pressure on contributors. If I'm worrying about this then probably someone else would worry too. Does that mean, for example, that a new feature cannot be merged if it causes a performance regression (that maybe can be fixed by another pending PR)? Or when it alters the behavior of something else that was already planned to be removed? What if again it introduces a performance bug, but only on platforms that I have no access to test myself? And the big question is still: what if it's an API change?

In my experience this kind of continuous release works very well for product releases, but I know very little about adoption by open source projects. Hypothesis experience is very encouraging, and I would love to hear more from other projects!

Really excited to see some discussion on this one ๐Ÿ˜„

Re: formatting, I'd leave this alone in Trio for now - not least because Hypothesis will probably try using Black soon. Basically we fail CI if formatting again produces a git diff. It's possible to have Travis commit the diff and try to push it back to the user branch if they accept maintainer pushes (the default), but this risks nasty problems for inexperienced users of Git and IMO isn't worth it.

Re: commit bit, merging, and problematic code... no clear answer from me. In Hypothesis we don't do speculative merges (where a maintainer might need to fix it later); to compensate we try to start review very quickly, give ongoing feedback, and offer to finish any PR that gets too frustrating (while continuing to credit the original contributor). A network library has a very different perspective on security to an offline fuzzing tool!

Re: size of features, you eventually get used to the idea that minor and patch releases just aren't a big deal - every API change (feature, deprecation, or bugfix) gets a minor release, and everything else gets a patch release. For the really big changes - Trio's 0.2, Hypothesis 4.0 - we set up another branch and accept PRs into that, and maintainers merge that one into master later. It doesn't come up often though!

Re: @sorcio's excellent questions on contributor pressure, I'm without a clear answer again - some people probably will worry, but I don't think it's insurmountable. To answer the specifics: no merging features with performance impacts that we don't want to release (but maintainers could address that in the PR). Any changes to behavior in Hypothesis are guided by backwards compatibility, not planned deprecation, but we are unlikely to accept new feature that we know we want to remove. For performance bugs that weren't caught before merge, we issue a patch release (it turns out that this is a great way of finding corporate users). API changes are handled in the usual way - project-specific in the 0.x phase, then minor release for compatible changes and major releases for incompatible changes.

The only bit I really worry about is vandalism; everything else is like the current system but faster - albeit with correspondingly more frequent demands on reviewers.

If a merged PR becomes to mean "a change worth releasing", does that mean that every single PR is supposed to improve the overall quality of the project?

I mean... PRs are always supposed to improve the overall quality of the project, right? :-)

I believe that in Hypothesis's setup, they distinguish between two kinds of PRs: trivial ones that don't have any news entry and don't trigger a release, and non-trivial ones that have a news entry and do trigger a release. So that means that e.g. a commit that just tweaks .travis.yml or fixes a README typo won't necessarily generate a release. But it does mean that after every PR, master has to be releaseable, because every time a non-trivial PR comes along its release will roll-up and include all the recent trivial PRs.

Now, as a general principle, I think this is what we want to strive for anyway. If a PR is known to cause a significant performance regression, then the options are (1) decide the change is important enough to justify the regression, (2) block merging that PR until the regression is fixed, (3) merge the PR anyway, and then block releasing until the regression is fixed. The main difference between (2) and (3) is that in (2), this one change is blocked from reaching users, while in (3), all changes are blocked from reaching users. Plus it's just awkward and frustrating to try to keep track of which known issues are currently lurking in master, and if you have a steady stream of PRs like this then you can even end up in a state where master is never releasable; this is how you end up with the cliche of an open-source project that makes like, one release every 5 years or something. So in theory, we should prefer (1) or (2) and avoid (3) entirely.

But I've known all this theory for like a decade, and yet 0.2.0 still happened โ€“ I think because the new features were large and complicated enough that I needed multiple PRs worth of iterations to figure them all out. There is a way around this though: make a branch, submit multiple PRs against that branch, and then once the branch is in a nice complete state, merge it into master.

There's also another case, where a PR causes a significant performance regression but we don't realize it until after it's merged. In this case I think it's fine if we let the release out โ€“ if we didn't catch it before merging then it's not likely that letting it sit in master for a few weeks, with no-one using it, is going to help us catch it either. The sooner we release it, the sooner someone will notice the problem, file a bug, and then we can fix it and improve our CI so that the next time this happens we can catch it in the PR stage.

Re: formatting, I'd leave this alone in Trio for now - not least because Hypothesis will probably try using Black soon. Basically we fail CI if formatting again produces a git diff. It's possible to have Travis commit the diff and try to push it back to the user branch if they accept maintainer pushes (the default), but this risks nasty problems for inexperienced users of Git and IMO isn't worth it.

Oh, we've been using yapf in this mode since before black existed :-). The musing in my original post was about whether we can make things even simpler for contributors.

I don't like revision histories that are cluttered with formatting changes.

All else being equal, I don't either. But the number 1 priority for a project that wants to successful and attract contributors is to make contributions as easy and frictionless as possible (esp. early contributions) โ€“ see e.g. and surrounding thread. Slightly messy revision histories have much less effect on project success, so I don't worry about them as much :-).

Any PR to Hypothesis may trigger a release by including a changelog entry in RELEASE.rst. Every PR that touches a file in src/hypothesis/ must trigger a release (even for typos!). The optional releases tend to get used when downstream packagers find a problem in our test suite, so we can give them a fixed tag ASAP, or occasionally for documentation-only updates (though if that included a docstring it's a mandatory release).

The distinction between (1) and (2) is just a matter of timing: if you think the PR with regression is a big enough upgrade to release now instead of later, do so! And of course the whole point of the system is to avoid (3), with exactly that feature-branch workaround for major releases.

There's a proposal for the Trio deprecation period - at least one month between first release with deprecation and removal - over in #500.

I mean... PRs are always supposed to improve the overall quality of the project, right? :-)

I... am tempted to agree unconditionally to the general idea. But I can't just say "yes" because there is no single "quality" metric. Not all patches are Pareto improvements :) Maybe "quality" is a red herring in this case and I should instead say "every PR is supposed to further the project goals". Every release is one step closer to the goal, and the path can be made of tradeoffs (let's just assume that the project governance is able to decide what is an acceptable tradeoff). My point then becomes: a contributor proposing a PR will feel the pressure of making a decision.

The rest of your answer makes it pretty clear that you have already thought through the potential issues. And I shall thank you for listening and writing up on this. Trio may be young but it's an excellent example of a project that fosters real collaboration. Yes, with this kind of environment, support from the community, and good tools (e.g. feature branches, mandatory reviews, open discussion on decisions), then the worries I'm exposing fade.

@Zac-HD thank you for your answer, it's precious input :)

Regarding the whole release automation thing, I think I've convinced myself that the main issues to worry about are:

  • New maintainers: if we do release-after-merge, that makes it scarier to click the "merge" button
  • Users: don't want to wait for new features, but also probably would rather not be reading release notes every day to find out what's been deprecated now

Brainstorming about how to address these, I came up with two ideas:

Option 1: implement release-after-merge, and figure out how to make it work

For new maintainers: I think the crucial thing is to have some way to feel like if you did make some major error, there's some chance for someone else to catch it -- but without actually requiring "someone else" to sign off. Maybe if there were a way to merge on a timer? Like we could have a bot where if someone with commit access marks a PR as "approved", and after 24 hours no-one has "requested changes", then the bot goes ahead and merges it.

For users: So it's probably a good idea in any case to pin your trio version in your project, and then use a service like pyup.io to send a PR when a new release comes out. And if you're using pyup.io, then you can pick how often it sends these updates, e.g. you can tell it that no matter how often someone releases, it should only nag you once a week. But... as @pquentin pointed out to me, using pyup.io inside companies may be impossible because for private repos it costs money and many companies are dysfunctional about this kind of thing. On the other hand, maybe those companies are already dysfunctional about upgrades and only upgrade once a year or something, I don't know?

Option 2: Do automated releases on a fixed schedule, e.g. once a week. (Unless there's a bug labeled "release blocker", I guess, which should hopefully never happen, but maybe you want the option.)

I can't decide if this idea is super clever or would just be kind of annoying :-). It keeps that key "releases just magically happen" flavor, so you're still forced to figure out how to automate things, people can count on timely updates, etc. But for maintainers, it breaks that feeling that every time you hit merge you're immediately pushing code out to the world, and increases how long it takes bug fixes to get out to users.


There's also some question about what to do with all the other python-trio projects. For the ones that only get small and occasional changes (like async_generator, sphinxcontrib-trio, ...), the hypothesis approach makes a lot of sense. Maybe we go ahead and implement it there and see how it goes before committing to anything on the main trio project?

There's also still the concern about vandalism, but it's so vanishingly rare that maybe we should wait for it to actually happen before worrying about it.

(And again, would love to hear feedback from other maintainers and users, because really I'm just guessing about what would help you all :-).)

(Let's not worry about companies, they already need to cope with releases from other projects: as long as you increment the major version for breaking changes, any kind of release schedule is fine.)

Re: "merges can be scary" - I'd stick to the simple system! The timer approach is both more complicated, and would make me scared to approve some things at all. For example: when I started reviewing PRs on Hypothesis, I'd often go through and make my comments, and then ping David to check that I'd found all the important issues - and in fact on complex PRs I'll still often approve it with a 'looks good to me', but ping David to do a final review of the scary bits before clicking merge. TLDR: we're not operating on a deadline, and wanting more eyes on something should be encouraged - it's a great way to learn too!

Re: user upgrade issues - users should pin their versions. If your project or company is disfunctional, that's not Trio's problem. (though many of us offer individual consulting services ๐Ÿ˜‰)

Re: fixed schedule - this mostly defeats the point, and is much more difficult to implement robustly. IMO if it's worth doing, we should jump directly to releasing every merge.


We were definitely not comfortable with the idea at first - HypothesisWorks/hypothesis#537
is literally titled A new and terrifying age of continuous deployment is upon us!

Starting by implementing this for the smaller packages sounds great to me though - both as a confidence-builder and to work out what the "Trio standard" is for continuous deployment. Maybe we could eventually add it to the cookiecutter...

The bors-ng maintainer has accepted the proposal to add hooks for autoformatting and continuous deployment. So the next step is for someone to implement it.

The stuff we need to implement is relatively straightforward, but there's an interesting wrinkle: bors-ng is written in Elixir. Elixir is pretty cool! It's a kind of friendly modernized Erlang (runs on the Erlang VM, interoperates with Erlang libraries, but you don't have to put . at the end of every line). But I don't actually know it at all, and am not sure when I'll find a chance to learn :-).

So... anyone looking for an excuse to learn Elixir, and/or implement something real in Elixir?

elixir is ruby-ish, I have the basics of it. what help do you want? I have a plan for learning erlang as well, to contribute to rabbitMQ when needed.

@auvipy Basically implementing this spec as a PR for bors-ng: https://bors.tech/rfcs/0322-pre-test-and-pre-merge-hooks.html

The alternative plan would be to write our own implementation of bors-ng's features in Python, perhaps in snekomatic. I can see arguments either way... bors-ng is an existing thing with some sophisticated features and community. But a local bot in a more familiar language is a lot easier for us to hack on if we have issues, plus we could just implement the hooks we want directly instead of having to do this complex thing involving bouncing back and forth between multiple cloud services.