filecoin-project/lotus

Decide, document, and enforce backport strategy

Opened this issue · 12 comments

Done Criteria

  1. We have documentation for how a maintainer is supposed to backport changes from master to a release branch
  2. We have documented tooling / process that makes it clear for the release engineer what changes are in master that aren't in the release branch
  3. (ideal) We have tooling that helps guide a maintainer through this process.

Why Important

In nv23/1.28.0 and elsewhere we've seen time get burned making sure everything expected (and nothing unexpected) gets backported. This is usually happening at an already "stressful" time when dealing with other unexpected release/network-upgrade items. This is a process we should be able to have smooth and not cause drama.

User/Customer

Maintainers

Notes

  1. This has some overlap with our commit strategies: #12417
  2. I understand there are plenty of existing git commands or tools that can help do this. We're not (or shouldn't be) a snowflake here. As a result, this may just collapse to some documentation and consistency.

Tasks

Preview Give feedback

Input welcome, but this is the kind of experience I would like to see from "tooling"

Here are the set of PRs that are in master that aren't in your release branch:
...

Of those PRs, here are the ones that had the `backport` label: 
...

❓ Which PRs would you like to backport?
(enter PR numbers)
(proper git-foo is done to rebase (I assume) those changes to the release branch, open a PR, etc.)
(get approval on the PR)
(when the PR is rebase merged, remove the backport label from the PRs that were included)

As a potentially interesting reference, these are some docs that outline the Node.js process. Some if it is unnecessarily burdensome and probably should be trimmed down a bit (years of cruft building up in processes). But IMO it works out pretty well for a very standardised and safe merge, backport and release process. There's also associated tooling which we could either learn from or potentially borrow (or even bend to our purposes, cause I'm still a maintainer of a couple of pieces).

There's evolved a reliance on tooling to enforce the standardisation and to make the whole process less error prone. Just to land a PR, someone with write access can't just press the "merge" button, there's now a git node land 1234 command that steps through the process of taking a pull request, checking that it's good to land, squashing, adding metadata and putting it safely into the main branch.

Everything goes into the main branch first, release "lines" get cut off master once every 6 months and live for varying lengths (every second one ends up as LTS and lasts for longer). Each release line X has 2 separate branches:

  • vX.x-staging: where anything that's going to go into the next release will be put. If something is obviously going into that release then it can be put there at any time. (There are restrictions on how long something should "mature" before it goes into various branches though). e.g. https://github.com/nodejs/node/commits/v22.x-staging/
  • vX.x: When someone cuts a release, they open a pull request as a "release proposal" that shows what vX.x-staging -> vX.x is, along with their proposed release notes (whose generation is aided by tooling). That gives everyone a last chance to notice and object or craft messaging. After merging that PR (not an actual "merge", everything is strictly linear) and the release is triggered, a follow-up commit bumps the version, kind of like what we do with -dev versions here. https://github.com/nodejs/node/commits/v22.x/; example of a proposal: nodejs/node#54452

For backports and releases, there's a reliance on some of the metadata put in the commits (such as the PR-URL), and also commit comment matching (because it's standardised, and PRs will often end up as a single commit), but also a heavy use of GitHub labels to know what to backport. There's semver labels, so breaking changes don't go onto a release line branch, and semver-minor changes will generally be grouped and then result in a miner version bump. And because there's multiple release lines being managed at once (often 3, including 2 different forms of LTS), there are labels that dictate what shouldn't be backported to certain branches, e.g.: https://github.com/nodejs/node/labels/dont-land-on-v20.x

The strict reliance on linear git histories, use of standardised metadata and also the GitHub APIs makes it pretty smooth generally, not a whole lot of headaches with backporting as long as things are done somewhat sequentially. Linear histories do mean you lose some of the power of git because you break some of its inherit linkages between branches via merge commits, but IMO if you can make up for it with standardisation and tooling then it's worth it because most mortals stumble when git gets too fancy and linear histories are just easier to grok.

Thanks @rvagg for the NodeJS example. This was interesting to read.

I'm trying to think about what proposal to make based on where we are today, the resources we have to make improvements here, and the needs of the project in its current state (compared a long-time, very mature, high contributor project like NodeJS). I want to make sure we don't let "best be the enemy of better". I agree with your opnion that "if you can make up for it with standardisation and tooling then it's worth it because most mortals stumble when git gets too fancy and linear histories are just easier to grok".

Do you have suggested next steps here? I can spend some time to write some ideas down tomorrow (2024-08-28), or I'd be happy to sync verbally to then give me homework to type up. Feel free to draft next steps or suggestions though independent of me.

Well, one issue I can see is that we seem to be restricting ourselves to a limited semver-minor release schedule. So unless we're going to get comfortable bumping the semver-minor number more frequently and freely, we probably need to manage branches for those over whatever lifetime they have. Right now our flow suggests that all releases will come off master unless they are security or otherwise emergency fixes. Which then complicates our relationship with semver more than it is now. What does the minor number indicate and is the patch number just a "this is the next release"?

We probably need to clarify our position on semver a bit better, and also how the node/miner thing gets tangled up with that. How about this for a proposal:

  • semver-major is a non-issue until we feel we have a large enough historical break that we want to bump it and deal with the complexities Go throws at us. Mostly leave it alone, but we also try not to break things too badly in the spirit of semver.
  • semver-minor is our stand-in for semver-major, mostly. Breaking changes (there's a large discussion under here about what that really means) and notable feature additions go into semver-minor and are held back from semver-patch releases. "Notable feature additions" is a squishy concept but I don't think we have a good reason to be strict with semver on this.
  • semver-patch is for fixes, smaller improvements and less notable (less noticeable) feature additions.

Essentially all v1.x.* releases should have approximately the same feel. They certainly shouldn't break the RPC API and you shouldn't have dramas downgrading within that range (i.e. no db migrations).

Regarding node/miner:

We mostly just release the node and only do a miner release when we have need to but it will almost always be coupled with a node release. So the version numbers go together and users need to know that lotus-miner will likely not have releases for certain version numbers.

Regarding branching:

  • Each semver-minor gets its own branch where we cut semver-patch releases off, cut directly off master: e.g. release/v1.29.x
  • All changes get landed on master and cherry-picked back to the release branch, ideally via a PR (don't need to be done all in one go, the branch can build up).
    • There will be exceptions to this, where it's either impossible to put the change or master or it doesn't make sense. Changelog changes are in this category but there may be other changes that just don't fit on master for some reason (maybe the code has moved too far such that the fix can't even go on master or doesn't need to be on master). These cases are handled carefully, but we accept that they happen.
    • In doing this, the aim should be that the only thing that needs to be cherry-picked back to the master branch from a release branch is the changelog commit.

How's that for a starter proposal? Too different from what we're already iterating towards?

Hi @rvagg. I think some of the comments here were at least intended to be covered in the https://github.com/filecoin-project/lotus/blob/master/LOTUS_RELEASE_FLOW.md changes @rjan90 and I did the last few weeks.

Semver purposing → https://github.com/filecoin-project/lotus/blob/master/LOTUS_RELEASE_FLOW.md#adopted-conventions

Node/miner → https://github.com/filecoin-project/lotus/blob/master/LOTUS_RELEASE_FLOW.md#release-cadence

  • That said, we were planning for versions to diverge and not to always have a coupled node version when doing a miner release.

"Each semver-minor gets its own branch where we cut semver-patch releases off, cut directly off master: e.g. release/v1.29.x" → https://github.com/filecoin-project/lotus/blob/master/LOTUS_RELEASE_FLOW.md#branch-and-tag-strategy

"All changes get landed on master and cherry-picked back to the release branch, ideally via a PR" → https://github.com/filecoin-project/lotus/blob/master/LOTUS_RELEASE_FLOW.md#branch-and-tag-strategy


I do think there are ways to expand or clarify the language based on your phrasing as I do like the way you've phrased it. I can take this on week of 2024-09-02. Feel free to make changes directly.


Assuming we do go with what you've suggested (and what we intended to have communicated in LOTUS_RELEASE_FLOW), is there any suggested commands / tooling we should use and document? Minimum commands/tooling I would like to have are:

  • cherry-pick-prs <pr-#s> (I assume just need to do a github api call to get the commits associated witha a PR and then do git cherry-pick commit1 commit2
  • commits that differ between master and release branch (I know there is good good git commands for doing this - we should just document it)
    I know the above is not unique/special to us, so if there are things we can use/adopt, I'm game to hear.

I'm sure there are some decent plain git commands that would be useful to document, like git log branch1...branch2 and all the variations of this, but in terms of the workflow I like, I still maintain branch-diff (and its twin, which it uses internally, changelog-maker) which I still find useful for doing some of this git branch org work. They rely a bit on a clean git history to work, and wouldn't work well where merge commits get involved or you have badly formed commit messages. But as we improve our standardisation they might become useful.

e.g.

$ branch-diff --user filecoin-project --repo lotus origin/release/v1.29.0 master
* [4a4ddaaecc] - docs: updates about branches and where to target PRs (Steve Loeppky) https://github.com/filecoin-project/lotus/pull/12416
* [475139ff95] - chore: deps: update to CGO-free go-crypto (Peter Rabbitson) https://github.com/filecoin-project/lotus/pull/12411
* [8518d23289] - build: update Lotus Node version to v1.29.1-dev in master (Phi-rjan) https://github.com/filecoin-project/lotus/pull/12409

$ branch-diff --user filecoin-project --repo lotus master origin/release/v1.29.0
* [5bd97edb4e] - build: release Lotus node v1.29.0-rc1 (Phi-rjan) https://github.com/filecoin-project/lotus/pull/12410

(works as long as you have a git repo updated and have the refs locally, the --user and --repo is just to override the default "nodejs/node" when it can't figure it out for itself—it could do better at figuring this out tbh; it can also be run as npx branch-diff if you don't have it installed locally).

If we have a nice labelling system then it can work as the one-stop tool to figure out what to cherry-pick because it has --exclude-label and --include-label. If you have a set of labels applied consistently to PRs then you can get it to tell you specifically what's missing. e.g. I put a label on one of the PRs above and now it filters that out:

$ branch-diff --user filecoin-project --repo lotus --exclude-label=impact/docs origin/release/v1.29.0 master
* [475139ff95] - chore: deps: update to CGO-free go-crypto (Peter Rabbitson) https://github.com/filecoin-project/lotus/pull/12411
* [8518d23289] - build: update Lotus Node version to v1.29.1-dev in master (Phi-rjan) https://github.com/filecoin-project/lotus/pull/12409

Then if you're confident enough you can even --sha and you only get the commit hashes out that you can pipe to git cherry-pick.

You can see in the Node.js release process docs that it can get pretty sophisticated: N=22 sh -c 'branch-diff v$N.x-staging upstream/main --exclude-label=semver-major,dont-land-on-v$N.x,backport-requested-v$N.x,backport-blocked-v$N.x,backport-open-v$N.x,backported-to-v$N.x,baking-for-lts --filter-release --format=simple'

If we're interested, we could invest a bit more in making this tool work for us. Currently it's heavily tuned for the Node.js process and there's ways that could be improved (e.g. --filter-release is basically a regexp for the specific release commit messages for Node.js, and the user/repo thing could be more intelligent).

rvagg commented

^ these might be pretty neat but we're going to need to do something about our changelog because it's going to lead to constant conflicts which will prevent them from automatically working

Hmm, one option would be to require aCHANGELOG: ... line in the PR description, which we could then scrape.

masih commented

Or...

  1. Remove the requirement to update changelog as part of a PR
  2. Script generation of it from all PRs that are included in a release and don't have skip changelog tag.
  • This could be as simple as PR title + whatever in the description (minus the template fluff)
  1. Feed it to LLM to turn into a sane changelog entry.
  2. have a human skim through it before committing it as part of bumping verson.json.
  3. Take the nobel prise money, buy everyone in the team ice cream and retire in Bahamas.

Steps [2-4) will be done entirely by CI on a release branch (when version.json is touched).

rvagg commented

Similar thoughts to the above two comments came up while chatting with @BigLep last week, notes in #12757 (comment)