Decide, document, and enforce accepted merge strategies
Opened this issue · 9 comments
Done Criteria
- We have documentation for those with merge permissions about what merge strategies we accept/use and when.
- We disable any merge strategies we don't want to allow.
Why Important
Inconsistencies or mis-wielding here was a source of pain in the nv23/1.28.0 release train, especially when wanting to compare branches to ensure the right commits were present. Some places this came up:
- https://filecoinproject.slack.com/archives/CP50PPW2X/p1719274557199449?thread_ts=1718923493.254039&cid=CP50PPW2X
- #12138
- https://filecoinproject.slack.com/archives/CP50PPW2X/p1719287930380079
This also affects the general tidiness of the commit history which can impact ease which one can quickly grok the progression of changes.
User/Customer
Maintainers but also users who inspect the commit history.
Notes
- At least of 2024-08-26, Lotus supports merge commits, squash merging, and rebase merging:
- We have a separate issue concerning a standard process for backporting changes: #12418
This is how I'm thinking to structure the conversation (with my proposal from what I know):
Usecase | merge commits | squash merging | rebase merging | note |
---|---|---|---|---|
General contribution to master. (It could be one commit or multiple incremental commits that weren't intentionally structured and minimized.) | X | We don't want to clutter the commit history.Squash merging will put the PR number in by default. | ||
Large contribution to master with intentional incremental commits | X | Don't want to lose the intentionally constructed commit history. Ideally want to make sure it's clear/easy to go from each individual commit back to the parent PR. | ||
Backporting changes from master to release branch | X | Don't want to lose the commit history from master so it's easy to compare commits to master. | ||
CHANGELOG updates in release branch | X | |||
Backport updates from release branch to master | X | Squashing should be fine for CHANGELOG updates. This isn't good though if there are other commits that didn't first land in master. (That said, we should avoid this from happening.) |
I realize a GitHub issue isn't a good place to collaborate. If this is contentious or there are lots of comments, I can pull out somewhere else like a Google doc. (Someone else feel free to.)
Note I wanted to get some initial thoughts down and am out of time for today. I welcome more experienced folks to add usecases or concerns I'm missing.
Cc @rjan90 , @rvagg , @aarshkshah1992 as people who have more recently run into some of this.
The main reason I find myself reaching for squash merging these days is to fix up people's commit summaries. If someone has a nice PR title but sloppy commit messages then I'm going to squash to pretty it all up.
My heuristic for preferring rebase merging isn't very tight, but in addition to having nice commit messages, it's something like this:
- Each commit represents a separate, logical unit, that doesn't make sense to combine into one - perhaps there are two separate components being touched and it's nice to have them separate, e.g. some new API method + use of it in lotus-miner.
- The PR has separate people involved somehow—it's really not nice to re-author someone's work. Although in this case it might be a local interactive rebase to squish bits together to reduce the number of commits.
- The PR has a lot of work and squashing it down to a single commit either feels criminal or may make bisecting errors more difficult (unlikely, I'm not sure when I last used a git bisect).
Top priority for me is that the git history should look clean enough that I can git log --oneline
and there's a logical grouping and sensible messages that tell me what's going on. I think we're getting better but we still have a bit of this:
Fewer commits make both backporting and general rebasing much more straightforward too.
The main reason I find myself reaching for squash merging these days is to fix up people's commit summaries. If someone has a nice PR title but sloppy commit messages then I'm going to squash to pretty it all up.
Agreed.
Each commit represents a separate, logical unit, that doesn't make sense to combine into one - perhaps there are two separate components being touched and it's nice to have them separate, e.g. some new API method + use of it in lotus-miner
Agreed. This would be a good example to put in our docs.
The PR has separate people involved somehow—it's really not nice to re-author someone's work.
Isn't attribution given in the squashed commit listing all authors, and if someone really wants to see who did what, they can look at the PR? Other than it affecting stats like "X contributed Y lines to Lotus", does it matter?
Although in this case it might be a local interactive rebase to squish bits together to reduce the number of commits.
Agreed that we should do actions like this to minimize commits.
The PR has a lot of work and squashing it down to a single commit either feels criminal or may make bisecting errors more difficult (unlikely, I'm not sure when I last used a git bisect)
Sure, but I think we need to live with this. I don't think we need to do anything to feed the idea that the number of commits is important or speaks to the significance of the change. I realize its a proxy to the amount of work involved, and definitely a better measure than the number of lines touched, but it's still far from perfect. (Not actionable here right now, but I'd ideally like us to have an expected impact attribution system for a change along the lines of what is done in https://gitrules.ai/.)
Top priority for me is that the git history should look clean enough that I can git log --oneline and there's a logical grouping and sensible messages that tell me what's going on.
I completely agree. I will probably lift that statement into any documentation written here. Any proposals/PRs here, should be guiding to help achieve this aim.
we still have a bit of this
Yeah, I want to see this clean up.
Fewer commits make both backporting and general rebasing much more straightforward too.
Agreed.
Next steps:
- Are there other usecases (rows) you'd suggest adding to #12417 (comment)
- Are there any cases in practice where merge commits are useful? If not, any concern with proposing to disable them. (I would personally be more ok with keeping them around if we could remove it from being the top/default option in the dropdown, but I don't think there's a way to change that.). In practice, I assume disabling merge commits is another way of accomplishing "linear history requirement" but it's at the repo level vs. at the protected branch level.
- I can't think of any good additional cases to add to the table but I'm not convinced with the X's in the squash column - the only hard X for me is a contribution to master without good commit hygiene as outlined above. The rest are all good for rebase merging, even the changelog stuff, I don't see a good reason to squash anything there.
- Merge commits are mainly useful if you're going to make use of advanced git-fu, and I think we need to be honest that there are very few people who use git in that way which limits their utility. The other place where they are useful is the merge-back case which is how it's still used here, as we've discussed. But as long as we're landing almost everything on master first and backporting from master, then even that case diminishes. The concern is then mostly about worrying about whether the changelog commit made it back or not and that's mostly cosmetic. So I'd personally be fine (and quite happy!) with disabling merge commits entirely.
Fewer commits make both backporting and general rebasing much more straightforward too.
This + ease of doing release chores are the main reasons I prefer squashing. The changelog sorting for prepping a release does take quite some time when there is a lot of PRs that has not been squashed as they show up as:
fix wrong param name
update changelog
--- etc
Without any reference to the PR itself, and you have to manually go and search up the PR number and see that you have it in the autogenerated changelog. While squashed commits shows up as:
feat(libp2p): expose libp2p bandwidth metrics (https://github.com/filecoin-project/lotus/pull/12402) (https://github.com/filecoin-project/lotus/pull/12402)
Large contribution to master with intentional incremental commits
I agree that large contributions and/or PRs that has a lot of auto-generated code (typically network skeleton PRs), it is very nice have incremental commits and it would be nice to keep the commits separate and not squash them. Additionally, it would be helpful to:
- Label PRs that are rebase merges. (maybe there is some off the self GHA that can do this for us?)
That would make it more visible when backporting to releases, and potentially help prevent issues like forgetting to cherry-pick all commits, which occurred during the v1.28.1 release.
Thanks all for the feedback. This makes sense. Unless someone says otherwise, I think the next step is for me to draft a PR to CONTRIBUTING and/or RELEASE_FLOW with this info. That then will enable more targeted suggestions or additional collaboration. I'll plan on doing this the week of 2024-09-02 when I'm back.
I reviewed the merge strategy board before releasing v1.29.0: #12422.
For the CHANGELOG updates in release branch
use case, we should consider adding a note that updating the version string is acceptable. This aligns with the current release issue template.
Alternatively, we can separate these steps if we prefer not to squash the version string update/make gen/and make docsgen steps in the release branches.
There's been lots of useful feedback here. I haven't incorporated it into a drafted doc. This is what I want to cover/do next week:
Things I want to get documented:
- Goals we want to see with our commit history
- history should look clean enough that we can
git log --oneline
and there's a logical grouping and sensible messages that tell me what's going on - all commits, regardless of whether squashed or rebased, deterministically tie back to a PR
- history should look clean enough that we can
- Why do we disable merge commits in master?
- (This is covered in this issue. We're intentionally not doing advanced git-fu)
- Why do we default to squash commits in master?
- We don't want to clutter the commit history.
- Squash merging puts the PR number in commit description by default.
- When do we consider rebase commits for master?
- (Lots of good discussion in this issue to pull from)
I also want to take this accompanying actions (these are new and hadn't previously been discussed - input welcome @rvagg and @rjan90 ):
- Adjust our "changelog enforcement". Per #12417 (comment), lets get rid of [skip changelog] in the title. We can allow the keywords in the PR description instead.
- Make it so one has to explicitly opt-in (be intentional) to do rebase merge and with a rebase merge require all commits to reference the PR.
- To accomplish this, I'm thinking a GitHub Action to block rebase merge unless a certain label is set (e.g., "merge/rebase"). If the label is set, then the GH Action will check to ensure each commit references the PR. If that passes, let the merge through.
I believe 1 and 2 will prevent what we saw happen in #12272, where we had a good commit history that could have been rebased, but instead ended up with a squash commit that also has an ugly "[skip changelog]" prefix.