ingydotnet/git-subrepo

Fixing the rebase/squash issue

Closed this issue · 4 comments

I've encountered the issue where if you squash or rebase the container repository containing the subrepo folder, the .gitrepo > subrepo.parent value gets stale because the commit shas have changed. Any, subsequent git subrepo push or git subrepo pull will give the error:

git-subrepo: Command failed: 'git branch subrepo/external/ag-website-shared '.

fatal: not a valid object name: ''

There are numerous github issues that highlight this issue, and the general solution seems to be to modify the subrepo.parent value manually. However, I have a work around that appears to work, but would love comments on whether it's a good approach, and if so, maybe we can get it merged into the repo properly.

My approach is to create a script that wraps git subrepo and handles the case where the parent sha is stale, and fixes it, then runs the git subrepo command again. If it isn't stale, it just runs the command as normal. The steps are:

  1. Check if the subrepo.parent value is stale with

    git merge-base --is-ancestor ${gitRepoParentValue} HEAD
    

    It returns an error code if the parent sha can't be found, otherwise it returns nothing.

    I got the idea of how to check this from #415

  2. If it is stale, do the extra step of:

    # Get the commit sha of the _parent_ of the `.gitrepo` file
    git log --format=%P --follow -1 ${subRepoFolder}/.gitrepo
    
    # Set the `.gitrepo` parent value to the parent commit
    git config --file ${subRepoFolder}/.gitrepo subrepo.parent ${gitRepoParentCommit}
    
    # Commit the changes
    git commit -am "Update .gitrepo parent sha"
    
  3. Run the git subrepo command as normal

  4. As a cleanup task, if the parent was stale, I squash the commit of the git subrepo with the extra commit for updating .gitrepo (step 2).

Essentially, step 2 is a temporary step, just to get git subrepo commands to work, and once it works, that commit is no longer necessary. I'm not 100% sure this will work for all cases, but it works for the typical github PR flow that rebases or squashes commits. With this script, it doesn't matter if the parent sha is stale from a rebase, as it will be "cleaned" up when the next person runs the script.

Here is the implementation: https://github.com/ag-grid/ag-website-shared/blob/78c8d85224fcc688cd446273ea50a6ffbfbde8d9/scripts/subrepo/src/lib/runSubRepoCommand.ts (the folder containing it, wraps it in a CLI, so that it can be run in the container repositories).

Can anyone see any downsides to this approach?


Other issues that highlight this problem:

@taktran, I have also been updating the parent sha to get around #602, but have to manually copy over skipped commits to ensure those changes are included. If only the sha is updated, I believe the code could be out of sync (depending on whether there were changes in skipped commits).

@mmcev106 , in which cases are there skipped commits? Also, when you update the parent manually, how do you determine which sha to use?

From what I've seen, when there is a subrepo pull/push, the .gitrepo file is updated, so if I use that as a reference point and take the parent of the .gitrepo file change commit, I know it'll be on the HEAD. On the subsequent pull "the content of the new branch replaces your subdir", which would be based on the subrepo.commit value, so I don't see how there would be skipped commits.

Originally, after a rebase on the main branch, I updated the subrepo.parent sha manually, by using the outdated parent sha as a reference to find the same commit based on the commit message (only worked if the original commit came from my machine), and then replaced it with the equivalent commit sha on HEAD that was changed from the rebase. That's what made me realise that actually, by looking at the commit that changed .gitrepo, I could find out the commit that correlated with the stale subrepo.parent sha.

@taktran, if you're only expecting changes to be made in one place, skipping commits should be OK, as the pull will overwrite with the latest (as you said). We have changes made in both the repo directly, and on it's subrepo clone inside a containing repo. Skipping any commit on either end potentially overwrites changes we want made on the opposite end. To recover from #602 I run git --work-tree=/path/to/other/clone to determine what changes might have been missed, copy them over, manually, then do another subrepo pull. Once I've done that, I update the parent to the commit of the sha of the subrepo pull I just made.

Rebasing is of course unsupported, as there's no way for git-subrepo to know where it was once the commits are re-written. I'm guessing this also works for you because commits are only moving in one direction.

@mmcev106 Oh I see. We've made rebase the default, which is more difficult when there are conflicts, but our team is pretty small and are ok with having a soft "lock" on shared folder files when we are making changes, coordinating manually to ensure we don't get into messy situations.

I guess our compromise of simple locking and manual coordination allows us to not have to deal with distributed repo git syncing issues 😅 It's still early days though, and we haven't seen all the issues that can arise from this.