robinst/git-merge-repos

Add option to do moves in a commit before merging different repos

jcarsique opened this issue · 2 comments

Hi,

History browsing is broken for merged content that has been moved to a subdirectory.

Here are the reproduction steps, starting from the following two repositories:

$ tree repoA repoB
repoA
├── rootFile
└── treeA
    └── fileA
repoB
├── rootFile
└── treeB
    └── fileB

repoA (master)]$ git log --oneline --stat 
dbdb5aa commit A2
 rootFile    | 0
 treeA/fileA | 1 +
 2 files changed, 1 insertion(+)
e344ad0 commit A1
 treeA/fileA | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

repoA (master)]$ git ls -r HEAD *
rootFile
treeA/fileA

repoB (master)]$ git log --oneline --stat 
8f87ce7 commit B2
 rootFile    | 0
 treeB/fileB | 1 +
 2 files changed, 1 insertion(+)
e7b06af commit B1
 treeB/fileB | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

repoB (master)]$ git ls -r HEAD *
rootFile
treeB/fileB

Then, repoB is merged into repoA under a "repoB" directory:

git-merge-repos (master)]$ ./run.sh /tmp/repoA:. /tmp/repoB:repoB

The global commit history is well preserved and the expected content is there:

merged-repo (master)]$ git log --oneline --stat 
fb4b830 Merge branch 'master' from multiple repositories
8f87ce7 commit B2
 rootFile    | 0
 treeB/fileB | 1 +
 2 files changed, 1 insertion(+)
dbdb5aa commit A2
 rootFile    | 0
 treeA/fileA | 1 +
 2 files changed, 1 insertion(+)
e7b06af commit B1
 treeB/fileB | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
e344ad0 commit A1
 treeA/fileA | 0
 1 file changed, 0 insertions(+), 0 deletions(-)

merged-repo (master)]$ git ls -r HEAD *
repoB/rootFile
repoB/treeB/fileB
rootFile
treeA/fileA

The repoA content history is fine:

(master)]$ git log --oneline treeA/fileA
dbdb5aa commit A2
e344ad0 commit A1

But when logging repoB content by its current path, nothing shows up besides the merge commit:

merged-repo (master)]$ git log --oneline repoB/treeB/fileB
fb4b830 Merge branch 'master' from multiple repositories
merged-repo (master)]$ git log --oneline --follow  repoB/treeB/fileB

The history is only reachable through the old path, using the --follow option (note also that the "--" separator is mandatory, since the given path no longer exists in the working tree):

merged-repo (master)]$ git log --oneline -- treeB/fileB
merged-repo (master)]$ git log --oneline --follow treeB/fileB
fatal: ambiguous argument 'treeB/fileB': unknown revision or path not in the working tree.
merged-repo (master)]$ git log --oneline --follow -- treeB/fileB
8f87ce7 commit B2
e7b06af commit B1
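The need for the "--" separator can be reproduced in isolation. The following is a minimal sketch, not part of the original report: the function name log_deleted_path_demo and the file name oldFile are made up for illustration, and it uses a throwaway repository with a linear history rather than the merged repository above.

```shell
#!/bin/sh
# Hypothetical demo: logging a path that no longer exists in the working tree.
log_deleted_path_demo() {
    demo=$(mktemp -d)
    (
        cd "$demo" || exit 1
        git init --quiet .
        echo content > oldFile
        git add oldFile
        git -c user.name=demo -c user.email=demo@example.com \
            commit --quiet -m "add oldFile"
        git rm --quiet oldFile
        git -c user.name=demo -c user.email=demo@example.com \
            commit --quiet -m "remove oldFile"

        # Without "--", Git cannot tell whether oldFile is a revision or a
        # path, because it is neither a ref nor a file in the working tree.
        if git log --oneline oldFile >/dev/null 2>&1; then
            echo "without --: ok"
        else
            echo "without --: failed"
        fi

        # With "--", everything after it is treated as a path.
        git log --format=%s -- oldFile
    )
    rm -rf "$demo"
}
```

Calling log_deleted_path_demo prints the status of the attempt without the separator, followed by the subjects of the commits that touched the deleted path.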

When a file existed in both repositories, the result is either only the repoA history or a mixed history:

merged-repo (master)]$ git log --oneline rootFile
dbdb5aa commit A2
merged-repo (master)]$ git log --oneline --follow rootFile
8f87ce7 commit B2
e7b06af commit B1

In most GUI tools, it is not possible at all to get the history of a file from repoB.

It looks as if the Git tree were missing a rename/move of the repoB content from its root to its subdirectory after the merge.

The situation improves a little with a manual commit ("prepare merge") that moves the files before the merge:

repoB (master)]$ mkdir repoB
repoB (master)]$ git mv $(git ls HEAD) repoB/
repoB (master)]$ git commit -m"prepare merge" -a

# Then, target for run.sh is "." instead of "repoB":
git-merge-repos (master)]$ ./run.sh /tmp/repoA:. /tmp/repoB:.

merged-repo (master)]$ git log --oneline rootFile
dbdb5aa commit A2
merged-repo (master)]$ git log --oneline --follow rootFile
b3c8bfb prepare merge
8f87ce7 commit B2
e7b06af commit B1
merged-repo (master)]$ git log --oneline repoB/treeB/fileB
b3c8bfb prepare merge
merged-repo (master)]$ git log --oneline --follow repoB/treeB/fileB
b3c8bfb prepare merge
8f87ce7 commit B2
e7b06af commit B1

However, it's not that easy on a real repository, since that "prepare merge" commit must be made on all branches and tags. It also pollutes the logs a little and is error-prone.
See https://gist.github.com/jcarsique/29ca0df9166926183e2f
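To illustrate what repeating the "prepare merge" commit on every branch could look like, here is a rough sketch. It is not the content of the gist above and not a feature of git-merge-repos: the function name prepare_merge_all_branches is hypothetical, it uses plain git ls-tree instead of the "git ls" alias from the transcripts, and it only handles local branches (tags point at fixed commits and would have to be re-created separately).

```shell
#!/bin/sh
# Hypothetical helper: on every local branch, move all top-level entries
# into <subdir> and record the move as a "prepare merge" commit.
# Usage: prepare_merge_all_branches <repo-path> <subdir>
prepare_merge_all_branches() {
    repo=$1
    subdir=$2
    (
        cd "$repo" || exit 1
        for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do
            git checkout --quiet "$branch" || exit 1
            mkdir -p "$subdir"
            # List the top-level tracked entries (files and directories) of HEAD.
            git ls-tree --name-only HEAD | while IFS= read -r entry; do
                # Skip an entry that already has the target directory's name.
                [ "$entry" = "$subdir" ] && continue
                git mv -- "$entry" "$subdir/"
            done
            # git mv has already staged the moves.
            git commit --quiet -m "prepare merge"
        done
    )
}
```

For the example above this would be called as prepare_merge_all_branches /tmp/repoB repoB, after which run.sh gets "." as the target for repoB. It assumes a clean working tree and leaves the last processed branch checked out.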

> It looks like if the Git tree was missing a rename/move for repoB content from its root to its subdirectory after merge.

Git doesn't track renames/moves; it tries to detect them on the fly when running git log and other commands. Also, I think Git is a bit confused by the trees being moved around in the merge commit.
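A small sketch of what "on the fly" means here, using a throwaway repository (the function name rename_demo and the file names are made up for illustration): the same pair of commits is reported as a delete plus an add, or as a rename, depending only on the options passed to git diff.

```shell
#!/bin/sh
# Hypothetical demo: rename detection happens at diff time, not at commit time.
rename_demo() {
    demo=$(mktemp -d)
    (
        cd "$demo" || exit 1
        git init --quiet .
        printf 'line 1\nline 2\nline 3\n' > fileB
        git add fileB
        git -c user.name=demo -c user.email=demo@example.com \
            commit --quiet -m "add fileB"
        mkdir repoB
        git mv fileB repoB/fileB
        git -c user.name=demo -c user.email=demo@example.com \
            commit --quiet -m "move fileB"

        # Rename detection off: the move shows up as a deletion and an addition.
        git diff --no-renames --name-status HEAD~1 HEAD

        # Rename detection on: the same two commits are reported as a rename
        # (status R followed by a similarity score).
        git diff -M --name-status HEAD~1 HEAD
    )
    rm -rf "$demo"
}
```

Nothing in the commits themselves records the move; the R status is computed by comparing the content of the two trees.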

> Also, the situation is a little bit improved with a manual commit ("prepare merge") moving the files before the merge

That would be a nice enhancement to contribute to this project. I'm not looking into this currently, so if someone else wants to pick this up, please feel free to do so.