Scripts for extracting the history of a subfolder of a git repository into a separate repository.
I needed to extract a repository with MPS Contrib files from the main MPS repository. My requirements were:
- retain file history through renames/moves, even when the files were moved from outside of the specified subfolder (this makes it possible to extract multiple folders by first moving them into one subfolder);
- generate a nice readable history without a web of unnecessary merge commits;
- find a solution that will work on a big repository in a reasonable time.
Helpful articles:
- How to Move Folders Between Git Repositories
- git filter-branch '--subdirectory-filter' preserving '--no-ff' merges
- For every file, extract the full history (with renames): all commits where the file was changed plus all the paths the file has had (smartlog.sh). The output consists of two files: revisions.txt with commit ids and history.txt with file names. The file history algorithm used here is described in GitFileHistory.java from the IntelliJ IDEA source code.
- Create a branch containing only the commits in revisions.txt (FilterByRevisions.java). It walks all revisions from HEAD that are in revisions.txt, maintaining a set of "roots" (ends of the current branch without parents). It tries to attach every commit to one of the roots or their children. The output is a bash script, buildtree.sh, which generates the resulting graph of commits using the commit-tree command.
- Execute buildtree.sh and check out a HEAD of the generated graph. If there are several heads, all but one are lost.
- Filter HEAD by paths and leave only the paths from history.txt. This is done by filter.sh.
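As a rough illustration of the first and third steps, the sketch below approximates them with plain git commands. It is not the code of smartlog.sh or buildtree.sh. The function names collect_history and replay_linear are hypothetical, git log --follow stands in for the GitFileHistory algorithm, and the replay is strictly linear instead of rebuilding a graph from roots. It assumes it is run from the root of a scratch clone and that file names contain no characters that git would quote.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the real scripts in this repository may differ.

# Step 1 (approximation): write revisions.txt and history.txt for a folder.
collect_history() {
  local folder=$1 file line
  : > revisions.raw
  : > history.raw
  # Every file currently tracked under the folder.
  git ls-files -- "$folder" | while IFS= read -r file; do
    # --follow continues past renames; --name-only prints the path
    # the file had in each commit.
    git log --follow --format='commit %H' --name-only -- "$file" |
      while IFS= read -r line; do
        case $line in
          'commit '*) echo "${line#commit }" >> revisions.raw ;;
          '') ;;                                  # separator line
          *) echo "$line" >> history.raw ;;       # a path of the file
        esac
      done
  done
  sort -u revisions.raw > revisions.txt
  sort -u history.raw > history.txt
  rm -f revisions.raw history.raw
}

# Step 3 (simplified to a single line of history): recreate the listed
# commits with commit-tree and print the id of the new head.
replay_linear() {
  local revisions_file=$1 parent= commit new
  local parent_args=()
  # Oldest first, restricted to the ids listed in the revisions file.
  for commit in $(git rev-list --reverse HEAD | grep -F -x -f "$revisions_file"); do
    # Reuse the original tree, message and author; only the parent changes.
    new=$(git log -1 --format=%B "$commit" |
      GIT_AUTHOR_NAME=$(git log -1 --format=%an "$commit") \
      GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$commit") \
      GIT_AUTHOR_DATE=$(git log -1 --format=%aD "$commit") \
      git commit-tree "$commit^{tree}" "${parent_args[@]}")
    parent=$new
    parent_args=(-p "$parent")
  done
  echo "$parent"
}
```

With these definitions, running collect_history on a folder and then `git checkout -b extracted "$(replay_linear revisions.txt)"` would leave a branch whose commits still carry full trees. Narrowing them to the paths in history.txt is the job of the last step.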
Disclaimer: this is a "works on my machine" project, so use at your own risk.
Execute scripts/main.sh <PATH TO REPOSITORY> <RELATIVE PATH TO THE FOLDER TO EXTRACT>