Utilities for efficient git history rewrites similar to (but ultimately
different from) git filter-branch --index-filter.
The main conceptual difference to --index-filter is that any rewrite of a
tree must depend only on the tree itself, not on the commit or its (new)
parents. This means that a history rewrite can be split into two phases:
- rewriting trees
- rewriting commits
The tree rewrites can be parallelized and cached, which can make a
huge performance difference compared with --index-filter.
See also Large scale Git history rewrites.
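The split into tree rewrites and commit rewrites can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the code used by this project: the tuple-based tree representation and the names rewrite_tree and rewrite_commits are assumptions made for illustration, and no pygit2 calls are involved.

```python
from functools import lru_cache

# Toy model (hypothetical, not the project's data structures):
# a tree is a tuple of (name, entry) pairs, where an entry is either
# bytes (a blob) or another tree. Tuples are hashable, so a whole tree
# can be used as a cache key.


@lru_cache(maxsize=None)
def rewrite_tree(tree):
    """Pure function of the tree: drop every entry named 'secret.txt'."""
    result = []
    for name, entry in tree:
        if name == "secret.txt":
            continue
        if isinstance(entry, tuple):
            # Subtrees are rewritten through the same cache, so a subtree
            # shared by many commits is only processed once.
            entry = rewrite_tree(entry)
        result.append((name, entry))
    return tuple(result)


def rewrite_commits(commits):
    """Rewrite commits given as (commit_id, tree, parent_ids) tuples.

    The list must be ordered so that parents come before their children.
    """
    new_ids = {}
    rewritten = []
    for commit_id, tree, parents in commits:
        new_id = "new-" + commit_id  # stand-in for the new commit hash
        new_ids[commit_id] = new_id
        new_parents = [new_ids[p] for p in parents]
        rewritten.append((new_id, rewrite_tree(tree), new_parents))
    return rewritten


docs = (("readme.md", b"hello"), ("secret.txt", b"hunter2"))
tree_a = (("docs", docs), ("main.py", b"print(1)"))
tree_b = (("docs", docs), ("main.py", b"print(2)"))
history = [("c1", tree_a, []), ("c2", tree_b, ["c1"])]

new_history = rewrite_commits(history)
```

Because rewrite_tree depends only on its argument, the docs subtree shared by both commits is rewritten once and served from the cache the second time. The commit phase then only has to map old parent ids to new ones.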
Requirements:
- Python 3.6
- pygit2 (tested with 0.25 and 0.26)
In general, in order to perform your own special rewrite, you will have to
implement a Python module with a class deriving from the
git_tree_filter.tree_filter.TreeFilter class. There are example modules in
the source code, from which you may learn, or which may already fit your
needs.
The unpack module is designed to unpack all .gz files in the current
repository. However, it can also be used to run an arbitrary shell command
on all files with a given extension. Usage:
python3 git_tree_filter unpack [EXT] [COMMAND] -- --branches --tags
The given command is called once for each distinct file in the repo's history that has the specified extension. It receives the file content on its STDIN and must write the replacement file content to its STDOUT.
For more details, see git unpack: efficient tree filter.
Remove a few specific files (not directories) from the repository. Usage:
python3 git_tree_filter rm [PATH...] -- --branches --tags
The dir2mod module is a helper for turning a subdirectory of a git repo into a submodule.
For more details, see git dir2mod: subdir to submodule.
Convert files with the specified extension to Unix line endings, remove trailing whitespace from each line as well as trailing blank lines, and ensure that the file ends with a newline. Usage:
python3 git_tree_filter dos2unix [EXT] -- --branches --tags
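The normalization described above can be sketched as a small function on file content. This is an illustrative approximation, not the module's own code: the name normalize is hypothetical, and two behaviours are assumptions, namely that an empty (or all-blank) file stays empty and that a lone carriage return in the middle of a line is left untouched.

```python
def normalize(content):
    """Return `content` (bytes) with Unix line endings and no trailing whitespace."""
    # Splitting on "\n" leaves the "\r" of a CRLF pair at the end of each
    # line, where rstrip() removes it together with other trailing whitespace.
    lines = [line.rstrip() for line in content.split(b"\n")]

    # Drop trailing blank lines.
    while lines and not lines[-1]:
        lines.pop()

    # Terminate every remaining line, so the file ends with a newline.
    return b"".join(line + b"\n" for line in lines)


cleaned = normalize(b"first  \r\nsecond\t\r\n\r\n\r\n")
```

Since the result depends only on the file content, the same input always yields the same output, which lets the tree filter reuse the result for every commit that contains an identical file.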