This project contains a set of tools to help analyse the largest blobs (by "on disk" storage) in a repository.

Here is a sample sequence of commands showing typical usage:
- Typically start with a clean clone of the repository that you want to analyse. It can be bare. For reasonable performance it should be cloned onto "local" disk on a reasonably fast Linux machine.
- Add these tools to your `PATH` or use a full path to each script or executable.
- Run these tools from the repository undergoing analysis and cleaning.
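  A minimal setup sketch for these two items, assuming the tools live in a hypothetical `~/repo-tools` directory and the clone being analysed is at `~/work/myrepo` (both paths are illustrative):

  ```sh
  export PATH="$HOME/repo-tools:$PATH"   # make the scripts callable by name
  cd ~/work/myrepo                       # run everything from inside the clone
  ```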
- Work out a suitable threshold size by running `generate-larger-than` with experimental parameters. 50000 might be a good starting point. The size is "average bytes after compression by Git".
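  One way to compare candidate thresholds, assuming (as the pipeline in the next step does) that `generate-larger-than` prints one object per line:

  ```sh
  for size in 20000 50000 100000; do
    echo "$size: $(generate-larger-than "$size" | wc -l) objects"
  done
  ```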
- Generate a sorted list of objects with file information:

  ```sh
  generate-larger-than 50000 | sort -k3n | add-file-info >../largeobjs.txt
  ```
- Make a report showing the summary of each commit together with the paths which introduce the large objects, their uncompressed size and file information:

  ```sh
  report-on-large-objects ../largeobjs.txt
  ```
- Create a temporary work directory and export `RFWORK_DIR` to point to this directory (defaults to the current directory); one way to set this up is sketched below.
- Again, run all commands from the repository being analysed.
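  A sketch for the work directory mentioned above, using `mktemp` (any writable directory outside the repository will do):

  ```sh
  export RFWORK_DIR=$(mktemp -d /tmp/rfwork.XXXXXX)   # scratch area for the generated scripts
  ```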
- From the above report, edit down a list of blob ids that can be eliminated. Call this `large-objects.txt`.
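  A possible starting point for that edit, assuming the blob id is the first whitespace-separated field of each report line (verify against your own `largeobjs.txt` before relying on this):

  ```sh
  # ASSUMPTION: field 1 of ../largeobjs.txt is the blob id.
  awk '{print $1}' ../largeobjs.txt > large-objects.txt
  # Then hand-edit large-objects.txt down to only the blobs to eliminate.
  ```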
- Generate a remove script:

  ```sh
  make-remove-blobs large-objects.txt >"$RFWORK_DIR"/remove-blobs.pl
  chmod +x "$RFWORK_DIR"/remove-blobs.pl
  ```
- Optionally, edit the remove script to also filter out any unwanted paths in the same rewrite.
- Run the filter branch:

  ```sh
  run-filter-branch
  ```
- Create a new "easy rebase" script for moving work-in-progress branches from the old history to the new history:

  ```sh
  make-mtnh >"$RFWORK_DIR"/move-to-new-history
  ```
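  As with the remove script, the generated script presumably needs to be made executable before it can be distributed:

  ```sh
  chmod +x "$RFWORK_DIR"/move-to-new-history
  ```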
- Push the rewritten refs and the `rewrite-commit-map` branch to all central repositories.
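  A sketch of the push, assuming a single central remote named `origin`; `--force` is needed because the branch histories have been rewritten (adjust the refspecs to your setup):

  ```sh
  git push --force origin 'refs/heads/*:refs/heads/*'   # all rewritten branches
  git push origin rewrite-commit-map                    # the commit-map branch
  ```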
- Deploy `move-to-new-history` for users to run against their own work-in-progress branches.
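  A hypothetical user-side flow; the generated script's actual interface may differ, so check it before writing user instructions:

  ```sh
  git fetch origin                    # pick up the rewritten refs
  move-to-new-history my-wip-branch   # HYPOTHETICAL: assumes the script takes the branch to move
  ```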