/gib

A script for backing-up with git, based on gibak

Primary LanguagePythonGNU Lesser General Public License v2.1LGPL-2.1

========================================================================

gib: Yet another git based backup system
----------------------------------------

gib is a script for backing up multiple directories to a single
git repository, so as to take advantage of git's efficient
storage of redundant data.  This script started as a rewrite of
gibak in Python 3.1, but now has significantly different
behaviour.

This has been only tested by myself (Mark) so far, so please use
this script with great caution.

You can find some brief guidance on installation and usage below.

  Copyright (c) 2007 Jean-Francois Richard <jean-francois@richard.name>
            (c) 2008 Mauricio Fernandez <mfp@acm.org>
            (c) 2009, 2010 Mark Longair <mark-gib@longair.net>

Installation:
-------------

The dependencies for this script can be installed with:

 sudo apt-get install git python3.1-minimal ocaml-nox rsync

You need to run "make" to create the binaries from ocaml.  Then
you need to copy (or symlink) the following files to somewhere on
your PATH:

  gib
  find-git-files
  find-git-repos
  ometastore

========================================================================

Usage A: (one home directory to back-up to ~/.git):
---------------------------------------------------

Create a repository in ~/.git/ for backing-up ~ to:

 $ gib init

Now edit ~/.gitignore here to exclude web browser caches,
temporary directories, etc.

Create your first backup of your home directory with:

 $ gib commit

Look at the output of "git status" which was printed at the end
of that output - check that you understand what is left as
modified or untracked after the commit.  (These may be files
that were created or modified while you were backing up, or
files that couldn't be added because they were too large or
unreadable.)

To record future states of your home directory, just run:

 $ gib commit

... again.

Usage B: (one home directory to back up to a removable disk):
-------------------------------------------------------------

Plug in your USB mass storage device, and say that's mounted as
/media/big-disk/.  Then you initialise a backup repository for
your home directory with:

 $ gib --git-directory=/media/big-disk/git-backups.git init

... and you can then commit the state of your home directory to
the master branch in that repository with:

 $ gib commit

... since the "init" step will create a file ~/.gib.conf which
specifies the git directory.  (As above, check that the "git
status" output looks reasonable.)

Usage C: (multiple home directories backed up to a single
repository on a removable disk):
---------------------------------------------------------

This is similar to Usage B, but you need to specify a branch
name for each home directory when you use "init".  For example,
if your computer is called "jupiter" and the removable disk is
mounted as above, you would initialize it with:

 $ gib --git-directory=/media/big-disk/git-backups.git \
       --branch=jupiter-home \
       init

Thereafter you can just run:

 $ gib commit

... to commit the state of your home directory as before.

Usage D: (multiple arbitrary directories backed up to a single
repository on a removable disk):
--------------------------------------------------------------

This is similar to usage D, but when initializing you need to
also specify the directory to back up.  e.g. if the situation is
as above, but you want to also back up the directory
/projects/robot/ to the same directory, you could do:

 $ gib --git-directory=/media/big-disk/git-backups.git \
       --directory=/projects/robot/ \
       --branch=robot \
       init

To record the state, you also have to specify the directory, so
that the configuration file that was created by init
(/projects/robot/.gib.conf) can be found.  i.e.:

 $ gib --directory=/projects/robot/ commit

========================================================================

Running arbitrary git commands on your backup:
----------------------------------------------

Your backup repository is just a normal git repository, but to
manipulate it you need to have the working tree and repository
set when calling git.  To make this simpler, you can just run
(for example):

 $ gib git status

... if you are backing up your home directory, or (for example):

 $ gib git -d /projects/robot status

... if you are backing up another directory.  (-d is just the
short form of --directory=, as used above.)

If you need to supply options to your git command, you will need
to use "--" to indicate to gib that it should stop parsing
options for itself.  So, for example, you would have to do:

 $ gib -- git ls-tree -r HEAD

... to stop gib from producing an error that it doesn't
understand the "-r" option.

========================================================================

Extracting a single file from your backup:
------------------------------------------

As an example, to output the most recent version of your .bashrc
to standard output, you can run:

 $ gib show .bashrc

You can also ask to see the file from an aribtrary version but
supplying a commit.  e.g. the version of bashrc from two backups
ago would be:

 $ gib show .bashrc HEAD^^

========================================================================

Extract a subdirectory (tree) from your backup:
-----------------------------------------------

(n.b. currently extracting a particular subtree with "extract"
will not preserve the permissions as "restore".  This is bug.)

To extract a complete subdirectory from your backup you can use
the "extract" subcommand of gib.  For example:

 $ gib extract Documents/papers/ /var/tmp/extracted-papers/

...  would create:

  /var/tmp/extracted-papers/Documents/papers/introduction.pdf
  /var/tmp/extracted-papers/Documents/papers/briefing.pdf
  /var/tmp/extracted-papers/Documents/papers/background.pdf
  ... etc.

========================================================================

Restoring a complete backup:
----------------------------

You can restore a backup to a directory with, for example:

 $ gib -g /media/big-disk/git-backups.git \
       -d /mnt/restored-home-directory \
       -b jupiter-home \
       restore

That will restore the directory that was backed-up to the branch
jupiter-home in the repository /media/big-disk/git-backups.git
to the directory /mnt/restored-home-directory, potentially
overwriting everything in that directory.

========================================================================

Saving disk space by "eating" large files:
------------------------------------------

If you have a large file which you would like to preserve in the
backup but no longer need in your home directory, you can use
the "eat" command to ensure that the contents of the files are
committed, and then removing them from the working tree.  For
example, you could use:

 $ gib eat disk-images/obselete-version.iso

... which will make sure that the current version of that file
is committed to your backup and then remove it from your home
directory.

========================================================================

Finding files in your backup:
-----------------------------

You can find files in the most recent backup that contain the
name "arthur", for example, with:

 $ gib -- git ls-files -r HEAD | grep -i arthur

If you want to search through the whole history of that
directory for a particular file, or find a file on any branch in
your repository, you might want to try the script
find-in-repository.py in this package.  For example, to find
files matching "arthur" in the complete history of your home
directory on jupiter, you might use:

 $ cd /media/big-disk/git-backups.git
 $ find-in-repository.py --all-history \
                         --start-ref=jupiter-home \
                         [aA]rthur

... or to find files matching that regular expression in any
branch in the repository in their entire history, you could try:

 $ cd /media/big-disk/git-backups.git
 $ find-in-repository.py --all-history [aA]rthur

(That will be very slow.)

If you know the file is under a particular tree, you can specify
that to speed up the search, e.g.:

 $ cd /media/big-disk/git-backups.git
 $ find-in-repository.py --start-tree=jupiter-home:audio [aA]rthur

The columns of output include the object name of the commit
which the files were found in, any readable ref names in the
second column (in brackets), the tree or blob's object name and
finally the name of the file.