Create a snapshot of any directory, including your home folder, and commit it to a git repository. Any snapshot can then be mounted later for read-only browsing.
These scripts use git with the open source git-xet plugin. With git-xet, all binary content is deduplicated within and across snapshots, so the data is stored very efficiently.
- Clone this repository. To get this to run reliably, you will need to edit the config.sh file with your settings.
- Install the git-xet extension. Binaries are available from here.
- Run
git xet install
to install the proper config settings.
Here are three ways to set up the repository and data store:
This setup assumes that you have an external drive or NAS mounted at a specific folder.
- Create the Repo. Create a bare git repository on the NAS drive and enable the git-xet plugin using:
repo_directory="/NAS/backup/snapshot-repo.git" # Change this to your directory
mkdir -p "$repo_directory" && cd "$repo_directory" && git init --bare && git xet init
- Create the Data Store. Simply create a directory to use as the data store.
data_directory="/NAS/backup/data" # Change this to your directory
mkdir -p "$data_directory"
- Edit the settings in config.sh to add these directories:
git_repo="/NAS/backup/snapshot-repo.git"
local_data_store="/NAS/backup/data"
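Before taking the first snapshot, it can help to confirm that both paths actually exist. The following is a minimal sketch and not part of the provided scripts: `check_local_config` is a hypothetical helper, and the commented usage line assumes config.sh is a plain shell file of variable assignments that can be sourced.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the local setup (not one of the repo's scripts).
# Returns 0 when the repository path is a directory and the data store is a
# writable directory; otherwise prints what is wrong to stderr and returns 1.
check_local_config() {
  local repo=$1 store=$2 status=0
  if [ ! -d "$repo" ]; then
    echo "git_repo is not a directory: $repo" >&2
    status=1
  fi
  if [ ! -d "$store" ] || [ ! -w "$store" ]; then
    echo "local_data_store is missing or not writable: $store" >&2
    status=1
  fi
  return "$status"
}

# Example (assumes config.sh can be sourced as shell):
#   . ./config.sh && check_local_config "$git_repo" "$local_data_store"
```

A check like this mostly catches the case where the external drive or NAS is not mounted when the snapshot runs.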
- Create a new repo on GitHub to use as the backup.
- Install the XetHub App for GitHub and select your repository for git-xet configuration.
- Configure the Data Store.
  - Managed: To store your data in the fully managed XetHub service, follow the instructions in XetHub Account Setup. No other configuration is needed.
  - Local: Follow the instructions for the data store above.
- Edit the settings in config.sh to add this information:
git_repo="git@github.com:username/backup-repo.git"
If using a local data store, also add:
local_data_store="<backup-dir>"
Otherwise, leave that variable empty.
The XetHub service is similar to GitHub, but all binary data is conveniently accessible through the web interface.
- Set up an account by following the instructions in XetHub Account Setup.
- Create a repository and copy the appropriate URL (e.g. xet@xethub.com:username/backup-repo.git).
- Edit the settings in config.sh to add this information:
git_repo="xet@xethub.com:username/backup-repo.git"
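The three setups differ mainly in the form of the git_repo value. As a rough sketch, the hypothetical helper below (`describe_setup` is not part of the provided scripts) guesses which setup a given value corresponds to, using the example URLs from this document. The pattern matching is an assumption for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess which of the three setups a git_repo value matches.
#   local  -> a filesystem path; local_data_store must also be set
#   github -> a GitHub remote; local_data_store is optional (empty = managed)
#   xethub -> a XetHub remote; only git_repo is listed in the instructions
describe_setup() {
  case $1 in
    *xethub.com[:/]*) echo "xethub" ;;
    *github.com[:/]*) echo "github" ;;
    *)                echo "local" ;;
  esac
}

describe_setup "/NAS/backup/snapshot-repo.git"
describe_setup "git@github.com:username/backup-repo.git"
describe_setup "xet@xethub.com:username/backup-repo.git"
```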
- Ensure the directory you want to snapshot is correct in config.sh. The default is
snapshot_dir=$HOME
to snapshot your home folder.
- To create a snapshot, simply run
./snapshot.sh
This requires the remote repo and optionally the data store to be set in config.sh.
To mount a snapshot from a specific point in time, use the provided script mount.sh. This is a convenience wrapper around git xet mount, which uses a local NFS server to mount the contents of a xet-enabled repository as a directory. File contents are downloaded and materialized lazily.
- To list commits available:
./mount.sh --list
- To mount a specific commit:
./mount.sh [COMMIT]
The path for this commit is displayed at the end.
- To unmount all snapshots:
./mount.sh --unmount
Why not?
... But seriously, git has historically had trouble handling enormous, binary-heavy repositories that evolve over time. However, the ecosystem is changing, and there are now several tools that make this feasible, including git-xet.
There are a number of tools out there that allow git to work with large data files. I wrote this tool on top of git-xet for a few reasons:
- Binary Content Deduplication. Git-xet is currently the only open source git plugin that does full content-based deduplication across different commits and within the same commit. Thus if two files share common content, the common content is only stored once, even if the full file content differs. For more details on this, see our paper at Git is for Data.
- No per-file configuration. Setup is once and done: once a repo is set up, all binary files and large text files pass through the git-xet plugin.
- Easy browsing. Any commit can be mounted locally as a read-only folder while lazily materializing the data. The included git xet mount utility uses the nfsserve package to mount a read-only view of any git commit, allowing previous snapshots to be browsed without materializing the contents.
- Open source. While the integration with XetHub has many nice perks, git-xet can be used entirely locally.
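To make the deduplication point above concrete, here is a toy sketch of chunk-level deduplication. It is not git-xet's algorithm: it cuts files into fixed 1 KiB chunks and hashes each one, purely to show how two files that share content need fewer stored chunks than they reference. The helper names (`hash_stdin`, `chunk_hashes`) and the chunk size are made up for the illustration.

```shell
#!/usr/bin/env bash
# Toy chunk-level deduplication demo. This is NOT git-xet's algorithm: it uses
# fixed-size chunks purely to show how shared content can be stored once.
set -eu

# Hash stdin with whichever SHA-256 tool is available (GNU sha256sum or shasum).
hash_stdin() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# Print one hash per fixed-size chunk of the given file.
chunk_hashes() {
  local file=$1 chunk_size=$2 tmp chunk
  tmp=$(mktemp -d)
  split -b "$chunk_size" "$file" "$tmp/chunk."
  for chunk in "$tmp"/chunk.*; do
    [ -e "$chunk" ] || continue   # empty input produces no chunks
    hash_stdin < "$chunk"
  done
  rm -rf "$tmp"
}

workdir=$(mktemp -d)
# Two 4 KiB files that differ only in their last 1 KiB.
head -c 3072 /dev/zero | tr '\0' 'a' > "$workdir/common"
{ cat "$workdir/common"; head -c 1024 /dev/zero | tr '\0' 'b'; } > "$workdir/v1.bin"
{ cat "$workdir/common"; head -c 1024 /dev/zero | tr '\0' 'c'; } > "$workdir/v2.bin"

# Collect chunk hashes from both files, then compare total vs. distinct.
all_hashes=$(chunk_hashes "$workdir/v1.bin" 1024; chunk_hashes "$workdir/v2.bin" 1024)
total=$(( $(printf '%s\n' "$all_hashes" | wc -l) ))
unique=$(( $(printf '%s\n' "$all_hashes" | sort -u | wc -l) ))
echo "chunks referenced: $total, chunks stored after dedup: $unique"
rm -rf "$workdir"
```

In this example the two files reference 8 chunks in total but only 3 distinct ones need to be stored, since the repeated content is shared both within each file and across the two. Fixed-size chunks are the simplest possible choice and break down when bytes are inserted mid-file; the paper referenced above describes what git-xet actually does.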
Full disclosure: I am a developer on the team building git-xet, so I'm biased. However, this tool was a great way to ensure that git can easily handle terabytes of data with minimal pain, which was our goal in building git-xet. Please let me know if you have any feedback, and please file issues if you have any problems.
The use of the fully managed XetHub service provides many perks, including reliable data storage and seamless minimal-configuration integration with git.
Sign up on XetHub and obtain a username and personal access token. You should save this token for later use, or go to https://xethub.com/user/settings/pat to create a new token.
There are two ways to authenticate with XetHub:
Run the command given when you create your personal access token:
git xet login -e <email> -u <username> -p <personal_access_token>
git xet login will write authentication information to ~/.xetconfig.
Environment variables may sometimes be more convenient:
export XET_USER_EMAIL="<email>"
export XET_USER_NAME="<username>"
export XET_USER_TOKEN="<personal_access_token>"
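If you rely on environment variables, a small guard can catch a missing value before a snapshot is attempted. This is a sketch only: `check_xet_env` is a hypothetical helper, not part of git-xet or the provided scripts, and it checks nothing beyond whether the three variables named above are non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm the three XetHub variables are set and non-empty.
# Prints the name of each missing variable to stderr; returns 1 if any is missing.
check_xet_env() {
  local status=0
  [ -n "${XET_USER_EMAIL:-}" ] || { echo "XET_USER_EMAIL is not set" >&2; status=1; }
  [ -n "${XET_USER_NAME:-}" ]  || { echo "XET_USER_NAME is not set" >&2; status=1; }
  [ -n "${XET_USER_TOKEN:-}" ] || { echo "XET_USER_TOKEN is not set" >&2; status=1; }
  return "$status"
}

# Example: only run the snapshot when all three are present.
#   check_xet_env && ./snapshot.sh
```

Note that shell assignments must not have spaces around the equals sign, which is an easy mistake to make when copying the export lines.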