Duplicacy is a new-generation, cross-platform cloud backup tool based on the idea of Lock-Free Deduplication. It is the only cloud backup tool that allows multiple computers to back up to the same storage simultaneously without using any locks, which makes it readily amenable to various cloud storage services.
The repository hosts source code, design documents, and binary releases of the command-line version. There is also a Duplicacy GUI frontend built for Windows and Mac OS X available from https://duplicacy.com.
Duplicacy currently supports major cloud storage providers (Amazon S3, Google Cloud Storage, Microsoft Azure, Dropbox, Backblaze, Google Drive, Microsoft OneDrive, and Hubic) and offers all essential features of a modern backup tool:
- Incremental backup: only back up what has been changed
- Full snapshot: although each backup is incremental, it must behave like a full snapshot for easy restore and deletion
- Deduplication: identical files must be stored as one copy (file-level deduplication), and identical parts from different files must be stored as one copy (block-level deduplication)
- Encryption: encrypt not only file contents but also file paths, sizes, times, etc.
- Deletion: every backup can be deleted independently without affecting others
- Concurrent access: multiple clients can back up to the same storage at the same time
- Snapshot migration: all or selected snapshots can be migrated from one storage to another
The key idea of Lock-Free Deduplication can be summarized as follows:
- Use a variable-size chunking algorithm to split files into chunks
- Store each chunk in the storage using a file name derived from its hash, and rely on the file system API to manage chunks without using a centralized indexing database
- Apply a two-step fossil collection algorithm to remove chunks that become unreferenced after a backup is deleted
The design document explains lock-free deduplication in detail.
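The first two ideas above can be illustrated with a minimal Go sketch. It is not Duplicacy's implementation: the rolling value, the chunk size limits, the use of SHA-256 for chunk names, and the in-memory `store` type are all assumptions made for illustration. What it shows is that a chunk's file name alone answers the question "is this chunk already stored?", so no indexing database is needed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Hypothetical parameters chosen for this sketch, not Duplicacy's settings.
const (
	minChunk = 16   // never cut a chunk shorter than this
	maxChunk = 256  // always cut once a chunk reaches this size
	mask     = 0x1f // cut when the low 5 bits of the rolling value are zero
)

// splitChunks cuts data at content-defined boundaries. Because the value is
// shifted left by one bit per byte, its low 5 bits depend only on the last
// 5 bytes read, so the same content tends to produce the same cut points
// even after bytes are inserted or deleted earlier in the file.
func splitChunks(data []byte) [][]byte {
	var chunks [][]byte
	var h uint32
	start := 0
	for i, b := range data {
		h = h<<1 + uint32(b)
		size := i - start + 1
		if (size >= minChunk && h&mask == 0) || size >= maxChunk {
			chunks = append(chunks, data[start:i+1])
			start, h = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

// store maps chunk file names to contents. It stands in for a storage
// backend that offers only basic file operations.
type store map[string][]byte

// put saves a chunk under a name derived from its hash and reports whether
// the chunk was new. The existence check on the name is the only lookup.
func (s store) put(chunk []byte) (name string, added bool) {
	sum := sha256.Sum256(chunk)
	name = hex.EncodeToString(sum[:])
	if _, exists := s[name]; exists {
		return name, false
	}
	s[name] = append([]byte(nil), chunk...)
	return name, true
}

// backup splits data, stores the chunks that are not present yet, and
// returns the ordered chunk names plus the number of chunks uploaded.
func backup(s store, data []byte) (names []string, uploaded int) {
	for _, chunk := range splitChunks(data) {
		name, added := s.put(chunk)
		names = append(names, name)
		if added {
			uploaded++
		}
	}
	return names, uploaded
}

func main() {
	// Deterministic pseudo-random test data.
	data := make([]byte, 4096)
	x := uint32(1)
	for i := range data {
		x = x*1664525 + 1013904223
		data[i] = byte(x >> 24)
	}
	s := store{}
	names, first := backup(s, data)
	_, second := backup(s, data)
	fmt.Printf("chunks: %d, uploaded first: %d, uploaded second: %d\n", len(names), first, second)
}
```

Backing up the same data a second time uploads nothing, because every chunk name already exists in the store. Two clients running `backup` against the same store would deduplicate against each other in the same way, without coordinating.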
Duplicacy is written in Go. You can build the executable by running the following commands:
git clone https://github.com/gilbertchen/duplicacy.git
cd duplicacy
go get ./...
go build main/duplicacy_main.go
You can also visit the releases page to download the version suitable for your platform. Installation is not needed.
Once you have the Duplicacy executable on your path, you can change to the directory that you want to back up (called the repository) and run the init command:
$ cd path/to/your/repository
$ duplicacy init mywork sftp://user@192.168.1.100/path/to/storage
This init command connects the repository with the remote storage at 192.168.1.100 via SFTP. It will initialize the remote storage if this has not been done before. It also assigns the snapshot id mywork to the repository. This snapshot id is used to uniquely identify this repository if there are other repositories that also back up to the same storage.
You can now create snapshots of the repository by invoking the backup command. The first snapshot may take a while depending on the size of the repository and the upload bandwidth. Subsequent snapshots will be much faster, as only new or modified files will be uploaded. Each snapshot is identified by the snapshot id and an increasing revision number starting from 1.
$ duplicacy backup -stats
Duplicacy provides a set of commands, such as list, check, diff, cat, and history, to manage snapshots:
$ duplicacy list # List all snapshots
$ duplicacy check # Check integrity of snapshots
$ duplicacy diff # Compare two snapshots, or the same file in two snapshots
$ duplicacy cat # Print a file in a snapshot
$ duplicacy history # Show how a file changes over time
The restore command rolls back the repository to a previous revision:
$ duplicacy restore -r 1
The prune command removes snapshots by revision, tag, or retention policy:
$ duplicacy prune -r 1 # Remove the snapshot with revision number 1
$ duplicacy prune -t quick # Remove all snapshots with the tag 'quick'
$ duplicacy prune -keep 1:7 # Keep 1 snapshot per day for snapshots older than 7 days
$ duplicacy prune -keep 7:30 # Keep 1 snapshot every 7 days for snapshots older than 30 days
$ duplicacy prune -keep 0:180 # Remove all snapshots older than 180 days
The first time the prune command is called, it removes the specified snapshots but keeps all unreferenced chunks as fossils. Since it uses the two-step fossil collection algorithm to clean chunks, you will need to run it again to remove those fossils from the storage:
$ duplicacy prune # Chunks from deleted snapshots will be removed if deletion criteria are met
To back up to multiple storages, use the add command to add a new storage. The add command is similar to the init command, except that the first argument is a storage name used to distinguish different storages:
$ duplicacy add s3 mywork s3://amazon.com/mybucket/path/to/storage
You can back up to any storage by specifying the storage name:
$ duplicacy backup -storage s3
However, snapshots created this way will differ between storages if the repository changes between the two backup operations. A better approach is to use the copy command to copy specified snapshots from one storage to another:
$ duplicacy copy -r 1 -to s3 # Copy snapshot at revision 1 to the s3 storage
$ duplicacy copy -to s3 # Copy every snapshot to the s3 storage
The User Guide contains a complete reference to all commands and other features of Duplicacy.
Duplicacy currently supports local file storage, SFTP, and 8 cloud storage providers.
Storage URL: /path/to/storage (on Linux or Mac OS X) or C:\path\to\storage (on Windows)
Storage URL: sftp://username@server/path/to/storage
Login methods include password authentication and public key authentication. Due to a limitation of the underlying Go SSH library, the key pair for public key authentication must be generated without a passphrase. To work with a key that has a passphrase, you can set up SSH agent forwarding, which is also supported by Duplicacy.
Storage URL: dropbox://path/to/storage
For Duplicacy to access your Dropbox storage, you must provide an access token that can be obtained in one of two ways:
- Create your own app on the Dropbox Developer page, and then generate the access token
- Or authorize Duplicacy to access its app folder inside your Dropbox (following this link), and Dropbox will generate the access token (which is not visible to us, as the redirect page showing the token is merely a static HTML page hosted by Dropbox)
Dropbox has two advantages over other cloud providers. First, if you are already a paid user, using your unused space as backup storage is essentially free. Second, unlike other providers, Dropbox does not charge for bandwidth or API usage.
Storage URL: s3://amazon.com/bucket/path/to/storage (default region is us-east-1) or s3://region@amazon.com/bucket/path/to/storage (other regions must be specified)
You'll need to input an access key and a secret key to access your Amazon S3 storage.
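The S3 storage URL packs the region, endpoint, bucket, and directory into one string. The Go sketch below shows how such a URL could be taken apart with the standard `net/url` package. The `s3Location` type and `parseS3URL` function are hypothetical names used for illustration, not part of Duplicacy, and `eu-west-1` is just an example region.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// s3Location holds the pieces of an S3 storage URL (hypothetical type).
type s3Location struct {
	Region   string
	Endpoint string
	Bucket   string
	Dir      string
}

// parseS3URL splits s3://[region@]endpoint/bucket/path/to/storage into its
// parts, falling back to us-east-1 when no region is given.
func parseS3URL(raw string) (s3Location, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return s3Location{}, err
	}
	if u.Scheme != "s3" {
		return s3Location{}, fmt.Errorf("not an s3 storage URL: %q", raw)
	}
	// The part before '@' is parsed as the URL's user name.
	region := u.User.Username()
	if region == "" {
		region = "us-east-1"
	}
	// The first path segment is the bucket; the rest is the directory.
	parts := strings.SplitN(strings.TrimPrefix(u.Path, "/"), "/", 2)
	if parts[0] == "" {
		return s3Location{}, fmt.Errorf("missing bucket in %q", raw)
	}
	loc := s3Location{Region: region, Endpoint: u.Host, Bucket: parts[0]}
	if len(parts) == 2 {
		loc.Dir = parts[1]
	}
	return loc, nil
}

func main() {
	for _, raw := range []string{
		"s3://amazon.com/bucket/path/to/storage",
		"s3://eu-west-1@amazon.com/bucket/path/to/storage",
	} {
		loc, err := parseS3URL(raw)
		fmt.Printf("%+v %v\n", loc, err)
	}
}
```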
Storage URL: gcs://bucket/path/to/storage
Starting from version 2.0.0, a new Google Cloud Storage backend is available, implemented using the official Google client library. You must first obtain a credential file by authorizing Duplicacy to access your Google Cloud Storage account or by downloading a service account credential file.
You can also use the S3 protocol to access Google Cloud Storage. To do this, you must enable S3 interoperability in your Google Cloud Storage settings and set the storage URL to s3://storage.googleapis.com/bucket/path/to/storage.
Storage URL: azure://account/container
You'll need to input the access key once prompted.
Storage URL: b2://bucket
You'll need to input the account id and application key.
Backblaze's B2 storage is not only the least expensive (at 0.5 cent per GB per month), but also the fastest. We have been working closely with their developers to leverage the full potential of the B2 API in order to maximize the transfer speed.
Storage URL: gcd://path/to/storage
To use Google Drive as the storage, you first need to download a token file from https://duplicacy.com/gcd_start by authorizing Duplicacy to access your Google Drive, and then give Duplicacy the path to this token file when prompted.
Storage URL: one://path/to/storage
To use Microsoft OneDrive as the storage, you first need to download a token file from https://duplicacy.com/one_start by authorizing Duplicacy to access your OneDrive, and then give Duplicacy the path to this token file when prompted.
Storage URL: hubic://path/to/storage
To use Hubic as the storage, you first need to download a token file from https://duplicacy.com/hubic_start by authorizing Duplicacy to access your Hubic drive, and then give Duplicacy the path to this token file when prompted.
Hubic offers the most free space (25GB) of all major cloud providers and there is no bandwidth charge (same as Google Drive and OneDrive), so it may be worth a try.
duplicity works by applying the rsync algorithm (or more specifically, the librsync library) to find the differences from previous backups and then uploading only the differences. It is the only existing backup tool with extensive cloud support -- the long list of storage backends covers almost every cloud provider one can think of. However, duplicity's biggest flaw lies in its incremental model -- a chain of dependent backups starts with a full backup followed by a number of incremental ones, and ends when another full backup is uploaded. Deleting one backup will render useless all the subsequent backups on the same chain. Periodic full backups are required in order to make previous backups disposable.
bup also uses librsync to split files into chunks but saves chunks in the git packfile format. It doesn't support any cloud storage, or deletion of old backups.
Obnam got the incremental backup model right in the sense that every incremental backup is actually a full snapshot. Although Obnam also splits files into chunks, it adopts neither the rsync algorithm nor the variable-size chunking algorithm. As a result, deletions or insertions of a few bytes will foil the deduplication. Deletion of old backups is possible, but no cloud storage is supported. Multiple clients can back up to the same storage, but only sequential access is granted by the locking on-disk data structures. It is unclear if the lack of cloud backends is due to difficulties in porting the locking data structures to cloud storage APIs.
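The effect of a small insertion on chunking without content-defined boundaries can be seen in a short Go sketch. It assumes plain fixed-size chunks, which is a simplification for illustration and not a description of Obnam's actual code. The function names and the 64-byte chunk size are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// fixedChunkHashes splits data into size-byte pieces and returns the set of
// their SHA-256 hashes.
func fixedChunkHashes(data []byte, size int) map[[32]byte]bool {
	hashes := map[[32]byte]bool{}
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		hashes[sha256.Sum256(data[start:end])] = true
	}
	return hashes
}

// shared counts the hashes that appear in both sets.
func shared(a, b map[[32]byte]bool) int {
	n := 0
	for h := range a {
		if b[h] {
			n++
		}
	}
	return n
}

func main() {
	// 1024 bytes following a repeating 0..255 pattern.
	original := make([]byte, 1024)
	for i := range original {
		original[i] = byte(i)
	}
	// The same content with a single byte inserted at the front.
	modified := append([]byte{0xff}, original...)

	a := fixedChunkHashes(original, 64)
	b := fixedChunkHashes(modified, 64)
	fmt.Println("chunks shared after a one-byte insertion:", shared(a, b))
}
```

Because every fixed-size boundary after the insertion point shifts by one byte, no chunk of the modified data matches a chunk of the original, and the whole file would be stored again.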
Attic has been acclaimed by some as the Holy Grail of backups. It follows the same incremental backup model as Obnam, but embraces the variable-size chunking algorithm for better performance and better deduplication. Deletion of old backups is also supported. However, no cloud backends are implemented, as in Obnam. Although concurrent backups from multiple clients to the same storage are in theory possible by the use of locking, they are not recommended by the developer because chunk indices are kept in a local cache. Concurrent access is not only a convenience; it is a necessity for better deduplication. For instance, if multiple machines with the same OS installed can back up their entire drives to the same storage, only one copy of the system files needs to be stored, greatly reducing the storage space regardless of the number of machines. Attic still adopts the traditional approach of using a centralized indexing database to manage chunks, and relies heavily on caching to improve performance. The presence of exclusive locking makes it hard to adapt to cloud storage APIs and reduces the level of deduplication.
restic is a more recent addition. It is worth mentioning here because, like Duplicacy, it is written in Go. It uses a format similar to the git packfile format, but not exactly the same. Multiple clients backing up to the same storage are still guarded by locks. A command to delete old backups is in the developer's plan. S3 storage is supported, although it is unclear how hard it is to support other cloud storage APIs because of the need for locking. Overall, it still falls in the same category as Attic. Whether it will eventually reach the same level as Attic remains to be seen.
The following table compares the feature lists of all these backup tools:
Feature/Tool | duplicity | bup | Obnam | Attic | restic | Duplicacy |
---|---|---|---|---|---|---|
Incremental Backup | Yes | Yes | Yes | Yes | Yes | Yes |
Full Snapshot | No | Yes | Yes | Yes | Yes | Yes |
Deduplication | Weak | Yes | Weak | Yes | Yes | Yes |
Encryption | Yes | Yes | Yes | Yes | Yes | Yes |
Deletion | No | No | Yes | Yes | No | Yes |
Concurrent Access | No | No | Exclusive locking | Not recommended | Exclusive locking | Lock-free |
Cloud Support | Extensive | No | No | No | S3 only | S3, GCS, Azure, Dropbox, Backblaze, Google Drive, OneDrive, and Hubic |
Snapshot Migration | No | No | No | No | No | Yes |
Duplicacy CLI is released under the Fair Source 5 License, which means it is free for individual users or any company or organization with fewer than 5 users. If your company or organization has 5 or more users, then a license for the actual number of users must be purchased from duplicacy.com.
A user is defined as the owner of any files to be backed up by Duplicacy. If you are an IT administrator who uses Duplicacy to back up files for your colleagues, then each colleague will be counted in the user limit permitted by the license.