Utility to archive Linux filesystems with libzip and bsdiff/bspatch. Please note this is still in development and highly experimental. The utility primarily writes zip files: it starts with a larger full archive and then potentially writes only timestamped deltas, so that each later archive/backup stays relatively small.
The main goals of this utility are:
- a relatively quick full backup creation
- fast backup updates/deltas over time
- ability to restore files with their original timestamps, ownership and permissions
- usage of a common container (zip) format
fsarchive first creates a zip archive of a given set of directories/files. For example:
./fsarchive -a /archive/path /home/user1
This creates timestamped archive(s) under /archive/path for all the content of /home/user1 (not following symlinks). When the command completes, there is a zip file under /archive/path with a name such as fsarc_20230120_221035.zip. The first time the command runs, this file is quite sizeable, because it contains all the files/directories under /home/user1.
One week later, we want to update the archive/backup, so we execute the same command again:
./fsarchive -a /archive/path /home/user1
A new file such as fsarc_20230127_190136.zip is created under /archive/path, this time containing only the deltas and the new files, i.e. what has changed in the week since the original archive.
This mechanism works for as many archives/snapshots as are created over time.
Every entry in a delta archive is either a full new file or a binary patch created through bsdiff/bspatch. In the latter case, only the patch is saved for the changed file, which reduces the space the delta archive requires.
You need to have the standard gcc/g++ and libzip-dev installed (for example, on Ubuntu: sudo apt install libzip-dev) and then invoke
make -j32 # replace 32 with your number of CPU cores
or
make release -j32 # replace 32 with your number of CPU cores
Then you can copy the fsarchive executable to a $PATH location of your choosing.
As printed by the --help option:
Usage: ./fsarchive [options] dir1 dir2 ...
Executes fsarchive 0.3.2
Archive options
-a, --archive (dir)       Archives all input files (dir1, dir2, ...) and directories inside
                          (dir)/fsarchive_<timestamp>.zip and/or updates existing archives generating a new
                          and/or delta (dir)/fsarchive_<timestamp>.zip
    --comp-level (l)      Sets the compression level to (l) (from 1 to 9) where 1 is fastest and 9 is best.
                          0 is default
-f, --comp-filter (f)     Excludes files from being compresses; this option follows same format as -x option
                          and can be repeated multiple times; files matching such expressions won't be compressed
                          Files that are excluded from compression are also excluded from bsdiff deltas
    --no-comp             Flag to create zip files without any compression - default off
    --force-new-arc       Flag to force the creation of a new archive (-a option) even if a previous already
                          exists (i.e. no delta archive would be created)
-b, --use-bsdiff          When creating delta archives do store file differences as bsdiff/bspatch data
                          Please note this may be rather slow and memory hungry
-x, --exclude (str)       Excludes from archiving all the files/directories which match (str); if you want
                          to have a 'contain' search, do specify the "*(str)*" pattern (i.e. -x "*abc*"
                          will exclude all the files/dirs which contain the sequence 'abc').
                          If instead you want to specify a single token of characters, you can use '?'. This
                          wildcard is useful to specify specific directories/file names counts (i.e. the string
                          '/abc/?/?.jpg' will match all files/directories such as '/abc/d0/file0.jpg' but would
                          not match a name such as '/abc/def/d0/file0.jpg')
                          Please note that the only wildcards supported are * and ?, everything else will be
                          interpreted as a literal character.
                          You can specify multiple exclusions (i.e. -x ex1 -x ex2 ... )
    --size-filter (sz)    Set a maximum file size filter of size (sz); has to be a positive value (bytes) and
                          can have suffixes such as k, m and g to respectively interpret as KiB, MiB and GiB
-X, --builtin-excl        Flag to enable builtin exclusions; currently those are:
                            /home/?/.cache/*
                            /home/?/snap/firefox/common/.cache/*
                            /tmp/*
                            /dev/*
                            /proc/*
    --crc32-check         When creating delta archives, use CRC32 to establish if a file has changed, otherwise
                          only size and last modified timestamp will be used; the latter (no CRC32 check) is
                          default behaviour
Restore options
-r, --restore (arc)       Restores files from archive (arc) into current dir or ablsolute path if stored so
                          Specify -d to allow another directory to be the target destination for the restore
-d, --restore-dir (dir)   Sets the restore directory to this location
    --no-metadata         Do not restore metadata (file/dir ownership, permission and times)
Generic options
-v, --verbose             Set log to maximum level
    --dry-run             Flag to execute the command as indicated without writing/amending any file/metadata
    --help                Prints this help and exit
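The pattern rules that the help text gives for -x (and, by reference, -f) can be sketched as a small matcher. This is only an illustration under stated assumptions, not the code fsarchive uses: it assumes '*' matches any run of characters (including '/' and the empty run) and '?' matches one non-empty run of characters that contains no '/'. The names wildcard_match and match_here are hypothetical.

```cpp
#include <cassert>

// Recursive matcher with backtracking (hypothetical helper).
// Assumed semantics, inferred from the help text:
//   '*'  any run of characters, possibly empty, '/' included
//   '?'  one non-empty run of characters without '/'
//   everything else is a literal character
static bool match_here(const char* p, const char* s) {
    if (*p == '\0') return *s == '\0';
    if (*p == '*') {
        // Try every possible length for the run consumed by '*'.
        for (const char* t = s;; ++t) {
            if (match_here(p + 1, t)) return true;
            if (*t == '\0') return false;
        }
    }
    if (*p == '?') {
        // Consume at least one character and never cross a '/'.
        for (const char* t = s; *t != '\0' && *t != '/'; ++t) {
            if (match_here(p + 1, t + 1)) return true;
        }
        return false;
    }
    // Literal: characters must be equal (also fails when s is exhausted).
    return *p == *s && match_here(p + 1, s + 1);
}

bool wildcard_match(const char* pattern, const char* path) {
    assert(pattern != nullptr && path != nullptr);
    return match_here(pattern, path);
}
```

Under these assumptions, wildcard_match("/abc/?/?.jpg", "/abc/d0/file0.jpg") is true and the same pattern against "/abc/def/d0/file0.jpg" is false, which agrees with the examples in the help text. The backtracking is exponential in the worst case, which should be tolerable for short exclusion patterns.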
Brief descriptions of archive and delta creations
All zip files are created with the default compression options and the deflate algorithm. fsarchive leverages the zip format's extension mechanism to store metadata; specifically, for each file we store:
typedef struct _stat64 {
    mode_t   fs_mode;      // st_mode from lstat64
    uid_t    fs_uid;       // st_uid from lstat64
    gid_t    fs_gid;       // st_gid from lstat64
    uint32_t fs_type;      // fsarchive file type (new, unchanged, delta)
    time_t   fs_atime;     // st_atime from lstat64
    time_t   fs_mtime;     // st_mtime from lstat64
    time_t   fs_ctime;     // st_ctime from lstat64
    off64_t  fs_size;      // st_size from lstat64
    // fsarchive previous archive to find unchanged file or file to apply a patch
    // (can be recursive file1 --> patch0 --> patch1 ...)
    char     fs_prev[32];
} stat64_t;
In short, we save some fields from the output of lstat64 and add a couple of fsarchive-specific ones (see zip_fs.h). libzip (and the zip format in general) already saves some metadata, but it is less accurate than what lstat64 returns (some time values are off by a second), hence the lstat64 data is used.
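As a rough sketch of how such a record could be filled from lstat, consider the following. It is not taken from zip_fs.h: the struct fs_meta, the function fill_meta and the FS_TYPE_* values are hypothetical, and plain lstat/off_t are used in place of lstat64/off64_t (they coincide on 64-bit Linux).

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <ctime>

// Hypothetical mirror of the record described above.
struct fs_meta {
    mode_t   fs_mode;
    uid_t    fs_uid;
    gid_t    fs_gid;
    uint32_t fs_type;
    time_t   fs_atime;
    time_t   fs_mtime;
    time_t   fs_ctime;
    off_t    fs_size;
    char     fs_prev[32];
};

// Hypothetical type codes; the real values are not stated in this document.
enum : uint32_t { FS_TYPE_NEW = 0, FS_TYPE_UNCHANGED = 1, FS_TYPE_DELTA = 2 };

// Fills 'out' for 'path'; 'prev' is the name of the previous archive (may be null).
// Returns false when lstat fails, e.g. because the path does not exist.
bool fill_meta(const char* path, uint32_t type, const char* prev, fs_meta& out) {
    assert(path != nullptr);
    struct stat st;
    if (lstat(path, &st) != 0) return false;  // lstat does not follow symlinks
    std::memset(&out, 0, sizeof(out));
    out.fs_mode  = st.st_mode;
    out.fs_uid   = st.st_uid;
    out.fs_gid   = st.st_gid;
    out.fs_type  = type;
    out.fs_atime = st.st_atime;
    out.fs_mtime = st.st_mtime;
    out.fs_ctime = st.st_ctime;
    out.fs_size  = st.st_size;
    // Copy at most 31 characters; the memset above keeps the string terminated.
    if (prev != nullptr) std::strncpy(out.fs_prev, prev, sizeof(out.fs_prev) - 1);
    return true;
}
```

A fixed-size record like this can be stored as an opaque byte blob next to each zip entry, which is presumably why the previous-archive name is a bounded char array rather than a string.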
bsdiff/bspatch are used to diff and then re-create files (see fsarchive.cpp for more insight). This option is disabled by default; to enable it, specify -b or --use-bsdiff.
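To illustrate the recursive chain mentioned for fs_prev (file1 --> patch0 --> patch1 ...), here is a sketch of how a restore could work out which archives it needs for one file. It is a simplified in-memory model, not the logic in fsarchive.cpp: EntryType, Entry, Catalog and restore_plan are hypothetical names, and it assumes every non-new entry names the previous archive to look into.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

enum class EntryType { New, Unchanged, Delta };  // hypothetical

struct Entry {
    EntryType   type;
    std::string prev;  // previous archive name, empty for New
};

// archive name -> (file path -> entry)
using Catalog = std::map<std::string, std::map<std::string, Entry>>;

// Returns the archives holding data for 'file', oldest first: the full copy,
// then each patch in the order it must be applied. Empty when the chain is broken.
std::vector<std::string> restore_plan(const Catalog& cat,
                                      const std::string& archive,
                                      const std::string& file) {
    std::vector<std::string> plan;
    std::string cur = archive;
    // A valid chain visits each archive at most once, which bounds the loop.
    for (std::size_t hops = 0; hops < cat.size(); ++hops) {
        auto a = cat.find(cur);
        if (a == cat.end()) return {};
        auto e = a->second.find(file);
        if (e == a->second.end()) return {};
        const Entry& entry = e->second;
        // Unchanged entries carry no data, they only point further back.
        if (entry.type != EntryType::Unchanged) plan.push_back(cur);
        if (entry.type == EntryType::New) {
            assert(!plan.empty());
            std::reverse(plan.begin(), plan.end());
            return plan;
        }
        if (entry.prev.empty()) return {};
        cur = entry.prev;
    }
    return {};  // cycle or chain longer than the catalog
}
```

With such a plan, a restore would extract the full file from the first archive and then apply each patch in sequence, which is also why every archive in the chain has to remain available.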
Because of this binary patching, the memory requirements of fsarchive are potentially high: when creating/restoring archives, one should have available memory of at least 2x the size of the largest file being archived. For this reason, the options -x, --size-filter and -f are quite handy.
When running in bsdiff/bspatch mode, the patches are created in the /tmp filesystem (named /tmp/fsarc-bsdiff-XXXXXX); do ensure enough disk space is free there. The files are removed automatically upon program termination, but interrupting/killing the program may leave them on the filesystem.
By default this utility uses the file size and last modified time to determine whether two files are the same. Optionally, one can enable --crc32-check to also compare the CRC32 (used because it is inherently part of the zip format). This of course implies a longer delta archival time, because every file that is identical between the previous archive and the current delta has to be fully read and checksummed with CRC32.
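The comparison described above can be sketched as follows. This is an illustration only, not fsarchive's code: FileInfo, is_unchanged and crc32_of are hypothetical names, and the CRC is a plain bitwise implementation of the standard CRC-32 used by zip (reflected polynomial 0xEDB88320) rather than whatever routine the tool actually calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ctime>

// Bitwise CRC-32 as used by the zip format; slow but short.
uint32_t crc32_of(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; xor the polynomial in when the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// What the previous archive recorded for a file (hypothetical layout).
struct FileInfo {
    uint64_t size;
    time_t   mtime;
    uint32_t crc;
};

// 'current_crc' is a callable that reads the file and returns its CRC32.
// It is only invoked when size and mtime already match and the check is
// enabled, so the default mode never reads file contents.
template <typename CrcFn>
bool is_unchanged(const FileInfo& archived, uint64_t size, time_t mtime,
                  bool crc32_check, CrcFn current_crc) {
    if (archived.size != size || archived.mtime != mtime) return false;
    if (!crc32_check) return true;
    return archived.crc == current_crc();
}
```

Passing the CRC computation as a callable makes the cost visible: only files that already look identical by size and timestamp are read in full, which matches the extra time the option is said to imply.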
Archive all home directories, filtering out files larger than 16 GiB, forcing the creation of a new base archive, and excluding the content of the .cache subdirectories inside home:
sudo fsarchive --size-filter 16g --force-new-arc -x '/home/?/.cache/*' -a /archive/dir /home
Archive home directories, writing a delta if necessary, excluding caches and leaving some files uncompressed:
sudo fsarchive -f '*.jpg' -f '*.png' -x '/home/?/.cache/*' -a /archive/dir /home
Restore a given archive/snapshot under a new location instead of the original path:
sudo fsarchive -r /archive/dir/fsarc_20230110_000056.zip -d /my/new/location
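For reference, a value such as the 16g passed to --size-filter in the first example could be converted to bytes along these lines. This is a sketch under assumptions, not the parser fsarchive ships: parse_size is a hypothetical name, only the lowercase suffixes k, m and g named in the help text are accepted, and anything else is rejected.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Parses "<digits>[k|m|g]" into bytes, with k/m/g meaning KiB/MiB/GiB.
// Returns false for malformed, zero or overflowing input.
bool parse_size(const std::string& text, uint64_t& bytes) {
    // Reject empty input and anything not starting with a digit (so "-1" fails).
    if (text.empty() || text[0] < '0' || text[0] > '9') return false;
    char* end = nullptr;
    errno = 0;
    const unsigned long long value = std::strtoull(text.c_str(), &end, 10);
    assert(end != nullptr);  // strtoull always sets the end pointer
    if (errno != 0 || value == 0) return false;  // out of range, or not positive
    unsigned shift = 0;
    switch (*end) {
        case '\0': break;                      // plain bytes
        case 'k': shift = 10; ++end; break;    // KiB
        case 'm': shift = 20; ++end; break;    // MiB
        case 'g': shift = 30; ++end; break;    // GiB
        default: return false;
    }
    if (*end != '\0') return false;                    // trailing garbage
    if (value > (UINT64_MAX >> shift)) return false;   // would overflow
    bytes = static_cast<uint64_t>(value) << shift;
    return true;
}
```

With this sketch, "16g" yields 17179869184 bytes (16 * 2^30), while inputs such as "0", "-1" or "12x" are rejected.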
Thanks to:
- libzip to manage zip archives
- bsdiff/bspatch to create binary diff/patches