/arx

Store files and directory in an archive. Like tar, but faster and with direct random access.

Primary LanguageRustMIT LicenseMIT

What is arx

Arx is a file archive format based on the jubako container format.

It allow you to create, read, extract file archive (as zip or tar does).

Arx (and Jubako) is in active development. While it works pretty well, I do not recommand to use it to do backups. However, you can use it to transfer data or explore archives.

How it works

Jubako is a versatile container format, allowing to store data, compressed or not in a structured way. It main advantage (apart from its versability) is that is designed to allow quick retrieval of data fro the archive without needing to uncompress the whole archive.

Arx use the jubako format and create arx archive which:

  • Store file's data compressed.
  • Store files using a directory/tree structure.
  • Can do random access on the arx archive to get a specific files
  • Allow to mount the archive to explore and use (read only) the files in the archive without decompressing it.

Install arx

Binaries for Windows, MacOS and Linux are available for every release. You can also install arx using Cargo:

cargo install arx

Use arx

Create an archive

Creating an archive is simple :

arx create -o my_archive.arx -r my_directory

It will one file : my_archive.arx, which will contains the my_directory directory.

Extract an archive

Extracting (decompressing) an archive is done with :

arx extract my_archive.arx -C my_out_dir

Listing the content of an archive

You can list the content of (the list of files in) the archive with :

arx list my_archive.arx

And if you want to access to the content of only one file :

arx dump my_archive.arx my_directory/path/to/my_file > my_file
# or
arx dump my_archive.arx my_directory/path/to/my_file -o my_file

Mounting the archive

On linux, you can mount the archive using fuse.

mkdir mount_point
arx mount my_archive.arx mount_point

If you don't provide a mount_point, arx will create a temporary one for you

arx mount my_archive.arx # Will create my_archive.arx.xxxxxx

arx will be running until you unmount mount_point.

Converting a zip archive to an arx archive

zip2arx -o my_archive.arx my_zip_archive.zip

Converting a tar archive to an arx archive

tar2arx -o my_archive.arx my_tar_archive.tar.gz

or

tar2arx -o my_archive.arx https://example.com/my_tar_archive.tar.gz

Performance

The following compare the performance of Arx to different archive formats.

  • Arx, Tar, Squasfs is compressed the content using zstd, level 5.
  • Zip is compressed using level 9
  • Fs is FileSystem (no archive). Archive creation and extraction is simulated with cp -a.

Tests has been done on different data sets :

  • the whole linux kernel (linux-5.19)
  • the drivers directory in linux kernel
  • the document directory in the linux kernel

Source directory is stored on a sdd. All test are run on a tmpfs (archive and extracted files are stored in memory).

Mount diff time is the time to diff the mounted archive with the source directory

arx mount archive.arx mount_point &
time diff -r mount_point/linux-5.19 linux-5.19
umount mount_point

Mounting the tar and zip archive is made with archivemount tool. Squashfs is mounted using kernel. SquashfsFuse is mounted using fuse API. Arx mount is implemented using fuse API.

Linux doc

Documentation directory only of linux source code:

Type Creation Size Extract Listing Mount diff Dump
Arx 150ms963μs 11.10 MB 038ms395μs 004ms051μs 299ms764μs 005ms618μs
FS 150ms639μs 38.45 MB 106ms821μs 006ms962μs 077ms414μs 498μs
Squashfs 103ms076μs 10.60 MB 098ms787μs 005ms365μs 261ms533μs 002ms088μs
SquashfsFuse 097ms863μs 10.60 MB - - 748ms597μs -
Tar 141ms079μs 9.68 MB 065ms744μs 041ms015μs 02m41s 042ms143μs
Zip 01s083ms 15.22 MB 388ms720μs 037ms044μs 03m06s 014ms088μs

This is the ratio time / Arx time. A ratio greater than 100% means Arx is better.

Type Creation Size Extract Listing Mount diff Dump
FS 100% 346% 278% 172% 26% 9%
Squashfs 68% 95% 257% 132% 87% 37%
SquashfsFuse 65% 95% - - 250% -
Tar 93% 87% 171% 1012% 53997% 750%
Zip 718% 137% 1012% 914% 62350% 251%

Linux Driver

Driver directory only of linux source code:

Type Creation Size Extract Listing Mount diff Dump
Arx 01s060ms 98.23 MB 241ms699μs 009ms516μs 01s290ms 007ms193μs
FS 778ms095μs 799.02 MB 523ms191μs 021ms578μs 467ms559μs 495μs
Squashfs 829ms886μs 121.70 MB 435ms851μs 012ms289μs 01s629ms 002ms190μs
SquashfsFuse 829ms237μs 121.70 MB - - 03s823ms -
Tar 911ms042μs 97.96 MB 515ms178μs 472ms060μs - 504ms231μs
Zip 20s498ms 141.91 MB 03s665ms 098ms194μs - 034ms481μs

This is the ratio time / Arx time. A ratio greater than 100% means Arx is better.

Type Creation Size Extract Listing Mount diff Dump
FS 73% 813% 216% 227% 36% 7%
Squashfs 78% 124% 180% 129% 126% 30%
SquashfsFuse 78% 124% - - 296% -
Tar 86% 100% 213% 4961% - 7010%
Zip 1932% 144% 1516% 1032% - 479%

Linux Source Code

Type Creation Size Extract Listing Mount diff Dump
Arx 02s104ms 170.97 MB 435ms846μs 022ms238μs 02s829ms 010ms613μs
FS 01s605ms 1.12 GB 01s046ms 043ms358μs 943ms546μs 493μs
Squashfs 01s430ms 201.43 MB 725ms532μs 024ms050μs 03s272ms 002ms374μs
SquashfsFuse 01s417ms 201.43 MB - - 13s864ms -
Tar 01s479ms 168.77 MB 938ms758μs 799ms550μs - 802ms427μs
Zip 31s810ms 252.96 MB 06s260ms 256ms137μs - 045ms722μs

This is the ratio time / Arx time. A ratio greater than 100% means Arx is better.

Type Creation Size Extract Listing Mount diff Dump
FS 76% 674% 240% 195% 33% 5%
Squashfs 68% 118% 166% 108% 116% 22%
SquashfsFuse 67% 118% - - 490% -
Tar 70% 99% 215% 3595% - 7561%
Zip 1511% 148% 1436% 1152% - 431%

The kernel compilation is the time needed to compile the whole kernel with the default configuration (-j8). For arx, we are compiling the kernel using the source in the archive mounted in mount_point.

Kernel compilation is made is "real" condition. Source or arx archive are stored on ssd.

Type Compilation
Arx 40m
FS 32m

Arx archive are a bit bigger (about 1%) than tar.zst archive but 15% smaller that squashfr. Creation and full extraction time are a bit longer for arx but times are comparable.

Listing files ar accessing individual files from the archive is far more rapid using arx or squash. Access time is almost constant indpendently of the size of the archive. For tar however, time to access individual file is greatly increasing when the archive size is increasing.

Mounting a arx archive make the archive usable without extracting it. A simple diff -r takes 4 more time than a plain diff between two directories but it is a particular use case (access all files "sequentially" and only once).

But for linux documentation arx is 444 time quicker than tar (several hours). The bigger the tar archive is the bigger is this ratio. I haven't try to do a mount-diff for the full kernel.

For kernel compilation, the overhead is about 25%. But on the opposite side, you can compile the kernel without storing 1.3GB of source on your hard drive.