GABAC

Genomics-oriented Context-Adaptive Binary Arithmetic Coding

Before using GABAC, please read USAGE_POLICY.md.


Quick start on Linux

Build the GABAC library and the gabacify application:

mkdir build
cd build
cmake ..
make gabacify

Perform a test roundtrip from the build directory:

./gabacify encode -i ../resources/input_files/one_mebibyte_random
./gabacify decode -i ../resources/input_files/one_mebibyte_random.gabac_bytestream
diff ../resources/input_files/one_mebibyte_random ../resources/input_files/one_mebibyte_random.gabac_uncompressed

Comparing GABAC to other codecs

The Bash script scripts/perform_codec_comparison.sh compares the performance of GABAC to that of other tools. It compresses and decompresses a test file using gzip, bzip2, xz (which implements the LZMA algorithm), rANS order 0 and rANS order 1 (see https://github.com/voges/rans.git), and gabacify. For example, run the following command from the scripts directory:

./perform_codec_comparison.sh ../resources/input_files/one_mebibyte_random

The compression and decompression times, the maximum RAM usage, and the compressed file sizes will be logged in the file ../resources/input_files/one_mebibyte_random.codec_stats.

NOTE: gabacify is designed to run on pieces of data whose sizes are below 1 GB. The entire input file is read into memory and several buffers are allocated. The estimated RAM usage for compressing a 1 GB file lies between 10 GB and 30 GB.
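
Since gabacify targets inputs below 1 GB, a larger file would have to be cut into pieces first. The following is only a sketch and not part of GABAC: the helper name split_input, the .part_ suffix, the 512 MiB default, and the file name large_file are assumptions made for illustration. It relies only on the standard split utility and on the gabacify encode -i invocation from the quick start.

```shell
# Hypothetical helper (not part of GABAC): split INPUT into pieces of at most
# CHUNK_BYTES bytes, written next to the input as INPUT.part_aa, INPUT.part_ab, ...
split_input() {
    local input="$1"
    # Default of 512 MiB is an arbitrary choice that stays below 1 GB.
    local chunk_bytes="${2:-536870912}"
    split -b "$chunk_bytes" "$input" "$input.part_"
}

# Possible usage from the build directory (large_file is a made-up name):
#   split_input ../resources/input_files/large_file
#   for piece in ../resources/input_files/large_file.part_*; do
#       ./gabacify encode -i "$piece"
#   done
```

Because split names the pieces in lexicographic order, concatenating the decoded pieces in glob order should restore the original byte order. Each piece is compressed independently in this sketch, so the compression ratio may differ from that of a single run.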

Continuous integration

Commits to this repository are continuously tested on Travis CI (https://travis-ci.org/voges/gabac). Take a look at .travis.yml to see what is being done on Travis' (virtual) machines.

Build system

We use CMake (https://cmake.org/) as our build system and provide a CMakeLists.txt to build GABAC.

Version control system

Branching

We use Git with the Gitflow workflow (https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow).

That means:

  • The master branch contains only release commits.
  • Every commit on the master branch is tagged according to Semantic Versioning 2.0.0 (see below).
  • The develop branch is the integration branch for ongoing development.
  • Actual development takes place in feature branches, e.g., feature/my_fancy_feature.
  • Once a feature is completed, its branch can be merged back into the develop branch.
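
The feature-branch cycle above can be sketched with plain Git commands. The helper names start_feature and finish_feature are hypothetical. The --no-ff merge, which always records a merge commit, is a common Gitflow convention that we assume here; this repository does not mandate it.

```shell
# Hypothetical helpers illustrating the feature-branch cycle; run them inside
# a clone that already has a develop branch.

# Create feature/<name> from the current tip of develop.
start_feature() {
    git checkout develop &&
    git checkout -b "feature/$1"
}

# Merge feature/<name> back into develop and delete the feature branch.
# --no-ff keeps an explicit merge commit (an assumed convention);
# --no-edit accepts the default merge message without opening an editor.
finish_feature() {
    git checkout develop &&
    git merge --no-ff --no-edit "feature/$1" &&
    git branch -d "feature/$1"
}

# Possible usage:
#   start_feature my_fancy_feature
#   ... edit, git add, git commit ...
#   finish_feature my_fancy_feature
```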

Version numbers

We use Semantic Versioning 2.0.0 (https://semver.org).

That means:

  • The release version number format is: MAJOR.MINOR.PATCH
  • Increment the
    • MAJOR version when you make incompatible API changes,
    • MINOR version when you add functionality in a backwards-compatible manner, and
    • PATCH version when you make backwards-compatible bug fixes.
  • Pre-release versions are denoted by appending a hyphen and a series of dot-separated identifiers immediately following the patch version.
    • Example 1: 1.0.0-alpha.1 ("alpha version 1 of the planned major release 1")
    • Example 2: 1.0.0-beta.1 ("beta version 1 of the planned major release 1")
    • Example 3: 1.0.0-rc.1 ("release candidate (rc) 1 of the planned major release 1")
    • Note: 1.0.0-alpha.1 < 1.0.0-beta.1 < 1.0.0-rc.1 by definition (see https://semver.org)
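
The increment rules above can be sketched as a small shell function. The name bump_version is hypothetical and not part of this repository. The sketch only handles plain MAJOR.MINOR.PATCH input without leading zeros and deliberately rejects pre-release suffixes such as -rc.1. Resetting the lower components to zero follows the Semantic Versioning 2.0.0 specification.

```shell
# Hypothetical helper: print the next release number for a plain
# MAJOR.MINOR.PATCH version. PART is one of: major, minor, patch.
bump_version() {
    local version="$1"
    local part="$2"
    local major minor patch

    # Accept only digits and exactly two dots; pre-release input is out of scope.
    case $version in
        "" | *[!0-9.]* | *.*.*.*)
            echo "expected MAJOR.MINOR.PATCH, got '$version'" >&2
            return 1 ;;
    esac

    # Split the version at the dots.
    IFS=. read -r major minor patch <<EOF
$version
EOF
    if [ -z "$major" ] || [ -z "$minor" ] || [ -z "$patch" ]; then
        echo "expected MAJOR.MINOR.PATCH, got '$version'" >&2
        return 1
    fi

    case $part in
        major) major=$((major + 1)); minor=0; patch=0 ;;  # incompatible API change
        minor) minor=$((minor + 1)); patch=0 ;;           # backwards-compatible feature
        patch) patch=$((patch + 1)) ;;                    # backwards-compatible bug fix
        *) echo "unknown part: '$part'" >&2; return 1 ;;
    esac

    echo "$major.$minor.$patch"
}

# Possible usage:
#   bump_version 1.2.3 minor   # prints 1.3.0
#   bump_version 1.2.3 major   # prints 2.0.0
```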

Who do I talk to?

Jan Voges <voges@tnt.uni-hannover.de>

Tom Paridaens <tom.paridaens@ugent.be>

Mikel Hernaez <mhernaez@illinois.edu>