
An improved compression tool for DNA sequences

GeCo2

Compress and analyze genomic sequences. As a compression tool, GeCo2 provides additional compression gains over several top specialized tools. As an analysis tool, it can determine absolute measures, namely for many distance computations, and local measures, such as the information content of each element, providing a way to quantify and locate specific genomic events. GeCo2 supports:
  • reference-free compression
  • referential compression
    • relative compression
    • conditional compression

INSTALLATION

Conda

Install Miniconda, then run the following:

conda install -y -c bioconda geco2

Otherwise, CMake is needed for installation (http://www.cmake.org/). You can download it directly from http://www.cmake.org/cmake/resources/software.html or use an appropriate package manager. The following instructions show how to install, compile and run GeCo2:

STEP 1

Download, install and resolve conflicts.

Linux

#sudo apt-get install cmake git
git clone https://github.com/pratas/geco2.git
cd geco2/src/
cmake .
make

Alternatively, you can install without CMake (Linux only) using:

wget https://github.com/pratas/geco2/archive/master.zip
unzip master.zip
cd geco2-master/src/
mv Makefile.linux Makefile
make

macOS

If you do not already have Homebrew, install it:

ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"

Then type:

brew install cmake
brew install wget
brew install gcc48
wget https://github.com/pratas/geco2/archive/master.zip
unzip master.zip
cd geco2-master/src/
cmake .
make

With some versions, you might need to create a link to cc or gcc (after the brew install gcc48 command), namely:

sudo mv /usr/bin/gcc /usr/bin/gcc-old   # gcc backup
sudo mv /usr/bin/cc /usr/bin/cc-old     # cc backup
sudo ln -s /usr/bin/gcc-4.8 /usr/bin/gcc
sudo ln -s /usr/bin/gcc-4.8 /usr/bin/cc

In some versions, gcc48 is installed under /usr/local/bin, so you might need to replace the last two commands with the following two:

sudo ln -s /usr/local/bin/gcc-4.8 /usr/bin/gcc
sudo ln -s /usr/local/bin/gcc-4.8 /usr/bin/cc

Windows

On Windows, use Cygwin (https://www.cygwin.com/) and make sure that the installation includes: cmake, make, zcat, unzip, wget, tr, grep (and any dependencies). If you install the complete Cygwin package, all of these will be installed. After that, all steps are the same as on Linux.
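
Before building, it can help to confirm that the tools listed above are actually on the PATH. The following is a minimal sketch, not part of GeCo2: the function name missing_tools is hypothetical, and the check relies only on the POSIX command -v builtin. It should behave the same way under Cygwin, Linux and macOS.

```shell
#!/bin/sh
# missing_tools TOOL...
# Hypothetical helper (not shipped with GeCo2): prints the name of every
# TOOL that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tool list taken from the Windows paragraph above.
missing=$(missing_tools cmake make zcat unzip wget tr grep)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are printed on one line.
  echo "Missing tools:" $missing
else
  echo "All build prerequisites found."
fi
```

If anything is reported as missing, install it through the Cygwin setup program (or your package manager) before running cmake and make.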

EXECUTION

Run GeCo2

Run GeCo2 using (lazy) level 5:

./GeCo2 -v -l 5 File.seq
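
To judge the result of a run, a common measure is bits per base: eight times the compressed size in bytes divided by the number of bases. The sketch below is an illustration, not part of GeCo2. The helper name bits_per_base is hypothetical, and it assumes the input file stores exactly one base per byte (no headers or newlines). The name of the compressed file is passed explicitly; that GeCo2 writes it as File.seq.co is an assumption here, so check the program output for the actual name.

```shell
#!/bin/sh
# bits_per_base ORIGINAL COMPRESSED
# Hypothetical helper: prints 8 * size(COMPRESSED) / size(ORIGINAL)
# with three decimals. Assumes ORIGINAL holds one base per byte.
bits_per_base() {
  [ -f "$1" ] && [ -f "$2" ] || {
    echo "usage: bits_per_base ORIGINAL COMPRESSED" >&2
    return 1
  }
  # Arithmetic expansion strips the padding some wc versions add.
  orig_bytes=$(($(wc -c < "$1")))
  comp_bytes=$(($(wc -c < "$2")))
  [ "$orig_bytes" -gt 0 ] || { echo "empty input: $1" >&2; return 1; }
  LC_ALL=C awk -v o="$orig_bytes" -v c="$comp_bytes" \
    'BEGIN { printf "%.3f\n", 8 * c / o }'
}

# Example call (the .co file name is an assumption):
#   bits_per_base File.seq File.seq.co
```

Values below 2 indicate that the output is smaller than a plain 2-bit encoding of the four bases.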

PARAMETERS

To see the possible options, type:

./GeCo2

or

./GeCo2 -h

If you are not interested in setting the template for each model, then use the levels mode. To see the possible levels type:

./GeCo2 -s
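
Since each level is a different pre-computed set of models, one practical approach is to try a few levels on the same file and keep the one with the smallest output. The following is a rough sketch under stated assumptions, not a GeCo2 feature: the function best_level is hypothetical, it uses only the -l option shown above, and it assumes the compressed file is written next to the input with a .co suffix. Choose the level numbers from the list printed by ./GeCo2 -s.

```shell
#!/bin/sh
# best_level GECO2_BIN FILE LEVEL...
# Hypothetical helper: runs GECO2_BIN -l LEVEL FILE for each LEVEL,
# reports each output size on stderr, and prints the level with the
# smallest output on stdout.
# Assumption: the compressed output is written as FILE.co.
best_level() {
  bin=$1
  file=$2
  shift 2
  best=""
  best_size=""
  for lvl in "$@"; do
    rm -f "$file.co"                      # avoid measuring a stale file
    "$bin" -l "$lvl" "$file" >/dev/null 2>&1 || continue
    [ -f "$file.co" ] || continue
    size=$(($(wc -c < "$file.co")))
    echo "level $lvl: $size bytes" >&2
    if [ -z "$best_size" ] || [ "$size" -lt "$best_size" ]; then
      best=$lvl
      best_size=$size
    fi
  done
  [ -n "$best" ] && echo "$best"
}

# Example call (levels chosen arbitrarily for illustration):
#   best_level ./GeCo2 File.seq 1 3 5
```

Higher levels generally trade time and memory for compression, so the smallest output is not always the best choice for large inputs.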

CITATION

If you use this software/method, please cite:

Pratas, Diogo, Morteza Hosseini, and Armando J. Pinho. "GeCo2: An Optimized Tool for Lossless Compression and Analysis of DNA Sequences." International Conference on Practical Applications of Computational Biology & Bioinformatics. Springer, Cham, 2019.

NEW FEATURES IN VERSION 2

  1. Specific Gamma for each model;
  2. Specific Cache-hash sizes;
  3. Mode to run with inverted repeats only;
  4. New interface layout (ASCII);
  5. New approximate power function;
  6. Optimized functions;
  7. 15 new pre-computed modes for reference-free compression.

ISSUES

For any issue, let us know via the repository's issues page.

LICENSE

GPL v3.

For more information:

http://www.gnu.org/licenses/gpl-3.0.html