MetaCRAM

Lossless compression tool for metagenomic reads

##Requirements and Dependencies

MetaCram is suitable for all unix-like operating systems with perl installation with perl modules 1) File::Slurp 2) Array::Utils 3) Capture::Tiny.

MetaCram uses the following software packages:

Kraken - available from https://ccb.jhu.edu/software/kraken/
- download minikraken database
Bowtie2 - available from http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
BLAST - available from http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download
- download nt database.
IDBA - available from http://i.cs.hku.hk/~alse/hkubrg/projects/idba/
CRAM - 3 jar files packaged together with the script
- ExtendedGolombCRAM.jar
- GolombCRAM.jar
- HuffmanCRAM.jar
SAMtools - available from samtools.sourceforge.net
MFCompress - available from http://bioinformatics.ua.pt/software/mfcompress/

You will need to download and install these packages before running MetaCRAM.

MetaCram also makes use of the bacterial genome library from NCBI, which can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/all.fna.tar.gz. After decompressing the file, we need to combine multiple strands into one folder.

Alternatively, we recommend that you download our genus database in the specific format needed by MetaCRAM, called "genus_db_final.tgz", from https://uofi.box.com/s/sy92jv098eb0e4s277vuiod72m2jjf7c

##Commands to run MetaCRAM

Compression

perl MetaCram.pl --compress --output <output directory> --paired <path to reads> --<exGolomb, huffman, golomb>

Example:

[shared3]$ perl MetaCram.pl --compress --output /shared3/MetaCRAM_SRR359032_Huffman --paired /shared3/SRR359032_1.fasta /shared3/ SRR359032_2.fasta --huffman &>MetaCramLOG_SRR359032_Huffman.txt

Decompression

perl MetaDeCram.pl --input <path to folder containing the Round1 and Round2 folders>

Example:

[shared3]$ perl MetaDeCram.pl --input /shared3/MetaCRAM_processedSRR359032_Huffman/MetaCRAM &>decompressorLogSRR359032_Huffman.txt

(*--paired is optional)

(*<> indicates a choice)

(* to log, append “&> ”)

(Path to directories which contain various databases need to be pre-specified)

##Options For more options, run perl MetaCram.pl --help.

##Contact Minji Kim (mkim158@illinois.edu) and Xiejia Zhang (xzhan121@illinois.edu)

x-jesse-zhang/MetaCRAM

MetaCRAM