SHOGUN

SHallow shOtGUN profiler

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Shallow shotgun sequencing

Shallow seq pipeline for optimal shotgun data usage

Installation

These installation instructions are written for Linux and macOS. SHOGUN can also be installed on Windows with a few minor tweaks to this tutorial. The package requires Anaconda, a system-agnostic package and virtual environment manager. Follow the installation instructions for your system at http://conda.pydata.org/miniconda.html.

The Easy Way

Once Anaconda is installed, download the environment file:

wget https://raw.githubusercontent.com/knights-lab/SHOGUN/master/environment.yml

Then install the requirements into the environment 'shogun':

conda env create -f environment.yml

The Harder Way

Once Anaconda is installed, create an environment:

conda create -n shogun python=3

Now activate the environment.

# OSX, Linux
source activate shogun

With the shogun environment activated, install the development versions of the SHOGUN toolchain.

# If you want to use bowtie2
conda install -c bioconda bowtie2

# NINJA-utils
pip install git+https://github.com/knights-lab/NINJA-utils.git --no-cache-dir --upgrade

# DOJO
pip install git+https://github.com/knights-lab/DOJO.git --no-cache-dir --upgrade

# SHOGUN
pip install git+https://github.com/knights-lab/SHOGUN.git --no-cache-dir --upgrade

Because of the flags passed to pip, rerunning any of these commands redoes that installation from scratch, so you can simply paste a command again if an earlier attempt failed.

If you are installing SHOGUN for BugBase, you are done. The database is provided for you.
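To confirm the installation before moving on, a quick check that the commands used later in this README are on your PATH can help. The sketch below is only an illustration: `SHOGUN_TOOLS` and `missing_tools` are hypothetical helper names, and the list assumes the commands are installed as executables in the active environment.

```python
import shutil

# Command names taken from this README; bowtie2 is only needed
# if you use the bowtie2 workflow.
SHOGUN_TOOLS = [
    "bowtie2",
    "shogun_bt2_db",
    "shogun_bt2_lca",
    "shogun_utree_db",
    "shogun_utree_lca",
    "shogun_functional",
]


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(SHOGUN_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All SHOGUN commands were found.")
```

Run it with the shogun environment activated. Anything reported as missing probably means the matching install command above needs to be rerun.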

Building a Database

Next, to test the installation, download the test data.

wget -O shogun_test_files.zip 'https://www.dropbox.com/s/b5w4xe08x7snm93/shogun_test_files.zip?dl=1'

Extract the folder using your favorite extraction utility.

7z x <downloaded file>

Next, create the database.

shogun_bt2_db -i ./test.hmp_species.fna -x '>, '

This will take some time because the DOJO software lazily loads the NCBI Taxonomy. Then run LCA:

shogun_bt2_lca -i ./mock_communities -b ./annotated/bt2/test.hmp_species

The taxonomy counts will be written to taxon_counts.csv 🐱‍👤
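To take a first look at the output, something like the following sketch may be enough. It assumes only that taxon_counts.csv is comma-separated with a header row; the actual column layout is not documented here, and `summarize_counts` is a hypothetical helper name.

```python
import csv


def summarize_counts(path):
    """Return (header, number_of_data_rows) for a comma-separated file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        # The first row is assumed to be a header.
        header = next(reader, [])
        # Count the remaining non-empty rows.
        n_rows = sum(1 for row in reader if row)
    return header, n_rows
```

Calling `summarize_counts("taxon_counts.csv")` returns the column names and the number of data rows, which is a cheap way to see that the run produced something non-empty.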

To run it with UTree, build the database:

shogun_utree_db -i ./test.hmp_species.fna -x '>, '

Then run LCA:

shogun_utree_lca -i ./mock_communities -u ./annotated/utree/test.hmp_species.ctr

Introduction to Functional Profiling

As of 1/10/17, the only supported functional profiling is through bowtie2 and the IMG database.

# Align reads to IMG
# Input directory has one FASTA (.fna extension) file per sample
shogun_functional -i <input directory> -o <output directory> -l False -b /project/flatiron/tonya/img_bowtie_builds/img.gene.bacteria.bowtie

# Input is a folder filled with SAM files, one SAM file per sample
kegg_parse_img_ids -i <output folder from shogun_functional> -o <location of the kegg.csv file>

# Input file is the KEGG csv file from the kegg_parse_img_ids
# -m is the mapping file from IMG gene IDs to KOs (the img-gene-ko-map file)
kegg_predictions -i <kegg.csv file> -o <final output> --algorithm intersection -m /project/flatiron/data/img/img-gene-ko-map.txt
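The three steps above feed into each other: the SAM output directory of the first is the input of the second, and the KEGG csv of the second is the input of the third. The sketch below makes that chaining explicit and lists the samples found in the input directory. It is only an illustration: `fna_samples` and `functional_commands` are hypothetical helper names, the flags are copied from the commands above, and it assumes the `-o` value of the second step is the path of the kegg.csv file read by the third.

```python
from pathlib import Path


def fna_samples(input_dir):
    """Return sample names (file stems) for the .fna files in input_dir."""
    return sorted(p.stem for p in Path(input_dir).glob("*.fna"))


def functional_commands(input_dir, sam_dir, kegg_csv, final_output, bt2_index, ko_map):
    """Build the three command lines from this section as argument lists.

    Nothing is executed here; the lists only mirror the commands shown above.
    """
    return [
        # Step 1: align reads to IMG, one SAM file per sample goes to sam_dir.
        ["shogun_functional", "-i", str(input_dir), "-o", str(sam_dir),
         "-l", "False", "-b", str(bt2_index)],
        # Step 2: parse the SAM files into the KEGG csv.
        ["kegg_parse_img_ids", "-i", str(sam_dir), "-o", str(kegg_csv)],
        # Step 3: predict from the KEGG csv with the IMG gene to KO map.
        ["kegg_predictions", "-i", str(kegg_csv), "-o", str(final_output),
         "--algorithm", "intersection", "-m", str(ko_map)],
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order, so that a failure in one step stops the later ones. Checking that `fna_samples(input_dir)` is non-empty first avoids starting an alignment on a directory with no `.fna` files.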