HoCoRT - Remove specific organisms from sequencing reads

Primary language: Python. License: MIT.

HoCoRT

BMC Bioinformatics - HoCoRT: host contamination removal tool
bioRxiv - HoCoRT: Host contamination removal tool

Host Contamination Removal Tool (HoCoRT)

Removes specific organisms from sequencing reads on Linux.

Supports un-/paired FastQ input. Outputs in FastQ format.

Dependencies

Python 3.7+
External programs:

Installing with Bioconda

To install with Bioconda run the following command:

conda install -c conda-forge -c bioconda hocort

HoCoRT's dependencies may conflict with existing packages. This can be solved by installing HoCoRT in a separate environment. To create a new conda environment and install HoCoRT run the following command:

conda create -n hocort -c conda-forge -c bioconda hocort

Manual Installation

First ensure that there is no conda environment called "hocort".
Now download the necessary files:

wget https://raw.githubusercontent.com/ignasrum/hocort/main/install.sh && wget https://raw.githubusercontent.com/ignasrum/hocort/main/environment.yml

After downloading the files, run the installation bash script to install HoCoRT:

bash ./install.sh

Once the installation has finished, activate the Conda environment:

conda activate hocort

Using HoCoRT

Pipeline naming

Pipelines are named after the tools they utilize. For example, the pipeline bowtie2 uses Bowtie2 to map the reads, and kraken2bowtie2 first classifies using Kraken2, then maps using Bowtie2.

Building indexes

Indexes are required to map sequences and may be built either manually or with "hocort index", which simplifies the process. A Bowtie2 index may be built using "hocort index" with the following command:

hocort index bowtie2 --input genome.fasta --output dir/basename

To remove multiple organisms from sequencing reads, the input FASTA file should contain the genomes of all of them. The genomes can be concatenated into a single file:

cat genome1.fasta genome2.fasta > combined.fasta
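The same concatenation can be sketched in Python, which may be convenient when the index is prepared from a script. This is a minimal sketch and not part of HoCoRT: the function names `combine_fasta` and `count_records` are hypothetical. Unlike plain `cat`, it inserts a newline when an input file does not end with one, so that the next FASTA header always starts on its own line.

```python
def combine_fasta(inputs, output):
    """Concatenate FASTA files into one file (hypothetical helper)."""
    with open(output, "w") as out:
        for path in inputs:
            last = "\n"
            # Stream line by line so large genomes are never held in memory.
            with open(path) as src:
                for line in src:
                    out.write(line)
                    last = line
            # Guard against a missing trailing newline in the input file.
            if not last.endswith("\n"):
                out.write("\n")
    return output


def count_records(path):
    """Count FASTA records by counting header lines starting with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, `combine_fasta(["genome1.fasta", "genome2.fasta"], "combined.fasta")` writes the combined file, and `count_records("combined.fasta")` gives a quick check that no sequence was lost.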

Paired end run

To map reads and output mapped/unmapped reads use the following command:

hocort map bowtie2 -x dir/basename -i input1.fastq input2.fastq -o out1.fastq out2.fastq

Single end run

Exactly as above, but with one input file and one output file.

hocort map bowtie2 -x dir/basename -i input1.fastq -o out1.fastq
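When HoCoRT is called from a script, the two command forms above can be assembled by one small helper. The sketch below is an illustration only: `build_map_command` and `run_map` are hypothetical names, and the helper uses just the `-x`, `-i` and `-o` arguments shown in the commands above. Actually running it assumes that `hocort` is on the PATH and that the index already exists.

```python
import subprocess


def build_map_command(index, inputs, outputs, pipeline="bowtie2"):
    """Build the argument list for `hocort map` (hypothetical helper)."""
    # One file means a single-end run, two files mean a paired-end run.
    if len(inputs) not in (1, 2) or len(inputs) != len(outputs):
        raise ValueError("expected one or two inputs and as many outputs")
    return ["hocort", "map", pipeline, "-x", index,
            "-i", *inputs, "-o", *outputs]


def run_map(index, inputs, outputs, pipeline="bowtie2"):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    command = build_map_command(index, inputs, outputs, pipeline)
    return subprocess.run(command, check=True)


paired = build_map_command("dir/basename",
                           ["input1.fastq", "input2.fastq"],
                           ["out1.fastq", "out2.fastq"])
single = build_map_command("dir/basename", ["input1.fastq"], ["out1.fastq"])
print(" ".join(paired))
print(" ".join(single))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual filenames.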

Compressed input/output

Most pipelines support .gz compressed input and output. No extra configuration is required apart from using a ".gz" extension in the filenames.
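To check results regardless of compression, a script can follow the same extension-based convention. The sketch below is not HoCoRT's own code: `open_fastq` and `count_reads` are hypothetical helpers, and the read count assumes the common FASTQ layout of four lines per record.

```python
import gzip


def open_fastq(path, mode="rt"):
    """Open a FASTQ file, using gzip when the filename ends with '.gz'."""
    if str(path).endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def count_reads(path):
    """Count reads, assuming four lines per FASTQ record."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4
```

Comparing `count_reads` on the input and output files gives a rough view of how many reads were removed.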

Removing host contamination

The "--filter true/false" argument switches between outputting unmapped and mapped sequences. For example, if the reads are contaminated with human sequences and the index was built from the human genome, use "--filter true" to output the unmapped sequences (everything except the human reads).

Extracting specific sequences

The "--filter true/false" argument may also be used to extract specific sequences. First, build the index from the genomes of the organisms to extract. Second, map the sequencing reads with "--filter false" to output only the mapped sequences (those which map to the index containing the genomes of the specific organisms).
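The two modes can be pictured with a toy model. This is a conceptual sketch only and says nothing about how HoCoRT is implemented: `apply_filter` is a hypothetical function, reads are a plain dictionary, and the set of mapped read IDs is made up for the example.

```python
def apply_filter(reads, mapped_ids, filter_mapped=True):
    """Toy model of the --filter argument.

    filter_mapped=True  keeps reads that did NOT map (like --filter true).
    filter_mapped=False keeps reads that DID map (like --filter false).
    """
    if filter_mapped:
        return {rid: seq for rid, seq in reads.items() if rid not in mapped_ids}
    return {rid: seq for rid, seq in reads.items() if rid in mapped_ids}


reads = {"r1": "ACGT", "r2": "GGCC", "r3": "TTAA"}
mapped = {"r2"}  # pretend that only r2 aligned to the index

decontaminated = apply_filter(reads, mapped, filter_mapped=True)
extracted = apply_filter(reads, mapped, filter_mapped=False)
print(sorted(decontaminated))
print(sorted(extracted))
```

Every read ends up in exactly one of the two outputs, so the same index serves either to remove an organism or to extract it.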

Advanced usage

Importing and using HoCoRT in Python

HoCoRT can be imported in Python scripts and programs with "import hocort". This allows precise configuration of the tools being run.

import hocort.pipelines.bowtie2 as Bowtie2

idx = "dir/basename"
seq1 = "in1.fastq"
seq2 = "in2.fastq"
out1 = "out1.fastq"
out2 = "out2.fastq"
# Options are passed to the aligner/mapper, which allows precise configuration of the underlying tools.
options = ["--local", "--very-fast-local"]

returncode = Bowtie2().run(idx, seq1, out1, seq2=seq2, out2=out2, options=options)

Passing arguments to the underlying tools

It is possible to pass arguments to the underlying tools by specifying them in the -c/--config argument like this:

hocort map bowtie2 -c="--local --very-fast-local --score-min G,21,9"
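When switching between the command line and the Python interface, the same settings have to be written as a list instead of a single string. The sketch below uses the standard library's `shlex.split` for that conversion. It rests on an assumption: that the `options` list expects one token per element, as in the Python example above. How HoCoRT itself splits the `-c` string is not stated here.

```python
import shlex

# The string given to -c/--config in the command above.
config = "--local --very-fast-local --score-min G,21,9"

# Split it the way a POSIX shell would, one token per list element.
options = shlex.split(config)
print(options)
```

The resulting list could then be passed as the `options` argument in place of the hand-written one.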

Wiki

Wiki Homepage

Technical documentation

https://ignasrum.github.io/hocort/