Deblur

Deblur is a greedy deconvolution algorithm based on Illumina Miseq/Hiseq error profiles.

Install

Deblur requires Python 3.5. If Python 3.5 is not installed, you can create a conda environment for deblur using:

conda create -n deblurenv python=3 numpy

and activate it using:

source activate deblurenv

(note you will need to activate this environment every time you want to use deblur)

At the moment, the install is a two stage process as we do not currently have deblur staged in a conda channel.

install deblur dependencies

conda install -c bioconda VSEARCH MAFFT SortMeRNA==2.0 biom-format

Install Deblur:

pip install deblur

Example usage

The input to deblur workflow is a directory of fasta files (1 per sample) or a demultiplexed FASTA or FASTQ file. The output is a biom table with sequences as the OTU ids (final.biom in the output directory).

The simple use case just specifies the input fasta file (or directory) and output directory name:

deblur workflow --seqs-fp all_samples.fna --output-dir output

If starting from a barcode and read file, you can first use the qiime split_libraries_fastq.py command (we recommend using -q 19 to remove low quality reads):

split_libraries_fastq.py -i XXX_R1_001.fastq -m map.txt -o split -b XXXX_I1_001.fastq -q 19

and use the split/seqs.fna as the input to the deblur workflow.

Important options

The sequence read length can be specified by the -t NNN flag, where NNN denotes the length all sequences will be trimmed to (default=150). Note that all reads shorter than this length will be discarded.
In order to run in parallel, the number of threads can be specified by the -O NNN flag (default it 1). Note that running more threads than available cores will not speed up performance.
To get a full list of options, use:

deblur workflow --help

Positive and Negative Filtering

By default, deblur uses positive filtering, keeping only 16S sequences (based on homology to Greengenes 88% representative set). For example:

deblur workflow --seqs-fp all_samples.fna --output-dir output

Negative filtering can be selected using the '-n' flag. This causes deblurring to keep all sequences except for known artifact sequences (i.e. PhiX and Adapter sequences), so other non-16S sequences are retained. For example:

deblur workflow --seqs-fp all_samples.fna --output-dir output -n

Code Development Note