Deblur is a greedy deconvolution algorithm based on Illumina MiSeq/HiSeq error profiles.
- Deblur requires Python 3.5. If Python 3.5 is not installed, you can create a conda environment for deblur using:
conda create -n deblurenv python=3.5 numpy
and activate it using:
source activate deblurenv
(Note that you will need to activate this environment every time you want to use deblur.)
At the moment, installation is a two-stage process because deblur is not yet staged in a conda channel.
- Install the deblur dependencies:
conda install -c bioconda VSEARCH MAFFT SortMeRNA==2.0 biom-format
- Install Deblur:
pip install deblur
The input to the deblur workflow is a directory of FASTA files (one per sample) or a demultiplexed FASTA or FASTQ file. The output is a BIOM table with sequences as the OTU IDs (final.biom in the output directory).
The simplest use case specifies only the input FASTA file (or directory) and the output directory name:
deblur workflow --seqs-fp all_samples.fna --output-dir output
If starting from a barcode file and a read file, you can first use the QIIME split_libraries_fastq.py command (we recommend using -q 19 to remove low-quality reads):
split_libraries_fastq.py -i XXX_R1_001.fastq -m map.txt -o split -b XXXX_I1_001.fastq -q 19
and then use split/seqs.fna as the input to the deblur workflow.
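As a quick sanity check before running the workflow, it can help to count how many reads each sample contributes to a demultiplexed FASTA file. The sketch below assumes that read labels follow a `sampleid_readnumber` convention (for example `>S1_42 ...`); that convention, and the helper names `sample_id_from_label` and `reads_per_sample`, are assumptions made for illustration and are not taken from the deblur code.

```python
from collections import Counter


def sample_id_from_label(label):
    """Return the sample ID from a FASTA header such as '>S1_42 extra info'.

    Assumes the 'sampleid_readnumber' labelling convention (an assumption
    made for this sketch, not a statement of how deblur parses headers).
    """
    first_field = label.lstrip(">").split()[0]  # drop '>' and any description
    return first_field.rsplit("_", 1)[0]        # strip the trailing read number


def reads_per_sample(fasta_lines):
    """Count reads per sample from an iterable of FASTA lines."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            counts[sample_id_from_label(line)] += 1
    return counts


# Toy input; a real file could be passed as open("split/seqs.fna").
example = [
    ">S1_0 orig_bc=AAAA",
    "ACGTACGT",
    ">S1_1 orig_bc=AAAA",
    "ACGTTCGT",
    ">gut_B_2 orig_bc=CCCC",
    "TTGACGTA",
]
print(dict(reads_per_sample(example)))
```

Splitting on the last underscore only means that sample IDs which themselves contain underscores (such as `gut_B` above) are kept intact.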
- The sequence read length can be specified with the -t NNN flag, where NNN is the length to which all sequences are trimmed (default: 150). Note that all reads shorter than this length are discarded.
- To run in parallel, specify the number of threads with the -O NNN flag (default: 1). Note that running more threads than available cores will not improve performance.
- To get a full list of options, use:
deblur workflow --help
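The effect of the trim length described above (trim every read to a fixed length, discard anything shorter) can be illustrated with a few lines of Python. This is a minimal sketch of the behaviour as stated in this document, not deblur's own implementation; the function name `trim_reads` and the `(read_id, sequence)` tuple layout are hypothetical.

```python
def trim_reads(reads, trim_length=150):
    """Trim every read to trim_length and drop reads that are shorter.

    `reads` is an iterable of (read_id, sequence) pairs; this layout is an
    assumption made for the sketch.
    """
    kept = []
    for read_id, seq in reads:
        if len(seq) < trim_length:
            continue  # shorter than the trim length: discarded
        kept.append((read_id, seq[:trim_length]))
    return kept


# Toy reads with a small trim length so the effect is easy to see.
reads = [("r1", "ACGTACGTAC"), ("r2", "ACGT"), ("r3", "ACGTAC")]
print(trim_reads(reads, trim_length=6))
```

With a trim length of 6, `r1` is cut down to six bases, `r2` is dropped because it has only four, and `r3` is kept unchanged because it is exactly six bases long.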
By default, deblur uses positive filtering, keeping only 16S sequences (based on homology to the Greengenes 88% representative set). For example:
deblur workflow --seqs-fp all_samples.fna --output-dir output
Negative filtering can be selected using the '-n' flag. This causes deblur to keep all sequences except known artifact sequences (i.e. PhiX and adapter sequences), so other non-16S sequences are retained. For example:
deblur workflow --seqs-fp all_samples.fna --output-dir output -n
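When deblur is driven from a script, the options covered in this document can be assembled into a single command line. The helper below is a sketch that uses only the flags mentioned here (`--seqs-fp`, `--output-dir`, `-t`, `-O`, `-n`); the function name `build_deblur_command` and its parameter names are hypothetical. It only builds the argument list and does not run anything.

```python
import shlex


def build_deblur_command(seqs_fp, output_dir, trim_length=None,
                         threads=None, negative_filter=False):
    """Build the argument list for a `deblur workflow` call.

    Options left as None are omitted, so deblur's own defaults apply.
    """
    cmd = ["deblur", "workflow", "--seqs-fp", seqs_fp,
           "--output-dir", output_dir]
    if trim_length is not None:
        cmd += ["-t", str(trim_length)]   # trim length
    if threads is not None:
        cmd += ["-O", str(threads)]       # number of threads
    if negative_filter:
        cmd.append("-n")                  # negative filtering
    return cmd


cmd = build_deblur_command("all_samples.fna", "output",
                           trim_length=150, threads=4, negative_filter=True)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from inside the activated deblur environment.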
Some of the code in the deblur package is derived from QIIME. The contributors to those QIIME modules have granted permission for the code to be ported and released under the BSD license.