A python package for processing UMI tagged mixed amplicon metabarcoding data.
The current version of Caltha requires Python 3.8+.
To install Caltha, simply run the pip install command:
pip install caltha
NOTE: Caltha does require one more dependency which can not be installed
with the Caltha pip or conda package. This dependency is
vsearch (2.14.2).
Executing the following conda install command should install the dependency.
conda install -c bioconda vsearch
Caltha can be run directly from the command line.
usage: caltha [-h] [-v] [-i FLINPUT] [-t FLTABULAR] [-z FLPREZIP] [-b FLBLAST]
[-f STRFORMAT] [-l STRLOCATION] [-a STRANCHOR] [-u INTUMILENGTH]
[-y FLTIDENTITY] [-c INTABUNDANCE] [-w STRFORWARD]
[-r STRREVERSE] [-d STRDIRECTORY] [-@ INTTHREADS]
A python package for processing UMI tagged mixed amplicon metabarcoding data.
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
-i FLINPUT, --input FLINPUT
The input fasta/fastq file(s). This can either be a
zip archive or a single fasta/fastq file.
-t FLTABULAR, --tabular FLTABULAR
The output tabular zip file.
-z FLPREZIP, --zip FLPREZIP
The pre validation zip file.
-b FLBLAST, --blast FLBLAST
The output blast zip file.
-f STRFORMAT, --format STRFORMAT
The format of the input file
[fasta/fastq]. (default: fasta)
-l STRLOCATION, --location STRLOCATION
Search for UMIs at the 5'-end [umi5], 3'-end [umi3] or
at the 5'-end and 3'-end [umidouble]. (default: umi5)
-a STRANCHOR, --anchor STRANCHOR
Which anchor type to use
[primer/adapter/zero]. (default: primer)
-u INTUMILENGTH, --length INTUMILENGTH
The length of the UMI sequence. (default: 5)
-y FLTIDENTITY, --identity FLTIDENTITY
The identity percentage with which to perform the
validation. (default: 0.97)
-c INTABUNDANCE, --abundance INTABUNDANCE
The minimum abundance of a sequence in order for it
to be included during validation. (default: 1)
-w STRFORWARD, --forward STRFORWARD
The 5'-end anchor nucleotides.
-r STRREVERSE, --reverse STRREVERSE
The 3'-end anchor nucleotides.
-d STRDIRECTORY, --directory STRDIRECTORY
The location of the temporary working directory
(not created by program). (default: .)
-@ INTTHREADS, --threads INTTHREADS
The number of threads to run Caltha
with. (default: number of threads available on system)
This python package requires one extra dependency which can be easily
installed with conda (conda install -c bioconda vsearch=2.14.2).
Further documentation can be found here.
- Python Software Foundation,
Python 3.8+. 2019.
Python - Rognes T, Flouri T, Nichols B, Quince C, Mahe F,
VSEARCH: A versatile open source tool for metagenomics.
PeerJ. 2016. doi: 10.7717/peerj.2584
vsearch - Augspurger T, Ayd W, Bartak C, Battiston P, Cloud P, Garcia M,
Python Data Analysis Library.
Pandas - Langa L, Willing C, Meyer C, Zijlstra J, Naylor M, Dollenstein Z,
The uncompromising Python code formatter.
Black - Ziadé T, Cordasco I,
Your tool for style guide enforcement.
Flake8 - Sottile A, Struys K, Kuehl C, Finkle M,
A framework for managing and maintaining multi-language pre-commit hooks.
Pre-commit - Python Software Foundation,
The Python Package index.
PyPI - Du L,
A lightweight Python C extension for easy access to sequences from plain and gzipped fasta/q files.
Pyfastx - Cock P, Antao T, Chang J, Chapman B, Cox C, Dalke A,
Biopython: freely available Python tools for computational molecular biology and bioinformatics.
Bioinformatics. 2009; 25(11): 1422-1423. doi: 10.1093/bioinformatics/btp163
Biopython
- Boom J, Caltha.
GitHub repository: https://github.com/JasperBoom/caltha
Copyright (C) 2018 Jasper Boom
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License version 3 as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.