psytrans.py - Parasite & Symbiont Transcriptome Separation

Description

psytrans.py separates the sequences of a host species from those of its main symbiont(s) or parasite(s) using Support Vector Machine (SVM) classification. The program takes as input a FASTA file with the sequences to be classified. It also requires a file with protein sequences from a species related to the host, and a file with protein sequences from a species related to the symbiont (or parasite). The queries are compared to these two files using BLASTX. Alternatively, the user can provide the output of pre-computed BLASTX searches in tabular format (-outfmt 6 or 7). The classification is then carried out using the command line tools from libsvm.
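For users who supply pre-computed BLASTX results, the following is a minimal sketch of how tabular output can be read. It assumes the default column layout of -outfmt 6 and 7, in which the query id is the first column and the e-value is the eleventh. The function name best_evalues is hypothetical and is not part of psytrans.

```python
def best_evalues(lines):
    """Return a dict mapping each query id to its lowest (best) e-value.

    Assumes the default 12-column BLAST tabular layout:
    qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        # -outfmt 7 adds comment lines starting with '#'; skip them and blanks.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query = fields[0]
        evalue = float(fields[10])
        if query not in best or evalue < best[query]:
            best[query] = evalue
    return best
```

Because the function accepts any iterable of lines, it can be called directly on an open file handle.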

Dependencies

psytrans requires makeblastdb and blastx from the NCBI BLAST+ distribution, unless the user provides pre-computed BLAST results. psytrans also requires three command line utilities from libsvm: svm-scale, svm-train and svm-predict.
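Before a long run it can be useful to confirm that these executables are on the PATH. The snippet below is a small sketch using the Python standard library; the helper name missing_tools is hypothetical and is not part of psytrans.

```python
import shutil

# Tool names are taken from the Dependencies section of this README.
BLAST_TOOLS = ("makeblastdb", "blastx")
LIBSVM_TOOLS = ("svm-scale", "svm-train", "svm-predict")


def missing_tools(have_precomputed_blast=False):
    """Return the names of required executables that are not on PATH."""
    required = list(LIBSVM_TOOLS)
    if not have_precomputed_blast:
        # BLAST+ is only needed when the searches have to be run locally.
        required = list(BLAST_TOOLS) + required
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```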

Installation

Conda

A bioconda package is available for installing psytrans and all its dependencies.

1- Create an environment for psytrans

conda create -n psytrans -y

2- Activate the environment

conda activate psytrans

3- Install

conda install -c bioconda psytrans

Docker

Pull the Docker image:

docker pull quay.io/biocontainers/psytrans:2.0.0--0

Singularity

Run the Singularity image:

singularity run https://depot.galaxyproject.org/singularity/psytrans:2.0.0--0

Options

Generic Program Information

       -h, --help
              Print a usage message briefly summarizing the command-line options.

Global options

       -R, --restart
              Restart the script from the last checkpoint.

       -p, --nbThreads
              Number of threads to use for the blast searches and for the SVM training.

       -V, --verbosemode
              Runs the script in verbose mode.

       -t, --tempDir
              Specify the name of the temporary directory.

       -X, --clearTemp
              Clears all temporary data in the temporary directory upon completion.

       -z, --stopAfter
              Choices: ['db','runBlast','parseBlast','kmers','SVM']
              Stop the process once the chosen stage has completed.
              db refers to the database creation stage;
              runBlast refers to the BLAST search stage;
              parseBlast refers to the stage that separates unambiguous from ambiguous sequences;
              kmers refers to the stage that prepares the SVM input;
              SVM refers to the SVM training and testing stage.
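The effect of --stopAfter can be pictured with the short sketch below. It assumes that the stages run in the order in which the choices are listed; the function stages_to_run is hypothetical and only illustrates the idea, it is not code from psytrans.

```python
# Stage names as listed for -z / --stopAfter.
# Assumption: they are executed in this order.
STAGES = ["db", "runBlast", "parseBlast", "kmers", "SVM"]


def stages_to_run(stop_after=None):
    """Return the stages executed when stopping after the given stage."""
    if stop_after is None:
        # No stop point requested: run the whole pipeline.
        return list(STAGES)
    if stop_after not in STAGES:
        raise ValueError("unknown stage: %r" % stop_after)
    return STAGES[: STAGES.index(stop_after) + 1]
```

For example, stopping after parseBlast would leave the k-mer preparation and the SVM stage for a later run.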

Preparation of training set options

       -e, --maxBestEvalue
              Set the maximum value for the best e-value to be used to classify unambiguous sequences.

       -n, --numberOfSeq
              Set the maximum number of training & testing sequences.
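As an illustration of how these two options could interact, here is a simplified sketch. The exact rule psytrans applies is not described in this README, so the rule below is an assumption: a query counts as unambiguous when its best e-value is at or below the threshold against exactly one of the two databases, and the sequence cap is applied to each class separately. The function pick_training_ids and its arguments are hypothetical.

```python
def pick_training_ids(host_best, symb_best, max_best_evalue, number_of_seq):
    """Select unambiguous query ids for the host and symbiont classes.

    host_best / symb_best: dict mapping query id to its best e-value
    against the host-related and symbiont-related database.
    """
    host_ids, symb_ids = [], []
    for query in sorted(set(host_best) | set(symb_best)):
        host_e = host_best.get(query)
        symb_e = symb_best.get(query)
        host_ok = host_e is not None and host_e <= max_best_evalue
        symb_ok = symb_e is not None and symb_e <= max_best_evalue
        # Assumed rule: a good hit in exactly one database is unambiguous.
        if host_ok and not symb_ok:
            host_ids.append(query)
        elif symb_ok and not host_ok:
            symb_ids.append(query)
    # Assumed: the cap on the number of sequences applies per class.
    return host_ids[:number_of_seq], symb_ids[:number_of_seq]
```

Queries with good hits in both databases, or in neither, are left out of the training set in this sketch and would be classified later by the SVM.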

Kmer parameters

       -c, --minWordSize
              Set the minimum value of DNA word length.

       -k, --maxWordSize
              Set the maximum value of DNA word length.
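To give an idea of what the word size range controls, the sketch below computes relative frequencies of DNA words for every length between the minimum and the maximum. It is an assumption about the general shape of k-mer features, not the feature encoding psytrans actually writes for libsvm; the function kmer_frequencies is hypothetical.

```python
from collections import Counter


def kmer_frequencies(sequence, min_word_size, max_word_size):
    """Return a dict mapping each DNA word to its relative frequency.

    Frequencies are normalised separately for each word length.
    """
    sequence = sequence.upper()
    features = {}
    for k in range(min_word_size, max_word_size + 1):
        counts = Counter(
            sequence[i:i + k] for i in range(len(sequence) - k + 1)
        )
        # Drop words containing ambiguous bases such as N.
        counts = {w: c for w, c in counts.items() if set(w) <= set("ACGT")}
        total = sum(counts.values())
        for word, count in counts.items():
            features[word] = count / total
    return features
```

Larger word sizes increase the number of possible features quickly (4 to the power k words of length k), which is one reason to keep the maximum small.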

Examples

You start with an assembly containing a mixture of sequences from a host A and a symbiont (or parasite) B: host_and_symb.fasta. You also provide a file with proteins from a species related to the host (related_host_proteins.fasta) and a file with proteins from a species related to the symbiont (related_symb_proteins.fasta). You can then start psytrans as follows:

python psytrans.py host_and_symb.fasta -H related_host_proteins.fasta -S related_symb_proteins.fasta

The run can take a few hours, depending on the number of sequences. When it finishes, the current directory contains a new file starting with the prefix host_ and another file starting with the prefix symb_, corresponding to the sequences classified as host or symbiont (or parasite) respectively.

To run the program using 8 threads and using /home/user/tmp as a temporary directory, use the following command:

python psytrans.py host_and_symb.fasta -H related_host_proteins.fasta -S related_symb_proteins.fasta -p 8 -t /home/user/tmp
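When psytrans is launched from another Python script, the same command line can be assembled programmatically. The sketch below only uses the options shown in the examples above (-H, -S, -p, -t); the helper build_command and the default script location are assumptions made for illustration.

```python
import subprocess


def build_command(queries, host_proteins, symb_proteins,
                  threads=None, temp_dir=None, script="psytrans.py"):
    """Build the argument list for a psytrans run.

    script is assumed to be the path to psytrans.py.
    """
    cmd = ["python", script, queries,
           "-H", host_proteins,
           "-S", symb_proteins]
    if threads is not None:
        cmd += ["-p", str(threads)]
    if temp_dir is not None:
        cmd += ["-t", temp_dir]
    return cmd


if __name__ == "__main__":
    command = build_command(
        "host_and_symb.fasta",
        "related_host_proteins.fasta",
        "related_symb_proteins.fasta",
        threads=8,
        temp_dir="/home/user/tmp",
    )
    print(" ".join(command))
    # To actually start the run, uncomment the next line:
    # subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.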

Authors

Psytrans was developed by Sylvain Forêt and Jue-Sheng Ong. This repository contains a version of the script modified to run under Python 3. The original work can be found at https://github.com/sylvainforet/psytrans
Modifications by Nicola Conci and Ramon Rivera.

Reporting bugs

We may be able to help with some issues, since we used the tool frequently in the lab.
However, please note that we are not actively developing psytrans.
Since the original repository is no longer in use, you can open issues here if needed.

Original issues at Psytrans repository: https://github.com/sylvainforet/psytrans

Copyright

Copyright © 2021 Nicola Conci & Ramón Rivera Vicéns.
Copyright © 2014 Sylvain Forêt & Jue-Sheng Ong.

psytrans is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under the terms of the GNU General Public License version 3 or later. For more information about these matters see http://www.gnu.org/licenses/.