ThpPred

A method for predicting therapeutic and non-therapeutic proteins

Introduction

ThpPred is developed for predicting, designing and scanning therapeutic peptides. More information on ThpPred is available from its web server http://webs.iiitd.edu.in/raghava/thppred. This page provide information about standalone version of ThpPred.

Pip installation

The pip version of THPpred is also available for easy installation and usage of the tool. The following command is required to install the package

pip install thppred

To know about the available option for the pip package, type the following command:

thppred -h

Standalone

Standalone version of ThpPred is written in python and the following libraries are necessary for a successful run:

scikit-learn
Pandas
Numpy
Bio
XGBoost
Onnxruntime

Minimum USAGE

To run the example, type the following command:

python thppred_stand.py

Full Usage:

To run the code with your parameters, type the following command:

python thppred_stand.py |Input_File| |Model| |Threshold|

Example Command:

python thppred_stand.py file.fasta 1 0.5

Input File: It allow users to provide input in FASTA format. User should provide the file name along with its full path in case the input file is not in the same folder as the thppred_stand.py file.

Model: In this program, four models have been incorporated;
i) Model1 for predicting given input peptide/protein sequence as therapeutic and non-therapeutic peptide/proteins using XGBoost based on amino-acid composition of the peptide/proteins;

ii) Model2 for predicting given input peptide/protein sequence as therapeutic and non-therapeutic peptide/proteins using Hybrid approach, which is the ensemble of XGBoost + Motif Score. It combines the scores generated from machine learning (XGB), and motif occurence as Hybrid Score, and the prediction is based on Hybrid Score.

iii) Model3 for predicting given input peptide/protein sequence as therapeutic and non-therapeutic peptide/proteins using random forest based on dipeptide composition of the peptide/proteins;

iv) Model4 for predicting given input peptide/protein sequence as therapeutic and non-therapeutic peptide/proteins using Hybrid approach, which is the ensemble of Random forest + Motif Score. It combines the scores generated from machine learning (RF), and motif occurence as Hybrid Score, and the prediction is based on Hybrid Score.

User must enter model number(1,2,3 or 4) in the command line, else default value will be considered i.e. 2.

Threshold: User should provide a threshold value that lies between 0 and 1, please note score is proportional to therapeutic potential of peptide.

User must enter a valid threshold value in the command line, else default value will be considered i.e. 0.5.

ThpPred Package Files

It contain following files, brief description of these files given below

INSTALLATION : Installation instructions

LICENSE : License information

README.md : This file provide information about this package

thppred_stand.py : Main python program

rf_model.onnx : Model file required for running RF based Machine-learning model

xgb_model.onnx : Model file required for running XGB based Machine-learning model

example.fasta : Example file contain protein/peptide sequences in FASTA format. User may use this for test run.

raghavagps/thppred

ThpPred

Introduction

Pip installation

Standalone

ThpPred Package Files

Reference