TriPepSVM

Predict RNA-binding proteins from amino acid sequences using string kernel SVMs.

TriPepSVM was developed by the Marsico RNA bioinformatics group at the Max Planck Institute for Molecular Genetics in Berlin.

Getting Started

Requirements

The required Python packages can be installed with pip. If you do not have sudo rights, use pip's --user option to install them into your home directory:

pip install --user bioservices pandas

Remarks:

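  • If TriPepSVM is applied to a new taxon ID, you need a stable internet connection.

  • Add the CD-HIT and HMMER installation directories to the PATH environment variable:

    1. Open your shell startup file (~/.bashrc).
    2. Append the directories to the PATH variable.
    3. Save and close the file, then reload it (for example with source ~/.bashrc) or open a new shell.

For example (adjust the paths to your installation):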

export PATH=$PATH:/home/Programms/cdhit-4.6.4
export PATH=$PATH:/home/Programms/hmmer-3.1b2-linux-intel-x86_64/binaries
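
To confirm that the PATH change took effect, a small check along these lines may help. It is only a sketch: the executable names "cd-hit" and "hmmsearch" are assumptions, since this README does not state which binaries TriPepSVM.sh calls, and the helper missing_tools is hypothetical.

```python
import shutil

# Assumed executable names: "cd-hit" (CD-HIT) and "hmmsearch" (HMMER).
# The README does not say which binaries TriPepSVM.sh actually calls.
ASSUMED_TOOLS = ["cd-hit", "hmmsearch"]


def missing_tools(names, path=None):
    """Return the names that cannot be found on PATH (or on `path` if given)."""
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All assumed tools were found on PATH.")
```

If a tool is reported as missing, check that the exported directory is the one that actually contains the executables.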

Usage

./TriPepSVM.sh [OPTION] ... -i INPUT.[fasta|fa]

-i, --input [INPUT.fasta|fa] : amino acid sequences in FASTA format, NO DEFAULT
-o, --output : path to the output folder, DEFAULT: current directory
-id, --taxon-id [9606|590|...] : UniProt taxon ID, DEFAULT: 9606 (human)
-c, --cost : SVM cost parameter, DEFAULT: 1
-k, --oligo-length : oligomer length k, DEFAULT: 3
-pos, --pos-class : positive class weight, DEFAULT: inversely proportional to class size
-neg, --neg-class : negative class weight, DEFAULT: inversely proportional to class size
-thr, --threshold : prediction threshold, DEFAULT: 0
-r, --recursive [TRUE|FALSE] : apply recursive mode, DEFAULT: FALSE
-h, --help : show the help text
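
To illustrate what the oligomer length k refers to, the sketch below counts overlapping k-mers in a sequence; with the default k = 3 these are tri-peptides. It assumes a spectrum-style string kernel built on k-mer counts, which this README does not confirm, and both the function kmer_counts and the toy sequence are made up for illustration.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in an amino acid sequence."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sequence = sequence.upper()
    # Slide a window of width k over the sequence, one residue at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy sequence (hypothetical, not taken from any TriPepSVM data set).
counts = kmer_counts("MAAAKAAA", k=3)
print(counts.most_common(3))
```

A sequence shorter than k yields no k-mers at all, so very short inputs contribute nothing under this scheme.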

Example 1: Salmonella

./TriPepSVM.sh -i salmonellaProteom.fasta -o Results/ -id 590 -r True -posW 1.8 -negW 0.2 -thr 0.68

Example 2: Human

./TriPepSVM.sh -i humanProteom.fasta -o Results/ -id 9606 -posW 1.8 -negW 0.2 -thr 0.28
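
The -thr values in the examples above presumably move the cut-off applied to the SVM score. A minimal sketch of that idea follows, assuming that scores strictly above the threshold are labelled positive and that the positive label reads "RNA-binding protein"; neither detail is stated in this README, and the scores are hypothetical.

```python
def classify(score, threshold=0.0):
    """Label a score; assumes scores strictly above the threshold are positive."""
    if score > threshold:
        return "RNA-binding protein"  # assumed wording of the positive label
    return "Non RNA-binding protein"


# Hypothetical scores, compared at the default threshold and at 0.28.
for score in (-0.30, 0.10, 0.50):
    print(score, classify(score), classify(score, threshold=0.28))
```

Raising the threshold trades sensitivity for specificity: fewer proteins are called RNA-binding, but those that are have higher scores.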

Output

The result folder contains two files:

  • nameInputFile.TriPepSVM.pred.txt: Main output file containing the predictions for the input sequences, with three tab-separated columns:

    • Identifier
    • SVM score
    • Classification
    sp|P0CL07|GSA_SALTY	-0.664768610799015	Non RNA-binding protein
    sp|O68838|GSH1_SALTY	-0.592678648819721	Non RNA-binding protein
    sp|P43666|EPTB_SALTY	-0.443698432714576	Non RNA-binding protein
    sp|P36555|EPTA_SALTY	-0.303451909779383	Non RNA-binding protein
    ...
    
  • nameInputFile.featureWeights.txt: Feature weights used by the SVM classifier, with two tab-separated columns:

    • Feature (tri-peptide sequence)
    • Feature weight
    AAA	0.518691300046882
    AAC	0.10328499221261
    AAD	0.0894537449099789
    AAE	-0.0464292430990747
    ...
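
Both output files can be post-processed with a few lines of Python. The sketch below assumes the files are tab-separated and have no header row, as the samples above suggest; the helper functions are hypothetical, and the rows are copied from the samples so that the snippet runs without the real files.

```python
import csv
import io


def read_predictions(handle):
    """Parse rows of identifier, SVM score, classification (tab-separated)."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        identifier, score, label = record
        rows.append((identifier, float(score), label))
    return rows


def read_feature_weights(handle):
    """Parse rows of tri-peptide, weight (tab-separated) into a dict."""
    weights = {}
    for record in csv.reader(handle, delimiter="\t"):
        if len(record) != 2:
            continue  # skip blank or malformed lines
        weights[record[0]] = float(record[1])
    return weights


# Sample rows copied from this README. A real run would open the files instead,
# e.g. open("nameInputFile.TriPepSVM.pred.txt", newline="").
pred_text = (
    "sp|O68838|GSH1_SALTY\t-0.592678648819721\tNon RNA-binding protein\n"
    "sp|P43666|EPTB_SALTY\t-0.443698432714576\tNon RNA-binding protein\n"
)
weight_text = (
    "AAA\t0.518691300046882\n"
    "AAC\t0.10328499221261\n"
    "AAE\t-0.0464292430990747\n"
)

predictions = read_predictions(io.StringIO(pred_text))
weights = read_feature_weights(io.StringIO(weight_text))

# Rank proteins by score and tri-peptides by weight, highest first.
top_proteins = sorted(predictions, key=lambda row: row[1], reverse=True)
top_features = sorted(weights, key=weights.get, reverse=True)
print(top_proteins[0][0], top_features[0])
```

Sorting the feature weights this way gives a quick view of which tri-peptides push the score towards the positive class (large positive weights) or away from it (negative weights).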
    

Authors