PiPred

Deep learning based prediction of pi-helices in protein sequences.

Installation

First clone this repository:

$ git clone https://github.com/labstructbioinf/PiPred.git
$ cd PiPred

The packages required to run PiPred are listed in the requirements.txt file. We suggest running PiPred in a virtual environment. If you don't have virtualenv installed, install it first:

$ pip3 install virtualenv

Create a virtual environment and install the required packages:

$ cd virtual_envs_location
$ virtualenv pipred_env
$ source pipred_env/bin/activate
$ cd PIPRED_LOCATION
$ pip3 install -r requirements.txt

Test the installation:

$ ./run_example.sh

This should produce the output file example/1mty_C.out, which should be identical to example/1mty_C.out.bk.
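
The check can also be scripted. The following is a minimal sketch, assuming that "identical" means byte-for-byte equal and that the paths are relative to the repository root. The helper name `outputs_match` is hypothetical and not part of PiPred.

```python
import filecmp


def outputs_match(produced, reference):
    """Return True if the two files have byte-for-byte identical contents."""
    # shallow=False forces a comparison of file contents rather than
    # only the os.stat() signatures (type, size, modification time).
    return filecmp.cmp(produced, reference, shallow=False)
```

After running ./run_example.sh, `outputs_match("example/1mty_C.out", "example/1mty_C.out.bk")` should return True if the installation works as described.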

Usage

python3.5 pipred.py [-h] -i FILE [-out_path DIR] [-pssm_path DIR]

Option        Description
-i            Input file in FASTA format. Can contain multiple entries.
-pssm_path    Directory with psiblast PSSM files. For each entry in the input FASTA file there must be a PSSM file.
-out_path     Directory where the predictions are saved. One file is saved for each entry.

PSSM filenames should be based on the identifiers in the FASTA file (only alphanumeric characters and '_'). For example, if a FASTA entry is as follows:

>1mty_C
ERRRGLTDPEMAAV...

The corresponding PSSM file should be named 1mty_C.pssm.
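
The naming rule can be checked before running a prediction. The sketch below makes two assumptions that this README does not confirm: that the identifier is the header text after '>' up to the first whitespace, and that an identifier containing other characters should be rejected. The names `fasta_ids` and `expected_pssm_files` are hypothetical helpers, not part of PiPred.

```python
import os
import re

# Assumption: a valid identifier consists only of letters, digits and '_'.
VALID_ID = re.compile(r"[A-Za-z0-9_]+")


def fasta_ids(fasta_text):
    """Yield the identifier of every entry in a FASTA-formatted string."""
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the identifier ends at the first whitespace.
            yield fields[0] if fields else ""


def expected_pssm_files(fasta_text, pssm_dir):
    """Map each FASTA identifier to the PSSM path expected for it."""
    paths = {}
    for seq_id in fasta_ids(fasta_text):
        if not VALID_ID.fullmatch(seq_id):
            raise ValueError("unsupported identifier: %r" % seq_id)
        paths[seq_id] = os.path.join(pssm_dir, seq_id + ".pssm")
    return paths
```

For the entry above, `expected_pssm_files(">1mty_C\nERRRGLTDPEMAAV\n", "pssm")` maps 1mty_C to the file 1mty_C.pssm inside the pssm directory. Checking each returned path with `os.path.isfile` shows which PSSM files are still missing.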

You can generate PSSM files with the following command (requires the NR90 database):

psiblast -query 1mty_C.fasta -db NR90_LOCATION -evalue 0.001 -num_iterations 3 -out_ascii_pssm 1mty_C.pssm
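
When there are many sequences, the same command can be built once per query file. This is a minimal sketch, assuming one sequence per FASTA file and that each file is named after its identifier (for example 1mty_C.fasta). The function name `psiblast_command` is hypothetical, and only the psiblast options shown above are used.

```python
import os


def psiblast_command(query_fasta, db, pssm_dir):
    """Build the psiblast argument list for one single-sequence FASTA file."""
    # Assumption: the file name without its extension is the identifier.
    seq_id = os.path.splitext(os.path.basename(query_fasta))[0]
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", db,
        "-evalue", "0.001",
        "-num_iterations", "3",
        "-out_ascii_pssm", os.path.join(pssm_dir, seq_id + ".pssm"),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.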

To generate a PSSM file from a multiple sequence alignment (MSA), use this command:

psiblast -subject sequence.fasta -in_msa alignment.fasta -out_ascii_pssm output.pssm