/ECPred

Primary LanguageJavaGNU General Public License v3.0GPL-3.0

ECPred Version 1.1

Latest Github release

Dependencies

Java 8

For Linux, you can install latest version of Java by running following commands from terminal:

sudo apt-get update
sudo apt-get install default-jre
sudo apt-get install default-jdk

For Mac, you can install latest version of Java by running following commands from terminal:

brew update
brew cask install java

g++ (any version)

For Linux, you can install latest version of g++ by running following commands from terminal

sudo apt-get update
sudo apt-get install build-essential

For Mac, you can install g++ by running following command from terminal.

g++

If you've already installed g++, the terminal prints this message, "no input files".

Download

ECPred.tar.gz

Above file (around 3 GB) should be downloaded from:

https://goo.gl/g2tMJ4

Installation

Extract the files using:

tar -xvf ECPred.tar.gz  

After extraction the total size of the folder will be around 10 GB.

Run runLinux.sh or runMac.sh from terminal according to your OS using one of these commands:

./runLinux.sh 

or

./runMac.sh

These bash scripts will install necessary libraries and tools.

Usage

cd into the ECPred installation folder.

java -jar ECPred.jar method inputFile libraryDir tempDir outputFile

method argument can be one of the following: blast, spmap, pepstats, weighted
inputFile argument is the file that contains protein sequences in fasta format
libraryDir is the path to the directory where the "lib" folder is located
tempDir is the path to the directory where the temporary files are located. You may wish to delete these files after you complete your prediction runs.
outputFile argument is optional. If you don't specify the output file name, the results will be printed to standard output.

Sample run

java -jar ECPred.jar weighted sample.fasta /full/path/to/ECPred/ temp/ results.tsv

You can use the following command to run ECPred (assuming ECPred is extracted on the desktop)

java -jar ECPred.jar weighted sample.fasta ~/Desktop/ECPred/ temp/ results.tsv

Input

There is no limit on the number of protein sequences; however, a single protein is predicted in one minute on average on an Intel 2.70 GHz i7 processor.

Output

Output is optinal. If you don't specify the output file name, the results will be printed to standard output.

Data files

"ECNumberList.txt":

A text file containing the list of EC numbers that ECPred can predict.

"sample.fasta":

An example input fasta file.

"results.tsv":

An example output prediction file (for sample.fasta).

License

ECPred: a tool for the prediction of enzymatic properties of protein sequences based on the EC Nomenclature Copyright (C) 2018 CanSyL

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Citation

If you find ECPred useful, please consider citing our publication:

Dalkiran, A., Rifaioglu, A. S., Martin, M. J., Cetin-Atalay, R., Atalay, V., & Doğan, T. (2018). ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature. BMC bioinformatics, 19(1), 334. https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2368-y