
Function prediction using a deep ontology-aware classifier


DeepGO - Predicting Gene Ontology Functions

DeepGO is a novel method for predicting protein functions from protein sequences and protein-protein interaction (PPI) networks. It uses deep neural networks to learn features from sequences and PPI networks, and classifies proteins hierarchically into Gene Ontology (GO) classes. PPI network features are learned using the neuro-symbolic approach for learning knowledge graph representations by Alshahrani et al.
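To make "hierarchical classification" concrete, here is a minimal sketch of one common way to keep per-class scores consistent with the GO hierarchy: each class is given a score at least as high as any of its descendants. This illustrates the idea only and is not taken from the repository's code. The function name `propagate_scores`, the three-term toy hierarchy and the scores are assumptions made for this example.

```python
def propagate_scores(scores, parents):
    """Return scores in which every class scores at least as high as its descendants.

    scores:  dict mapping a GO class id to a raw prediction score
    parents: dict mapping a GO class id to a list of its direct parent ids
    """
    result = dict(scores)
    for term, score in scores.items():
        # Walk up from the direct parents and push this term's raw score
        # to every ancestor, keeping the maximum seen so far.
        stack = list(parents.get(term, []))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            result[ancestor] = max(result.get(ancestor, 0.0), score)
            stack.extend(parents.get(ancestor, []))
    return result


# Toy hierarchy (illustrative): protein binding -> binding -> molecular_function
parents = {
    "GO:0005515": ["GO:0005488"],
    "GO:0005488": ["GO:0003674"],
    "GO:0003674": [],
}
raw = {"GO:0005515": 0.9, "GO:0005488": 0.4, "GO:0003674": 0.7}
consistent = propagate_scores(raw, parents)
```

After propagation, a confident prediction for a specific class can no longer coexist with a lower score for one of its ancestors.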

This repository contains the scripts that were used to build and train the DeepGO model, together with the scripts for evaluating the model's performance.

Dependencies

To install the Python dependencies, run: pip install -r requirements.txt

Scripts

The scripts require the Gene Ontology in OBO format.

  • nn_hierarchical_seq.py - Builds and trains the model that uses only the protein sequence as input.
  • nn_hierarchical_network.py - Builds and trains the model that uses both the protein sequence and PPI network embeddings as input.
  • get_data.py, get_functions.py, mapping.py - Prepare the raw data.
  • blast.py - Evaluates the performance of the BLAST method.
  • evaluation.py - Evaluates the performance of FFPred, GOFDR and our method.
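For readers unfamiliar with the OBO format mentioned above, the following minimal sketch reads `[Term]` stanzas and collects each term's `id`, `name` and `is_a` parents. It is an illustration under simplifying assumptions (other tags and stanza types are ignored) and is not the parser used by the scripts in this repository. The function name `parse_obo_terms` and the sample text are made up for the example.

```python
def parse_obo_terms(lines):
    """Collect id, name and is_a parents for every [Term] stanza."""
    terms = {}
    current = None
    for raw_line in lines:
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "id":
                current["id"] = value
                terms[value] = current
            elif tag == "name":
                current["name"] = value
            elif tag == "is_a":
                # Drop the trailing "! parent name" comment.
                current["is_a"].append(value.split("!")[0].strip())
    return terms


# Small hand-written sample in OBO syntax (not a real ontology file).
sample = """format-version: 1.2

[Term]
id: GO:0005488
name: binding
namespace: molecular_function
is_a: GO:0003674 ! molecular_function

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo_terms(sample.splitlines())
```

To parse a downloaded ontology file instead, pass an open file object in place of `sample.splitlines()`.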

Running

  • Download the data file from http://deepgo.bio2vec.net/data/deepgo/data.tar.gz and extract the data folder.
  • Install the diamond program on your system (the diamond command must be available on your PATH).
  • Run the predict_all.py script with the -i <input_fasta_filename> argument.
  • See the results in the results.tsv file.
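The steps above can be wrapped in a small helper that checks for diamond before starting the prediction. This is a minimal sketch, assuming it is run from the repository root with the data folder already extracted. The function names `build_predict_command` and `run_prediction` and the file name `proteins.fasta` are hypothetical. Only the `-i` argument is taken from the instructions above.

```python
import shutil
import subprocess
import sys


def build_predict_command(fasta_path, script="predict_all.py"):
    """Build the command line for predict_all.py using the current Python interpreter."""
    return [sys.executable, script, "-i", fasta_path]


def run_prediction(fasta_path):
    """Check that diamond is on PATH, then run predict_all.py on the FASTA file."""
    if shutil.which("diamond") is None:
        raise RuntimeError("diamond was not found on PATH; install it first")
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_predict_command(fasta_path), check=True)


# Example (hypothetical input file):
# run_prediction("proteins.fasta")
```

Checking for diamond up front gives a clear error message instead of a failure partway through the prediction run.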

Data

Citation

If you use DeepGO for your research, or incorporate our learning algorithms in your work, please cite:

Maxat Kulmanov, Mohammed Asif Khan, Robert Hoehndorf; DeepGO: Predicting protein functions from sequence and interactions using a deep ontology-aware classifier, Bioinformatics, 2017. https://doi.org/10.1093/bioinformatics/btx624