MuPIPR - Mutation effect estimation on protein-protein interactions using deep contextualized representation learning
This is the repository for the NAR Genom. Bioinform. paper "Mutation effect estimation on protein-protein interactions using deep contextualized representation learning" (MuPIPR). This repository contains the source code and links to some datasets used in our paper. (to be updated)
MuPIPR can be run under Linux. The following packages are required: python 3.6, h5py, Tensorflow 1.7 (with GPU support), Keras 2.2.4 and bilm-tf.
Before proceeding make sure you have conda
installed.
Then, make installation script executable
chmod +x install.sh
and run
source install.sh
You may specify a path to conda
executables directory as a first argument of install.sh
. By default it is set to ~/miniconda3/bin/
.
- biLM: contains the pre-trained language models for contextualized amino acid representations.
- data: contains the datasets and processing scripts for the two tasks.
- model: contains the implementation for MuPIPR.
We obtain the corpus to pre-train the BiLSTM language model from the STRING database. A total of 66235 protein sequences of four most frequent species from the SKEMPI database are extracted, i.e. Homo sapiens, Bos taurus, Mus musculus and Escherichia coli. These are the four most frequent species in the SKP1402m dataset.
To serve the pre-trained contextualized embedding model to MuPIPR, please download and unzip the model.zip file in the biLM folder.
Please refer to the readme in the data folder and the model folder, respectively.
This work has been published in the NAR Genomics and Bioinformatics journal.
DOI: https://doi.org/10.1093/nargab/lqaa015
Bibtex:
@article{zhou2020mupipr,
title={Mutation Effect Estimation on Protein–protein Interactions Using Deep Contextualized Representation Learning},
author={Zhou, Guangyu and Chen, Muhao and Ju, Chelsea and Wang, Zheng and Jiang Jyun-yu and Wang, Wei},
journal={NAR Genomics and Bioinformatics},
volume = {2},
number = {2},
year = {2020},
month = {03},
publisher={Oxford University Press}
}
Also check out the follow up work in the Bioinformatics (Procs of ISMB) paper Multifaceted Protein-Protein Interaction Prediction Based on Siamese Residual RCNN, in which we provide an end-to-end neural learning system to predict multifaceted PPI information.
The released software is available at muhaochen/seq_ppi.