# ******************************************************************************
# Author      : Srivamshi Pittala (srivamshi.pittala.gr@dartmouth.edu)
# Advisor     : Prof. Chris Bailey-Kellogg (cbk@cs.dartmouth.edu)
# Project     : PECAN: Paratope and Epitope Prediction with graph Convolution Attention Network
# Description : Help file for running the scripts to learn and evaluate
#               graph convolution networks for epitope and paratope prediction
# Cite        : doi: https://doi.org/10.1101/658054
# ******************************************************************************

# This code was developed using the code provided for Fout et al., Protein Interface Prediction using Graph Convolutional Networks, published in NeurIPS, 2017.

# Copyright (C) <2019> <Srivamshi Pittala>

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.

#***********************
Input Files (pickle format). They can be downloaded from Zenodo: https://zenodo.org/record/3885236.
	1. Paratope prediction: results_create_files_for_paratope/
	2. Epitope prediction: results_create_files_for_epitope/

The data format is the following:
- list of dictionaries, each corresponding to a protein
- - Each dictionary has the following entries (keys):
- - - complex_code: str
- - - l_vertex: numpy matrix of residues in primary protein (dimensions: num_residues x 63)
- - - - Indices [0,20] represent the one-hot encoding of the residue. The last entry is for the unresolved amino acid '*'.
- - - - Indices [21,40] represent the PSSM entries for the residue
- - - - Indices [41,42] represent solvent accessibility
- - - - Indices [43,62] represent neighborhood composition
- - - l_edge: numpy matrix of edges, currently empty as we don't consider edge features (dimensions: num_residues x 25 x 2)
- - - l_hood_indices: numpy matrix specifying spatial neighborhood for graph convolution (dimensions: num_residues x 25)
- - - label: numpy matrix with labels {1: interface, -1: non-interface} (dimensions: num_residues x 2)
- - - r_vertex: same as l_vertex, but for secondary protein
- - - r_edge: same as l_edge, but for secondary protein
- - - r_hood_indices: same as l_hood_indices, but for secondary protein
- - - label_r: same as label, but for secondary protein
#***********************

#-----------------------
Software (version)
#-----------------------
	1. Python 2.7.15
	2. PyYAML 3.13
	3. Numpy 1.14.5
	4. Pandas 0.23.4
	5. Scikit-learn 0.19.1
	6. tensorflow 1.10.0 (gpu mode)
	7. cudatoolkit 8.0.61
	8. cudnn 5

#***********************
Training base network on proteins
#***********************
	1. cd GCN_protein_base
	2. Train "No convolution" network: python sample_experiment_base.py
	3. Train "Convolution 1-layer" network: python sample_experiment_conv1.py
	4. Train "Convolution 1-layer + Attention" network: python sample_experiment_attn1.py
	5. Train "Convolution 2-layer" network: python sample_experiment_conv2.py
	6. Train "Convolution 2-layer + Attention" network: python sample_experiment_attn2.py

#***********************
For Epitope prediction
#***********************

#********
Task-specific learning
#********
	1. cd GCN_task
	2. Change 'mode_experiment', 'data_directory' and 'output_directory' in configuration.py for epitope data
	3. Train "No convolution" network: python sample_experiment_base.py
	4. Train "Convolution 1-layer" network: python sample_experiment_conv1.py
	5. Train "Convolution 1-layer + Attention" network: python sample_experiment_attn1.py
	6. Train "Convolution 2-layer" network: python sample_experiment_conv2.py
	7. Train "Convolution 2-layer + Attention" network: python sample_experiment_attn2.py

#********
Transfer learning (Make sure the base network on proteins was trained successfully)
#********
	1. cd GCN_xTransfer
	2. Change 'mode_experiment', 'data_directory' and 'output_directory' in configuration.py for epitope data
	3. Train "No convolution" network: python sample_experiment_base.py
	4. Train "Convolution 1-layer" network: python sample_experiment_conv1.py
	5. Train "Convolution 1-layer + Attention" network: python sample_experiment_attn1.py
	6. Train "Convolution 2-layer" network: python sample_experiment_conv2.py
	7. Train "Convolution 2-layer + Attention" network: python sample_experiment_attn2.py

#***********************
For Paratope prediction
#***********************

#********
Task-specific learning
#********
	1. cd GCN_task
	2. Change 'mode_experiment', 'data_directory' and 'output_directory' in configuration.py for paratope data
	3. Train "No convolution" network: python sample_experiment_base.py
	4. Train "Convolution 1-layer" network: python sample_experiment_conv1.py
	5. Train "Convolution 1-layer + Attention" network: python sample_experiment_attn1.py
	6. Train "Convolution 2-layer" network: python sample_experiment_conv2.py
	7. Train "Convolution 2-layer + Attention" network: python sample_experiment_attn2.py

#********
Transfer learning (Make sure the base network on proteins was trained successfully)
#********
	1. cd GCN_xTransfer
	2. Change 'mode_experiment', 'data_directory' and 'output_directory' in configuration.py for paratope data
	3. Train "No convolution" network: python sample_experiment_base.py
	4. Train "Convolution 1-layer" network: python sample_experiment_conv1.py
	5. Train "Convolution 1-layer + Attention" network: python sample_experiment_attn1.py
	6. Train "Convolution 2-layer" network: python sample_experiment_conv2.py
	7. Train "Convolution 2-layer + Attention" network: python sample_experiment_attn2.py
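
#***********************
Example: inspecting the vertex features of an input file
#***********************
The input data format described earlier in this file packs four feature groups into the 63 columns of l_vertex / r_vertex. The sketch below is not part of the PECAN scripts; it only illustrates how one row could be split back into those groups using the index ranges listed in this file. The names load_complexes, split_vertex_row and FEATURE_BLOCKS are hypothetical helpers, and the assumption that a file loads with a plain pickle.load call (and that each row behaves like a 1-D sequence of 63 values) has not been checked against the Zenodo files.

```python
import pickle
import sys

# (name, start, stop) with an exclusive stop, mirroring the inclusive
# index ranges in this README: [0,20], [21,40], [41,42], [43,62].
FEATURE_BLOCKS = [
    ("one_hot", 0, 21),
    ("pssm", 21, 41),
    ("solvent_accessibility", 41, 43),
    ("neighborhood_composition", 43, 63),
]
NUM_FEATURES = 63


def load_complexes(path):
    """Load a list of per-complex dictionaries from a pickle file.

    Assumption: the file is an uncompressed pickle written by Python 2.
    Under Python 3, encoding="latin1" is the usual way to read such files.
    """
    with open(path, "rb") as handle:
        if sys.version_info[0] >= 3:
            return pickle.load(handle, encoding="latin1")
        return pickle.load(handle)


def split_vertex_row(row):
    """Split one 63-value row of l_vertex or r_vertex into named blocks."""
    if len(row) != NUM_FEATURES:
        raise ValueError(
            "expected %d features per residue, got %d" % (NUM_FEATURES, len(row))
        )
    return dict((name, list(row[start:stop])) for name, start, stop in FEATURE_BLOCKS)
```

With a loaded list called complexes (hypothetical variable name), split_vertex_row(complexes[0]["l_vertex"][0]) would return the four feature groups for the first residue of the primary protein. Any file path passed to load_complexes has to be taken from the downloaded directories; this README does not list the file names.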