A novel DTA prediction method using graph neural networks

DGraphDTA

Inspired by GraphDTA, DGraphDTA (double Graph DTA predictor) is a graph neural network method for predicting drug-protein affinity. It needs only the SMILES string of the molecule and the sequence of the protein as input. This repo is derived from GraphDTA. Unlike GraphDTA, the method builds a graph for both the protein and the small molecule to improve accuracy. The protein graph is constructed from the contact map.
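
To illustrate how a contact map can become a protein graph, here is a minimal sketch. It is not the code used in this repo: the function name `contact_map_to_edges`, the 0.5 cutoff, and the choice to keep the diagonal as self-loops are all assumptions made for illustration. Each residue is a node, and an edge is added wherever the contact probability reaches the cutoff.

```python
import numpy as np


def contact_map_to_edges(contact_map, threshold=0.5):
    """Turn an L x L contact probability map into a (2, num_edges) edge list.

    Hypothetical helper: the threshold value is an assumption, not a setting
    taken from this repo.
    """
    contact_map = np.asarray(contact_map, dtype=float)
    if contact_map.ndim != 2 or contact_map.shape[0] != contact_map.shape[1]:
        raise ValueError("contact map must be a square matrix")
    # Residue pairs whose contact probability reaches the cutoff become edges.
    rows, cols = np.nonzero(contact_map >= threshold)
    return np.stack([rows, cols])


if __name__ == "__main__":
    toy_map = [
        [1.0, 0.9, 0.1],
        [0.9, 1.0, 0.2],
        [0.1, 0.2, 1.0],
    ]
    print(contact_map_to_edges(toy_map))
```

An array of this shape matches the `edge_index` layout that PyG expects, so it could be converted to a tensor and paired with per-residue node features.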

dependencies

numpy == 1.17.4
keras == 2.3.1
Pconsc4 == 0.4
pytorch == 1.3.0
PyG (torch-geometric) == 1.3.2
hhsuite (https://github.com/soedinglab/hh-suite)
rdkit == 2019.03.4.0
ccmpred (https://github.com/soedinglab/CCMpred)

data preparation

  1. Prepare the data needed for training. Get the MSA files for all proteins in the datasets (for a more detailed description of the datasets, please refer to datasets), and use Pconsc4 to predict all the contact maps. A script in the repo runs all of these steps:
    python scripts.py

  2. If you want to skip the lengthy preparation, download the contact maps and MSA files that we have already generated from files. For details on how they were generated, please refer to "scripts.py". Then copy the two corresponding folders to each dataset directory. For example:
    (1) Download data.zip and unzip it.
    (2) Copy the two folders called "aln" and "pconsc4" from davis to the /data/davis directory of your repo, and do the same for KIBA.
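
Before training, it can help to confirm that the folders landed in the right place. The sketch below is a hypothetical helper, not part of the repo. It assumes the dataset directories are named "davis" and "kiba" under "data" (the lowercase "kiba" name is a guess), and it only checks that the "aln" and "pconsc4" folders exist.

```python
from pathlib import Path


def missing_inputs(data_root, datasets=("davis", "kiba"), folders=("aln", "pconsc4")):
    """Return the expected input folders that are absent under data_root.

    Hypothetical helper: the dataset directory names are assumptions.
    """
    root = Path(data_root)
    return [
        root / dataset / folder
        for dataset in datasets
        for folder in folders
        if not (root / dataset / folder).is_dir()
    ]


if __name__ == "__main__":
    for path in missing_inputs("data"):
        print("missing:", path)
```

If nothing is printed, every expected folder is present.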

train (cross validation)

Run 5-fold cross validation:
python training_5folds.py 0 0 0
where the parameters are, in order, the dataset selection, the GPU selection, and the fold (0, 1, 2, 3 or 4).
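
Since each invocation trains a single fold, a full cross validation needs five runs. The sketch below builds the five command lines for one dataset and GPU. The helper name `fold_commands` is hypothetical, and which index maps to which dataset is not stated here, so the default values are only placeholders.

```python
def fold_commands(dataset=0, gpu=0, folds=range(5), script="training_5folds.py"):
    """Build one command per fold in the form: python <script> <dataset> <gpu> <fold>."""
    return [
        ["python", script, str(dataset), str(gpu), str(fold)]
        for fold in folds
    ]


if __name__ == "__main__":
    # Print the commands; run them one at a time or from a job scheduler.
    for command in fold_commands(dataset=0, gpu=0):
        print(" ".join(command))
```

Each list can also be passed directly to `subprocess.run(command, check=True)` to launch the folds sequentially from Python.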

test

This step runs prediction with the models we trained and reproduces the experiments.
python test.py 0 0
where the parameters are the dataset selection and the GPU selection.

Because of our memory limitations, only 8 model combinations were fitted to find the best result. Exploring more model combinations may give better results.