TGSA: Protein-Protein Association-Based Twin Graph Neural Networks for Drug Response Prediction with Similarity Augmentation
Here we provide an implementation of Twin Graph neural networks with Similarity Augmentation (TGSA) in PyTorch and PyTorch Geometric. The repository is organised as follows:
- data/ contains the necessary dataset files;
- models/ contains the implementation of TGDRP and SA;
- TGDRP_weights contains the trained weights of TGDRP;
- utils/ contains the necessary processing subroutines;
- preprocess_gene.py preprocesses the genetic profiles;
- smiles2graph.py constructs molecular graphs from SMILES;
- main.py is the main function for TGDRP (train or test).
- Please install the environment using anaconda3:
conda create -n TGSA python=3.6
- Install the necessary packages:
conda install -c rdkit rdkit
pip install fitlog
pip install torch==1.6.0
pip install torch-cluster==1.5.9  # wheels: https://pytorch-geometric.com/whl/
pip install torch-scatter==2.0.6  # wheels: https://pytorch-geometric.com/whl/
pip install torch-sparse==0.6.9  # wheels: https://pytorch-geometric.com/whl/
pip install torch-spline-conv==1.2.1  # wheels: https://pytorch-geometric.com/whl/
pip install torch-geometric==1.6.1
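After installation, it can help to confirm that the versions present in the environment match the ones listed above. The following is a minimal sketch and not part of the repository: the names PINNED, installed_version and check_versions are illustrative, and the version table is simply copied from the install list.

```python
# Versions copied from the install list; the table and function names are hypothetical.
PINNED = {
    "torch": "1.6.0",
    "torch-cluster": "1.5.9",
    "torch-scatter": "2.0.6",
    "torch-sparse": "0.6.9",
    "torch-spline-conv": "1.2.1",
    "torch-geometric": "1.6.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Fallback for Python 3.6/3.7: pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        # A local build tag such as "1.6.0+cu101" counts as a match for "1.6.0".
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling check_versions(PINNED) inside the TGSA environment returns an empty dictionary when every package matches, and otherwise maps each offending package to its expected and found versions.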
- data/CellLines_DepMap/CCLE_580_18281/census_706/
  - Raw genetic profiles from CCLE and the processed features. You can also preprocess your own data with preprocess_gene.py.
- data/similarity_augment/
  - Directory edge contains edges of heterogeneous graphs; directory dict contains necessary data and dictionaries for mapping between drug data or cell line data.
- data/Drugs/drug_smiles.csv
  - SMILES for 170 drugs. You can generate PyG graph objects with smiles2graph.py.
- data/PANCANCER_IC_82833_580_170.csv
  - There are 82833 ln(IC50) values across 580 cell lines and 170 drugs.
- data/9606.protein.links.detailed.v11.0.txt and data/9606.protein.info.v11.0.txt
  - Extracted from https://stringdb-static.org/download/protein.links.detailed.v11.0/9606.protein.links.detailed.v11.0.txt.gz
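The STRING detailed links file is a space-separated table with a header row. Each row gives two protein identifiers followed by integer evidence scores, ending with combined_score. As a hedged sketch of how such a file could be read into an edge list, the function below keeps only pairs whose combined score reaches a threshold. The function name load_ppi_edges, the default threshold of 950 and the sample rows are assumptions made for illustration; the cutoff used by the repository's own preprocessing may differ.

```python
import csv
import io


def load_ppi_edges(handle, min_score=950):
    """Yield (protein1, protein2, combined_score) for rows at or above min_score.

    `handle` is any text file object over a STRING protein links table.
    The default threshold is an illustrative assumption.
    """
    reader = csv.DictReader(handle, delimiter=" ")
    for row in reader:
        score = int(row["combined_score"])
        if score >= min_score:
            yield row["protein1"], row["protein2"], score


# Tiny in-memory example with the column layout of the detailed links file.
# The rows are made up for illustration and are not real STRING scores.
sample = io.StringIO(
    "protein1 protein2 neighborhood fusion cooccurence coexpression "
    "experimental database textmining combined_score\n"
    "9606.ENSP00000000233 9606.ENSP00000272298 0 0 332 62 181 0 125 490\n"
    "9606.ENSP00000000233 9606.ENSP00000253401 0 0 0 0 186 0 56 198\n"
    "9606.ENSP00000000412 9606.ENSP00000349437 0 0 0 95 900 900 400 999\n"
)
edges = list(load_ppi_edges(sample))
print(edges)
```

For the real data, the same function can be given the handle returned by open("data/9606.protein.links.detailed.v11.0.txt"). Because it is a generator, the full file is never held in memory at once.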
- You can run python main.py --mode "train" to train TGDRP, or run python main.py --mode "test" to test the trained TGDRP.
- First, you can run heterogeneous_graph.py to generate the edges of the heterogeneous graphs.
- Then, you can run main_SA.py to generate the node features of the heterogeneous graphs using the two GNNs from TGDRP/TGDRP_pre, and then to fine-tune the remaining parameters from TGDRP/TGDRP_pre. Specifically, you can use the command python main_SA.py --mode "train"/"test" --pretrain 0/1 to fine-tune TGDRP/TGDRP_pre or to test the fine-tuned SA/SA_pre.
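The two steps above are meant to run in order, presumably because main_SA.py uses the edges produced by heterogeneous_graph.py. A minimal sketch of a driver that builds and runs the corresponding command lines follows. The helper names sa_commands and run_sa are hypothetical. The reading of --pretrain (0 for TGDRP, 1 for TGDRP_pre) is inferred from the order in which the options are listed above and is not confirmed from the code.

```python
import subprocess
import sys


def sa_commands(mode="train", pretrain=0, python=sys.executable):
    """Return the two command lines for the similarity-augmentation stage, in order."""
    if mode not in ("train", "test"):
        raise ValueError("mode must be 'train' or 'test'")
    if pretrain not in (0, 1):
        raise ValueError("pretrain must be 0 or 1")
    return [
        # Step 1: generate the edges of the heterogeneous graphs.
        [python, "heterogeneous_graph.py"],
        # Step 2: fine-tune or test; --pretrain is assumed to select
        # TGDRP (0) or TGDRP_pre (1).
        [python, "main_SA.py", "--mode", mode, "--pretrain", str(pretrain)],
    ]


def run_sa(mode="train", pretrain=0):
    """Run both steps from the repository root, stopping at the first failure."""
    for cmd in sa_commands(mode, pretrain):
        subprocess.run(cmd, check=True)
```

Calling run_sa("train", 0) from the repository root would run the edge generation and then the fine-tuning. Passing check=True makes a failure in the first step stop the second from running on missing edges.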
MIT