This repository lets you reproduce the results of the paper "Protein function prediction by incorporating knowledge graph representation of heterogeneous interactions and Gene Ontology." You can also run the ablation experiments, which study how the inputs of the heterogeneous interaction graph and the model parameters affect the performance of the LATTE2GO model.
If there is any problem with the code, please open an issue or contact me at nhat.tran@mavs.uta.edu.
To run the experiments, you need at least 5.5 GB of disk space, 50 GB of RAM, and 10 GB of GPU memory.
Please install the package requirements, download the datasets, and run the bash commands provided in the instructions below.
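As a quick pre-flight check, the disk-space requirement stated above can be verified from Python before downloading anything. This is a minimal sketch that is not part of the repository: the helper names `free_disk_gb` and `has_enough_disk` are hypothetical, it uses only the standard library, and it checks disk space only (RAM and GPU memory are not covered).

```python
import shutil

# Disk threshold taken from the requirements stated in this README.
REQUIRED_DISK_GB = 5.5


def free_disk_gb(path="."):
    """Return the free disk space at `path` in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_disk(path=".", required_gb=REQUIRED_DISK_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    return free_disk_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"Free disk: {free_disk_gb():.1f} GB, enough: {has_enough_disk()}")
```

Run it from the repository root so that the check applies to the filesystem where the data will be stored.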
Please ensure you have the packages listed in requirements.txt installed. You can install them by running:
conda install --file requirements.txt
or alternatively:
pip install -r requirements.txt
The paper uses two sources of data: DeepGraphGO's dataset and the pre-built HeteroNetwork datasets (MULTISPECIES and HUMAN_MOUSE). Our script downloads DeepGraphGO's dataset automatically from the DeepGraphGO GitHub repo. The MULTISPECIES and HUMAN_MOUSE datasets are downloaded from AWS S3.
To download the datasets, you need a free AWS account, the AWS CLI installed, and your credentials configured.
Run the following commands to download the necessary files to the data/ directory:
aws configure # If you haven't configured your AWS credentials
python download_data.py
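If you prefer to fail early when the AWS CLI is missing, the download step can be wrapped as in the sketch below. The wrapper functions `aws_cli_available` and `run_download` are hypothetical and not part of the repository; the sketch only assumes that `download_data.py` sits in the current working directory, as in the command above. It does not verify that credentials are configured.

```python
import shutil
import subprocess
import sys


def aws_cli_available():
    """Return True if an `aws` executable is found on PATH."""
    return shutil.which("aws") is not None


def run_download(script="download_data.py"):
    """Run the download script with the current interpreter, after checking for the AWS CLI."""
    if not aws_cli_available():
        raise RuntimeError(
            "AWS CLI not found on PATH; install it and run `aws configure` first."
        )
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run([sys.executable, script], check=True)
```

Calling `run_download()` from the repository root is then equivalent to `python download_data.py`, with an explicit error if the CLI is absent.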
Run the following commands to train and evaluate the model on the DeepGraphGO multi-species AFP dataset:
Parameters for `experiments/run.py`
dataset:
  values: [ "MULTISPECIES", "HUMAN_MOUSE" ]
pred_ntypes:
  values: [ "molecular_function", "biological_process", "cellular_component", "molecular_function biological_process cellular_component" ]
method:
  values: [ "LATTE2GO-1", "LATTE-1", "LATTE2GO-2", "HGT", "DeepGraphGO", "MLP", "DeepGOZero", "RGCN" ]
inductive:
  values: [ false ]
seed:
  values: [ 1 ]
python experiments/run.py --method LATTE2GO-2 --dataset MULTISPECIES --pred_ntypes "molecular_function" --seed 1
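To run more than one configuration, the parameter values listed above can be expanded into one command line per combination. The following is a sketch under assumptions: `build_commands` is a hypothetical helper, it uses only the flags that appear in the example command (`--method`, `--dataset`, `--pred_ntypes`, `--seed`), and it does not check whether every method supports every dataset. The `inductive` parameter is left out because no flag for it is shown here.

```python
import itertools
import shlex

# Values copied from the parameter listing for experiments/run.py.
DATASETS = ["MULTISPECIES", "HUMAN_MOUSE"]
PRED_NTYPES = [
    "molecular_function",
    "biological_process",
    "cellular_component",
    "molecular_function biological_process cellular_component",
]
METHODS = [
    "LATTE2GO-1", "LATTE-1", "LATTE2GO-2", "HGT",
    "DeepGraphGO", "MLP", "DeepGOZero", "RGCN",
]
SEEDS = [1]


def build_commands(methods=METHODS, datasets=DATASETS,
                   pred_ntypes=PRED_NTYPES, seeds=SEEDS):
    """Return one shell command string per parameter combination."""
    commands = []
    for method, dataset, ntypes, seed in itertools.product(
        methods, datasets, pred_ntypes, seeds
    ):
        args = [
            "python", "experiments/run.py",
            "--method", method,
            "--dataset", dataset,
            "--pred_ntypes", ntypes,
            "--seed", str(seed),
        ]
        # shlex.join quotes arguments that contain spaces.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for command in build_commands(methods=["LATTE2GO-2"]):
        print(command)
```

The printed lines can be pasted into a shell script or passed to a job scheduler; narrowing the keyword arguments keeps the sweep small.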
To run the ablation experiments with various combinations of the heterogeneous RNA-protein interaction data or LATTE2GO hyperparameters, modify the experiments/configs/latte2go.yaml file with these parameters.
Parameters for `experiments/configs/latte2go.yaml`
ntype_subset:
  values:
    - 'Protein MessengerRNA MicroRNA LncRNA biological_process cellular_component molecular_function'
    - 'Protein MessengerRNA MicroRNA LncRNA'
    - 'Protein MessengerRNA MicroRNA'
    - 'Protein MessengerRNA'
    - 'Protein'
    - ''
go_etypes:
  values:
    - 'is_a part_of has_part regulates negatively_regulates positively_regulates'
    - 'is_a part_of has_part'
    - 'is_a'
    - null
python experiments/run.py --method LATTE2GO-2 --config experiments/configs/latte2go.yaml --dataset MULTISPECIES --pred_ntypes molecular_function
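To get an overview of the ablation space, the two value lists above can be crossed into a grid of settings. This is an illustrative sketch only: `split_types` and `ablation_grid` are hypothetical helpers, and treating both `''` and `null` as an empty list is an assumption made for display purposes; how `experiments/run.py` interprets those values is not specified here.

```python
import itertools

# Values copied from the listing for experiments/configs/latte2go.yaml.
NTYPE_SUBSETS = [
    "Protein MessengerRNA MicroRNA LncRNA biological_process cellular_component molecular_function",
    "Protein MessengerRNA MicroRNA LncRNA",
    "Protein MessengerRNA MicroRNA",
    "Protein MessengerRNA",
    "Protein",
    "",
]
GO_ETYPES = [
    "is_a part_of has_part regulates negatively_regulates positively_regulates",
    "is_a part_of has_part",
    "is_a",
    None,  # YAML null
]


def split_types(value):
    """Split a space-separated type string into a list; '' or None gives []."""
    return value.split() if value else []


def ablation_grid():
    """Return every (ntype_subset, go_etypes) combination as a dict of lists."""
    return [
        {"ntype_subset": split_types(ntypes), "go_etypes": split_types(etypes)}
        for ntypes, etypes in itertools.product(NTYPE_SUBSETS, GO_ETYPES)
    ]


if __name__ == "__main__":
    for setting in ablation_grid():
        print(setting)
```

With six node-type subsets and four edge-type sets, the full grid contains 24 settings.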
If you use this code for your research, please cite our paper.