This is the code for NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks
Required environment: Python 3.8, PyTorch 2.0.1
For MIGU and NANA algorithm:
conda activate MIGU conda env update --file environment_pygdemo.yml
For generating biophysical & other features:
conda activate hydrogen_bond conda env update --file environment_bio.yml
Please download the dataset XXX from *** and put the file under the folder XXX
Please run the following command
python hydrogen_bond.py
--training_title
: Name of the training, type: string--epochs
: Training epochs, type: float--lr
: Learning rate, type: Float--optimizer
: Name of the optimizers, type: string, options:[adam]
--dim_h
: the width of the layers, type: integer--batch_size
: Training batch size, type:--model
: Model types, type: string, options:[GIN_Attribute]
--num_workers
: Number of dataloader workers, type: integer--dataset_path
: The path of loaded dataset, type: string--dataset
: The used dataset for the training, type: string--edge
: Whether to use co-embedding residual learning, type: boolean--eval_batch_size
: The batch size in the evaluation stage, type: integer--version
: Use augmented or non-augmented data, "mine" means augmented data, "their" means non-augmented data, type: string, options:[their, mine]
--type
: The types of layers, type: string, options:[GCN, GIN, MPNN]
--bond
: Use edge attributes or not; if the flag is true, it would represent the MiGu augmentation; otherwise it is NaNa augmentation, type: boolean
For example, to train a GCN model with augmented EC dataset and residual learning framework, you can use the following command
CUDA_LAUNCH_BLOCKING=1 python graph_hao_retry.py --training_title 226_MPNN_true_edge --epochs 4000 --lr 0.0005 --optimizer adam --dim_h 128 --batch_size 50 --model GIN_Attribute --num_workers 8 --dataset_path /home/ysl_0128/DIG/examples/threedgraph/dataset --dataset EC --edge true --eval_batch_size 20 --version mine --type GCN --bond false