
MaskMol

Install environment

1. GPU environment

CUDA 11.1

2. create a new conda environment

conda create -n MaskMol python=3.7

conda activate MaskMol

3. Install the required packages

pip install -r requirements.txt

source activate MaskMol
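
To confirm the environment is set up correctly, you can check that PyTorch sees the GPU. This is a minimal sketch; it assumes the PyTorch build installed from requirements.txt matches the CUDA 11.1 driver.

# check_env.py -- sanity check for the GPU environment
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # should print True on a CUDA 11.1 machine
if torch.cuda.is_available():
    print("CUDA version used by PyTorch:", torch.version.cuda)
    print("GPU:", torch.cuda.get_device_name(0))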

Pretraining

Download the pretraining data and put it into ./datasets/pretrain/

1. Generate masked images

python ./data_process/mask_parallel.py --jobs 15

Note: You can find the img, Atom, Bond, and Motif directories in ./datasets/pretrain
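
As a quick sanity check of the masking outputs, you can count the generated files per directory. This is a minimal sketch; the directory names follow the note above, and recursive globbing of every file is an assumption about the layout.

# count_masked.py -- rough check of the masking outputs
from pathlib import Path

root = Path("./datasets/pretrain")
for sub in ["img", "Atom", "Bond", "Motif"]:
    files = [p for p in (root / sub).rglob("*") if p.is_file()]
    print(f"{sub}: {len(files)} files")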

2. LMDB processing

python ./data_process/lmdb_process.py --jobs 15

Note: You can find the four LMDB files (img_lmdb, Atom_lmdb, Bond_lmdb, Motif_lmdb) in ./datasets/pretrain/, and you can also download the 200k (20w) pretraining dataset directly.
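
To verify the LMDB conversion, you can print the entry count of each database with the lmdb package. This is a minimal sketch; how each record is encoded is specific to lmdb_process.py and is not assumed here.

# inspect_lmdb.py -- print the number of entries in each generated LMDB
import lmdb

for name in ["img_lmdb", "Atom_lmdb", "Bond_lmdb", "Motif_lmdb"]:
    # add subdir=False if the database was written as a single file rather than a directory
    env = lmdb.open(f"./datasets/pretrain/{name}", readonly=True, lock=False)
    with env.begin() as txn:
        print(name, "entries:", txn.stat()["entries"])
    env.close()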

3. Start pretraining

Usage:

usage: train_muti_GPU_lmdb.py [-h] [--lr LR] [--lrf LRF] [--nums NUMS]
                              [--wd WD] [--workers WORKERS]
                              [--val_workers VAL_WORKERS] [--epochs EPOCHS]
                              [--start_epoch START_EPOCH] [--batch BATCH]
                              [--momentum MOMENTUM]
                              [--checkpoints CHECKPOINTS] [--resume PATH]
                              [--seed SEED] [--data_path DATA_PATH]
                              [--log_dir LOG_DIR] [--proportion PROPORTION]
                              [--ckpt_dir CKPT_DIR] [--verbose] [--ngpu NGPU]
                              [--gpu GPU] [--Atom_lambda ATOM_LAMBDA]
                              [--Bond_lambda BOND_LAMBDA]
                              [--Motif_lambda MOTIF_LAMBDA] [--nodes NODES]
                              [--ngpus_per_node NGPUS_PER_NODE]
                              [--dist-url DIST_URL] [--node_rank NODE_RANK]

Example command to pretrain:

python train_muti_GPU_lmdb.py --nodes 1 \
                   --ngpus_per_node 4 \
                   --gpu 0,1,2,3 \
                   --batch 128 \
                   --epochs 50 \
                   --proportion 0.5 \
                   --Atom_lambda 1 \
                   --Bond_lambda 1 \
                   --Motif_lambda 1 \
                   --nums 2000000

For testing, you can simply pre-train MaskMol on a single GPU with the 200k dataset:

python train_lmdb.py --gpu 0 \
                   --batch 32 \
                   --epochs 50 \
                   --proportion 0.5 \
                   --Atom_lambda 1 \
                   --Bond_lambda 1 \
                   --Motif_lambda 1 \
                   --nums 200000

Finetuning

All finetuning datasets can be downloaded from the provided link.

1. Download pre-trained MaskMol

You can download the pre-trained models (MaskMol_small, MaskMol_base) and put them into the folder ckpts/.
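
Before finetuning, it can be useful to confirm that the checkpoint loads cleanly. This is a minimal sketch; the 'state_dict' key is an assumption about how the .pth.tar file was saved.

# inspect_ckpt.py -- load a pre-trained checkpoint and list the first few parameter names
import torch

ckpt = torch.load("./ckpts/pretrain/MaskMol_base.pth.tar", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
for i, name in enumerate(state):
    print(name)
    if i >= 9:  # show only the first ten entries
        break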

2. Finetune with pre-trained MaskMol

a) You can download the activity cliff estimation and compound potency prediction datasets and put them into datasets/finetuning/

b) The usage is as follows:

usage: finetuning.py [-h] [--dataset DATASET] [--dataroot DATAROOT]
                     [--gpu GPU] [--ngpu NGPU] [--workers WORKERS] [--lr LR]
                     [--weight_decay WEIGHT_DECAY] [--momentum MOMENTUM]
                     [--seed SEED] [--runseed RUNSEED] [--epochs EPOCHS]
                     [--start_epoch START_EPOCH] [--batch BATCH]
                     [--resume PATH] [--imageSize IMAGESIZE] [--image_aug]
                     [--save_finetune_ckpt {0,1}] [--eval_metric EVAL_METRIC]
                     [--log_dir LOG_DIR] [--ckpt_dir CKPT_DIR]

For example:

python finetuning_cliffs.py --gpu 0 \
                   --save_finetune_ckpt 1 \
                   --dataroot ./datasets/finetuning/cliffs \
                   --dataset CHEMBL219_Ki \
                   --resume ./ckpts/pretrain/MaskMol_base.pth.tar \
                   --lr 5e-4 \
                   --batch 16 \
                   --epochs 100 \
                   --eval_metric rmse
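
To run the same finetuning settings over several activity-cliff targets, you can wrap the command in a small driver script. This is a minimal sketch; any dataset name other than CHEMBL219_Ki is a hypothetical placeholder you would replace with targets from the downloaded data.

# run_cliffs.py -- launch finetuning_cliffs.py for several targets sequentially
import subprocess

datasets = ["CHEMBL219_Ki"]  # extend with further target names from datasets/finetuning/cliffs
for name in datasets:
    subprocess.run([
        "python", "finetuning_cliffs.py",
        "--gpu", "0",
        "--save_finetune_ckpt", "1",
        "--dataroot", "./datasets/finetuning/cliffs",
        "--dataset", name,
        "--resume", "./ckpts/pretrain/MaskMol_base.pth.tar",
        "--lr", "5e-4",
        "--batch", "16",
        "--epochs", "100",
        "--eval_metric", "rmse",
    ], check=True)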

Attention Visualization

More about Grad-CAM heatmaps can be found at this link: https://github.com/jacobgil/vit-explain

We also provide a script to generate GradCAM heatmaps:

usage: vit_explain.py [-h] [--use_cuda] [--image_path IMAGE_PATH]
                      [--molecule_path MOLECULE_PATH]
                      [--head_fusion HEAD_FUSION]
                      [--discard_ratio DISCARD_RATIO]
                      [--category_index CATEGORY_INDEX] [--resume PATH]
                      [--gpu GPU]

You can run the following command:

python main.py --resume MaskMol \
               --img_path 1.png \
               --discard_ratio 0.9 \
               --gpu 0 \
               --category_index 0