Official Code Repository for the paper 'Regressor-free Molecule Generation to support Drug Response Prediction'.
- Regressor-free guidance molecule generation is proposed to ensure sampling within a more effective space, where the regression controller model can encode the response value between the molecules and the cell lines as conditional guidance for molecule generation.
- To enhance noise prediction performance, we introduce a dual-branch controlled noise prediction model for score estimation, named DBControl. DBControl model consists of two GNN-based branches, each undergoing unconditional training and conditional mixed training, respectively.
- Experimental results demonstrate that our method outperforms the state-of-the-art baselines in conditional molecular graph generation for the DRP task. Furthermore, we provide an effectiveness proof of our method.
Regressor-free guidance is built in Python 3.8.18 and Pytorch 2.0.1. Please use the following command to install the requirements:
pip install -r requirements.txt
For molecule generation, additionally run the following command:
conda install -c conda-forge rdkit=2020.09.1.0
We provide two molecular graph datasets (QM9 and ZINC250k) and one drug response dataset (GDSCv2).
We additionally provide the commands for generating generic graph datasets as follows:
python data/data_generators.py --dataset ${dataset_name}
To preprocess the molecular graph datasets for training models, run the following command:
python data/preprocess.py --dataset ${dataset_name}
python data/preprocess_for_nspdk.py --dataset ${dataset_name}
For the evaluation of generic graph generation tasks, run the following command to compile the ORCA program (see http://www.biolab.si/supp/orca/orca.html):
cd evaluation/orca
g++ -O2 -std=c++11 -o orca orca.cpp
The configurations are provided on the config/
directory in YAML
format.
Hyperparameters used in the experiments are specified in the Appendix C of our paper.
We provide the commands for the following task: Molecule Generation.
To train the score models, first modify config/${dataset}.yaml
accordingly, then run the following command.
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py --type train --config ${train_config} --seed ${seed}
for example,
CUDA_VISIBLE_DEVICES=0 python main.py --type train --config community_small --seed 42
and
CUDA_VISIBLE_DEVICES=0,1 python main.py --type train --config zinc250k --seed 42
We provide the commands for the following task: Molecule Generation.
To train the score models, first modify config/$cl{task}_main.yaml
accordingly, such as dr
for drug response
, then run the following command.
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py --type cl{task}_train --config ${train_config} --seed ${seed}
for example,
CUDA_VISIBLE_DEVICES=0 python main.py --type cldr_train --config cldr_train --seed 42
and
CUDA_VISIBLE_DEVICES=0,1 python main.py --type cldt_train --config cldt_train --seed 42
We provide the commands for the following task: Molecule Generation.
To train the score models, first modify config_control/{task}_train{_multi_data}.yaml
accordingly, such as drp
for drug response
, _multi_data
is default, then run the following command.
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py --type train --config config_control/${train_config} --seed ${seed}
for example,
CUDA_VISIBLE_DEVICES=0 python main.py --type control_drp --config config_control/drp_train --seed 42
CUDA_VISIBLE_DEVICES=0 python main.py --type multidata_control_drp --config config_control/drp_train --seed 42
and
CUDA_VISIBLE_DEVICES=0,1 python main.py --type multidata_control_dta --config config_control/dta_train --seed 42
To generate graphs using the trained score models, run the following command.
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py --type {task}_condition_sample --config sample_{dataset}
or
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py --type sample --config sample_zinc250k
We provide checkpoints of the pretrained models on the checkpoints/
directory, which are used in the main experiments.
['GDSCv2', 'QM9']/date-time.pth
QM9/qm9.pth
ZINC250k/zinc250k.pth
We have uploaded a zip pack of code that can be used for conditional sampling directly, which can be downloaded (https://drive.google.com/file/d/15He7YHE39wT9tdNcqSOPR-bvzvO_Isr9/view?usp=drive_link).
cd ./RFMG_Sampling
CUDA_VISIBLE_DEVICES=${gpu_ids} python main.py