MolFilterGAN: A Progressively Augmented Generative Adversarial Network for Triaging AI-designed Molecules
Published in the Journal of Cheminformatics.
Artificial intelligence (AI)-based molecular design methods, especially deep generative models that generate novel molecular structures, make it possible to explore unknown chemical space without relying on brute-force search. However, whether designed by AI or by human experts, molecules still need to be synthesized and biologically evaluated, and this trial-and-error process remains resource-intensive. AI-based drug design therefore faces a major challenge: how to prioritize the molecular structures with the most potential for subsequent drug development. This study indicates that common filtering approaches based on traditional screening metrics fail to differentiate AI-designed molecules. To address this issue, we propose a novel molecular filtering method, MolFilterGAN, based on a progressively augmented generative adversarial network. Comparative analysis shows that MolFilterGAN outperforms conventional screening approaches based on drug-likeness or synthetic accessibility metrics. Retrospective analysis of AI-designed discoidin domain receptor 1 (DDR1) inhibitors shows that MolFilterGAN significantly increases the efficiency of molecular triaging. Further evaluation on eight external ligand sets suggests that MolFilterGAN is useful for triaging or enriching bioactive compounds across a wide range of target types. These results highlight the value of MolFilterGAN for evaluating molecules holistically and for accelerating molecular discovery, especially in combination with advanced AI generative models.
git clone https://github.com/myzhengSIMM/MolFilterGAN
cd MolFilterGAN
This project requires the following libraries.
- NumPy
- Pandas
- PyTorch > 1.2
- RDKit
- tensorboardX
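Before running any of the scripts, it can help to confirm that the libraries listed above are importable. The snippet below is a minimal sketch, not part of the repository: the import names (`numpy`, `pandas`, `torch`, `rdkit`, `tensorboardX`) are the usual ones for these packages, and `missing_modules` is a hypothetical helper. It does not check the PyTorch version requirement.

```python
import importlib.util

# Usual import names for the libraries listed above (assumed, not taken from the repo).
REQUIRED_MODULES = ("numpy", "pandas", "torch", "rdkit", "tensorboardX")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```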
All data used for training and evaluation, along with the trained models, are available at:
https://drive.google.com/drive/folders/1uN7a5m1PmhcXfs5OuOXWPbxyF_KKuZ3A?usp=sharing
- Datasets.zip: contains all datasets for training MolFilterGAN
- BenchmarkDatasets.zip: contains all the benchmark datasets for evaluating metrics
- pretrained_G.ckpt: a pre-trained initial generator
- pretrained_D.ckpt: a pre-trained initial discriminator
- ADtrained_D.ckpt: an adversarial-trained discriminator
After downloading, unzip the .zip files to get Datasets/, BenchmarkDatasets/ and PCBA/, create the checkpoint directories with mkdir AD_save pretrainD_save pretrainG_save, and then put the .ckpt files in the corresponding directories.
The final folder structure will look like this:
MolFilterGAN
|___AD_save
| |___ADtrained_D.ckpt # an adversarial-trained discriminator
|
|___BenchmarkDatasets # contains all the benchmark datasets for evaluating metrics
| |___chembl-sample10000.smi
| |___...
|
|___Datasets # contains all datasets for training MolFilterGAN
| |___Data4InitD_neg.smi
| |___...
|
|___PCBA
| |___ALDH1_active_T_rd_rm_less.smi
| |___...
|
|___pretrainD_save
| |___pretrained_D.ckpt # a pre-trained initial discriminator
|
|___pretrainG_save
| |___pretrained_G.ckpt # a pre-trained initial generator
|
|___results
| |___.csv
| |___...
|
|___AdversarialTraining.py
|
|___Dataset.py
...
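The layout above can also be prepared with a short script instead of manual unzip, mkdir, and copy steps. This is a sketch under stated assumptions: `prepare_layout` is a hypothetical helper, the downloaded files are assumed to sit together in one folder, and each archive is assumed to unpack into its own top-level directory (such as Datasets/).

```python
import shutil
import zipfile
from pathlib import Path

# Checkpoint file -> directory, following the folder structure shown above.
CKPT_DIRS = {
    "ADtrained_D.ckpt": "AD_save",
    "pretrained_D.ckpt": "pretrainD_save",
    "pretrained_G.ckpt": "pretrainG_save",
}


def prepare_layout(download_dir, repo_dir):
    """Unzip every archive into repo_dir and copy checkpoints into place.

    Returns the names of the checkpoint files that were found and copied.
    """
    download_dir, repo_dir = Path(download_dir), Path(repo_dir)

    # Extract all downloaded .zip archives into the repository root.
    for archive in sorted(download_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(repo_dir)

    # Create the checkpoint directories and copy any checkpoints present.
    placed = []
    for ckpt_name, folder in CKPT_DIRS.items():
        target_dir = repo_dir / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        source = download_dir / ckpt_name
        if source.exists():
            shutil.copy2(source, target_dir / ckpt_name)
            placed.append(ckpt_name)
    return placed
```

For example, `prepare_layout("~/Downloads/molfiltergan", ".")` would be called from the repository root after expanding the home directory yourself; the download path here is only illustrative.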
python PretrainG.py --infile_path Datasets/Data4InitG.smi --log_path test_init_G_log --model_save_path test_init_G_save
python PretrainD.py --infile_path Datasets/Data4InitD.txt --log_path test_init_D_log --model_save_path test_init_D_save
python AdversarialTraining.py --infile_path Datasets/Data4InitD.txt --log_path test_AD_log --model_save_path test_AD_save --load_dir_G pretrainG_save/pretrained_G.ckpt --load_dir_D pretrainD_save/pretrained_D.ckpt
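The three commands above can be chained from Python when the stages should run in sequence and stop on the first failure. This is only a sketch: `training_commands` and `run_all` are hypothetical helpers, the script names, flags, and paths are copied from the commands above, and nothing is assumed about what the scripts write to their output directories.

```python
import subprocess
import sys


def training_commands(data_dir="Datasets"):
    """Return the three training commands shown above, in order, as argument lists."""
    py = sys.executable
    return [
        [py, "PretrainG.py",
         "--infile_path", f"{data_dir}/Data4InitG.smi",
         "--log_path", "test_init_G_log",
         "--model_save_path", "test_init_G_save"],
        [py, "PretrainD.py",
         "--infile_path", f"{data_dir}/Data4InitD.txt",
         "--log_path", "test_init_D_log",
         "--model_save_path", "test_init_D_save"],
        [py, "AdversarialTraining.py",
         "--infile_path", f"{data_dir}/Data4InitD.txt",
         "--log_path", "test_AD_log",
         "--model_save_path", "test_AD_save",
         "--load_dir_G", "pretrainG_save/pretrained_G.ckpt",
         "--load_dir_D", "pretrainD_save/pretrained_D.ckpt"],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a stage exits non-zero.
            subprocess.run(cmd, check=True)
```

Calling `run_all(training_commands())` only prints the commands; pass `dry_run=False` from the repository root to actually launch the stages.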
You can use the trained discriminator models by changing the infile_path and load_dir arguments, for example:
python Prediction.py --infile_path './BenchmarkDatasets/GA-sample10000.smi' --load_dir AD_save/ADtrained_D.ckpt
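To score several .smi files with the same checkpoint, the command above can be generated once per file. The sketch below assumes only the two flags shown above (`--infile_path` and `--load_dir`); `prediction_commands` is a hypothetical helper, and the form of the output from Prediction.py is not assumed here.

```python
import sys
from pathlib import Path


def prediction_commands(smi_dir, checkpoint="AD_save/ADtrained_D.ckpt"):
    """Build one Prediction.py command per .smi file found in smi_dir."""
    smi_files = sorted(Path(smi_dir).glob("*.smi"))
    return [
        [sys.executable, "Prediction.py",
         "--infile_path", str(smi_file),
         "--load_dir", checkpoint]
        for smi_file in smi_files
    ]


if __name__ == "__main__":
    # Print the commands for the benchmark folder; run them with subprocess if desired.
    for cmd in prediction_commands("BenchmarkDatasets"):
        print(" ".join(cmd))
```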