This is a joint PyTorch implementation of three papers in VAE-based molecule generation and translation. The papers and the official repos are as follows:
- Junction Tree Variational Autoencoder for Molecular Graph Generation (ICML 2018)
- Learning Multimodal Graph-to-Graph Translation for Molecular Optimization (ICLR 2019)
- Hierarchical Generation of Molecular Graphs using Structural Motifs (ICML 2020)
The master branch works with PyTorch 1.8+. MolVAE has been tested under Python 3.7 with PyTorch 1.11 on CUDA 11.4.
- Create an Anaconda environment:

  ```bash
  conda create --name vae_py37 python=3.7
  conda activate vae_py37
  ```

- Install RDKit:

  ```bash
  conda install rdkit -c rdkit
  ```

- Install PyTorch following the official instructions, e.g. PyTorch on GPU platforms:

  ```bash
  conda install pytorch torchvision -c pytorch
  ```

- Install other requirements:

  ```bash
  pip install -r requirements.txt
  ```

- Install Chemprop from source (an additional dependency for property-guided finetuning):

  ```bash
  git clone https://github.com/chemprop/chemprop.git
  cd chemprop
  pip install -e .
  ```
- For molecule generation, each line of a training file is a molecule in SMILES representation. `benchmark/moses` and `benchmark/polymers` are used for generation.
- For molecule translation, each line of a training file is a pair of molecules (molA, molB). The target is to translate from molA towards molB, as molB has better chemical properties. `benchmark/drd2`, `benchmark/logp04`, `benchmark/logp06` and `benchmark/qed` are used for translation.
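As a quick sketch, the two data formats above can be read with a few lines of plain Python. The function names are illustrative, and the assumption that the two SMILES in a translation pair are whitespace-separated should be checked against your benchmark files:

```python
def load_generation_set(path):
    """Molecule generation data: each line is one SMILES string."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def load_translation_pairs(path):
    """Molecule translation data: each line is a pair (molA, molB),
    where molB has better chemical properties.
    Assumes the two SMILES are whitespace-separated (an assumption)."""
    pairs = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                pairs.append((parts[0], parts[1]))
    return pairs
```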
- Select a config file and raw data according to the task and approach.
  - For molecule generation, go to `configs/moses` or `configs/polymers`.
    - For the junction tree approach, use `configs/*/jtvae.json`.
    - For the hierarchical substructure approach, use `configs/*/hiervae.json`.
  - For molecule translation, go to `configs/drd2`, `configs/logp04`, `configs/logp06` or `configs/qed`.
    - For the junction tree approach, use `configs/*/vjtnn.json` (without GAN loss) or `configs/*/vjtnn_gan.json` (with GAN loss).
    - For the hierarchical substructure approach, use `configs/*/hiervgnn.json`.
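Before launching a run, it can help to inspect the selected config from Python. This is a generic sketch that assumes the config files are plain JSON (the key shown in the usage note is hypothetical):

```python
import json


def load_config(path):
    """Read a JSON config file, e.g. configs/moses/jtvae.json."""
    with open(path) as f:
        return json.load(f)
```

Usage might look like `cfg = load_config("configs/moses/jtvae.json")` followed by e.g. `print(cfg)` to review the hyperparameters.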
- Extract vocabularies from a given set of molecules and preprocess training data. Add the `--get_vocab` argument if you have not extracted the vocabulary before. Replace `xxx` with your selected json file.

  ```bash
  python tools/preprocess.py --config configs/xxx
  ```
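Conceptually, vocabulary extraction collects the substructures (junction-tree cliques or structural motifs) that occur in the training molecules. As a much-simplified, illustrative stand-in, not the repo's actual extraction, a frequency-ordered vocabulary over arbitrary tokens looks like:

```python
from collections import Counter


def build_vocab(token_lists, min_count=1):
    """Return tokens ordered by frequency, dropping rare ones.

    token_lists: an iterable of token sequences, one per molecule.
    In the real pipeline the tokens would be extracted substructures.
    """
    counts = Counter(t for tokens in token_lists for t in tokens)
    return [tok for tok, c in counts.most_common() if c >= min_count]
```

A `min_count` threshold is a common way to keep the vocabulary compact by discarding substructures seen only a handful of times.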
- Train the model.
  - Without GAN loss:

    ```bash
    python tools/train.py --config configs/xxx
    ```

  - With GAN loss (only for the junction tree approach for molecule translation):

    ```bash
    python tools/train_gan.py --config configs/xxx
    ```
- For molecule generation, replace `yyy` with your selected model in `ckpt/moses` or `ckpt/polymers`.

  ```bash
  python tools/generate.py --config configs/xxx --model ckpt/yyy
  ```

- For molecule translation, replace `yyy` with your selected model in `ckpt/drd2`, `ckpt/logp04`, `ckpt/logp06` or `ckpt/qed`.

  ```bash
  python tools/translate.py --config configs/xxx --model ckpt/yyy
  ```
- Calculate metrics on a testing result file. Replace `zzz` with your result file in `results/*`.

  ```bash
  python tools/eval.py --config configs/xxx --result results/zzz
  ```
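For a quick sanity check before the full evaluation, simple set-based statistics such as uniqueness and novelty can be computed directly from lists of SMILES strings. This is illustrative only; `tools/eval.py` computes the actual benchmark metrics:

```python
def uniqueness(generated):
    """Fraction of generated SMILES that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of generated SMILES not present in the training set."""
    if not generated:
        return 0.0
    train = set(training)
    return sum(s not in train for s in generated) / len(generated)
```

Note these treat SMILES as raw strings; proper metrics canonicalize the molecules (e.g. with RDKit) first, so two different strings for the same molecule count once.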