dl4chem-geometry

Primary language: Python
License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

Molecular Geometry Prediction using a Deep Generative Graph Neural Network

TensorFlow implementation of the models described in the paper "Molecular Geometry Prediction using a Deep Generative Graph Neural Network".

We present code for training deep generative models of molecular geometry (conformations), as well as preprocessed datasets and pretrained models.

Dependencies

Python

  • Python 3.6
  • TensorFlow 1.2
  • RDKit 2018.09.1
  • tensorboardX

GPU

  • CUDA (we recommend using the latest version; version 9.0 was used in all our experiments)

Downloading Datasets & Pre-trained Models

Note: Due to licensing restrictions, we cannot release the preprocessed CSD dataset. However, if you have a license to use the CSD dataset, please email us and we will send you the preprocessed data.

Dataset | Preprocessed data | Pre-trained model
QM9     | Data              | Model
COD     | Data              | Model

Training the Conditional Variational Graph Autoencoder (CVGAE)

QM9

python PredX_train.py --data QM9 --mpnn-steps 3

COD

python PredX_train.py --data COD --mpnn-steps 5
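The two training commands differ in --mpnn-steps (3 for QM9, 5 for COD). The flag name suggests it sets the number of message-passing rounds in the graph neural network; in a message passing neural network, each round lets every atom gather information from its bonded neighbours, so after k rounds an atom's state can only depend on atoms at most k bonds away. The sketch below illustrates that property with a generic, untrained update rule in NumPy. The function name, update rule, weights and dimensions are assumptions for illustration, not the CVGAE's actual architecture.

```python
import numpy as np


def mpnn_forward(node_feats, adj, w_msg, w_upd, steps):
    """Apply `steps` rounds of a generic (hypothetical) message-passing update.

    node_feats:   (n_atoms, d) initial per-atom features
    adj:          (n_atoms, n_atoms) symmetric 0/1 bond adjacency matrix
    w_msg, w_upd: (d, d) weight matrices (random here, i.e. untrained)
    """
    h = node_feats
    for _ in range(steps):
        # Each atom sums the transformed states of its bonded neighbours.
        messages = adj @ (h @ w_msg)
        # Each atom updates its state from its previous state plus the messages.
        h = np.tanh(h @ w_upd + messages)
    return h


# A three-atom chain A-B-C: A and C are two bonds apart.
rng = np.random.default_rng(0)
d = 4
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = rng.normal(size=(3, d))
w_msg = rng.normal(scale=0.5, size=(d, d))
w_upd = rng.normal(scale=0.5, size=(d, d))

# Perturb only atom C and measure how much atom A's final state changes.
x_perturbed = x.copy()
x_perturbed[2] += 1.0
effect_on_a = {
    steps: float(np.abs(mpnn_forward(x, adj, w_msg, w_upd, steps)[0]
                        - mpnn_forward(x_perturbed, adj, w_msg, w_upd, steps)[0]).max())
    for steps in (1, 2)
}
print(effect_on_a)
```

With one round, changing atom C leaves atom A untouched (the difference is zero); with two rounds it does not. This is why more steps give each atom a wider view of the molecular graph, which matters more for larger molecules.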

Loading & Generation from Pre-trained Models

QM9

python PredX_train.py --data QM9 --loaddir qm9_model/neuralnet_model_best.ckpt --test --mpnn-steps 3

COD

python PredX_train.py --data COD --loaddir cod_model/neuralnet_model_best.ckpt --test --mpnn-steps 5

Running Force-Field Baselines (ETKDG + MMFF/UFF)

QM9, COD or CSD (the example uses QM9; set --data to COD or CSD for the other datasets)

python baseline.py --data QM9 --num-total-samples 100 --num-parallel-samples 10 --num-threads 10
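To run the force-field baseline on several datasets in a row, the command above can be wrapped in a small driver. This is a minimal sketch, assuming it is run from the repository root and that the data for each requested dataset is already in place (CSD requires the licensed preprocessed data). The helper names are hypothetical; only the baseline.py flags come from this README.

```python
import subprocess
import sys

# Dataset names accepted by --data in the baseline commands above.
DATASETS = ("QM9", "COD", "CSD")


def baseline_cmd(dataset, total_samples=100, parallel_samples=10, threads=10):
    """Return the baseline.py argument list for one dataset (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: %s" % dataset)
    return [sys.executable, "baseline.py",
            "--data", dataset,
            "--num-total-samples", str(total_samples),
            "--num-parallel-samples", str(parallel_samples),
            "--num-threads", str(threads)]


def run_baselines(datasets=DATASETS):
    """Run the ETKDG + MMFF/UFF baseline for each dataset, stopping on the first failure."""
    for name in datasets:
        subprocess.run(baseline_cmd(name), check=True)
```

For example, run_baselines(("QM9", "COD")) runs the two publicly available datasets one after the other; check=True makes the loop stop as soon as one run exits with an error.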

Running Force-Field Baselines with Initial Atom Coordinates from the Neural Network (CVGAE + MMFF)

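QM9, COD or CSD (the example uses QM9; for the other datasets, change --data and point --nn-path at the matching CVGAE output)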

python baseline_nn.py --data QM9 --nn-path /path/to/qm9_cvgae_confs

Note: --nn-path points to the folder containing the conformations saved by CVGAE. To save these conformations, add the --savepermol argument during the loading/generation stage. Example (QM9):

python PredX_train.py --data QM9 --loaddir qm9_model/neuralnet_model_best.ckpt --savepermol --test --mpnn-steps 3

Instead of saving the CVGAE conformations and loading them in a separate run, you can run CVGAE + MMFF in one step by adding the --useFF argument during the loading/generation stage. Example (QM9):

python PredX_train.py --data QM9 --loaddir qm9_model/neuralnet_model_best.ckpt --useFF --test --mpnn-steps 3
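The PredX_train.py commands in this README differ only in a few flags, so they can be assembled programmatically. The sketch below is a hypothetical helper that reproduces them: training when no checkpoint is given, and loading/generation (--test) when one is, with optional --savepermol or --useFF. It uses the --mpnn-steps spelling from the training commands and the per-dataset step counts from this README; the function names are illustrative assumptions.

```python
import shlex

# --mpnn-steps values used in the README commands for each dataset.
MPNN_STEPS = {"QM9": 3, "COD": 5}


def predx_cmd(dataset, loaddir=None, savepermol=False, use_ff=False):
    """Build a PredX_train.py argument list (hypothetical helper).

    Without `loaddir` this is the training command; with it, the
    loading/generation command (--test), optionally saving per-molecule
    conformations (--savepermol) or running CVGAE + MMFF (--useFF).
    """
    args = ["python", "PredX_train.py", "--data", dataset,
            "--mpnn-steps", str(MPNN_STEPS[dataset])]
    if loaddir is None:
        if savepermol or use_ff:
            raise ValueError("--savepermol/--useFF apply only when loading a model")
        return args
    args += ["--loaddir", loaddir, "--test"]
    if savepermol:
        args.append("--savepermol")
    if use_ff:
        args.append("--useFF")
    return args


def as_shell(args):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in args)


print(as_shell(predx_cmd("QM9", loaddir="qm9_model/neuralnet_model_best.ckpt", use_ff=True)))
# prints: python PredX_train.py --data QM9 --mpnn-steps 3 --loaddir qm9_model/neuralnet_model_best.ckpt --test --useFF
```

Keeping the dataset-to-steps mapping in one place avoids loading a checkpoint with a different number of message-passing steps than it was trained with.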

Additional Scripts

Scripts for preprocessing the QM9, COD and CSD datasets:

  • QM9: QM9_featurize.py, QM9_sdf_to_p.py
  • COD: COD_featurize.py, COD_sdf_to_p.py
  • CSD: CSD_featurize.py, CSD_sdf_to_p.py
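This README does not document what the featurization scripts produce. Graph neural networks of this kind typically consume a per-atom feature matrix and an adjacency matrix, zero-padded to a fixed number of atoms so molecules can be batched. The sketch below shows that general idea with a hypothetical one-hot atom vocabulary (the five elements found in QM9); it is an illustration under those assumptions, not the format written by QM9_featurize.py or the other scripts.

```python
import numpy as np

# Hypothetical atom vocabulary: the five elements that occur in QM9.
ATOM_TYPES = ("H", "C", "N", "O", "F")


def featurize(atoms, bonds, max_atoms):
    """Turn a molecular graph into fixed-size arrays (hypothetical helper).

    atoms: list of element symbols, e.g. ["O", "H", "H"]
    bonds: list of (i, j) atom-index pairs
    Returns one-hot node features of shape (max_atoms, len(ATOM_TYPES)) and
    a symmetric adjacency matrix of shape (max_atoms, max_atoms), zero-padded.
    """
    if len(atoms) > max_atoms:
        raise ValueError("molecule has more than max_atoms atoms")
    node_feats = np.zeros((max_atoms, len(ATOM_TYPES)), dtype=np.float32)
    for i, symbol in enumerate(atoms):
        node_feats[i, ATOM_TYPES.index(symbol)] = 1.0
    adj = np.zeros((max_atoms, max_atoms), dtype=np.float32)
    for i, j in bonds:
        adj[i, j] = adj[j, i] = 1.0
    return node_feats, adj


# Water: one oxygen bonded to two hydrogens, padded to 5 atom slots.
x, a = featurize(["O", "H", "H"], [(0, 1), (0, 2)], max_atoms=5)
```

Rows beyond the real atoms stay all-zero, so a model can ignore padding by masking on whether a row of the node features is empty.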

Citation

If you find the resources in this repository useful, please consider citing:

@article{Mansimov:19,
  author    = {Elman Mansimov and Omar Mahmood and Seokho Kang and Kyunghyun Cho},
  title     = {Molecular Geometry Prediction using a Deep Generative Graph Neural Network},
  year      = {2019},
  journal   = {arXiv preprint arXiv:1904.00314},
}