DeLinker - Deep Generative Models for 3D Linker Design

This repository contains our implementation of Deep Generative Models for 3D Linker Design (DeLinker).

If you found DeLinker useful, please cite our paper:

Imrie F, Bradley AR, van der Schaar M, Deane CM. Deep Generative Models for 3D Linker Design. Journal of Chemical Information and Modeling. 2020

@Article{Imrie2020,
author={Imrie, Fergus and Bradley, Anthony R. and van der Schaar, Mihaela and Deane, Charlotte M.},
title={Deep Generative Models for 3D Linker Design},
journal={Journal of Chemical Information and Modeling},
year={2020},
month={Mar},
day={20},
publisher={American Chemical Society},
issn={1549-9596},
doi={10.1021/acs.jcim.9b01120},
url={https://doi.org/10.1021/acs.jcim.9b01120}
}

Requirements

This code was tested with Python 3.6 and TensorFlow 1.10.

A YAML file listing all requirements is provided. The environment can be set up with conda:

conda env create -f DeLinker-env.yml
conda activate DeLinker-env

Data Extraction

Two primary datasets, ZINC and CASF, are used.

To preprocess these datasets, go to the data directory and run prepare_data.py:

python prepare_data.py

Running DeLinker

We provide two settings of DeLinker. The first setting generates linkers with the same number of atoms as the reference molecule. The second setting generates linkers with a specified number of atoms.

To train and generate molecules using the first setting, use:

python DeLinker.py --dataset zinc --config '{"num_epochs": 10, "epoch_to_generate": 10, "train_file": "data/molecules_zinc_train.json", "valid_file": "data/molecules_zinc_valid.json"}'

To generate molecules with a pretrained model using the first setting, use:

python DeLinker.py --dataset zinc --restore models/pretrained_DeLinker_model.pickle --config '{"generation": true, "number_of_generation_per_valid": 250, "batch_size": 1, "train_file": "data/molecules_zinc_test.json", "valid_file": "data/molecules_zinc_test.json"}'

To generate molecules using the second setting, use:

python DeLinker_test.py --dataset zinc --restore models/pretrained_DeLinker_model.pickle --config '{"generation": true, "number_of_generation_per_valid": 250, "batch_size": 1, "train_file": "data/molecules_zinc_test_mode2.json", "valid_file": "data/molecules_zinc_test_mode2.json", "min_atoms": 3, "max_atoms": 11}'
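
The --config value in the commands above is a JSON object passed as a single shell argument, which is easy to mistype by hand. The following is a minimal sketch of assembling such a command from Python. The helper name build_delinker_command is hypothetical and not part of this repository; the flags and configuration keys are copied from the commands above.

```python
import json
import shlex


def build_delinker_command(script, dataset, config, restore=None):
    """Return the argument list for a DeLinker run (hypothetical helper)."""
    args = ["python", script, "--dataset", dataset]
    if restore is not None:
        args += ["--restore", restore]
    # json.dumps writes Python True as JSON true, matching the commands above.
    args += ["--config", json.dumps(config)]
    return args


# Configuration keys taken from the second-setting command above.
config = {
    "generation": True,
    "number_of_generation_per_valid": 250,
    "batch_size": 1,
    "train_file": "data/molecules_zinc_test_mode2.json",
    "valid_file": "data/molecules_zinc_test_mode2.json",
    "min_atoms": 3,
    "max_atoms": 11,
}

cmd = build_delinker_command(
    "DeLinker_test.py",
    "zinc",
    config,
    restore="models/pretrained_DeLinker_model.pickle",
)

# Quote each argument so the JSON survives being pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The argument list can also be passed directly to subprocess.run, which avoids shell quoting altogether.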

In both settings, the output has the following format:

Input fragments (SMILES) Ground truth molecule/fragments (SMILES) Generated molecule (SMILES)
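
A minimal sketch of reading output in this format is given below. It assumes one record per line with the three SMILES fields separated by whitespace; that layout is an assumption and should be checked against an actual output file. The names GeneratedRecord and parse_generated_lines are hypothetical, and the sample lines are placeholders rather than real DeLinker output.

```python
from typing import Iterable, List, NamedTuple


class GeneratedRecord(NamedTuple):
    fragments: str     # input fragments (SMILES)
    ground_truth: str  # ground truth molecule/fragments (SMILES)
    generated: str     # generated molecule (SMILES)


def parse_generated_lines(lines: Iterable[str]) -> List[GeneratedRecord]:
    """Parse lines assumed to hold three whitespace-separated SMILES fields."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            # Skip blank lines and anything that does not match the assumed layout.
            continue
        records.append(GeneratedRecord(*fields))
    return records


# Placeholder tokens stand in for SMILES strings.
sample_lines = [
    "FRAGMENTS_1 TRUTH_1 GENERATED_1",
    "",
    "FRAGMENTS_2 TRUTH_2 GENERATED_2",
]
records = parse_generated_lines(sample_lines)
print(len(records), records[0].generated)
```

With a real file, the same function can be applied to an open file handle, since iterating over a file yields its lines.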

Further configuration options are listed in the function default_params in DeLinker.py.

Evaluation

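A script to evaluate the generated molecules is provided in the analysis directory. In the command below, the first argument is the dataset name (either ZINC or CASF), and the upper-case arguments are placeholders to replace with your own values.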

python evaluate_generated_mols.py ZINC|CASF PATH_TO_GENERATED_MOLS PATH_TO_REFERENCE_MOLS ../data/data_zinc_final_train.txt SAVE_PATH OUTPUT_NAME NUM_CORES True None ./wehi_pains.csv >> log.txt

Pretrained Models and Generated Molecules

We provide a pretrained model:

models/pretrained_DeLinker_model.pickle

Generated molecules can be obtained upon request.

Examples

An example Jupyter notebook demonstrating the use of DeLinker for fragment linking can be found in the examples directory.

Contact (Questions/Bugs/Requests)

Please submit a GitHub issue or contact Fergus Imrie (imrie@stats.ox.ac.uk).