De Novo Drug Design with RNNs and Transformers

DrugEx

DrugEx is an open-source software library for de novo design of small molecules with deep learning generative models in a multi-objective reinforcement learning framework. The package contains multiple generator architectures and a variety of scoring tools and multi-objective optimisation methods. It has a flexible application programming interface and can readily be used via the command line interface [[1](https://pubs.acs.org/doi/10.1021/acs.jcim.3c00434)] (see [Quick Start](#quick-start) to get to work right away).
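To make "multi-objective" concrete, the sketch below shows Pareto dominance, the idea behind the Pareto-based ranking named in the title of [3]. It is a minimal standalone illustration, not DrugEx's implementation: the function names and the example scores are hypothetical, and every objective is assumed to be "higher is better".

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (higher is better)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, candidate in enumerate(scores)
        if not any(
            dominates(other, candidate)
            for j, other in enumerate(scores)
            if j != i
        )
    ]


if __name__ == "__main__":
    # Hypothetical (predicted activity, synthesizability) scores for four molecules.
    example = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.1, 0.9)]
    print(pareto_front(example))  # the third molecule is dominated by the second
```

Molecules on the front represent different trade-offs between the objectives; none of them can be improved in one objective without losing in another.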

History

This software is a continuation of the original and incremental work of Liu et al.'s DrugEx [2,3,4] and is currently developed by Gerard van Westen's Computational Drug Discovery group in Leiden, Netherlands. The first version of DrugEx [2] consisted of a single-task recurrent neural network (RNN) agent built from gated recurrent units (GRU). The second version [3] replaced these with long short-term memory (LSTM) units and introduced multi-objective optimisation (MOO)-based reinforcement learning (RL) and an updated exploitation-exploration strategy. The third version [4] introduced generators based on a variant of the transformer and a novel graph-based encoding that allows sampling of molecules with specific substructures. This package builds on these works and provides a unified API that is easier to use and flexible enough for customization. New features are being added as well [1]. Furthermore, the development and training of QSAR models, which are used to score molecules during reinforcement learning, has been moved to the separate QSPRpred package, which has become a useful library in its own right.

Workflow

The DrugEx package provides classes to standardize, clean and encode molecules for the various deep learning algorithms provided in the package as well as features to set up and monitor training and optimization. The resulting models can be used readily for generation of focused libraries and are easily transferable.

Fig1

Quick Start

A small step for exploring the drug space in need, a giant leap for exploiting a healthy state indeed.

Installation

DrugEx can be installed with pip like so:

pip install git+https://github.com/CDDLeiden/DrugEx.git@master

Optional Dependencies

QSPRpred - Optional package needed for the command line interface of DrugEx, which requires the models to be serialized with this package. It is also used by some examples in the tutorial. Install DrugEx with the following command if you want these features:

pip install "drugex[qsprpred] @ git+https://github.com/CDDLeiden/DrugEx.git@master"

RAscore - If you want to use the Retrosynthesis Accessibility Score in the desirability function.

  • Installing RAscore might downgrade the scikit-learn package. If this happens, upgrade scikit-learn again afterwards.
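After installing, it can be handy to check which of the optional dependencies are actually present. The following is a minimal sketch that uses only the standard library. The import names `drugex`, `qsprpred` and `RAscore` are assumptions based on the package names above, and `missing_modules` is a hypothetical helper, not part of DrugEx.

```python
from importlib.util import find_spec

# Assumed top-level import names; adjust them if your installation differs.
OPTIONAL_MODULES = {
    "qsprpred": "QSPRpred (needed for the command line interface)",
    "RAscore": "RAscore (Retrosynthesis Accessibility Score)",
}


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Check the core package first, then the optional extras.
    for name in missing_modules(["drugex", *OPTIONAL_MODULES]):
        print(f"not installed: {OPTIONAL_MODULES.get(name, name)}")
```

The script prints nothing when all three modules can be found.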

Use

After installation, you will have access to various command line features, but you can also use the Python API directly. Documentation for the current version of both is available here. For a quick start, you can also check out our Jupyter notebook tutorial, which documents the use of the Python API to build different types of models, or take a look at the CLI examples. The tutorials and the documentation are still a work in progress, and we welcome contributions wherever they are lacking.

This repository contains almost all models implemented throughout DrugEx's history. We also make the following pretrained models available for use with this package. You can retrieve them from the following table (not all models are available at the moment, but we will keep adding them):

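| Model | RNN (type: GRU) | RNN (type: LSTM) | SMILES-Based Transformer (fragmentation: BRICS) | SMILES-Based Transformer (fragmentation: RECAP) | Graph-Based Transformer (fragmentation: BRICS) | Graph-Based Transformer (fragmentation: RECAP) |
|---|---|---|---|---|---|---|
| ChEMBL 27 | - | Zenodo | - | - | Zenodo | - |
| ChEMBL 31 | Zenodo | Zenodo | - | - | Zenodo | - |
| Papyrus 05.5 | Zenodo | Zenodo | Zenodo | Zenodo | Zenodo | Zenodo |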

Hardware Requirements

The DrugEx toolkit offers a variety of models of varying complexity, each with its own hardware requirements. To use the full suite of models, you need a GPU compatible with CUDA 9.2 and at least 8 GB of video memory. This ensures that the models fit on the GPU together with sufficiently large training batches.

Even on a less capable configuration, however, it should be possible to fine-tune and optimize the basic sequential RNN model with reinforcement learning if a pretrained model is used. For the two transformers, we recommend using multiple GPUs to increase throughput; the DrugEx package automates this parallelization. The workload is divided across the GPUs, so the system can process larger volumes of data faster than on a single GPU.
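A quick way to compare a machine against these requirements is to list the visible GPUs and their memory. The sketch below assumes that PyTorch is the installed deep learning backend; the helper names `has_enough_memory` and `describe_gpus` are hypothetical, not part of DrugEx. It returns an empty list when PyTorch or a CUDA device is unavailable.

```python
MIN_GPU_MEMORY_GB = 8  # minimum video memory stated above


def has_enough_memory(total_bytes, min_gb=MIN_GPU_MEMORY_GB):
    """Compare device memory in bytes against a threshold in decimal gigabytes.

    Decimal gigabytes are used because a card sold as "8 GB" can report
    slightly less than 8 GiB of total memory.
    """
    return total_bytes >= min_gb * 10**9


def describe_gpus():
    """Return (name, total_bytes, meets_minimum) for each visible CUDA device."""
    try:
        import torch  # assumption: PyTorch is installed
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    report = []
    for index in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(index)
        report.append(
            (props.name, props.total_memory, has_enough_memory(props.total_memory))
        )
    return report


if __name__ == "__main__":
    gpus = describe_gpus()
    if not gpus:
        print("No CUDA-capable GPU detected (or PyTorch is not installed).")
    for name, total_bytes, ok in gpus:
        status = "meets" if ok else "is below"
        print(f"{name}: {total_bytes / 10**9:.1f} GB, {status} the 8 GB minimum")
```

More than one entry in the report means the multi-GPU setup recommended for the transformers is possible on this machine.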

License

The software is licensed under the standard MIT license, which means it is free to use, including in commercial applications, as long as the copyright terms of the license are preserved. See the LICENSE file for the full terms. If you have questions about the license or the use of the software in your organization, please contact Gerard J.P. van Westen:

Gerard J.P. van Westen: gerard@lacdr.leidenuniv.nl

Current Development Team

Contributions

If you find that something is missing, have a question, or want to contribute a new model or feature, please feel free to open an issue to start a discussion. We are more than happy to improve the package with your contributions, bug reports and ideas. Once the feature has been discussed in its issue, the best way to contribute is to fork the repository, make your changes and create a pull request. We will then review your changes and merge them into the main repository. Alternatively, you can contact us directly via email.

Acknowledgements

We would like to thank the following people for significant contributions:

  • Xuhan Liu
    • author of the original idea to develop the DrugEx models and code; we are grateful for his continued support of the project

We also thank the following Git repositories, which gave Xuhan a lot of inspiration:

  1. REINVENT
  2. ORGAN
  3. SeqGAN

References

[1] Sicho M., Luukkonen S., van den Maagdenberg H.W., Schoenmaker L., Béquignon O.J.M., van Westen G.J.P. DrugEx: Deep Learning Models and Tools for Exploration of Drug-like Chemical Space. J. Chem. Inf. Model., 2023, 63, 12.

[2] Liu X., Ye K., van Vlijmen H.W.T., IJzerman A.P., van Westen G.J.P. An Exploration Strategy Improves the Diversity of de novo Ligands Using Deep Reinforcement Learning: a case for the adenosine A2A receptor. J. Cheminform., 2019, 11, 35.

[3] Liu X., Ye K., van Vlijmen H.W.T., Emmerich M.T.M., IJzerman A.P., van Westen G.J.P. DrugEx v2: De Novo Design of Drug Molecule by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology. J. Cheminform., 2021, 13, 85.

[4] Liu X., Ye K., van Vlijmen H.W.T., IJzerman A.P., van Westen G.J.P. DrugEx v3: Scaffold-Constrained Drug Design with Graph Transformer-based Reinforcement Learning. J. Cheminform., 2023, 15, 24.