Beam search for automated design and scoring of novel ROR ligands with machine intelligence

Table of Contents

  1. Description
  2. Requirements
  3. How to run an example experiment
  4. How to run the experiments of the paper
  5. How to run an experiment on your own data
  6. Code organisation
  7. Note
  8. Acknowledgements
  9. How to cite this work
  10. License
  11. Address

Description

This is the supporting code for the paper «Beam search for automated design and scoring of novel ROR ligands with machine intelligence». This code allows you to replicate the experiments of the paper as well as to run our method on your own set of molecules.

The paper is available on the journal webpage (https://doi.org/10.1002/anie.202104405).

A preprint version is also available (not up to date with the published version).

Abstract of the paper: Chemical language models enable de novo drug design without the requirement for explicit molecular construction rules. While such models have been applied to generate novel compounds with desired bioactivity, the actual prioritization and selection of the most promising computational designs remains challenging. In this work, we leveraged the probabilities learnt by chemical language models with the beam search algorithm as a model-intrinsic technique for automated molecule design and scoring. Prospective application of this method yielded three novel inverse agonists of retinoic acid receptor-related orphan receptors (RORs). Each design was synthesizable in three reaction steps and presented low-micromolar to nanomolar potency towards RORγ. This model-intrinsic sampling technique eliminates the strict need for external compound scoring functions, thereby further extending the applicability of generative artificial intelligence to data-driven drug discovery.
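To make the idea of using a language model's own probabilities for design and scoring more concrete, here is a minimal beam search sketch. At each step, every kept sequence is extended by every possible next token, and only the `beam_width` extensions with the highest cumulative log-probability survive; in this sketch, completed sequences are then ranked by that same log-likelihood. This is not the repository's implementation: the `next_token_log_probs` callable, the `^`/`$` start and end tokens and the toy probability table are hypothetical stand-ins for a trained chemical language model.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: maps a token prefix to the log-probabilities of all
# possible next tokens. A trained chemical language model would fill this role.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(next_token_log_probs: NextTokenFn, beam_width: int,
                max_len: int, start: str = "^",
                end: str = "$") -> List[Tuple[str, float]]:
    """Return completed sequences, best first, with their log-likelihoods."""
    # Each entry: (tokens so far, cumulative log-probability).
    beams: List[Tuple[Tuple[str, ...], float]] = [((start,), 0.0)]
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        # Extend every kept sequence by every possible next token.
        candidates = [
            (tokens + (tok,), score + log_p)
            for tokens, score in beams
            for tok, log_p in next_token_log_probs(tokens).items()
        ]
        # Keep only the beam_width most likely extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))  # sequence is complete
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Sequences still unfinished after max_len steps are discarded.
    finished.sort(key=lambda c: c[1], reverse=True)
    # Drop the start and end tokens to return plain strings.
    return [("".join(tokens[1:-1]), score) for tokens, score in finished]


# Toy stand-in for a language model: next-token probabilities depend only
# on the last token (a real model conditions on the whole prefix).
TOY_MODEL = {
    "^": {"C": 0.6, "O": 0.4},
    "C": {"C": 0.5, "O": 0.2, "$": 0.3},
    "O": {"C": 0.3, "$": 0.7},
}


def toy_next_token_log_probs(tokens: Tuple[str, ...]) -> Dict[str, float]:
    return {tok: math.log(p) for tok, p in TOY_MODEL[tokens[-1]].items()}


ranked = beam_search(toy_next_token_log_probs, beam_width=2, max_len=4)
print([smiles for smiles, _ in ranked])  # → ['O', 'CC', 'CCC']
```

In this variant, a sequence that reaches the end token leaves the beam, so the beam narrows over time; other variants refill it to keep `beam_width` active sequences.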

Requirements

First, you need to clone the repository:

git clone git@github.com:ETHmodlab/molecular_design_with_beam_search.git

Then, run the following commands, which create a conda virtual environment and install all the needed dependencies (if you don't have conda installed, first install it by following the official conda installation instructions):

cd molecular_design_with_beam_search/
sh install.sh

This will also install a git submodule needed to use the WHALES descriptor. Once the installation is done, activate the conda virtual environment:

conda activate molb

Please note that you will need to activate this virtual environment every time you want to use this project.

How to run an example experiment

Now, you can quickly try the code with a toy experiment by using our example configuration file (which contains an explanation of each parameter). Running it with the following command performs a fine-tuning experiment with the four natural-product modulators of RORγ used in this paper, for only two epochs (to keep it fast), and samples molecules with the beam search:

cd experiments/
sh run_morty.sh configfiles/0_parameters_example.ini

The results of the analysis can be found in experiments/results/0_parameters_example/. There, you will find a picture of the top-ranked molecules, a reproduction of the paper's figures, and a .txt file with the SMILES of the top-ranked molecules. Please note that only sampled molecules not found in ChEMBL, based on a similarity search with the ChEMBL webresource client, are included in the results.
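The novelty filtering step can be illustrated with a short sketch. The repository queries ChEMBL with a similarity search; this sketch swaps that in for an exact string match against a local reference set, which is only meaningful if both lists hold canonical SMILES (an assumption). The function name `keep_novel` and the example molecules are hypothetical.

```python
from typing import Iterable, List, Set


def keep_novel(ranked_smiles: Iterable[str], known_smiles: Set[str]) -> List[str]:
    """Drop molecules already present in a reference set, keeping rank order.

    Assumption: both inputs are canonical SMILES, so an exact string match
    stands in for the ChEMBL similarity search used by the repository.
    """
    return [smi for smi in ranked_smiles if smi not in known_smiles]


ranked = ["CCO", "c1ccccc1O", "CCN"]
known = {"CCO"}  # hypothetical stand-in for molecules found in ChEMBL
print(keep_novel(ranked, known))  # → ['c1ccccc1O', 'CCN']
```

Because the list comprehension preserves input order, the beam search ranking of the remaining molecules is unchanged.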

How to run the experiments of the paper

To run the same experiments as in the paper, use the following command for the fine-tuning on the four natural products:

cd experiments/
sh run_morty.sh configfiles/A_experiment_one_fine_tuning_step.ini

For the experiment with the two-step fine-tuning, run the following two experiments in sequence, as the second part (B2_experiment_two_fine_tuning_steps.ini) needs the results of the first part (B1_experiment_two_fine_tuning_steps.ini):

cd experiments/
sh run_morty.sh configfiles/B1_experiment_two_fine_tuning_steps.ini
sh run_morty.sh configfiles/B2_experiment_two_fine_tuning_steps.ini

Note that the experiment with the fine-tuning on the four natural products is fast, even on a CPU. If you don't have a GPU, the two-step experiment will require some patience, even though we provide the pretrained weights of the chemical language model. Make sure you run B1_experiment_two_fine_tuning_steps.ini before B2_experiment_two_fine_tuning_steps.ini, as B2 uses the model trained in B1.
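Because B2 depends on the model trained in B1, it can help to stop the chain as soon as B1 fails. The sketch below does this from Python, assuming (not verified here) that run_morty.sh exits with a non-zero status when a step fails. The helper names `run_morty` and `run_in_order` are hypothetical; only the script and configuration file names come from this README.

```python
import subprocess
from typing import Callable, List, Sequence

# Order matters: B2 fine-tunes the model produced by B1.
CONFIGS = [
    "configfiles/B1_experiment_two_fine_tuning_steps.ini",
    "configfiles/B2_experiment_two_fine_tuning_steps.ini",
]


def run_morty(config: str) -> None:
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(["sh", "run_morty.sh", config], check=True)


def run_in_order(configs: Sequence[str],
                 runner: Callable[[str], None] = run_morty) -> List[str]:
    """Run each configuration in turn; an exception stops the chain."""
    completed: List[str] = []
    for config in configs:
        runner(config)  # raises if this step failed, so later steps never start
        completed.append(config)
    return completed
```

Called from inside experiments/, `run_in_order(CONFIGS)` behaves like chaining the two commands with `&&` in the shell; the `runner` parameter only exists so the ordering logic can be exercised without launching the real experiments.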

How to run an experiment on your own data

To run an experiment on your own set of molecules, you will need to create your own configuration file (the .ini file). In this file, you can choose your own parameters for the beam search and the final ranking, and give the path to your fine-tuning molecules (a .txt file with one SMILES string per line).
Then, you can just run the following command:

sh run_morty.sh configfiles/{your_parameter_file_name}.ini

You will find the results of your experiment in experiments/results/{your_parameter_file_name}/
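Both input files can be prepared with a small script. The sketch below, meant to be run from inside experiments/, writes the fine-tuning molecules as one SMILES string per line and starts the new configuration from the example file, which documents every parameter. The helper `prepare_experiment` and the `{name}_smiles.txt` file name are hypothetical, and the whitespace check is only a basic format check, not a chemical validation of the SMILES.

```python
import shutil
from pathlib import Path
from typing import Iterable, Tuple


def prepare_experiment(name: str, smiles: Iterable[str],
                       root: Path = Path(".")) -> Tuple[Path, Path]:
    """Create configfiles/{name}.ini and a one-SMILES-per-line .txt file.

    Hypothetical helper, meant to be run from inside experiments/.
    """
    entries = [s.strip() for s in smiles]
    # One non-empty SMILES per line; SMILES strings contain no whitespace.
    for s in entries:
        if not s or any(ch.isspace() for ch in s):
            raise ValueError(f"invalid SMILES entry: {s!r}")
    smiles_file = root / f"{name}_smiles.txt"  # hypothetical file name
    smiles_file.write_text("\n".join(entries) + "\n")
    # Start from the example configuration, which documents every parameter.
    config = root / "configfiles" / f"{name}.ini"
    shutil.copy(root / "configfiles" / "0_parameters_example.ini", config)
    return config, smiles_file
```

After calling, for example, `prepare_experiment("my_ligands", [...])`, the copied .ini still has to be edited by hand so that its fine-tuning path parameter points to the new .txt file; running `sh run_morty.sh configfiles/my_ligands.ini` would then write its results to experiments/results/my_ligands/.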

Code organisation

The main script (run_morty.sh) runs the full experiment with one command, but its individual steps can also be run separately. If you wish, for example, to only fine-tune a model on your own data, you can run the following:

sh run_training.sh configfiles/{your_parameter_file_name}.ini

All other step-specific scripts (for fine-tuning, plotting, etc.) can be run in the same way.

Note

This work (code and paper) is built on top of our previous research. Notably, if you wish to pretrain a chemical language model on your own data, rather than using one of the two pretrained models available here, we recommend using the open-source code of our previous paper (https://github.com/ETHmodlab/virtual_libraries).

Acknowledgements

This research was supported by the Swiss National Science Foundation (grant no. 205321_182176 to Gisbert Schneider), the RETHINK initiative at ETH Zurich and the Novartis Forschungsstiftung (FreeNovation grant “AI in Drug Discovery” to Gisbert Schneider).

How to cite this work

Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. and Merk, D. (2021), Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed. Accepted Author Manuscript. https://doi.org/10.1002/anie.202104405

License

MIT License

Address

MODLAB
ETH Zurich
Inst. of Pharm. Sciences
HCI H 413
Vladimir-Prelog-Weg 4
CH-8093 Zurich