Primary language: Python. License: MIT.

READRetro: Natural Product Biosynthesis Planning with Retrieval-Augmented Dual-View Retrosynthesis

This is the official code repository for the paper READRetro: Natural Product Biosynthesis Planning with Retrieval-Augmented Dual-View Retrosynthesis (bioRxiv, 2023).
We also provide a web version for ease of use.

Data

Download the READRetro_data folder from Zenodo. The code and demonstrations in this repository need it to run.

The directory structure of READRetro_data is as follows:

READRetro_data
    ├── data.sh
    ├── data
    │   ├── model_train_data
    │   └── multistep_data
    ├── model
    │   ├── bionavi
    │   ├── g2s
    │   │   └── saved_models
    │   ├── megan
    │   └── retroformer
    │       └── saved_models
    ├── result
    └── scripts

Place READRetro_data into the READRetro directory (i.e., READRetro/READRetro_data) and run sh data.sh in READRetro_data to set up the data.

Check that the data ended up in the right place inside READRetro:

  • READRetro/retroformer/saved_models should match READRetro_data/model/retroformer/saved_models.
  • READRetro/g2s/saved_models should match READRetro_data/model/g2s/saved_models.
  • READRetro/data should match READRetro_data/data/multistep_data.
  • READRetro/result should match READRetro_data/result.
  • READRetro/scripts should match READRetro_data/scripts.
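
The checks listed above can be automated. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that data.sh copies (rather than moves) files out of READRetro_data, so both sides of each pair still exist. It compares file names only, not file contents.

```python
from pathlib import Path

# (directory in READRetro, directory in READRetro_data) pairs from the list above.
EXPECTED_PAIRS = [
    ("retroformer/saved_models", "READRetro_data/model/retroformer/saved_models"),
    ("g2s/saved_models", "READRetro_data/model/g2s/saved_models"),
    ("data", "READRetro_data/data/multistep_data"),
    ("result", "READRetro_data/result"),
    ("scripts", "READRetro_data/scripts"),
]


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def check_layout(repo_root):
    """Return a list of problems; an empty list means every check passed."""
    repo_root = Path(repo_root)
    problems = []
    for target, source in EXPECTED_PAIRS:
        target_dir, source_dir = repo_root / target, repo_root / source
        if not source_dir.is_dir():
            problems.append(f"missing source directory: {source}")
            continue
        if not target_dir.is_dir():
            problems.append(f"missing target directory: {target}")
            continue
        # Files present in READRetro_data but absent from the working copy.
        missing = relative_files(source_dir) - relative_files(target_dir)
        if missing:
            problems.append(f"{target} lacks {len(missing)} file(s) from {source}")
    return problems


if __name__ == "__main__":
    issues = check_layout(".")  # run from the READRetro directory
    print("\n".join(issues) if issues else "All data directories look consistent.")
```

If data.sh turns out to move the files instead of copying them, the "missing source directory" messages are expected and only the target directories matter.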

The directories READRetro_data/model/bionavi, READRetro_data/model/megan, and READRetro_data/data/model_train_data are required for reproducing the values in the manuscript.

Installation

Run the following commands to install the dependencies:

conda create -n readretro python=3.8
conda activate readretro
conda install pytorch==1.12.0 cudatoolkit=11.3 -c pytorch
pip install easydict pandas tqdm numpy==1.22 OpenNMT-py==2.3.0 networkx==2.5
conda install -c conda-forge rdkit=2019.09
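
After installation, it can help to confirm that the pinned versions are the ones actually present in the environment. The sketch below is an illustration rather than a repository script: the function name find_mismatches is hypothetical, "torch" is assumed to be the distribution name under which the conda pytorch package registers itself, and rdkit is left out because a conda-installed build may not expose distribution metadata. The comparison is a simple prefix match, which is looser than pip's own rules.

```python
from importlib import metadata

# Version pins copied from the install commands above.
PINNED = {
    "torch": "1.12.0",
    "numpy": "1.22",
    "OpenNMT-py": "2.3.0",
    "networkx": "2.5",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Map each package that does not satisfy its pin to the version found.

    A package that is not installed maps to None. The lookup argument exists
    so the function can be exercised without touching the real environment.
    """
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = None
            continue
        # Accept exact matches, patch releases ("1.22.0") and local build
        # tags ("1.12.0+cu113").
        if found != wanted and not found.startswith((wanted + ".", wanted + "+")):
            mismatches[name] = found
    return mismatches


if __name__ == "__main__":
    bad = find_mismatches(PINNED)
    print(bad if bad else "All pinned packages match.")
```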

Alternatively, you can install the readretro package through pip:

conda create -n readretro python=3.8 -y
conda activate readretro
pip install readretro==1.2.0

Model Preparation

We provide the trained models through Zenodo.
You can also use your own models trained with the official Graph2SMILES (https://github.com/coleygroup/Graph2SMILES) and Retroformer (https://github.com/yuewan2/Retroformer) code.
More detailed instructions can be found in demo.ipynb.

Single-step Planning and Evaluation

Run the following commands to evaluate the single-step performance of the models:

CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py                    # ensemble
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py -m retroformer     # Retroformer
CUDA_VISIBLE_DEVICES=${gpu_id} python eval_single.py -m g2s -s 200      # Graph2SMILES
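
To run the three evaluations above back to back, a small driver can help. This is a sketch under assumptions: the names CONFIGS, build_command, and run_all are hypothetical, and the only arguments used are those shown in the commands above (-m and -s), passed through unchanged.

```python
import os
import subprocess
import sys

# Extra arguments for each configuration, copied from the commands above.
CONFIGS = {
    "ensemble": [],
    "retroformer": ["-m", "retroformer"],
    "g2s": ["-m", "g2s", "-s", "200"],
}


def build_command(extra_args, script="eval_single.py"):
    """Return the argument list for one evaluation run."""
    return [sys.executable, script, *extra_args]


def run_all(gpu_id="0"):
    """Run every configuration in sequence on the given GPU."""
    # Same effect as prefixing the command with CUDA_VISIBLE_DEVICES=...
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    for name, extra_args in CONFIGS.items():
        print(f"== {name} ==")
        subprocess.run(build_command(extra_args), env=env, check=True)
```

Calling run_all("0") from the READRetro directory would run the ensemble, Retroformer, and Graph2SMILES evaluations in that order and stop at the first failure.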

Multi-step Planning

Run the following command to plan paths of multiple products using multiprocessing:

CUDA_VISIBLE_DEVICES=${gpu_id} python run_mp.py
# e.g., CUDA_VISIBLE_DEVICES=0 python run_mp.py

You can modify other hyperparameters described in run_mp.py.
Lower num_threads if you run out of GPU memory.

Run the following command to plan the retrosynthesis path of your own molecule:

CUDA_VISIBLE_DEVICES=${gpu_id} python run.py ${product}
# e.g., CUDA_VISIBLE_DEVICES=0 python run.py 'O=C1C=C2C=CC(O)CC2O1'
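
SMILES strings contain characters such as parentheses that the shell interprets, which is why the example above wraps the product in single quotes. When calling run.py from Python, passing the arguments as a list avoids shell quoting altogether. The helper names below are hypothetical, and the sketch assumes run.py takes the product as its only positional argument, as in the command above.

```python
import os
import shlex
import subprocess
import sys


def plan_command(product, script="run.py"):
    """Argument list for planning one product; no shell quoting is needed."""
    return [sys.executable, script, product]


def shell_line(product, gpu_id="0", script="run.py"):
    """Equivalent shell command line with the SMILES string safely quoted."""
    return f"CUDA_VISIBLE_DEVICES={gpu_id} python {script} {shlex.quote(product)}"


def plan(product, gpu_id="0"):
    """Run the planner for one product on the given GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return subprocess.run(plan_command(product), env=env, check=True)
```

For example, shell_line("O=C1C=C2C=CC(O)CC2O1") reproduces the quoted command shown above, which is handy when logging the exact command that was run.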

If you installed the package through pip, use the run_readretro command instead:

run_readretro -rc ${retroformer_ckpt} -gc ${g2s_ckpt} ${product}
# e.g., run_readretro -rc retroformer/saved_models/biochem.pt -gc g2s/saved_models/biochem.pt 'O=C1C=C2C=CC(O)CC2O1'
# You can replace these with your own trained Retroformer and Graph2SMILES checkpoints.
# If you do, also pass the corresponding vocab file as an option.

You can modify other hyperparameters described in run.py.

Multi-step Evaluation

Run the following command to evaluate the planned paths of the test molecules:

python eval.py ${save_file}
# e.g., python eval.py result/debug.txt
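
When several planning runs have been saved, the evaluation command above can be applied to each file in turn. This is only a sketch: the function names are hypothetical, and it assumes that result files live in result/ and end in .txt, which is inferred from the single example path above.

```python
import subprocess
import sys
from pathlib import Path


def eval_commands(result_dir="result", script="eval.py", pattern="*.txt"):
    """Return one eval.py argument list per matching file, sorted by name."""
    files = sorted(Path(result_dir).glob(pattern))
    return [[sys.executable, script, str(path)] for path in files]


def evaluate_all(result_dir="result"):
    """Run eval.py on every matching file and stop at the first failure."""
    for command in eval_commands(result_dir):
        print("evaluating", command[-1])
        subprocess.run(command, check=True)
```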

Demo

You can reproduce the figures and tables presented in the paper, or train your own models, using the provided demo.ipynb.