ParetoDrug: A Python repository from CNDOTA

ParetoDrug — Official PyTorch Implementation

This repository contains the official PyTorch implementation of the paper "Enabling Target-Aware Molecule Generation to Follow Multi Objectives with Pareto MCTS", published in Communications Biology.

Datasets

data/train-val-data.tsv: all sequence pairs used for training and validation.
data/train-val-split.json: the indices of the training pairs and test pairs in train-val-data.tsv.
data/testing-proteins-100.txt: the PDB IDs of the testing proteins, which can be downloaded from the PDBbind website.
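
The protein index used by the MCTS scripts refers to a position in the list of testing proteins. The sketch below shows one way to look up which PDB ID an index points to. It assumes that data/testing-proteins-100.txt holds one PDB ID per line and that indices are zero-based in file order; neither detail is confirmed here, and load_test_pdb_ids is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def load_test_pdb_ids(path="data/testing-proteins-100.txt"):
    """Return the PDB IDs listed in the file, in file order.

    Assumption: one PDB ID per line; blank lines are ignored.
    """
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    list_path = Path("data/testing-proteins-100.txt")
    if list_path.exists():
        # Print the first few (index, PDB ID) pairs for a quick check.
        for index, pdb_id in enumerate(load_test_pdb_ids(list_path)[:5]):
            print(index, pdb_id)
```

Run it from the repository root so that the relative path resolves.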

Requirements

Install

Please run the following commands to set up the environment:

conda create -n paretodrug python=3.7

conda activate paretodrug

conda install -c conda-forge smina=2020.12.10 rdkit=2020.09.5 openbabel=3.1.1

conda install -c bioconda mmseqs2=13.45111

pip install biopython==1.79 pandas==1.3.4

pip install loguru biopython graphviz easydict tqdm scipy

pip3 install torch==1.13.1 torchvision torchaudio
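
After installation, a quick check that the tools and packages are visible from the activated environment can save a failed run later. The following is a minimal sketch, not part of the repository: check_environment is a hypothetical helper, and the executable names (smina, obabel, mmseqs) and import names (for example Bio for biopython) are assumptions about what the packages above provide.

```python
import importlib.util
import shutil

# Assumed executable and module names; adjust them if your install differs.
EXECUTABLES = ["smina", "obabel", "mmseqs"]
MODULES = ["rdkit", "Bio", "pandas", "torch", "loguru",
           "easydict", "tqdm", "scipy", "graphviz"]


def check_environment(executables=EXECUTABLES, modules=MODULES):
    """Return the names of executables and modules that cannot be found."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [name for name in executables if shutil.which(name) is None]
    # find_spec returns None when a top-level module is not importable.
    missing += [name for name in modules
                if importlib.util.find_spec(name) is None]
    return missing


if __name__ == "__main__":
    missing = check_environment()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

This only checks that the names can be found; it does not verify the pinned versions.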

Model Training

Before training, please make sure train-val-data.tsv is in the data folder.
There are several key args for training listed as follows:

Argument	Description	Default	Type
--layers	Number of layers in transformer	4	int
--bs	Batch size	32	int

Train Lmser Transformer:

cd your_project_path
python train.py --layers 4 --bs 32 --device 0,1,2,3

Pretrained Model

We provide the pretrained model for LT (Lmser Transformer) as follows:

Model	Path
Lmser Transformer	./experiment/LT/model/30.pt

Run Monte Carlo Tree Search (MCTS)

Running ParetoDrug normally requires 1 GPU and 8 CPU cores. A run takes several hours because ParetoDrug performs MCTS and runs inference with the pretrained generative model. Set a smaller -st value to reduce the running time.

There are several key args for MCTS listed as follows:

Argument	Description	Default	Type
-k	Protein index	0	int
-st	Number of MCTS simulations	150	int
-p	NN model path	LT	str
--max	max mode or freq mode	True	bool
-g	GPU index	0	int

Multi-objective SBDD

Here is an example of running ParetoDrug on protein 1a9u (protein index 0 in the test proteins) with 150 simulations, using the pretrained model LT in max mode on GPU 0.

python pareto_mcts.py -k 0 -g 0 -st 150 -p LT --max
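
To run the same command for several test proteins, a small wrapper can generate one invocation per protein index. The sketch below is an illustration only: build_mcts_command and run_many are hypothetical helpers, they use only the flags listed in the table above, and passing max_mode=False simply drops the --max flag (how the script behaves without it is an assumption to verify).

```python
import subprocess
import sys


def build_mcts_command(protein_index, gpu=0, simulations=150,
                       model="LT", max_mode=True):
    """Build the argument list for one pareto_mcts.py run."""
    command = [sys.executable, "pareto_mcts.py",
               "-k", str(protein_index),
               "-g", str(gpu),
               "-st", str(simulations),
               "-p", model]
    if max_mode:
        command.append("--max")
    return command


def run_many(protein_indices, dry_run=True, **options):
    """Print one command per protein index and, unless dry_run, execute it."""
    for index in protein_indices:
        command = build_mcts_command(index, **options)
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop if a run exits with an error.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands for protein indices 0, 1 and 2.
    run_many(range(3), simulations=50)
```

Runs are sequential, so with the several-hour running time per protein a smaller -st value is worth considering for a first pass.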

Multi-objective SBDD for the specified protein structure

To generate molecules for your own protein structure, name the PDB file #PDBid_protein.pdb and the ligand file #PDBid_ligand.sdf, and put both in the "/data/test_pdbs/#PDBid/" folder. Then run the following command with the parameter "--protein #PDBid", for example "--protein 1a9u". Note that this will change the original protein index in "/data/test_pdbs".

python pareto_mcts_case.py --protein 1a9u -g 0 -st 150 -p LT --max
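
The file layout described above can be prepared with a few lines of Python. This is a minimal sketch under stated assumptions: stage_protein is a hypothetical helper, and the data/test_pdbs folder is taken to be relative to the repository root.

```python
import shutil
from pathlib import Path


def stage_protein(pdb_id, protein_pdb, ligand_sdf, test_root="data/test_pdbs"):
    """Copy a protein/ligand pair into <test_root>/<pdb_id>/ under the expected names."""
    target_dir = Path(test_root) / pdb_id
    target_dir.mkdir(parents=True, exist_ok=True)
    protein_target = target_dir / f"{pdb_id}_protein.pdb"
    ligand_target = target_dir / f"{pdb_id}_ligand.sdf"
    # copyfile overwrites existing files with the same name.
    shutil.copyfile(protein_pdb, protein_target)
    shutil.copyfile(ligand_sdf, ligand_target)
    return protein_target, ligand_target
```

For example, stage_protein("1a9u", "my_protein.pdb", "my_ligand.sdf") would create data/test_pdbs/1a9u/1a9u_protein.pdb and data/test_pdbs/1a9u/1a9u_ligand.sdf, where the two input file names are placeholders.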

Multi-target SBDD with case HIV

For the multi-target SBDD case study of finding HIV-related dual-inhibitor molecules, please run the following command.

python mt_pareto_mcts.py -q HIV

Multi-target multi-objective SBDD with case Lapatinib

For the multi-target multi-objective SBDD case study of the drug Lapatinib, please run the following command.

python mtmo_pareto_mcts.py -q Lapatinib

Cite this article

Yang, Y., Chen, G., Li, J. et al. Enabling target-aware molecule generation to follow multi objectives with Pareto MCTS. Commun Biol 7, 1074 (2024). https://doi.org/10.1038/s42003-024-06746-w

Acknowledgements

This repo is built upon the article "AlphaDrug: protein target specific de novo molecular generation" and its repo https://github.com/CMACH508/AlphaDrug. We thank the authors of AlphaDrug for releasing their code and data. Please also consider citing it if you use our repo.
