
DeepHop

Supporting Information for the paper "Deep Scaffold Hopping with Multimodal Transformer Neural Networks"

DeepHop is a multi-modal molecular transformation framework. It accepts a hit molecule and a target protein sequence of interest as inputs, and designs molecular structures that are isofunctional to the source compound.


Installation

Create a conda environment for QSAR-scorer:

conda env create -f score/env.yaml

Create a conda environment for DeepHop:

conda env create -f deephop/env.yaml

Note that you should replace three source files (batch.py, field.py, iterator.py) of the torchtext library in "<your deephop env path>/lib/python3.7/site-packages/torchtext/data" with the corresponding three files contained in "deephop/replace_torchtext", since we have modified these files.
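For example, assuming Anaconda is installed under ~/anaconda3 and the environment is named deephop_env (adjust both paths to your setup), the replacement can be done with:

# Overwrite the three torchtext files with the modified versions.
cp deephop/replace_torchtext/batch.py \
   deephop/replace_torchtext/field.py \
   deephop/replace_torchtext/iterator.py \
   ~/anaconda3/envs/deephop_env/lib/python3.7/site-packages/torchtext/data/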

Scaffold hopping pairs construction

For convenience of illustration, we assume that the code is extracted to /data/u1/projects/mget_3d and that the conda environment for DeepHop is named deephop_env:

cd /data/u1/projects/mget_3d
conda activate deephop_env

You can use make_pair.py to generate the hopping pairs, as sketched below.
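For example (the flag name below is hypothetical; consult make_pair.py for its actual arguments), writing the pairs to the directory consumed by the split step:

# Hypothetical invocation for illustration; see make_pair.py for the real interface.
python make_pair.py -out_dir hopping_pairs_with_scaffold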

Dataset split

python split_data.py -out_dir data40_tue_3d/0.60 -protein_group  data40 -target_uniq_rate 0.6 -hopping_pairs_dir hopping_pairs_with_scaffold

Data preprocessing

python preprocess.py \
    -train_src data40_tue_3d/0.60/src-train.txt \
    -train_tgt data40_tue_3d/0.60/tgt-train.txt \
    -train_cond data40_tue_3d/0.60/cond-train.txt \
    -valid_src data40_tue_3d/0.60/src-val.txt \
    -valid_tgt data40_tue_3d/0.60/tgt-val.txt \
    -valid_cond data40_tue_3d/0.60/cond-val.txt \
    -save_data data40_tue_3d/0.60/seqdata \
    -share_vocab -src_seq_length 1000 -tgt_seq_length 1000 \
    -src_vocab_size 1000 -tgt_vocab_size 1000 -with_3d_confomer
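In the OpenNMT-style pipeline that DeepHop builds on, the src-, tgt- and cond- files are parallel, line-aligned text files: line i of src-train.txt holds the source (hit) molecule, line i of tgt-train.txt the corresponding hopped molecule, and line i of cond-train.txt the protein condition for that pair. A quick way to spot-check the alignment:

# Print the first source/target/condition triple of the training set.
head -n 1 data40_tue_3d/0.60/src-train.txt \
          data40_tue_3d/0.60/tgt-train.txt \
          data40_tue_3d/0.60/cond-train.txt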

Model training

python train.py \
    -condition_dim 768 -use_graph_embedding -arch after_encoding \
    -data data40_tue_3d/0.60/seqdata \
    -save_model experiments/data40_tue_3d/after/models/model \
    -seed 42 -save_checkpoint_steps 158 -keep_checkpoint 400 -train_steps 95193 \
    -param_init 0 -param_init_glorot -max_generator_batches 32 \
    -batch_size 8192 -batch_type tokens -normalization tokens \
    -max_grad_norm 0 -accum_count 4 \
    -optim adam -adam_beta1 0.9 -adam_beta2 0.998 \
    -decay_method noam -warmup_steps 475 -learning_rate 2 -label_smoothing 0.0 \
    -report_every 10 \
    -layers 4 -rnn_size 256 -word_vec_size 256 \
    -encoder_type transformer -decoder_type transformer \
    -dropout 0.1 -position_encoding -share_embeddings \
    -global_attention general -global_attention_function softmax \
    -self_attn_type scaled-dot -heads 8 -transformer_ff 2048 \
    -log_file experiments/data40_tue_3d/after/train.log \
    -tensorboard -tensorboard_log_dir experiments/data40_tue_3d/after/logs \
    -world_size 4 -gpu_ranks 0 1 2 3 \
    -valid_steps 475 -valid_batch_size 32
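This command assumes four GPUs (-world_size 4 -gpu_ranks 0 1 2 3). To train on a single GPU, replace those two flags and keep everything else unchanged:

# Single-GPU variant of the distributed flags.
-world_size 1 -gpu_ranks 0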

Hops generation

To generate output SMILES by loading a saved model:

python translate.py \
    -condition_dim 768 -use_graph_embedding -arch after_encoding -with_3d_confomer \
    -model /data/u1/projects/mget_3d/experiments/data40_tue/3d_gcn/models/model_step_9500.pt \
    -gpu 0 \
    -src data40_tue_3d/src-test.txt \
    -cond data40_tue_3d/cond-test.txt \
    -output /data/u1/projects/mget_3d/summary_tue/data40/after/9500/pred.txt \
    -beam_size 10 -n_best 10 -batch_size 16 \
    -replace_unk -max_length 200 -fast -use_protein40
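With -n_best 10, translate.py writes the ten best candidates per source molecule to pred.txt, one SMILES per line, so the output has ten times as many lines as src-test.txt; the evaluation step below reads it with the matching -beam_size 10. A quick sanity check:

# pred.txt should contain n_best times as many lines as the source file.
wc -l data40_tue_3d/src-test.txt \
      /data/u1/projects/mget_3d/summary_tue/data40/after/9500/pred.txt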

Evaluation

To evaluate the model:

python pvalue_score_predictions.py \
    -beam_size 10 \
    -src summary_tue/data40/after/9500/src-test-protein.txt \
    -prediction /data/u1/projects/mget_3d/summary_tue/data40/after/9500/pred.txt \
    -score_file /data/u1/projects/mget_3d/summary_tue/data40/after/9500/score.csv \
    -invalid_smiles \
    -cond summary_tue/data40/after/9500/cond-test-protein.txt \
    -train_data_dir /data/u1/projects/mget_3d/data40_tue_3d \
    -scorer_model_dir /data/u1/projects/score/total_mtr \
    -pvalue_dir /data/u1/projects/mget_3d/score_train_data

The final result report is saved at /data/u1/projects/mget_3d/summary_tue/data40/after/9500/score_final.csv.

Citation

Please cite the following paper if you use this code in your work.

@article{zheng2021deep,
  title={Deep scaffold hopping with multimodal transformer neural networks},
  author={Zheng, Shuangjia and Lei, Zengrong and Ai, Haitao and Chen, Hongming and Deng, Daiguo and Yang, Yuedong},
  journal={Journal of Cheminformatics},
  volume={13},
  number={1},
  pages={1--15},
  year={2021},
  publisher={Springer}
}