IEV2Mol

Primary language: Python. License: MIT.

Trained Models

  • IEV2Mol: MAIN/model/

Data

DM-QP-1M

  • SMILES:
    • all: MAIN/data/Druglike_million_canonical_no_dot_dup.smi
      • This file is zipped. Before using it, run unzip -j Druglike_million_canonical_no_dot_dup.smi.zip in MAIN/data

Active compound dataset

  • SMILES, grid file, interaction file:

    • all: MAIN/data/{protein}/
  • IFP

    • all: IFP-RNN/MAIN/{protein}/

ChEMBL33

  • SMILES
    • all: MAIN/data/chembl_33_no_dot.smi
      • This file is zipped. Before using it, run unzip -j chembl_33_no_dot.smi.zip in MAIN/data

Environment

conda env create -f=iev_vae_env.yml

To train IFP-RNN, you also need to create the additional environments with the commands below and install mgltools and the other dependencies. See the README in the IFP-RNN directory.

conda env create -f=IFP-RNN.yml
conda env create -f=IFP-RNN_py3_7.yml
conda env create -f=vina.yml

Training

IEV2Mol

cd MAIN/model
python train_iev2mol.py

JT-VAE

  • Making training data from SMILES
cd JTVAE/JTVAE/FastJTNNpy3/fast_molvae
sh preprocess_drd2_no_dot.sh

  • Training
cd MAIN/model
python train_jtvae.py

IFP-RNN

  • Calculate IFP from SDF
cd IFP-RNN/MAIN_sdf
python ../AIFP/prepare_sdf_glide.py --sdf drd2_all_smiles_MAIN_HTVS_pv.sdf  --work_dir ./
python ../AIFP/create_reference_sdf.py --config config_ifp.txt --protein 6cm4.pdb --sdf drd2_all_smiles_MAIN_HTVS_pv_prepared.sdf --n_jobs 50
python ../AIFP/create_IFP_sdf.py --config config_ifp.txt --sdf drd2_all_smiles_MAIN_HTVS_pv_prepared.sdf --n_jobs 50
python split_IFP_ResIFP.py
python ../AIFP/prepare_ddc_input_sdf.py --dataset IFP_ResIFP_train.csv --info Tmp_drd2_all_smiles_MAIN_HTVS_pv_prepared/info.csv --type aifp
python ../AIFP/prepare_ddc_input_sdf.py --dataset IFP_ResIFP_test.csv --info Tmp_drd2_all_smiles_MAIN_HTVS_pv_prepared/info.csv --type aifp
  • Training
python ../train_ddc.py --train_csv IFP_ResIFP_train_AIFPsmi.csv  --load_pkl 0  --save ./results/

Generate and evaluate

Generate compounds and calculate their IEV, docking score, and IEV cosine similarity to the seed compounds.
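The IEV cosine similarity used throughout the evaluation can be illustrated with a minimal sketch. It assumes an IEV is available as a plain list of floats; the function name, the example values, and the handling of all-zero vectors are illustrative assumptions, not taken from the repository's scripts.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up vectors standing in for the IEVs of a seed and a generated compound.
seed_iev = [-1.2, 0.0, -3.4, 0.5]
generated_iev = [-1.0, 0.1, -3.0, 0.4]
score = cosine_similarity(seed_iev, generated_iev)
```

A value close to 1 means the generated compound's vector points in nearly the same direction as the seed's.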

IEV2Mol

  • Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/raw_csv/iev2mol.csv
cd MAIN/evaluate_model/iev2mol
python make_csv.py

JT-VAE

  • Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/raw_csv/jt-vae.csv
cd MAIN/evaluate_model/jt-vae
python make_csv.py

IFP-RNN

  • Generate
cd IFP-RNN/MAIN
python test_model.py --model results/fullBits--80--0.1731--0.0010000 --IFP IFP_ResIFP_test_AIFPsmi.csv --save generated1000smi

Before running the command below, separate the compounds generated above and save them as MAIN/evaluate_model/ifp-rnn/test{i}/CHEMBL4467359_0.smi.

  • Evaluate (results are saved in MAIN/evaluate_model/results/{protein}/test{i}/raw_csv/ifp-rnn.csv)
cd MAIN/evaluate_model/ifp-rnn
python make_csv.py

Drawing all valid compounds generated

  • Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/mol_from_rawcsv/
cd MAIN/evaluate_model
python make_figs_from_rawcsv.py 

Draw the compounds and their distributions that satisfy the Tanimoto coefficient and IEV cosine similarity thresholds.

  • Save images of all compounds that meet both thresholds, and of the one compound that meets the Tanimoto similarity threshold and has the highest IEV cosine similarity. (Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/tanimoto{threshold}_ievcos{threshold}/)
  • The number of compounds that meet the thresholds for each test data point, and the average across the test data, are printed to standard output.
cd MAIN/evaluate_model
python find_high_ievcos_row_tanimoto.py
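The selection described above can be sketched as follows. This is not the logic of find_high_ievcos_row_tanimoto.py itself: the row keys (`tanimoto`, `iev_cos`), the function name, and the threshold directions (Tanimoto as an upper bound, IEV cosine similarity as a lower bound) are all assumptions made for illustration.

```python
def select_hits(rows, max_tanimoto, min_iev_cos):
    """Return (hits, best) for one test case.

    rows: list of dicts with hypothetical keys "tanimoto" and "iev_cos",
          both measured against the seed compound.
    hits: rows that satisfy both thresholds.
    best: the row that satisfies the Tanimoto threshold and has the
          highest IEV cosine similarity, or None if no row qualifies.
    """
    # Assumption: "meeting" the Tanimoto threshold means being at or below it.
    within_tanimoto = [r for r in rows if r["tanimoto"] <= max_tanimoto]
    hits = [r for r in within_tanimoto if r["iev_cos"] >= min_iev_cos]
    best = max(within_tanimoto, key=lambda r: r["iev_cos"], default=None)
    return hits, best


# Made-up rows for a single seed compound.
rows = [
    {"smiles": "CCO", "tanimoto": 0.20, "iev_cos": 0.85},
    {"smiles": "CCN", "tanimoto": 0.60, "iev_cos": 0.95},
    {"smiles": "CCC", "tanimoto": 0.30, "iev_cos": 0.40},
]
hits, best = select_hits(rows, max_tanimoto=0.5, min_iev_cos=0.7)
```

If the thresholds run in the other direction in your setup, flip the comparison operators accordingly.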

Plot chemical space and compounds generated by IEV2Mol

  • Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/chemicalspace.jpeg
    • Plot kernel density estimates of ECFP4 reduced to two dimensions by PCA for 10000 randomly selected compounds from the DM-QP-1M dataset and compounds from the DRD2 Active dataset.
    • Each test data point and the 100 data points generated by IEV2Mol using it as a seed are plotted over the chemical space above.
cd MAIN/evaluate_model
python plot_chemicalspace.py

Plot distributions of IEV cosine similarity to the seed compound, Tanimoto coefficient, and docking score

  • Results for each of the 10 test data points are saved in MAIN/evaluate_model/results/{protein}/test{i}/
  • Results for all test data are saved in MAIN/evaluate_model/results/{protein}/
cd MAIN/evaluate_model
python plot_density_graph.py

Calculate validity, uniqueness, diversity, the number of dockable compounds, and the number of compounds with IEV cosine similarity ≥ 0.7, and print them to standard output

cd MAIN/evaluate_model
python make_metrics_from_raw_csv.py
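For orientation, the metrics listed above can be sketched with commonly used definitions. The repository's script may define them differently, so treat everything here as an assumption: validity as the fraction of generated entries that parse, uniqueness as the fraction of distinct SMILES among valid ones, diversity as one minus the mean pairwise Tanimoto similarity of the unique compounds, and the record keys (`smiles`, `fp`, `docking_score`, `iev_cos`) as hypothetical names.

```python
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 0.0


def summarize(records, iev_cos_cutoff=0.7):
    """Compute summary metrics from a list of per-compound dicts.

    Hypothetical keys: "smiles" (None if invalid), "fp" (set of ints),
    "docking_score" (None if docking failed), "iev_cos" (None if unavailable).
    """
    total = len(records)
    valid = [r for r in records if r["smiles"] is not None]
    unique = {r["smiles"]: r for r in valid}  # one record per distinct SMILES
    pairs = list(combinations([r["fp"] for r in unique.values()], 2))
    mean_sim = sum(tanimoto(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "validity": len(valid) / total if total else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "diversity": 1.0 - mean_sim if pairs else 0.0,
        # Assumption: counts are taken over all valid records, duplicates included.
        "n_dockable": sum(r["docking_score"] is not None for r in valid),
        "n_iev_cos_ge_cutoff": sum(
            r["iev_cos"] is not None and r["iev_cos"] >= iev_cos_cutoff
            for r in valid
        ),
    }


# Made-up records: two copies of one compound, one other compound, one invalid.
records = [
    {"smiles": "CCO", "fp": {1, 2}, "docking_score": -7.0, "iev_cos": 0.8},
    {"smiles": "CCO", "fp": {1, 2}, "docking_score": -7.0, "iev_cos": 0.8},
    {"smiles": "CCN", "fp": {2, 3}, "docking_score": None, "iev_cos": None},
    {"smiles": None, "fp": set(), "docking_score": None, "iev_cos": None},
]
metrics = summarize(records)
```

Fingerprints are represented as plain sets of bit indices to keep the sketch dependency-free; in practice they would come from a cheminformatics toolkit.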