Trained Models

IEV2Mol: MAIN/model/

Data

DM-QP-1M

SMILES:
- all: MAIN/data/Druglike_million_canonical_no_dot_dup.smi
  - This file is zipped. Before use, run unzip -j Druglike_million_canonical_no_dot_dup.smi.zip in MAIN/data.
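
Where the unzip command is not available, the same flat extraction can be sketched in Python. This is a minimal sketch, not part of the repository: the function name unzip_flat is hypothetical, and it only mirrors the -j behaviour of dropping the directory paths stored in the archive.

```python
import shutil
import zipfile
from pathlib import Path


def unzip_flat(zip_path, dest_dir):
    """Extract every file in zip_path into dest_dir, ignoring stored directories.

    Hypothetical helper that mimics `unzip -j`; not part of the repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue  # directory entries carry no data
            name = Path(member.filename).name  # keep the file name only
            if not name:
                continue
            target = dest / name
            # Stream the member so large .smi files are not held in memory.
            with archive.open(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

Run from MAIN/data, a call such as unzip_flat("Druglike_million_canonical_no_dot_dup.smi.zip", ".") would place the .smi file directly in that directory.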

Active compound dataset

SMILES, grid file, interaction file:
- all: MAIN/data/{protein}/

IFP:
- all: IFP-RNN/MAIN/{protein}/

ChEMBL33

SMILES:
- all: MAIN/data/chembl_33_no_dot.smi
  - This file is zipped. Before use, run unzip -j chembl_33_no_dot.smi.zip in MAIN/data.

Environment

conda env create -f=iev_vae_env.yml

If you want to train IFP-RNN, create the additional environments with the commands below and install mgltools and the other dependencies. See the README in the IFP-RNN directory.

conda env create -f=IFP-RNN.yml
conda env create -f=IFP-RNN_py3_7.yml
conda env create -f=vina.yml

Training

IEV2Mol

cd MAIN/model
python train_iev2mol.py

JT-VAE

Making training data from SMILES

cd JTVAE/JTVAE/FastJTNNpy3/fast_molvae
sh preprocess_drd2_no_dot.sh

Training

cd MAIN/model
python train_jtvae.py

IFP-RNN

Calculate IFP from SDF

cd IFP-RNN/MAIN_sdf
python ../AIFP/prepare_sdf_glide.py --sdf drd2_all_smiles_MAIN_HTVS_pv.sdf --work_dir ./
python ../AIFP/create_reference_sdf.py --config config_ifp.txt --protein 6cm4.pdb --sdf drd2_all_smiles_MAIN_HTVS_pv_prepared.sdf --n_jobs 50
python ../AIFP/create_IFP_sdf.py --config config_ifp.txt --sdf drd2_all_smiles_MAIN_HTVS_pv_prepared.sdf --n_jobs 50
python split_IFP_ResIFP.py
python ../AIFP/prepare_ddc_input_sdf.py --dataset IFP_ResIFP_train.csv --info Tmp_drd2_all_smiles_MAIN_HTVS_pv_prepared/info.csv --type aifp
python ../AIFP/prepare_ddc_input_sdf.py --dataset IFP_ResIFP_test.csv --info Tmp_drd2_all_smiles_MAIN_HTVS_pv_prepared/info.csv --type aifp

Training

python ../train_ddc.py --train_csv IFP_ResIFP_train_AIFPsmi.csv --load_pkl 0 --save ./results/

Generate and evaluate

Generate compounds and calculate IEV, docking score, and IEV cosine similarity to seed compounds
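
For reference, the cosine similarity between the IEV of a generated compound and the IEV of its seed can be sketched as follows. This is an illustration only: it assumes each IEV is available as a plain sequence of floats of equal length, and the function name and the zero-vector convention are choices made here, not taken from the repository scripts.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an all-zero vector
    return dot / (norm_u * norm_v)
```

A value close to 1 means the two vectors point in nearly the same direction, regardless of their magnitudes.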

IEV2Mol

Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/raw_csv/iev2mol.csv

cd MAIN/evaluate_model/iev2mol
python make_csv.py

JT-VAE

Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/raw_csv/jt-vae.csv

cd MAIN/evaluate_model/jt-vae
python make_csv.py

IFP-RNN

Generate

cd IFP-RNN/MAIN
python test_model.py --model results/fullBits--80--0.1731--0.0010000 --IFP IFP_ResIFP_test_AIFPsmi.csv --save generated1000smi

Before executing the following command, separate the compounds generated above and save them as MAIN/evaluate_model/ifp-rnn/test{i}/CHEMBL4467359_0.smi .

Evaluate (results are saved in MAIN/evaluate_model/results/{protein}/test{i}/raw_csv/ifp-rnn.csv)

cd MAIN/evaluate_model/ifp-rnn
python make_csv.py

Draw all valid generated compounds

Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/mol_from_rawcsv/

cd MAIN/evaluate_model
python make_figs_from_rawcsv.py

Draw the compounds and their distributions that satisfy the Tanimoto coefficient and IEV cosine similarity thresholds.

Save images of all compounds that meet both thresholds, and of the one compound that meets the Tanimoto threshold and has the highest IEV cosine similarity. (Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/tanimoto{threshold}_ievcos{threshold}/)
The number of compounds that meet the thresholds for each test data point, and the average across the test data, are printed to standard output.
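
The selection described above can be sketched in a few lines. Several things here are assumptions: the row keys "tanimoto" and "iev_cos", the function names, and above all the direction of each comparison, which the text does not state. The sketch treats the Tanimoto threshold as an upper bound and the IEV cosine threshold as a lower bound; swap the operators if the script does otherwise.

```python
def select_hits(rows, max_tanimoto, min_ievcos):
    """Return (all rows meeting both thresholds, best row meeting the Tanimoto one).

    rows: list of dicts with hypothetical keys "tanimoto" and "iev_cos".
    Assumption: Tanimoto must be <= max_tanimoto, IEV cosine >= min_ievcos.
    """
    tanimoto_ok = [r for r in rows if r["tanimoto"] <= max_tanimoto]
    hits = [r for r in tanimoto_ok if r["iev_cos"] >= min_ievcos]
    # Among rows meeting the Tanimoto threshold, keep the highest IEV cosine.
    best = max(tanimoto_ok, key=lambda r: r["iev_cos"], default=None)
    return hits, best


def summarize(per_test_hits):
    """Hit count for each test data point and the mean count across them."""
    counts = [len(hits) for hits in per_test_hits]
    mean = sum(counts) / len(counts) if counts else 0.0
    return counts, mean
```

Calling select_hits once per test{i} directory and passing the collected hit lists to summarize gives the per-test counts and their average.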

cd MAIN/evaluate_model
python find_high_ievcos_row_tanimoto.py

Plot chemical space and compounds generated by IEV2Mol

Results are saved in MAIN/evaluate_model/results/{protein}/test{i}/chemicalspace.jpeg
- Plot kernel density estimates of ECFP4 reduced to two dimensions by PCA for 10000 randomly selected compounds from the DM-QP-1M dataset and compounds from the DRD2 Active dataset.
- Each test data point and the 100 data points generated by IEV2Mol using it as a seed are plotted over the chemical space above.

cd MAIN/evaluate_model
python plot_chemicalspace.py

Plot distributions of IEV cosine similarity to seed compound, Tanimoto coefficient, and docking score

Results for each of the 10 test data are saved in MAIN/evaluate_model/results/{protein}/test{i}/
Results of all test data are saved in MAIN/evaluate_model/results/{protein}/

cd MAIN/evaluate_model
python plot_density_graph.py

Calculate validity, uniqueness, diversity, the number of dockable compounds, and the number of compounds with IEV cosine similarity ≥ 0.7, and print them to standard output

cd MAIN/evaluate_model
python make_metrics_from_raw_csv.py
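
As a rough guide to what these metrics measure, the sketch below computes them from precomputed inputs. The definitions are common conventions assumed here, not read from make_metrics_from_raw_csv.py: validity is the fraction of generated strings that parse, uniqueness is the fraction of valid molecules that are distinct, and diversity is one minus the mean pairwise Tanimoto coefficient. The function names and input layout are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def summarize_generation(canonical_smiles, fingerprints, iev_cos, cutoff=0.7):
    """Compute the summary metrics from precomputed inputs (assumed layout).

    canonical_smiles: one entry per generated string, None where parsing failed.
    fingerprints: on-bit sets for the unique valid molecules.
    iev_cos: IEV cosine similarity to the seed, one per docked molecule.
    """
    valid = [s for s in canonical_smiles if s is not None]
    pairs = list(combinations(fingerprints, 2))
    mean_sim = sum(tanimoto(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "validity": len(valid) / len(canonical_smiles) if canonical_smiles else 0.0,
        "uniqueness": len(set(valid)) / len(valid) if valid else 0.0,
        "diversity": 1.0 - mean_sim if pairs else 0.0,
        "n_dockable": len(iev_cos),
        "n_ievcos_ge_cutoff": sum(1 for c in iev_cos if c >= cutoff),
    }
```

In practice the canonical SMILES and fingerprints would come from a cheminformatics toolkit; they are taken as inputs here so the sketch stays self-contained.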
