Generative_Models_benchmark_gdb13
This repository includes the scripts used for training, sampling, and analyzing the generative models in the project "Comparative study of deep generative models on chemical space coverage". The models were trained and sampled following the instructions in the related GitHub repositories.
Fig. 1 Coverage of GDB-13 from 1B Sampled Compounds
Fig. 2 Distribution of ring systems and functional groups in GDB-13
Requirements:
- rdkit
- numpy
- pandas
- tqdm
- collections, functools, multiprocessing, pathlib, argparse (part of the Python standard library)
Usage:
To calculate the functional groups of the SMILES in the file gdb13_split_0.csv:
python cal_ifg_atom.py -n 40 -i gdb13_split_0
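The script name suggests that `cal_ifg_atom.py` follows Ertl's algorithm for identifying functional groups (IFG), although this README does not say so. As a rough illustration of that idea, the sketch below marks atoms with a subset of Ertl's rules (all heteroatoms, plus both atoms of every non-aromatic double or triple bond) and then merges connected marked atoms into groups. It leaves out the acetal and three-membered-ring rules and the labelling of each group's environment, and its plain-list molecule format and function names are hypothetical; in practice RDKit would supply the atoms and bonds.

```python
from collections import defaultdict


def mark_atoms(atoms, bonds):
    """Return indices of atoms marked by a simplified subset of Ertl's rules.

    atoms: element symbols, e.g. ["C", "C", "O"] (hypothetical format)
    bonds: (i, j, order) tuples; 2 or 3 for double/triple bonds and
           1.5 for aromatic bonds, so aromatic bonds are never treated as double.
    """
    # Rule 1: every heteroatom (anything other than C or H) is marked.
    marked = {i for i, element in enumerate(atoms) if element not in ("C", "H")}
    # Rule 2 (partial): both ends of a non-aromatic double or triple bond are
    # marked, which covers carbons in C=O, C#N, C=C and C#C.
    for i, j, order in bonds:
        if order in (2, 3):
            marked.update((i, j))
    return marked


def functional_groups(atoms, bonds):
    """Merge connected marked atoms into groups (lists of atom-index sets)."""
    marked = mark_atoms(atoms, bonds)
    # Adjacency restricted to bonds whose two atoms are both marked.
    adjacency = defaultdict(set)
    for i, j, _ in bonds:
        if i in marked and j in marked:
            adjacency[i].add(j)
            adjacency[j].add(i)
    groups, seen = [], set()
    for start in sorted(marked):
        if start in seen:
            continue
        # Depth-first search collects one connected group of marked atoms.
        group, stack = set(), [start]
        while stack:
            atom = stack.pop()
            if atom in group:
                continue
            group.add(atom)
            stack.extend(adjacency[atom] - group)
        seen |= group
        groups.append(group)
    return groups
```

For acetic acid, `CC(=O)O`, written as `atoms=["C", "C", "O", "O"]` and `bonds=[(0, 1, 1), (1, 2, 2), (1, 3, 1)]`, this marks the carbonyl carbon and both oxygens and returns them as one group, while the methyl carbon stays unmarked.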
To calculate the ring systems of the SMILES in the file gdb13_split_0.csv:
python cal_ringsystem.py -n 40 -i gdb13_split_0
- -n: number of threads used to process the file
- -i: name of the input file, without the '.csv' extension
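The ring-system step above groups rings that share atoms, so that fused rings such as those in naphthalene form a single system. A minimal sketch of that merging, assuming the rings arrive as tuples of atom indices (the form returned by RDKit's `mol.GetRingInfo().AtomRings()`), is shown below. The `ring_systems` function and its `include_spiro` switch are hypothetical and may not match how `cal_ringsystem.py` treats spiro rings.

```python
def ring_systems(atom_rings, include_spiro=False):
    """Merge rings (tuples of atom indices) into ring systems (sets of atoms).

    A new ring joins every existing system it shares at least two atoms with
    (fused or bridged rings), or at least one atom when include_spiro is True.
    """
    min_shared = 1 if include_spiro else 2
    systems = []
    for ring in atom_rings:
        ring_atoms = set(ring)
        merged = set(ring_atoms)
        untouched = []
        for system in systems:
            # Compare with the new ring itself rather than the growing union,
            # so two previously separate systems are joined only via this ring.
            if len(ring_atoms & system) >= min_shared:
                merged |= system
            else:
                untouched.append(system)
        systems = untouched + [merged]
    return systems
```

With the default setting, two six-membered rings sharing a bond (atoms 4 and 5) collapse into one 10-atom system, while two rings meeting at a single spiro atom stay separate unless `include_spiro=True`.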
To count the atoms in the compounds of Dataset/train.smi (here -i takes the full file path, including the extension):
python atom_counts.py -n 2 -i Dataset/train.smi
After the calculation, the atom counts of the compounds in the input file are saved to a new file with the suffix '_atom_counts.csv'.
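The atom-counting step can be sketched as follows. This is not the repository's `atom_counts.py`: since the project depends on RDKit, the script most likely parses each SMILES and calls a method such as `Mol.GetNumAtoms()`, whereas this dependency-free version approximates the heavy-atom count by counting atom tokens in the SMILES string with a regular expression and skipping explicit hydrogens. The input format (first whitespace-separated field of each line), the `smiles,atom_count` output columns, the exact output path, and all function names are assumptions; only the `-n` and `-i` options and the `_atom_counts.csv` suffix come from this README.

```python
import argparse
import csv
import re
from multiprocessing import Pool
from pathlib import Path

# Bracket atoms first, then the two-letter organic-subset symbols, then the
# one-letter aliphatic and aromatic ones. Bonds, digits and parentheses are skipped.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# A bracket atom whose element is hydrogen, e.g. [H], [2H], [H+] (but not [Hg]).
EXPLICIT_H = re.compile(r"\[\d*H[^a-z]")


def heavy_atom_count(smiles):
    """Count non-hydrogen atoms in a SMILES string (regex-based approximation)."""
    return sum(1 for token in ATOM_TOKEN.findall(smiles) if not EXPLICIT_H.match(token))


def output_path_for(input_path):
    """Hypothetical naming: Dataset/train.smi -> Dataset/train_atom_counts.csv."""
    return input_path.with_name(input_path.stem + "_atom_counts.csv")


def count_file(input_path, n_procs):
    """Write a 'smiles,atom_count' row for each SMILES in the input file."""
    lines = input_path.read_text().splitlines()
    smiles = [line.split()[0] for line in lines if line.strip()]
    if n_procs > 1:
        # Spread the work over worker processes; chunksize limits IPC overhead.
        with Pool(n_procs) as pool:
            counts = pool.map(heavy_atom_count, smiles, chunksize=10_000)
    else:
        counts = [heavy_atom_count(s) for s in smiles]
    output_path = output_path_for(input_path)
    with output_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles", "atom_count"])
        writer.writerows(zip(smiles, counts))
    return output_path


def build_parser():
    parser = argparse.ArgumentParser(description="Count heavy atoms per SMILES.")
    parser.add_argument("-n", type=int, default=1, help="number of worker processes")
    parser.add_argument("-i", required=True, help="input file, e.g. Dataset/train.smi")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    print(count_file(Path(args.i), args.n))
```

Called as `main(["-n", "2", "-i", "Dataset/train.smi"])`, the sketch would write `Dataset/train_atom_counts.csv`. To run it from the command line, add the usual `if __name__ == "__main__": main()` guard, which multiprocessing also requires under the spawn and forkserver start methods.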
Related GitHub repositories:
- The CharRNN, AAE, VAE and ORGAN models were retrieved from the GitHub repository [https://github.com/molecularsets/moses]
- REINVENT was retrieved from the GitHub repository [https://github.com/undeadpixel/reinvent-randomized]
- LatentGAN was retrieved from the GitHub repository [https://github.com/Dierme/latent-gan]
- GraphINVENT was retrieved from the GitHub repository [https://github.com/MolecularAI/GraphINVENT]