Generative_Models_benchmark_gdb13

This repository contains the scripts used for training, sampling, and analyzing generative models in the project "Comparative study of deep generative models on chemical space coverage". Training and sampling of the models followed the instructions in the related GitHub repositories.

Fig. 1 Coverage of GDB-13 from 1B Sampled Compounds

Fig. 2 Distribution of ring systems and functional groups in GDB-13

Requirements:

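  • rdkit
  • numpy
  • pandas
  • tqdm
  • collections (Python standard library)
  • functools (Python standard library)
  • multiprocessing (Python standard library)
  • pathlib (Python standard library)
  • argparse (Python standard library)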

Usage:

To calculate the functional groups of the SMILES in the file gdb13_split_0.csv:

python cal_ifg_atom.py -n 40 -i gdb13_split_0

To calculate the ring systems of the SMILES in the file gdb13_split_0.csv:

python cal_ringsystem.py -n 40 -i gdb13_split_0
  • -n number of threads used to process the file
  • -i name of the input file, without the '.csv' extension
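
To illustrate how the two flags fit together, here is a minimal sketch of an argument parser that accepts -n and -i and appends the '.csv' extension that -i leaves out. It is not taken from cal_ifg_atom.py or cal_ringsystem.py: the names build_parser, resolve_input, n_workers, and input_name are hypothetical, and the default of one worker is an assumption.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with the two flags described above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Process a CSV file of SMILES with several workers."
    )
    # -n: number of workers; the default of 1 is an assumption, not from the scripts.
    parser.add_argument("-n", dest="n_workers", type=int, default=1,
                        help="number of threads used to process the file")
    # -i: input file name given without the '.csv' extension.
    parser.add_argument("-i", dest="input_name", required=True,
                        help="name of the input file without '.csv'")
    return parser


def resolve_input(input_name):
    """Append the '.csv' extension that the -i flag leaves out."""
    return Path(input_name + ".csv")
```

With this sketch, passing -n 40 -i gdb13_split_0 yields 40 workers and the input path gdb13_split_0.csv.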

To count the atoms in the compounds of train.smi:

python atom_counts.py -n 2 -i Dataset/train.smi

After the calculation, the total atom counts for the compounds in the input file are saved to a separate file with the suffix '_atom_counts.csv'.
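
As a rough idea of what such a count can look like, the sketch below counts heavy atoms per SMILES with RDKit and writes the results to a CSV file. It is an illustration under stated assumptions, not the code of atom_counts.py: the function names, the output columns, the choice to count heavy atoms only, and the naming rule (drop the input extension, then append '_atom_counts.csv') are all assumptions.

```python
import csv
from multiprocessing import Pool
from pathlib import Path


def count_heavy_atoms(smiles):
    """Return the heavy-atom count of a SMILES string, or None if RDKit cannot parse it."""
    # Imported inside the worker so the path helper below works without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else mol.GetNumAtoms()


def output_path(input_file):
    """Assumed naming rule: drop the input extension, then append '_atom_counts.csv'."""
    path = Path(input_file)
    return path.with_name(path.stem + "_atom_counts.csv")


def count_file(input_file, n_workers):
    """Count atoms for the first token of every non-empty line, using n_workers processes."""
    lines = Path(input_file).read_text().splitlines()
    smiles = [line.split()[0] for line in lines if line.strip()]
    with Pool(n_workers) as pool:
        counts = pool.map(count_heavy_atoms, smiles, chunksize=1000)
    out = output_path(input_file)
    with out.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["smiles", "atom_count"])  # hypothetical column names
        for smi, count in zip(smiles, counts):
            writer.writerow([smi, "" if count is None else count])
    return out
```

Under these assumptions, count_file("Dataset/train.smi", 2) would write its results to Dataset/train_atom_counts.csv.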

Related GitHub repositories