Studying the effects of including stereoisomeric information in generative models for molecules when optimizing stereochemistry-sensitive properties. We perform optimization on (1) rediscovery of R-albuterol and mestranol, (2) protein-ligand docking, and (3) a stereochemistry-specific circular dichroism (CD) spectral peak score.
Preprint on ChemRxiv: Stereochemistry-aware string-based molecular generation. Data files are available on Zenodo.
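In a SMILES string, stereochemistry is carried by the `@`/`@@` tetrahedral markers inside bracket atoms and by the `/`/`\` directional bond markers around double bonds. As a rough illustration of what a non-stereo representation discards, here is a minimal sketch that strips these markers with plain string processing. `strip_stereo` is a hypothetical helper, not code from this repository; it assumes only the basic `@`/`@@` chirality forms (extended forms such as `@TH1` or `@SP1` are not handled) and does not validate or canonicalize the result.

```python
import re

# Basic tetrahedral chirality markers (@ or @@); extended forms like @TH1 are NOT handled.
_CHIRAL = re.compile(r"@@?")
# Directional bond markers that encode E/Z (cis/trans) double-bond geometry.
_DIRECTIONAL = re.compile(r"[/\\]")


def strip_stereo(smiles: str) -> str:
    """Return `smiles` with tetrahedral and double-bond stereo markers removed.

    Hypothetical helper for illustration only: it edits the string directly
    without parsing it, so the output is not canonicalized.
    """
    without_chirality = _CHIRAL.sub("", smiles)
    return _DIRECTIONAL.sub("", without_chirality)


if __name__ == "__main__":
    print(strip_stereo("C[C@@H](C(=O)O)N"))  # -> C[CH](C(=O)O)N
    print(strip_stereo("F/C=C/F"))           # -> FC=CF
```

Both enantiomers of a chiral molecule collapse to the same string after stripping, which is exactly the information a non-stereo model cannot distinguish.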
Initialize a Python environment (here we use conda) and install the required packages:
git clone git@github.com:aspuru-guzik-group/stereogeneration.git
cd stereogeneration
conda create -n stereogeneration python=3.8
conda activate stereogeneration
pip install -r requirements.txt
xtb is installed through requirements.txt. Alternatively, you can install xtb from source from the Grimme Lab, or install it using conda. Set the following environment variables:
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1,1
export OMP_STACKSIZE=4G
ulimit -s unlimited
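If xtb is launched from Python rather than from a shell where these variables have been exported, the same settings can be passed to the child process explicitly. The sketch below is an assumption about how one might do this, not code from the repository; `xtb_env` is a hypothetical helper. It covers only the environment variables: `ulimit -s unlimited` changes a resource limit rather than an environment variable, so it still has to be set in the shell that starts Python (child processes inherit it).

```python
import os
from typing import Dict, Mapping, Optional

# Values copied from the export lines above.
XTB_SETTINGS: Dict[str, str] = {
    "MKL_NUM_THREADS": "1",
    "OMP_NUM_THREADS": "1,1",
    "OMP_STACKSIZE": "4G",
}


def xtb_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the xtb settings applied.

    Hypothetical helper: the original mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env.update(XTB_SETTINGS)
    return env
```

For example, `subprocess.run(["xtb", "molecule.xyz"], env=xtb_env(), check=True)` would run xtb with these settings, where `molecule.xyz` stands in for a real input geometry.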
The CD spectra task requires stda and xtb4stda from the Grimme Lab. The binaries are in the stereogeneration/stda directory; they have to be made executable, and the directory added to the $PATH variable:
cd stereogeneration/stda
chmod +x g_spec stda_v1.6.3 xtb4stda
# set file paths which will be used by stda
export PATH=$PATH:$PWD
export XTB4STDAHOME=$PWD
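Before launching the CD task, it can help to confirm that these exports took effect in the current shell. The sketch below uses `shutil.which` to report which of the three binaries from stereogeneration/stda cannot be found as executables on `$PATH`, and checks that `XTB4STDAHOME` is set. `check_stda_setup` is a hypothetical helper, not part of the repository.

```python
import os
import shutil
from typing import List, Optional

# Binary names taken from the chmod command above.
STDA_TOOLS = ("g_spec", "stda_v1.6.3", "xtb4stda")


def check_stda_setup(path: Optional[str] = None) -> List[str]:
    """Return a list of setup problems (empty if none were found).

    `path` defaults to $PATH; shutil.which only reports files that are executable,
    so a binary that was not chmod'ed counts as missing.
    """
    problems = [
        f"{tool} not found as an executable on PATH"
        for tool in STDA_TOOLS
        if shutil.which(tool, path=path) is None
    ]
    if not os.environ.get("XTB4STDAHOME"):
        problems.append("XTB4STDAHOME is not set")
    return problems


if __name__ == "__main__":
    for problem in check_stda_setup():
        print(problem)
```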
Docking requires the smina binary to be executable:
chmod +x stereogeneration/docking/smina.static
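Once the binary is executable, it can be called from Python as a subprocess. The sketch below only assembles an argument list using standard smina options (`-r`, `-l`, `--autobox_ligand`, `--exhaustiveness`, `--seed`, `-o`); `smina_command`, the file names, and the parameter values are hypothetical, and the repository's own docking code may call smina with different options or receptor files.

```python
from typing import List


def smina_command(
    receptor: str,
    ligand: str,
    out: str,
    box_ligand: str,
    exhaustiveness: int = 8,
    seed: int = 0,
    smina: str = "stereogeneration/docking/smina.static",
) -> List[str]:
    """Build the argument list for one smina docking run (hypothetical helper).

    The search box is placed around `box_ligand` via --autobox_ligand.
    """
    return [
        smina,
        "-r", receptor,                          # receptor structure
        "-l", ligand,                            # ligand to dock
        "--autobox_ligand", box_ligand,          # define the box from a reference ligand
        "--exhaustiveness", str(exhaustiveness),
        "--seed", str(seed),                     # fixed seed for reproducible runs
        "-o", out,                               # docked poses
    ]
```

Passing the result to `subprocess.run(..., check=True)` runs the docking; building the list separately keeps the command easy to inspect and avoids shell quoting issues.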
Scripts (main.py) for running each model are found in the respective folders: reinvent, janus and group-janus. The scripts have command-line arguments that control the fitness function task and some of the model parameters:
# --target: task to optimize; one of 1SYH, 1OYT, 6Y2F, cd, fp-albuterol, fp-mestranol
# --stereo: turn on stereo-awareness (omit for the non-stereo runs)
python main.py \
    --target=1SYH \
    --stereo
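The two flags above map naturally onto `argparse`. This is a sketch of how such an interface might be declared, not the repository's actual parser: the real main.py scripts may define these options differently (making `--target` required here is an assumption) and accept additional model parameters. `build_parser` and `parse` are hypothetical names.

```python
import argparse
from typing import List, Optional

# Task names listed for --target in the usage example above.
TARGETS = ("1SYH", "1OYT", "6Y2F", "cd", "fp-albuterol", "fp-mestranol")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run a generative model on one fitness task.")
    parser.add_argument("--target", choices=TARGETS, required=True,
                        help="fitness function task")
    parser.add_argument("--stereo", action="store_true",
                        help="turn on stereo-awareness (off when omitted)")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv` (default: sys.argv[1:]) with the sketch parser."""
    return build_parser().parse_args(argv)
```

Using `choices` makes the parser reject a misspelled task name before any model is loaded, and `store_true` gives the on/off behaviour that separates the stereo and non-stereo runs.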
The experiments were repeated 10 times for each model on each task; the result files are available on Zenodo. The individual runs for each task are saved in folders named {i}_stereo and {i}_nonstereo for use by analysis_all.py, which also requires the zinc.csv file (available on Zenodo) to be located in the repo directory:
# --target: task to analyze; one of 1SYH, 1OYT, 6Y2F, cd, fp-albuterol, fp-mestranol
# --root_dir: where the dataset and `stereogeneration` import are found
# --label: name for the target property label (defaults to 1SYH)
# --horizontal: toggles horizontal subplots; omit for vertical subplots
python analysis_all.py \
    --target=1SYH \
    --root_dir='.' \
    --label='1SYH' \
    --horizontal
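The {i}_stereo / {i}_nonstereo naming makes it easy to pair runs programmatically. The sketch below groups the run folders in one results directory by mode and sorts them by run index. `collect_runs` is a hypothetical helper; it assumes `{i}` is a non-negative integer and that the folders sit directly in a single results directory, which may not match the Zenodo layout exactly.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Matches folder names such as "0_stereo" or "9_nonstereo".
_RUN_DIR = re.compile(r"^(\d+)_(stereo|nonstereo)$")


def collect_runs(results_dir: str) -> Dict[str, List[Path]]:
    """Group run folders under `results_dir` into stereo / nonstereo lists, sorted by index."""
    found: Dict[str, List[Tuple[int, Path]]] = {"stereo": [], "nonstereo": []}
    for entry in Path(results_dir).iterdir():
        match = _RUN_DIR.match(entry.name)
        if match and entry.is_dir():  # skip files that happen to match the pattern
            found[match.group(2)].append((int(match.group(1)), entry))
    # Sort numerically so that "10_stereo" comes after "2_stereo".
    return {
        mode: [path for _, path in sorted(runs, key=lambda run: run[0])]
        for mode, runs in found.items()
    }
```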