Primary language: Python · License: MIT

ChemFlow: Navigating Chemical Space with Latent Flows

This repo implements the paper Navigating Chemical Space with Latent Flows (arXiv:2405.03987) by Guanghao Wei*, Yining Huang*, Chenru Duan, Yue Song, and Yuanqi Du.

Flows can uncover meaningful structures in the latent spaces learned by generative models! We propose a unifying framework that characterizes these latent structures with flows/diffusions and uses them for molecular optimization and latent traversal.
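To make the idea of a latent flow concrete, here is a minimal sketch that follows the gradient of a property function through a latent space using explicit Euler steps. It is a generic gradient-flow illustration. It is not the energy networks trained by the scripts below. The quadratic toy_property_grad is a hypothetical stand-in for a learned property predictor on the VAE latent space. The names gradient_flow_traversal and Z_STAR are invented for this example.

```python
from typing import Callable, List

Vector = List[float]


def gradient_flow_traversal(
    z0: Vector,
    grad_fn: Callable[[Vector], Vector],
    step_size: float = 0.1,
    n_steps: int = 50,
) -> List[Vector]:
    """Explicit Euler on dz/dt = grad f(z): z_{k+1} = z_k + step_size * grad f(z_k).

    Returns the whole trajectory (including z0), so every intermediate
    latent point could be decoded into a molecule.
    """
    trajectory = [list(z0)]
    z = list(z0)
    for _ in range(n_steps):
        g = grad_fn(z)
        z = [zi + step_size * gi for zi, gi in zip(z, g)]
        trajectory.append(z)
    return trajectory


# Hypothetical toy property f(z) = -||z - Z_STAR||^2, maximized at Z_STAR.
Z_STAR = [1.0, -2.0, 0.5]


def toy_property_grad(z: Vector) -> Vector:
    # Gradient of -||z - Z_STAR||^2 is -2 (z - Z_STAR).
    return [-2.0 * (zi - si) for zi, si in zip(z, Z_STAR)]


if __name__ == "__main__":
    path = gradient_flow_traversal([0.0, 0.0, 0.0], toy_property_grad)
    print(len(path))  # 51: the start point plus 50 Euler steps
    print([round(x, 3) for x in path[-1]])  # ends close to Z_STAR
```

With step_size=0.1, each step shrinks the distance to Z_STAR by a factor of 0.8. The trajectory therefore converges smoothly. In a real model, the learned flow decides which direction the latent point moves.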

Live Demo

Try our live demo here!

Quick Start

  • Install all dependencies with conda env create -f environment.yml.
    • (Optional) Install AutoDock-GPU to compute docking binding affinities. See Notes on Compiling AutoDock-GPU.
    • (Recommended) Run mv .env.defaults .env and set PROJECT_PATH in .env; the scripts use it to run the experiments from the project root directory.
  • Download the data (see Download Data & Model Checkpoints) and put it in the data directory.
  • Train the VAE model by running python experiments/train_vae.py.
  • For supervised learning
    1. Prepare the data by running python experiments/prepare_random_data.py.
    2. Train the supervised surrogate predictor by running bash experiments/supervised/train_prop_predictor.sh.
    3. Train the energy network with supervised semantic guidance by running bash experiments/supervised/train_wavepde_prop.sh.
  • For unsupervised learning
    1. Train the energy network with unsupervised diversity guidance by running python experiments/train_wavepde.py.
    2. Compute the Pearson correlation coefficients by running python experiments/unsupervised/corr.py. Refer to notebooks/experiments/unsupervised/corr.ipynb for more details.
    3. Edit experiments/utils/traversal_step.py in place to use the index with the best correlation coefficient.
  • To reproduce the experiment results from the paper, run the following commands:
    • bash experiments/optimization/optimization.sh for similarity-constrained optimization.
    • bash experiments/optimization/uc_optim.sh for unconstrained optimization.
    • python experiments/optimization/optimization_multi.py for multi-objective optimization.
    • bash experiments/success_rate/success_rate.sh for molecule manipulation tasks.
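Steps 2 and 3 of the unsupervised pipeline come down to scoring candidate latent directions and keeping the best index. The sketch below makes one hypothetical assumption: each direction is represented by the property values measured along its traversal. Each direction is then scored by the absolute Pearson correlation between step number and property. The quantities that experiments/unsupervised/corr.py actually correlates may differ, so treat the notebook as the reference. The function names here are invented for illustration.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant sequence has undefined r; score it as 0
    return cov / (sx * sy)


def best_direction_index(property_traces: Sequence[Sequence[float]]) -> int:
    """Index of the traversal whose property correlates most strongly
    (largest |r|, so steady decreases also count) with the step number."""
    scores = [abs(pearson(list(range(len(t))), t)) for t in property_traces]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    traces = [
        [0.1, 0.3, 0.2, 0.4],  # noisy upward trend
        [0.0, 0.5, 1.0, 1.5],  # perfectly monotone
        [1.0, 0.9, 1.1, 1.0],  # mostly flat
    ]
    print(best_direction_index(traces))  # 1
```

Using |r| means a direction that steadily lowers the property ranks as highly as one that raises it. This is convenient for manipulation tasks where both directions are useful.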

Additional Arguments for the Scripts

We use Lightning and Tap (Typed Argument Parser) to parse arguments. The following example command passes arguments configured through Lightning:

python experiments/supervised/train_prop_predictor.py \
    -e 50 \
    --model.optimizer sgd \
    --data.n 11000 \
    --data.batch_size 100 \
    --data.binding_affinity true \
    --data.prop 1err
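The dotted flags in this command address nested configuration groups. Presumably the --model.* options configure the model and the --data.* options configure the data module. The stdlib-only sketch below illustrates that naming convention by folding dotted flags into a nested dictionary. It is a hypothetical illustration, not how Lightning or Tap parse arguments internally (they also validate values against class signatures), and it ignores short flags such as -e.

```python
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Best-effort conversion of a CLI string to bool, int, or float; else str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def dotted_args_to_nested(argv: List[str]) -> Dict[str, Any]:
    """Turn ["--data.batch_size", "100"] into {"data": {"batch_size": 100}}.

    Only "--key value" pairs are handled (hypothetical helper for illustration).
    """
    config: Dict[str, Any] = {}
    it = iter(argv)
    for flag in it:
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        raw = next(it, None)
        if raw is None:
            raise ValueError(f"missing value for {flag}")
        *parents, leaf = flag[2:].split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw)
    return config


if __name__ == "__main__":
    argv = ["--model.optimizer", "sgd", "--data.n", "11000",
            "--data.batch_size", "100", "--data.binding_affinity", "true",
            "--data.prop", "1err"]
    print(dotted_args_to_nested(argv))
```

Note that 1err stays a string because it parses as neither an int nor a float.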

Download Data & Model Checkpoints

We extracted 4,253,577 molecules from three datasets commonly used in drug discovery: MOSES, ZINC250K, and ChEMBL.

Notes on Compiling AutoDock-GPU

The conda build of AutoDock-GPU is not compatible with RTX 3080 and 3090 GPUs, so do not use environment.yml to install AutoDock-GPU. Instead, compile it from source following this issue. A good reference for the SM code (the GPU compute capability passed as TARGETS, which is 86 for the RTX 3080/3090) can be found here.

The following commands may be useful:

export GPU_INCLUDE_PATH=/usr/local/cuda/include
export GPU_LIBRARY_PATH=/usr/local/cuda/lib64

make DEVICE=CUDA NUMWI=128 TARGETS=86
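If you script the build, for example across machines with different GPUs, the sketch below assembles the same make command and environment variables from a compute capability string. The helper names and the default src_dir handling are hypothetical. Mapping "8.6" to TARGETS=86 follows the RTX 3080/3090 compute capability. The binary name follows the autodock_gpu_128wi pattern used in the test command.

```python
import os
import subprocess
from typing import Dict, List


def cuda_build_env(cuda_home: str = "/usr/local/cuda") -> Dict[str, str]:
    """Copy of the current environment with the two CUDA paths exported above."""
    env = dict(os.environ)
    env["GPU_INCLUDE_PATH"] = f"{cuda_home}/include"
    env["GPU_LIBRARY_PATH"] = f"{cuda_home}/lib64"
    return env


def autodock_make_command(compute_capability: str, numwi: int = 128) -> List[str]:
    """make invocation for a CUDA build; a capability of "8.6" becomes TARGETS=86."""
    major, minor = compute_capability.strip().split(".")
    return ["make", "DEVICE=CUDA", f"NUMWI={numwi}", f"TARGETS={major}{minor}"]


def expected_binary_name(numwi: int = 128) -> str:
    """Executable name for a given NUMWI, e.g. autodock_gpu_128wi."""
    return f"autodock_gpu_{numwi}wi"


def build_autodock(src_dir: str, compute_capability: str = "8.6", numwi: int = 128) -> None:
    """Run the build inside a local AutoDock-GPU source checkout at src_dir."""
    subprocess.run(
        autodock_make_command(compute_capability, numwi),
        cwd=src_dir,
        env=cuda_build_env(),
        check=True,
    )


if __name__ == "__main__":
    print(" ".join(autodock_make_command("8.6")))  # make DEVICE=CUDA NUMWI=128 TARGETS=86
```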

To test whether the compilation succeeded, run the following commands:

obabel -:"CCN(CCCCl)OC1=CC2=C(Cl)C1C3=C2CCCO3" -O demo.pdbqt -p 7.4 --partialcharge gasteiger --gen3d
autodock_gpu_128wi -M data/raw/1err/1err.maps.fld -L demo.pdbqt -s 0 -N demo
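To run the same test from Python, for instance to score many SMILES strings, the two commands can be wrapped with subprocess. This is a hedged sketch, and the helper names are hypothetical. It also relies on two assumptions about AutoDock-GPU's output format that you should check against your own build: that -N demo writes its log to demo.dlg, and that the log contains "Estimated Free Energy of Binding" lines.

```python
import re
import subprocess
from typing import List, Optional

# Assumed AutoDock-style .dlg line, e.g.
# "DOCKED: USER    Estimated Free Energy of Binding    =   -6.49 kcal/mol"
_ENERGY_RE = re.compile(r"Estimated Free Energy of Binding\s*=\s*([-+]?\d+(?:\.\d+)?)")


def docking_commands(smiles: str, maps_fld: str, name: str = "demo") -> List[List[str]]:
    """Argument vectors for the two commands above: ligand prep, then docking."""
    ligand = f"{name}.pdbqt"
    prep = ["obabel", f"-:{smiles}", "-O", ligand, "-p", "7.4",
            "--partialcharge", "gasteiger", "--gen3d"]
    dock = ["autodock_gpu_128wi", "-M", maps_fld, "-L", ligand, "-s", "0", "-N", name]
    return [prep, dock]


def best_binding_energy(dlg_text: str) -> Optional[float]:
    """Lowest (most favourable) estimated binding energy in kcal/mol, or None."""
    energies = [float(v) for v in _ENERGY_RE.findall(dlg_text)]
    return min(energies) if energies else None


def run_docking(smiles: str, maps_fld: str, name: str = "demo") -> Optional[float]:
    """Prepare and dock one ligand, then parse the (assumed) <name>.dlg log."""
    for cmd in docking_commands(smiles, maps_fld, name):
        subprocess.run(cmd, check=True)
    with open(f"{name}.dlg") as fh:
        return best_binding_energy(fh.read())


if __name__ == "__main__":
    smiles = "CCN(CCCCl)OC1=CC2=C(Cl)C1C3=C2CCCO3"
    print(run_docking(smiles, "data/raw/1err/1err.maps.fld"))
```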

Cite Us

@misc{wei2024navigating,
      title={Navigating Chemical Space with Latent Flows}, 
      author={Guanghao Wei and Yining Huang and Chenru Duan and Yue Song and Yuanqi Du},
      year={2024},
      eprint={2405.03987},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}