MMP_project

Code for the paper "Implications of Additivity and Non-additivity for Machine Learning and Deep Learning Models in Drug Design".

This repository contains code to run hyper-parameter optimization for Random Forest (RF), Support Vector Regression (SVR), XGBoost, and Partial Least Squares (PLS) models. The data is not included in this repository.
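As a rough illustration of what one such optimization job does, here is a minimal sketch using plain Optuna and scikit-learn for the RF case. The actual pipeline uses QPTUNA (see Dependencies below); the file path, target column, and search ranges here are hypothetical.

```python
# Minimal sketch of one hyper-parameter optimization job, assuming plain
# optuna + scikit-learn. The real pipeline uses QPTUNA; file names, the
# target column, and search ranges are hypothetical.
import optuna
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

train = pd.read_csv("data/train.csv")  # hypothetical file name
X, y = train.drop(columns=["activity"]), train["activity"]

def objective(trial):
    # Illustrative RF search space.
    model = RandomForestRegressor(
        n_estimators=trial.suggest_int("n_estimators", 100, 1000),
        max_depth=trial.suggest_int("max_depth", 2, 32),
        max_features=trial.suggest_float("max_features", 0.1, 1.0),
        n_jobs=-1,
    )
    # Mean cross-validated R^2 is the value Optuna maximizes.
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```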

Directories

  • root - Python and shell scripts for running hyper-parameter optimization.
  • notebooks - Jupyter Notebooks for splitting data and computing test scores.
  • data-initial - Initial data, not included in this repo.
  • data - Main data: a random split of the initial data into training and test sets, not included in this repo.
  • downsampled-10-percent - Down-sampled 10% of main data, not included in this repo.
  • optuna-storage - Auxiliary storage for the optuna library to track hyper-parameter optimization progress (see the sketch after this list), not included in this repo.
  • best-models - Models with best hyper-parameters, not included in this repo.
  • pred_values - Predicted vs expected values for models with best hyper-parameters.
  • fill-gaps-configs - Build configurations for the best-found hyper-parameters for "filling gaps" (see paper).
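A persistent storage backend lets a re-submitted job resume a study instead of starting from scratch. Below is a minimal sketch of how the optuna-storage directory can serve as a SQLite backend, assuming plain Optuna; the study and database names are hypothetical.

```python
# Minimal sketch: use optuna-storage as a persistent SQLite backend so an
# interrupted study can be resumed. Study and file names are hypothetical.
import os
import optuna

os.makedirs("optuna-storage", exist_ok=True)

study = optuna.create_study(
    study_name="rf_full_data",  # hypothetical study name
    storage="sqlite:///optuna-storage/rf_full_data.db",
    direction="maximize",
    load_if_exists=True,  # resume recorded progress if the study exists
)
```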

Workflow

  1. First, split the initial data into training and test datasets using a Jupyter notebook (sketched after this list).
  2. Then run all 32 optimization jobs using the script submit_all_to_slurm_on_full_data.sh.
  3. If any of the jobs fails:
    • Prepare the down-sampled data using a Jupyter notebook (also sketched after this list).
    • Re-submit the failed optimization jobs using the down-sampled data.
    • Prepare "fill-gaps" build configurations for the best-found hyper-parameters using a Jupyter notebook.
    • Submit the "fill-gaps" build jobs.
  4. Finally, prepare the summary table using a Jupyter notebook.
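Steps 1 and 3 above involve only standard data manipulation; the following is a minimal sketch assuming pandas and scikit-learn. Directory names follow the repository layout, but the file names, split ratio, and random seed are hypothetical.

```python
# Minimal sketch of workflow step 1 (random train/test split) and the
# down-sampling in step 3. File names, the 80/20 ratio, and seeds are
# hypothetical; directories match the repository layout.
import os
import pandas as pd
from sklearn.model_selection import train_test_split

for d in ("data", "downsampled-10-percent"):
    os.makedirs(d, exist_ok=True)

initial = pd.read_csv("data-initial/initial.csv")  # hypothetical file name

# Step 1: random split of the initial data into train and test sets.
train, test = train_test_split(initial, test_size=0.2, random_state=42)
train.to_csv("data/train.csv", index=False)
test.to_csv("data/test.csv", index=False)

# Step 3: down-sample 10% of the training data for re-submitting jobs
# that failed on the full data.
train_10 = train.sample(frac=0.10, random_state=42)
train_10.to_csv("downsampled-10-percent/train.csv", index=False)
```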

Dependencies

This code uses QPTUNA (built on top of the Optuna library) to set up hyper-parameter optimization.

Optimization jobs are submitted via SLURM, but the underlying scripts can also be run directly without SLURM.
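For example, the same job can be wrapped in sbatch when SLURM is available and run directly otherwise. A minimal sketch, assuming a hypothetical run_optimization.py entry point:

```python
# Minimal sketch: launch one optimization job with or without SLURM.
# run_optimization.py and its flag are hypothetical stand-ins for the
# repository's actual Python scripts.
import shutil
import subprocess

cmd = ["python", "run_optimization.py", "--algorithm", "RF"]

if shutil.which("sbatch"):
    # SLURM available: wrap the command in a batch submission.
    subprocess.run(["sbatch", "--wrap", " ".join(cmd)], check=True)
else:
    # No SLURM: run the job directly.
    subprocess.run(cmd, check=True)
```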

License

Apache 2.0.
