Code for the paper "Implications of Additivity and Non-additivity for Machine Learning and Deep Learning Models in Drug Design"
This repository contains code to run hyper-parameter optimization for RF, SVR, XGBoost, and PLS algorithms. The data is not included in this repository.
- root - Python and shell scripts for running hyper-parameter optimization.
- `notebooks` - Jupyter Notebooks for splitting data and computing test scores.
- `data-initial` - Initial data, not included in this repo.
- `data` - Main data: random split of the initial data into train and test data, not included in this repo (see the splitting sketch after this list).
- `downsampled-10-percent` - Down-sampled 10% of the main data, not included in this repo.
- `optuna-storage` - Auxiliary storage for the `optuna` library to track hyper-parameter optimization progress, not included in this repo.
- `best-models` - Models with the best hyper-parameters, not included in this repo.
- `pred_values` - Predicted vs expected values for models with the best hyper-parameters.
- `fill-gaps-configs` - Build configurations for the best found hyper-parameters for "filling gaps" (see paper).
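The random split that populates the `data` directory is produced by one of the notebooks. The snippet below is only a minimal sketch of such a split; the file names, 80/20 ratio, and random seed are illustrative assumptions, not the values used in the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file names; the actual notebook in `notebooks` defines the real ones.
initial = pd.read_csv("data-initial/initial_data.csv")

# Random split of the initial data into train and test sets.
train, test = train_test_split(initial, test_size=0.2, random_state=42)

train.to_csv("data/train.csv", index=False)
test.to_csv("data/test.csv", index=False)
```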
- First, split the initial data into training and test datasets using a Jupyter Notebook.
- Then run all 32 optimization jobs using the script `submit_all_to_slurm_on_full_data.sh`.
- If any of the jobs fails:
  - Prepare down-sampled data using a Jupyter Notebook.
  - Re-submit the failed optimization jobs using the down-sampled data.
- Prepare "fill-gaps" build configurations for the best-found hyper-parameters using Jupyter Notebook.
- Submit "fill-gaps" build jobs.
- Then prepare summary table using Jupyter Notebook.
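The summary step amounts to computing test scores from the predicted-vs-expected files in `pred_values`. Below is a minimal sketch of that computation, assuming one CSV per model with hypothetical `expected` and `predicted` columns; the real file layout and column names come from the notebook.

```python
import glob
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score

rows = []
# Hypothetical layout: one predictions file per model under pred_values/.
for path in sorted(glob.glob("pred_values/*.csv")):
    preds = pd.read_csv(path)
    rows.append({
        "model": path,
        "R2": r2_score(preds["expected"], preds["predicted"]),
        "RMSE": mean_squared_error(preds["expected"], preds["predicted"]) ** 0.5,
    })

summary = pd.DataFrame(rows).sort_values("R2", ascending=False)
print(summary.to_string(index=False))
```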
This code uses QPTUNA to set up the hyper-parameter optimization. Optimization jobs are submitted through SLURM, but they can also be started without SLURM.
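QPTUNA is built on top of the `optuna` library. The sketch below illustrates the underlying idea with plain `optuna` and scikit-learn rather than QPTUNA's own configuration format; the search space, file names, and storage path are illustrative assumptions, not the settings used for the paper.

```python
import optuna
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical training table with precomputed descriptor columns and a "target" column.
train = pd.read_csv("data/train.csv")
X, y = train.drop(columns=["target"]), train["target"]

def objective(trial):
    # Illustrative RF search space; not the one used in the paper.
    model = RandomForestRegressor(
        n_estimators=trial.suggest_int("n_estimators", 100, 1000),
        max_depth=trial.suggest_int("max_depth", 2, 32),
        random_state=42,
    )
    # Maximize negative RMSE from cross-validation on the training data.
    return cross_val_score(model, X, y, cv=5, scoring="neg_root_mean_squared_error").mean()

# A SQLite backend (cf. the optuna-storage directory) lets optuna track and resume studies.
study = optuna.create_study(
    study_name="rf-example",
    storage="sqlite:///optuna-storage/example.db",
    direction="maximize",
    load_if_exists=True,
)
study.optimize(objective, n_trials=100)
print(study.best_params)
```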
License: Apache 2.0.
- Christian Margreitter @cmargreitter
- Alexey Voronov @alexvoronov