multibax-sklearn

Design space subset estimation using Bayesian algorithm execution with sklearn GP models

Primary language: Jupyter Notebook. License: BSD 2-Clause "Simplified" License (BSD-2-Clause).

Multi-property materials subset estimation using Bayesian algorithm execution

This repository implements several Bayesian algorithm execution (BAX) acquisition strategies for precise, targeted chemical and materials discovery. It lets a user quickly isolate the portions of a design space that meet highly customized goals. Scientific applications include finding all compounds whose measured properties fall within a band of values (level band) or in the top k percent of a dataset (percentile band), identifying synthesis conditions that produce monodisperse nanoparticles at multiple precisely specified particle sizes, and finding chemically diverse sets of ligands that are strong, non-toxic binders. The framework applies to multi-property measurements and goes beyond the capabilities of multi-objective Bayesian optimization.

Description

✍️ Users state their experimental goal via a simple filtering algorithm that would return the correct subset of the design space if the true underlying mapping were known (A).

[Figure: panel A]
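
To make the idea of a goal expressed as a filtering algorithm concrete, here is a minimal sketch of the level band and percentile band goals mentioned above. The function names `level_band` and `percentile_band` and the toy sine mapping are assumptions made for illustration; they are not necessarily the names or signatures used in `src/algorithms.py`.

```python
import numpy as np

def level_band(y, lower, upper):
    """Return indices of design points whose property lies in [lower, upper]."""
    y = np.asarray(y)
    return np.flatnonzero((y >= lower) & (y <= upper))

def percentile_band(y, percentile):
    """Return indices of design points at or above the given percentile of y."""
    y = np.asarray(y)
    threshold = np.percentile(y, percentile)
    return np.flatnonzero(y >= threshold)

# Toy 1D design space with a known "true" mapping (an assumption for
# illustration; in a real experiment this mapping is unknown).
X = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
y_true = np.sin(2 * np.pi * X[:, 0])

# Executing the algorithm on the true function yields the target subset.
target_ids = level_band(y_true, lower=0.5, upper=1.0)
top_ids = percentile_band(y_true, percentile=90)

print("level band indices:", target_ids)
print("percentile band indices:", top_ids)
```

Because the algorithm is just a function of the property values, the same code can later be run on model predictions in place of `y_true`.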

🦾 The Bayesian algorithm execution procedure removes the need to know the true underlying mapping and automatically creates a goal-aligned data acquisition strategy (B-D).

[Figure: panels B-D]
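
The sketch below illustrates the general idea of running a user algorithm on posterior draws from an sklearn Gaussian process instead of on the unknown true function. It is a simplified heuristic written for this README, not the repository's InfoBAX, MeanBAX, or SwitchBAX implementation: the kernel choice, the toy sine function, the `level_band` helper, and the "disagreement" score used to pick the next point are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def level_band(y, lower, upper):
    """Hypothetical user algorithm: indices with lower <= y <= upper."""
    return np.flatnonzero((y >= lower) & (y <= upper))

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y_true = np.sin(2 * np.pi * X[:, 0])  # stands in for the real experiment

# Pretend only a handful of design points have been measured so far.
measured = rng.choice(len(X), size=5, replace=False)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-6,
                              normalize_y=True)
gp.fit(X[measured], y_true[measured])

# Posterior function draws over the full design space,
# shape (n_design_points, n_samples).
samples = gp.sample_y(X, n_samples=20, random_state=0)

# Execute the algorithm on every draw and record how often each
# design point lands in the target subset.
membership = np.zeros(len(X))
for j in range(samples.shape[1]):
    membership[level_band(samples[:, j], 0.5, 1.0)] += 1
membership /= samples.shape[1]

# Illustrative heuristic (not InfoBAX): query the unmeasured point where
# the draws disagree most about target-set membership.
disagreement = membership * (1.0 - membership)
disagreement[measured] = -1.0  # never re-select measured points
next_idx = int(np.argmax(disagreement))
print("next point to measure:", X[next_idx, 0])
```

The point of the sketch is that no step requires the true mapping: the algorithm only ever sees samples from the surrogate model, and the spread of its outputs across samples indicates where a new measurement is likely to be informative for the stated goal.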

Installation

  1. Make a new local folder and clone the repository
git clone https://github.com/src47/multibax-sklearn.git
cd multibax-sklearn
  2. Create a virtual environment and install requirements
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Directories

notebooks/tutorials

We highly recommend reviewing the following tutorial notebooks before using BAX acquisition functions for specific experimental goals.

  1. Tutorial 1: Expressing a user experimental goal as an algorithm and executing the algorithm on the true function.

  2. Tutorial 2: Executing an algorithm on posterior draws from a trained surrogate Gaussian process model.

  3. Tutorial 3: Defining metrics to quantify data acquisition quality: Number Obtained and Posterior Jaccard Index.

  4. Tutorial 4: Using BAX strategies to find a "wishlist" of regions in a magnetic alloys dataset. This notebook contains a full pipeline of data acquisition using BAX.

src

  • acquisition.py: Implementation of InfoBAX, MeanBAX, SwitchBAX and US.
  • algorithms.py, helper_subspace_functions.py: User algorithms for materials and chemical discovery.
  • metrics.py: Implementation of the Number Obtained and Posterior Jaccard Index metrics.
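
As a rough illustration of the two metrics, the sketch below assumes that Number Obtained counts the measured design points that belong to the true target subset, and that the Jaccard index compares a predicted target subset (for example, the algorithm run on the model's posterior mean) with the true one. The function names and these definitions are assumptions for illustration; see `metrics.py` and Tutorial 3 for the actual implementation.

```python
import numpy as np

def number_obtained(measured_ids, true_target_ids):
    """Count measured design points that belong to the true target subset."""
    return int(np.intersect1d(measured_ids, true_target_ids).size)

def jaccard_index(predicted_ids, true_target_ids):
    """Intersection over union of the predicted and true target subsets."""
    union = np.union1d(predicted_ids, true_target_ids).size
    if union == 0:
        return 1.0  # both subsets empty: treat as perfect agreement
    return np.intersect1d(predicted_ids, true_target_ids).size / union

# Hypothetical index sets over a design space of 10 points.
true_target = np.array([1, 2, 3, 4])
measured = np.array([0, 2, 3, 9])
predicted = np.array([2, 3, 4, 5])

n_obtained = number_obtained(measured, true_target)
jaccard = jaccard_index(predicted, true_target)
print(n_obtained, jaccard)
```

Number Obtained rewards collecting points that satisfy the goal, while the Jaccard index rewards a model whose predicted subset matches the true one, so the two can rank acquisition strategies differently.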

Citation

If you find this project helpful, please cite our paper, "Targeted materials discovery using Bayesian algorithm execution":

@article{chitturi2024targeted,
  title={Targeted materials discovery using Bayesian algorithm execution},
  author={Chitturi, Sathya R and Ramdas, Akash and Wu, Yue and Rohr, Brian and Ermon, Stefano and Dionne, Jennifer and Jornada, Felipe H da and Dunne, Mike and Tassone, Christopher and Neiswanger, Willie and Ratner, Daniel},
  journal={npj Computational Materials},
  volume={10},
  number={1},
  pages={156},
  year={2024},
  publisher={Nature Publishing Group UK London}
}

References

The methodology in this repo builds on InfoBAX [1] and Multi-point BAX [2].

[1] Neiswanger, Willie, et al. "Bayesian algorithm execution: Estimating computable properties of black-box functions using mutual information." International Conference on Machine Learning. PMLR, 2021.

[2] Miskovich, Sara A., et al. "Bayesian algorithm execution for tuning particle accelerator emittance with partial measurements." arXiv preprint arXiv:2209.04587 (2022).

Please direct any questions or comments to chitturi@stanford.edu, akashr@stanford.edu or willie.neiswanger@gmail.com.