This repository implements several Bayesian algorithm execution (BAX) acquisition strategies for precise, targeted chemical and materials discovery. It lets a user quickly isolate the portions of a design space that meet highly customized goals. Scientific applications include finding all compounds whose measured properties fall within a band of values (level band) or within the top k percentile of a dataset (percentile band), identifying synthesis conditions that produce monodisperse nanoparticles at multiple precisely specified particle sizes, and finding chemically diverse sets of ligands that are strong, non-toxic binders. The framework applies to multi-property measurements and goes beyond the capabilities of multi-objective Bayesian optimization.
✍️ Users state their experimental goal as a simple filtering algorithm that would return the correct subset of the design space if the true underlying mapping were known (A).
🦾 The Bayesian algorithm execution procedure removes the need to know the true underlying mapping and automatically creates a goal-aligned data acquisition strategy (B-D).
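To make the idea of a "filtering algorithm" concrete, here is a minimal sketch in plain Python of a level-band filter and its multi-property intersection. The function names (`level_band`, `multi_property_band`) and the toy nanoparticle values are illustrative assumptions and are not taken from this repository's `src` modules.

```python
from typing import Dict, Sequence, Set, Tuple


def level_band(values: Sequence[float], low: float, high: float) -> Set[int]:
    """Return indices of design points whose property lies in [low, high]."""
    return {i for i, v in enumerate(values) if low <= v <= high}


def multi_property_band(
    properties: Dict[str, Sequence[float]],
    bands: Dict[str, Tuple[float, float]],
) -> Set[int]:
    """Return indices of design points that satisfy every per-property band."""
    target = None
    for name, (low, high) in bands.items():
        hits = level_band(properties[name], low, high)
        target = hits if target is None else target & hits
    return target if target is not None else set()


# Toy "true" measurements for five candidate synthesis conditions (made up).
properties = {
    "size_nm": [4.0, 9.5, 10.2, 15.0, 10.8],
    "polydispersity": [0.30, 0.05, 0.08, 0.04, 0.25],
}
# Goal: particles near 10 nm that are also monodisperse.
bands = {"size_nm": (9.0, 11.0), "polydispersity": (0.0, 0.10)}

target = multi_property_band(properties, bands)
print(sorted(target))  # [1, 2]
```

The filter only needs property values as input, so the same function can be run on the true measurements (as here) or on model predictions.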
- Make a new local folder, clone the repository into it, and enter the cloned directory
git clone https://github.com/src47/multibax-sklearn.git
cd multibax-sklearn
- Create a virtual environment and install requirements
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
notebooks/tutorials
We highly recommend reviewing the following tutorial notebooks before using BAX acquisition functions for specific experimental goals.
- Tutorial 1: Expressing a user experimental goal as an algorithm and executing the algorithm on the true function.
- Tutorial 2: Executing an algorithm on posterior draws from a trained surrogate Gaussian process model.
- Tutorial 3: Defining metrics to quantify data acquisition quality: Number Obtained and Posterior Jaccard Index.
- Tutorial 4: Using BAX strategies to find a "wishlist" of regions in a magnetic alloys dataset. This notebook contains a full pipeline of data acquisition using BAX.
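As a rough illustration of the idea behind Tutorial 2, the sketch below runs a level-band filter on several function draws and reports how often each design point lands in the algorithm's output. The draws are hard-coded stand-ins; in practice they would be sampled from a trained surrogate model (for example with scikit-learn's `GaussianProcessRegressor.sample_y`). The helper name `inclusion_frequency` is hypothetical and is not part of this repository.

```python
from typing import List, Sequence, Set


def level_band(values: Sequence[float], low: float, high: float) -> Set[int]:
    """Return indices of design points whose value lies in [low, high]."""
    return {i for i, v in enumerate(values) if low <= v <= high}


def inclusion_frequency(
    draws: Sequence[Sequence[float]], low: float, high: float
) -> List[float]:
    """Fraction of draws in which each design point is in the level band."""
    n_points = len(draws[0])
    counts = [0] * n_points
    for draw in draws:
        # Execute the user algorithm on this single draw.
        for i in level_band(draw, low, high):
            counts[i] += 1
    return [c / len(draws) for c in counts]


# Stand-in "posterior draws" over four design points (assumed values,
# not produced by a real Gaussian process).
draws = [
    [0.2, 1.1, 1.9, 3.0],
    [0.4, 0.9, 2.1, 2.8],
    [0.1, 1.3, 1.6, 3.2],
    [0.3, 1.0, 2.4, 2.9],
]

freq = inclusion_frequency(draws, low=1.0, high=2.0)
print(freq)  # [0.0, 0.75, 0.5, 0.0]
```

Points with a frequency near 0 or 1 are ones the draws agree on; intermediate values mark points whose membership in the target set is still uncertain.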
src
- acquisition.py: Implementations of InfoBAX, MeanBAX, SwitchBAX, and US (uncertainty sampling).
- algorithms.py, helper_subspace_functions.py: User algorithms for materials and chemical discovery.
- metrics.py: Implementation of the Number Obtained and Posterior Jaccard Index metrics.
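The two metrics can be pictured with a small set-based sketch. It assumes that Number Obtained counts measured points belonging to the true target set, and that the Jaccard index compares a predicted target set (for instance, the algorithm run on the model's posterior mean) with the true one. These definitions and function names are assumptions for illustration; the actual signatures in `metrics.py` may differ.

```python
from typing import Iterable, Set


def number_obtained(measured: Iterable[int], true_target: Set[int]) -> int:
    """Count measured design points that belong to the true target set."""
    return len(set(measured) & true_target)


def jaccard_index(predicted: Set[int], true_target: Set[int]) -> float:
    """Intersection over union of the predicted and true target sets."""
    union = predicted | true_target
    if not union:
        # Both sets empty: treat the prediction as a perfect match.
        return 1.0
    return len(predicted & true_target) / len(union)


true_target = {1, 2, 5}   # indices that truly meet the goal (toy values)
measured = [0, 1, 2, 3]   # indices queried so far
predicted = {1, 2, 3}     # target set predicted from the surrogate model

print(number_obtained(measured, true_target))  # 2
print(jaccard_index(predicted, true_target))   # 0.5
```

Number Obtained rewards measuring target points directly, while the Jaccard index rewards a model that predicts the whole target set well, including points that were never measured.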
If you find this project helpful, please cite our paper, "Targeted materials discovery using Bayesian algorithm execution":
@article{chitturi2024targeted,
title={Targeted materials discovery using Bayesian algorithm execution},
author={Chitturi, Sathya R and Ramdas, Akash and Wu, Yue and Rohr, Brian and Ermon, Stefano and Dionne, Jennifer and Jornada, Felipe H da and Dunne, Mike and Tassone, Christopher and Neiswanger, Willie and Ratner, Daniel},
journal={npj Computational Materials},
volume={10},
number={1},
pages={156},
year={2024},
publisher={Nature Publishing Group UK London}
}
The methodology in this repo builds on InfoBAX [1] and Multi-point BAX [2].
[1] Neiswanger, Willie, et al. "Bayesian algorithm execution: Estimating computable properties of black-box functions using mutual information." International Conference on Machine Learning. PMLR, 2021.
[2] Miskovich, Sara A., et al. "Bayesian algorithm execution for tuning particle accelerator emittance with partial measurements." arXiv preprint arXiv:2209.04587 (2022).
Please direct any questions or comments to chitturi@stanford.edu, akashr@stanford.edu, or willie.neiswanger@gmail.com.