Bayesian-Optimization-on-Sets

Traditionally, Bayesian optimization is performed on a continuous domain. In some cases, however, the function we are trying to approximate takes set-valued inputs, and the domain is a set of sets. This repository is an implementation of the paper "Practical Bayesian Optimization over Sets", demonstrated on a standard non-convex test function.



Dependencies and instructions

  1. scikit-learn
  2. matplotlib
  3. numpy
  4. scipy

To run the demo:

    python Bayes_opt.py

Introduction and results

Bayesian optimization (BO) is used to approximate black-box functions that are expensive to evaluate. It has proven useful in several applications, including hyperparameter optimization, neural architecture search, and material design. Classic BO assumes a search region X ⊆ R^d and a scalar black-box function f evaluated in the presence of additive noise, y = f(x) + ε.
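To make the classic setting concrete, here is a minimal sketch of a BO loop on a continuous 1-D domain using the repo's dependencies (scikit-learn GP surrogate, Expected Improvement via scipy). The objective, grid, and hyperparameters are illustrative and not taken from the repo's Bayes_opt.py:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical 1-D black box (not the repo's test function)
    return np.sin(3 * x) + 0.1 * x**2

def noisy_f(x):
    # Observed in the presence of additive noise, y = f(x) + eps
    return f(x) + rng.normal(0.0, 0.01)

# Discretized search region X subset of R^1
X_grid = np.linspace(-2.0, 2.0, 201).reshape(-1, 1)
X = np.array([[-1.5], [0.0], [1.5]])               # initial design
y = np.array([noisy_f(x[0]) for x in X])

for _ in range(10):
    gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-4, normalize_y=True)
    gp.fit(X, y)
    mu, sigma = gp.predict(X_grid, return_std=True)
    best = y.min()
    # Expected Improvement for minimization
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = X_grid[np.argmax(ei)]                  # maximize the acquisition
    X = np.vstack([X, x_next])
    y = np.append(y, noisy_f(x_next[0]))
```

Each iteration refits the surrogate, scores every grid point with EI, and evaluates the black box at the maximizer of the acquisition.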

Unlike this standard BO formulation, we assume that our search region is the collection of sets {X ⊂ R^d : |X| = m} for a fixed positive integer m. Thus f takes in a "set" X containing m elements, each of length d, and returns a noisy function value y = f(X) + ε.
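A small sketch of this set-valued setting, with an illustrative toy objective (the names and the function are hypothetical, not taken from the repo): one "point" of the search space is a whole set of m vectors in R^d, and f maps that set to a single noisy scalar.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 3, 2  # set cardinality and element dimension

def f_set(X):
    """Toy set function: mean squared norm of the m points (illustrative only)."""
    assert X.shape == (m, d)
    return float(np.mean(np.sum(X**2, axis=1)))

def observe(X, noise_std=0.01):
    """Noisy evaluation y = f(X) + eps, as in the formulation above."""
    return f_set(X) + rng.normal(0.0, noise_std)

X = rng.standard_normal((m, d))  # one candidate: a set of m points in R^d
y = observe(X)
```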

We specifically implement Equation 4 of the paper Practical Bayesian Optimization over Sets, which is a set kernel. To illustrate the algorithm, we take a standard non-convex test function with multiple optima, the Branin function. We discretize the domain into coordinate tuples (x, y) and treat each tuple as a set of cardinality 1 and dimension 2, so the entire domain is a set of sets. A Gaussian process (GP) is used as the surrogate model together with the Expected Improvement acquisition function, and our custom SetKernel serves as the kernel for the GP.
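For intuition, a set kernel of this kind compares two sets by averaging a base kernel over all cross-pairs of their elements, k_set(X, X') = (1/(|X||X'|)) Σ_i Σ_j k(x_i, x'_j). The sketch below uses an RBF base kernel; it illustrates the pairwise-average form and is not the repo's SetKernel class, so consult the paper's Equation 4 for the exact definition used there:

```python
import numpy as np

def rbf(x, xp, lengthscale=1.0):
    """Base RBF kernel between two d-dimensional vectors."""
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * lengthscale**2))

def set_kernel(X, Xp, lengthscale=1.0):
    """Pairwise-average set kernel: mean of the base kernel over all
    element pairs drawn from the two sets."""
    total = 0.0
    for x in X:
        for xp in Xp:
            total += rbf(x, xp, lengthscale)
    return total / (len(X) * len(Xp))

A = np.array([[0.0, 0.0]])                    # a set of cardinality 1, dimension 2
B = np.array([[0.0, 0.0], [1.0, 0.0]])        # a set of cardinality 2
print(set_kernel(A, A))  # 1.0 for identical singleton sets
```

Because each (x, y) tuple here is a singleton set, the set kernel reduces to the base kernel on those points, which is why the standard GP machinery still applies.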

In the following GIFs we illustrate the performance of the BO-over-sets algorithm using contour plots and surface plots. The BO is run for 100 iterations, and the plots in the GIFs are taken every 5 iterations. The orange triangles mark the global optima of the function, the red dots mark the last 5 points suggested by the BO, and the green X marks the most recent suggestion. As the iterations proceed, the suggested points move closer to the global optima of the function, which is the expected behaviour. From the surface plot we can visualize how the function is learned over the course of the iterations.

(GIF: contour-plot view of the BO-over-sets run) (GIF: surface-plot view of the BO-over-sets run)