
RBF Functions for Surrogate Optimization

An implementation of radial basis functions for approximating a function, with applications in surrogate-based optimization

Radial Basis Function Explanation

Radial Basis Functions (RBFs) are used to interpolate data, approximating an original or unknown function. Their application to surrogate modelling is clear: data is sampled within the function space of an expensive function, and an RBF is created with a much 'cheaper' evaluation cost. When functions are highly non-linear it is beneficial to optimize them using stochastic methods, which often require many function evaluations. This computationally cheaper interpolation of the original function allows for much faster and more efficient optimization.

Our end goal is to produce a function that exactly predicts the correct result at each data point, and interpolates areas between data points with good accuracy. This function will be in the form:

$$\hat{f}(\mathbf{x}) = \sum_{i=1}^{p} w_i \, \psi\!\left(\left\lVert \mathbf{x} - \mathbf{x}^{(i)} \right\rVert\right)$$

A bit complicated at first glance, but all will be explained.

We start with a set of sample data that we wish to interpolate, along with each point's respective function evaluation, as follows:

$$X = \left\{\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(p)}\right\} \subset \mathbb{R}^n, \qquad \mathbf{y} = \left\{y^{(1)}, \dots, y^{(p)}\right\}$$

where $p$ is the number of data points and $n$ is the number of input dimensions.

With this data we then create the interpolation matrix $\Psi$, defined as the following:

$$\Psi_{ij} = \psi\!\left(\left\lVert \mathbf{x}^{(i)} - \mathbf{x}^{(j)} \right\rVert\right), \qquad i, j = 1, \dots, p$$

Where the basis function $\psi$ takes the input coordinates of two sample points, computes the Euclidean norm of the two points (the distance between them), then takes this distance (now denoted $r$) and calculates a Gaussian value associated with it.

The standard Gaussian basis function is as follows:

$$\psi(r) = e^{-\varepsilon r^2}$$

Comparing this with a Gaussian probability density of the form $e^{-r^2/2\sigma^2}$ indicates that the parameter $\varepsilon$ is proportional to the inverse of the standard deviation squared, thus we can think of $\varepsilon$ as being a sort of length scale. The higher the value of $\varepsilon$, the lower the standard deviation of our Gaussian function, and the smaller our length scale is. Here is a representation of how changing $\varepsilon$ affects our Gaussian kernel.
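As a minimal sketch of this kernel (assuming NumPy; `gaussian_basis` is just an illustrative name, not the repository's API), the effect of $\varepsilon$ can be seen numerically:

```python
import numpy as np

def gaussian_basis(r, eps):
    """Gaussian basis function psi(r) = exp(-eps * r^2)."""
    return np.exp(-eps * np.square(r))

# Evaluate the kernel at a few distances for three length scales:
# higher eps -> the kernel decays to zero over a shorter distance.
r = np.linspace(0.0, 2.0, 5)
for eps in (0.1, 1.0, 10.0):
    print(f"eps={eps:4.1f}:", np.round(gaussian_basis(r, eps), 3))
```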

It should be noted that the interpolation matrix above is symmetrical, as $\Psi_{ij} = \Psi_{ji}$ (the distance between $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ is the same in either direction). Therefore only the upper triangular section needs to be calculated, and the transpose can be added to form the complete matrix.
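A sketch of how the matrix might be built while exploiting that symmetry (NumPy assumed; `interpolation_matrix` is an illustrative helper, not necessarily the repository's implementation):

```python
import numpy as np

def interpolation_matrix(X, eps):
    """Symmetric Gaussian interpolation matrix Psi for a (p, n) sample array X."""
    p = X.shape[0]
    psi = np.zeros((p, p))
    # Only the strict upper triangle is computed...
    for i in range(p):
        for j in range(i + 1, p):
            r = np.linalg.norm(X[i] - X[j])
            psi[i, j] = np.exp(-eps * r**2)
    psi += psi.T                # ...then the transpose completes the matrix
    np.fill_diagonal(psi, 1.0)  # psi(0) = exp(0) = 1 on the diagonal
    return psi

X = np.random.rand(5, 2)
Psi = interpolation_matrix(X, eps=1.0)
print(np.allclose(Psi, Psi.T))  # True: symmetric as expected
```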

We now have a matrix containing data relating to how 'correlated' each data point is to every other data point within function space. This is where the weight vector $\mathbf{w}$ comes into play. By solving the equation $\Psi \mathbf{w} = \mathbf{y}$ for $\mathbf{w}$, we obtain $p$ weights that scale the Gaussian basis functions to fit the sample data exactly. This provides some explanation of the original function $\hat{f}(\mathbf{x})$ above.

Evaluating a coordinate can then be interpreted as summing all of the scaled Gaussian functions that have been fit around the sample data, providing an interpolation. These weights, along with our sample data, are what define our model. It is this embedding of the sample data within the actual model that allows RBFs to work so efficiently when approximating processes without noise, such as expensive computer simulations or computational fluid dynamics systems.
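Putting the pieces together, here is a compact end-to-end sketch (NumPy assumed; `fit_rbf` and `predict_rbf` are illustrative names) of solving for the weights and summing the scaled Gaussians at query points:

```python
import numpy as np

def fit_rbf(X, y, eps):
    """Solve Psi @ w = y so the model is exact at every sample point."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    return np.linalg.solve(np.exp(-eps * d**2), y)

def predict_rbf(x_query, X, w, eps):
    """Sum the w-scaled Gaussians centred on the sample points."""
    d = np.linalg.norm(x_query[:, None, :] - X[None, :, :], axis=-1)
    return np.exp(-eps * d**2) @ w

# Tiny 1D check: the interpolant reproduces the sampled values exactly.
X = np.linspace(0.0, 1.0, 6)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
w = fit_rbf(X, y, eps=5.0)
print(np.allclose(predict_rbf(X, X, w, eps=5.0), y))  # True
```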

Choice of $\varepsilon$

As a property of radial basis functions is that they are exact at each data point, a loss cannot be calculated without reserving some valuable data or sampling the function space again. Therefore the condition number of the matrix $\Psi$ is used as a measurement of how well the data is fit, with the condition number of a matrix relating to its numerical stability. In this implementation $\Psi$ is calculated multiple times and the condition number found over a logarithmic scale of $\varepsilon$ values, typically resulting in a graph such as the following:

An $\varepsilon$ value for the Gaussian basis function can be approximately chosen as 'on the edge of ill-conditioning', or roughly where the condition number equals $10^{12}$. Note that if you're thinking that this is a terrible way of choosing a hyperparameter then it absolutely is; it's mainly just a tool to show what's going on. There are more advanced methods that I will talk about later, as well as in another repository (Kriging Interpolation).
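A sketch of that sweep (NumPy assumed; the random sample set, the sweep range, and the $10^{12}$ threshold are all placeholders for whatever the implementation actually uses):

```python
import numpy as np

def condition_sweep(X, eps_values):
    """Condition number of Psi for each eps in a logarithmic sweep."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.array([np.linalg.cond(np.exp(-eps * d**2)) for eps in eps_values])

rng = np.random.default_rng(0)
X = rng.random((30, 2))                 # placeholder sample set
eps_values = np.logspace(-2, 2, 25)
conds = condition_sweep(X, eps_values)

# 'Edge of ill-conditioning': smallest eps whose condition number is < 1e12.
ok = eps_values[conds < 1e12]
print("chosen eps ~", ok.min() if ok.size else "none below threshold")
```

Small $\varepsilon$ gives wide, heavily overlapping Gaussians and a nearly singular $\Psi$, so the condition number falls as $\varepsilon$ grows; taking the smallest well-conditioned $\varepsilon$ sits us at the edge described above.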

Effect of $\varepsilon$ on interpolation

The following 1D and 2D interpolations provide an intuitive look at how changing $\varepsilon$ affects the approximate function, keeping in mind that the higher the value of $\varepsilon$, the smaller the standard deviation of the Gaussian basis function. Note that at low values of $\varepsilon$ instabilities occur due to the interpolation matrix being ill-conditioned, producing 'noise'. A small numerical sketch of both extremes follows.
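For a rough 1D feel (NumPy assumed; the sine test function and the three $\varepsilon$ values are arbitrary choices), note how the interpolant between the samples degrades at both extremes: the low-$\varepsilon$ matrix is ill-conditioned, while at high $\varepsilon$ the narrow Gaussians leave the interpolant near zero between sample points:

```python
import numpy as np

def rbf_1d(x_query, x_samp, y_samp, eps):
    """Fit and evaluate a 1D Gaussian RBF interpolant in one step."""
    psi = np.exp(-eps * (x_samp[:, None] - x_samp[None, :])**2)
    w = np.linalg.solve(psi, y_samp)
    return np.exp(-eps * (x_query[:, None] - x_samp[None, :])**2) @ w

x_samp = np.linspace(0.0, 1.0, 8)
y_samp = np.sin(2 * np.pi * x_samp)
x_query = np.linspace(0.05, 0.95, 5)    # points between the samples
for eps in (0.1, 10.0, 1000.0):         # ill-conditioned / moderate / spiky
    print(f"eps={eps:6.1f}:", np.round(rbf_1d(x_query, x_samp, y_samp, eps), 2))
```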

Example

The following is a demonstration of how the Rosenbrock function can be interpolated: starting with the original function and sampling it 30 times, then choosing an appropriate value for $\varepsilon$, and finally plotting the final RBF approximation.
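A sketch of that workflow, minus the plotting (NumPy assumed; the sampling domain $[-2, 2]^2$, the seed, and $\varepsilon = 1$ are placeholder choices rather than the values used in the repository):

```python
import numpy as np

def rosenbrock(x):
    """The 'expensive' test function being approximated."""
    return (1 - x[:, 0])**2 + 100 * (x[:, 1] - x[:, 0]**2)**2

rng = np.random.default_rng(42)
X = rng.uniform(-2.0, 2.0, size=(30, 2))   # 30 samples of the domain
y = rosenbrock(X)

eps = 1.0  # placeholder: pick via the condition-number sweep above
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
w = np.linalg.solve(np.exp(-eps * d**2), y)

# Evaluate the surrogate on a grid and compare against the true function.
g = np.linspace(-2.0, 2.0, 50)
grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
dq = np.linalg.norm(grid[:, None, :] - X[None, :, :], axis=-1)
y_hat = np.exp(-eps * dq**2) @ w
print("mean |error| over the grid:", np.abs(y_hat - rosenbrock(grid)).mean())
```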

Sources

  • Alexander I. J. Forrester, Andras Sobester and Andy J. Keane, Engineering Design via Surrogate Modelling: A Practical Guide (John Wiley & Sons, 2008)