For this assignment you will be implementing a number of functions which will be used to fit a non-linear function to 1-dimensional data using Gaussian process regression. We do not assume that you are previously familiar with this method and will provide all necessary details to implement the models and related algorithms - some familiarity with the basic distribution theory for multivariate normal distributions will be helpful.
For this task your goal is to write a generic function which will be able to fit a Gaussian process regression model to each of provided data sets via maximum likelihood estimation.
For a Gaussian process regression we will assume a model with the following form,
which implies that our observed values of are derived from a multivariate normal distribution with a specific mean and covariance structure. For the sake of simplicity, we will assume that will always be 0 (a vector of zeros in this case) for these models. The covariance () will be given by a Gaussian / squared exponential kernel that is a function of the values, specifically their distances from one another.
Explicitly, the elements of the covariance matrix are constructed using the following formula,
where , and are the models parameters (what we will be estimating via MLE from the data).
- - this is the nugget variance parameter and represents irreducible measurement error. Note: is an indicator function which is 1 when and 0 otherwise, meaning the nugget variance is only added to the diagonal of covariance matrix.
- - this is the scale variance parameter and determines the average distance away from the mean that can be taken by the function.
- - this is the length-scale parameter which determines the range of the “spatial” dependence between points. Larger values of result in less wiggly functions (greater spatial dependence) and smaller values result in more wiggly functions (lesser spatial dependence) - values are relative to the scale of .
In order to fit the model the goal is to determine the optimal values of these three parameters given the data. We will be accomplishing this via maximum likelihood. Given our multivariate normal model we can then take the MVN density,
we can derive the log likelihood as
The goal therefore is to find,
Once the model parameters have been obtained the goal will be to predict (i.e. draw samples from) our Gaussian process model for new values of . Specifically, we want to provide an fine, equally space grid of values from which we will predict the value of the function. Multiple independent predictions (draws) can then be average to get an overall estimate of the underlying smooth function for each data set.
Therefore the goal is to find the conditional predictive distribution of given , , , and . Given everything is a multivariate normal distribution, this conditional distribution is
where
In these formulae, is the covariance matrix constructed from the prediction locations and is the cross covariance matrix constructed from the prediction locations and the data locations. Note that . As mentioned in the preceding task - we will assume that and are 0.
For this task you will need to take the result from the predict()
function and generate a plot showing the mean predicted y
(across
prediction samples) as well as a shaded region showing a 95% confidence
interval (empirically determined from the prediction samples).
Optionally, the user should be able to provide the original data set d
which would then be overlayed as a scatter plot.
- Write a function named
plot_gp()
which takes the following arguments:pred
- data frame of predictions from thepredict()
functiond
- eitherNone
or a data frame with data observations
- Your function does not need to return anything, but should display a
pyplot figure with all of the above features.
- Feel free to hard code things like
figsize
so that the resulting figures look reasonable in your notebook.
- Feel free to hard code things like
- Include a brief write up describing your function and implementation approach.