Article Structure / Explanation Sequence

Question

rafaelspring opened this issue 5 years ago · 0 comments

First of all, thanks a lot for putting in the time and the work to publish this on Distill. I've been looking for an intuitive article on GPs for a while and was all the more delighted to find this.

However, IMO the article structure is a bit backwards.

We will first explore the mathematical foundation that Gaussian processes are built on — we invite you to follow along using the interactive figures and hands-on examples. They help to explain the impact of individual components, and show the flexibility of Gaussian processes. After following this article we hope that you will have a visual intuition on how Gaussian processes work and how you can configure them for different types of data.

This is a classical bottom-up approach, but I don't think it's a good fit for the article. One can safely assume that someone looking to apply GPs has at least some background in statistics and knows about multivariate Gaussian distributions.

For example, I myself am interested in using GPs to estimate continuous state variables. I think my background should be sufficient to understand GPs and how to use them, given the right approach, yet I struggle to follow the explanations and grasp the basic ideas, even after reading half the article.

I would strongly prefer a top-down approach, where the goals (i.e. what problems GPs solve) are presented first, along with how competing methods might fail to solve these problems. Secondly, the key ideas behind GPs should be presented in a summary style, with as little math as possible, explaining in plain English what tricks and ideas GPs apply to solve the problems outlined before. Think startup pitch, not math proof :)

With that framework set, the ideas behind GPs can be explored in ever more detail, while keeping a top-down approach.

we are interested in predicting the function values at concrete points, which we call test points X. So how do we derive this functional view from the multivariate normal distributions that we have considered so far? Stochastic processes, such as Gaussian processes, are essentially a set of random variables. In addition, each of these random variables has a corresponding index i. We will use this index to refer to the i-th dimension of our n-dimensional multivariate distributions. Now, the goal of Gaussian processes is to learn this underlying distribution from training data.

This is the specific point where you lost me. What does "learning a distribution" mean? I don't think the goal is to determine the parameters of a Gaussian distribution from some sample data, is it? That's what it sounds like, though. Again, if the article were top-down I'd probably have the right context to interpret your explanations (here and in the rest of the article) and would know where you're going. Without that, I'm just kind of lost.
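To make the question concrete, here is a minimal sketch of what I suspect the quoted passage means: rather than fitting the parameters of one Gaussian to samples, you start from a joint Gaussian prior over function values and condition it on the training data, which gives a distribution over the values at the test points. Everything in it is my own assumption and not taken from the article: the zero-mean prior, the squared-exponential kernel, and the names `rbf_kernel` and `gp_posterior` are purely illustrative.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_test, noise=1e-8):
    """Condition a zero-mean GP prior on training data.

    Returns the mean vector and covariance matrix of the function
    values at x_test. `noise` is a small diagonal jitter (assumed).
    """
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)
    solved = np.linalg.solve(k_tt, k_ts)  # K_tt^{-1} K_ts
    mean = solved.T @ y_train
    cov = k_ss - k_ts.T @ solved
    return mean, cov


# Three training observations of a hypothetical function (sin, for illustration).
x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)

# Test points: one on top of a training point, one nearby, one far away.
x_test = np.array([0.0, 0.5, 3.0])

mean, cov = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))

print("posterior mean:", mean)
print("posterior std: ", std)
```

If this reading is right, the "learned distribution" is the posterior over function values: the uncertainty should be close to zero at a test point that coincides with a training point and should grow as the test points move away from the data. Stating that up front would have given me the context I was missing.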