Confusing notation for training and test data as well as for target variable
napsternxg opened this issue · 1 comment
I have a minor comment about the mathematical notation in the post.
Throughout the post you use X to mean the test points and Y the training points. E.g., at the start of the Posterior Distribution section you introduce:
> First, we form the joint distribution P_{X,Y} between the test points X and the training points Y.
This was a bit confusing to me at first, as in the general ML literature y usually denotes the target variable. Here, however, Y is the test data, and I couldn't see how to go from the observed independent variables of the test data to the (unobserved) target variable for the test data. I find the notation in one of the notebooks easier to read: there the distribution is written over the vector of function values, P_{f_a, f_b}, which is easier to understand. It also makes it clearer how we can use it to make predictions at new values of the observed independent variable x.
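For what it's worth, in that function-value notation the conditioning step is just the standard Gaussian conditional. A sketch, assuming a zero mean and using my own block labels K_{aa}, K_{ab}, K_{bb} for the kernel matrix:

```latex
% Joint Gaussian over test values f_a and training values f_b:
\begin{pmatrix} f_a \\ f_b \end{pmatrix}
\sim \mathcal{N}\!\left( \mathbf{0},
  \begin{pmatrix} K_{aa} & K_{ab} \\ K_{ab}^\top & K_{bb} \end{pmatrix} \right)

% Conditioning on the observed training values f_b gives the posterior:
f_a \mid f_b \sim \mathcal{N}\!\left(
  K_{ab} K_{bb}^{-1} f_b,\;
  K_{aa} - K_{ab} K_{bb}^{-1} K_{ab}^\top \right)
```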
The code in this blog also follows the prediction view, with y referring to the target variable.
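To illustrate that prediction view concretely, here is a minimal NumPy sketch (not the post's actual code; the RBF kernel and the names X_train, y_train, X_test are my own choices), where y is the target variable throughout:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    sq_dists = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(X_train, y_train, X_test, noise=1e-8):
    """Condition the joint Gaussian on the training targets y_train."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test)   # train/test cross-covariance
    K_ss = rbf_kernel(X_test, X_test)   # test/test covariance
    alpha = np.linalg.solve(K, y_train)
    mean = K_s.T @ alpha                # posterior mean at X_test
    v = np.linalg.solve(K, K_s)
    cov = K_ss - K_s.T @ v              # posterior covariance
    return mean, cov

# Toy usage: a few noisy sine observations, predictions on a grid.
X_train = np.array([-2.0, 0.0, 1.5])
y_train = np.sin(X_train)
X_test = np.linspace(-3.0, 3.0, 50)
mu, cov = gp_posterior(X_train, y_train, X_test, noise=1e-4)
```

Using np.linalg.solve instead of an explicit matrix inverse keeps the conditioning step numerically more stable.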
I have tried to understand GPs before, and every time it has been hard because of the confusing notation and the generalization to infinite-dimensional Gaussians, which are then conditioned on the observed data. I think the distill.pub article can really help in breaking down the notation issue when explaining GPs.
Finally, this image has been the most descriptive explanation of GPs for me. Source
Thank you very much for your comment! Thinking about it, I have to agree that this can be confusing. On the other hand, this is a fairly standard notation for Gaussian processes. If you look at the (very nice) figure that you provided, they actually use the same notation, if I'm not mistaken. I will leave this issue open for now, as it might help others. And maybe we can even adjust the notation in the future to make things clearer.