Clarification on gradient calculation process
acruis opened this issue · 8 comments
Hi, first off, thanks a lot for putting this library online! I'm trying to learn GPs, and this library's code has helped me a lot in understanding how to work with GPs.
I came across this piece of code in `master`:
https://github.com/dfm/george/blob/master/george/kernels.py#L137
In particular, I'm wondering why there is a multiplication on this line:
`return g * self.vector_gradient[None, None, :]`
I found what looks like the corresponding line in `1.0-dev`, which is here:
https://github.com/dfm/george/blob/1.0-dev/templates/kernels.py#L195
`return g[:, :, self.unfrozen]`
The calculation of `g` itself doesn't seem to have changed much between the two versions, so I would like to know whether the behaviour on `master` is correct, because when I run this code on `master`:
```python
from george.kernels import ConstantKernel

kern = ConstantKernel(3.0)
grad = kern.gradient([[1], [5], [7]])  # arbitrarily chosen inputs
print(grad)
```
I receive the matrix:
```
[[[ 3.] [ 3.] [ 3.]]
 [[ 3.] [ 3.] [ 3.]]
 [[ 3.] [ 3.] [ 3.]]]
```
Shouldn't this matrix's entries all be 1.0, regardless of the actual value of the constant? When reading the docs, I noticed that the gradient is supposed to be taken with respect to `kern.vector`, which is the natural logarithm of the parameter in this case. Is this related to the multiplication operation, or to the gradient matrix produced above?
Hope this was clear enough; feel free to ask if anything above is unclear!
After giving it some more thought, my guess is that the multiplication is for taking the derivative with respect to the natural logarithm of the parameters, via the chain rule: dK / d(ln l) = [dK / dl] * [dl / d(ln l)].
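If that's right, it would explain the 3.0 entries above: for the constant kernel K = c, dK/dc = 1 and dc/d(ln c) = c, so dK/d(ln c) = c = 3.0. Here's a minimal sketch (my own code, not the library's) that reproduces the matrix:

```python
import numpy as np

# Minimal sketch (not george's code): for a constant kernel K[i, j] = c,
# dK/dc = 1, and the log parameterization contributes dc/d(ln c) = c.
c = 3.0
n = 3                       # three input points, as above

dK_dc = np.ones((n, n, 1))  # derivative wrt the raw parameter
dK_dlnc = dK_dc * c         # chain rule: dK/d(ln c) = dK/dc * c

print(dK_dlnc)              # every entry is 3.0, matching the output above
```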
However, if this is true, I'm still curious where this multiplication happens in the `1.0-dev` version. I've been looking through the code and unfortunately couldn't find where it happens.
Hi, this difference comes from how the kernel gradients are defined in `dev`. In the new version, the derivatives are properly implemented with respect to the target parameters, so this Jacobian isn't necessary anymore!
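Roughly speaking, the two conventions look like this (an illustration with made-up function names, not the actual library code):

```python
import numpy as np

# Illustration only (not george's classes): the same dK/d(ln c) computed
# two ways for a constant kernel K[i, j] = c.
def grad_old_style(c, n):
    # master: gradient wrt the raw parameter, then multiply by the Jacobian dc/d(ln c)
    return np.ones((n, n, 1)) * np.array([c])[None, None, :]

def grad_new_style(c, n):
    # dev: the kernel's gradient method returns dK/d(ln c) directly
    return np.full((n, n, 1), c)

print(np.allclose(grad_old_style(3.0, 3), grad_new_style(3.0, 3)))  # True
```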
Thanks for the reply! By "properly implemented", do you mean the `{{ param }}_gradient` methods in the `kernels.h` template? I found this line, though I'm not sure if I'm looking at the right thing: https://github.com/dfm/george/blob/1.0-dev/templates/kernels.h#L240
Also, I would like to know how the new implementations of the derivatives are used. Looking at the `grad_lnlikelihood` function in `gp.py`, I notice that it uses `self.kernel.get_gradient`: https://github.com/dfm/george/blob/1.0-dev/george/gp.py#L455
Grepping the repo for `get_gradient`, I found it in the `kernels.py` template, but it only seems to construct the gradient matrix from either `gradient_symmetric` or `gradient_general` in the `cython_kernel`. Is this the Jacobian that you mentioned?
In `dev`, the gradients for each parameter are implemented by the kernels themselves, and they are taken with respect to the log parameters when they should be, unlike on `master`. See here for an example.
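As a concrete illustration of writing a gradient directly with respect to a log parameter (my own example, not necessarily george's exact parameterization), take a squared-exponential kernel k(r²) = exp(-0.5 r² / m) with free parameter ln(m):

```python
import numpy as np

# Sketch: squared-exponential kernel k(r2) = exp(-0.5 * r2 / m) with the
# free parameter stored as ln(m); the chain rule is folded into the gradient
# once, so no separate Jacobian multiplication is needed afterwards.
def k(r2, ln_m):
    return np.exp(-0.5 * r2 / np.exp(ln_m))

def dk_dln_m(r2, ln_m):
    m = np.exp(ln_m)
    return k(r2, ln_m) * 0.5 * r2 / m   # = dk/dm * dm/d(ln m)

# quick finite-difference check
r2, ln_m, eps = 2.0, 0.3, 1e-6
print(dk_dln_m(r2, ln_m), (k(r2, ln_m + eps) - k(r2, ln_m - eps)) / (2 * eps))
```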
Yup, I noticed that the gradients were specified a bit differently from `master`. So if I'm not wrong, when calculating the gradient of the log-likelihood, the library will still make use of the gradient matrix constructed in `cython_kernel`?
I see, thanks very much for the clarification!
👍