Clarification on gradient calculation process
acruis opened this issue · 8 comments
Hi, first off, thanks a lot for putting this library online! I'm trying to learn GPs, and this library's code has helped me a lot in understanding how to work with GPs.
I came across this piece of code in `master`:
https://github.com/dfm/george/blob/master/george/kernels.py#L137
In particular, I'm wondering why there is a multiplication on this line:
`return g * self.vector_gradient[None, None, :]`
I found what looks like the corresponding line in `1.0-dev`, which is here:
https://github.com/dfm/george/blob/1.0-dev/templates/kernels.py#L195
`return g[:, :, self.unfrozen]`
The calculation of `g` itself doesn't seem to have changed much between the two versions, so I would like to know whether the behaviour on `master` is correct, because when I run this code on `master`:
```python
from george.kernels import ConstantKernel

kern = ConstantKernel(3.0)
grad = kern.gradient([[1], [5], [7]])  # arbitrarily chosen inputs
print(grad)
```
I receive the matrix:
```
[[[ 3.] [ 3.] [ 3.]]
 [[ 3.] [ 3.] [ 3.]]
 [[ 3.] [ 3.] [ 3.]]]
```
Shouldn't this matrix's entries all be 1.0, regardless of the actual value of the constant? When reading the docs, I noticed that the gradient is supposed to be taken with respect to `kern.vector`, which is the natural logarithm of the parameter in this case. Is this related to the multiplication operation, or to the gradient matrix produced above?
Hope this was clear enough; feel free to ask if anything above is unclear!
After giving it some more thought, my guess is that the multiplication is for taking the derivative with respect to the natural logarithm of the parameters, via the chain rule: dK / d(ln l) = [dK / dl] * [dl / d(ln l)].
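If that's right, it would explain the 3.0 entries above: for the constant kernel K = c, dK/dc = 1 and dc/d(ln c) = c, so dK/d(ln c) = c = 3.0. Here's a minimal sketch (my own code, not the library's) that reproduces the matrix:

```python
import numpy as np

# Minimal sketch (not george's code): for a constant kernel K[i, j] = c,
# dK/dc = 1, and the log parameterization contributes dc/d(ln c) = c.
c = 3.0
n = 3                       # three input points, as above

dK_dc = np.ones((n, n, 1))  # derivative wrt the raw parameter
dK_dlnc = dK_dc * c         # chain rule: dK/d(ln c) = dK/dc * c

print(dK_dlnc)              # every entry is 3.0, matching the output above
```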
However, if this is true, I'm still curious where this multiplication happens in the `1.0-dev` version. I've been looking through the code and unfortunately couldn't find where it happens.
Hi, this difference comes from how the kernel gradients are defined in `dev`. In the new version, the derivatives are properly implemented with respect to the target parameters, so this Jacobian isn't necessary anymore!
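Roughly speaking, the two conventions look like this (an illustration with made-up function names, not the actual library code):

```python
import numpy as np

# Illustration only (not george's classes): the same dK/d(ln c) computed
# two ways for a constant kernel K[i, j] = c.
def grad_old_style(c, n):
    # master: gradient wrt the raw parameter, then multiply by the Jacobian dc/d(ln c)
    return np.ones((n, n, 1)) * np.array([c])[None, None, :]

def grad_new_style(c, n):
    # dev: the kernel's gradient method returns dK/d(ln c) directly
    return np.full((n, n, 1), c)

print(np.allclose(grad_old_style(3.0, 3), grad_new_style(3.0, 3)))  # True
```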
Thanks for the reply! By "properly implemented", do you mean the `{{ param }}_gradient` methods in the `kernels.h` template? I found this line, though I'm not sure if I'm looking at the right thing: https://github.com/dfm/george/blob/1.0-dev/templates/kernels.h#L240
Also, I would like to know how the new implementations of the derivatives are used. Looking at the `grad_lnlikelihood` function in `gp.py`, I notice that it uses `self.kernel.get_gradient`: https://github.com/dfm/george/blob/1.0-dev/george/gp.py#L455
Grepping the repo for `get_gradient`, I found it in the `kernels.py` template, but it only seems to construct the gradient matrix from either `gradient_symmetric` or `gradient_general` in the `cython_kernel`. Is this the Jacobian that you mentioned?
In `dev`, the gradients for each parameter are implemented by the kernels themselves, and they are taken with respect to the log parameters when they should be, unlike on `master`. See here for an example.
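As a concrete illustration of writing a gradient directly with respect to a log parameter (my own example, not necessarily george's exact parameterization), take a squared-exponential kernel k(r²) = exp(-0.5 r² / m) with free parameter ln(m):

```python
import numpy as np

# Sketch: squared-exponential kernel k(r2) = exp(-0.5 * r2 / m) with the
# free parameter stored as ln(m); the chain rule is folded into the gradient
# once, so no separate Jacobian multiplication is needed afterwards.
def k(r2, ln_m):
    return np.exp(-0.5 * r2 / np.exp(ln_m))

def dk_dln_m(r2, ln_m):
    m = np.exp(ln_m)
    return k(r2, ln_m) * 0.5 * r2 / m   # = dk/dm * dm/d(ln m)

# quick finite-difference check
r2, ln_m, eps = 2.0, 0.3, 1e-6
print(dk_dln_m(r2, ln_m), (k(r2, ln_m + eps) - k(r2, ln_m - eps)) / (2 * eps))
```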
Yup, I noticed that the gradients were specified a bit differently from `master`. So if I'm not wrong, when calculating the gradient of the log-likelihood, the library will still make use of the gradient matrix constructed in `cython_kernel`?
I see, thanks very much for the clarification!
👍