M-Estimator weighting is sub-optimal.
We should use this method (from the ceres mailing list).
https://groups.google.com/forum/#!msg/ceres-solver/CbpQ32dyvvk/arKGcbjOmqwJ
Gim,
Here is a not entirely rigorous derivation, but it should get the point across.
Let us start with the objective function
f = 1/2 rho(z'z)
Differentiating once and twice, we get the gradient and Hessian as
g = rho' J' z
H = 2 rho'' J' z z' J + rho' J' J + O(second order)
dropping second order terms we get
g = rho' J' z
H = J' (rho' I + 2 rho'' z z') J
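A quick numerical sanity check of these two expressions (a sketch in Python/NumPy, not from the post; the Cauchy loss rho(s) = log(1 + s) and the affine residual are my own choices, picked because an affine z has a constant Jacobian, so the dropped O(second order) terms are exactly zero):

```python
import numpy as np

# Check g = rho' J' z and H = J' (rho' I + 2 rho'' z z') J against
# finite differences. The Cauchy loss rho(s) = log(1 + s) and the affine
# residual z(x) = A x + b are illustrative assumptions; with an affine z
# the Jacobian J = A is constant, so H should match exactly (up to
# finite-difference error).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
b = rng.standard_normal(3)

rho  = lambda s: np.log1p(s)           # rho(s)  = log(1 + s)
rho1 = lambda s: 1.0 / (1.0 + s)       # rho'(s)
rho2 = lambda s: -1.0 / (1.0 + s)**2   # rho''(s)

def f(x):
    z = A @ x + b
    return 0.5 * rho(z @ z)

def grad(x):                           # g = rho' J' z, with J = A
    z = A @ x + b
    return rho1(z @ z) * A.T @ z

x = rng.standard_normal(2)
z = A @ x + b
s = z @ z
H = A.T @ (rho1(s) * np.eye(3) + 2.0 * rho2(s) * np.outer(z, z)) @ A

# Central finite differences of f (for g) and of grad (for H).
eps = 1e-6
I2 = np.eye(2)
g_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in I2])
H_fd = np.array([(grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)
                 for e in I2]).T

print(np.allclose(grad(x), g_fd, atol=1e-8))  # gradient matches
print(np.allclose(H, H_fd, atol=1e-6))        # Hessian matches
```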
Now our aim is to come up with a scaled Jacobian whose Gauss-Newton
Hessian is as close to H as possible. For reasonable robust loss
functions rho, it will be the case that rho' is positive and rho'' is
negative.
Now, let us assume that we are scaling J by a matrix A, and compare the
expression for the "scaled" Gauss-Newton Hessian
J' A' A J
with
J' (rho' I + 2 rho'' z z') J
Now, assuming that the matrix sandwiched between J' and J is not
indefinite, we can equate
rho' I + 2 rho'' z z' = A' A
If we divide everything by rho' (using its positivity), things become a
little clearer:
I + (2 rho''/rho') z z' = (A/sqrt(rho'))' (A/sqrt(rho'))
Since for most reasonable robust loss functions rho'' is negative, we
are subtracting a rank-one matrix from the identity. We already know
the direction along which the subtraction takes place: it is z. So we
can ask: if
A = I - alpha hat{z} hat{z}'
for some alpha, what would its value be? Here hat{z} is the unit vector
along z.
A'A = I - 2 alpha hat{z} hat{z}' + alpha^2 hat{z} hat{z}'   (using hat{z}' hat{z} = 1)
    = I + (alpha^2 - 2 alpha) hat{z} hat{z}'
Comparing coefficients,
alpha^2 - 2 alpha = 2 (rho''/rho') |z|^2
which is the required equation. And yes, this performs much better than just re-weighting by a scalar.
Sameer
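A numerical check of the final recipe (a Python sketch, not from the post; the Cauchy loss, the choice of the alpha <= 1 root, and the residual scaling sqrt(rho')/(1 - alpha) are my assumptions — the scaling is chosen so that the corrected pair reproduces the true gradient rho' J' z):

```python
import numpy as np

# Solve alpha^2 - 2 alpha = 2 (rho''/rho') |z|^2, taking the root with
# alpha <= 1 (so A = I - alpha zhat zhat' stays positive semidefinite),
# and fold everything into a corrected residual/Jacobian pair. The names
# z_t, J_t and the residual scaling sqrt(rho')/(1 - alpha) are my own;
# the post only derives alpha.
rng = np.random.default_rng(1)
J = rng.standard_normal((3, 2))
z = np.array([0.3, -0.4, 0.5])          # example residual, |z|^2 = 0.5
s = z @ z
rho1 = 1.0 / (1.0 + s)                  # Cauchy loss: rho'(s)  = 1/(1+s)
rho2 = -1.0 / (1.0 + s)**2              #              rho''(s) = -1/(1+s)^2

c = 2.0 * (rho2 / rho1) * s             # right-hand side of the quadratic
alpha = 1.0 - np.sqrt(1.0 + c)          # needs 1 + c >= 0, true here (s < 1)

zhat = z / np.sqrt(s)
Aw = np.sqrt(rho1) * (np.eye(3) - alpha * np.outer(zhat, zhat))

# A'A reproduces the sandwiched matrix rho' I + 2 rho'' z z' ...
W = rho1 * np.eye(3) + 2.0 * rho2 * np.outer(z, z)
print(np.allclose(Aw.T @ Aw, W))                 # weight matrix matches

# ... so the corrected pair gives back both the gradient and the
# robustified Gauss-Newton Hessian.
z_t = np.sqrt(rho1) / (1.0 - alpha) * z
J_t = Aw @ J
print(np.allclose(J_t.T @ z_t, rho1 * J.T @ z))  # gradient matches
print(np.allclose(J_t.T @ J_t, J.T @ W @ J))     # Hessian matches
```

Note that 1 + c can go negative for strongly non-convex losses at large |z| (e.g. Cauchy with |z|^2 > 1), in which case the sandwiched matrix is indefinite and some clamping of rho'' would be needed; the example above deliberately stays in the well-behaved regime.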