M-Estimator weighting is sub-optimal.
We should use this method (from the ceres mailing list).
https://groups.google.com/forum/#!msg/ceres-solver/CbpQ32dyvvk/arKGcbjOmqwJ
Gim,
Here is a not entirely rigorous derivation, but it should get the point across.
Let us start with the objective function
f = 1/2 rho(z'z)
Differentiating once and twice, we get the gradient and Hessian as
g = rho' J' z
H = 2 rho'' J' z z' J + rho' J' J + O(second order)
dropping second order terms we get
g = rho' J' z
H = J' (rho' I + 2 rho'' z z') J
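A quick numerical sanity check of these two expressions (a sketch in Python/NumPy, not from the post; the Cauchy loss rho(s) = log(1 + s) and the affine residual are my own choices, picked because an affine z has a constant Jacobian, so the dropped O(second order) terms are exactly zero):

```python
import numpy as np

# Check g = rho' J' z and H = J' (rho' I + 2 rho'' z z') J against
# finite differences. The Cauchy loss rho(s) = log(1 + s) and the affine
# residual z(x) = A x + b are illustrative assumptions; with an affine z
# the Jacobian J = A is constant, so H should match exactly (up to
# finite-difference error).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
b = rng.standard_normal(3)

rho  = lambda s: np.log1p(s)           # rho(s)  = log(1 + s)
rho1 = lambda s: 1.0 / (1.0 + s)       # rho'(s)
rho2 = lambda s: -1.0 / (1.0 + s)**2   # rho''(s)

def f(x):
    z = A @ x + b
    return 0.5 * rho(z @ z)

def grad(x):                           # g = rho' J' z, with J = A
    z = A @ x + b
    return rho1(z @ z) * A.T @ z

x = rng.standard_normal(2)
z = A @ x + b
s = z @ z
H = A.T @ (rho1(s) * np.eye(3) + 2.0 * rho2(s) * np.outer(z, z)) @ A

# Central finite differences of f (for g) and of grad (for H).
eps = 1e-6
I2 = np.eye(2)
g_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in I2])
H_fd = np.array([(grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)
                 for e in I2]).T

print(np.allclose(grad(x), g_fd, atol=1e-8))  # gradient matches
print(np.allclose(H, H_fd, atol=1e-6))        # Hessian matches
```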
Now our aim is to come up with a scaled Jacobian whose Gauss-Newton
Hessian is as close to H as possible. For reasonable robust loss
functions rho, it will be the case that rho' is positive and rho'' is
negative.
Now, let us assume that we are scaling J by a matrix A, and compare the
expression for the "scaled" Gauss-Newton Hessian
J' A' A J
with
J' (rho' I + 2 rho'' z z') J
Now, assuming that the matrix sandwiched between J' and J is not
indefinite, we can equate
rho' I + 2 rho'' z z' = A' A
If we divide everything by rho' (using its positivity), things become a
little clearer:
I + (2 rho''/rho') z z' = (A/sqrt(rho'))' (A/sqrt(rho'))
Since for most reasonable robust loss functions rho'' is negative, we
are subtracting a rank-one matrix from the identity. We already know
the direction along which the subtraction takes place: it is z. So we
can ask: if
A = I - alpha hat{z} hat{z}'
for some alpha, what would its value be? Here hat{z} is the unit vector
along z.
A'A = I - 2 alpha hat{z} hat{z}' + alpha^2 hat{z} hat{z}'   (using hat{z}' hat{z} = 1)
    = I + (alpha^2 - 2 alpha) hat{z} hat{z}'
Comparing coefficients,
alpha^2 - 2 alpha = 2 (rho''/rho') |z|^2
which is the required equation. And yes, this performs much better than just re-weighting by a scalar.
Sameer
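A numerical check of the final recipe (a Python sketch, not from the post; the Cauchy loss, the choice of the alpha <= 1 root, and the residual scaling sqrt(rho')/(1 - alpha) are my assumptions — the scaling is chosen so that the corrected pair reproduces the true gradient rho' J' z):

```python
import numpy as np

# Solve alpha^2 - 2 alpha = 2 (rho''/rho') |z|^2, taking the root with
# alpha <= 1 (so A = I - alpha zhat zhat' stays positive semidefinite),
# and fold everything into a corrected residual/Jacobian pair. The names
# z_t, J_t and the residual scaling sqrt(rho')/(1 - alpha) are my own;
# the post only derives alpha.
rng = np.random.default_rng(1)
J = rng.standard_normal((3, 2))
z = np.array([0.3, -0.4, 0.5])          # example residual, |z|^2 = 0.5
s = z @ z
rho1 = 1.0 / (1.0 + s)                  # Cauchy loss: rho'(s)  = 1/(1+s)
rho2 = -1.0 / (1.0 + s)**2              #              rho''(s) = -1/(1+s)^2

c = 2.0 * (rho2 / rho1) * s             # right-hand side of the quadratic
alpha = 1.0 - np.sqrt(1.0 + c)          # needs 1 + c >= 0, true here (s < 1)

zhat = z / np.sqrt(s)
Aw = np.sqrt(rho1) * (np.eye(3) - alpha * np.outer(zhat, zhat))

# A'A reproduces the sandwiched matrix rho' I + 2 rho'' z z' ...
W = rho1 * np.eye(3) + 2.0 * rho2 * np.outer(z, z)
print(np.allclose(Aw.T @ Aw, W))                 # weight matrix matches

# ... so the corrected pair gives back both the gradient and the
# robustified Gauss-Newton Hessian.
z_t = np.sqrt(rho1) / (1.0 - alpha) * z
J_t = Aw @ J
print(np.allclose(J_t.T @ z_t, rho1 * J.T @ z))  # gradient matches
print(np.allclose(J_t.T @ J_t, J.T @ W @ J))     # Hessian matches
```

Note that 1 + c can go negative for strongly non-convex losses at large |z| (e.g. Cauchy with |z|^2 > 1), in which case the sandwiched matrix is indefinite and some clamping of rho'' would be needed; the example above deliberately stays in the well-behaved regime.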