JuliaManifolds/Manopt.jl

Robust loss functions for outlier rejection

Opened this issue · 8 comments

Affie commented

Is it possible to use robust loss functions such as Huber, Tukey, and adaptive losses in Manopt.jl (specifically RLM)?
Similar to ceres solver: http://ceres-solver.org/nnls_modeling.html#lossfunction

Not at the moment but it looks like a relatively simple modification of RLM to support it. The rescaling described in http://ceres-solver.org/nnls_modeling.html#theory looks like it would work for RLM too.
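For reference, the simplest (first-order) form of that rescaling would look roughly like this; just a sketch, and the function and argument names are made up for illustration:

```julia
# Sketch of the first-order rescaling from the Ceres theory docs: for a residual
# block f with squared norm s = ‖f‖², the robustified problem uses √ρ′(s)·f and
# √ρ′(s)·J (the second-order correction term is ignored here).
# `rescale_residual`, `rho_prime`, `f`, and `J` are illustrative names only.
function rescale_residual(rho_prime, f, J)
    s = sum(abs2, f)        # squared norm of the residual block
    w = sqrt(rho_prime(s))  # common scaling factor √ρ′(s)
    return w .* f, w .* J   # rescaled residual vector and Jacobian block
end
```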

@kellertuer do you think it would be better to add robustification support to the current implementation of RLM or make a separate implementation?

In the Exact Penalty Method (EPM), that is included in the method itself, so I think that would be fitting here as well. We already have types for different kinds of relaxations (e.g. Huber).

If we extend those and use them in RLM as well, I think that would be great.

edit: to provide a link, we currently have https://manoptjl.org/stable/solvers/exact_penalty_method/#Manopt.SmoothingTechnique – those could either be extended or combined into a common framework of robustification / smoothing.
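For illustration (not existing Manopt code), a Huber-type loss in the Ceres convention, where ρ acts on the squared norm s and δ is the outlier threshold:

```julia
# Illustrative only: Huber-type loss ρ(s) acting on a squared norm s = ‖f‖²,
# quadratic for inliers (s ≤ δ²) and growing only linearly in ‖f‖ beyond the
# threshold δ, which is what suppresses the influence of outliers.
huber(s; δ=1.0) = s ≤ δ^2 ? s : 2δ * sqrt(s) - δ^2
huber_prime(s; δ=1.0) = s ≤ δ^2 ? one(s) : δ / sqrt(s)
```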

Cool 👍 . RLM seems to need a somewhat different interface though.

Sure, no problem – maybe we can also revise the interface with EPM a bit to have a common one then.

I revisited this and for the most general case (outside of RLM), I have no real idea how that could be done or what that would even mean for an algorithm like Douglas-Rachford for example.

Within RLM, one could really just store the ρ function mentioned in the docs linked above and apply that elementwise.
It would probably be best to have that as a HessianObjective (though a bit verbose for a 1D function) to store its derivative and second derivative as well, since the Jacobian is affected by this choice through the chain rule as well.

Ah, I am not so sure we need the second derivative for now? We only use first-order information in RLM at the moment, I think. Then, besides that field, only the get_cost and get_jacobian! for the NonlinearLeastSquaresObjective have to be adapted.
And a GradientObjective (a function and its derivative in the 1D case) is probably enough flexibility.
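Roughly what I have in mind, only as a sketch (the RobustLoss name and its fields are made up, nothing of this exists in Manopt yet):

```julia
# Sketch only, nothing of this exists in Manopt: a robust loss ρ together with
# its derivative ρ′, i.e. the 1D analogue of a gradient objective.
struct RobustLoss{F,DF}
    ρ::F    # the loss, acting on a nonnegative (squared-norm) argument
    dρ::DF  # its derivative ρ′
end
# The Huber pair sketched above would then simply be stored as
# RobustLoss(huber, huber_prime).
```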

> I revisited this and for the most general case (outside of RLM), I have no real idea how that could be done or what that would even mean for an algorithm like Douglas-Rachford for example.

I don't think it can work for every loss function, only those without splitting: nonlinear least squares and most likely stochastic optimization, though I couldn't find any papers about what the Euclidean variant would be. Maybe let's tackle each case separately?

> Within RLM, one could really just store the ρ function mentioned in the docs linked above and apply that elementwise.
> It would probably be best to have that as a HessianObjective (though a bit verbose for a 1D function) to store its derivative and second derivative as well, since the Jacobian is affected by this choice through the chain rule as well.

> Ah, I am not so sure we need the second derivative for now? We only use first-order information in RLM at the moment, I think. Then, besides that field, only the get_cost and get_jacobian! for the NonlinearLeastSquaresObjective have to be adapted.
> And a GradientObjective (a function and its derivative in the 1D case) is probably enough flexibility.

As far as I can tell, ρ is not applied elementwise to the output of f but only after the squared norm is computed. Also, maybe it would be most natural to store ρ in NonlinearLeastSquaresObjective.
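So roughly, as a sketch (all names here are illustrative, not existing code):

```julia
# Sketch only, all names illustrative: ρ acts on the squared norm of each
# residual block, not on the individual entries of f.
robust_cost(ρ, blocks) = sum(ρ(sum(abs2, b)) for b in blocks)

# By the chain rule, the gradient of one block ρ(‖fᵢ(p)‖²) involves
#   ρ′(‖fᵢ‖²) * 2 fᵢᵀ Jᵢ,
# so get_jacobian! would have to rescale the rows of Jᵢ accordingly.
```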

In an earlier post you had the idea of combining robust loss functions with SmoothingTechnique; maybe we can do that instead of representing it as a HessianObjective?

You are right, in full generality that might not be possible, so let's just do it for RLM for now.

Yes, storing ρ in the NLSObj would be my idea for now as well, but we also have to store its derivative, which is why I was thinking about storing it as a gradient objective to stay generic.

Yes, the smoothing techniques are basically the same, though there we handle that by storing a symbol (and only support two functions). I am not yet sure how well this can be combined, but it would most probably be the EPM that gets adapted to the approach here (since its symbol approach is far more restrictive than storing the smoothing function).
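One possible common interface, purely as a sketch and not existing Manopt API: dispatch on either a symbol, as EPM does now, or on user-provided functions:

```julia
# Purely a sketch, not existing Manopt API: a symbol looks up a predefined
# ρ/ρ′ pair (as the EPM smoothing symbols do now), while functions are
# passed through directly, keeping the more flexible function-based variant.
const PREDEFINED_LOSSES = Dict(
    :linear => (s -> s, s -> one(s)),  # trivial (non-robust) loss as an example
)
robustifier(s::Symbol) = PREDEFINED_LOSSES[s]
robustifier(ρ::Function, dρ::Function) = (ρ, dρ)
```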