Unify passing of optional parts of the objective in high-level solver interfaces
Closed this issue · 14 comments
As a continuation of #379, in order to reduce the ambiguities, we should unify how optional elements of the objective are passed.
For example, TCG has 7 signatures – the Hessian, the base point p, and the start point X are optional but positional arguments, and any of these can be left out.
The difference of convex algorithm, on the other hand, has two signatures
difference_of_convex_algorithm(M, f, g, ∂h, p=rand(M); kwargs...)
difference_of_convex_algorithm(M, mdco, p; kwargs...)
This can be enhanced, for example, by passing the gradient of g (to get the subproblem generated automatically) or the gradient of f (to get better debug output), but all of these are keyword arguments.
This should become the default, to reduce the possibilities for ambiguity.
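To make this concrete, a keyword-only variant could look roughly like the following sketch (the function name and keywords are purely illustrative, not a concrete API proposal):
# illustrative sketch only, not existing Manopt.jl code: the mandatory parts stay
# positional, every optional element of the objective becomes a keyword, so only
# one positional signature remains.
function difference_of_convex_sketch(
    M, f, g, ∂h, p = rand(M);
    grad_g = nothing,   # optional: allows generating the subproblem automatically
    gradient = nothing, # optional: gradient of f, e.g. for better debug output
    kwargs...,
)
    # ...construct the objective from whatever parts were provided...
end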
Second, there is often an ambiguity in the case where the points are not arrays but numbers and we provide a wrapper; this happens when just the cost (or, in general, only one element like a gradient) is necessary for the objective (as in particle swarm).
An idea to resolve this is more of an internal redesign – we could introduce a keyword argument iterate_type=:Number that is set to :Number if the iterate is a number, and dispatch on that value internally. That way the dispatch does not happen in the high-level interface. This would even allow users with “non-Number eltypes that are not arrays” to benefit from the idea of “wrapping into mutable arrays” automatically.
Hopefully this second idea can be done in a non-breaking way; the first one is breaking for sure.
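A rough sketch of that internal dispatch, with purely hypothetical solver and helper names (nothing here is existing Manopt.jl code):
# hypothetical sketch: the high-level call only records the iterate type in a
# keyword and forwards; all dispatch on that value happens one level down.
my_solver(M, f, p; iterate_type = p isa Number ? :Number : :Other, kwargs...) =
    _my_solver(M, f, p, Val(iterate_type); kwargs...)

# number iterates are wrapped into a mutable one-element array and unwrapped again
function _my_solver(M, f, p, ::Val{:Number}; kwargs...)
    q = _my_solver(M, (N, x) -> f(N, x[1]), [p], Val(:Other); kwargs...)
    return q[1]
end

# everything else passes through unchanged
function _my_solver(M, f, p, ::Val{:Other}; kwargs...)
    # ...the actual solver run on p...
    return p
end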
Edit
To be precise, the ambiguity errors are:
Ambiguity #1
kwcall(::NamedTuple, ::typeof(Manopt.difference_of_convex_proximal_point), M::ManifoldsBase.AbstractManifold, grad_h, p::Number) @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/difference-of-convex-proximal-point.jl:239
kwcall(::NamedTuple, ::typeof(Manopt.difference_of_convex_proximal_point), M::ManifoldsBase.AbstractManifold, mdcpo::O, p) where O<:Union{AbstractDecoratedManifoldObjective, ManifoldDifferenceOfConvexProximalObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/difference-of-convex-proximal-point.jl:273
Possible fix, define
kwcall(::NamedTuple, ::typeof(Manopt.difference_of_convex_proximal_point), ::ManifoldsBase.AbstractManifold, ::O, ::Number) where O<:Union{Manopt.AbstractDecoratedManifoldObjective, Manopt.ManifoldDifferenceOfConvexProximalObjective}
- Problem: On the objective level we basically cannot handle a Number p; on the other hand, we also do not want to type grad_h.
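For reference, the mechanism behind these errors is easy to reproduce in isolation; the following self-contained example (generic names, nothing from Manopt.jl) shows the same pattern: one method constrains the third argument, the other the second, so neither is more specific for a call that matches both.
abstract type AbstractObjective end
struct MyObjective <: AbstractObjective end

# pattern of the first signature: the function is untyped, the start point is a Number
solver(M, f, p::Number) = "wrap f and p into an objective"
# pattern of the second signature: the objective is typed, the start point is untyped
solver(M, o::AbstractObjective, p) = "use the objective directly"

# solver(1, MyObjective(), 2.0) now throws an ambiguity MethodError:
# MyObjective() matches the untyped f, 2.0 matches both p arguments,
# and neither method is more specific than the other.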
Ambiguity #2
difference_of_convex_proximal_point(M::ManifoldsBase.AbstractManifold, grad_h, p::Number; evaluation, cost, gradient, g, grad_g, prox_g, kwargs...) @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/difference-of-convex-proximal-point.jl:239
difference_of_convex_proximal_point(M::ManifoldsBase.AbstractManifold, mdcpo::O, p; kwargs...) where O<:Union{AbstractDecoratedManifoldObjective, ManifoldDifferenceOfConvexProximalObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/difference-of-convex-proximal-point.jl:273
Possible fix, define
difference_of_convex_proximal_point(::ManifoldsBase.AbstractManifold, ::O, ::Number) where O<:Union{Manopt.AbstractDecoratedManifoldObjective, Manopt.ManifoldDifferenceOfConvexProximalObjective}
- (the same as ambiguity 1, just for DCPPA instead of DCA)
Ambiguity #3
kwcall(::NamedTuple, ::typeof(Manopt.particle_swarm), M::ManifoldsBase.AbstractManifold, f, swarm::AbstractVector{T}) where T<:Number @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/particle_swarm.jl:249
kwcall(::NamedTuple, ::typeof(Manopt.particle_swarm), M::ManifoldsBase.AbstractManifold, mco::O, swarm::AbstractVector) where O<:Union{AbstractDecoratedManifoldObjective, AbstractManifoldCostObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/particle_swarm.jl:264
Possible fix, define
kwcall(::NamedTuple, ::typeof(Manopt.particle_swarm), ::ManifoldsBase.AbstractManifold, ::O, ::AbstractVector{T}) where {O<:Union{Manopt.AbstractDecoratedManifoldObjective, Manopt.AbstractManifoldCostObjective}, T<:Number}
- nearly the same as 1 & 2, just that here it is f being untyped, and/or we do not want to specialise a swarm of numbers on an already generated objective.
Ambiguity #4
particle_swarm(M::ManifoldsBase.AbstractManifold, f, swarm::AbstractVector{T}; velocity, kwargs...) where T<:Number @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/particle_swarm.jl:249
particle_swarm(M::ManifoldsBase.AbstractManifold, mco::O, swarm::AbstractVector; kwargs...) where O<:Union{AbstractDecoratedManifoldObjective, AbstractManifoldCostObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/particle_swarm.jl:264
Possible fix, define
particle_swarm(::ManifoldsBase.AbstractManifold, ::O, ::AbstractVector{T}) where {O<:Union{Manopt.AbstractDecoratedManifoldObjective, Manopt.AbstractManifoldCostObjective}, T<:Number}
- quite similar to 3
Ambiguity #5
kwcall(::NamedTuple, ::typeof(Manopt.stochastic_gradient_descent), M::ManifoldsBase.AbstractManifold, grad_f, p::Number) @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/stochastic_gradient_descent.jl:176
kwcall(::NamedTuple, ::typeof(Manopt.stochastic_gradient_descent), M::ManifoldsBase.AbstractManifold, msgo::O, p) where O<:Union{AbstractDecoratedManifoldObjective, ManifoldStochasticGradientObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/stochastic_gradient_descent.jl:202
Possible fix, define
kwcall(::NamedTuple, ::typeof(Manopt.stochastic_gradient_descent), ::ManifoldsBase.AbstractManifold, ::O, ::Number) where O<:Union{Manopt.AbstractDecoratedManifoldObjective, Manopt.ManifoldStochasticGradientObjective}
- same as 1&2 just for stochastic gradient descent
Ambiguity #6
stochastic_gradient_descent(M::ManifoldsBase.AbstractManifold, grad_f, p::Number; cost, evaluation, kwargs...) @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/stochastic_gradient_descent.jl:176
stochastic_gradient_descent(M::ManifoldsBase.AbstractManifold, msgo::O, p; kwargs...) where O<:Union{AbstractDecoratedManifoldObjective, ManifoldStochasticGradientObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/stochastic_gradient_descent.jl:202
Possible fix, define
stochastic_gradient_descent(::ManifoldsBase.AbstractManifold, ::O, ::Number) where O<:Union{Manopt.AbstractDecoratedManifoldObjective, Manopt.ManifoldStochasticGradientObjective}
- same as 5
Ambiguity #7
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), M::ManifoldsBase.AbstractManifold, f, grad_f, Hess_f::TH) where TH<:Function @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:451
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), M::ManifoldsBase.AbstractManifold, mho::O, p, X) where O<:Union{AbstractDecoratedManifoldObjective{E, <:AbstractManifoldSubObjective} where E, AbstractManifoldSubObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:534
Possible fix, define
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), ::ManifoldsBase.AbstractManifold, ::O, ::Any, ::TH) where {O<:Union{Manopt.AbstractDecoratedManifoldObjective{E, <:Manopt.AbstractManifoldSubObjective} where E, Manopt.AbstractManifoldSubObjective}, TH<:Function}
- nearly the same as above, just with 2 parameters instead of 1 (f and grad_f, since we also have p and X).
Ambiguity #8
truncated_conjugate_gradient_descent(M::ManifoldsBase.AbstractManifold, f, grad_f, Hess_f::TH, p, X; evaluation, preconditioner, kwargs...) where TH<:Function @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:467
truncated_conjugate_gradient_descent(M::ManifoldsBase.AbstractManifold, F, gradF, x, Y, H::TH; kwargs...) where TH<:Function @ Manopt deprecated.jl:103
Possible fix, define
truncated_conjugate_gradient_descent(::ManifoldsBase.AbstractManifold, ::Any, ::Any, ::TH, ::Any, ::TH) where {TH<:Function, TH<:Function}
- similar to the above, just that here the Hessian is typed; since the second signature is deprecated, this might be resolved in the next breaking change anyway
Ambiguity #9
truncated_conjugate_gradient_descent(M::ManifoldsBase.AbstractManifold, f, grad_f, Hess_f::TH; kwargs...) where TH<:Function @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:451
truncated_conjugate_gradient_descent(M::ManifoldsBase.AbstractManifold, mho::O, p, X; kwargs...) where O<:Union{AbstractDecoratedManifoldObjective{E, <:AbstractManifoldSubObjective} where E, AbstractManifoldSubObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:534
Possible fix, define
truncated_conjugate_gradient_descent(::ManifoldsBase.AbstractManifold, ::O, ::Any, ::TH) where {O<:Union{Manopt.AbstractDecoratedManifoldObjective{E, <:Manopt.AbstractManifoldSubObjective} where E, Manopt.AbstractManifoldSubObjective}, TH<:Function}
- this is maybe a more serious one, since it is an interplay between providing the Hessian and providing the initial point/vector.
Ambiguity #10
truncated_conjugate_gradient_descent(M::ManifoldsBase.AbstractManifold, f, grad_f, Hess_f::TH; kwargs...) where TH<:Function @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:451
truncated_conjugate_gradient_descent(M::ManifoldsBase.AbstractManifold, mho::O, p, X; kwargs...) where O<:Union{AbstractDecoratedManifoldObjective, ManifoldHessianObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:525
Possible fix, define
truncated_conjugate_gradient_descent(::ManifoldsBase.AbstractManifold, ::O, ::Any, ::TH) where {O<:Union{Manopt.AbstractDecoratedManifoldObjective, Manopt.ManifoldHessianObjective}, TH<:Function}
- nearly the same as 9
Ambiguity #11
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), M::ManifoldsBase.AbstractManifold, f, grad_f, Hess_f::TH) where TH<:Function @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:451
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), M::ManifoldsBase.AbstractManifold, mho::O, p, X) where O<:Union{AbstractDecoratedManifoldObjective, ManifoldHessianObjective} @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:525
Possible fix, define
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), ::ManifoldsBase.AbstractManifold, ::O, ::Any, ::TH) where {O<:Union{Manopt.AbstractDecoratedManifoldObjective, Manopt.ManifoldHessianObjective}, TH<:Function}
- nearly the same as 10
Ambiguity #12
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), M::ManifoldsBase.AbstractManifold, f, grad_f, Hess_f::TH, p, X) where TH<:Function @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:467
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), M::ManifoldsBase.AbstractManifold, F, gradF, x, Y, H::TH) where TH<:Function @ Manopt deprecated.jl:103
Possible fix, define
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent), ::ManifoldsBase.AbstractManifold, ::Any, ::Any, ::TH, ::Any, ::TH) where {TH<:Function, TH<:Function}
- also a combination of the previous 3, though, like 9, a case that should not be possible (with 2 Hessians).
Ambiguity #13
truncated_conjugate_gradient_descent!(M::ManifoldsBase.AbstractManifold, f, grad_f, Hess_f::TH, p, X; evaluation, preconditioner, kwargs...) where TH<:Function @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:591
truncated_conjugate_gradient_descent!(M::ManifoldsBase.AbstractManifold, F::TF, gradF::TG, x, Y, H::TH; kwargs...) where {TF<:Function, TG<:Function, TH<:Function} @ Manopt deprecated.jl:103
Possible fix, define
truncated_conjugate_gradient_descent!(::ManifoldsBase.AbstractManifold, ::TF, ::TG, ::TH, ::Any, ::TH) where {TF<:Function, TG<:Function, TH<:Function, TH<:Function}
- the same as 12.
Ambiguity #14
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent!), M::ManifoldsBase.AbstractManifold, f, grad_f, Hess_f::TH, p, X) where TH<:Function @ Manopt ~/Repositories/Julia/Manopt.jl/src/solvers/truncated_conjugate_gradient_descent.jl:591
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent!), M::ManifoldsBase.AbstractManifold, F::TF, gradF::TG, x, Y, H::TH) where {TF<:Function, TG<:Function, TH<:Function} @ Manopt deprecated.jl:103
Possible fix, define
kwcall(::NamedTuple, ::typeof(Manopt.truncated_conjugate_gradient_descent!), ::ManifoldsBase.AbstractManifold, ::TF, ::TG, ::TH, ::Any, ::TH) where {TF<:Function, TG<:Function, TH<:Function, TH<:Function}
- nearly the same as 13.
I thought about this for a while, and it seems most unified to put p as a keyword argument instead.
That way the user can provide the different parts of the objective (f, grad_f, ...), which keeps flexibility and avoids ambiguities even in the two signatures above, where only mdco would be typed.
We do, though, have to decide whether p is a good keyword and whether the in-place variants then still make sense (the solver would then work in place of a keyword argument, but I think that should be fine).
I don't have a strong opinion about it. Your solution looks fine, though mutating a keyword argument may indeed be a bit surprising.
The mutating keyword is the one thing I do not like so much, but I also do not like to move mandatory things like the Hessian to keywords. So I am myself a bit divided here.
I think I have an idea that even makes the objective a bit more flexible and maybe even reduces code duplication.
Similar to the VectorialFunction, where now every constraint has its own range, we could introduce CostFunction, GradientFunction, HessianFunction – or even names that just indicate what the functions map to.
The nice thing is that such a wrapper would be very small and parametrised, and hence add basically no overhead, but it would allow us to distinguish points (still untyped) from Hessians.
In general the user would not have to think about that; we would just have one further dispatch in between,
difference_of_convex_algorithm(M, f, g, ∂h, p=rand(M); kwargs...) = difference_of_convex_algorithm(M, CostFunction(f), CostFunction(g), SubGradientFunction(∂h), p; kwargs...)
wherever possible without introducing ambiguities. Whenever ambiguities occur, we would require the HessianFunction to be specified (instead of the argument being a plain function or untyped).
But there is more. We could even do
struct HessianFunction{F,E<:AbstractEvaluationType}
    f::F
end
and hence have a more fine-granular usage of in-place and allocating functions. The (objective-wide) old evaluation would be the default, but this way you could specify that you have an in-place gradient but an allocating Hessian. Not sure whether that is useful, but it is a bit more flexible.
Code reduction could occur, since the 4 cases (allocating/in-place function × allocating/in-place call) could be handled on the function level (and no longer on the get_gradient-and-so-on level, where they are maybe even repeated for different objectives).
These calls could also just be done as generic functors of the above struct.
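To sketch what such functors could look like – assuming the HessianFunction wrapper from above together with the existing AllocatingEvaluation / InplaceEvaluation types and zero_vector from ManifoldsBase; again just a sketch, not a worked-out proposal:
# outer constructor: default to the (objective-wide) allocating behaviour
HessianFunction(f; evaluation = AllocatingEvaluation()) =
    HessianFunction{typeof(f),typeof(evaluation)}(f)

# allocating Hessian: call the wrapped function directly
(H::HessianFunction{<:Any,AllocatingEvaluation})(M, p, X) = H.f(M, p, X)

# in-place Hessian: allocate the result once, then call the mutating function
function (H::HessianFunction{<:Any,InplaceEvaluation})(M, p, X)
    Y = zero_vector(M, p)
    H.f(M, Y, p, X)
    return Y
end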
I feel this would also be only minimally breaking – but a bit of a rework as well.
What do you think?
Maybe it would be nicer to introduce a two-level scheme here instead of wrappers, for example
difference_of_convex_algorithm(M, f, g, ∂h, p=rand(M); kwargs...) = _difference_of_convex_algorithm(M, f, g, ∂h, p; kwargs...)
There would be even less room for ambiguity and no need for new wrappers.
I am not yet so sure how that helps, but I can think about how that might be meant.
But you might be right that function types are maybe overcomplicating things – the main idea maybe indeed comes from “typing the Hessian” (though not in this signature, but in the others).
If the user doesn't use those wrappers, they don't help with user-facing ambiguities either I think. If there is an ambiguity, the user can already construct the objective themselves and avoid it. That would probably be the easiest way to resolve such ambiguities.
I do not yet see how your dispatch solves the problem of the ambiguity with difference_of_convex_algorithm(M, mdco, p; kwargs...)
In most cases the user would not need the HessianFunction type, just in the few cases where we are currently hit by ambiguity. So yes, in a few cases the user has to specify whether the last parameter is a Hessian (and not a point).
I do not yet see how your dispatch solves the problem of the ambiguity with difference_of_convex_algorithm(M, mdco, p; kwargs...)
I didn't know we had an ambiguity when passing the full objective? When does it occur?
Anyway, I haven't thought about it particularly deeply, but I don't see how introducing a new type really solves any problem that can't be solved with what we have.
So there are two problems: sometimes objective & start point, more often Hessian and start point. But I can check and write them out here (or in the initial post) later (that is, list all ambiguities and why they appear).
I added all cases (14) above and investigated them:
- group 1: the signature solver(M, f, p0::Number) (used to write a wrapper for f) creates an ambiguity with solver(M, objective, p0) – where we also do not want to allow p0 to be a number.
A solution could be to do the “wrap the function if p0 is a number” in an internal call (dispatching there on p0), doing nothing if it is not a number and wrapping f (and maybe grad_f and such) otherwise.
- group 2: the interplay between solver(M, f, grad_f, hess_f, p0) and solver(M, f, grad_f, p0), as well as p0 being a number – but it would also be better to avoid having hess_f restricted to functions.
The number case can probably be resolved with the idea from group 1, but both the Hessian and p being untyped and positional is complicated; one could maybe find a way to write just one function where both positional arguments have defaults (hess to the approximate Hessian, p0 to rand) – I can check whether that works.
Combining this with the wrapper from group 1 might solve this.
- group 1: the signature solver(M, f, p0::Number) (used to write a wrapper for f) creates an ambiguity with solver(M, objective, p0) – where we also do not want to allow p0 to be a number.
A solution could be to do the “wrap the function if p0 is a number” in an internal call (dispatching there on p0), doing nothing if it is not a number and wrapping f (and maybe grad_f and such) otherwise.
Why not wrap the number p0 in solver(M, objective, p0)? Like this: solver(M, objective, p0::Number) = solver(M, objective, [p0]).
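In the minimal reproduction from further up, this is exactly the kind of method the “Possible fix, define …” suggestions propose: it is more specific than both conflicting methods, so it covers their intersection.
# continuing the small solver example: this method is more specific than both
# solver(M, f, p::Number) and solver(M, o::AbstractObjective, p), so the
# previously ambiguous call now resolves here and simply re-wraps the number.
solver(M, o::AbstractObjective, p::Number) = solver(M, o, [p])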
- group 2: the interplay between solver(M, f, grad_f, hess_f, p0) and solver(M, f, grad_f, p0), as well as p0 being a number – but it would also be better to avoid having hess_f restricted to functions.
The number case can probably be resolved with the idea from group 1, but both the Hessian and p being untyped and positional is complicated; one could maybe find a way to write just one function where both positional arguments have defaults (hess to the approximate Hessian, p0 to rand) – I can check whether that works.
Combining this with the wrapper from group 1 might solve this.
It would be easier to restrict hess_f here to ::Function. If someone needs a non-function functor object as a Hessian, they can always construct the objective themselves.
Just fixing p0 is not enough. But I had an idea: we have to_mutating_gradient, which I am currently rewriting to do the dispatch on p (instead of letting the interface dispatch) – looks good for now and seems non-breaking.
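A rough sketch of what that dispatch on p could look like (with a purely hypothetical helper name – the actual to_mutating_gradient may look different):
# sketch: the helper dispatches on p itself, so the high-level signatures stay untyped
_maybe_wrap_gradient(grad_f, p) = grad_f # generic iterate: pass the gradient through
# number iterate: the solver internally works on [p], so wrap the call and its result
_maybe_wrap_gradient(grad_f, p::Number) = (M, x) -> [grad_f(M, x[1])]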
For the Hessian: that sounds like a solution and would not even be breaking, great :) I will add that to the documentation then.