Magic constants in the initialization

Question

Magic constants in the initialization

Opened this issue 10 years ago · 10 comments

I recognize that Vandenberghe suggests a similar (1+alpha) shift in the initialization when alpha > 0, but the '1' is a magic number that, it seems to me, should scale with an appropriate norm of the vector. I haven't run experiments yet, but I am currently looking into replacing the one with gamma*||z||_max, with sqrt(epsilon) seeming to be a decent default value for gamma.

Answer 1 · 2015-01-02T06:25:43.000Z

Jack: I agree that the one is arbitrary. However, my intuition is that the
constant sqrt(epsilon) will leave the iterate close to the boundary and
affect the scaling of the Hessians. Also, Ideally the initial points have
to be well centered so trying to satisfy s+ mu g(z) = 0. Where g Is the
gradient of the barrier and mu =s'z/nu as best as possible could give us an
idea of what to do.

On Wednesday, December 31, 2014, Jack Poulson notifications@github.com
wrote:

I recognize that Vandenberghe suggests a similar (1+alpha) shift in the
initialization when alpha > 0, but the '1' is a magic number that, it seems
to me, should scale with an appropriate norm of the vector. I haven't run
experiments yet, but I am currently looking into replacing the one with
gamma*||z||_max, with sqrt(epsilon) seeming to be a decent default value
for gamma.

—
Reply to this email directly or view it on GitHub
#100.

Santiago Akle
+1 650 919 4833
tiagoakle@gmail.com

Answer 2 · 2015-01-02T06:29:11.000Z

Why not perform an entrywise clip instead of shifting the entire vector?

Answer 3 · 2015-01-02T16:36:47.000Z

An entry wise clipping would not guarantee that the initial vector is in the appropriate cone, at least not for second-order cones. Why do you think the one is a bad choice? Did you observe e.g. fewer iterations when choosing a constant different from one?

Answer 4 · 2015-01-02T17:36:46.000Z

I apologize for being imprecise in my language, as I have been working on a QP solver. The appropriate generalization of my suggestion is to clip each Jordan algebra atom rather than to clilp each entry (and therefore ensuring membership in the product cone). I prefer such a device since it is a projection and does not lead to a large l2 deviation from the original proposal when a single entry is outside of the cone.

Answer 5 · 2015-01-02T17:44:43.000Z

Ah, I see. I think the issue is that merely projecting on the cone brings the initial iterate onto its boundary, from where progress in IPMs is usually slow - it's usually better to have the initial iterates a) well centered and b) away from the boundary, at the (potential) cost of violating he equality Gx+s=h.

Answer 6 · 2015-01-02T17:48:18.000Z

I actually agree, but the current scheme does not guarantee this; if every atom is greater than 1e-8 e (with the appropriate identity element, e), then no shift is performed. An atomic clip up to 1e-8 e would have the same effect; I am suggesting to allow for atomic clipping to an arbitrary value (perhaps larger than 1e-8 e) rather than using the 1+alpha shift, which effectively has a phase transition from clipping from 1e-8 e up to 1.

Answer 7 · 2015-01-02T18:27:33.000Z

Jack I'm sorry I don't think I understand what you mean by clipping.
Are you thinking about an operation like "x = x > 10 ? 10 : x".

I would argue that the behavior of the current scheme when the
entries of order 1-8e are left where they are can be damaging and
venture the following hypothesis: we do not see that case happen often
in the test cases and therefore do not see its damage. Imagine random test
cases with a spherically symmetric distribution and LPs with m dimensional
positive orthants.
We would see initial points in the cone once every 2^{-2m} times!

On Fri, Jan 2, 2015 at 9:52 AM, Santiago Akle tiagoakle@gmail.com wrote:

Jack: I agree that the one is arbitrary. However, my intuition is that the
constant sqrt(epsilon) will leave the iterate close to the boundary and
affect the scaling of the Hessians. Also, Ideally the initial points have
to be well centered so trying to satisfy s+ mu g(z) = 0. Where g Is the
gradient of the barrier and mu =s'z/nu as best as possible could give us an
idea of what to do.

On Wednesday, December 31, 2014, Jack Poulson notifications@github.com
wrote:

I recognize that Vandenberghe suggests a similar (1+alpha) shift in the
initialization when alpha > 0, but the '1' is a magic number that, it seems
to me, should scale with an appropriate norm of the vector. I haven't run
experiments yet, but I am currently looking into replacing the one with
gamma*||z||_max, with sqrt(epsilon) seeming to be a decent default value
for gamma.

—
Reply to this email directly or view it on GitHub
#100.

Santiago Akle
+1 650 919 4833
tiagoakle@gmail.com

Santiago Akle
+1 650 919 4833
tiagoakle@gmail.com

Answer 8 · 2015-01-02T18:59:42.000Z

@tiagoakle I am referring to the Euclidean projection of each Jordan algebra member onto the set of members with eigenvalues greater than or equal to a particular value. For the positive orthant, where the Jordan algebra is simply real numbers, the (lower) clip would be of the form "x >= gamma ? x : gamma". For the second-order and semi-definite cones, a similar map would be applied to each eigenvalue.

As for the current scheme only breaking rarely under your proposed distribution: why even let it be a possibility?

EDIT: The set of members with eigenvalues greater than or equal to a particular value is not generally a cone, as positive scaling less than 1 could bring a member outside of the set.

Answer 9 · 2015-01-02T19:30:00.000Z

@poulson I agree the current scheme seems failed.

For the scheme you suggest it seems to
me that we want a relatively large value of gamma for the clipping.
Something like sqrt(epsilon)||z||_max could end up with entries in the order
of 1/sqrt(epsilon)||z||_max in the scaling matrix.

Answer 10 · 2015-01-02T20:31:56.000Z

@tiagoakle I agree that it would be worthwhile to support a larger clip value; perhaps eps^(0.25)*|| z || for some norm.