Constrained least-squares/LDP (factored QP)
I am currently experimenting with a FOSS SQP method, SLSQP, trying to replace the old numerically unstable Lawson-Hanson LSEI solver with a more modern, more reliable QP solver. See: stevengj/nlopt#86 After some mixed experiences with OSQP (which does not have the numerical difficulties Lawson-Hanson has, but has others), I am now trying DAQP.
SLSQP maintains the Hessian approximation for the QPs in LDL-factored form. It does all the updates on the factored matrices, so H is never multiplied out explicitly. Therefore the QPs are in least-squares form (hence the name Sequential Least-SQuares Programming):
/* MINIMIZE with respect to X */
/*     ||E*X - F|| */
/* WITH UPPER TRIANGULAR MATRIX E = +D^(1/2) * L^T, */
/* AND VECTOR F = -D^(-1/2) * L^(-1) * G, */
/* WHERE THE UNIT LOWER TRIANGULAR MATRIX L IS STORED COLUMNWISE */
/* DENSE IN THE N*(N+1)/2 ARRAY L WITH VECTOR D STORED IN ITS */
/* 'DIAGONAL' THUS SUBSTITUTING THE ONE-ELEMENTS OF L */
/* SUBJECT TO */
/*     A(J)*X - B(J) = 0,  J=1,...,MEQ, */
/*     A(J)*X - B(J) >= 0, J=MEQ+1,...,M, */
/*     XL(I) <= X(I) <= XU(I), I=1,...,N, */
Compared to the standard QP formulation, this is factored: minimizing ||E*X - F||² = (E*X - F)^T * (E*X - F) is obviously the same problem, and it expands to a quadratic in X.
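For concreteness, here is roughly what that packed storage means in C. This is only a sketch I wrote for this issue (0-based indexing, column-major packing of the lower triangle as described above, helper name is mine), not the actual SLSQP code:

#include <math.h>
#include <string.h>

/* Sketch only: expand the packed L/D storage (unit lower triangular L stored
 * columnwise, with D in the diagonal slots) into the dense upper triangular
 * E = D^(1/2) * L^T (row-major) and the vector F = -D^(-1/2) * L^(-1) * G. */
static void packed_ld_to_ef(int n, const double *ld /* length n*(n+1)/2 */,
                            const double *g, double *E /* n*n, row-major */,
                            double *F /* length n */)
{
    memset(E, 0, (size_t)n * n * sizeof(double));
    /* E[i][j] = sqrt(D[i]) * L[j][i] for j >= i */
    int col = 0;                            /* start of column j in the packed array */
    for (int j = 0; j < n; j++) {
        double sd = sqrt(ld[col]);          /* D[j] sits in the diagonal slot */
        E[j * n + j] = sd;
        for (int i = j + 1; i < n; i++)     /* L[i][j] for i > j */
            E[j * n + i] = sd * ld[col + (i - j)];
        col += n - j;
    }
    /* F = -D^(-1/2) * (L^(-1) * G): forward substitution with unit L ... */
    for (int i = 0; i < n; i++) {
        double s = g[i];
        int cs = 0;
        for (int j = 0; j < i; j++) {       /* subtract L[i][j] * y[j] */
            s -= ld[cs + (i - j)] * F[j];
            cs += n - j;
        }
        F[i] = s;
    }
    /* ... then scale each component by -1/sqrt(D[i]) */
    for (int i = 0, cs = 0; i < n; cs += n - i, i++)
        F[i] = -F[i] / sqrt(ld[cs]);
}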
The question is now how to best get this into DAQP.
The documentation of OSQP recommends a variable transformation Y = E*X - F, which brings the problem into the form:
/* MINIMIZE with respect to X and Y */
/*     1/2 * Y^T * Y */
/* SUBJECT TO */
/*     XL(I) <= X(I) <= XU(I), I=1,...,N, */
/*     F(I) <= (E*X)(I) - Y(I) <= F(I), I=1,...,N, */
/*     B(J) <= A(J)*X <= B(J), J=1,...,MEQ, */
/*     B(J) <= A(J)*X <= inf, J=MEQ+1,...,M, */
The problem with that formulation is that the Hessian is highly singular: it has a zero block (the coefficients of X) and an identity block (the coefficients of Y). As a result, DAQP by default rejects the problem as nonconvex (because it is not strictly convex). Enabling proximal iteration fixes that, but (unless I messed up the QP formulation somehow) the proximal iteration does not seem to work reliably: when I tried it on master, it worked for the simple NLopt tutorial test case, but not for the application on which I am using SLSQP (there, DAQP would falsely claim optimality of all the QPs at the all-zero starting point, and as a result SLSQP would also falsely claim optimality of the starting point). When I tried it on dev, even the NLopt tutorial test case no longer worked: DAQP simply ran out of iterations no matter how high I set the maximum iteration limit, probably getting stuck in some cycle or zigzagging. The proximal iteration has some special-casing for LPs (all-zero Hessian matrices); I wonder whether something similar would also be needed for a blockwise-zero Hessian.
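For completeness, here is a minimal sketch (my illustration, not the actual NLopt patch) of the two DAQP-specific ingredients just discussed: the block-diagonal lifted Hessian, and enabling the proximal-point iterations, which I believe is done through the eps_prox setting (treat that field name as my assumption and check the DAQP headers):

#include <string.h>
#include <daqp/api.h>

/* Sketch: build the (2n)x(2n) lifted Hessian for (X,Y) and turn on DAQP's
 * proximal-point iterations so the merely positive SEMIdefinite Hessian is
 * accepted. */
static void setup_lifted(int n, double *Hlift /* (2n)*(2n), row-major */,
                         DAQPSettings *settings)
{
    memset(Hlift, 0, (size_t)(2 * n) * (size_t)(2 * n) * sizeof(double));
    for (int i = 0; i < n; i++)                       /* identity block for Y */
        Hlift[(size_t)(n + i) * (2 * n) + (n + i)] = 1.0;

    daqp_default_settings(settings);
    settings->eps_prox = 1e-6;  /* assumption: any nonzero value enables the proximal-point iterations */
}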
Another way would be to expand the quadratic and minimize X^T * (E^T*E) * X - 2 * (E^T*F)^T * X, or equivalently 1/2 * X^T * (E^T*E) * X - (E^T*F)^T * X, instead. (Obviously, the constant term F^T * F can be ignored.) Note that E^T*E = L*D^(1/2)*D^(1/2)*L^T = L*D*L^T = H and E^T*F = L*D^(1/2) * (-D^(-1/2)*L^(-1)*G) = -G, so this is ultimately just the standard SQP subproblem objective 1/2 * X^T * H * X + G^T * X. Is that the right way? I am somewhat hesitant to do that because it multiplies out an already LDL-factored matrix just to LDL-factor it again, which I fear is suboptimal both for speed and for numerical stability. But I may be fearing too much.
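If expanding turns out to be the way to go, the multiplication itself is at least straightforward. Here is a sketch (again my own helper, not SLSQP or DAQP code) that forms the dense H = L*D*L^T from the packed storage, which, as far as I understand, is what the unpatched daqp_quadprog expects as its quadratic term:

/* Sketch: multiply the packed L/D storage back out into a dense n*n
 * Hessian H = L*D*L^T (row-major). */
static double unit_lower(const double *ld, int n, int i, int k)
{
    if (i == k) return 1.0;                        /* unit diagonal of L */
    if (i < k)  return 0.0;                        /* upper part of L is zero */
    return ld[k * n - k * (k - 1) / 2 + (i - k)];  /* L[i][k], packed columnwise */
}

static void multiply_out_hessian(int n, const double *ld, double *H)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            int kmax = i < j ? i : j;              /* L[i][k], L[j][k] vanish for k > min(i,j) */
            for (int k = 0; k <= kmax; k++) {
                double dk = ld[k * n - k * (k - 1) / 2];  /* D[k] in the diagonal slot */
                s += unit_lower(ld, n, i, k) * dk * unit_lower(ld, n, j, k);
            }
            H[i * n + j] = s;
        }
}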
Finally, it might be possible to modify the DAQP algorithm to work directly on the factored form, but I do not know how that would best be done. (There are the Lawson-Hanson transformations from the above "LSEI" form to "LSI" form with only inequalities and then to unconstrained LDP form, which are implemented in the original SLSQP code, but those transformations might already be numerically problematic. Their transformation from LDP to NNLS definitely is problematic. And DAQP also has no documented API to call daqp_ldp directly.)
What would you recommend? (Or would you recommend trying another solver altogether?)
Sounds like an interesting and worthwhile endeavour!
Some initial thoughts from my side:
- DAQP being a dual active-set solver inherently requires the problem to be nonsingular, so I would mainly recommend DAQP as an inner solver if the subproblems can be made nonsingular. (The DAQP + proximal-point iteration code has not been stress-tested as much, so I cannot confidently recommend that.)
- Internally DAQP always transforms the problem into an LDP, which is then solved by daqp_ldp. This function is not documented since the focus of the solver has been on QPs (hence the internal function has been abstracted away from the user). If possible, I believe providing DAQP with an LDP would be the best case. Is E nonsingular? If I interpret the comments correctly, E is upper triangular, so if it is nonsingular it is basically a Cholesky factor of the quadratic term? If yes, that would be very nice, since this is exactly how DAQP decomposes nominal QPs (so E and F have corresponding representations internally in DAQP, and it would be fairly straightforward to bypass DAQP's own preprocessing and provide E and F directly).
Let me know if I can be of any assistance. I would gladly help out to see if DAQP could be a good fit for this project.
Yes, the matrix E is supposed to be nonsingular. The Lawson-Hanson code will also return an error code if E is singular.
What makes this not a standard-form LDP, but an intermediate form between QP and LDP, is that this has general linear constraints (and general bound constraints, which are of course a special case of general linear constraints):
/*     A(J)*X - B(J) = 0,  J=1,...,MEQ, */
/*     A(J)*X - B(J) >= 0, J=MEQ+1,...,M, */
/*     XL(I) <= X(I) <= XU(I), I=1,...,N, */
Great! This is exactly the form that DAQP solves. Given a QP, DAQP first computes E (or rather its inverse) and F, and then transforms the problem into an LDP (with general linear constraints). So to use DAQP in this context, one can skip the computation of E and F and provide them directly from SLSQP. The inverse of E corresponds to "Rinv" in DAQPWorkspace, and F corresponds to "v" in DAQPWorkspace.
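For reference, the transformation I have in mind is the standard one (I am sketching it from memory here, so the sign conventions may differ slightly from the code): write H = R^T*R with R upper triangular, set v = R^(-T)*f, and substitute u = R*x + v. Then

    1/2*x^T*H*x + f^T*x = 1/2*u^T*u - 1/2*v^T*v,
    bl <= A*x <= bu  becomes  bl + M*v <= M*u <= bu + M*v,  with M = A*R^(-1),

so DAQP ends up minimizing ||u|| subject to the transformed linear constraints. In SLSQP's notation, R plays the role of E, and F and v agree up to sign.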
I have actually been wanting to create an SQP solver which is based on DAQP for quite some time. And basing it on SLSQP might be a perfect fit.
Just a status update: I am probably going to submit a pull request this week after some more testing.
Experiences with use in SLSQP are mixed so far: DAQP is more reliable than the Lawson-Hanson LSEI code on huge gradients, but there appear to be numerical issues closer to the optimum (where presumably the gradients are small), unless there is yet another subtle bug that I missed in my debugging.
Thank you for the update! If you have an example of a subproblem where DAQP struggles, I would gladly look into it to see if it is an inherent limitation of DAQP, or if it could be addressed somehow.
I still need to do the testing on the SLSQP part, but I decided to submit the DAQP parts anyway, since those should (hopefully) be good.
I have submitted pull request #59 for this issue (note that, in DAQP nomenclature, it takes R and f, not R and v, because doing it this way required fewer changes to DAQP and because SLSQP actually has f: it computes v from f before passing it to the Lawson-Hanson LSEI, so I simply skip that transformation when using DAQP), and pull request #60 for a weird symbol conflict I ran into while testing.
If you have an example of a subproblem where DAQP struggles, I would gladly look into it to see if it is an inherent limitation of DAQP, or if it could be addressed somehow.
Here is one (in the format of pull request #59; if you want to test it without the patch, you will have to multiply out H and let the code factor it):
#include <math.h>
#include <stdio.h>
#include <daqp/api.h>

int main(void) {
    /* With the PR #59 patch, H is the packed triangular factor rather than the dense Hessian. */
    double H[] = {8.6601894272028339, 2.7340755768569065, -0.98534345081492647, -0.6742958073717773, -0.18258713819462913, 0.0064182756004133173, 14.33042393821845, 0.33744656073607843, 3.6036388229742022, -0.77433821334872044, -0.000504709321387587, 3.9662418461659756, -3.559740972697504, 0.25917214486773693, -0.01722821836876812, 1.4249930789298744, 0.054081052944118524, 0.019102733415228574, 1.0052684880854008, -0.0011852469545498175, 0.028803262571862234};
    double f[] = {0, 0, 0, 0, 0, 1};
    double A[] = {-0.42239611604025445, 0, 1, 0, 0, 0, -0.025137928869806939, -0.21119805802012723, 0, 1, 0, 0, 0, -0.050275857739613877, 0, 0, 1, 0, -637.92902271930586, -433.54186299742616, 1443.4903635651949, 1121.9814973436191, 3910.0593023430547, -41.21744208782183, 637.92902271930586, 433.54186299742616, -1443.4903635651949, -1121.9814973436191, -3910.0593023430547, 41.21744208782183};
    /* First 6 entries of the bounds are the simple bounds on x, the rest belong to the general constraints. */
    double bupper[] = {1.7888019419798729, 1.974862071130193, 3.9553953802885271, 3.9946909182400465, 3.9993680845321498, 2858.0093143568529, 1.8735013540549517e-16, -2.0816681711721685e-15, 1.3125026239457771e-14, INFINITY, INFINITY};
    double blower[] = {-1.2111980580201271, -1.025137928869807, -2.0446046197114729, -2.0053090817599535, -2.0006319154678502, -841.99068564314689, 1.8735013540549517e-16, -2.0816681711721685e-15, 1.3125026239457771e-14, -3255.5096680789361, 5.95719029661268e-11};
    int sense[] = {0, 0, 0, 0, 0, 0, 5, 5, 5, 0, 0}; /* 5 marks equality constraints */
    DAQPProblem qp = {6, 11, 6, H, f, A, bupper, blower, sense, 1}; /* n=6, m=11 constraints, ms=6 simple bounds */
    double rx[6], rlam[11];
    DAQPResult result;
    result.x = rx;     /* primal variable */
    result.lam = rlam; /* dual variable */
    DAQPSettings settings;
    /* Define solver settings */
    daqp_default_settings(&settings); /* Populate settings with default values */
    settings.progress_tol = sqrt(2.2e-16);
    settings.primal_tol = 16. * 2.2e-16;
    settings.dual_tol = 16. * 2.2e-16;
    settings.iter_limit = 65536;
    /* Solve the problem */
    daqp_quadprog(&result, &qp, &settings);
    /* Output the result */
    printf("exitflag = %d\n", result.exitflag);
    for (int i = 0; i < 6; i++) {
        printf("x[%d] = %lg\n", i, rx[i]);
    }
    for (int i = 0; i < 11; i++) {
        printf("lam[%d] = %lg\n", i, rlam[i]);
    }
    return 0;
}
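(For reference, I compile the reproducer with something like cc repro.c -I$prefix/include -L$prefix/lib -ldaqp -lm, where $prefix is the DAQP install prefix; adjust paths as needed.)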
DAQP finds a solution with x[5] = 3.20695e-12. (Tweaking the tolerances does not help: I have tried the settings above, the unmodified default settings, and tighter tolerances.) This is not a descent direction in SLSQP. I believe from the run with Lawson-Hanson LSEI that the correct x[5] should be around -1.22079e-10 (though I would have to run LSEI with the exact same parameters to be sure, because the LSEI run obviously also differs by several ulps in the previous steps).
The problem appears to be cancellation in u - v in ldp2qp_solution: u[5] = 34.718289204393734 and v[5] = 34.718289204393642, so the difference is tiny and was destroyed by preceding roundoff.
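To put numbers on that claim, a tiny standalone check (using the u[5]/v[5] values from above):

#include <float.h>
#include <stdio.h>

/* The computed difference u[5]-v[5] is only about a dozen ulps of the
 * operands, so it consists almost entirely of roundoff accumulated in the
 * preceding computations. */
int main(void) {
    double u5 = 34.718289204393734;
    double v5 = 34.718289204393642;
    printf("u5 - v5   = %g\n", u5 - v5);           /* ~9.2e-14 */
    printf("ulp scale = %g\n", DBL_EPSILON * u5);  /* ~7.7e-15 */
    return 0;
}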
Thank you! I will have a look at it. Those values seem very small, so I assume that the outer iterate is close to the optimum and that the step is essentially 0? From my experience, DAQP can have an error on the order of ~1e-10 to 1e-12 for problems that are not well-conditioned, which is around the magnitude of your desired solution here. I don't know if it is possible to "normalize" somehow to avoid small search directions?
I could add some kind of similar normalization internally in DAQP, but maybe the core problem is in how the problem is posed when computing step directions close to the optimum?
Those values seem very small, so I assume that the outer iterate is close to the optimum and that the step is essentially 0?
Yes, when SLSQP with DAQP breaks down, it is always while closing in on the optimum.
If you want to take a look, I have pushed my version of the SLSQP code to https://github.com/kkofler/nlopt/tree/slsqp-daqp; compile with -DNLOPT_DAQP=ON -Ddaqp_DIR=$prefix/cmake, where $prefix is the DAQP CMAKE_INSTALL_PREFIX.