antonior92/ip-nonlinear-solver

Test Results

antonior92 opened this issue · 34 comments

Results of the SQP solver on the test problem ELEC.

This problem is described in [1] (problem 2) and consists of the following: given np electrons, find the equilibrium state distribution (of minimal Coulomb potential) of the electrons positioned on a conducting sphere.

The performance of other commercial solvers (according to [1]):
[screenshot: table from [1] comparing commercial solvers on ELEC]

I chose this problem because the KNITRO package performs well on it, so our solver is expected to perform well on it too, since the underlying principle is the same.

For np = 200 the performance of our implementation is:

time = 9.1 sec
f = 1.84392e+04 
c violation = 1.2e-15
optimality = 9.6e-03
niter = 125

As you can see the execution time is very competitive and the final value of f seems very close to the one obtained by the KNITRO package. The constraint violation also seems pretty ok.

I don't know how the optimality is defined in that table, but I defined it as the norm of the Lagrangian gradient ||grad L(x, lambda)||. I have not been able to reduce this optimality below
9.6e-3; after that the algorithm step gets too small and I cannot make any more progress.
On the table from the paper, KNITRO seems to be able to reduce the optimality down to 3.9e-7,
but again, maybe it is only because of different definitions.
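For reference, this is roughly how I compute that measure (a minimal sketch with illustrative names, not the actual code of the solver); note that switching from the 2-norm to the infinity norm alone can change the reported number noticeably:

import numpy as np

def lagrangian_gradient_norm(grad_f, jac_c, lagr_mult, order=2):
    # Optimality measure ||grad L(x, lambda)|| for an equality-constrained problem:
    #   grad L(x, lambda) = grad f(x) + J_c(x)^T lambda
    # grad_f:    gradient of the objective at x, shape (n,)
    # jac_c:     Jacobian of the equality constraints at x, shape (m, n)
    # lagr_mult: Lagrange multiplier estimates, shape (m,)
    # order:     2 for the Euclidean norm, np.inf for the infinity norm
    #            (the paper may well be reporting the latter).
    grad_lagr = grad_f + jac_c.T.dot(lagr_mult)
    return np.linalg.norm(grad_lagr, order)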

Besides that, the decay of this optimality measure ||grad L(x, lambda)|| seems pretty consistent along the iterations.

[plot: optimality ||grad L(x, lambda)|| vs. iteration for ELEC]

My initial plan to design small tests of my own has not been very successful, mostly because I did not manage to come up with meaningful tests by myself. So I have been focusing on testing problems from the CUTEst collection. Nevertheless, I want to include some unit tests for the method. What do you guys think would be the best option in this case? Implement problems from CUTEst in Python and use them for the unit tests?

[1]    Dolan, Elizabeth D., Jorge J. Moré, and Todd S. Munson. "Benchmarking optimization software with COPS 3.0." Argonne National Laboratory Research Report (2004).

I imagine you want some simpler tests, too. You can get a few of them that are used as examples for other NLP solvers or in textbooks. For example:
https://www.mathworks.com/help/optim/ug/fmincon.html
These simple examples with only some types of constraints could be useful for debugging if a bug is introduced.
But it would also be nice to have some large problems, so maybe implement a few CUTEst problems in Python. The purpose of these would be to show how your solver handles the most general, large-scale problems.

BTW, this is awesome. I am really, really impressed by the performance. You know, your solver wasn't supposed to beat KNITRO in terms of speed, even if KNITRO was doing a lot of unnecessary work reducing the optimality without much progress on the objective.
What's next? unit tests and benchmarking?

Hey Matt, thanks for the suggestions! I will try to take some examples from the MathWorks website and from textbooks.

About the performance, I find it unexpected that my solver beats the results from KNITRO in the paper. But it makes sense: the paper is from 2004, so I probably have better hardware. Besides that, I think KNITRO runs more iterations than I do. Later I will do some comparisons with IPOPT, both running with the same configuration, so we can get more realistic results.

But, nevertheless, it shows that our implementation is reasonably efficient. :)

It is a nice idea. I will try it later, after I have implemented the last bit of the algorithm. For now the algorithm cannot deal with inequality constraints yet, only with problems of the type:

minimize f(x)
subject to: c(x) = 0
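For example, a tiny problem of that type, which could later double as a unit test (just an illustrative toy, not taken from CUTEst):

import numpy as np

# minimize    x1 + x2
# subject to  x1**2 + x2**2 - 2 = 0
# The solution is x* = (-1, -1), with f(x*) = -2 and multiplier lambda* = 0.5.

def fun(x):
    return x[0] + x[1]

def grad(x):
    return np.array([1.0, 1.0])

def constr(x):
    return np.array([x[0]**2 + x[1]**2 - 2.0])

def jac(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]]])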

I made some modifications to the CG tolerance in order to get more accurate solutions. I managed to get the optimality of the solution as low as 2.1e-05, which is much better than the 9.6e-03 I was getting before. I still didn't manage to get it as low as 4.0e-07.

I will try to compare the solution of my solver and KNITRO in other problems to try to find out if this should concern us or not.

I made some modifications to the CG tolerance in order to get more accurate solutions.

  1. I'm quite sure the required tolerance of the SQP solver and the tolerance of the CG solver must be related. The simplest approach is to allow the user to select the tolerance for CG, but I think it should be selected automatically based on the current optimality. My analogy: the Newton-CG method, where the CG tolerance is typically selected as eta * ||grad||. Isn't this discussed in your main source ([1] from the wiki)?

  2. The problem with the optimality: my simple guess is that the "infinity" norm is used in the paper (in my experience it is often chosen for the norm of the gradient). Could you check that as well?

  3. How are the results on other problems? Specifically, you mention 2 problems in the wiki.

  4. As for larger problems to include as unit tests: how about including 1-2 problems from [1]? They seem quite fun; I plan to test your solver on them in the future.

  5. Overall it's great that you are already touching practical non-linear problems, but we must not lose focus (I address this to myself as well). So the plan for this week is welcome.

Regarding my first point: I see that you use some adaptive tolerance in projected_cg, but I can't quickly figure out how it is chosen; it doesn't look like the Newton-CG scheme from Num. Opt.: min(0.5, sqrt(||grad||)) * ||grad||.

My understanding is that with this approach the (global) objective function gradient is c and the tolerance should be chosen based on c.

Can you elaborate?

  1. The basic idea used in the paper is that the optimality at the end of the CG procedure should be a fraction of its initial value. I was using tol = 0.01 * np.sqrt(rt_g), which was suggested in the paper.
    But I realised that for low enough rt_g we have 0.01 * np.sqrt(rt_g) > rt_g, which was causing CG to make no progress at all. I changed the definition to tol = min(0.01 * np.sqrt(rt_g), 0.1 * rt_g), which guarantees that tol < rt_g always holds; see the sketch after this list. This provided some improvement in the solution.
  2. Thanks for the suggestion. I will check if it makes any difference.
  3. I tested the program on two problems, ELEC and CHANNEL. I presented the ELEC results to you above. On CHANNEL the starting point is an optimal point that violates the constraints. The algorithm converges very fast (8 iterations) to a good solution, and I found nothing interesting about the problem or the results (everything seems to be working well on it).
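To make item 1 concrete, this is the rule I am now using for the stopping tolerance of the projected CG iterations (a sketch; rt_g is the quantity discussed above, and the constants are the ones mentioned, not tuned values):

import numpy as np

def projected_cg_tolerance(rt_g):
    # Original rule from the paper: tol = 0.01 * sqrt(rt_g). For small rt_g
    # this exceeds rt_g itself, so CG would stop immediately without progress.
    # Taking the min with 0.1 * rt_g guarantees tol < rt_g, so CG always has
    # room to reduce the residual (compare with the Newton-CG forcing term
    # min(0.5, sqrt(||grad||)) * ||grad|| mentioned above).
    return min(0.01 * np.sqrt(rt_g), 0.1 * rt_g)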

I was having trouble finding optimisation problems with only equality constraints. But just today I found a lot of tests Nocedal himself used to test the Byrd-Omojokun implementation. They are described in [12] (commented references). I will do some tests on them.

  1. Are you referring to COPS? Maybe it is a good option

  2. I want to show some results and share some of my concerns before discussing the plan for this week.

After some modification I am getting:

total time = 7.1 sec
f = 1.843891e+04
optimality = 3.4e-07
c violation = 1.3e-15
niter = 113

which seems pretty reasonable to me. Note that the optimality now is better than the one originally attained by KNITRO on the given table.

I am still tuning some of the internal parameters and I intend to run tests on a more diverse set of problems. Reference [12] just gave me some good ideas.

What is still troubling me is the way it behaves when it gets very near the solution point:
[plot: optimality vs. iteration for ELEC, near the solution]

After some point the algorithm doesn't progress any further because the trust-region radius starts to decay really fast. And no matter how I set the stopping criteria, I can't get any further reduction.

The idea of the algorithm reaching a point where it makes no progress at all is really troublesome for me, because it seems to contradict the superlinear convergence of the algorithm near the solution. I think it is happening just because of large roundoff errors, but any thoughts on this would be much appreciated.


My schedule for this week:

  • Today and tomorrow: implement unit tests for the SQP method, do more tests from CUTEst (the same ones Nocedal used in [12]), and do some fine-tuning of the parameters.
  • Thursday and Friday: write an implementation plan for the last part of the implementation (the sequence of barrier problems) and implement a first draft of it.
  • Next week: run tests on the implementation and tune the parameters.

As I described in my GSoC proposal, I will attend the IFAC World Congress from 9-14 July (two weeks from now). So my plan is to have a first version of the implementation of the final algorithm before that.

Hey Matt, can you try to send the problems again (probably in another way)? They do not appear for me.

Oops, I was sending attachments from my phone. Amateur.

The numerical differentiation problem referred to above:
[image attachment]

The binary integer LP from above:
[image attachment]

If you want to go pro, add the constraint Ax = b from:

import numpy as np

def magic_square(n):
    np.random.seed(0)
    M = n*(n**2+1)/2
    
    numbers = np.arange(n**4) // n**2 + 1
    
    numbers = numbers.reshape(n**2,n,n)
    
    zeros = np.zeros((n**2,n,n))
    
    A_list = []
    b_list = []
    
    # Rule 1: use every number exactly once
    for i in range(n**2):
        A_row = zeros.copy()
        A_row[i,:,:] = 1
        A_list.append(A_row.flatten())
        b_list.append(1)
        
    # Rule 2: Only one number per square
    for i in range(n):
        for j in range(n):
            A_row = zeros.copy()
            A_row[:,i,j] = 1
            A_list.append(A_row.flatten())
            b_list.append(1)
        
    # Rule 3: sum of rows is M
    for i in range(n):
        A_row = zeros.copy()
        A_row[:,i,:] = numbers[:,i,:]
        A_list.append(A_row.flatten())
        b_list.append(M)
        
    # Rule 4: sum of columns is M
    for i in range(n):
        A_row = zeros.copy()
        A_row[:,:,i] = numbers[:,:,i]
        A_list.append(A_row.flatten())
        b_list.append(M)
        
    # Rule 5: sum of diagonals is M
    A_row = zeros.copy()
    A_row[:,range(n),range(n)] = numbers[:,range(n),range(n)]
    A_list.append(A_row.flatten())
    b_list.append(M)
    A_row = zeros.copy()
    A_row[:,range(n),range(-1,-n-1,-1)] = numbers[:,range(n),range(-1,-n-1,-1)]
    A_list.append(A_row.flatten())
    b_list.append(M)
    
        
    A = np.array(np.vstack(A_list),dtype = float)
    b = np.array(b_list,dtype= float)
    #c = np.zeros(A.shape[1],dtype= float)
    c = np.random.rand(A.shape[1])

    return A,b,c, numbers
    
n = 3
A, b, c, numbers = magic_square(n)  # call the function defined above

#[Your solver here. Be sure to add all the binary constraints xi (xi-1) = 0]
#sol = bintprog(c,A_eq=A, b_eq = b, options={"disp":True})

x = sol['x'].astype('int') # you might need to modify this; the point is x is binary
s = (numbers.flatten()*x).reshape(n**2,n,n)
square  = np.sum(s,axis=0)
print(square)

Thanks a lot, Matt. I will try it out.

  1. I checked the papers and it seems that they advocate for this strategy. Ref. [12] gives some explanation on p. 690 for that. Honestly I'm still quite confused about it. Good catch that it had a problem in the original form.
  2. I meant COPS.

How about plotting the function residuals (relative to the value given in the paper)? Maybe plot the constraint violations as well. I mean, maybe they look better (unlikely, as you explained about the really small trust-region radius).
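For instance, something along these lines (just a sketch; f_values, c_violations and f_ref stand for the per-iteration histories and the reference value from the paper):

import numpy as np
import matplotlib.pyplot as plt

def plot_histories(f_values, c_violations, f_ref):
    # Plot |f_k - f_ref| and the constraint violation per iteration on a log scale.
    f_residuals = np.abs(np.asarray(f_values) - f_ref)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.semilogy(f_residuals)
    ax1.set_xlabel("iteration")
    ax1.set_ylabel("|f_k - f_ref|")
    ax2.semilogy(c_violations)
    ax2.set_xlabel("iteration")
    ax2.set_ylabel("constraint violation")
    fig.tight_layout()
    plt.show()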


Can you share the setup for ELEC problem to experiment with your code? I want to understand better what is going on when the progress stops.

  1. I think both formulations have similar purposes. I just think it is better to stick with what is proposed in the paper, since it has already been tested.
  2. I am thinking of implementing problem (2) as a unit test.

I have tried plotting the constraint violation and the objective function along the iterations. They don't add much information to the picture.

The problem is that it reaches a point where the step length goes to 0 very quickly and it becomes impossible to get any further improvement. Maybe it is only because of roundoff errors, but nevertheless it makes me uneasy.

I think the best approach now is to test the solver on lots of problems and see how it performs.


The code for ELEC and CHANNEL:
https://gist.github.com/antonior92/7d3355bcc1c5cc13eff5dbfa86b25a67

I tested the solver on several problems from CUTEst. Including the ones I have already mentioned, the solver has now been tested on:

  • ELEC.
  • CHANNEL.
  • ORTHREGA.
  • ORTHREGC.
  • ORTHREGD.
  • HAGER2.
  • HAGER3.
  • DTOC1ND.
  • DTOC2.
  • DTOC3.
  • DTOC4.
  • DTOC5.
  • DTOC6.
  • EIGENA2.
  • EIGENC2.
  • ARTIF.
  • BRATU3D.

I tried out these problems with a few different parameter settings and the solver converges to the solution in most cases. I found two problems where it fails to converge: ORTHREGD (only when you set NPTS=2500) and DTOC1ND.

https://gist.github.com/antonior92/3c19e4cf3002f56d0c7cc83c81156190

I will present the results in a more organised way later. For now, I intend to try to understand why the solver is failing on these two problems.

I think my concerns about the convergence on ELEC were unfounded, since the solver is converging well enough on most of the problems.

So the plan for today:
First I will try to understand why the solver is failing on ORTHREGD and on DTOC1ND and see what I can do about it. Later I will implement a few unit tests (the problems Matt suggested and maybe problem 2 from COPS)...

Update: problems that do not converge with the default initial settings may converge with other parameters. For instance:

  • ORTHREGD (for NPTS=2500 and NPTS=5000) fails to converge with the default initial settings, but converges with a different initial trust radius and initial penalty parameter.
  • ORTHREGD (for any other NPTS) always converges.
  • DTOC1ND fails to converge with the default initial settings, but converges with a different initial trust radius and initial penalty parameter.

Hey Matt, the largest possible setting is actually npts=5000. I edited the comment right after posting it, but you probably only got the first version via email.

Furthermore, I am interested in the problems you sent me. I will try to implement them.

Hey Matt, this is actually my fault. I always end up editing my comments afterwards. If you take a look at this issue I have edited 9 of my 13 comments :)

It would be nice to voice chat this week. For me it could be Thursday (tomorrow) or Friday, some time between 11:00 and 15:00 UTC or between 17:00 and 21:00 UTC. What do you think, Matt?
And what about you Nikolay?

It is good for me.

Hi Antonio,

I was trying to run the unit tests this morning. It seemed that the top-level ipsolver was not in a working state (which is fine, as it's under development). For instance, the unit test only provided 8 of the 14 required inputs (you must have added the separate bounds and linear constraints recently without default values); there was a typo where nslack should have been n_slack; ipsolver invokes a method update that BarrierSubproblem doesn't have; etc. No big deal. Anyway, just let me know when it's in good shape to try again.

I also tried the EQP solver and found that while ProblemELEC seems to work, ProblemMaratos doesn't. Should it? I didn't look too deeply, but it looks like it's maxing out on iterations and optimality is at 3.0 (rather than < 1e-5).

Hey Matt,
The unit tests were working for me on commit "Simple test for ipsolver (together with bug corrections)." (from yesterday). After that I changed the interface, breaking a lot of things. I am working on it right now;
sorry about that.

No problem. And I found the issue with EQP. Degrees is specified as an integer, so integer/180 was 0 for me in Python 2. You should include the usual from __future__ imports at the top.
(never mind the bit about v0 from original post)
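For example (division is the one that actually matters here):

from __future__ import division, print_function, absolute_import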

I was trying to use the EQP solver for minimization of the Rosenbrock function (unconstrained), but the EQP solver seems to need constraints. I'm sure you'll get to that at some point; just thought I'd record it here.

Hey Matt, there are some rough edges on those functions. I don't check arguments because they are internal functions and no user will actually have to access them.

I have been running the unit tests mostly on Python 3; that is why I missed the division issue you just pointed out.

I defined defaults for the function ipsolver and all the unit tests seem to be working on the latest commit.

The solver right now requires at least one constraint. I haven't decided how to proceed on this yet.
I can make modifications to allow unconstrained problems as well (it shouldn't be very hard), or I can just raise an error telling the user to try an unconstrained solver.
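If I go with the error route, it could be something as simple as this (just a sketch; the actual argument handling in ipsolver may differ):

def check_constraints(constraints):
    # Hypothetical guard for ipsolver: fail early with a clear message instead
    # of breaking somewhere deep inside the algorithm when no constraints are given.
    if not constraints:
        raise ValueError("ipsolver requires at least one constraint; for "
                         "unconstrained problems use an unconstrained solver "
                         "such as scipy.optimize.minimize.")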