coin-or/Ipopt

Ipopt giving nan for objective function

bharswami opened this issue · 39 comments

I am trying to interface SPRAL with IPOPT in Matlab.
In debug mode, I found that the objective function and the primal constraints become NaN. This is presumably because the variable "x" itself becomes NaN. What could be the reason?
The optimization problem:
Minimize sin(x)^2
Constraints: -4<=x<=8
Subject to: 1<=x<=10

To give more perspective.
The error occurs in IpPDSearchDirCalc.cpp at
SmartPtr<IteratesVector> rhs = IpData().curr()->MakeNewContainer();
rhs->Set_x(*IpCq().curr_grad_lag_with_damping_x());
The value being set is nan.

This is because mu in IpAdaptiveMuUpdate.cpp is being set as nan at:
IpData().Set_mu(mu);

mu becomes nan due to the following sequence of function calls:

  • ipopt_libs.mexw64!Ipopt::IpoptCalculatedQuantities::curr_avrg_compl() Line 3594
    Here z_L and z_U are zero, hence the result becomes zero (see the next item).
  • ipopt_libs.mexw64!Ipopt::QualityFunctionMuOracle::CalculateMu(double mu_min, double mu_max, double & new_mu) Line 223
    avrg_compl becomes zero here, so there is a division by zero:
    Number sigma_lo = Max(sigma_min_, mu_min / avrg_compl);
    Number sigma_up = Min(Max(sigma_lo, sigma_1minus), mu_max / avrg_compl);

    Hence sigma is infinity while avrg_compl is zero.
  • Thus Line 429, Number mu = sigma * avrg_compl;, evaluates to inf * 0 = NaN. Call stack:
    ipopt_libs.mexw64!Ipopt::AdaptiveMuUpdate::UpdateBarrierParameter() Line 402
    ipopt_libs.mexw64!Ipopt::IpoptAlgorithm::UpdateBarrierParameter() Line 584
    ipopt_libs.mexw64!Ipopt::IpoptAlgorithm::Optimize(bool isResto) Line 368
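
In IEEE-754 arithmetic this sequence indeed produces NaN. A minimal self-contained sketch (the numeric values here are made up; only the structure of the computation matters):

#include <algorithm>
#include <cstdio>

int main()
{
   double avrg_compl = 0.0;   // complementarity: zero at the boundary
   double mu_min = 1e-9, sigma_min = 1e-2;
   // mu_min / 0.0 evaluates to +inf, so sigma becomes +inf ...
   double sigma = std::max(sigma_min, mu_min / avrg_compl);
   // ... and inf * 0 is NaN by the IEEE-754 rules
   double mu = sigma * avrg_compl;
   std::printf("sigma = %g, mu = %g\n", sigma, mu);   // sigma = inf, mu = nan
}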

I am not sure if this is a bug since z_L and z_U are set to zero at lines 794 to 811 in file IpIpoptAlg.cpp.

SmartPtr<Vector> y_c = IpData().curr()->y_c()->MakeNew();
SmartPtr<Vector> y_d = IpData().curr()->y_d()->MakeNew();
bool retval = eq_multiplier_calculator_->CalculateMultipliers(*y_c, *y_d);
if( retval )
{
   SmartPtr<const IteratesVector> curr = IpData().curr();
   SmartPtr<IteratesVector> iterates = curr->MakeNewContainer();
   iterates->Set_x(*curr->x());
   iterates->Set_s(*curr->s());
   iterates->Set_z_L(*curr->z_L());
   iterates->Set_z_U(*curr->z_U());
   iterates->Set_v_L(*curr->v_L());
   iterates->Set_v_U(*curr->v_U());
   iterates->Set_y_c(*y_c);
   iterates->Set_y_d(*y_d);
   IpData().set_trial(iterates);
   IpData().AcceptTrialPoint();
}

The code does not expect avrg_compl to be 0. That would mean a point at the boundary, which one shouldn't have for an interior point algorithm. So I don't really see how it happened that "z_L and z_U are zero" here. The code in "lines 794 to 811 in file IpIpoptAlg.cpp" doesn't set them explicitly to 0, but just copies the values from the current iterate.

Paper https://optimization-online.org/2005/03/1089/ explains the idea behind computing mu in QualityFunctionMuOracle::CalculateMu(). However, eventually, mu is always set to a multiple of avrg_compl, so the latter should not be zero.

Since the problem is very small (which may be the cause for odd things to happen), you could run Ipopt with option print_level set to 12 and attach the log here. Maybe that helps to see what is going on.

Thanks Stefan.
I am attaching the log file here. I can't make much of it, but hopefully you should be able to see what's going on.
ipopt_log.txt

I have put checks at several points in the code to avoid singularities/divide-by-zeros and NaNs/Infs.

This is the matlab code executing ipopt:
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'matching';
opts.ipopt.spral_ignore_numa = 'yes';
opts.ipopt.jac_d_constant = 'yes';
opts.ipopt.hessian_constant = 'yes';
opts.ipopt.max_iter = 100;
opts.ipopt.tol = 1e-6;
opts.ipopt.limited_memory_update_type = 'bfgs'; % {bfgs}, sr1
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 1;
opts.ipopt.linear_solver = 'spral';
opts.ipopt.nlp_upper_bound_inf = 1e6;
opts.ipopt.nlp_lower_bound_inf = -1e6;
opts.lb = -10;
opts.ub = 10;
opts.cl = -10;
opts.cu = 10;
opts.zl = -10;
opts.zu = 10;
funcs.objective = @(x) (x-1)^2;
funcs.gradient = @(x) 2*(x-1);
funcs.constraints = @(x) x;
funcs.jacobian = @(x) sparse(1);
funcs.jacobianstructure = @(x) sparse(1);
funcs.hessian = @(x) sparse(0);
funcs.hessianstructure = @(x) sparse(1);
[x,fval] = ipopt_libs(0,funcs,opts)

This log doesn't seem to belong to this issue. The places in the code that you mention are executed only if mu_strategy is set to adaptive, but the log doesn't mention this option setting. Further, there are no NaNs in the log; it just stops at the iteration limit.

Further, the Matlab code you posted now also seems to solve a different NLP, and it doesn't change the mu_strategy either.

I had put checks at several places along the code to check for NaNs and circumvent those.
"The places in the code that you mention are executed only if mu_strategy is set to adaptive" - are you referring to the avrg_compl issue? It is definitely being hit in this run.

Yes. But I don't see this in the log.
You said the variable x becomes NaN. But a grep -F "curr_x[" ipopt_log.txt doesn't show any problematic numbers for the iterate. There is no NaN in the log at all.

Here's an example of what I am doing; it is out of desperation, I admit:

SmartPtr<IteratesVector> zerotrial = IpData().trial()->MakeNewIteratesVectorCopy();
zerotrial->Set(0.0);
SmartPtr<const Vector> xtemp = IpData().curr()->x();
SmartPtr<const Vector> trialx = IpData().trial()->x();
Number mu = IpData().curr_mu();
Number val = IpData().trial()->Min();
// (val - val) != 0 is true exactly when val is NaN or +/-inf
if( (val - val) != 0 || std::abs(val) > 1e100 )
   IpData().set_trial(zerotrial);
// Here's where I am checking whether the trial point is NaN or inf. I believe this
// trial is what is assigned to x. I can remove all these NaN circumventions and
// send you the "actual log".

Here's the log for the problem:
opts = [];
funcs = [];
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'matching';
opts.ipopt.spral_ignore_numa = 'yes';
opts.ipopt.jac_d_constant = 'yes';
opts.ipopt.hessian_constant = 'yes';
opts.ipopt.max_iter = 100;
opts.ipopt.tol = 1e-6;
opts.ipopt.limited_memory_update_type = 'bfgs'; % {bfgs}, sr1
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 12;
opts.ipopt.linear_solver = 'spral';
opts.lb = 0;
opts.ub = 1;
opts.cl = -10;
opts.cu = 10;
opts.zl = -10;
opts.zu = 10;
funcs.objective = @(x) (x-1)^2;
funcs.gradient = @(x) 2*(x-1);
funcs.constraints = @(x) x;
funcs.jacobian = @(x) sparse(1);
funcs.jacobianstructure = @(x) sparse(1);
funcs.hessian = @(x) sparse(0);
funcs.hessianstructure = @(x) sparse(1);
[x,fval] = ipopt_libs(0,funcs,opts)
This run is from the unaltered repository; you will see the NaNs in it.
ipopt_log.txt

The above log file I sent you is from the unaltered repository.
I am also attaching the log from the unaltered repository with the bounds on the variable x set to [-10,10].
ipopt_log_.txt

Hi Stefan. It will be great if you can help me with these issues. I am trying to learn as I build the code and get a mex with SPRAL ready.

Can you at least confirm whether there is indeed a bug in Ipopt?

Since I don't see from the log where the NaN for the barrier function comes from, I will try to reproduce this with the C++ interface. However, some things in your log are odd.

First, you have

DenseVector "original x_L unscaled" with 1 elements:
original x_L unscaled[    1]= 0.0000000000000000e+00
DenseVector "original x_U unscaled" with 1 elements:
original x_U unscaled[    1]= 1.0000000000000000e+00
DenseVector "original d_L unscaled" with 1 elements:
original d_L unscaled[    1]=-1.0000000000000000e+01
DenseVector "original d_U unscaled" with 1 elements:
original d_U unscaled[    1]= 1.0000000000000000e+01
DenseVector "modified x_L scaled" with 1 elements:
modified x_L scaled[    1]= 0.0000000000000000e+00
DenseVector "modified x_U scaled" with 1 elements:
modified x_U scaled[    1]= 1.0000000000000000e+00
DenseVector "modified d_L scaled" with 1 elements:
modified d_L scaled[    1]=-1.0000000000000000e+01
DenseVector "modified d_U scaled" with 1 elements:
modified d_U scaled[    1]= 1.0000000000000000e+01

But Ipopt is actually relaxing variable bounds and sides of inequalities by default. The modified bounds should be

DenseVector "modified x_L scaled" with 1 elements:
modified x_L scaled[    1]=-1.0000000000000000e-08
DenseVector "modified x_U scaled" with 1 elements:
modified x_U scaled[    1]= 1.0000000099999999e+00
DenseVector "modified d_L scaled" with 1 elements:
modified d_L scaled[    1]=-1.0000000099999999e+01
DenseVector "modified d_U scaled" with 1 elements:
modified d_U scaled[    1]= 1.0000000099999999e+01

I don't see how this could be skipped, since you didn't have the corresponding options set to skip this.
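
For reference, this relaxation is controlled by the bound_relax_factor option (default 1e-8). A small sketch that reproduces the expected "modified" bounds above; the max(1, |bound|) scaling is my reading of the default behavior, so treat the exact formula as an assumption:

#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
   const double eps = 1e-8;   // bound_relax_factor
   double x_L = 0.0, x_U = 1.0, d_L = -10.0, d_U = 10.0;
   // lower bounds are shifted down, upper bounds up, by eps * max(1, |bound|)
   std::printf("x_L %.16e\n", x_L - eps * std::max(1.0, std::fabs(x_L)));   // -1.0e-08
   std::printf("x_U %.16e\n", x_U + eps * std::max(1.0, std::fabs(x_U)));   //  1.0000000099999999e+00
   std::printf("d_L %.16e\n", d_L - eps * std::max(1.0, std::fabs(d_L)));   // -1.0000000099999999e+01
   std::printf("d_U %.16e\n", d_U + eps * std::max(1.0, std::fabs(d_U)));   //  1.0000000099999999e+01
}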

More importantly, the problem statistics written to the log seem wrong:

Total number of variables............................:        1
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        1
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

The variable has both lower and upper bounds and the constraint also has both lower and upper bounds.
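
Given the NLP you posted (0 <= x <= 1 as variable bounds, -10 <= x <= 10 as inequality), I would expect these two counters to read 1 instead of 0:

                variables with lower and upper bounds:        1
   inequality constraints with lower and upper bounds:        1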

Can you check again that you have no modifications in the Ipopt code?

I will check and revert.
I am using some cpp files from Jonathan Currie’s version of ipopt (for MATLAB). I will download the coin-or ipopt repo again and run.

Do you know where I can get the latest matlab interface files for ipopt?

The only maintained Matlab interface to Ipopt I know of is https://github.com/ebertolazzi/mexIPOPT

But that doesn't have the cpp files? I need those to build

I think that the source files for his Matlab/Ipopt interface are at https://github.com/ebertolazzi/mexIPOPT/tree/master/toolbox/src.
I cannot say how to build this. I don't have Matlab.

Stefan, Thanks for your help. The issue was that the BLAS library files weren't properly interfaced with the IPOPT code. Now the optimizer runs without any NaNs but doesn't converge. Please look at the following example:
opts = [];
funcs = [];
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'matching';
opts.ipopt.spral_ignore_numa = 'yes';
opts.ipopt.jac_d_constant = 'no';
opts.ipopt.hessian_constant = 'no';
opts.ipopt.max_iter = 1000;
opts.ipopt.tol = 1e-6;
opts.ipopt.limited_memory_update_type = 'bfgs'; % {bfgs}, sr1
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 12;
opts.ipopt.linear_solver = 'spral';
opts.lb = [-10];
opts.ub = [10];
opts.cl = -10;
opts.cu = 10;
funcs.objective = @(x) (x-2)^2;
funcs.gradient = @(x) 2*(x-2);
funcs.constraints = @(x) x;
funcs.jacobian = @(x) sparse(1);
funcs.jacobianstructure = @(x) sparse(1);
funcs.hessian = @(x) sparse(0);
funcs.hessianstructure = @(x) sparse(1);
[x,fval] = ipopt_libs(2,funcs,opts)
diary off

The answer is x = 2, but the optimizer doesn't converge. Please have a look at the log file. What is preventing the optimizer from converging?
ipopt_log_x.txt

Also have a look at an example with 2 variables.
diary ipopt_log
opts = [];
funcs = [];
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'matching';
opts.ipopt.spral_ignore_numa = 'yes';
opts.ipopt.jac_d_constant = 'no';
opts.ipopt.hessian_constant = 'no';
opts.ipopt.max_iter = 1000;
opts.ipopt.tol = 1e-6;
opts.ipopt.limited_memory_update_type = 'bfgs'; % {bfgs}, sr1
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 12;
opts.ipopt.linear_solver = 'spral';
opts.lb = [-10;-10];
opts.ub = [10;10];
opts.cl = -10;
opts.cu = 10;
funcs.objective = @(x) ((x(1)-1)^2+(x(2)-2)^2);
funcs.gradient = @(x) 2*[x(1)-1 x(2)-2];
funcs.constraints = @(x) x(1);
funcs.jacobian = @(x) sparse([1 0]);
funcs.jacobianstructure = @(x) sparse([1 1]);
funcs.hessian = @(x) sparse([0 0;0 0]);
funcs.hessianstructure = @(x) sparse([1 1;1 1]);
[x,fval] = ipopt_libs([3;3],funcs,opts)
diary off

The solution is x = [1;2]. Again, it doesn't seem to converge. Log file attached.
ipopt_log_x1_x2.zip
Please suggest. Thanks.

I would think that there might be an issue with Spral. Do you get the same problem when using a different linear solver, e.g., Mumps?

This is the log I get for the first NLP: log_x.txt
When I grep for residual_ratio, it seems that all linear systems are solved fine:

residual_ratio = 1.161487e-18
residual_ratio = 1.708076e-19
residual_ratio = 6.936164e-15
residual_ratio = 1.601423e-16
residual_ratio = 1.216627e-15
residual_ratio = 4.818324e-17
residual_ratio = 2.960622e-15
residual_ratio = 4.131101e-17
residual_ratio = 2.701665e-14
residual_ratio = 1.245868e-17
residual_ratio = 1.946965e-13
residual_ratio = 5.008270e-17

But in your log, things go weird after the first 2:

residual_ratio = 1.161487e-18
residual_ratio = 1.708076e-19
residual_ratio = 1.535039e-01
residual_ratio = 1.531841e-01
residual_ratio = 1.528650e-01
residual_ratio = 1.525466e-01
residual_ratio = 1.522288e-01
residual_ratio = 1.519116e-01
residual_ratio = 1.515952e-01
residual_ratio = 1.512794e-01
residual_ratio = 1.509642e-01
residual_ratio = 1.506497e-01
residual_ratio = 1.503359e-01
residual_ratio = 1.500227e-01
Iterative refinement failed with residual_ratio = 1.500227e-01
...

Maybe increasing spral_print_level will give some insight.

Running it with mumps (from the available Matlab interface for ipopt) works fine.
I will dig more into this residual ratio.

Do you know where to look for the bug in SPRAL? SPRAL needs the METIS package.

Not really. If there was an issue with linking BLAS to Ipopt, maybe something similar happened with your build of Spral. As said, increasing the print level for Spral may give some hint about what is going on there.

I tried increasing the print level, but I didn't see any increase in the print output.
I am attaching the log in case you are able to spot anything.
ipopt_log_x_spral_print.txt

What does the residual_ratio exactly indicate?

I tried increasing the print level, but I didn't see any increase in the print output.

The output from Spral isn't going through the message handling of Ipopt, I think. So it goes to stdout. Maybe there is some console that it is printed to.

What does the residual_ratio exactly indicate?

It is the scaled violation of the equation system that has been solved by the linear solver (Spral) (follow code at https://github.com/coin-or/Ipopt/blob/stable/3.14/src/Algorithm/IpPDFullSpaceSolver.cpp#L246-L249).
So after solving Ax=b, it computes ||Ax-b|| and then divides this by some quantity in case A, x, or b are huge.
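
A rough self-contained sketch of that idea (the exact scaling in IpPDFullSpaceSolver.cpp differs in its details, so treat this formula as an illustration only; it uses the max-norm, matching the log lines below):

#include <algorithm>
#include <cmath>
#include <vector>

// residual = A*x - b for the system the linear solver was asked to solve
double residual_ratio(const std::vector<double>& rhs,
                      const std::vector<double>& sol,
                      const std::vector<double>& residual)
{
   auto amax = [](const std::vector<double>& v)
   {
      double m = 0.0;
      for( double e : v )
         m = std::max(m, std::fabs(e));
      return m;
   };
   double nrm_rhs = amax(rhs);
   double nrm_sol = amax(sol);
   double nrm_resid = amax(residual);
   // scale the residual norm so that huge right-hand sides or solutions
   // do not make a harmless residual look large
   return nrm_resid / (1.0 + nrm_rhs + nrm_sol);
}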

The scaling doesn't matter so much here. The lines around this in your log look like

max-norm resid_x  1.422677e+00
max-norm resid_s  5.551115e-17
max-norm resid_c  0.000000e+00
max-norm resid_d  0.000000e+00
max-norm resid_zL 0.000000e+00
max-norm resid_zU 0.000000e+00
max-norm resid_vL 0.000000e+00
max-norm resid_vU 0.000000e+00
nrm_rhs = 4.53e+00 nrm_sol = 7.76e-01 nrm_resid = 1.42e+00
residual_ratio = 2.683456e-01

resid_x is one component of Ax-b and it is just too high, indicating that the equation system wasn't solved correctly.

I fixed some bugs w.r.t. interfacing the BLAS routines with Ipopt.
I tried the following problem:

opts = [];
funcs = [];
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'none';
opts.ipopt.spral_ignore_numa = 'no';
opts.ipopt.jac_d_constant = 'no';
opts.ipopt.hessian_constant = 'no';
opts.ipopt.max_iter = 1000;
opts.ipopt.tol = 1e-6;
opts.ipopt.limited_memory_update_type = 'bfgs'; % {bfgs}, sr1
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 12;
opts.ipopt.linear_solver = 'spral';
opts.lb = [-10;-10];
opts.ub = [10;10];
opts.cl = -10;
opts.cu = 10;
funcs.objective = @(x) sin(x(1)*x(2)-8)^2;
funcs.gradient = @(x) sin(2*x(1)*x(2)-16)*[x(2) x(1)];
funcs.constraints = @(x) x(1);
funcs.jacobian = @(x) sparse([1 0]);
funcs.jacobianstructure = @(x) sparse([1 1]);
funcs.hessian = @(x) sparse([0 0;0 0]);
funcs.hessianstructure = @(x) sparse([1 1;1 1]);
[x,fval] = ipopt_libs([1;1],funcs,opts);
It is giving an "Optimal Solution Found", but it takes too many iterations, and the residual ratio still moves away from zero before the iterative refinement steps. Log file attached. Do you have any idea why?
ipopt__sine_log.txt

Another simple problem, where the point is almost feasible but there is an error in calling the restoration phase.
opts.ipopt.spral_use_gpu = 'no';
opts.ipopt.spral_pivot_method = 'block';
opts.ipopt.spral_order = 'metis';
opts.ipopt.spral_print_level = 0;
opts.ipopt.spral_scaling = 'none';
opts.ipopt.spral_ignore_numa = 'no';
opts.ipopt.jac_d_constant = 'no';
opts.ipopt.hessian_constant = 'no';
opts.ipopt.max_iter = 1000;
opts.ipopt.tol = 1e-6;
opts.ipopt.limited_memory_update_type = 'bfgs'; % {bfgs}, sr1
opts.ipopt.hessian_approximation = 'limited-memory';
opts.ipopt.derivative_test = 'first-order';
opts.ipopt.print_level = 5;
opts.ipopt.linear_solver = 'spral';
opts.lb = [-10;-10];
opts.ub = [10;10];
opts.cl = -10;
opts.cu = 10;
funcs.objective = @(x) sin(x(1)*x(2)^2-8)^2;
funcs.gradient = @(x) sin(2*x(1)*x(2)^2-16)*[x(2)^2 2*x(2)*x(1)];

funcs.constraints = @(x) x(1);
funcs.jacobian = @(x) sparse([1 0]);
funcs.jacobianstructure = @(x) sparse([1 1]);
funcs.hessian = @(x) sparse([0 0;0 0]);
funcs.hessianstructure = @(x) sparse([1 1;1 1]);
[x,fval] = ipopt_libs([1;1],funcs,opts);
ipopt_log_resto_err.txt

I still cannot reproduce this. Maybe you want to try without Matlab first and use the C++ interface of Ipopt directly, just to reduce the complexity of the setup and to be able to see the Spral log.

Also, you can try switching to Mumps again. If that works, then just stay with Mumps. Or see if you can use HSL codes.

Here is the log I get for the sin(x(1)*x(2)-8)^2 objective function: ipopt__sine_log_cpp.txt
It solves in 6 iterations.

And here the source file for that example: cpp_src.zip

This is with the SPRAL solver?
I am able to solve more complex systems in the same number of iterations as with the MUMPS and MA57 solvers (examples given in mexIPOPT).
I really want to build this with SPRAL and get a mex file ready - I need that experience.

I will try the C++ interface and see whether I am able to get the desired output. The cpp source you sent is for the C++ interface of Ipopt?

My C++ test did not go well. Although the project built, I keep getting errors that some DLLs are missing.
I did a text comparison of the log files that I have with the one you sent me. Lines 2600-2800 are a bit weird; there seems to be some extra data in your log, and the values also start to differ there. Could you point out why the values are different?
Thanks
ipopt__sine_log_cpp_stefan.txt
ipopt_log_new.txt

The first significant difference is around line 2784, in the approximated Hessian. I got

    W-U[ 1][    1]=-1.4490164783310873e+00
    W-U[ 1][    2]=-1.3681535825665503e+00

and you got

    W-U[ 1][    1]=-3.2039340808998262e-01
    W-U[ 1][    2]=-1.7771074456272162e-01

But I cannot say whether this is a result of the tiny difference before, or whether something was calculated wrongly.

What about lines 2776 (my code) and 2765 (your code)?
The "homogenous element, all elements have value....." value is different.
I found that the BFGS update (the function calling IpLapackPotrf) fails on some Cholesky factorizations. Is this a significant factor?

I find that the matrix sent for Cholesky factorization (to "dpotrf") has some "junk" values that should have been set to zero. This has to be done while initializing the matrix. But where?
For example in IpLowRankAugSystemSolver.cpp
Line 323: bool retchol = J1_->ComputeCholeskyFactor(*M1);
Line 387: bool retchol = J2_->ComputeCholeskyFactor(*M2);

What about lines 2776 (my code) and 2765 (your code)? The "homogenous element, all elements have value....." value is different. I found that the BFGS update (the function calling IpLapackPotrf) fails on some Cholesky factorizations. Is this a significant factor?

Yes. Failing Cholesky factorizations don't sound good. That could then lead to differences in the homogeneous matrix afterwards.

I find that the matrix sent for Cholesky factorization (to "dpotrf") has some "junk" values that should have been set to zero. This has to be done while initializing the matrix. But where? For example, in IpLowRankAugSystemSolver.cpp, Line 323: bool retchol = J1_->ComputeCholeskyFactor(*M1); and Line 387: bool retchol = J2_->ComputeCholeskyFactor(*M2);

Initializing an empty dense matrix or vector probably happens via IpBlasCopy, e.g.,

   const Number zero = 0.;
   IpBlasCopy(NCols() * NRows(), &zero, 0, values_, 1);

puts NCols() * NRows() many zeros into array values_.
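
(For reference: with a source increment of 0, the underlying BLAS dcopy reads the same scalar over and over, so the call above amounts to this plain loop, with n = NCols() * NRows():)

// equivalent of IpBlasCopy(n, &zero, 0, values_, 1)
for( Index i = 0; i < n; i++ )
   values_[i] = 0.;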

Hi Stefan. There was a small niggle in interfacing with the BLAS routines in Fortran.
Works fine now.