jump-dev/GLPK.jl

GLPK binaries crashing

jd-lara opened this issue · 15 comments

GLPK crashes when certain infeasibilities are present, but it does so non-deterministically. This looks like an issue with the binary.

Assertion failed: teta >= 0.0
Error detected in file simplex/spxchuzr.c at line 292
signal (6): Abort trap: 6
in expression starting at REPL[30]:1
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
pthread_kill at /usr/lib/system/libsystem_pthread.dylib (unknown line)
abort at /usr/lib/system/libsystem_c.dylib (unknown line)
errfunc at /Users/jdlara/.julia/artifacts/a8f69d41bc1a2b8d010db7eba6e1e89c06f026b9/lib/libglpk.40.dylib (unknown line)
glp_assert_ at /Users/jdlara/.julia/artifacts/a8f69d41bc1a2b8d010db7eba6e1e89c06f026b9/lib/libglpk.40.dylib (unknown line)
_glp_spx_chuzr_harris at /Users/jdlara/.julia/artifacts/a8f69d41bc1a2b8d010db7eba6e1e89c06f026b9/lib/libglpk.40.dylib (unknown line)
_glp_spx_primal at /Users/jdlara/.julia/artifacts/a8f69d41bc1a2b8d010db7eba6e1e89c06f026b9/lib/libglpk.40.dylib (unknown line)
glp_simplex at /Users/jdlara/.julia/artifacts/a8f69d41bc1a2b8d010db7eba6e1e89c06f026b9/lib/libglpk.40.dylib (unknown line)
glp_simplex at /Users/jdlara/.julia/packages/GLPK/mQmKc/src/gen/libglpk_api.jl:218 [inlined]
_solve_linear_problem at /Users/jdlara/.julia/packages/GLPK/mQmKc/src/MOI_wrapper/MOI_wrapper.jl:1337
optimize! at /Users/jdlara/.julia/packages/GLPK/mQmKc/src/MOI_wrapper/MOI_wrapper.jl:1445
odow commented

Do you have an MPS file of the problem?

Pretty hard to do anything about it unless it's semi-reproducible.

I am running a loop of problems until one triggers the crash again, so that I can grab the problem file. I will post it soon; sorry if I opened the issue too early.

odow commented

I was avoiding doing this, but maybe we need to re-introduce the glp_error_hook stuff

GLPK.jl/src/GLPK.jl

Lines 201 to 229 in 9816d0e

# General recoverable exception: all GLPK functions
# throw this in case of recoverable errors
mutable struct GLPKError <: Exception
    msg::AbstractString
end

# Fatal exception: when this is thrown, all GLPK
# objects are no longer valid
mutable struct GLPKFatalError <: Exception
    msg::AbstractString
end

# Error hook, used to catch internal errors when calling
# GLPK functions
function _err_hook(info::Ptr{Cvoid})
    ccall((:glp_error_hook, libglpk), Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}), C_NULL, C_NULL)
    ccall((:glp_free_env, libglpk), Cvoid, ())
    _del_all_objs()
    throw(GLPKFatalError("GLPK call failed. All GLPK objects you defined so far are now invalidated."))
end

macro glpk_ccall(f, args...)
    quote
        ccall((:glp_error_hook, libglpk), Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}), @cfunction(_err_hook, Cvoid, (Ptr{Cvoid},)), C_NULL)
        ret = ccall(($"glp_$f", libglpk), $(map(esc,args)...))
        ccall((:glp_error_hook, libglpk), Cvoid, (Ptr{Cvoid}, Ptr{Cvoid}), C_NULL, C_NULL)
        ret
    end
end

Yeah, the issue is that it is crashing the Julia session.

@odow I have the MOF file of the problem that causes the errors, but it doesn't trigger the same error when I solve it after loading it from file. I am not sure how best to reproduce this error.

Can GLPK write the in-memory problem to an MPS file (with its own methods)?
If so, you can write the file before every solve and keep the last one written before the crash.
Then, to check whether the error is reproduced, load the MPS file with GLPK's own methods (to bypass JuMP/MOI).
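A minimal sketch of that idea, assuming you already hold the raw GLPK problem pointer. The helper name `solve_with_snapshot` is hypothetical and not part of GLPK.jl. It uses only the low-level `glp_*` wrappers and passes `C_NULL` so that GLPK falls back to its default MPS writer parameters:

```julia
using GLPK

# Hypothetical helper (not part of GLPK.jl): write the in-memory problem to
# `path` with GLPK's own MPS writer, then run the simplex solver. If the
# solve aborts the process, the file written just before it is the one to keep.
function solve_with_snapshot(prob, path::AbstractString)
    # C_NULL requests GLPK's default MPS writer parameters.
    ret = glp_write_mps(prob, GLP_MPS_FILE, C_NULL, path)
    ret == 0 || error("glp_write_mps failed for $path")
    opt = glp_smcp()
    glp_init_smcp(opt)
    return glp_simplex(prob, opt)
end
```

Calling it with the same path on every iteration of the loop overwrites the previous snapshot, so the file left on disk after a crash belongs to the problem that triggered it.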

I zipped the file so that I could upload it here. I extracted it with this code:

prob = JuMPmodel.moi_backend.inner
GLPK.glp_write_mps(prob, GLPK.GLP_MPS_FILE, Cvoid, "file.mps")

file.mps.zip

JuMP won't load it though.

m = JuMP.read_from_file("file.mps")
ERROR: Malformed COLUMNS line: PieceWiseLinearCostVariable_Solitude_{pwl_1,_1} R0000001 0 $ empty column
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] parse_columns_line(data::MathOptInterface.FileFormats.MPS.TempMPSModel, items::Vector{String}, multi_objectives::Vector{String})
    @ MathOptInterface.FileFormats.MPS ~/.julia/packages/MathOptInterface/QxT5e/src/FileFormats/MPS/MPS.jl:1022
odow commented

I hate MPS files with their weird formatting. Why is $ a comment?

odow commented

Okay, I remember why I removed the error stuff.

We could add something like:

mutable struct GLPKError <: Exception
    message::String
end

function _glp_error_hook(::Ptr{Cvoid})
    glp_free_env()
    return throw(GLPKError("GLPK call failed. Objects are invalidated"))
end

function __init__()
    glp_error_hook(@cfunction(_glp_error_hook, Cvoid, (Ptr{Cvoid},)), C_NULL)
    return
end

But the PDF has this unhelpful line:

[image: screenshot of the relevant line in the PDF]

So if things go wrong, GLPK just terminates. You can intercept this termination and longjmp away, but you must call glp_free_env. Unfortunately, in doing so you invalidate the current model and any calls against it segfault.

Therefore, there's no real way to gracefully exit from an error, so we might as well terminate immediately without the additional error handling. (MOI doesn't have a concept of "this call failed during a solve, so throw out the model.")

We can fix issues like #207 by checking the input (#208), but we can't really fix assertion errors deep inside the simplex solve.

odow commented

@jd-lara I can't reproduce using your example.

It needs to be reproducible with:

using GLPK
P = glp_create_prob()
parm = glp_mpscp()
glp_init_mpscp(parm)
glp_read_mps(P, GLP_MPS_FILE, parm, "file.mps")
opt = glp_smcp()
glp_init_smcp(opt)
glp_simplex(P, opt)
glp_delete_prob(P)

@odow I got the same crash on another problem instance and found that the cause was NaNs in the model that was provided. I don't know if this is intended behaviour (i.e., the user is responsible for providing GLPK.jl with sensible input) or if some check should be performed to prevent the Julia session from crashing.

That is non-trivial, because checking every coefficient could be a significant bottleneck for some applications.
You can write your MOI model to a file and look for the NaNs.
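If the coefficients are available on the Julia side before the model is built, a check like the following could catch the problem earlier. This is only a sketch in plain Julia: `nan_indices` and `check_coefficients` are hypothetical names, and the coefficients are assumed to be in an ordinary vector rather than in an MOI model.

```julia
# Hypothetical pre-solve check (not part of GLPK.jl or MOI): find the
# positions of NaN entries in a vector of coefficients.
function nan_indices(coefficients::AbstractVector{<:Real})
    return findall(isnan, coefficients)
end

# Throw an ArgumentError naming the offending positions, so bad data is
# reported in Julia instead of reaching the solver.
function check_coefficients(coefficients::AbstractVector{<:Real})
    bad = nan_indices(coefficients)
    isempty(bad) || throw(ArgumentError("NaN coefficients at indices $bad"))
    return nothing
end
```

The scan is linear in the number of coefficients, which is the cost mentioned above, so it is probably best kept as an opt-in debugging step.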

@joaquimg I solved my problem upstream (the NaNs came from a flaw in the problem generation in my particular application). I thought I'd share a possible source of the error here, though, in case someone else encounters something similar.

odow commented

@darnstrom use HiGHS.jl instead.

odow commented

Closing as non-reproducible. Anyone encountering errors is encouraged to use HiGHS.jl instead.