embotech/ecos

Memory leaks

Closed this issue · 3 comments

echu commented

See cvxpy/issues/166.

The error comes from retaining the pointers w->best_x, w->best_y, etc. in preproc.c without handing ownership off at the interfaces.

I have just run valgrind over the core ECOS code from the develop branch. Everything seems OK:

valgrind --tool=memcheck ./runecos
==23199== Memcheck, a memory error detector
==23199== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==23199== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==23199== Command: ./runecos
==23199== 

ECOS 1.1.1 - (c) A. Domahidi, ETH Zurich & embotech 2012-15. Support: ecos@embotech.com

It     pcost         dcost      gap     pres    dres     k/t     mu      step     IR
 0   +5.230e-01   +5.230e-01   +4e+02   9e-01   4e-01   1e+00   2e+00    N/A     1 1 -
 1   +1.527e+00   +2.220e+00   +1e+02   1e-01   8e-02   9e-01   6e-01   0.8028   1 1 2
 2   +5.637e-01   +8.452e-01   +6e+01   4e-02   4e-02   4e-01   3e-01   0.7541   2 2 2
 3   +5.296e-01   +7.411e-01   +5e+01   4e-02   4e-02   3e-01   2e-01   0.3012   2 2 2
 4   +1.551e-01   +1.644e-01   +4e+00   3e-03   3e-03   2e-02   2e-02   0.9490   2 2 2
 5   +1.458e-01   +1.490e-01   +2e+00   2e-03   1e-03   6e-03   1e-02   0.6057   2 2 2
 6   +1.707e-01   +1.714e-01   +8e-01   5e-04   4e-04   2e-03   4e-03   0.7514   2 2 2
 7   +1.894e-01   +1.894e-01   +2e-01   1e-04   1e-04   3e-04   9e-04   0.9859   2 1 2
 8   +1.926e-01   +1.926e-01   +4e-03   2e-06   2e-06   5e-06   2e-05   0.9827   2 1 1
 9   +1.927e-01   +1.927e-01   +6e-05   4e-08   3e-08   8e-08   3e-07   0.9835   2 1 1
10   +1.927e-01   +1.927e-01   +4e-06   2e-09   2e-09   5e-09   2e-08   0.9392   2 1 1
11   +1.927e-01   +1.927e-01   +6e-08   4e-11   3e-11   8e-11   3e-10   0.9890   2 1 1
12   +1.927e-01   +1.927e-01   +1e-09   7e-13   6e-13   1e-12   6e-12   0.9815   2 1 1

OPTIMAL (within feastol=7.5e-13, reltol=5.9e-09, abstol=1.1e-09).
Runtime: 0.212726 seconds.

==23199== 
==23199== HEAP SUMMARY:
==23199==     in use at exit: 0 bytes in 0 blocks
==23199==   total heap usage: 112 allocs, 112 frees, 305,736 bytes allocated
==23199== 
==23199== All heap blocks were freed -- no leaks are possible
==23199== 
==23199== For counts of detected and suppressed errors, rerun with: -v
==23199== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

@echu Maybe ECOS_free is called with a wrong value of the second argument in the Python interface? It determines which of the variables are freed (some need to be retained because they're returned to the user, but that depends on how many the user wants back; multipliers, for example).

echu commented

That's what's getting to me. The leak shows up even if all I do is set up and then clean up everything. (And it's the same pattern when I solve, clean up, and retain the primal, dual, and slack variables for CVXPY, so I know I'm doing that right.)

But valgrind shows no leaks. So something seems to be accumulating in the Python module, but I have no idea what. Even running Python's heap analyzer (guppy) on the program shows a constant heap size.

So the stack is growing? That mostly happens with recursion... And it can't be from CVXPY, since the other solvers don't exhibit this behavior. SCS also uses a portion of the ECOS Python module, so I'm sure those pieces aren't contributing to the issue. Anyway, suffice it to say, I have no ideas. :)

echu commented

OK, I found it.

I built Python from source, edited Objects/obmalloc.c to #define Py_USING_MEMORY_DEBUGGER, used virtualenv (virtualenv -p $PATH_TO_CUSTOM_PYTHON env) to install the ECOS dependencies, copied over Misc/valgrind-python.supp, and ran:

valgrind --trace-children=yes --suppressions=valgrind-python.supp --leak-check=full --track-origins=yes python $TEST_SCRIPT

Yes, it was pretty involved, but it turned up this gem:

==56495==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==56495==    by 0x23D21E99: createSparseMatrix (splamm.c:112)
==56495==    by 0x23D2182F: ECOS_setup (preproc.c:664)
==56495==    by 0x23D150B8: csolve (ecosmodule.c:620)
==56495==    by 0x4B5388: PyEval_EvalFrameEx (ceval.c:4343)
==56495==    by 0x4B71F7: PyEval_EvalCodeEx (ceval.c:3265)
==56495==    by 0x52A14E: function_call (funcobject.c:526)
==56495==    by 0x422FB9: PyObject_Call (abstract.c:2529)
==56495==    by 0x4B229F: PyEval_EvalFrameEx (ceval.c:4346)
==56495==    by 0x4B6687: PyEval_EvalFrameEx (ceval.c:4119)
==56495==    by 0x4B71F7: PyEval_EvalCodeEx (ceval.c:3265)
==56495==    by 0x52A14E: function_call (funcobject.c:526)

The matrix A wasn't getting freed (the matrix data structure itself, not the underlying data, which is owned by Python) because we were passing in a matrix with 0 rows. In other words, it's possible to create a matrix with 0 rows, but when freeing, we check whether workspace->p > 0.

I changed the check to workspace->A != NULL, but we should probably be consistent about what a "NULL" matrix is: is it synonymous with a 0xn matrix, or is it distinct? I think this affects other pieces of our code, but not in any catastrophic way, since in those contexts a NULL matrix is a 0xn matrix (e.g., when we are populating sizes). The most important case where this assumption fails is in checking whether the matrix is allocated (even if it has 0 rows).