Windows bug
hpwxf opened this issue · 15 comments
To reproduce it:
- enable Windows support in
.travis-ci.yml
and run CI test
When test_mat_to_arr.py
is reduced to
print("Test armadillo matrix to numpy array functions.")
import numpy as np
import test_carma as carma
print("Before")
print("Test mat_to_arr with c++ addition.")
sample = np.asarray(
np.random.normal(size=(10, 2)),
dtype=np.float,
order='F'
)
mat = carma.mat_to_arr_plus_one(sample, False)
assert np.allclose(mat, sample + 1)
print("Test mat_to_arr with c++ addition.")
sample = np.asarray(
np.random.normal(size=(10, 2)),
dtype=np.float,
order='F'
)
mat = carma.mat_to_arr_plus_one(sample, True)
assert np.allclose(mat, sample + 1)
print("After")
It is still failing. But if we remove first sub-test or second sub-test, it is OK.
NB: when it fails, there is no output even for the first print
s.
- By removing all other test, we get:
$ ctest -C Release
Test project C:/Users/AzureUser/carma/build
Start 1: pytest
1/1 Test #1: pytest ...........................Exit code 0xc0000374
***Exception: 0.90 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 0.92 sec
or by hand:
PYTHONPATH=../build/tests/Release python test_mat_to_arr.py
PS:
test_arr_to_mat.py
is OK.test_arraystore.py
,test_nparray.py
,test_type_caster.py
are KO liketest_mat_to_arr.py
It sounds like this due to reusing the sample
var name which will trigger destruction of the memory. If this is true a single test like below should trigger the same behaviour.
print("Test armadillo matrix to numpy array functions.")
import numpy as np
import test_carma as carma
print("Before")
print("Test mat_to_arr with c++ addition.")
sample = np.asarray(
np.random.normal(size=(10, 2)),
dtype=np.float,
order='F'
)
mat = carma.mat_to_arr_plus_one(sample, False)
assert np.allclose(mat, sample + 1)
# This should trigger a destructor call
sample = None
# This just removes the sample from the namespace
del sample
print("After")
This would also mean this is a fairly significant memory bug.
Any idea on why this would not show up in *NIX systems but it does with windows?
But the above wouldn't explain why the first print statement isn't reached or why test_nparray.py
has the same issue. Those tests only read or switch some flags...
Exit code 0xc0000374
seems to be a memory corruption (e.g. https://forums.iis.net/t/1150912.aspx?0xc0000374).
If this corruption occurs in the Python interpreter, it is possible that it crashes before to able to print previously computer results or outputs (buffered outputs are not yet flushed).
Since the problem occurs, only this two successive tests (in any order; and ok with only one), it could help to understand where the corruption is.
Next thing to do could be to try with valgrind
on a *NIX system.
I plan also to add a mode with memory sanitizer options.
Your example has also failed (cf https://travis-ci.com/github/hpwxf/carma/jobs/358021677).
I'm going to try to check what happens on Linux with valgrind (using https://pypi.org/project/pytest-valgrind/)
An error seems to occur here:
Valgrind has detected an error:
**5286**
**5286** **********************************************************************
**5286** test_windows_bug.py::test_windows_bug
**5286** **********************************************************************
==5286== Mismatched free() / delete / delete []
==5286== at 0x483708B: operator delete(void*, unsigned long) (vg_replace_malloc.c:585)
==5286== by 0x1F2A4715: pybind11::capsule carma::create_capsule<double>(double*)::{lambda(void*)#1}::operator()(void*) const (utils.h:97)
==5286== by 0x1F2A4735: pybind11::capsule carma::create_capsule<double>(double*)::{lambda(void*)#1}::_FUN(void*) (utils.h:91)
==5286== by 0x1F280925: pybind11::capsule::capsule(void const*, void (*)(void*))::{lambda(_object*)#1}::operator()(_object*) const (pytypes.h:1168)
==5286== by 0x1F280945: pybind11::capsule::capsule(void const*, void (*)(void*))::{lambda(_object*)#1}::_FUN(_object*) (pytypes.h:1169)
==5286== by 0x4AA2FA: ??? (in /usr/bin/python3.7)
==5286== by 0x8D1E847: array_dealloc (in /usr/local/lib/python3.7/dist-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so)
...
If I remove the delete
at utils.h:97, there is no more segfault in Windows (https://travis-ci.com/github/hpwxf/carma/jobs/358032696) but there is, of course, a memory leak (this was just to confirm the location of the error).
I think the bug might be in get_data.
template <typename armaT, typename = std::enable_if_t<is_convertible<armaT>::value>>
inline typename armaT::elem_type * get_data(armaT * src, bool copy) {
using T = typename armaT::elem_type;
if (copy) {
size_t N = src->n_elem;
T * data = new T[N];
std::memcpy(data, src->memptr(), sizeof(T) * N);
return data;
} else {
T * data = src->memptr();
arma::access::rw(src->mem) = 0;
return data;
}
} /* get_data */
If you don't copy in but do copy out we have:
- input_array correctly assumes it owns the data and will correctly free the memory when destructed
- armadillo matrix incorrectly assumes it owns the data and will clear it
- output_array will correctly free the memory when destructed
However, under conditions the input array will get copied, e.g. wrong memory order, owndata is false, which prevents us from always taking ownership away from armadillo.
Perhaps we can set a global flag that indicates when a copy occurred on the way in and use that to either take away ownership from armadillo when a copy is made out.
Wouldn't an intermediate solution be to use a macro that sets delete when OS is not Windows? That way we can support windows albeit with a memory leak but don't interfere with performance on *NIX systems
Since I've planned to use carma in another project (portable on Linux/Mac/Windows), I prefer to solve this problem without any leak in any OS.
Using clang sanitizer, the analysis seems to be easier
# Tested on Debian Buster
CXXFLAGS="-fsanitize=address -fno-omit-frame-pointer" CXX=clang++-7 cmake -DBUILD_TESTS=on -DBUILD_EXAMPLES=on -DCMAKE_BUILD_TYPE=Debug ..
LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so ASAN_SYMBOLIZER_PATH=/usr/bin/llvm-symbolizer-7 ctest -V
I got:
2: ==6943==ERROR: AddressSanitizer: alloc-dealloc-mismatch (malloc vs operator delete []) on 0x60e000001460
2: #0 0x7f7afc37a0c0 in operator delete[](void*) (/usr/lib/gcc/x86_64-linux-gnu/7/libasan.so+0xdd0c0)
2: #1 0x7f7ae216932a in operator() /data/include/carma/carma/utils.h:98
2: #2 0x7f7ae21692b4 in __invoke /data/include/carma/carma/utils.h:92
2: #3 0x7f7ae2133287 in operator() /data/third_party/pybind11/include/pybind11/detail/../pytypes.h:1168
That means the related bunch of memory has been allocated using malloc
not new
.
Understood !
To deallocate a bunch of memory allocated by armadillo, you cannot assume what allocator it is. It depends on the implementation.
cf https://gitlab.com/conradsnicta/armadillo-code/-/blob/9.900.x/include/armadillo_bits/memory.hpp#L54 .
Thus, I will try to use arma::memory::release
when copy==false
.
Since I've planned to use carma in another project (portable on Linux/Mac/Windows), I prefer to solve this problem without any leak in any OS.
Of course, I was thinking as a stop-gap solution.
Understood !
To deallocate a bunch of memory allocated by armadillo, you cannot assume what allocator it is. It depends on the implementation.
cf https://gitlab.com/conradsnicta/armadillo-code/-/blob/9.900.x/include/armadillo_bits/memory.hpp#L54 .
Thus, I will try to usearma::memory::release
whencopy==false
.
Good point, nooby mistake to assume delete was the appropriate way too free.
What do you think about letting armadillo handle the copy out as well. In that case we can always keep the armadillo matrix/row/... in the capsule and release the memory from armadillo?
In the case of a steal we can move the mat/row/col etc. This might also solve another issue where we can't reliable steal the memory from a Col or Row.
Something like:
template <typename armaT, typename = std::enable_if_t<is_convertible<armaT>::value>>
inline py::capsule create_capsule(const armaT & src) {
using T = typename armaT::elem_type;
armaT data = armaT(src.memptr(), src.n_rows, src.n_cols, true);
return py::capsule(data.memptr(), [](void *f){
T *data = reinterpret_cast<T *>(f);
#ifndef NDEBUG
// if in debug mode let us know what pointer is being freed
std::cerr << "freeing memory @ " << f << std::endl;
#endif
arma::memory::release(data);
});
} /* create_capsule */
template <typename armaT, typename = std::enable_if_t<is_convertible<armaT>::value>>
inline py::capsule create_capsule(armaT && src) {
using T = typename armaT::elem_type;
armaT data = std::move(src);
return py::capsule(data.memptr(), [](void *f){
T *data = reinterpret_cast<T *>(f);
#ifndef NDEBUG
// if in debug mode let us know what pointer is being freed
std::cerr << "freeing memory @ " << f << std::endl;
#endif
arma::memory::release(data);
});
} /* create_capsule */