RUrlus/carma

Help with Armadillo memory bug in pybind11

kjohnsen opened this issue · 13 comments

Hello, even though this bug doesn't seem to be with Carma itself, I turn to you for help because I'm really stuck and I don't know if the pybind11 or Armadillo communities would be as much help in this.

EDIT: it does appear to be a Carma bug and the MRE is working.

I've been trying to boil it down to an MRE and have been unsuccessful. I will explain what I know here, but since this isn't enough to reproduce the bug, what I need most are some ideas as to where to look. This is basically a very hard-to-reproduce memory corruption error (heap corruption, Windows fatal exception: code 0xc0000374). It seems to happen when I have a matrix A, not initialized, and assign a matrix to it large enough that Armadillo acquires memory. The crash happens when that memory is released.

I'm on Windows 10, using Armadillo 10.6.2 and pybind11 v2.7.1, compiling using Clang 11.

// mat_container.cpp
#include "mat_container.h"

MatrixContainer::MatrixContainer(size_t d) {
    A = arma::Mat<double>(d, d, arma::fill::eye);
    std::cerr << "filled arma matrix\n";
}

Binding code

#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <armadillo>
#include <carma>
#include "mat_container.h"

PYBIND11_MODULE(example, m) {
  py::class_<MatrixContainer>(m, "MC").def(py::init<size_t, bool>())
       .def_readwrite("A", &MatrixContainer::A);
}

All I need to do to trigger the crash is from Python call example.MC(11) and it crashes upon releasing memory in the matrix destructor of the member variable (not the temporary one assigned to it). Armadillo debug messages right before crash:

@ __cdecl arma::Mat<double>::~Mat(void) [eT = double] [this = 000001569470DBA0]
Mat::destructor: releasing memory

Weird things

  • Only happens when calling from pybind11, not pure C++
  • Only happens when the constructor for the class containing the matrix is in a separately compiled source file. It doesn't happen when the constructor is in the header file.
  • (this part does involve carma) Only happens when assigning a non-identity (e.g., np.ones but not np.eye) matrix to an object's matrix member variable. But this only happens on a class in my source code, not on the smaller test class I created!
  • Only happens when A is large enough for Armadillo to acquire memory instead of use local memory
  • Setting the size of A before assigning to it doesn't solve the issue
  • This appears to be Armadillo-specific, when Armadillo releases memory (confirmed using ARMA_EXTRA_DEBUG), because I don't get a similar error when I use a class with another dynamic memory class like a vector

When A is using acquired memory and is replaced by a matrix using local memory, the crash occurs

mc = MC(11)
mc.A = np.eye(3)

Arma debug messages right before crash, as mc.A is being assigned:

@ void __cdecl arma::Mat<double>::init_warm(arma::uword, arma::uword) [eT = double] [in_n_rows = 3, in_n_cols = 3]
Mat::init(): releasing memory

but not when it's the other way around; this runs successfully. It does not crash as the acquired memory is released as A is destroyed:

mc = MC(3)
mc.A = np.eye(11)

And when an 11x11 A is replaced by another 11x11 matrix from numpy, the crash is again Mat::destructor: releasing memory as in the very first example without assignment from Python. What I deduce from these examples is that the crash is averted when an acquired memory matrix prepared by Carma (not Armadillo) is assigned to a local memory A

Attached files bug1.txt and bug2.txt are the logs from the first example and the failed assignment through Carma

Never mind, this does have something to do with Carma! Something about the way Carma causes Armadillo to get configured leads to the crash. I will confirm by adding Carma to the MRE and reproducing the bug.

Still working on this, but it looks like the code fails because arma_extra_code is on (where it wasn't in the MRE). The culprit seems to be ARMA_ALIEN_MEM functions, set here

@kjohnsen The order of includes appears wrong. This matters, as carma changes the configuration of Armadillo. The carma header must be included before the armadillo header. Change the order to:

#include <carma>
#include <armadillo>
#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include "mat_container.h"

Thank you for helping bring that to my attention. I didn't notice all the warnings about include order and about cnalloc.h when reading the docs.

I do think I've found a bug though, and my MRE is now working. Basically, even with the right include order, the crash occurs. I speculate that carma's memory functions are being used and failing even for the Armadillo stuff going on under the hood, when I'm not using carma converters at all.

@kjohnsen Sorry, I've been away on holiday so couldn't respond earlier.

Can you run with cmake -DCARMA_DEV_DEBUG_MODE=ON or define CARMA_DEV_DEBUG and post the above output again? This will trigger the debugging statements in cnalloc.h which will let us know whether the functions in cnalloc.h are being called.

How were you building before? Because the carma header checks if armadillo has been included before it and raise a compile time error if so.

The program appears to crash only for code compiled in a separate library, not from within the binding module or a header file.

You are most likely mixing the allocation/deallocation functions, if you include carma in one library and not in the other it may happen that you are passing ownership to the other library which does not uses the corresponding deallocator for the allocator used. I.e the one that includes carma uses Numpy's wrapper around malloc and the other will call malloc directly.
This will trigger the exception on Windows.

@conradsnicta Thanks for helping out

I had already been using CARMA_EXTRA_DEBUG but I just tried with DEV_DEBUG_MODE too and I'm not seeing any evidence of the cnalloc functions being called. I'm not seeing any carma debug statements.

How were you building before? Because the carma header checks if armadillo has been included before it and raise a compile time error if so.

Before, I was simply linking to the carma target, not the carma::headers target, which I understand is intended to make things work regardless of the include order by pre-compiling cnalloc.h. Now that I switched to the headers target in the MRE, I do see the compile-time error when I use the wrong include order.

Yes, the mixing scenario seems to be what's happening. Creating the object from the bindings module might use NumPy's allocator whereas the separately compiled library would use the standard free function.

🤔 Except for the fact that none of Carma's memory functions seem to be getting called...

Are you saying that normally it's necessary to include carma everywhere in the project? I'm trying to create bindings for a separate, standalone library. Would I need to include in all its source files?

@kjohnsen I was right, the de-allocation is being done with the deallocation function from CARMA and the allocation is not.
The MRE was not working for me, a linking issue with armadillo and the mc target. The below works and is a bit cleaner approach.

project(pybind-arma-bug LANGUAGES CXX)
cmake_minimum_required(VERSION 3.16 FATAL_ERROR)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED TRUE)
if (WIN32)
    set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
endif()

find_package(Armadillo "10.6" CONFIG REQUIRED)
add_subdirectory(pybind11)
add_subdirectory(carma)

add_library(mc SHARED mat_container.cpp)
# use the commented line to reproduce the not_linked output
# target_link_libraries(mc public armadillo)
target_link_libraries(mc public armadillo carma::carma)
target_compile_definitions(mc PUBLIC ARMA_EXTRA_DEBUG CARMA_EXTRA_DEBUG CARMA_DEV_DEBUG)

pybind11_add_module(pymod MODULE pymod.cpp)
target_link_libraries(pymod PUBLIC mc)

FILE(TOUCH "${PROJECT_BINARY_DIR}/__init__.py")

🤔 Except for the fact that none of Carma's memory functions seem to be getting called...

Have a look at what happens when we don't link carma::carma to the mc target:

log_not_linked.txt

When we link carma to the other target it works because we pre-compile the header
I only ran the last test as the first two test work as expected without the linking fix.

log_linked.txt

Are you saying that normally it's necessary to include carma everywhere in the project? I'm trying to create bindings for a separate, standalone library. Would I need to include in all its source files?

No you don't need to include but you have to link carma against the standalone library at compilation time

@conradsnicta This is an issue for projects that link against an external library using armadillo but not linked with carma.

It appears that the allocation of the memory is handled by code from external library, i.e. without ARMA_ALIEN_MEM_ALLOC_FUNCTION set, but destructed using the code compiled with ARMA_ALIEN_MEM_FREE_FUNCTION from the lib that includes carma.

Any ideas how we could avoid this other than requiring that the external lib is linked with carma at compile time?

@kjohnsen Let's close this issue if you don't mind, the title is not correct as this is a CARMA issue and not an Armadillo issue.
We can continue the discussion if needed in #91.

@RUrlus Ok, that is all working for me now. Many thanks! I don't know why I wasn't getting carma debug statements after using both CARMA_EXTRA_DEBUG and -DCMAKE_DEBUG_DEV_MODE=ON, but after using the compile definition CARMA_DEV_DEBUG like you did I'm seeing them now. This has been quite the debugging rabbit hole, but I'm glad to see the fix is so simple.

@kjohnsen Good to hear, I can imagine it was quite the rabbit hole. A type of bug that is quite hard to find unless you very familiar with the internals of the library. Bad timing that I was away.

@RUrlus I probably should have asked for help sooner. I didn't even think it was a Carma problem at first and without an MRE I wasn't sure anyone would be able to help.