RUrlus/carma

Can't modify matrix as a side effect (but don't know if this is supposed to be possible)

kjohnsen opened this issue · 6 comments

First, a clarifying question: is it supposed to be possible to modify a matrix in-place (after borrowing from NumPy) and have those changes reflected in NumPy without returning, that is, as a function side effect? This is what I understood from the "Well behaved" section of the docs:

This is done to ensure the design that pattern that borrowed arrays don’t require a return to reflect the changes.

I'm not totally sure though since the "borrow" section of the "Design Patterns" section mentions that with borrowing you always need to copy out.

I've been unsuccessfully trying to do this (i.e., I make changes to a Mat& after converting to Armadillo and I don't see them back in NumPy) and if this is supposed to be possible I can set up an MRE.

Yes you can modify in-place, at least with the manual conversion you always could. There was a bug in automatic conversion where an assignment triggered a copy rather than a move assignment even though it was an r-value. This is fixed in bfe4324.

What I mean in the documentation regarding the borrow and copying out is that you're not supposed to return the borrowed array as then you have two Numpy arrays pointing to the same memory and one will destruct the memory before the other.

Ok, now that I know this is supported I'll try to make an MRE

Ok, got some weird stuff--apparently the same thing on both stable and unstable branches.

Here's the binding code:

#include <pybind11/numpy.h>
#include <pybind11/pybind11.h>
#include <armadillo>
#include <carma>

// MRE for modifying as side-effect
void modify_arr(py::array_t<double>& arr) {
  arma::Mat<double> mat = carma::arr_to_mat(arr); // <-- I believe I got all the borrow syntax right
  mat(0,0) = 42;
  // std::cout << mat;  // <-- confirmed that the first element was getting set to 42 as expected
}

PYBIND11_MODULE(example, m) {
  m.def("modify_arr", &modify_arr);
}

Here's what happened in a Python interpreter. Looks like identity matrices get mangled and others remain un-modified:

>>> import numpy as np, example
>>> x = np.eye(3)
>>> example.modify_arr(x)
>>> x
array([[1.00000000e+000, 1.35675106e-312, 8.08686649e-320],
       [1.77658241e-307, 5.29980882e-315, 3.11261357e-322],
       [3.47328271e-310, 2.07023782e-317, 0.00000000e+000]])
>>> y = np.eye(3)
>>> example.modify_arr(y)
>>> y
array([[1.00000000e+000, 1.35675106e-312, 8.08686649e-320],
       [1.77658241e-307, 5.29980882e-315, 3.11261357e-322],
       [3.47328271e-310, 2.07023782e-317, 0.00000000e+000]])
>>> y = np.ones((3,3))
>>> example.modify_arr(y)
>>> y
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
>>> y=np.array([[1,2],[3,4]])
>>> example.modify_arr(y)
>>> y
array([[1, 2],
       [3, 4]])

Thanks for the MRE, I could reproduce. A bit surprised this slipped through considering the number of testing there is.
It's fixed in the unstable branch now.

If you're curious, for some reason I decided that we needed to copy if the array was smaller than Armadillo's pre-allocation limits whilst that only needs to happen when the is also stolen.
The garbage values in the first two examples happened due some very strange stride issue for c-style arrays.

@RUrlus For safety reasons, perhaps the default operation should be copying data when going between Python and Armadillo.

Copying data is a relatively "cheap" operation, in contrast to operations such as matrix multiplication and SVD. I suspect that in practice the overhead of copying would be minimal. For folks that do want the extra speed, perhaps there should be an option to enable a "don't copy data" mode.

@conradsnicta You're right its relatively cheap but it prevents borrowing on the conversion in which is a quite common pattern. Perhaps the default on the way out should become to copy but it's quite a change.

For integration into MLPack I need to make some changes regarding the copy behaviour anyway so I might include a mode as you suggested.