SciSharp/Numpy.NET

Numpy.np.vstack/reshape throws memory errors/exceptions (multi-threaded)

bbn-bert opened this issue · 4 comments

I've read through the multi-threading sections of the readme, and I believe I am following everything correctly. I am using Keras.NET for model prediction, but I always get hard crashes in the Python interpreter that bring down my application, even though I have unhandled-exception callbacks registered.

These errors are nondeterministic, although they always occur eventually (the application has to process for a while before one happens). They show up in one of two ways: 0xC0000005 (access violation) or 0xC0000409 (stack buffer overrun). In both cases they occur within Numpy.np.vstack(...) or Numpy.np.reshape(...), and usually within PyTuple.ctor() (I'm guessing because numpy uses tuple objects a lot?). Sometimes the 0xC0000409 errors are access violations on unpacks.

From my understanding, Keras.NET also initializes Numpy.NET by allocating NDarrays as part of the model load (please correct me if I'm wrong there). My (simplified) initialization code is below.

Machine setup
Windows 10 64 bit
Python 3.7.9
Keras.NET 3.7.4.2
Numpy.Bare 3.7.1.4
Tensorflow.NET 0.20.0.0

From pip list
Keras 2.4.3
numpy 1.18.5
tensorflow 2.2.0

// main thread initialization
using (Py.GIL())
{
    model = Keras.Models.BaseModel.LoadModel(modelPath);
    PythonEngine.Exec(@"import sys; sys.setrecursionlimit(10000)");
}
PythonEngine.BeginAllowThreads();

// continue to kick off processing threads

The setrecursionlimit(10000) call was an attempt to fix the 0xC0000409 error, but it did not help. I increased the limit to 50000 and can still reliably hit the errors.

The gist of what is being done is below. Each NDarray holds roughly 2000 elements (e.g. shape 200x10), each data array holds roughly 100 NDarrays, and about 20 threads run concurrently. Each thread calls PredictAllData wrapped in Py.GIL().

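void PredictAllData(NDarray[] data, BaseModel model)
{
    // reshape() and expand_dims() return new arrays rather than modifying
    // the input in place, so their results must be kept
    var prepped = new NDarray[data.Length];
    for (int i = 0; i < data.Length; i++)
    {
        NDarray reshaped = data[i].reshape(new int[] { 10, 200 });
        prepped[i] = np.expand_dims(reshaped, 0); // add leading dimension for vstack
    }
    NDarray preppedData = np.vstack(prepped); // error here
    model.PredictOnBatch(preppedData); // or error here
}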

void WorkerThread()
{
    BaseModel threadSpecificModel;
    using (Py.GIL())
    {
        threadSpecificModel = Keras.Models.BaseModel.LoadModel(@"C:\path\to\model.savedmodel");
    }

    while (true)
    {
        using (Py.GIL())
        {
            NDarray[] nextData = LoadNextBatch();
            PredictAllData(nextData, threadSpecificModel);
        }
    }
}

Here are the relevant stack traces gathered from crash dumps.

C0000005 error stack traces

C# stack trace


DomainBoundILStubClass.IL_STUB_PInvoke(IntPtr, IntPtr, IntPtr)+84
[[InlinedCallFrame] (Python.Runtime.Runtime.PyObject_Call)] Python.Runtime.Runtime.PyObject_Call(IntPtr, IntPtr, IntPtr)
Python.Runtime.PyObject.Invoke(Python.Runtime.PyTuple, Python.Runtime.PyDict)+20
Python.Runtime.PyObject.InvokeMethod(System.String, Python.Runtime.PyTuple, Python.Runtime.PyDict)+25
Numpy.np.vstack(Numpy.NDarray[])+8c

... TRIMMED AT PredictAllData()

Full call stack

[0x7ffbce1ba43b] _multiarray_umath_cp37_win_amd64+3a43b
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1ba4b4] _multiarray_umath_cp37_win_amd64+3a4b4
[0x7ffbce1bae9c] _multiarray_umath_cp37_win_amd64+3ae9c
[0x7ffbce1c7c29] _multiarray_umath_cp37_win_amd64+47c29
[0x7ffbce1c811a] _multiarray_umath_cp37_win_amd64+4811a
[0x7ffbce1c62cf] _multiarray_umath_cp37_win_amd64+462cf
[0x7ffbce24c050] _multiarray_umath_cp37_win_amd64!PyInit__multiarray_umath+6b00
[0x7ffbcef8bdbd] python37!PyMethodDef_RawFastCallKeywords+34d
[0x7ffbcef8b8a3] python37!PyArg_UnpackStack+3d3
[0x7ffbcef9a48b] python37!PyEval_EvalFrameDefault+215b
[0x7ffbcef8c0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcef8b95c] python37!PyArg_UnpackStack+48c
[0x7ffbcef98a6e] python37!PyEval_EvalFrameDefault+73e
[0x7ffbcef8c0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcef8b37f] python37!PyFunction_FastCallDict+1cf
[0x7ffbcefabd57] python37!PyObject_Call+d3
[0x7ffbce1aec2b] _multiarray_umath_cp37_win_amd64+2ec2b
[0x7ffbcef8be14] python37!PyMethodDef_RawFastCallKeywords+3a4
[0x7ffbcef8b8a3] python37!PyArg_UnpackStack+3d3
[0x7ffbcef98a6e] python37!PyEval_EvalFrameDefault+73e
[0x7ffbcef8c0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcef8b37f] python37!PyFunction_FastCallDict+1cf
[0x7ffbcef9c180] python37!PyEval_EvalFrameDefault+3e50
[0x7ffbcef8b294] python37!PyFunction_FastCallDict+e4
[0x7ffbcefabd57] python37!PyObject_Call+d3
[0x7ffbce1aec2b] _multiarray_umath_cp37_win_amd64+2ec2b
[0x7ffbcef8be14] python37!PyMethodDef_RawFastCallKeywords+3a4
[0x7ffbcef8b8a3] python37!PyArg_UnpackStack+3d3
[0x7ffbcef98a6e] python37!PyEval_EvalFrameDefault+73e
[0x7ffbcef8c0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcef8b37f] python37!PyFunction_FastCallDict+1cf
[0x7ffbcefabd57] python37!PyObject_Call+d3
[0x7ffb92533074] DomainBoundILStubClass.IL_STUB_PInvoke(IntPtr, IntPtr, IntPtr)+84
[0x7ffb927b2e70] Python.Runtime.PyObject.Invoke(Python.Runtime.PyTuple, Python.Runtime.PyDict)+20
[0x7ffb927b2e05] Python.Runtime.PyObject.InvokeMethod(System.String, Python.Runtime.PyTuple, Python.Runtime.PyDict)+25
[0x7ffb92cbccec] Numpy.np.vstack(Numpy.NDarray[])+8c

... TRIMMED AT PredictAllData()

C0000409 error call stacks

C# call stack


DomainBoundILStubClass.IL_STUB_PInvoke(IntPtr, IntPtr, IntPtr)+84
[[InlinedCallFrame] (Python.Runtime.Runtime.PyObject_Call)] Python.Runtime.Runtime.PyObject_Call(IntPtr, IntPtr, IntPtr)
Python.Runtime.PyObject.Invoke(Python.Runtime.PyObject[])+3a
Python.Runtime.PyObject.InvokeMethod(System.String, Python.Runtime.PyObject[])+1e
Python.Runtime.PythonException..ctor()+2a6
Python.Runtime.PyObject.Invoke(Python.Runtime.PyTuple, Python.Runtime.PyDict)+61
Python.Runtime.PyObject.InvokeMethod(System.String, Python.Runtime.PyTuple, Python.Runtime.PyDict)+25
Numpy.np.reshape(Numpy.NDarray, Numpy.Models.Shape, System.String)+c7

... TRIMMED AT PredictAllData()

Full call stack

[0x7ffc089c286e] ucrtbase!abort+4e
[0x7ffbd0071bb7] python37!Py_RestoreSignals+14b
[0x7ffbd00712ba] python37!Py_FatalError+12
[0x7ffbcff53a55] python37!PyErr_NoMemory+15309
[0x7ffbcfefb6e7] python37!PyArg_UnpackStack+217
[0x7ffbcff08484] python37!PyEval_EvalFrameDefault+154
[0x7ffbcfefc0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcfefb95c] python37!PyArg_UnpackStack+48c
[0x7ffbcff08a6e] python37!PyEval_EvalFrameDefault+73e
[0x7ffbcfefb841] python37!PyArg_UnpackStack+371
[0x7ffbcff092c8] python37!PyEval_EvalFrameDefault+f98
[0x7ffbcfefc0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcfefb95c] python37!PyArg_UnpackStack+48c
[0x7ffbcff08a6e] python37!PyEval_EvalFrameDefault+73e
[0x7ffbcfefc0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcfefb95c] python37!PyArg_UnpackStack+48c
[0x7ffbcff08a6e] python37!PyEval_EvalFrameDefault+73e
[0x7ffbcfefc0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcfefb95c] python37!PyArg_UnpackStack+48c
[0x7ffbcff092c8] python37!PyEval_EvalFrameDefault+f98
[0x7ffbcfefb294] python37!PyFunction_FastCallDict+e4
[0x7ffbcff32d8b] python37!PyMapping_Items+33f
[0x7ffbcff05ca1] python37!PyObject_GenericGetAttrWithDict+931
[0x7ffbcff089f2] python37!PyEval_EvalFrameDefault+6c2
[0x7ffbcfefc0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcfefb95c] python37!PyArg_UnpackStack+48c
[0x7ffbcff0a48b] python37!PyEval_EvalFrameDefault+215b
[0x7ffbcfefc0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcfefb95c] python37!PyArg_UnpackStack+48c
[0x7ffbcff0a48b] python37!PyEval_EvalFrameDefault+215b
[0x7ffbcfefc0fc] python37!PyEval_EvalCodeWithName+1ac
[0x7ffbcfefb37f] python37!PyFunction_FastCallDict+1cf
[0x7ffbcff1bd57] python37!PyObject_Call+d3
[0x7ffb925872c4] DomainBoundILStubClass.IL_STUB_PInvoke(IntPtr, IntPtr, IntPtr)+84
[0x7ffb92ceb2ea] Python.Runtime.PyObject.Invoke(Python.Runtime.PyObject[])+3a
[0x7ffb92ceb26e] Python.Runtime.PyObject.InvokeMethod(System.String, Python.Runtime.PyObject[])+1e
[0x7ffb927e9456] Python.Runtime.PythonException..ctor()+2a6
[0x7ffb927ebd01] Python.Runtime.PyObject.Invoke(Python.Runtime.PyTuple, Python.Runtime.PyDict)+61
[0x7ffb927ebc55] Python.Runtime.PyObject.InvokeMethod(System.String, Python.Runtime.PyTuple, Python.Runtime.PyDict)+25
[0x7ffb92cefb97] Numpy.np.reshape(Numpy.NDarray, Numpy.Models.Shape, System.String)+c7


... TRIMMED AT PredictAllData()
henon commented

Hmm, I don't see what I can do to help here. If you manage to show that np.vstack really has a problem when used on its own (not within Keras.NET), and can give me a code snippet that reproduces the problem when I run it, then I could help.

But I won't bother with debugging entire applications, just short self-contained code snippets which deterministically reproduce a problem.

bbn-bert commented

I've been trying to create a small, reproducible code snippet that mimics the entire application's use of numpy, but it does not trigger the same problem. I'm still not sure where the issue stems from within numpy (if it is numpy at all), other than that Python always raises out-of-memory errors during numpy operations, typically np.vstack, np.reshape and np.expand_dims.

I've been trying to find documentation on whether using statements should be used with Numpy.NET, since all NDarray objects inherit from the disposable PyObject class. Are the two snippets below functionally equivalent? I understand that the using block disposes of myArray immediately, whereas in the second snippet it is left to the garbage collector, but I want to verify that using statements are the correct (and thread-safe) approach, and that omitting them does not cause memory leaks.

using (NDarray myArray = np.zeros(new int[] { 10, 20 }))
{
    // do something with myArray...
}

// or...

NDarray myArray = np.zeros(new int[] { 10, 20 });
// do something with myArray...
henon commented

They are equivalent with respect to the eventual disposal of unmanaged Python resources, but of course if you dispose manually with using, memory is released sooner and used more efficiently.

If you look at PyObject.cs in pythonnet you'll find the answers to your questions:

    /// <summary>Dispose Method</summary>
    /// <remarks>
    /// The Dispose method provides a way to explicitly release the
    /// Python object represented by a PyObject instance. It is a good
    /// idea to call Dispose on PyObjects that wrap resources that are
    /// limited or need strict lifetime control. Otherwise, references
    /// to Python objects will not be released until a managed garbage
    /// collection occurs.
    /// </remarks>
    protected virtual void Dispose(bool disposing)
    {
      if (this.obj == IntPtr.Zero)
        return;
      if (Python.Runtime.Runtime.Py_IsInitialized() == 0)
        throw new InvalidOperationException("Python runtime must be initialized");
      if (!Python.Runtime.Runtime.IsFinalizing)
      {
        if (Python.Runtime.Runtime.Refcount(this.obj) == 1L)
        {
          IntPtr ob;
          IntPtr val;
          IntPtr tb;
          Python.Runtime.Runtime.PyErr_Fetch(out ob, out val, out tb);
          try
          {
            Python.Runtime.Runtime.XDecref(this.obj);
            Python.Runtime.Runtime.CheckExceptionOccurred();
          }
          finally
          {
            Python.Runtime.Runtime.PyErr_Restore(ob, val, tb);
          }
        }
        else
          Python.Runtime.Runtime.XDecref(this.obj);
      }
      this.obj = IntPtr.Zero;
    }

    public void Dispose()
    {
      this.Dispose(true);
      GC.SuppressFinalize((object) this);
    }

    ~PyObject()
    {
      if (this.obj == IntPtr.Zero)
        return;
      Finalizer.Instance.AddFinalizedObject((IPyDisposable) this);
    }
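Note that the finalizer above does not decrement the reference count itself; it hands the object to `Finalizer.Instance`, which releases it later. A likely reason is that the .NET finalizer thread does not hold the GIL. The following pure C# sketch mimics that split between a deterministic `Dispose` path and a deferred finalizer queue. `NativeHandle`, `DeferredFinalizer` and `Collect` are hypothetical names invented for illustration and are not pythonnet API.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for PyObject wrapping a pointer to an unmanaged object.
// Illustrates the pattern only; none of these names are pythonnet API.
public sealed class NativeHandle : IDisposable
{
    private IntPtr _ptr;

    public NativeHandle(IntPtr ptr) => _ptr = ptr;

    // Deterministic path (e.g. a using block): release now on the calling thread
    // and tell the GC that the finalizer no longer needs to run.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // The point where pythonnet would XDecref the object (which requires the GIL).
    internal void Release()
    {
        if (_ptr == IntPtr.Zero)
            return;
        _ptr = IntPtr.Zero;
        Interlocked.Increment(ref DeferredFinalizer.Released);
    }

    // GC path: runs on the finalizer thread, which must not touch the unmanaged
    // object directly, so the handle is only queued for later release.
    ~NativeHandle()
    {
        if (_ptr != IntPtr.Zero)
            DeferredFinalizer.Pending.Enqueue(this);
    }
}

public static class DeferredFinalizer
{
    public static readonly ConcurrentQueue<NativeHandle> Pending = new ConcurrentQueue<NativeHandle>();
    public static int Released;

    // Drain the queue from a thread that is allowed to release
    // (for pythonnet, one holding the GIL). Returns how many handles were released.
    public static int Collect()
    {
        int count = 0;
        while (Pending.TryDequeue(out NativeHandle handle))
        {
            handle.Release();
            count++;
        }
        return count;
    }
}
```

With this split, an object that is never disposed stays alive until the GC finalizes it and the queue is later drained. Explicit disposal skips both steps.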

Even if you don't call Dispose, the object will eventually be released for you by pythonnet's Finalizer, but that might be a little late if you are short on unmanaged memory.
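Applied to the prediction loop from the original post, the intermediate arrays can be disposed as soon as they have been consumed instead of waiting for the finalizer. This is a minimal sketch. It uses only the Numpy.NET calls that already appear in this thread (`NDarray.reshape`, `np.expand_dims`, `np.vstack`). `Prep` and `StackBatch` are hypothetical names, and the method must be called while holding the GIL.

```csharp
using Numpy;

static class Prep
{
    // Hypothetical helper: builds the (N, 10, 200) batch like PredictAllData,
    // disposing every intermediate NDarray as soon as it is no longer needed.
    // Assumption: the caller holds the GIL (inside using (Py.GIL()) { ... }).
    public static NDarray StackBatch(NDarray[] data)
    {
        var expanded = new NDarray[data.Length];
        try
        {
            for (int i = 0; i < data.Length; i++)
            {
                // The view returned by expand_dims keeps its own Python reference
                // to the reshaped data, so disposing the wrapper here is safe.
                using (NDarray reshaped = data[i].reshape(new int[] { 10, 200 }))
                {
                    expanded[i] = np.expand_dims(reshaped, 0);
                }
            }
            // vstack copies into a new array; the caller owns and disposes it.
            return np.vstack(expanded);
        }
        finally
        {
            foreach (NDarray e in expanded)
                e?.Dispose();
        }
    }
}
```

A caller would then scope both the GIL and the batch, e.g. `using (Py.GIL()) using (NDarray batch = Prep.StackBatch(data)) { model.PredictOnBatch(batch); }`.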

As for your problems, you might try bypassing Numpy.NET entirely and calling pythonnet directly (which is easily possible, just not as comfortable) to see whether the problems persist. If they do, and this is what I suspect, then Numpy.NET is not responsible.
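Following that suggestion, the reshape, expand_dims and vstack sequence can be driven through pythonnet alone. This is a sketch under stated assumptions. It relies on pythonnet's `Py.GIL()`, `Py.Import`, `PyList` and `dynamic` member calls. `DirectNumpy` and `StackShape` are hypothetical names, and `np.arange(2000)` is a placeholder for the real data.

```csharp
using Python.Runtime;

static class DirectNumpy
{
    // Hypothetical helper: the PredictAllData preprocessing written against
    // pythonnet's dynamic API only, with no Numpy.NET wrapper in between.
    // Assumes numpy is importable by the embedded interpreter.
    // Returns str(result.shape), i.e. the shape (batchSize, 10, 200) as text.
    public static string StackShape(int batchSize)
    {
        using (Py.GIL())
        {
            dynamic np = Py.Import("numpy");
            using (var parts = new PyList())
            {
                for (int i = 0; i < batchSize; i++)
                {
                    // 2000 placeholder elements viewed as (10, 200),
                    // then given a leading axis of length 1
                    dynamic a = np.arange(2000).reshape(10, 200);
                    PyObject expanded = np.expand_dims(a, 0);
                    parts.Append(expanded);
                }
                using (PyObject stacked = np.vstack(parts))
                using (PyObject shape = stacked.GetAttr("shape"))
                {
                    return shape.ToString();
                }
            }
        }
    }
}
```

If the crashes still occur with this version under the same threading setup, Numpy.NET's wrappers can be ruled out as the cause.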

bbn-bert commented

Closing: the issue is specific to my use case and environment, not to Numpy.NET.