dmlc/dlpack

Import DLPack tensors directly into NumPy (without going via PyTorch or TF)

vadimkantorov opened this issue · 9 comments

I made an experimental wrapper: https://github.com/vadimkantorov/pydlpack/blob/master/dlpack.py#L107

The most difficult part is managing memory / capsules. Currently it uses something like move semantics (and deallocation is done in C). I'm sure you'd be able to do it better.

It would be a nice illustration in addition to the existing example of borrowing from NumPy.

A more complete use case of mine: https://github.com/vadimkantorov/readaudio

I guess that for proper ref-counting-like semantics (so that NumPy doesn't call the deleter too early while other array views still exist), something like weakref would be needed: https://stackoverflow.com/questions/37988849/safer-way-to-expose-a-c-allocated-memory-buffer-using-numpy-ctypes, but I'm not completely sure.

Zero-copy borrowing from NumPy is not difficult; it does not need weakref or a capsule. I have some examples here: https://github.com/dmlc/dlpack/blob/master/apps/from_numpy/main.py.

szha commented

I think that for zero-copy into NumPy, if the original array doesn't give up ownership of the data buffer, we do need to make sure that NumPy doesn't release the buffer. I thought this was something the OWNDATA flag in NumPy arrays already deals with (judging from the name), though I haven't looked into the details yet.
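For reference, a small sketch of what the flag reports. It only illustrates plain NumPy behavior, not any particular DLPack wrapper: OWNDATA says whether NumPy itself allocated (and will free) the buffer, while the `base` attribute is what keeps a borrowed buffer's owner alive.

```python
import numpy as np

owner = np.arange(6, dtype=np.int32)                    # NumPy allocated this buffer
view = owner[::2]                                       # borrows from `owner`
external = np.frombuffer(bytearray(8), dtype=np.uint8)  # borrows from a bytearray

for name, arr in [("owner", owner), ("view", view), ("external", external)]:
    print(name, "OWNDATA =", arr.flags.owndata, "| has base =", arr.base is not None)
```

Only `owner` reports OWNDATA as true; the other two rely on their `base` reference, so the flag alone does not decide when a foreign deleter may run.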

Yeah. It shouldn't release the buffer, and it shouldn't call the deleter either while other arrays still refer to it (ideally it should also work when torch.from_numpy is called on such a NumPy array).

A quick heads-up: we prototyped a simple pure-Python library that allows zero-copy exchange between DLPack-compatible array APIs and NumPy ndarrays: https://github.com/jwfromm/numpy_dlpack. Lifetime and ownership are properly taken care of, unless we missed something.

Do you guys think we should contribute the implementation to this repo?

Thanks for sharing @junrushao1994.

> Do you guys think we should contribute the implementation to this repo?

I'm not sure that will be helpful in the long run, or whether it's worth spending time reviewing that all the corner cases are handled correctly (from a quick scan of your code, I'd say there'll be a few things it doesn't handle). We just need to finish numpy/numpy#19083, which implements DLPack support in NumPy itself.

Thank you @rgommers! Yeah I believe numpy/numpy#19083 is definitely a nicer way to allow numpy to interact with DLPack natively, and of course in the long run we should go all in with the numpy native approach this PR brings :-)

Alternatively, this repo could potentially be a pure python-based example of exchanging data with any numpy-like arrays using DLPack in a non-intrusive way.

Here is my proposal:

  • Contribute dlpack.py to python/dlpack/dlpack.py, so that it can be shared across the codebase
  • Contribute from_numpy.py and to_numpy.py to python/dlpack/, so that they can help when NumPy's DLPack interface doesn't exist
  • Complete the scripts by detecting whether NumPy's ndarray has the __dlpack__ or from_dlpack APIs. If so, go with the NumPy native APIs; otherwise, fall back to this non-intrusive approach
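The detection step in the last bullet could look roughly like this. It is a sketch only: `numpy_from_dlpack` and its `fallback` parameter are hypothetical names, and the fallback stands in for whatever ctypes-based converter the repo would provide. The function name is probed under both `from_dlpack` and `_from_dlpack`, since NumPy 1.22 shipped it with the underscore.

```python
import numpy as np


def numpy_from_dlpack(obj, fallback=None):
    """Import `obj` into NumPy, preferring NumPy's native DLPack support."""
    # Newer NumPy exposes from_dlpack; 1.22 exposed it as _from_dlpack.
    native = getattr(np, "from_dlpack", None) or getattr(np, "_from_dlpack", None)
    if native is not None and hasattr(obj, "__dlpack__"):
        return native(obj)
    if fallback is None:
        raise TypeError("no native DLPack support and no fallback converter given")
    # Hypothetical non-intrusive path, e.g. a ctypes-based to_numpy.
    return fallback(obj)


source = np.arange(4, dtype=np.float32)
imported = numpy_from_dlpack(source)
print(np.shares_memory(source, imported))
```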

Hmm. I now see that this ctypes example is committed! Good news. One difference from my https://github.com/vadimkantorov/pydlpack/blob/master/dlpack.py#L107 is that when I created the array_interface from a DLPack tensor, I also arranged for the wrapped dl_managed_tensor.deleter to be called once the NumPy array is destroyed. This piece seems to be missing from to_numpy.py?

Am seeing this dlpack mention in the NumPy 1.22.0 release notes:

Add NEP 47-compatible dlpack support

Add a ndarray.__dlpack__() method which returns a dlpack C structure wrapped in a PyCapsule. Also add a np._from_dlpack(obj) function, where obj supports __dlpack__(), and returns an ndarray.

(gh-19083)
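A minimal look at the producer side described in that note, assuming NumPy 1.22 or later and a CPU array; the variable names are only for illustration.

```python
import numpy as np

x = np.arange(3, dtype=np.float32)

capsule = x.__dlpack__()                        # DLPack struct wrapped in a PyCapsule
device_type, device_id = x.__dlpack_device__()  # DLPack device of the array

print(type(capsule).__name__)
print(int(device_type), device_id)
```

For a CPU array the device type is DLPack's kDLCPU, which has the value 1. A capsule that is never consumed, as here, frees its tensor struct when the capsule itself is collected.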

Given NumPy now supports this, should we close?