[Feature Request] memoryview builtin and support for python buffer protocol
Review Mojo's priorities
- I have read the roadmap and priorities and I believe this request falls within the priorities.
What is your request?
This enhancement request is to add support for Python's `memoryview` builtin and for the Python buffer protocol. Here are some ideas about what kinds of tasks and what level of effort might be involved:
- Add a new Mojo trait (`Bufferable`?) with the dunder methods `__buffer__` and `__release_buffer__`.
- Add support for Python's builtin `memoryview()` on Mojo structs. `__buffer__` returns a `memoryview`, so this has to be built into Mojo (not a Python module import).
- Add support for the C data interface defined by the Python buffer protocol (https://docs.python.org/3/c-api/buffer.html). This would allow Mojo structs that implement the C data interface as `Py_buffer` to be accessed from Python. Or maybe they could be wrapped in a PythonObject and returned as a `memoryview`?
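For reference, the two dunders named in the first bullet already exist on the Python side (PEP 688, Python 3.12+). Below is a minimal plain-Python sketch of a producer that a Mojo `Bufferable` struct might mirror; `IntBuffer` and its `exports` counter are hypothetical names used only for illustration, and nothing here is Mojo API.

```python
import sys
from array import array


class IntBuffer:
    """Hypothetical stand-in for a Mojo struct conforming to a `Bufferable` trait."""

    def __init__(self, values):
        self._data = array("i", values)  # storage owned by the producer
        self.exports = 0                 # number of views currently handed out

    def __buffer__(self, flags):
        # PEP 688 (Python 3.12+): memoryview(obj) calls this method.
        # We return a view over our own storage, so no data is copied.
        # `flags` (the requested buffer features) is ignored in this sketch.
        self.exports += 1
        return memoryview(self._data)

    def __release_buffer__(self, view):
        # Called with the memoryview returned by __buffer__ once the
        # consumer releases its view.
        self.exports -= 1
        view.release()


if sys.version_info >= (3, 12):
    buf = IntBuffer([1, 2, 3, 4, 5])
    with memoryview(buf) as view:
        print(view.tolist(), buf.exports)  # [1, 2, 3, 4, 5] 1
    print(buf.exports)                     # 0
```

Because `__buffer__` hands out a view of `_data` rather than a copy, any consumer of the resulting `memoryview` reads the producer's memory directly; `__release_buffer__` is the hook where a producer could unpin or free resources.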
What is your motivation for this change?
Currently Mojo 0.6 has poor (nonexistent?) support for zero-copy shared memory buffers with Python.
For example, the Ray Tracing notebook in Mojo's documentation copies raster imagery into a NumPy array using MLIR ops. Not only is this an unnecessary memory copy, it is also verbose, undocumented, and not Pythonic. See `def to_numpy_image(self) -> PythonObject:` in the source notebook.
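By contrast, a zero-copy export hands the consumer a view of the existing pixel memory instead of copying it element by element. A stdlib-only sketch, assuming the image is stored as a flat float32 RGB buffer (the `pixels` array and the dimensions are hypothetical, not taken from the notebook):

```python
from array import array

HEIGHT, WIDTH, CHANNELS = 2, 3, 3

# Hypothetical flat float32 RGB storage standing in for the ray tracer's image.
pixels = array("f", [0.0] * (HEIGHT * WIDTH * CHANNELS))

# Zero-copy path: reinterpret the same memory as a HEIGHT x WIDTH x CHANNELS view.
# cast() needs a byte format on one side of a format change, hence the 'B' step.
image = memoryview(pixels).cast("B").cast("f", shape=[HEIGHT, WIDTH, CHANNELS])

# Copying path: roughly what an element-by-element export does.
copied = pixels.tolist()

pixels[0] = 1.0  # write to the original storage after both exports exist

print(image[0, 0, 0])             # 1.0 (the view shares memory with `pixels`)
print(copied[0])                  # 0.0 (the copy is stale)
print(image.shape, image.nbytes)  # (2, 3, 3) 72
```

A NumPy consumer such as `np.asarray(image)` should wrap the same memory rather than copy it, which is the behaviour a Mojo struct supporting the buffer protocol could offer instead of the copy in `to_numpy_image`.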
Mojo should enable and encourage interop with existing scientific computing packages in the most efficient manner, for example through the Apache Arrow format.
As the Arrow spec puts it:
> The Arrow C data interface is inspired by the Python buffer protocol, which has proven immensely successful in allowing various Python libraries exchange numerical data with no knowledge of each other and near-zero adaptation cost.
This enhancement would also lay the groundwork for supporting the Python array API standard.
Any other details?
Related Discussions/Issues:
Reference PEPs:
As a struct, it should be named `MemoryView`. Please be consistent and avoid Python's mess in naming!
Good suggestion! The naming is a bit confusing: at the C level there is the `Py_buffer` struct (and the `PyMemoryView_*` API), while in Python land `memoryview` is both the type and its constructor, `memoryview()`. We definitely would not want to add new names or concepts if that can be avoided.
Also, I thought this Python example with comments might help illustrate the idea a little more:
```python
# made up example (chatbot)
import array
arr = array.array('i', [1, 2, 3, 4, 5])
mem_view = memoryview(arr)
# Access properties of the memoryview
print(mem_view.nbytes)
print(mem_view.itemsize)
# Indexing and slicing like NumPy array
print(mem_view[0])
print(mem_view[-1])
print(mem_view[1:3])
# Iterate through the memoryview
for num in mem_view:
    print(num)
# Get a NumPy array from the memoryview
import numpy as np
num_arr = np.frombuffer(mem_view, dtype=np.int32)
print(num_arr)
```
Output:
```text
20
4
1
5
<memory at 0x1011590c0>
1
2
3
4
5
[1 2 3 4 5]
```
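To tie the example above back to the naming discussion: the attributes of a Python `memoryview` correspond to fields of the C `Py_buffer` struct, so a Mojo `MemoryView` would presumably need to carry the same metadata. A small stdlib-only illustration (the 2 x 3 reshape is arbitrary):

```python
from array import array

arr = array("i", [1, 2, 3, 4, 5, 6])

# Reinterpret the same memory as a 2 x 3 row-major matrix (no copy).
# cast() needs a byte format on one side of a format change, hence 'B' first.
view = memoryview(arr).cast("B").cast("i", shape=[2, 3])

# Each memoryview attribute mirrors a field of the C-level Py_buffer struct.
print(view.obj is arr)  # Py_buffer.obj: the exporting object
print(view.nbytes)      # Py_buffer.len: total size in bytes (6 * itemsize)
print(view.itemsize)    # Py_buffer.itemsize: size of one C int, usually 4
print(view.format)      # Py_buffer.format: struct-module code, here 'i'
print(view.ndim)        # Py_buffer.ndim: 2
print(view.shape)       # Py_buffer.shape: (2, 3)
print(view.strides)     # Py_buffer.strides: (3 * itemsize, itemsize)
print(view.readonly)    # Py_buffer.readonly: False, array.array is writable
print(view[1, 2])       # multi-dimensional indexing reads the element 6
```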
I think this enhancement would open up numerous use cases like:
- Mojo <-> C ABI
- Mojo <-> Python modules/packages
- Mojo <-> Python <-> C/Rust/Fortran etc backed packages
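For the C ABI case, the underlying pattern already works from CPython today: a buffer-protocol object can be handed to C as a raw pointer with no copy. A stdlib-only sketch using `ctypes`, where C's `memset` stands in for any C function taking a pointer (the Mojo side is not shown, and how Mojo would expose this is an open question):

```python
import ctypes
from array import array

arr = array("i", [1, 2, 3, 4, 5])

# from_buffer() builds a ctypes int[5] over arr's memory without copying,
# i.e. the `int *` a C function would receive. It needs a writable buffer.
c_ints = (ctypes.c_int * len(arr)).from_buffer(arr)

address, length = arr.buffer_info()
print(ctypes.addressof(c_ints) == address, length)  # True 5

c_ints[0] = 42       # write through the C-typed view
print(arr.tolist())  # [42, 2, 3, 4, 5]

# Call C's memset (exposed by ctypes) on the same memory block.
ctypes.memset(c_ints, 0, ctypes.sizeof(c_ints))
print(arr.tolist())  # [0, 0, 0, 0, 0]
```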
I am aware that I am being quite pedantic, but if Mojo implements this, IMHO it would be better to sacrifice one more character and name this constructor `memory_view`. I don't like Python's style of blending words together without any separator. Keeping names strictly synchronized with Python is also not ideal, because it would require following Python's behaviour exactly, which may be painful in some cases.
If Mojo becomes Python++ rather than a compiled copy of Python, it will gain its own identity, and small improvements like this will be very noticeable.
Linking to a neat related project here: an Arrow implementation in Mojo, https://github.com/kszucs/firebolt
It unlocks the case where Mojo is the consumer of Arrow data structures.