Would like memoryView access of a buffer that is mapped for writing.

Question

Would like memoryView access of a buffer that is mapped for writing.

fyellin opened this issue 10 months ago · 7 comments

While wgpu-py provides several methods for writing to a mapped-buffer, it seems to avoid providing a direct memoryview of the buffer. The method get_mapped_range() has specifically not been implemented.

There are times when it would be useful to be able to write array=np.from_buffer(...) on a buffer that's mapped at creation. This gives me the full power of numpy to create the contents of the buffer without the costs of copying.

Experimental support.
Come up with API.
Implement API.
Maybe rename copy arg in buffer.read_mapped() to something like "return_mapped_array"?
In docs, mention the risks, that these mapped arrays are potentially slow, and recommend adding a comment.

Answer 1 · 2024-06-04T09:06:21.000Z

We've deliberately not made this possible because when the buffer is unmapped that memoryview becomes invalid, but there is no way in Python to stop the user from using it. When they do, segfaults can happen.

I also get the wish to avoid data copies though ... I'm working on benchmark related to buffer updates. I'll also look into how a properly mapped buffer would improve performance ...

Answer 2 · 2024-06-11T12:34:10.000Z

I've added experimental support in #522. That way people can experiment with this feature and see how much it really makes things faster.

I've also done a few benchmarks (part of a larger set of benchmarks for buffer uploads). From what I can see the applications where this approach benefits performance is slim. https://github.com/pygfx/pygfx-benchmarks/blob/main/results/bm_wgpu_buffer_upload.md#setting-data-to-a-computation-result

If you know other use-cases that benefit this API, please let us know!

We can leave this issue open to further discuss this API and its use.

Answer 3 · 2024-06-11T17:19:37.000Z

I will look at this and see if I find anything.

I did notice that this won't be the only place that the API is exposing the user to an unsafe memory view. Even in the current system, I can write:

    array = np.frombuffer(buffer.readmapped(copy=False))
    buffer.unmap()

I do have a use case where I'm reading a large binary STL file and putting the vectors and normals into np arrays which are then copied to a buffer. I can see if there is any speed improvement writing directly to the buffer.

Answer 4 · 2024-06-11T19:33:27.000Z

Here is at least one use case where there is an improvement. I create a fake STL file of 20 million triangles. I then read in the STL file and convert it into vertex buffers for the triangles and the normals.

In STLModel.from_file, I assemble the information into a numpy array, and then transfer that to a buffer using create_buffer_with_data.

In STLModel.from_file_alt, I generate two buffers mapped_at_creation=True, get their memory views, and wrap those memory views into a numpy array. I then put the information directly into those numpy arrays.

My times were an average of 1.94s for the original code and 1.53s for the "alternate" code.

I did have to make some small changes to the alt version. Originally, I did:

     vertices[:] = info['vertices']  # copy normal data into buffer
     model_scale = np.max(np.abs(verticies))
     vertices *= scale / model_scale

making the code look like the original code. This turned out to be incredibly slow. Reading from and writing to the memory-mapped buffer is slow. Finding the maximum and scaling needs to happen before the single write.

File has been renamed test_experimental.txt because Git doesn't let me upload a .py file.
test_experimental.txt

Answer 5 · 2024-06-12T07:05:35.000Z

did notice that this won't be the only place that the API is exposing the user to an unsafe memory view.

Indeed there is buffer.readmapped(copy=False).

Reading from and writing to the memory-mapped buffer is slow.

Good point! Would be worth mentioning this in the docs too. I started a little checklist in the top post.

What kind of GPU are you using? In the benchmarks I did, I saw that moving data is slower for discrete GPU's (integrated GPU's being faster because of the shared memory).

Answer 6 · 2024-06-12T17:33:07.000Z

I'm running Python 3.12 on Apple Metal. Chipset Model: Apple M1 Pro Type: GPU Bus: Built-In Total Number of Cores: 16 Vendor: Apple (0x106b) Metal Support: Metal 3 If you decide not to include this, it's really not that big a deal. I don't think I'm going to have to deal with STL models that large, and for smaller sizes, the difference really was negligible. It just felt strange to me that you were worried about safety with reading, but not with writing.

…

Message ID: ***@***.***>

Answer 7 · 2024-06-13T07:04:46.000Z

It just felt strange to me that you were worried about safety with reading, but not with writing.

For reading it was easy to add an argument that must be set explicitly. But in hindsight, I think I'd like to rename that copy arg to something more ominous 😄 to make it clear that the code is entering a danger zone.

Glad to hear that it does not matter much for you. But I have a feeling that sooner or later someone will care. I'm now more open to providing mapped arrays, as long as its obvious from the code that a mapped buffer is being used.