Storage Size Computations Could Detect Duplicates

Question


Opened this issue 4 years ago · 3 comments

insertinterestingnamehere commented 4 years ago

While it's generally an unreasonable pain to detect arbitrary overlap between NumPy array objects, the storage_size helper routine could at least check for duplicate objects. Do we want to do this?
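For illustration, here is a minimal sketch of the simplest form of this check. The actual storage_size signature isn't shown in this thread, so the version below is a hypothetical stand-in: it assumes the helper takes the arrays as positional arguments, sums their nbytes, and skips any object it has already seen (by identity).

```python
import numpy as np

def storage_size(*arrays):
    """Hypothetical storage_size: total bytes of the given arrays,
    counting each distinct array object only once (identity check)."""
    seen = set()
    total = 0
    for a in arrays:
        if id(a) in seen:
            continue  # the same object was passed more than once; skip it
        seen.add(id(a))
        total += a.nbytes
    return total

a = np.zeros(10, dtype=np.float64)  # 80 bytes
b = np.zeros(5, dtype=np.float64)   # 40 bytes
print(storage_size(a, a, b))  # 120: the second `a` is not counted
```

This only catches the exact same object passed twice; two different views of one buffer still count separately.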

Answer 1 · 2021-04-07T19:21:46.000Z

Yeah. I thought about how it could be hard, but I think we should at least check for obvious overlap, like the exact same data pointer and size. Also, I just realized I don't know whether storage_size returns the size of the viewed data or the size of the underlying buffer when passed a view. If it returns the size of the underlying buffer, then views and slices can be deduplicated fairly easily by just removing duplicate "base pointers". It's not full overlap detection, but I think it maps to a natural notion of how much space some data takes anyway.
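The two strategies above might look roughly like the sketch below. Both function names are hypothetical, not part of any existing API. The pointer-based version also includes strides in its key (slightly stricter than "same pointer and size"), because a[:5] and a[::2] of a 10-element array share a start pointer and nbytes but cover different bytes. The base-buffer version follows NumPy's .base links to the owning array and counts each owner's full buffer once, which corresponds to the "size of the underlying buffer" semantics.

```python
import numpy as np

def _owner(a):
    """Follow .base links to the ndarray that owns the memory.
    Stops at the first non-ndarray base (e.g. bytes from np.frombuffer)."""
    while isinstance(a.base, np.ndarray):
        a = a.base
    return a

def storage_size_by_pointer(*arrays):
    """Hypothetical: skip arrays whose (data pointer, nbytes, strides)
    key was already seen; other overlaps are still counted twice."""
    seen = set()
    total = 0
    for a in arrays:
        key = (a.__array_interface__["data"][0], a.nbytes, a.strides)
        if key not in seen:
            seen.add(key)
            total += a.nbytes
    return total

def storage_size_by_base(*arrays):
    """Hypothetical: count the full underlying buffer once per owner."""
    owners = {}
    for a in arrays:
        root = _owner(a)
        owners[id(root)] = root  # keyed by identity of the owning array
    return sum(r.nbytes for r in owners.values())

x = np.zeros(100)  # 800 bytes of float64
print(storage_size_by_pointer(x, x[:], x[:50]))  # 1200: x[:50] still counted
print(storage_size_by_base(x, x[:], x[:50]))     # 800: all share one buffer
```

Note the trade-off: under the base-buffer semantics, passing only x[:50] reports the whole 800-byte buffer, not the 400 bytes actually viewed.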

Answer 2 · 2021-04-09T23:48:38.000Z

On the other hand, not detecting duplicates gives the user an easy way to say, "I'm also going to allocate something like this other thing."

Answer 3 · 2021-04-09T23:54:54.000Z

They can always write storage_size(a) + storage_size(a), and I think that makes the intent much clearer than arbitrarily having multiple instances of the same array in the list.
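A small sketch of the distinction being argued, again using a hypothetical deduplicating storage_size that takes arrays as positional arguments: repeating an array in one call is ignored, while the explicit sum reserves space for a second copy.

```python
import numpy as np

def storage_size(*arrays):
    # Hypothetical deduplicating helper: each distinct object counted once
    unique = {id(a): a for a in arrays}
    return sum(a.nbytes for a in unique.values())

a = np.ones((4, 4), dtype=np.float32)  # 16 elements * 4 bytes = 64 bytes

print(storage_size(a, a))                  # 64: the duplicate is ignored
print(storage_size(a) + storage_size(a))   # 128: explicitly room for two
```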