dotnet/TorchSharp

Request tensor factory methods accepting Span<T>

dje-dev opened this issue · 9 comments

Currently, the static tensor factory methods apparently require an array whose length exactly matches the requested shape.

In the case where (for example) tensors are being created with variable batch sizes, the size of the input will vary. With the current design, it is necessary to allocate a new, exactly sized array for each batch (and one cannot use a pooled object, which may be oversized).

Instead, callers may prefer to allocate a single large managed array and then pass an appropriately sized Span over that array to the factory method, thereby saving an object allocation on every call.

Example given below of the suggested API enhancement (to be replicated across the various data types):
EXISTING: public static Tensor tensor(short[] rawArray, ...
DESIRED: public static Tensor tensor(ReadOnlySpan<short> rawSpan, ...
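
For illustration, a sketch of how the requested overload might be used with a pooled, oversized buffer (the Span-accepting tensor overload is the proposal, not an existing TorchSharp API, and maxBatch, batchSize, and features are made-up values):

    using System;
    using System.Buffers;
    using static TorchSharp.torch;

    const int maxBatch = 512, batchSize = 37, features = 128;

    // One large pooled buffer, reused across batches of varying size.
    short[] pooled = ArrayPool<short>.Shared.Rent(maxBatch * features);

    // ... fill the first batchSize * features elements for this batch ...

    // Proposed API: pass only the used prefix; no per-batch allocation.
    ReadOnlySpan<short> view = pooled.AsSpan(0, batchSize * features);
    Tensor t = tensor(view, new long[] { batchSize, features });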

In order to create a tensor from the span, we would have to be able to pin it in memory, and GCHandle.Alloc() won't accept a Span, as far as I can tell. We can add the overloads as a convenience, but I believe the allocation (and copy) is going to have to happen.
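
To spell out the constraint, a minimal sketch (plain .NET, not TorchSharp code): GCHandle.Alloc takes an object, and Span<T> is a ref struct that cannot be boxed, while fixed pins the memory only for the enclosing scope, which is shorter than the lifetime a native tensor needs:

    using System;
    using System.Runtime.InteropServices;

    Span<short> span = stackalloc short[16];

    // Does not compile: Span<short> is a ref struct and cannot be
    // converted to the object parameter GCHandle.Alloc expects.
    // GCHandle h = GCHandle.Alloc(span, GCHandleType.Pinned);

    unsafe
    {
        fixed (short* p = span)
        {
            // p is valid only inside this block; the pin is released on
            // exit, so the pointer cannot outlive the scope the way a
            // native tensor's data pointer must.
        }
    }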

Good point, Span doesn't quite work.

Instead, one or both of the following approaches could seemingly avoid one full memory copy of the tensor data before it is loaded onto the GPU:

  1. Keep the current signature but relax it to accept an oversized T[] relative to the provided shape, instead of throwing System.ArgumentException: 'mismatched total size creating a tensor from an array'
  2. Follow the ONNX pattern of supporting this over Memory<T> (which has a Pin method), as in the sketch after this list:
    • public static OrtValue CreateTensorValueFromMemory<T>(OrtMemoryInfo memoryInfo, Memory<T> memory, long[] shape) where T : unmanaged
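
For reference, the ONNX pattern in use, a sketch assuming the Microsoft.ML.OnnxRuntime package (the buffer is pinned for the lifetime of the OrtValue rather than copied; the element counts are made up):

    using System;
    using Microsoft.ML.OnnxRuntime;

    const int maxBatch = 512, batchSize = 37, features = 128;

    float[] pooled = new float[maxBatch * features];
    Memory<float> used = pooled.AsMemory(0, batchSize * features);

    // ONNX Runtime pins `used` internally instead of copying it into a
    // freshly allocated native buffer; disposal releases the pin.
    using OrtValue value = OrtValue.CreateTensorValueFromMemory(
        OrtMemoryInfo.DefaultInstance, used, new long[] { batchSize, features });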

We can easily accommodate the first method by just relaxing the constraints. Memory<T> can be pinned, but I don't know how to get a pointer to the underlying array without copying, which is needed in order to create a native tensor, so that requires more thought.

Addressed in PR #1329

@NiklasGustafsson

I would suggest an extra option to control this behavior, instead of truncating implicitly.

Meanwhile, just cutting off the "tail" may be far from what users expect, I think. For it to make sense, at least an option for a start index is required (since the issue is talking about "spans").

I know it's maybe... impossible currently. So I actually suggest leaving this issue as "not planned"/"not current" and reverting the changes.

Hmmm... I really liked the idea of allowing the underlying buffer to be bigger than needed. Low cost, very low risk.

To handle a starting offset > 0, you could just slice without any copying: t[offset..] after you create the tensor. That allocates a new managed Tensor, but does not copy the buffer.
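
A sketch of that combination, assuming the relaxed size check from PR #1329 and a tensor(float[], long[]) factory overload (the element counts are made up; narrow and range indexing are existing Tensor operations):

    using static TorchSharp.torch;

    // buffer.Length (1024) exceeds 8 * 16 = 128; with the relaxed check,
    // only the first 128 elements back the tensor.
    float[] buffer = new float[1024];
    Tensor t = tensor(buffer, new long[] { 8, 16 });

    // Views over the same storage; neither slice below copies the buffer.
    Tensor tail = t[2..];            // drop the first two rows
    Tensor head = t.narrow(0, 0, 4); // keep the first four rows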

Ahhh... I get it. It does solve the original case of this issue and could be useful in some other similar situations.

I figured out how to get the address out of a Memory<T>. Should have read the docs better... :-) I'm reopening and will add some Memory<T> overloads. Probably not as many different variants as for T[].
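
For anyone reading along, the missing piece is standard .NET rather than anything TorchSharp-specific: Memory<T>.Pin() returns a MemoryHandle whose Pointer remains valid until the handle is disposed, which matches the lifetime a native tensor needs:

    using System;
    using System.Buffers;

    Memory<float> memory = new float[256].AsMemory();

    unsafe
    {
        // Pin() fixes the underlying array in place until the handle is
        // disposed; Pointer is the stable native address of element 0.
        using MemoryHandle handle = memory.Pin();
        void* data = handle.Pointer;
        // ... hand `data` to native code, keeping `handle` alive until
        // the native side no longer needs the memory ...
    }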

Okay, so both of @dje-dev's suggestions have been implemented and merged. Thanks for the ideas!