Request tensor factory methods accepting Span<T>
dje-dev opened this issue · 9 comments
Currently it is apparently required to provide an exactly sized array to the tensor static factory methods.
In the case where (for example) tensors are being created with variable batch sizes, the size of the input will vary. With the current design, it is necessary to allocate a new array appropriate for each batch (and one cannot use a pooled object which may be oversized).
Instead, callers may prefer to allocate a large fixed managed array and then only pass the appropriately sized Span over this array to the factor method, thereby saving an object allocation on every call.
Example given below of the suggested API enhancement (to be replicated across the various data types)
EXISTING: public static Tensor tensor(short[] rawArray, ...
DESIRED: public static Tensor tensor(ReadOnlySpan<Span>
rawSpan, ...
In order to create a tensor from the span, we would have to be able to pin it in memory, and GCHandle.Alloc() won't accept a Span, as far as can tell. We can add the overloads as a convenience, but I believe the allocation (and copy) is going to have to happen.
Good point, Span doesn't quite work.
Instead one or both of these approaches could seemingly avoid one full memory copy of the full tensor to be loaded onto GPU:
- Keep current signature but relax to accept oversized T[] relative to the provided shape, instead of throwing System.ArgumentException: 'mismatched total size creating a tensor from an array'
- Follow the ONNX pattern of supporting this over Memory (which has a Pin method):
- public static OrtValue CreateTensorValueFromMemory(OrtMemoryInfo memoryInfo, Memory memory, long[] shape) where T : unmanaged
We can easily accommodate the first method by just relaxing the constraints. Memory can be pinned, but I don't know how to get a pointer to the underlying array without copying, which is needed in order to create a native tensor, so that requires more thought.
Addressed in PR #1329
I would suggest an extra option to control this behavior, instead of cutting it implicitly.
Meanwhile just cutting the "tail" may be far from expectation I think. To let it make sense, at least an option for start indices is required (since the issue is talking about "spans").
I know it's maybe... impossible currently. So actually I suggest to leave this issue "not planned"/"not current" and revert the changes.
Hmmm... I really liked the idea of allowing the underlying buffer to be bigger than needed. Low cost, very low risk.
To handle the starting offset > 0, you could just slice it without any copying: t[offset..]
after you create it. It allocates a new managed Tensor, but does not copy the buffer.
Ahhh... I got that. It do solve the original case of this issue and could be useful in some other similar situations.
I figured out how to get the address out of a Memory<T>
. Should have read the docs better... :-) I'm reopening and will add some Memory overloads. Probably not as many different variants as for T[]
Okay, so both of @dje-dev suggestions have been implemented and merged. Thanks for the ideas!