dastrobu/NdArray

Subscript of NdArray should be of type NdArray

Closed this issue · 5 comments

As it is, it's not possible to copy the contents of an NdArray to a slice of another array using the subscript, since the subscript expects an NdArraySlice.

For example, the following pattern cannot be implemented using the subscript:

// Say you want to perform some computation a number of times and store
// all the results in a single array.

let n = 10
let array = NdArray<Float>.empty(shape: [n, 128])

for i in 0 ..< n {
    // The result of the computation.
    let result = NdArray<Float>.zeros(128)
    
    // ERROR: Cannot assign value of type 'NdArray<Float>' to type 'NdArraySlice<Float>'
    array[i] = result
    
    // This works on the other hand:
    result.copyTo(array[i])
}

If the subscripts had a public type of NdArray (while still returning an NdArraySlice) this issue would be fixed, but now the API wouldn't explicitly convey that the subscript returns a slice as opposed to a standalone array.

Also, I tried changing the types and found that apply.swift and possibly other files depend on the subscripts returning a NdArraySlice, so it's not entirely straightforward what the solution should be.

Just some quick thoughts, the ERROR: Cannot assign value of type 'NdArray<Float>' to type 'NdArraySlice<Float>' could be easily fixed by

array[i] = result[...]

The reason an array slice needs to be created is that you cannot implement an assignment subscript only. Meaning I would love to implement

    public subscript(i: Int) -> NdArray<T> {
        set {
            newValue.copyTo(NdArray(self[i]))
        }
    }

But this gives error: subscript with a setter must also have a getter. That is the reason why an array slice must always be created when dealing with subscripts.
The reason that subscript must return a different type than NdArray is the following.
When dealing with multidimensional arrays one wants to subscript dimensions in different forms. E.g.

let b = A[...][0][0..<5]

My first thought was to implement a subscript with varargs which would lead to a syntax like

let b = A[..., 0, 0...<5]

Here each parameter of the can be either be UnboundedRange, Int, PartialRangeUpTo and so on. This approach does not fit very well into the type system. Also it would not be clear how to add strides. Remember that python has an explicit syntax for it, e.g. A[::2], while Swift has not.
That is why I decided to have subscripts that only reduce the dimension of the array by one, allowing to overload the subscript for each type, UnboundedRange, Int, PartialRangeUpTo and so on. Also strides can be added easily as optional second parameter, e.g. A[...,2].
This approach, however, requires to save how many dimensions have been subscripted in the ArraySlice object (see sliced). The example reduces to:

let b = A[...][0][0..<5]

let A1 = A[...] // sliced: 1
let A2 = A1[0] // sliced: 2
let A3 = A2[0..<5] // sliced: 3

This approach has two syntactical drawbacks one has to get used to:

  1. Assignments must always be done via slices (even if it is just [...]).
 b[...] = a[...]
  1. After slicing something, one needs to convert to an array explicitly to work with it like with a normal array.
let b = NdArray(a[0]) // shallow copy of first element of a

Hope that helps a little, feel free to share your thoughts on how to improve the subscript system.

Just a quick reply to clarify the change that I was proposing to the subscript signature:

// Declare the return type as NdArray<T>.
public subscript(i: Int) -> NdArray<T> {
    // Everything else unchanged, the type returned at runtime is still NdArraySlice<T>.
    get {
        assert(!isEmpty)
        var startIndex = [Int](repeating: 0, count: ndim)
        startIndex[0] = i
        // here we reduce the shape, hence slice = 0
        let slice = NdArraySlice(self, startIndex: startIndex, sliced: 0)
        // drop leading shape 1
        let shape = [Int](slice.shape[1...])
        if shape.isEmpty {
            slice.shape = [1]
            slice.strides = [1]
            slice.count = 1
        } else {
            slice.shape = shape
            slice.strides = Array(slice.strides[1...])
            slice.count = slice.len
        }
        return slice
    }
    set {
        newValue.copyTo(self[i])
    }
}

Unless I missed something, it seems that NdArraySlice doesn't expose new API that would not be accessible by declaring the subscript as NdArray. The change would surely hide the fact that you're getting a slice, which doesn't own the underlying data, but ownership is not entirely clear from the type anyway: any NdArray could both own its data or not.

In any case, thanks for your answer. I now have a better idea of the design constraints you faced when designing this API. I'll give some more thoughts to the problem space, since the forced allocations of slices / arrays you describe have some noticeable runtime costs that would be best avoided (This would be the second issue that I wanted to share with you, I need to put together a demo project that shows the problem in Instruments).

Although it is a bit more than the little change in the NdArray class, it can be easily doable.
I had some doubts if this is a good idea, since it will be less clear when the slicing should be terminated. Meaning

let row0 = A[0]
A = A[...]
let col0 = A[0]

is very bad to read (notice that A = A[...] assigns a slice to the original array which just has the effect that the next subscript goes on the next dimension). On the other hand this is also possible when the subscript returns the NdArraySlice being a subclass of NdArray. The only thing I am not super happy with is that starting and terminating the slicing is not very obvious and it gets even more difficult to recognise when the subscript returns an NdArray itself. For assignment however I totally agree that it is a good idea.
I quickly created a feature branch relax-subscript-type so feel free to play around with it. I will think a bit about it before merging it.

merged #14 for now, since it is a non-breaking change. Might change subscript behaviour in the future to be more expressive in a breaking change.

NdArraySlice will be phased out in scope of #41.