ThoughtWorksInc/Compute.scala

Add Tensor#toSeq and Tensor#toArray methods

Closed this issue · 26 comments

Atry commented

We need Tensor#toSeq and Tensor#toArray methods for creating n-dimensional scala.collection.Seq or scala.Array, as the reverse conversion of Tensor.apply.

Atry commented

These methods can be implemented using the existing Tensor#flatArray and Tensor#shape.

Hey! Taking a look at this as a first issue.

For the toArray method, we could do something like:

def f(flatArray:Array[A], shape:Array[Int]):Array[B]= {
    // if desired shape is 1d, we're done
    if (shape.length == 1){
        flatArray
    } else {
        // desired shape must match number of elements
        if (shape.product != flatArray.length){
            throw new IllegalArgumentException
        }
        // pick off last dimension, partitioning into slices
        val oneReduced = (0 until shape.product by shape(shape.length-1)).map {
            i => flatArray.slice(i, i + shape(shape.length-1))
        };
        f(oneReduced.toArray, shape.slice(0, shape.length-1))
    }
}   

(with the appropriate types A&B worked out)

But in https://github.com/ThoughtWorksInc/Compute.scala/blob/0.4.x/Tensors/src/main/scala/com/thoughtworks/compute/Tensors.scala#L1105
the output of flatArray is a Future. So do we want to ensure that result is computed before being passed into this helper function, or should this function also deal with and return a Future?

Atry commented

Welcome!
I think it should be a Future, since it is a slow action. For now, all slow actions are Future or Do, except toString, because toString is an overridden method.

Atry commented

But there are other considerations.

  1. Since toArray and toSeq in the Scala collection library are not asynchronous, the name toArray will surprise people if it returns a Future.
  2. What is the type of B? How do we check the type?

I see. One option is two different methods.

It would return an Array of Floats or possibly an Array of Arrays (of either Arrays or Floats). So we could define a custom type or just use Either.

Atry commented

Since there are so many possible dimensions, the result is hard to represent with Either.

def readScalar: Future[Float]
def read1DArray: Future[Array[Float]]
def read2DArray: Future[Array[Array[Float]]]
def read3DArray: Future[Array[Array[Array[Float]]]]

We probably want the ability to work with n-dimensional Tensors / Arrays, right?

Atry commented

That's the purpose of this issue

Yeah - I was trying to say that we can't explicitly provide read2DArray, read3DArray, etc., since we want the ability to work with any number of dimensions.

Atry commented

If we want to avoid read2DArray, read3DArray, then a type class for arbitrary dimensions is required.

def read[Out](implicit tensorReader: TensorReader[Out]): Future[Out]
// Usage

tensor1.read[Float]
tensor2.read[Seq[Array[Float]]]
tensor3.read[Vector[List[Array[Float]]]]
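A minimal sketch of how such a type class could recurse over dimensions is shown here. It is not the library's API: the names TensorReaderSketch, scalarReader and seqReader are hypothetical, only Float and Seq are covered, and the data is passed in as a flat row-major Vector plus a shape instead of being read asynchronously from a Tensor, so it returns Out directly rather than Future[Out].

```scala
object TensorReaderSketch {
  // Hypothetical type class: builds an Out from row-major data and a shape.
  trait TensorReader[Out] {
    def read(flat: Vector[Float], shape: List[Int]): Out
  }

  object TensorReader {
    // Base case: a zero-dimensional tensor reads as a single Float.
    implicit val scalarReader: TensorReader[Float] = new TensorReader[Float] {
      def read(flat: Vector[Float], shape: List[Int]): Float = {
        require(shape.isEmpty && flat.length == 1, "expected a scalar")
        flat.head
      }
    }

    // Inductive case: a Seq of anything readable adds one outer dimension.
    implicit def seqReader[A](implicit inner: TensorReader[A]): TensorReader[Seq[A]] =
      new TensorReader[Seq[A]] {
        def read(flat: Vector[Float], shape: List[Int]): Seq[A] = shape match {
          case outer :: rest =>
            val size = rest.product // number of elements in each sub-tensor
            require(outer * size == flat.length, "shape does not match the data")
            Vector.tabulate(outer)(i => inner.read(flat.slice(i * size, (i + 1) * size), rest))
          case Nil =>
            throw new IllegalArgumentException("too few dimensions for the requested type")
        }
      }
  }

  // The requested type selects the reader, so the nesting depth is checked
  // against the shape when the data is read.
  def read[Out](flat: Vector[Float], shape: List[Int])(implicit reader: TensorReader[Out]): Out =
    reader.read(flat, shape)
}
```

With these definitions, TensorReaderSketch.read[Seq[Seq[Float]]](data, List(2, 3)) resolves seqReader twice and scalarReader once. Supporting Array, List or Vector would need one more implicit instance per collection type.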

Okay I'll give that a try!

Atry commented

Another option is returning an Any. It is not type-safe on dimensions, but that is understandable, since Tensor is not type-safe on dimensions either.

Yeah I was originally thinking something like Array[Either[Float, Array[A]]] forSome {type A}

Atry commented

Either has to be recursive

def read: T forSome { type T <: Either[Float, Array[T]] }

Atry commented

However it is very inefficient to create an Array[Left[Float]].

So if the user is calling toArray, they probably want a result of type Array[Array[...Array[Float]...]], right? So using an Either type or a custom class doesn't really solve the problem, right? And if we return Any, that's also no good:

def f : Any = {Array(2,3)}
var y = f;
y(0)
//error: scala.this.Any does not take parameters

and we have a similar problem if we use Array[Any].
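To make the cost of an Any result concrete, here is a small sketch of what a caller would have to write to use it. The object name AnyResultSketch and the helper firstElement are made up for illustration; the point is only that the caller must recover the type with a pattern match (or a cast) before indexing.

```scala
object AnyResultSketch {
  // Same shape as the example above: the static type is only Any.
  def f: Any = Array(2, 3)

  // Pattern matching recovers a usable static type from an Any result.
  def firstElement(value: Any): Option[Int] = value match {
    case ints: Array[Int] if ints.nonEmpty => Some(ints(0))
    case _                                 => None
  }
}
```

Each extra dimension needs another level of matching, which is why an Any result pushes the dimension check onto every call site.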

I see why they only defined Array.ofDim for up to 5 arguments.

relevant: https://stackoverflow.com/questions/30623062/6-or-more-dimensional-arrays-in-scala

Atry commented

In this:

def reshapeArray(a: Any, b: Array[Int]): Any = {
  if (b.length == 1) {
    a.asInstanceOf[Array[Any]]
  } else {
    val flat = a.asInstanceOf[Array[Any]]
    val last = b(b.length - 1)
    // split into flat.length / last slices, each of length last
    val oneReduced = Array.tabulate(flat.length / last)(i => flat.slice(i * last, (i + 1) * last))
    reshapeArray(oneReduced, b.slice(0, b.length - 1))
  }
}

def toArray: Future[Any] = {
  flatArray.flatMap((z) => Future {reshapeArray(z,shape)})
}

I'm running into problems with

[error]  found   : Any
[error]  required: com.thoughtworks.tryt.covariant.TryT[com.thoughtworks.continuation.UnitContinuation,Any]

Is this related to your version of Future instead of scala.concurrent.Future?

Atry commented

flatArray.map { z => reshapeArray(z, shape) }

Try map.
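The difference between the two calls can be sketched with scala.concurrent.Future. That is an assumption made so the example is self-contained: the thread uses com.thoughtworks.future.Future, whose constructor has a different signature (the compile error above shows it expecting a TryT value rather than a plain result). The names flatData and pairs are stand-ins, not library methods.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object MapVersusFlatMap {
  // Stand-in for Tensor#flatArray: an asynchronous computation producing flat data.
  def flatData: Future[Vector[Float]] = Future(Vector(1f, 2f, 3f, 4f))

  // A plain, synchronous function: it returns a value, not a Future.
  def pairs(flat: Vector[Float]): Vector[Vector[Float]] = flat.grouped(2).toVector

  // map lifts the plain function into the Future.
  def viaMap: Future[Vector[Vector[Float]]] = flatData.map(pairs)

  // flatMap expects the function itself to return a Future,
  // so the plain result has to be wrapped again.
  def viaFlatMap: Future[Vector[Vector[Float]]] =
    flatData.flatMap(flat => Future(pairs(flat)))

  // Blocking helper, only for checking results in this sketch.
  def await[A](future: Future[A]): A = Await.result(future, 5.seconds)
}
```

When the reshaping step is an ordinary function, map is the shorter form and does not depend on how the Future type is constructed.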

Thanks - that indeed does compile. Now to write tests (and actually have them pass) 😄

Atry commented

Hint: you can use grouped.toArray / grouped.toSeq instead of tabulate and slice.
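A rough sketch of the grouped approach follows. It assumes plain Scala collections rather than the Tensor API, uses Seq instead of Array to sidestep ClassTag requirements, and returns Any because the nesting depth is only known at run time. The names ReshapeSketch and reshape are hypothetical.

```scala
object ReshapeSketch {
  // Rebuilds nested sequences from flat row-major data and a shape.
  def reshape(flat: Seq[Any], shape: Seq[Int]): Any = {
    require(shape.nonEmpty, "shape must have at least one dimension")
    require(shape.product == flat.length, "shape does not match the number of elements")
    if (shape.length == 1) {
      flat
    } else {
      // grouped(n) splits the sequence into consecutive chunks of n elements,
      // which peels off the innermost dimension
      val chunks: Seq[Any] = flat.grouped(shape.last).toVector
      reshape(chunks, shape.init)
    }
  }
}
```

For example, reshape(Seq(1f, 2f, 3f, 4f, 5f, 6f), Seq(2, 3)) yields two inner sequences of three elements each.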

Noted about grouped.

In writing some tests I ran into some runtime problems regarding types. I've fixed those, but I'm running into the same problem again: flatArray returns a ThoughtWorks Future, and I can neither pass it as an argument to reshapeArray nor attach a callback to it. map and flatMap don't seem to work.

I'm getting:

found   : com.thoughtworks.future.Future[Array[_]]
[error]     (which expands to)  com.thoughtworks.future.opacityTypes.Future[Array[_]]
[error]  required: Array[_]

map works with scala.concurrent.Future. I can't see any relevant examples in the documentation.

Atry commented

According to this Scaladoc, you need imports to make map and flatMap available.

That is already imported in Tensors.scala.

Atry commented

The error message suggests you are calling a function that accepts an Array while providing a Future[Array[_]]. Try asking the question on StackOverflow with a minimal reproducible example.

Atry commented

Working with Future or other monadic data types is difficult, because you have to use nested higher-order functions everywhere.

You can use Each or Dsl.scala to ease the problem.