sbrunk/storch

Type promotion is wrong in some cases for tensor - scalar operations

Opened this issue · 4 comments

sbrunk commented

The `Promoted` type can cause a mismatch with the runtime promotion in certain cases when combining tensors with scalar values.

The reason is that type promotion, somewhat unintuitively, does not promote scalars the same way as tensors, but only up to the next dtype category. So if we combine a float32 tensor with a double/float64 scalar, the result will be a float32 tensor:

import torch.*

Tensor(1f) + Tensor(1d)
// res1: Tensor[Float64] = tensor dtype=float64, shape=[], device=CPU 
// 2.0000

Tensor(1f) + 1d
// res2: Tensor[Float64] = tensor dtype=float32, shape=[], device=CPU 
// 2.0000
1f + Tensor(1d)
// res3: Tensor[Float64] = tensor dtype=float64, shape=[], device=CPU 
// 2.0000
Tensor(1) + 1d
// res4: Tensor[Float64] = tensor dtype=float32, shape=[], device=CPU 
// 2.0000

More details are in the type promotion docs. So we need a different `Promoted` type for tensor/scalar ops that takes this rule into account.
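To illustrate, here is a rough sketch of how a separate tensor/scalar promotion type could be encoded with plain Scala 3 match types. This is not storch's actual API: `ScalarPromotion`, `PromotedScalar`, `LiftToFloat`, and `DTypeName` are made-up names, and Scala primitives stand in for dtypes (Int ~ int32, Long ~ int64, Float ~ float32, Double ~ float64).

```scala
// Hypothetical sketch, NOT storch's actual API. Scala primitives stand
// in for dtypes: Int ~ int32, Long ~ int64, Float ~ float32, Double ~ float64.

object ScalarPromotion:

  // Tensor-tensor promotion (simplified): widen within and across categories.
  type Promoted[A, B] = (A, B) match
    case (Double, _) => Double
    case (_, Double) => Double
    case (Float, _)  => Float
    case (_, Float)  => Float
    case (Long, _)   => Long
    case (_, Long)   => Long
    case _           => Int

  // Tensor-scalar promotion: a floating-point scalar lifts an integral
  // tensor only to the default float dtype (float32), and never widens
  // a float32 tensor to float64; an integral scalar never widens at all.
  type PromotedScalar[T, S] = S match
    case Double => LiftToFloat[T]
    case Float  => LiftToFloat[T]
    case _      => T

  type LiftToFloat[T] = T match
    case Float  => Float
    case Double => Double
    case _      => Float

// Runtime witness so the reduced types can be inspected in tests.
trait DTypeName[T]:
  def name: String

given DTypeName[Int] = new DTypeName[Int] { def name = "int32" }
given DTypeName[Long] = new DTypeName[Long] { def name = "int64" }
given DTypeName[Float] = new DTypeName[Float] { def name = "float32" }
given DTypeName[Double] = new DTypeName[Double] { def name = "float64" }

def tensorTensor[A, B](using n: DTypeName[ScalarPromotion.Promoted[A, B]]): String = n.name
def tensorScalar[T, S](using n: DTypeName[ScalarPromotion.PromotedScalar[T, S]]): String = n.name

@main def scalarPromotionDemo(): Unit =
  assert(tensorTensor[Float, Double] == "float64") // Tensor(1f) + Tensor(1d) -> float64
  assert(tensorScalar[Float, Double] == "float32") // Tensor(1f) + 1d         -> float32
  assert(tensorScalar[Int, Double]   == "float32") // Tensor(1) + 1d          -> float32
  assert(tensorScalar[Int, Long]     == "int32")   // Tensor(1) + 1L          -> int32
  println("scalar promotion checks passed")
```

If this matches the runtime behaviour closely enough, the `+` overload taking a `ScalaType` could use a `PromotedScalar`-like type instead of `Promoted` for its result.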

sbrunk commented

Looks like it's actually worse. This weird promotion rule applies not only to scalar values, but also to scalar tensors (tensors with zero dimensions):

> If a zero-dimension tensor operand has a higher category than dimensioned operands, we promote to a type with sufficient size and category to hold all zero-dim tensor operands of that category.

import torch.*

// OK
Tensor(1) + Tensor(1L)
// res0: Tensor[Int64] = tensor dtype=int64, shape=[], device=CPU 
// 2

// OK
Tensor(Seq(1, 2)) + Tensor(Seq(1L, 2L))
// res1: Tensor[Int64] = tensor dtype=int64, shape=[2], device=CPU 
// [2, 4]

// WRONG inferred promoted type
Tensor(Seq(1, 2)) + Tensor(1L)
// res2: Tensor[Int64] = tensor dtype=int32, shape=[2], device=CPU 
// [2, 3]

Which means the promoted type is actually shape-dependent in this case. We don't track shapes at compile time, and shapes can be pretty dynamic, so it looks like we don't have a chance to get this right with the current design.
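For what it's worth, if at least a "zero-dim vs. dimensioned" flag were tracked at the type level, this part of the rule could in principle be encoded. A minimal sketch with made-up names (`ZeroDim`, `Dimensioned`, `RankAwarePromotion`, none of which exist in storch), using a two-dtype universe where Int ~ int32 and Long ~ int64:

```scala
// Hypothetical sketch, none of these names exist in storch: phantom
// type-level rank flags that the promoted dtype can depend on.

final class ZeroDim      // scalar tensor, shape []
final class Dimensioned  // rank >= 1

object RankAwarePromotion:

  // Same-category promotion taking rank into account: a zero-dim
  // operand never widens a dimensioned operand of the same category.
  type Promoted[RA, A, RB, B] = (RA, RB) match
    case (Dimensioned, ZeroDim) => A
    case (ZeroDim, Dimensioned) => B
    case _                      => Widen[A, B]

  // Ordinary within-category widening for the two-dtype universe.
  type Widen[A, B] = (A, B) match
    case (Long, _) => Long
    case (_, Long) => Long
    case _         => Int

// Runtime witness so the reduced types can be inspected in tests.
trait IntDTypeName[T]:
  def name: String

given IntDTypeName[Int] = new IntDTypeName[Int] { def name = "int32" }
given IntDTypeName[Long] = new IntDTypeName[Long] { def name = "int64" }

def promotedDType[RA, A, RB, B](
    using n: IntDTypeName[RankAwarePromotion.Promoted[RA, A, RB, B]]
): String = n.name

@main def rankAwareDemo(): Unit =
  // Tensor(Seq(1, 2)) + Tensor(1L): zero-dim int64 does not widen int32
  assert(promotedDType[Dimensioned, Int, ZeroDim, Long] == "int32")
  // Tensor(Seq(1, 2)) + Tensor(Seq(1L, 2L)): both dimensioned, widen to int64
  assert(promotedDType[Dimensioned, Int, Dimensioned, Long] == "int64")
  // Tensor(1) + Tensor(1L): both zero-dim, widen to int64
  assert(promotedDType[ZeroDim, Int, ZeroDim, Long] == "int64")
  println("rank-aware promotion checks passed")
```

This only covers the same-category case; a zero-dim operand of a *higher* category still promotes per the quoted rule, which would need another branch. And of course the real problem remains that the zero-dim flag itself would have to survive every shape-changing operation.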

Any thoughts?

hmf commented

@sbrunk Out of curiosity. Why do you say the last example is wrong? Isn't this a case of broadcasting?

sbrunk commented

> @sbrunk Out of curiosity. Why do you say the last example is wrong? Isn't this a case of broadcasting?

The result is correct, but our promoted type is wrong. It's Int64 but the promoted runtime dtype is int32.

But only because the tensor on the right is a scalar tensor (has zero dimensions). You can see that in the examples above with two scalar tensors or two non-scalar tensors the promoted type is int64.

So we have different promoted types depending on the shape of the input tensors. And if I'm not mistaken, we can't do type-level type promotion correctly in this case without also tracking that shape information at compile time.

hmf commented

> The result is correct, but our promoted type is wrong. It's Int64 but the promoted runtime dtype is int32.

Ok, I understand now.

As for adding the rank/shape to the types, I am interested in this. It has been attempted several times. I recall a presentation on the TDM library. I searched and got the following, possibly interesting, links:

  1. https://tongfei.me/assets/18nescala-slides.pdf
    1. https://arxiv.org/pdf/1710.06892.pdf
    2. https://tongfei.me/nexus/
  2. https://www.youtube.com/watch?v=d-VbiyLwHYQ
    1. https://github.com/AnnabelleGillet/TDM
    2. https://hal.science/hal-03073789/file/IDEAS_2020_TDM_final.pdf
  3. https://github.com/dieproht/matr
  4. https://github.com/emptyflash/shapeless-matrix
  5. https://github.com/tribbloid/shapesafe

May also be of interest:

  1. https://github.com/breandan/kotlingrad
  2. http://platanios.org/tensorflow_scala/guides/tensors.html
  3. https://github.com/pashashiz/scanet3
  4. https://arxiv.org/pdf/1801.08771.pdf

I am sure there are more out there, but this may help in the initial analysis. Maybe Scala 3's new features make this possible, but if not done correctly, it will be a maintenance headache.

My 2 cents.