Type promotion is wrong in some cases for tensor - scalar operations
sbrunk opened this issue · 4 comments
The Promoted type can cause a mismatch with the runtime promotion in certain cases when combining tensors with scalar values.
The reason is that type promotion, somewhat unintuitively, does not promote scalars the same way as tensors: a scalar can only lift the result to the next category, not to a wider type within the same category. So if we combine a float32 tensor with a double/float64 scalar, the result will be a float32 tensor:
import torch.*
Tensor(1f) + Tensor(1d)
// res1: Tensor[Float64] = tensor dtype=float64, shape=[], device=CPU
// 2.0000
Tensor(1f) + 1d
// res2: Tensor[Float64] = tensor dtype=float32, shape=[], device=CPU
// 2.0000
1f + Tensor(1d)
// res3: Tensor[Float64] = tensor dtype=float64, shape=[], device=CPU
// 2.0000
Tensor(1) + 1d
// res4: Tensor[Float64] = tensor dtype=float32, shape=[], device=CPU
// 2.0000
More details are in the type promotion docs. So we need a different Promoted type for tensor/scalar ops that takes this into account.
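To make the idea concrete, here is a minimal sketch of what a separate scalar promotion type could look like with Scala 3 match types. Everything in it is an assumption for illustration: the dtype classes are stand-ins rather than storch's real definitions, `ScalarPromoted` and `WithFloatScalar` are hypothetical names, only four dtypes are covered (no bool or complex), and it assumes the default floating dtype is float32, which is what the `Tensor(1) + 1d` example above shows.

```scala
// Hypothetical stand-ins for dtype marker types, not storch's actual definitions.
// They are final classes so the compiler can prove the match type cases disjoint.
sealed abstract class DType
final class Int32 extends DType
final class Int64 extends DType
final class Float32 extends DType
final class Float64 extends DType

// A floating point scalar lifts integral tensors to the assumed default float
// dtype (float32) and leaves floating tensors unchanged, whatever its own width.
type WithFloatScalar[T <: DType] <: DType = T match
  case Int32   => Float32
  case Int64   => Float32
  case Float32 => Float32
  case Float64 => Float64

// Promotion of a tensor dtype T with a Scala scalar type S.
// An integral scalar never changes the dtype of an integral or floating tensor.
type ScalarPromoted[T <: DType, S] <: DType = S match
  case Int    => T
  case Long   => T
  case Float  => WithFloatScalar[T]
  case Double => WithFloatScalar[T]

@main def checkScalarPromotion(): Unit =
  // These only compile if the match types reduce as expected.
  val floatTensorDoubleScalar = summon[ScalarPromoted[Float32, Double] =:= Float32]
  val intTensorDoubleScalar   = summon[ScalarPromoted[Int32, Double] =:= Float32]
  val doubleTensorFloatScalar = summon[ScalarPromoted[Float64, Float] =:= Float64]
  val intTensorLongScalar     = summon[ScalarPromoted[Int32, Long] =:= Int32]
  println("scalar promotion checks compiled")
```

With a type like this, `Tensor(1f) + 1d` would be typed as a float32 tensor, matching the runtime dtype shown in the second example.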
Looks like it's worse actually. This weird promotion rule not only applies to scalar values, but also to scalar tensors (tensors with zero dimensions):
If a zero-dimension tensor operand has a higher category than dimensioned operands, we promote to a type with sufficient size and category to hold all zero-dim tensor operands of that category.
import torch.*
// OK
Tensor(1) + Tensor(1L)
// res0: Tensor[Int64] = tensor dtype=int64, shape=[], device=CPU
// 2
// OK
Tensor(Seq(1, 2)) + Tensor(Seq(1L, 2L))
// res1: Tensor[Int64] = tensor dtype=int64, shape=[2], device=CPU
// [2, 4]
// WRONG inferred promoted type
Tensor(Seq(1, 2)) + Tensor(1L)
// res2: Tensor[Int64] = tensor dtype=int32, shape=[2], device=CPU
// [2, 3]
This means the promoted type is actually shape dependent in this case. We don't track the shape at compile time, and shapes can be pretty dynamic, so it looks like we don't have a chance to do this right with the current design.
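For illustration, here is a small value-level model of the rule quoted above, written in plain Scala 3 without any storch dependency. It is a sketch under assumptions: `Category`, `DType`, `Operand`, `promoteTypes` and `resultType` are hypothetical names, only two categories and four dtypes are modelled, and only zero-dim tensors are handled (not Scala scalars).

```scala
enum Category:
  case Integral, Floating // ordered from lower to higher category

enum DType(val category: Category, val bits: Int):
  case Int32   extends DType(Category.Integral, 32)
  case Int64   extends DType(Category.Integral, 64)
  case Float32 extends DType(Category.Floating, 32)
  case Float64 extends DType(Category.Floating, 64)

// An operand is reduced to what matters here: its dtype and its number of dimensions.
final case class Operand(dtype: DType, rank: Int)

// Ordinary promotion: the higher category wins, and within one category the wider type wins.
def promoteTypes(a: DType, b: DType): DType =
  if a.category == b.category then (if a.bits >= b.bits then a else b)
  else if a.category.ordinal > b.category.ordinal then a
  else b

// Promotion that takes the zero-dim rule into account.
def resultType(a: Operand, b: Operand): DType =
  val aZeroDim = a.rank == 0
  val bZeroDim = b.rank == 0
  if aZeroDim == bZeroDim then
    // Both zero-dim or both dimensioned: ordinary promotion applies.
    promoteTypes(a.dtype, b.dtype)
  else
    val (dimmed, zeroDim) = if bZeroDim then (a.dtype, b.dtype) else (b.dtype, a.dtype)
    // The zero-dim operand only matters if it has a higher category.
    if zeroDim.category.ordinal > dimmed.category.ordinal then zeroDim else dimmed

@main def showShapeDependence(): Unit =
  import DType.*
  // Same dtypes in all three calls, only the ranks differ.
  assert(resultType(Operand(Int32, 0), Operand(Int64, 0)) == Int64)
  assert(resultType(Operand(Int32, 1), Operand(Int64, 1)) == Int64)
  assert(resultType(Operand(Int32, 1), Operand(Int64, 0)) == Int32)
  println("result dtype depends on rank")
```

The three assertions mirror the three examples above. The result cannot be computed from the two dtypes alone, which is exactly the information a type-level Promoted has to work with today.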
Any thoughts?
@sbrunk Out of curiosity. Why do you say the last example is wrong? Isn't this a case of broadcasting?
The result is correct, but our promoted type is wrong. It's Int64 but the promoted runtime dtype is int32.
But only because the tensor on the right is a scalar tensor (has zero dimensions). You can see that in the examples above with two scalar tensors or two non-scalar tensors the promoted type is int64.
So we have different promoted types depending on the shape of the input tensors. And if I'm not mistaken, we can't do type-level type promotion correctly in this case without also tracking that shape information at compile time.
Ok, I understand now.
As for adding the rank/shape to the types, I am interested in this. It has been attempted several times. I recall a presentation on the TDM library. I searched and got the following, possibly interesting, links:
- https://tongfei.me/assets/18nescala-slides.pdf
- https://www.youtube.com/watch?v=d-VbiyLwHYQ
- https://github.com/dieproht/matr
- https://github.com/emptyflash/shapeless-matrix
- https://github.com/tribbloid/shapesafe
May also be of interest:
- https://github.com/breandan/kotlingrad
- http://platanios.org/tensorflow_scala/guides/tensors.html
- https://github.com/pashashiz/scanet3
- https://arxiv.org/pdf/1801.08771.pdf
I am sure there are more out there, but this may help in the initial analysis. Scala 3's new features may make this possible, but if it is not done correctly, it will be a maintenance headache.
My 2 cents.