r-lib/rray

(Discussion) Use ALTREP attribute setting for `rray_reshape()` on R >= 3.6.0


Until the interesting ideas of the "compiler" branch are incorporated into rray, I wonder if small optimizations like this would be worth it:
rray_reshape() can easily be implemented in R as long as no broadcasting is requested. Doing it this way would not only get rid of the calls down to C++ but, more importantly, copy no data, since it only entails changing attributes.

library(rray)
library(lobstr)
reshape2 <- function(x, d) {
  if (prod(dim(x)) == prod(d)) {
    attr(x, "dim") <- d
    return(x)
  }
  rray_reshape(x, d)
}
bigrray <- rray(rnorm(1e6),c(10,100,1000))
obj_size(bigrray)
#> 8,001,456 B
x <- rray_reshape(bigrray, c(1000,100,10))
# There was duplication:
obj_size(bigrray,x)
#> 16,002,144 B
y <- reshape2(bigrray, c(1000,100,10))
# But not here in R3.6.0:
obj_size(bigrray,y)
#> 8,001,760 B

Created on 2019-06-01 by the reprex package (v0.2.1)

I'm fairly certain dim names are lost when you set the dim attribute, so we'd have to be careful about that. But R 3.6 does provide some chances for speedup. Specifically, it really helps the performance of rray objects, since we have to maintain attributes after every operation. Basically, on < 3.6, adding 2 rray objects would make 2 copies (one for the bare array result, and one for the result + restored attributes), and on 3.6 it only makes 1.
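The dim names concern can be checked in base R without rray. The following is a minimal sketch, assuming a plain base array rather than an rray object (rray's own handling of dim names may differ). The `dim<-` replacement is documented to remove any "dimnames", so a reshape built on it would have to decide what to do with them.

```r
# Base R array with dim names on both axes.
x <- array(1:6, dim = c(2, 3),
           dimnames = list(c("a", "b"), c("x", "y", "z")))

y <- x
dim(y) <- c(3, 2)      # same number of elements, new shape

dimnames(x)            # still present on the original
dimnames(y)            # NULL: dropped by the dim replacement

# The underlying values keep their column-major order.
identical(as.vector(x), as.vector(y))
```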

Just as an FYI, reshaping will never broadcast. This line prod(dim(x)) == prod(d) is actually a requirement of a reshape. It cannot add elements to the array, it only reshapes existing ones. That means that if we went this route, existing base R dim setting behavior could handle all of this for us and we wouldn't need xtensor for reshaping.
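The size requirement described above can be seen directly in base R, which already refuses a new `dim` whose product differs from the object's length. This is a small sketch on a plain array, not on an rray object:

```r
x <- array(1:12, dim = c(2, 6))

# Same total size (12 elements): allowed.
y <- x
dim(y) <- c(3, 4)
dim(y)

# Different total size (10 elements): base R signals an error.
res <- try({
  z <- x
  dim(z) <- c(5, 2)
}, silent = TRUE)
inherits(res, "try-error")
```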

I'm not quite sure if I want to do this yet, because I'm fairly certain attr(x,"dim") <- d makes two copies on R < 3.6, and I have made a lot of effort to ensure that < 3.6 has decent performance too. We could work around this in a number of ways, but I would just want to ensure that I have all the corner cases covered.
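One possible workaround, sketched here only as an assumption about how it might look, is to gate the attribute-only path on the running R version and keep the existing path elsewhere. The names `reshape_gated()` and `fallback` are hypothetical and not part of rray; the stand-in fallback below uses base R so the example runs on its own. Note that this path would also drop dim names.

```r
# Hypothetical helper: uses the dim-setting path only on R >= 3.6.0 and
# delegates to `fallback` on older versions.
reshape_gated <- function(x, d, fallback) {
  if (getRversion() >= "3.6.0") {
    dim(x) <- d
    x
  } else {
    fallback(x, d)
  }
}

# Stand-in for the existing implementation; in rray this would
# presumably be rray_reshape().
fallback <- function(x, d) array(as.vector(x), dim = d)

m <- reshape_gated(array(1:6, dim = c(2, 3)), c(3, 2), fallback)
dim(m)
```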

But it is definitely something to keep in mind. If done correctly, it could add a nice performance boost to reshaping.

Wow! You couldn't have documented the function better, but it seems that of these 6 lines I read only the first five (and not very carefully):
#' # You cannot reshape to a total size that is
#' # different from the current size.
#' try(rray_reshape(x, c(6, 2)))
#'
#' # Note that you can broadcast to these dimensions!
#' rray_broadcast(x, c(6, 2))
;)