bwlewis/irlba

Store scaling parameters when doing PCA with scaling

bapike opened this issue · 0 comments

When scaling is requested (scale.=TRUE), stats::prcomp stores the computed per-column scaling values in the scale element of the returned object, whereas irlba::prcomp_irlba stores only the logical TRUE there. This does not match the documentation for irlba::prcomp_irlba.

Storing the scaling values makes it possible to apply the fitted PCA model object to other datasets.
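
To illustrate why the stored values matter, here is a minimal sketch using only stats::prcomp (not irlba). The names train, new_data, and fit are made up for this example. It projects a second dataset onto the fitted components by hand, using the stored center and scale, and compares the result with predict(), which relies on those same stored values.

```r
set.seed(1234)
# Hypothetical training data and a second dataset with the same columns
train <- matrix(runif(100 * 10), nrow = 100, ncol = 10)
new_data <- matrix(runif(5 * 10), nrow = 5, ncol = 10)

fit <- prcomp(train, rank. = 4, center = TRUE, scale. = TRUE)

# Project new_data by hand: subtract the stored centers, divide by the
# stored scaling values, then rotate onto the principal components
manual <- sweep(sweep(new_data, 2, fit$center), 2, fit$scale, FUN = `/`) %*% fit$rotation

# predict() for prcomp objects uses the same stored center and scale
via_predict <- predict(fit, newdata = new_data)

# Largest absolute disagreement between the two projections
max_diff <- max(abs(manual - via_predict))
```

Neither projection can be computed from the fitted object alone if scale holds only TRUE, because the scaling values would have to be recomputed from the original training data.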

I'm happy to write a patch, though it looks like pull request #52 rewrites irlba::prcomp_irlba; I haven't checked to see if the problem exists there.

Some code to see the difference:

library(irlba)

set.seed(1234)
n_rows<-100L
n_cols<-10L  # named n_cols rather than c to avoid masking base::c
M<-matrix(data=runif(n_rows*n_cols),nrow=n_rows,ncol=n_cols)

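# scaling and centering: columns are centered by their means and scaled by their standard deviations
builtin<-prcomp(M,rank.=4,center=TRUE,scale.=TRUE)
str(builtin$scale)  # a numeric vector
# rebuild the scores from the stored center and scale; the differences should be ~0
summary(builtin$x-( sweep(sweep(M,2,builtin$center),2,builtin$scale,FUN=`/`) %*% builtin$rotation ))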

packaged<-prcomp_irlba(M,n=4,center=TRUE,scale.=TRUE)
str(packaged$scale)  # the logical TRUE
# the scaling values are not stored, so recompute the column standard deviations by hand
scaling<-apply(M,2,sd)
summary(packaged$x-( sweep(sweep(M,2,packaged$center),2,scaling,FUN=`/`) %*% packaged$rotation ))


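# scaling only (no centering): columns are divided by their root mean square,
# sqrt(sum(v^2)/(n-1)), rather than by their standard deviation
RMS <- function (v) sqrt(sum(v^2)/(length(v)-1))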
builtin<-prcomp(M,rank.=4,center=FALSE,scale.=TRUE)
str(builtin$scale)  # a numeric vector
summary(builtin$x-( sweep(M,2,builtin$scale,FUN=`/`) %*% builtin$rotation ))

packaged<-prcomp_irlba(M,n=4,center=FALSE,scale.=TRUE)
str(packaged$scale)  # the logical TRUE
# again the scaling values are not stored, so recompute them by hand, this time with RMS
scaling<-apply(M,2,RMS)
summary(packaged$x-( sweep(M,2,scaling,FUN=`/`) %*% packaged$rotation ))