The result is NA
Knight1995 opened this issue · 12 comments
You should start from the warning messages. `X.sub` is probably shorter than `K1[, 2]`.
Also, `ind` spans the column indices by default in `big_apply()`, but here you're using it for the rows.
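To make the row/column point concrete, here is a minimal sketch on a toy FBM. The matrix size and the variable names `col_sums` and `row_sums` are made up for illustration; the only point is that `ind` defaults to `cols_along(X)` and has to be set explicitly to iterate over rows.

```r
library(bigstatsr)

# Toy FBM (5 rows, 3 columns), filled column by column with 1..15.
X <- FBM(5, 3, init = 1:15)

# Default: ind runs over column indices, so the block is X[, ind].
col_sums <- big_apply(X, a.FUN = function(X, ind) {
  colSums(X[, ind, drop = FALSE])
}, a.combine = 'c')

# To iterate over rows, pass ind = rows_along(X) and subset X[ind, ] instead.
row_sums <- big_apply(X, a.FUN = function(X, ind) {
  rowSums(X[ind, , drop = FALSE])
}, a.combine = 'c', ind = rows_along(X))
```

Mixing the two (default column `ind` but subsetting `X[ind, 1]`) silently indexes rows with column numbers, which is one way to end up with vectors of unexpected length.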
Thanks for your quick reply! Actually, I have edited my code to calculate each row's result, but the result is still NA.
colmeans <- big_apply(X1, ind = rows_along(X), function(X, ind) {
  X.sub <- X[ind, 1]
  K1 <- map_dfr(unique(X[, 1]), function(i) {
    S1 <- mean(Y[which(X[, 1] == i), 1])
    data.frame(Value = S1, clu = i)
  })
  a <- K1[which(K1[, 2] == X.sub), 1]
  b <- min(K1[which(K1[, 2] != X.sub), 1])
  si <- (b - a) / max(b, a)
  return(si)
}, a.combine = 'c')
Yes, cf. my first comment.
No, your code doesn't work when using `ind <- 1`.
It is just that `X.sub` is of length 1 and gets automatically recycled to match the size of `K1[, 2]`.
Which is probably not what you want.
You need to think about what you are trying to achieve here.
If I had to guess, I would say that you need to subset `K1[ind, 2]`.
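The recycling behaviour can be reproduced in base R without any FBM. In this small sketch, `clu`, `x1` and `x2` are hypothetical stand-ins for `K1[, 2]` and for `X.sub` with one index and with several indices.

```r
clu <- c(1, 2, 3)   # stand-in for K1[, 2]: one entry per cluster

x1 <- 2             # stand-in for X.sub when ind has length 1
hit1 <- clu == x1   # x1 is silently recycled to c(2, 2, 2)

x2 <- c(2, 3)       # stand-in for X.sub when ind has length 2
# Element-wise comparison of 1:2, 2:3, 3:2 with a recycling warning:
# "longer object length is not a multiple of shorter object length"
hit2 <- clu == x2

which(hit1)         # selects the matching cluster
which(hit2)         # selects nothing, so later subsetting returns an empty vector
```

So the version with a single index only looks correct; with a block of indices the comparison is positional rather than a lookup of each row's cluster.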
Thanks for your reply. To find the problem, I tried a simple test. I thought the cause might be that I didn't pass one of the two variables, Y, so there was no result. But after I rewrote the code following your multivariate example (https://privefl.github.io/bigstatsr/articles/big-apply.html), there is still no output, which is very weird. Could you give me some suggestions? Thanks.
- Properly passing other variables as arguments of the function is necessary only when using parallelism.
- Doing `mean(Y[-ind, ])` is very odd (especially the minus). What are you trying to achieve here (in simple English)?
- What do you have for `summary(Y)`?

- I didn't get that `Y` was also an FBM. Then `summary(Y[])`.
- You understand that `ind` is usually a vector of multiple indices, not just one, right?
- And you want the full `mean()` of the matrix? Not something like the `rowMeans()`?
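The last two questions can be illustrated with a small sketch. The toy matrix and `block.size = 4` are arbitrary choices made here only to force several blocks; `block_lengths`, `block_means` and `row_means` are illustrative names.

```r
library(bigstatsr)

# Toy FBM (6 rows, 2 columns), filled column by column with 1..12.
X <- FBM(6, 2, init = 1:12)

# How many indices does a.FUN receive per call? Usually more than one.
block_lengths <- big_apply(X, a.FUN = function(X, ind) length(ind),
                           a.combine = 'c', ind = rows_along(X), block.size = 4)

# mean() collapses each block of rows to a single number ...
block_means <- big_apply(X, a.FUN = function(X, ind) mean(X[ind, ]),
                         a.combine = 'c', ind = rows_along(X), block.size = 4)

# ... whereas rowMeans() returns one value per row of the block.
row_means <- big_apply(X, a.FUN = function(X, ind) {
  rowMeans(X[ind, , drop = FALSE])
}, a.combine = 'c', ind = rows_along(X), block.size = 4)
```

A function written as if `ind` were a single row index will therefore return one value per block, not one per row, once the matrix is large enough to be split.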
Yes, I think I understand what you mean. I tested the simple example above to learn, step by step, how to write the a.FUN in big_apply. My original R code is below. Because the matrix is too big and the code runs too slowly, I want to implement this function using big_apply. cluster_info and dist are the original matrices. Their row names and number of rows are the same.
K3 <- future_map_dfr(seq(ncol(cluster_info)), function(Y) {
  K2 <- map_dfr(seq(nrow(cluster_info)), function(index) {
    x <- cluster_info[, Y]
    dist2 <- as.data.frame(cbind(x, dist))[-index, ]
    K1 <- map_dfr(unique(x), function(i) {
      d <- mean(dist2[which(dist2$x == i), index + 1])
      # d <- sum(dist2[which(dist2$x == i), index + 1]) / length(which(dist2$x == i))
      data.frame(Value = d, clu = i)
    })
    si <- (min(K1[K1$clu != x[index], ]$Value) - K1[K1$clu == x[index], ]$Value) /
      max(min(K1[K1$clu != x[index], ]$Value), K1[K1$clu == x[index], ]$Value)
    if (is.na(si)) {
      data.frame(cluster = x[index], sil_width = 0)
    } else {
      data.frame(cluster = x[index], sil_width = si)
    }
  })
  data.frame(Resolution = colnames(cluster_info)[Y], silhouette_score = mean(K2$sil_width))
})
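Read literally, the inner loop above looks like a silhouette width: for each row, the mean distance to its own cluster (excluding itself) is compared with the smallest mean distance to any other cluster. Assuming that is the intent, the per-row step can be sketched in base R as below. `sil_widths` is a hypothetical helper name, `labels` stands for one column of cluster_info, and `d` is assumed to be a full square distance matrix held in memory; this is not a bigstatsr version.

```r
# Hypothetical helper: per-row silhouette-like widths from cluster labels
# and a square distance matrix d (rows and columns in the same order).
sil_widths <- function(labels, d) {
  n <- length(labels)
  clusters <- unique(labels)
  vapply(seq_len(n), function(i) {
    others <- setdiff(seq_len(n), i)   # drop the point itself
    # Mean distance from point i to the members of each cluster.
    m <- vapply(clusters, function(k) {
      mean(d[others[labels[others] == k], i])
    }, numeric(1))
    a <- m[clusters == labels[i]]        # own cluster
    b <- min(m[clusters != labels[i]])   # nearest other cluster
    s <- (b - a) / max(a, b)
    if (is.na(s)) 0 else s               # e.g. singleton cluster gives NaN
  }, numeric(1))
}

# Usage on four points on a line, forming two obvious clusters.
d <- as.matrix(dist(c(0, 1, 10, 11)))
s <- sil_widths(c(1, 1, 2, 2), d)
mean(s)   # the per-resolution score would be this mean
```

Computing all cluster means for a row in one pass avoids rebuilding a data frame per row, which is likely where much of the original running time goes.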
I don't get what you're trying to achieve here; sorry I cannot help.